Handling stale PGs in a Ceph cluster
Procedure:
- First, run ceph pg dump | grep stale to find all stale PGs.
- Then run ceph pg force_create_pg pg_id for each of them.
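The two steps above can be scripted. This is a minimal bash sketch. It assumes plain-text ceph pg dump output in which the PG id is the first field of every PG row. The function names are hypothetical helpers.

The script only prints the commands so you can review them before running anything. A force-created PG comes back empty, and any data it held is lost. Newer releases (Mimic and later) replace this command with ceph osd force-create-pg <pgid>, which may also require --yes-i-really-mean-it.

```bash
#!/usr/bin/env bash
# Sketch only: stale_pgids and force_create_cmds are hypothetical helpers.

# Read `ceph pg dump` text on stdin and print the id of every stale PG.
# PG rows start with "<pool>.<hex seq>"; pool and OSD summary rows do not.
stale_pgids() {
  awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /stale/ { print $1 }'
}

# Print one force-create command per stale PG for review; nothing is executed.
# Pre-Mimic syntax; newer releases use `ceph osd force-create-pg <pgid>`.
force_create_cmds() {
  stale_pgids | while read -r pgid; do
    printf 'ceph pg force_create_pg %s\n' "$pgid"
  done
}
```

Usage: ceph pg dump 2>/dev/null | force_create_cmds > recreate.sh. Review recreate.sh, then run it with sh recreate.sh. The "dumped all" banner goes to stderr, which is why it showed up despite the grep in the transcript below. ceph pg dump_stuck stale is another way to list stuck stale PGs.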
Once this is done, the PGs that were stale show up in the creating state. One more key step is then required:
- Restart the OSDs across the entire cluster.
After the restart completes, the cluster returns to a healthy state and accepts writes of new data again.
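The restart step can be sketched as a dry run. This assumes systemd-managed OSDs (the ceph-osd.target unit), SSH access to every OSD host, and a hypothetical host list passed as arguments.

Wrapping the restart in ceph osd set noout / unset noout is an addition that is not part of the original procedure. It keeps the monitors from marking the bouncing OSDs out and starting a rebalance.

```bash
#!/usr/bin/env bash
# Print the commands that restart every OSD in the cluster; nothing is executed.
# Assumptions: systemd-managed OSDs and passwordless SSH to each host.
restart_all_osds_cmds() {
  echo "ceph osd set noout"        # avoid data migration while OSDs restart
  local host
  for host in "$@"; do
    echo "ssh $host systemctl restart ceph-osd.target"
  done
  echo "ceph osd unset noout"      # restore normal out-marking afterwards
}
```

Usage: restart_all_osds_cmds node1 node2 node3 | sh. The hostnames are placeholders.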
[root@mon1 ~]# ceph pg dump |grep stale
dumped all
7.385 19460 0 0 0 0 4363375984 1547 1547 stale+peering 2022-08-07 18:30:16.932885 9719'4511237 110154:5399674 [14] 14 [14] 14 9719'4511237 2022-08-06 07:29:51.095989 9719'4511237 2022-08-02 00:57:43.318114 0
7.2a6 19409 0 0 0 0 4324918151 1542 1542 stale+peering 2022-08-07 16:09:26.464409 5938'4407602 99931:5800676 [15] 15 [15] 15 5938'4407602 2022-08-06 07:36:00.102984 5938'4407602 2022-08-01 23:45:58.573722 0
8.39 280 0 0 0 0 0 1597 1597 stale+peering 2022-08-07 16:09:26.461915 5938'2119986 99931:2386270 [15] 15 [15] 15 5938'2119986 2022-08-05 21:32:12.656384 5938'2119986 2022-08-01 22:58:58.614188 0
7.34 19337 0 0 0 0 4278284806 1580 1580 stale+peering 2022-08-07 16:09:26.461100 9719'4369235 99931:5261881 [15] 15 [15] 15 9719'4369235 2022-08-06 08:22:37.168815 9719'4369235 2022-08-04 21:34:38.449584 0
7.1d8 19383 0 0 0 0 4332924749 1593 1593 stale+peering 2022-08-07 18:30:16.914876 9719'4456286 110154:5409919 [14] 14 [14] 14 9719'4456286 2022-08-06 09:09:03.624425 9719'4456286 2022-08-02 01:25:18.343799 0
7.1e6 19375 0 0 0 0 4342149879 1564 1564 stale+peering 2022-08-07 18:30:16.930931 10754'4463130 110154:5047778 [14] 14 [14] 14 10754'4463130 2022-08-06 01:41:35.137028 10754'4463130 2022-08-04 21:39:21.624235
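In this output every stale PG has a single-OSD acting set, [14] or [15]. That matches the per-OSD backup directories pgback-osd14 and pgback-osd15 listed further down.

The small awk sketch below prints each stale PG together with its acting primary. The exact column order of ceph pg dump differs between releases, so it does not rely on a fixed column index. Instead it assumes the up set and the acting set are the first two bracketed fields of a row. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print "<pgid> osd.<acting primary>" for every stale PG in `ceph pg dump` text.
# Assumption: the up set and the acting set are the first two fields shaped
# like [..], each immediately followed by its primary OSD id.
stale_pg_primary() {
  awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /stale/ {
    seen = 0
    for (i = 2; i < NF; i++) {
      if ($i ~ /^\[.*\]$/ && ++seen == 2) {   # second bracketed field = acting set
        print $1, "osd." $(i + 1)
        break
      }
    }
  }'
}
```

Usage: ceph pg dump 2>/dev/null | stale_pg_primary. For the six rows above, it reports 7.385, 7.1d8 and 7.1e6 on osd.14, and 7.2a6, 8.39 and 7.34 on osd.15.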
[root@mon1 ~]# ceph pg 7.385 query
Error ENOENT: i don't have pgid 7.385
[root@mon1 ~]# ceph pg 7.2a6 query
Error ENOENT: i don't have pgid 7.2a6
[root@mon1 ~]# cd /backup/
[root@mon1 backup]# ls
osd14pgs osd15pgs pgback-osd14 pgback-osd15 pgexport.sh
[root@mon1 backup]# cd pgback-osd14/
[root@mon1 pgback-osd14]# ls |grep 7.385
osd14pg-7.385.file
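The contents of pgexport.sh are not shown. As a hypothetical sketch only, per-PG backup files named like osd14pg-7.385.file could be produced with the list-pgs and export operations of ceph-objectstore-tool while the OSD is stopped.

The function below prints one export command per PG id read on stdin. The backup directory argument is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the author's pgexport.sh.
# Usage: export_pg_cmds <osd-id> <backup-dir>, with PG ids on stdin, one per line.
# The OSD must be stopped before ceph-objectstore-tool touches its data path.
export_pg_cmds() {
  local id="$1" dir="$2" pgid
  while read -r pgid; do
    [ -n "$pgid" ] || continue   # skip blank lines
    printf 'ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-%s/ --op export --pgid %s --file %s/osd%spg-%s.file\n' \
      "$id" "$pgid" "$dir" "$id" "$pgid"
  done
}
```

Usage, with osd.14 stopped:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --op list-pgs | export_pg_cmds 14 /backup/pgback-osd14 > export-osd14.sh

Review the generated file, then run it.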
Only roll back (restore the PG from its backup) when the query returns exactly this message: Error ENOENT: i don't have pgid 7.385
First, stop the OSD service:
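The stop command itself is not shown. This minimal sketch assumes systemd units named ceph-osd@<id>. It checks for the ENOENT message first and prints the stop command only when the PG is really absent. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Succeeds when `ceph pg <pgid> query` output (read on stdin) says the PG is gone.
pg_is_missing() {
  grep -q "i don't have pgid"
}

# Print the stop command for OSD <osd-id> only if PG <pgid> is missing.
# Assumption: systemd-managed OSDs with units named ceph-osd@<id>.
stop_osd_cmd_if_missing() {
  local pgid="$1" id="$2"
  if ceph pg "$pgid" query 2>&1 | pg_is_missing; then   # the error goes to stderr
    echo "systemctl stop ceph-osd@$id"
  else
    echo "pg $pgid still exists; not stopping osd.$id" >&2
    return 1
  fi
}
```

Usage: stop_osd_cmd_if_missing 7.385 14 | sh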
Export and remove the PG from the OSD (export-remove):
[root@mon1 pgback-osd14]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --id=14 --op export-remove --pgid 7.385 --file /tmp/osd14-7.385pg
Import the PG from the backup:
[root@mon1 pgback-osd14]# sudo -u ceph ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --id=14 --op import --pgid 7.385 --file osd14pg-7.385.file
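The same two commands apply to every stale PG that has a backup, for example the other PGs on osd.14 and osd.15. The hypothetical helper below prints them for a given OSD id and PG id. It copies the flags used above verbatim and assumes the backup layout listed earlier (/backup/pgback-osd<id>/osd<id>pg-<pgid>.file).

```bash
#!/usr/bin/env bash
# Print the export-remove and import commands for one PG on one stopped OSD.
# Assumption: backups live at /backup/pgback-osd<id>/osd<id>pg-<pgid>.file.
# Nothing is executed.
restore_pg_cmds() {
  local id="$1" pgid="$2"
  local data="/var/lib/ceph/osd/ceph-$id/"
  local backup="/backup/pgback-osd$id/osd${id}pg-$pgid.file"
  # Move the current copy of the PG out of the OSD, keeping it in /tmp.
  echo "ceph-objectstore-tool --data-path $data --id=$id --op export-remove --pgid $pgid --file /tmp/osd$id-${pgid}pg"
  # Import the PG from its backup, run as the ceph user as in the original.
  echo "sudo -u ceph ceph-objectstore-tool --data-path $data --id=$id --op import --pgid $pgid --file $backup"
}
```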
Start the OSD:
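The start command is not shown either. Assuming systemd units named ceph-osd@<id>, you can start the OSD and then poll until the PG can be queried again.

The generic retry helper below is a hypothetical sketch. It relies on the ceph CLI exiting non-zero when a query fails with ENOENT.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@" >/dev/null 2>&1; do
    if [ "$n" -ge "$attempts" ]; then
      return 1                     # gave up after <attempts> tries
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Usage:

systemctl start ceph-osd@14 && wait_until 30 10 ceph pg 7.385 query && echo "pg 7.385 is back"

A successful query only shows that the PG exists again. Check ceph -s to confirm it reaches active+clean.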
PGs that have no backup must be recreated:
Procedure:
- Step 1: find the stale PGs: ceph pg dump |grep stale
- Step 2: recreate each PG: ceph pg force_create_pg $pg_id
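Only the stale PGs without a backup file fall into this case. This sketch assumes the backup layout listed earlier (<backup-root>/pgback-osd<id>/osd<id>pg-<pgid>.file). It reads stale PG ids on stdin and prints a force-create command (pre-Mimic syntax, as in the text) only for PGs that have no backup anywhere under the root. The function name is hypothetical. A force-created PG comes back empty.

```bash
#!/usr/bin/env bash
# Print a force-create command for each PG id on stdin that has no backup file.
# Assumed layout: <backup-root>/pgback-osd<id>/osd<id>pg-<pgid>.file
recreate_cmds_without_backup() {
  local root="$1" pgid f found
  while read -r pgid; do
    [ -n "$pgid" ] || continue
    found=0
    # An unmatched glob stays literal, so -e fails and found remains 0.
    for f in "$root"/pgback-osd*/osd*pg-"$pgid".file; do
      [ -e "$f" ] && { found=1; break; }
    done
    [ "$found" -eq 1 ] || echo "ceph pg force_create_pg $pgid"
  done
}
```

Usage: list the stale PG ids, one per line (for example from the first column of ceph pg dump_stuck stale), and pipe them into recreate_cmds_without_backup /backup. Review the output before running it.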