Handling stale PGs in a Ceph cluster
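Procedure:
- First, run ceph pg dump | grep stale to find every PG in the stale state.
- Then run ceph pg force_create_pg <pg_id> for each of them (on Luminous and later the command is ceph osd force-create-pg <pg_id>).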
Once this is done, the PGs that were stale now show as creating. One key step remains:
- Restart every OSD in the cluster.
After the restart completes, the cluster returns to a healthy state and accepts new writes again.
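The first step above (finding the stale PGs) can be sketched as a small filter. This is only a sketch under assumptions: stale_pgids is a hypothetical helper name, and it assumes the plain-text dump format where the pgid is the first field of each row and the state appears somewhere later on the same row.

```shell
#!/bin/sh
# stale_pgids: read `ceph pg dump` text on stdin and print the pgid of each stale row.
# A row counts as a PG row when its first field looks like <pool>.<hex>, e.g. 7.385.
stale_pgids() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /stale/) { print $1; break }
        }
    }'
}

# On a live cluster (not executed here):
#   ceph pg dump 2>/dev/null | stale_pgids
```

Printing only the pgid keeps the output easy to feed into the later steps, one PG per line.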
[root@mon1 ~]# ceph pg dump |grep stale
dumped all
7.385 19460 0 0 0 0 4363375984 1547 1547 stale+peering 2022-08-07 18:30:16.932885 9719'4511237 110154:5399674 [14] 14 [14] 14 9719'4511237 2022-08-06 07:29:51.095989 9719'4511237 2022-08-02 00:57:43.318114 0
7.2a6 19409 0 0 0 0 4324918151 1542 1542 stale+peering 2022-08-07 16:09:26.464409 5938'4407602 99931:5800676 [15] 15 [15] 15 5938'4407602 2022-08-06 07:36:00.102984 5938'4407602 2022-08-01 23:45:58.573722 0
8.39 280 0 0 0 0 0 1597 1597 stale+peering 2022-08-07 16:09:26.461915 5938'2119986 99931:2386270 [15] 15 [15] 15 5938'2119986 2022-08-05 21:32:12.656384 5938'2119986 2022-08-01 22:58:58.614188 0
7.34 19337 0 0 0 0 4278284806 1580 1580 stale+peering 2022-08-07 16:09:26.461100 9719'4369235 99931:5261881 [15] 15 [15] 15 9719'4369235 2022-08-06 08:22:37.168815 9719'4369235 2022-08-04 21:34:38.449584 0
7.1d8 19383 0 0 0 0 4332924749 1593 1593 stale+peering 2022-08-07 18:30:16.914876 9719'4456286 110154:5409919 [14] 14 [14] 14 9719'4456286 2022-08-06 09:09:03.624425 9719'4456286 2022-08-02 01:25:18.343799 0
7.1e6 19375 0 0 0 0 4342149879 1564 1564 stale+peering 2022-08-07 18:30:16.930931 10754'4463130 110154:5047778 [14] 14 [14] 14 10754'4463130 2022-08-06 01:41:35.137028 10754'4463130 2022-08-04 21:39:21.624235
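Each row in this output also names the OSDs involved, which matters later when picking the right backup directory. A rough sketch for pairing each stale PG with its OSD set follows. It assumes the first bracketed field on a row is the up set, and stale_pg_osds is a hypothetical name, not a Ceph command.

```shell
#!/bin/sh
# stale_pg_osds: read `ceph pg dump` text on stdin and print "<pgid> <osd set>"
# for each stale row. ASSUMPTION: the first field shaped like [14] or [14,15]
# is the up set of that PG.
stale_pg_osds() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /stale/ {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^\[[0-9,]+\]$/) { print $1, $i; break }
        }
    }'
}

# On a live cluster (not executed here):
#   ceph pg dump 2>/dev/null | stale_pg_osds
```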
[root@mon1 ~]# ceph pg 7.385 query
Error ENOENT: i don't have pgid 7.385
[root@mon1 ~]# ceph pg 7.385 query
Error ENOENT: i don't have pgid 7.385
[root@mon1 ~]# ceph pg 7.2a6 query
Error ENOENT: i don't have pgid 7.2a6
[root@mon1 ~]# cd /backup/
[root@mon1 backup]# ls
osd14pgs osd15pgs pgback-osd14 pgback-osd15 pgexport.sh
[root@mon1 backup]# cd pgback-osd14/
[root@mon1 pgback-osd14]# ls |grep 7.385
osd14pg-7.385.file
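The two checks in this session (the query reports ENOENT for the PG, and a backup file for it exists) could be wrapped in small guards like the ones below. Both function names are hypothetical, and the backup file naming (osd<id>pg-<pgid>.file) is inferred from the single file listed above, so treat it as an assumption about this particular backup layout.

```shell
#!/bin/sh
# pg_is_missing: succeed only if the captured query output ends with the
# "i don't have pgid <pgid>" message for exactly this pgid.
#   $1: pgid   $2: text captured from `ceph pg <pgid> query 2>&1`
pg_is_missing() {
    case "$2" in
        *"i don't have pgid $1") return 0 ;;
        *) return 1 ;;
    esac
}

# backup_file_for: build the expected backup path.
#   $1: OSD id   $2: pgid   $3: backup directory
# ASSUMPTION: files are named osd<id>pg-<pgid>.file, as in the listing above.
backup_file_for() {
    printf '%s/osd%spg-%s.file\n' "$3" "$1" "$2"
}

# On a live cluster (not executed here):
#   out="$(ceph pg 7.385 query 2>&1)"
#   pg_is_missing 7.385 "$out" && [ -f "$(backup_file_for 14 7.385 /backup/pgback-osd14)" ] \
#       && echo "7.385: ok to roll back"
```

Matching on the end of the message avoids treating 7.38 as a match for 7.385.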
Only roll back when the query returns exactly this message: Error ENOENT: i don't have pgid 7.385
First, stop the OSD service.
Export and remove the PG (export-remove):
[root@mon1 pgback-osd14]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --id=14 --op export-remove --pgid 7.385 --file /tmp/osd14-7.385pg
Import the PG:
[root@mon1 pgback-osd14]# sudo -u ceph ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --id=14 --op import --pgid 7.385 --file osd14pg-7.385.file
Start the OSD.
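The whole rollback for one PG (stop the OSD, export-remove, import, start the OSD) can be laid out as a dry-run plan that only prints the commands in order. The two ceph-objectstore-tool lines are copied from the session above. The systemctl lines are an assumption: the post does not show how the OSD was stopped or started, and ceph-osd@<id> units only apply to systemd-managed OSDs. rollback_plan is a hypothetical helper name.

```shell
#!/bin/sh
# rollback_plan: print (do not run) the rollback commands for one PG, in order.
#   $1: OSD id   $2: pgid   $3: backup file to import   $4: file for the export-remove copy
rollback_plan() {
    data_path="/var/lib/ceph/osd/ceph-$1/"
    # ASSUMPTION: the OSD runs as the systemd unit ceph-osd@<id>.
    printf '%s\n' "systemctl stop ceph-osd@$1"
    printf '%s\n' "ceph-objectstore-tool --data-path $data_path --id=$1 --op export-remove --pgid $2 --file $4"
    printf '%s\n' "sudo -u ceph ceph-objectstore-tool --data-path $data_path --id=$1 --op import --pgid $2 --file $3"
    # ASSUMPTION: same unit name for starting the OSD again.
    printf '%s\n' "systemctl start ceph-osd@$1"
}

# Print the plan for the PG used in this post.
rollback_plan 14 7.385 osd14pg-7.385.file /tmp/osd14-7.385pg
```

Printing the plan first gives a chance to review each command before running anything against an OSD's data directory.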
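PGs that have no backup need to be recreated. A recreated PG starts out empty, so its previous contents are lost.
Procedure:
Step 1: find the PGs in the stale state:
ceph pg dump |grep stale
Step 2: recreate each PG:
ceph pg force_create_pg $pg_id (ceph osd force-create-pg $pg_id on Luminous and later)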
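For several PGs, step 2 can be prepared as a dry run that prints one recreate command per pgid. In this sketch the command prefix is passed in as an argument, since the exact name depends on the Ceph release and is worth checking against ceph --help on the cluster. recreate_plan is a hypothetical helper name, and nothing here touches the cluster.

```shell
#!/bin/sh
# recreate_plan: read pgids on stdin (one per line) and print one recreate
# command per pgid. Nothing is executed.
#   $1: command prefix for your release, e.g. "ceph pg force_create_pg"
recreate_plan() {
    while read -r pgid; do
        if [ -n "$pgid" ]; then
            printf '%s %s\n' "$1" "$pgid"
        fi
    done
}

# Example with two of the pgids from this post.
printf '%s\n' 7.2a6 8.39 | recreate_plan "ceph pg force_create_pg"
```

Only PGs that have no backup belong in this list, because recreating a PG discards whatever it held.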