Handling stale PGs in a Ceph cluster
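Procedure:
- First, run ceph pg dump |grep stale to find every PG in the stale state.
- Then recreate each of them with ceph pg force_create_pg <pg_id> (on Luminous and later releases the command is ceph osd force-create-pg <pg_id>).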
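The first step can be scripted instead of reading grep output by eye. The sketch below is a hypothetical helper, stale_pgs, that reads ceph pg dump output on stdin and prints the ID of every PG whose state includes stale. It scans every field rather than a fixed column, on the assumption that the column layout of ceph pg dump is not identical across Ceph releases.

```shell
# stale_pgs (hypothetical helper): print the ID of every PG whose state
# contains "stale". Reads `ceph pg dump` output on stdin. Every field is
# scanned instead of a fixed column, since the layout varies by release.
stale_pgs() {
  awk '
    # PG rows start with "<pool>.<hex seq>", e.g. 7.385 or 7.2a6
    $1 ~ /^[0-9]+\.[0-9a-f]+$/ {
      for (i = 2; i <= NF; i++) {
        # wrap the field in "+" so "stale" only matches as a whole state token
        if (index("+" $i "+", "+stale+") > 0) { print $1; break }
      }
    }
  '
}
```

For example, ceph pg dump | stale_pgs lists the PGs to handle, and each ID can then be passed to the force-create command. Force-creating a PG brings it back empty, so whatever it held is lost. Ceph's own ceph pg dump_stuck stale is another way to get a similar list.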
At this point the PGs that were previously stale show the creating state. One key step remains:
- Restart every OSD in the cluster.
Once the restart completes, the cluster returns to a healthy state and accepts new writes again.
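A minimal sketch of the restart step, assuming package-installed OSDs managed by systemd (the ceph-osd.target unit) and ssh access to each OSD host. OSD_HOSTS, run and restart_all_osds are hypothetical names. Setting noout around the restart is an addition not in the original steps: it keeps the briefly-down OSDs from being marked out, which would otherwise start data migration.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# restart_all_osds (hypothetical): restart every OSD on each host in
# OSD_HOSTS, a space-separated host list you must set yourself.
# Assumes systemd-managed OSDs (ceph-osd.target) and ssh access.
restart_all_osds() {
  # fail early, before any flag is set, if no hosts were given
  : "${OSD_HOSTS:?set OSD_HOSTS to the OSD host names}"
  # noout: OSDs are not marked out (so data does not migrate) while down
  run ceph osd set noout
  # OSD_HOSTS is left unquoted on purpose so it splits into host names
  for host in $OSD_HOSTS; do
    run ssh "$host" systemctl restart ceph-osd.target
  done
  run ceph osd unset noout
}
```

DRY_RUN=1 prints the commands without running them. The loop does not wait for one host's OSDs to come back up before moving on, which matches the restart-everything step described above.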
[root@mon1 ~]# ceph pg dump |grep stale
dumped all
7.385 19460 0 0 0 0 4363375984 1547 1547 stale+peering 2022-08-07 18:30:16.932885 9719'4511237 110154:5399674 [14] 14 [14] 14 9719'4511237 2022-08-06 07:29:51.095989 9719'4511237 2022-08-02 00:57:43.318114 0
7.2a6 19409 0 0 0 0 4324918151 1542 1542 stale+peering 2022-08-07 16:09:26.464409 5938'4407602 99931:5800676 [15] 15 [15] 15 5938'4407602 2022-08-06 07:36:00.102984 5938'4407602 2022-08-01 23:45:58.573722 0
8.39 280 0 0 0 0 0 1597 1597 stale+peering 2022-08-07 16:09:26.461915 5938'2119986 99931:2386270 [15] 15 [15] 15 5938'2119986 2022-08-05 21:32:12.656384 5938'2119986 2022-08-01 22:58:58.614188 0
7.34 19337 0 0 0 0 4278284806 1580 1580 stale+peering 2022-08-07 16:09:26.461100 9719'4369235 99931:5261881 [15] 15 [15] 15 9719'4369235 2022-08-06 08:22:37.168815 9719'4369235 2022-08-04 21:34:38.449584 0
7.1d8 19383 0 0 0 0 4332924749 1593 1593 stale+peering 2022-08-07 18:30:16.914876 9719'4456286 110154:5409919 [14] 14 [14] 14 9719'4456286 2022-08-06 09:09:03.624425 9719'4456286 2022-08-02 01:25:18.343799 0
7.1e6 19375 0 0 0 0 4342149879 1564 1564 stale+peering 2022-08-07 18:30:16.930931 10754'4463130 110154:5047778 [14] 14 [14] 14 10754'4463130 2022-08-06 01:41:35.137028 10754'4463130 2022-08-04 21:39:21.624235
[root@mon1 ~]# ceph pg 7.385 query
Error ENOENT: i don't have pgid 7.385
[root@mon1 ~]# ceph pg 7.2a6 query
Error ENOENT: i don't have pgid 7.2a6
[root@mon1 ~]# cd /backup/
[root@mon1 backup]# ls
osd14pgs osd15pgs pgback-osd14 pgback-osd15 pgexport.sh
[root@mon1 backup]# cd pgback-osd14/
[root@mon1 pgback-osd14]# ls |grep 7.385
osd14pg-7.385.file
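The transcript shows ceph pg <pgid> query failing with ENOENT for the stale PGs, and this note treats that error as the signal that a PG may be restored from backup. A script can check for it; pg_query_enoent below is a hypothetical helper that succeeds only when the query output contains ENOENT.

```shell
# pg_query_enoent PG_ID (hypothetical): exit 0 if `ceph pg <pgid> query`
# reports ENOENT ("i don't have pgid ..."), exit 1 otherwise.
pg_query_enoent() {
  local out
  # The query is expected to fail here, so its exit status is ignored and
  # only the combined stdout/stderr text is inspected.
  out=$(ceph pg "$1" query 2>&1) || true
  case $out in
    *ENOENT*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Used as a guard, for example pg_query_enoent 7.385 && echo "safe to restore 7.385", it only prints when the query reports ENOENT.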
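Only roll back to the backup when the query returns exactly this error: Error ENOENT: i don't have pgid 7.385
First, stop the OSD service (osd.14 in this example).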
Export the PG's current copy and remove it from the OSD (export-remove):
[root@mon1 pgback-osd14]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --id=14 --op export-remove --pgid 7.385 --file /tmp/osd14-7.385pg
Import the PG from the backup file:
[root@mon1 pgback-osd14]# sudo -u ceph ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14/ --id=14 --op import --pgid 7.385 --file osd14pg-7.385.file
Finally, start the OSD again.
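The stop, export-remove, import and start steps above can be collected into one function. This is a sketch, not the author's script: restore_pg and run are hypothetical names, the systemd unit name ceph-osd@<id> assumes a package-based install, and the ceph-objectstore-tool arguments are copied from the commands above. Running it with DRY_RUN=1 first is worthwhile, because export-remove deletes the PG from the OSD.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# restore_pg OSD_ID PG_ID BACKUP_FILE (hypothetical)
# Replays the manual steps: stop the OSD, export-remove the PG's current
# copy to /tmp, import the backup file, then start the OSD again.
# Assumes systemd units named ceph-osd@<id> and the default data path.
restore_pg() {
  local osd=$1 pg=$2 backup=$3
  local data="/var/lib/ceph/osd/ceph-$osd"
  run systemctl stop "ceph-osd@$osd"
  # saves the current copy to /tmp, then removes it from the OSD
  run ceph-objectstore-tool --data-path "$data" --id="$osd" \
    --op export-remove --pgid "$pg" --file "/tmp/osd$osd-${pg}pg"
  # runs the import as the ceph user, as in the original command
  run sudo -u ceph ceph-objectstore-tool --data-path "$data" --id="$osd" \
    --op import --pgid "$pg" --file "$backup"
  run systemctl start "ceph-osd@$osd"
}
```

For the PG in the transcript this would be DRY_RUN=1 restore_pg 14 7.385 /backup/pgback-osd14/osd14pg-7.385.file, then the same call without DRY_RUN once the printed commands look right.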
PGs that have no backup have to be recreated:
Procedure:
Step 1: find the stale PGs with ceph pg dump |grep stale
Step 2: recreate each PG with ceph pg force_create_pg $pg_id
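The note handles a stale PG in one of two ways: restore it when a backup file exists, otherwise recreate it. pg_action below is a hypothetical helper that makes that choice for one PG. The path pattern <root>/pgback-osd<id>/osd<id>pg-<pgid>.file is inferred from the transcript (/backup/pgback-osd14/osd14pg-7.385.file) and is an assumption about how pgexport.sh names its files.

```shell
# pg_action OSD_ID PG_ID BACKUP_ROOT (hypothetical)
# Prints "restore <file>" if a backup for the PG exists under BACKUP_ROOT,
# otherwise "recreate <pgid>". The path layout mirrors the transcript.
pg_action() {
  local osd=$1 pg=$2 root=$3
  local file="$root/pgback-osd$osd/osd${osd}pg-$pg.file"
  if [ -f "$file" ]; then
    echo "restore $file"     # backup found: roll the PG back from it
  else
    echo "recreate $pg"      # no backup: the only option left is force-create
  fi
}
```

For example, pg_action 14 7.385 /backup prints a restore line for the PG shown in the transcript, while a PG with no matching file under /backup is reported for recreation.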