Posted on 2023-5-22 17:59:35
1. Check the cluster status
[root@k8snode001 ~]# ceph health detail
HEALTH_ERR 1/973013 objects unfound (0.000%); 17 scrub errors; Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair; Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
OBJECT_UNFOUND 1/973013 objects unfound (0.000%)
    pg 2.2b has 1 unfound objects
OSD_SCRUB_ERRORS 17 scrub errors
PG_DAMAGED Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
    pg 2.44 is active+clean+inconsistent, acting [14,8,21]
    pg 2.73 is active+clean+inconsistent, acting [25,14,8]
    pg 2.80 is active+clean+scrubbing+deep+inconsistent+repair, acting [4,8,14]
    pg 2.83 is active+clean+inconsistent, acting [14,13,6]
    pg 2.ae is active+clean+inconsistent, acting [14,3,2]
    pg 2.c4 is active+clean+inconsistent, acting [8,21,14]
    pg 2.da is active+clean+inconsistent, acting [23,14,15]
    pg 2.fa is active+clean+inconsistent, acting [14,23,25]
PG_DEGRADED Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound

The output shows that pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], with 1 unfound object.
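With a single affected PG the line is easy to spot by eye, but on a bigger cluster it helps to pull the PG IDs out of the output. A minimal sketch, assuming the `ceph health detail` text format shown above (lines like `pg 2.2b has 1 unfound objects`); `list_unfound_pgs` is a hypothetical helper, not a Ceph command:

```shell
# Hypothetical helper: read `ceph health detail` output on stdin and print
# the ID of each PG reported as having unfound objects.
list_unfound_pgs() {
  # Matches lines of the form "pg 2.2b has 1 unfound objects".
  awk '$1 == "pg" && $3 == "has" && $5 == "unfound" { print $2 }'
}
```

Usage: `ceph health detail | list_unfound_pgs` prints `2.2b` for the output above. The PG_DAMAGED and PG_DEGRADED lines use the form `pg 2.2b is ...`, so they are not matched and the PG is listed only once.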

Now let's look at pg 2.2b in more detail.

[root@k8snode001 ~]# ceph pg dump_json pools |grep 2.2b
dumped all
2.2b 2487 1 1 0 1 9533198403 3048 3048 active+recovery_unfound+degraded 2020-07-23 08:56:07.669903 10373'5448370 10373:7312614 [14,22,4] 14 [14,22,4] 14 10371'5437258 2020-07-23 08:56:06.637012 10371'5437258 2020-07-23 08:56:06.637012 0

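The numeric fields of a dump row are easier to read with labels attached. A sketch, assuming the first six fields are the PG ID followed by the object counters OBJECTS through UNFOUND, as in the plain-text dump of this release (compare with the header line of `ceph pg dump` on your own version); `pg_object_summary` is a hypothetical helper:

```shell
# Hypothetical helper: label the leading object counters of a plain-text
# `ceph pg dump` row read on stdin. Assumes the first six columns are
# PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND.
pg_object_summary() {
  awk 'NF >= 6 {
    printf "pg=%s objects=%s missing_on_primary=%s degraded=%s misplaced=%s unfound=%s\n",
      $1, $2, $3, $4, $5, $6
  }'
}
```

Usage: `ceph pg dump | grep '^2\.2b ' | pg_object_summary`. Anchoring the pattern to the start of the line avoids matching other rows that merely contain a string like "2.2b".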
In this row, the second to sixth columns (OBJECTS, MISSING_ON_PRIMARY, DEGRADED, MISPLACED, UNFOUND) show that pg 2.2b holds 2487 objects, of which 1 is missing on the primary, 1 is degraded and 1 is unfound.

2. Check the pg map
[root@k8snode001 ~]# ceph pg map 2.2b
osdmap e10373 pg 2.2b (2.2b) -> up [14,22,4] acting [14,22,4]

The pg map shows that pg 2.2b is mapped to OSDs [14,22,4] (the up set and the acting set are the same).
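To act on that mapping in a script (for example, to look at each OSD's log in turn), the acting set can be extracted from the `ceph pg map` line. A sketch, assuming the single-line output format shown above; `acting_osds` is a hypothetical helper:

```shell
# Hypothetical helper: print the acting OSD IDs from `ceph pg map` output
# on stdin, one per line.
acting_osds() {
  # Capture the list inside "acting [...]" and split it on commas.
  sed -n 's/.*acting \[\([0-9,]*\)].*/\1/p' | tr ',' '\n'
}
```

Usage: `ceph pg map 2.2b | acting_osds` prints 14, 22 and 4, one per line.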

3. Check the pool status
[root@k8snode001 ~]# ceph osd pool stats k8s-1
pool k8s-1 id 2
  1/1955664 objects degraded (0.000%)
  1/651888 objects unfound (0.000%)
  client io 271 KiB/s wr, 0 op/s rd, 52 op/s wr

[root@k8snode001 ~]# ceph osd pool ls detail|grep k8s-1
pool 2 'k8s-1' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 88 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

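In the pool detail line, `size` is the number of replicas (3 here) and `min_size` is the minimum number of replicas a PG needs in order to keep serving I/O (1 here). A sketch that pulls both out of `ceph osd pool ls detail`, assuming the `size N min_size M` layout shown above; `pool_replication` is a hypothetical helper, and `ceph osd pool get k8s-1 min_size` reads a single value directly:

```shell
# Hypothetical helper: print the pool name, size and min_size from
# `ceph osd pool ls detail` lines read on stdin.
pool_replication() {
  # q holds a single quote so the awk program can strip it from the pool name.
  awk -v q="'" '$1 == "pool" {
    name = $3
    gsub(q, "", name)
    size = ""; min = ""
    # Each setting is a "key value" pair, so look one field ahead.
    for (i = 1; i < NF; i++) {
      if ($i == "size") size = $(i + 1)
      if ($i == "min_size") min = $(i + 1)
    }
    printf "%s size=%s min_size=%s\n", name, size, min
  }'
}
```

Usage: `ceph osd pool ls detail | pool_replication` prints one line per pool, e.g. `k8s-1 size=3 min_size=1` for the pool above.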
4. Try to repair pg 2.2b and recover its lost object
[root@k8snode001 ~]# ceph pg repair 2.2b

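`ceph pg repair` is mainly aimed at scrub inconsistencies, and the health output above lists eight inconsistent PGs besides 2.2b. A dry-run sketch that turns those lines into repair commands, assuming the `pg <id> is <state>, acting [...]` format shown in step 1; `repair_cmds_for_inconsistent` is a hypothetical helper that only prints the commands:

```shell
# Hypothetical helper: read `ceph health detail` on stdin and print (not run)
# one `ceph pg repair` command per PG whose state includes "inconsistent".
repair_cmds_for_inconsistent() {
  awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print "ceph pg repair " $2 }'
}
```

Usage: review the output of `ceph health detail | repair_cmds_for_inconsistent` first, then append `| sh` to actually run the commands.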
If the repair keeps failing, look at the stuck PG in detail with the command below, paying particular attention to recovery_state:

[root@k8snode001 ~]# ceph pg 2.2b query
{
    ......
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2020-07-21 14:17:05.855923",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "10370",
                "scrubber.active": false,
                "scrubber.state": "INACTIVE",
                "scrubber.start": "MIN",
                "scrubber.end": "MIN",
                "scrubber.max_end": "MIN",
                "scrubber.subset_last_update": "0'0",
                "scrubber.deep": false,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2020-07-21 14:17:04.814061"
        }
    ],
    "agent_state": {}
}

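The query output is long. To see at a glance which recovery states the PG is in, the `name` fields of `recovery_state` can be listed. A text-only sketch, assuming the pretty-printed JSON layout above (one key per line, no `[` or `]` inside string values in that section); `recovery_state_names` is a hypothetical helper, and `jq -r '.recovery_state[].name'` is a sturdier choice if jq is installed:

```shell
# Hypothetical helper: list the "name" of each recovery_state entry in
# `ceph pg <pgid> query` JSON read from stdin. Line-based sketch: it tracks
# [ and ] to know when the recovery_state array ends.
recovery_state_names() {
  awk '
    # The "[" on this line opens the array, so start at depth 1.
    /"recovery_state": [[]/ { inside = 1; depth = 1; next }
    inside {
      if ($0 ~ /"name": "/) {
        line = $0
        sub(/.*"name": "/, "", line)   # drop everything up to the value
        sub(/".*/, "", line)           # drop the closing quote onward
        print line
      }
      # Update the bracket depth; depth 0 means the array has closed.
      depth += gsub(/[[]/, "[") - gsub(/[]]/, "]")
      if (depth <= 0) inside = 0
    }
  '
}
```

Usage: `ceph pg 2.2b query | recovery_state_names` prints `Started/Primary/Active` and `Started` for the output above.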
If repair cannot fix it, there are two ways out, and both give up on the unfound object: revert it to an older version, or delete it outright.

5. Solutions
Revert to the previous version (Ceph rolls the object back to its previous version, or forgets it entirely if it was a newly created object):
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost revert

Delete outright (Ceph forgets the unfound object entirely):
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost delete

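Both modes permanently give up on the unfound object, so it is worth checking which objects are affected first (`ceph pg 2.2b list_unfound`) and making the destructive step explicit. A sketch of a guarded wrapper; the function name `mark_lost_guarded` and the `CONFIRM` variable are hypothetical conventions, not part of Ceph:

```shell
# Hypothetical guarded wrapper around
#   ceph pg <pgid> mark_unfound_lost revert|delete
# Both modes give up on the unfound objects, so by default it only prints
# the command; set CONFIRM=yes to actually run it.
mark_lost_guarded() {
  local pgid=$1 mode=$2
  case "$mode" in
    revert|delete) ;;
    *)
      echo "usage: mark_lost_guarded <pgid> revert|delete" >&2
      return 2
      ;;
  esac
  if [ "${CONFIRM:-no}" = "yes" ]; then
    ceph pg "$pgid" mark_unfound_lost "$mode"
  else
    echo "dry run: ceph pg $pgid mark_unfound_lost $mode"
  fi
}
```

Usage: `mark_lost_guarded 2.2b delete` only prints the command; `CONFIRM=yes mark_lost_guarded 2.2b delete` runs it.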
6. Verify
I chose to delete it here. Ceph then rebuilt the PG, and after a short wait the PG state changed to active+clean:
[root@k8snode001 ~]# ceph pg 2.2b query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 11069,
    "up": [
        12,
        22,
        4
    ],

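Instead of checking back by hand, the wait can be scripted. A polling sketch, assuming the top-level `"state"` key is the first `"state"` line in the `ceph pg <pgid> query` output, as above; `wait_pg_clean` and its retry and interval defaults are hypothetical:

```shell
# Hypothetical helper: poll `ceph pg <pgid> query` until the PG's top-level
# state is exactly "active+clean". Arguments: PG id, number of tries
# (default 60) and seconds between tries (default 10).
wait_pg_clean() {
  local pgid=$1 tries=${2:-60} interval=${3:-10} i=0 state
  while [ "$i" -lt "$tries" ]; do
    # Take the first "state" key, which is the PG state near the top of the output.
    state=$(ceph pg "$pgid" query | sed -n 's/^ *"state": "\([^"]*\)".*/\1/p' | head -n 1)
    echo "pg $pgid: ${state:-unknown}"
    if [ "$state" = "active+clean" ]; then
      return 0
    fi
    i=$((i + 1))
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$tries" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

Usage: `wait_pg_clean 2.2b 30 10` checks up to 30 times, 10 seconds apart, and returns 0 as soon as the PG is active+clean.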
Check the cluster status again:
[root@k8snode001 ~]# ceph health detail
HEALTH_OK