
health: HEALTH_WARN Reduced data availability 100.000% pgs unknown

Posted on 2021-7-20 17:00:03
[root@compute01 src]# ceph -s
  cluster:
    id:     31403b11-8a1e-432f-876e-5a2c852f9dcc
    health: HEALTH_WARN
            Reduced data availability: 640 pgs inactive

  services:
    mon: 3 daemons, quorum compute01,compute02,compute03 (age 42m)
    mgr: compute01(active, since 42m), standbys: compute02, compute03
    osd: 3 osds: 3 up (since 26m), 3 in (since 26m)

  data:
    pools:   6 pools, 640 pgs
    objects: 0 objects, 0 B
    usage:   3.1 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     100.000% pgs unknown
             640 unknown

I ran into this problem and the cluster stayed stuck in this state.

Dump the CRUSH tree:

[root@compute01 ~]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1            0 root default

There is nothing under the root: the host buckets and the OSDs are missing.

[root@compute01 ~]# ceph osd getcrushmap -o /tmp/mycrushmap
12

The 12 printed here is the CRUSH map epoch, not a line count. The exported map itself turned out to be nearly empty, with a lot missing.
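The symptom above (a root with no hosts under it) can be checked mechanically. Here is a minimal sketch, assuming the plain-text `ceph osd tree` layout shown in this thread; the function name `count_host_buckets` is made up for illustration and is not a Ceph command.

```shell
# Count the "host" buckets in plain-text `ceph osd tree` output read from stdin.
# Assumption: a host row carries the literal word "host" as its own
# whitespace-separated field, as in the listings in this thread.
count_host_buckets() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "host") { n++; break } }
         END { print n + 0 }'
}

# Hypothetical usage on a live cluster (not executed here):
#   if [ "$(ceph osd tree | count_host_buckets)" -eq 0 ]; then
#       echo "no host buckets: the OSDs are not placed in the CRUSH hierarchy"
#   fi
```

A result of 0 on a cluster whose OSDs are all up and in points at the CRUSH hierarchy rather than at the OSD daemons themselves.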
Decompile it into a readable text file:

[root@compute01 tmp]# crushtool -d /tmp/mycrushmap > /tmp/mycrushmap.txt

[root@compute01 tmp]# crushtool -c mycrushmap.txt -o mycrushmap2
item 'compute01' in bucket 'default' is not defined
[root@compute01 tmp]# vim mycrushmap.txt
[root@compute01 tmp]# crushtool -c mycrushmap.txt -o mycrushmap2

Compiling showed that definitions were missing, so I edited the file again:

[root@compute01 tmp]# vim mycrushmap.txt

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        # weight 0.000
        alg straw2
        hash 0  # rjenkins1
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
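The compile error seen earlier (an item under `default` that was never defined) is the kind of mistake a quick scan of the text map can catch before running `crushtool -c`. This is only a sketch: `find_undefined_items` is a hypothetical helper, it assumes the decompiled layout shown above, and it does not replace the compiler's own checks.

```shell
# Print every bucket item in a decompiled CRUSH map (stdin) that is neither a
# declared device nor a declared bucket. Prints nothing if all items resolve.
find_undefined_items() {
    awk '
        # "device 0 osd.0 class hdd" declares osd.0
        $1 == "device"             { defined[$3] = 1 }
        # "host compute01 {" or "root default {" declares a bucket
        $NF == "{" && $1 != "rule" { defined[$2] = 1; cur = $2 }
        # "item compute01 weight 1.000" references a device or bucket
        $1 == "item"               { items[++n] = $2; owner[n] = cur }
        END {
            for (i = 1; i <= n; i++)
                if (!(items[i] in defined))
                    printf "item %s in bucket %s is not defined\n", items[i], owner[i]
        }
    '
}

# Hypothetical usage: find_undefined_items < /tmp/mycrushmap.txt
```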

A lot is missing (the host buckets and the items under the root), so add them:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host compute02 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 1.000
}
host compute01 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 1.000
}
host compute03 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        # weight 1.000
        alg straw2
        hash 0  # rjenkins1
        item osd.2 weight 1.000
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        # weight 0.000
        alg straw2
        hash 0  # rjenkins1
        item compute02 weight 1.000
        item compute01 weight 1.000
        item compute03 weight 1.000
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

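After adding these, check that each OSD sits under the right host. Because of the order in which the nodes and OSDs were added to this cluster, node 1 and node 2 are swapped: osd.0 is on compute02 and osd.1 is on compute01. Pay attention to this; the rest can be left alone.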
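To make that check less error-prone, the host-to-OSD assignment can be pulled out of the text map and compared with where each OSD daemon really runs. A small sketch, assuming the decompiled format used in this thread; `list_host_osds` is a hypothetical helper name.

```shell
# For each host bucket in a decompiled CRUSH map (stdin), print "host: osd ...".
list_host_osds() {
    awk '
        $1 == "host" && $NF == "{" { host = $2; next }
        $1 == "}" {
            if (host != "") print host ":" osds
            host = ""; osds = ""
            next
        }
        host != "" && $1 == "item" { osds = osds " " $2 }
    '
}

# Hypothetical usage: list_host_osds < /tmp/mycrushmap.txt
# For the edited map in this thread this would list osd.0 under compute02
# and osd.1 under compute01.
```

On a live cluster, `ceph osd metadata <id>` reports a hostname for each OSD, which is one way to get the real placement to compare against.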
Compile it back into the binary format Ceph understands and inject it:

[root@compute01 tmp]# crushtool -c mycrushmap.txt -o mycrushmap2

[root@compute01 tmp]# ceph osd setcrushmap -i /tmp/mycrushmap2
13
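The export, decompile, edit, compile and inject steps above can be wrapped so that the original map is always kept as a backup. This is a sketch under assumptions: the scratch directory, the file names, and the `run`, `crush_export` and `crush_apply` helpers are all made up for illustration. It only prints the commands unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: export the binary map (kept as a backup) and decompile it to text.
# The directory and file names are illustrative, not from the thread.
crush_export() {
    dir=${1:-/tmp/crush-edit}
    run mkdir -p "$dir"
    run ceph osd getcrushmap -o "$dir/crushmap.orig"
    run crushtool -d "$dir/crushmap.orig" -o "$dir/crushmap.txt"
}

# Step 2 (after editing crushmap.txt by hand): compile and inject the new map.
# The && keeps a failed compile from being followed by setcrushmap.
crush_apply() {
    dir=${1:-/tmp/crush-edit}
    run crushtool -c "$dir/crushmap.txt" -o "$dir/crushmap.new" &&
        run ceph osd setcrushmap -i "$dir/crushmap.new"
}
```

If the new map makes things worse, injecting the saved `crushmap.orig` with `ceph osd setcrushmap -i` should restore the previous state.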
[root@compute01 tmp]# ceph -s
  cluster:
    id:     31403b11-8a1e-432f-876e-5a2c852f9dcc
    health: HEALTH_WARN
            Reduced data availability: 212 pgs inactive

  services:
    mon: 3 daemons, quorum compute01,compute02,compute03 (age 56m)
    mgr: compute01(active, since 56m), standbys: compute02, compute03
    osd: 3 osds: 3 up (since 40m), 3 in (since 40m)

  data:
    pools:   6 pools, 640 pgs
    objects: 0 objects, 0 B
    usage:   3.1 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     33.125% pgs unknown
             428 active+clean
             212 unknown

[root@compute01 tmp]# ceph -s
  cluster:
    id:     31403b11-8a1e-432f-876e-5a2c852f9dcc
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum compute01,compute02,compute03 (age 56m)
    mgr: compute01(active, since 56m), standbys: compute02, compute03
    osd: 3 osds: 3 up (since 40m), 3 in (since 40m)

  data:
    pools:   6 pools, 640 pgs
    objects: 0 objects, 0 B
    usage:   3.1 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     640 active+clean

[root@compute01 tmp]# ceph -s
  cluster:
    id:     31403b11-8a1e-432f-876e-5a2c852f9dcc
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum compute01,compute02,compute03 (age 56m)
    mgr: compute01(active, since 56m), standbys: compute02, compute03
    osd: 3 osds: 3 up (since 40m), 3 in (since 40m)

  data:
    pools:   6 pools, 640 pgs
    objects: 0 objects, 0 B
    usage:   3.1 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     640 active+clean

Everything is back to normal; problem solved.

To sum up: with this kind of problem, redeploying does not help because the problem comes right back, which is a real headache. The only way out is to track down what is actually causing it.
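The `ceph -s` runs above show the unknown PG count going from 640 to 212 to none. Instead of rerunning the command by hand, the count can be pulled out of the output. A minimal sketch, assuming the `ceph -s` text layout shown in this thread; `pgs_unknown` is a hypothetical helper.

```shell
# Print the number of PGs listed as "unknown" in `ceph -s` text (stdin); 0 if none.
# It looks for a bare integer immediately followed by the word "unknown",
# so the summary line "33.125% pgs unknown" is ignored.
pgs_unknown() {
    awk '{ for (i = 1; i < NF; i++)
               if ($(i + 1) == "unknown" && $i ~ /^[0-9]+$/) n = $i }
         END { print n + 0 }'
}

# Hypothetical polling loop on a live cluster (not executed here):
#   while [ "$(ceph -s | pgs_unknown)" -gt 0 ]; do sleep 5; done
```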

Original poster | Posted on 2021-7-20 17:00:04
[root@compute03 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default
 0   hdd      0 osd.0            up  1.00000 1.00000
 1   hdd      0 osd.1            up  1.00000 1.00000
 2   hdd      0 osd.2            up  1.00000 1.00000
[root@compute03 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default
 0   hdd      0 osd.0            up  1.00000 1.00000
 1   hdd      0 osd.1            up  1.00000 1.00000
 2   hdd      0 osd.2            up  1.00000 1.00000

At first I did not spot anything wrong here. Something about the output felt off, but I could not say what.

Only after the cluster was healthy again did I see what had changed. The earlier output was missing some entries (the host lines and the weights):

[root@compute01 tmp]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       3.00000 root default
-5       1.00000     host compute01
 1   hdd 1.00000         osd.1          up  1.00000 1.00000
-3       1.00000     host compute02
 0   hdd 1.00000         osd.0          up  1.00000 1.00000
-7       1.00000     host compute03
 2   hdd 1.00000         osd.2          up  1.00000 1.00000
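The difference between the two listings is easy to miss by eye, so here is a sketch that flags OSD rows whose CRUSH weight is 0. Zero weight is only one visible symptom of the missing hierarchy. The helper name `zero_weight_osds` is made up, and the column order (ID, CLASS, WEIGHT, then the name) is assumed from the output in this thread.

```shell
# List OSDs whose WEIGHT column is 0 in plain-text `ceph osd tree` output (stdin).
# Assumption: an OSD row looks like "0 hdd 1.00000 osd.0 up 1.00000 1.00000",
# i.e. the device class is present and the name is the fourth field.
zero_weight_osds() {
    awk '$4 ~ /^osd\.[0-9]+$/ && ($3 + 0) == 0 { print $4 }'
}

# Hypothetical usage: ceph osd tree | zero_weight_osds
```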

Original poster | Posted on 2021-7-20 17:46:05
The repair is complete.

Original poster | Posted on 2022-10-17 14:56:13
Steps to resolve it:

ceph osd crush add-bucket ceph1 host

With several host nodes, add one host bucket per hostname; this keeps things easy to manage.

By default, move the bucket under the default root:

ceph osd crush move ceph1 root=default

Move every host bucket under default in the same way.

Then place the OSDs under their host:

ceph osd crush set osd.0 1.00000 host=ceph1
ceph osd crush set osd.1 1.00000 host=ceph1
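With more than one host, the three commands above repeat per host, so a small wrapper may help. This is a sketch, not a tested procedure: `run` and `place_osds` are hypothetical helpers, the weight 1.0 is simply the value used in this thread (Ceph conventionally uses the device capacity in TiB), and nothing is executed unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# place_osds HOST OSD_ID...
# Creates the host bucket, moves it under root=default, then sets each listed
# OSD under that host with CRUSH weight 1.0 (the value used in this thread).
place_osds() {
    host=$1
    shift
    run ceph osd crush add-bucket "$host" host
    run ceph osd crush move "$host" root=default
    for id in "$@"; do
        run ceph osd crush set "osd.$id" 1.0 "host=$host"
    done
}

# Example matching the commands in this post (prints the commands only):
#   place_osds ceph1 0 1
```

For the three-node cluster from the first post, the calls would be one per host, for example `place_osds compute02 0`, keeping the swapped osd.0 and osd.1 placement in mind.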

Original poster | Posted on 2022-10-17 14:56:31
ceph osd crush add-bucket ceph1 host

By default, move the bucket under the default root:

ceph osd crush move ceph1 root=default

Then place the OSDs under their host:

ceph osd crush set osd.0 1.00000 host=ceph1
ceph osd crush set osd.1 1.00000 host=ceph1

This fully resolves the problem described above. The earlier method (editing the CRUSH map by hand) also works, but it is more involved and not as quick and simple as these steps.
