|
|
Attempt 2: Repair the down OSD

This method mainly applies when an OSD is physically damaged and can no longer be activated.

1. View the OSD tree

root@ceph01:~# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29279 root default
-2 0.14639     host ceph01
 0 0.14639         osd.0        up  1.00000          1.00000
-3 0.14639     host ceph02
 1 0.14639         osd.1      down        0          1.00000

The output shows that osd.1 is down.
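
On a larger cluster it can help to pick the down OSDs out of the tree automatically. The following is a minimal sketch, assuming the column layout shown in the output above (name in the third column, state in the fourth); `list_down_osds` is a name made up for this example, and other Ceph releases may print the columns in a different order.

```shell
#!/bin/sh
# list_down_osds: read `ceph osd tree` text from stdin and print the name of
# every OSD whose UP/DOWN column says "down".
# Assumption: rows look like " 1 0.14639 osd.1 down 0 1.00000", i.e. the OSD
# name is field 3 and the state is field 4.
list_down_osds() {
    awk '$3 ~ /^osd\.[0-9]+$/ && $4 == "down" { print $3 }'
}
```

A typical call would be `ceph osd tree | list_down_osds`. Host and root rows are skipped because their third field is `host` or `root` rather than an OSD name.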

2. Mark osd.1 as out

root@ceph02:~# ceph osd out osd.1
osd.1 is already out.

3. Remove it from the cluster

root@ceph02:~# ceph osd rm osd.1
removed osd.1

4. Remove it from the CRUSH map

root@ceph02:~# ceph osd crush rm osd.1
removed item id 1 name 'osd.1' from crush map

5. Delete the authentication key for osd.1

root@ceph02:~# ceph auth del osd.1
updated

6. Unmount the OSD's data partition

umount /dev/sdb1

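Steps 2 through 6 can be collected into one small helper. This is only a sketch under assumptions: `remove_osd`, `run`, and the `DRY_RUN` switch are names invented for this example, the commands are issued in the same order as above, and the `umount` has to run on the host that holds the disk (ceph02 here), so the helper is meant to be run there.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
# DRY_RUN is a hypothetical switch for this sketch, not a Ceph feature.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remove_osd ID [DEVICE]: retire a failed OSD in the order used in steps 2-6.
remove_osd() {
    osd_id="$1"      # numeric OSD id, e.g. 1
    osd_dev="${2:-}" # optional data partition to unmount, e.g. /dev/sdb1
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "usage: remove_osd ID [DEVICE]" >&2
            return 2
            ;;
    esac
    run ceph osd out "osd.$osd_id"      || return 1  # step 2
    run ceph osd rm "osd.$osd_id"       || return 1  # step 3
    run ceph osd crush rm "osd.$osd_id" || return 1  # step 4
    run ceph auth del "osd.$osd_id"     || return 1  # step 5
    if [ -n "$osd_dev" ]; then
        run umount "$osd_dev"           || return 1  # step 6
    fi
}
```

Setting `DRY_RUN=1` and then calling `remove_osd 1 /dev/sdb1` prints the five commands without touching the cluster, which is a cheap way to review them before running the helper for real.
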
7. Check the OSD tree again

root@ceph02:~# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.14639 root default
-2 0.14639     host ceph01
 0 0.14639         osd.0        up  1.00000          1.00000
-3       0     host ceph02

8. Log in to the ceph-deploy node

root@ceph01:~# cd /root/my-cluster/
root@ceph01:~/my-cluster#

9. Initialize the disk

ceph-deploy --overwrite-conf osd prepare ceph02:/dev/sdb1

In later versions, the add-OSD command needs to be run again instead:

ceph-deploy osd create node1 --data /dev/sdb

10. Activate all OSDs again (all of them, not only the one that was down)

ceph-deploy osd activate ceph01:/dev/sdb1 ceph02:/dev/sdb1

11. Check the OSD tree and the cluster health

root@ceph01:~/my-cluster# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.29279 root default
-2 0.14639     host ceph01
 0 0.14639         osd.0        up  1.00000          1.00000
-3 0.14639     host ceph02
 1 0.14639         osd.1        up  1.00000          1.00000
root@ceph01:~/my-cluster#

root@ceph01:~/my-cluster# ceph -s
    cluster ecacda71-af9f-46f9-a2a3-a35c9e51db9e
     health HEALTH_OK
     monmap e1: 1 mons at {ceph01=10.111.131.125:6789/0}
            election epoch 14, quorum 0 ceph01
     osdmap e150: 2 osds: 2 up, 2 in
            flags sortbitwise,require_jewel_osds
      pgmap v9284: 64 pgs, 1 pools, 17 bytes data, 3 objects
            10310 MB used, 289 GB / 299 GB avail
                  64 active+clean

The cluster counts as healthy only when the status is HEALTH_OK.
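
The health check can also be scripted instead of read by eye. The sketch below assumes that `ceph health` prints the status word (for example HEALTH_OK or HEALTH_WARN) as the first word of its first line; `is_health_ok` and `wait_for_health_ok` are names made up for this example, and the retry count and delay are arbitrary defaults.

```shell
#!/bin/sh
# is_health_ok: read health text from stdin and succeed only if the first
# word of the first line is exactly HEALTH_OK.
is_health_ok() {
    status=$(awk 'NR == 1 { print $1 }')
    [ "$status" = "HEALTH_OK" ]
}

# wait_for_health_ok [TRIES] [DELAY]: poll `ceph health` until it reports
# HEALTH_OK. Defaults (30 tries, 10 seconds apart) are assumptions chosen
# for illustration, not recommended values.
wait_for_health_ok() {
    tries="${1:-30}"
    delay="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        if ceph health | is_health_ok; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

After re-activating the OSDs, something like `wait_for_health_ok 60 5 && echo "cluster is healthy"` would wait for recovery to finish; any status other than HEALTH_OK, including HEALTH_WARN, is treated as not yet healthy.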
|
|