|
|
Recovery procedure when all mon nodes fail, or when a single mon node fails

Symptom: the ceph -s command consistently fails to run on the cluster;

The Ceph distributed storage cluster reports: monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

2024-10-17T22:33:47.295+0800 7f20fe7fc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2024-10-17T22:33:47.297+0800 7f20ff7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

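To tell this specific failure apart from, say, a plain network timeout, one could scan the client output for the handle_auth_bad_method marker. This is only a minimal sketch: the function name has_auth_bad_method is made up for illustration, and it does nothing more than inspect text fed to it on stdin.

```shell
# Hypothetical helper (not part of Ceph): succeeds if stdin contains the
# monclient handle_auth_bad_method error quoted above.
has_auth_bad_method() {
    # -F: match the marker as a fixed string; -q: exit status only
    grep -qF 'handle_auth_bad_method'
}
```

A possible use, assuming the error text is written to stderr and that the coreutils timeout command is available: timeout 30 ceph -s 2>&1 | has_auth_bad_method && echo "auth method error seen"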
In this environment only gm268-3 is still intact, because it hung on a failed reboot; gm268-1 and gm268-2 are both damaged. The only option is to work from node 3.

Resolution procedure:

1. On gm268-3, generate the monmap file (the command below builds a fresh one with monmaptool --create):

$ monmaptool --create --clobber --fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 --add gm268-3 10.12.3.2:6789 --add gm268-2 10.12.2.2:6789 --add gm268-1 10.12.1.2:6789 /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: set fsid to ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
monmaptool: writing epoch 0 to /tmp/monmap (3 monitors)

When building the monmap, list the healthy node first, then add all the damaged nodes after it.

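The "healthy node first, damaged nodes after" rule can be sketched as a small command builder. The helper name monmap_create_cmd and its NAME=ADDR argument format are assumptions made for this example; it only prints a monmaptool command line using the flags already shown in this post, so the result can be reviewed before it is run.

```shell
# Hypothetical helper: print a monmaptool command line.
# Usage: monmap_create_cmd FSID OUTFILE NAME=ADDR [NAME=ADDR ...]
# Pass the healthy monitor first, then the damaged ones.
monmap_create_cmd() (
    fsid=$1; out=$2; shift 2
    cmd="monmaptool --create --clobber --fsid $fsid"
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first '='
        addr=${pair#*=}    # text after the first '='
        cmd="$cmd --add $name $addr"
    done
    # The output file is the final argument of the command
    printf '%s %s\n' "$cmd" "$out"
)
```

For the cluster in this post the call would be: monmap_create_cmd ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 /tmp/monmap gm268-3=10.12.3.2:6789 gm268-2=10.12.2.2:6789 gm268-1=10.12.1.2:6789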
Check the contents of the generated file:

$ monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 0
fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
last_changed 2024-10-18T13:17:03.645872+0800
created 2024-10-18T13:17:03.645872+0800
min_mon_release 0 (unknown)
0: v1:10.12.1.2:6789/0 mon.gm268-1
1: v1:10.12.2.2:6789/0 mon.gm268-2
2: v1:10.12.3.2:6789/0 mon.gm268-3
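
Before copying the map anywhere, it may be worth checking that every expected monitor really is in it. The sketch below pulls the monitor names and addresses out of the monmaptool --print output; the helper name list_monmap_members is hypothetical, and the parsing assumes the line layout shown above (rank, address, then mon.NAME as the last field).

```shell
# Hypothetical helper: read `monmaptool --print` output on stdin and
# print one "name address" pair per monitor line.
list_monmap_members() {
    # Monitor lines are the ones whose last field starts with "mon."
    awk '$NF ~ /^mon\./ { sub(/^mon\./, "", $NF); print $NF, $2 }'
}
```

Possible use: monmaptool --print /tmp/monmap | list_monmap_members. Note that in the output above the monitors appear by rank (gm268-1 first), not in the order they were passed to --add.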

2. On gm268-1 and gm268-2, locate the /var/lib/ceph/mon directory, back it up, then delete it. The files in it had been modified and were in an abnormal state, which is what caused the authentication failure. Do not delete the existing /etc/ceph/ directory.

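One way to do the backup and the delete in a single step is to move the directory aside under a timestamped name. This is a sketch under assumptions: the helper name backup_mon_dir is made up, recreating the empty directory afterwards is my own addition (so that a later mkfs has a parent directory to write into), and stopping the mon service beforehand is likewise an assumption not stated in the post.

```shell
# Hypothetical helper: move a monitor data directory aside instead of
# deleting it outright, then recreate it empty. Prints the backup path.
# Usage: backup_mon_dir /var/lib/ceph/mon
backup_mon_dir() (
    dir=$1
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    # Refuse to continue if there is nothing to back up
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; exit 1; }
    mv "$dir" "$backup" && mkdir "$dir" && printf '%s\n' "$backup"
)
```

On a damaged node this might be used as: systemctl stop ceph-mon@gm268-1.service, then backup_mon_dir /var/lib/ceph/mon. /etc/ceph/ is left untouched.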
3. Copy the keyring from the healthy node, together with the generated monmap file, to the other two nodes:

scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-2:/tmp/
scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-1:/tmp/

scp /tmp/monmap gm268-2:/tmp/
scp /tmp/monmap gm268-1:/tmp/

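Each of the two files has to reach every damaged node, so generating the scp commands from a node list is one way to avoid missing a host. The helper name scp_recovery_cmds is hypothetical; it only prints commands and does not copy anything itself.

```shell
# Hypothetical helper: print the scp commands that send the healthy
# node's keyring and the monmap to each damaged node.
# Usage: scp_recovery_cmds KEYRING MONMAP NODE [NODE ...]
scp_recovery_cmds() (
    keyring=$1; monmap=$2; shift 2
    for node in "$@"; do
        printf 'scp %s %s:/tmp/\n' "$keyring" "$node"
        printf 'scp %s %s:/tmp/\n' "$monmap" "$node"
    done
)
```

Example: scp_recovery_cmds /var/lib/ceph/mon/ceph-gm268-3/keyring /tmp/monmap gm268-1 gm268-2 prints four scp lines, which can be reviewed and then pasted into the shell on gm268-3.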
4. Rebuild the mon on gm268-1 and gm268-2. On gm268-1:
ceph-mon --cluster ceph -i gm268-1 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf

Change to the /var/lib/ceph directory (the parent of mon/, so that the relative path below resolves) and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-1.service

Check the service:

$ systemctl status ceph-mon@gm268-1.service
● ceph-mon@gm268-1.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-10-18 13:21:24 CST; 38min ago
Main PID: 664542 (ceph-mon)
Tasks: 27
Memory: 286.0M
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-1.service
└─664542 /usr/bin/ceph-mon -f --cluster ceph --id gm268-1 --setuser ceph --setgroup ceph

Oct 18 13:21:24 gm268-1 systemd[1]: Started Ceph cluster monitor daemon.
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: 2024-10-18T13:21:24.793+0800 7fcc5f804700 -1 mon.gm268-1@0(probing) e11 stashing newest monmap 11 for next startup
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setuser ceph since I am not root
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setgroup ceph since I am not root

This node is now repaired.

On node 2 (gm268-2):

ceph-mon --cluster ceph -i gm268-2 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf

Change to the /var/lib/ceph directory (the parent of mon/) and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-2.service

$ systemctl status ceph-mon@gm268-2.service
● ceph-mon@gm268-2.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-10-18 13:09:42 CST; 51min ago
Main PID: 157382 (ceph-mon)
Tasks: 27
Memory: 587.1M
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-2.service
└─157382 /usr/bin/ceph-mon -f --cluster ceph --id gm268-2 --setuser ceph --setgroup ceph

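Since the same rebuild sequence is repeated on each damaged node with only the monitor ID changing, it could be generated from one template. The sketch below is an assumption-laden convenience, not the author's method: the helper name mon_rebuild_cmds is made up, it only prints the commands, and it uses an absolute path for chown in place of the cd plus relative mon/ used in the post.

```shell
# Hypothetical helper: print the rebuild commands for one monitor.
# Usage: mon_rebuild_cmds MON_ID
mon_rebuild_cmds() (
    id=$1
    printf '%s\n' \
        "ceph-mon --cluster ceph -i $id --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf" \
        "chown -R ceph:ceph /var/lib/ceph/mon" \
        "systemctl start ceph-mon@$id.service" \
        "systemctl status ceph-mon@$id.service"
)
```

Running mon_rebuild_cmds gm268-2 on node 2 prints the four commands for that node, in the order used above, so they can be checked before being executed one at a time.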
|
|