|
|
Recovery procedure when all mon nodes have failed, or when a single mon node has failed

ceph -s can no longer run successfully;

The Ceph distributed storage cluster reports: monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

2024-10-17T22:33:47.295+0800 7f20fe7fc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2024-10-17T22:33:47.297+0800 7f20ff7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

In this environment only gm268-3 is still intact (it hung on a failed reboot); gm268-1 and gm268-2 are both damaged. The only option is to start the recovery from node 3.

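To confirm that a client is hitting this specific authentication failure rather than a generic timeout, the message can be counted in captured output. This is a minimal sketch; the function name count_auth_failures is made up for illustration.

```shell
# count_auth_failures: read log text on stdin and print how many lines
# carry the handle_auth_bad_method message (prints 0 when there are none).
count_auth_failures() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that
    grep -c 'handle_auth_bad_method' || true
}
```

A possible use, assuming timeout from coreutils is available and that the ceph client writes these messages to stderr: timeout 10 ceph -s 2>&1 | count_auth_failures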
Resolution procedure:

1. On gm268-3, export the monmap file:

$ monmaptool --create --clobber --fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 --add gm268-3 10.12.3.2:6789 --add gm268-2 10.12.2.2:6789 --add gm268-1 10.12.1.2:6789 /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: set fsid to ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
monmaptool: writing epoch 0 to /tmp/monmap (3 monitors)

When exporting the monmap, list the healthy node first, then add all of the damaged nodes after it.
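
The ordering rule above (healthy node first, damaged nodes after) can be kept explicit by assembling the command from a list. This sketch only prints the command so it can be reviewed before running; build_monmap_cmd and the name=address argument form are illustrative assumptions, not part of monmaptool.

```shell
# build_monmap_cmd FSID OUTFILE NAME=IP:PORT...
# Prints a monmaptool command line that adds monitors in the order given.
build_monmap_cmd() {
    fsid="$1"; out="$2"; shift 2
    cmd="monmaptool --create --clobber --fsid $fsid"
    for entry in "$@"; do
        # split "name=ip:port" at the first "="
        cmd="$cmd --add ${entry%%=*} ${entry#*=}"
    done
    printf '%s %s\n' "$cmd" "$out"
}
```

For example: build_monmap_cmd ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 /tmp/monmap gm268-3=10.12.3.2:6789 gm268-2=10.12.2.2:6789 gm268-1=10.12.1.2:6789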

Inspect the exported file:

$ monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 0
fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
last_changed 2024-10-18T13:17:03.645872+0800
created 2024-10-18T13:17:03.645872+0800
min_mon_release 0 (unknown)
0: v1:10.12.1.2:6789/0 mon.gm268-1
1: v1:10.12.2.2:6789/0 mon.gm268-2
2: v1:10.12.3.2:6789/0 mon.gm268-3
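
Before copying the file around, it can help to reduce the printed monmap to a plain name/address list and compare it with the expected monitors. This is a rough sketch that assumes each entry has a single v1: or v2: address, as in the output above; list_mons is a made-up helper name.

```shell
# list_mons: read "monmaptool --print" output on stdin and print
# "<name> <ip:port>" for every monitor entry.
list_mons() {
    awk '/^[0-9]+: / {
        addr = $2
        sub(/^v[12]:/, "", addr)    # drop the "v1:" or "v2:" protocol prefix
        sub(/\/[0-9]+$/, "", addr)  # drop the trailing "/0" nonce
        name = $3
        sub(/^mon\./, "", name)     # drop the "mon." prefix
        print name, addr
    }'
}
```

For example: monmaptool --print /tmp/monmap | list_mons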

2. On gm268-1 and gm268-2, find the /var/lib/ceph/mon directory, back it up, then delete it. The files in it had been modified and were left in an inconsistent state, which is what broke authentication. Do not delete the existing /etc/ceph/ directory.

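The backup-then-delete in step 2 is safer when the delete only happens after the copy has succeeded. A minimal sketch of that ordering follows; backup_and_remove and the backup location in the example are hypothetical.

```shell
# backup_and_remove DIR BACKUP_ROOT
# Copies DIR to BACKUP_ROOT/<name>.bak.<timestamp>, then removes DIR.
# Prints the backup path on success.
backup_and_remove() {
    src="$1"; dest_root="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    dest="$dest_root/$(basename "$src").bak.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest_root" || return 1
    # copy first, keeping ownership, modes and timestamps
    cp -a "$src" "$dest" || return 1
    # remove the original only after the copy succeeded
    rm -rf "$src" || return 1
    printf '%s\n' "$dest"
}
```

For example, on each damaged node: backup_and_remove /var/lib/ceph/mon /root/mon-backup
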
3. Copy the keyring from the healthy node and the exported monmap file to the other two nodes:

scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-2:/tmp/
scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-1:/tmp/
scp /tmp/monmap gm268-1:/tmp/
scp /tmp/monmap gm268-2:/tmp/

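With several near-identical scp lines it is easy to repeat a hostname by mistake, so the list can be generated from the node names instead. This sketch prints the commands rather than running them; print_copy_cmds is an illustrative name, and the paths are the ones used in step 3.

```shell
# print_copy_cmds HEALTHY_MON_ID TARGET_NODE...
# Prints the scp commands that send the keyring and monmap to each target.
print_copy_cmds() {
    healthy="$1"; shift
    for node in "$@"; do
        printf 'scp /var/lib/ceph/mon/ceph-%s/keyring %s:/tmp/\n' "$healthy" "$node"
        printf 'scp /tmp/monmap %s:/tmp/\n' "$node"
    done
}
```

For example: print_copy_cmds gm268-3 gm268-1 gm268-2 (the output can be piped to sh once it has been checked).
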
4. Rebuild the mon on gm268-1 and gm268-2
ceph-mon --cluster ceph -i gm268-1 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf

Change to the /var/lib/ceph directory (the parent of mon/)
and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-1.service

Check the service:

$ systemctl status ceph-mon@gm268-1.service
● ceph-mon@gm268-1.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-10-18 13:21:24 CST; 38min ago
 Main PID: 664542 (ceph-mon)
    Tasks: 27
   Memory: 286.0M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-1.service
           └─664542 /usr/bin/ceph-mon -f --cluster ceph --id gm268-1 --setuser ceph --setgroup ceph

Oct 18 13:21:24 gm268-1 systemd[1]: Started Ceph cluster monitor daemon.
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: 2024-10-18T13:21:24.793+0800 7fcc5f804700 -1 mon.gm268-1@0(probing) e11 stashing newest monmap 11 for next startup
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setuser ceph since I am not root
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setgroup ceph since I am not root

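The word in parentheses in the log line above (probing) is the monitor's current state: a probing monitor is still looking for its peers, and once an election completes it reports leader or peon. A small sketch for pulling that state out of journal lines follows; mon_state is a made-up helper and the pattern only covers the mon.<id>@<rank>(<state>) form shown above.

```shell
# mon_state: read ceph-mon log lines on stdin and print the state found in
# "mon.<id>@<rank>(<state>)" for every line that contains one.
mon_state() {
    sed -n 's/.*mon\.[^@ ]*@[0-9-]*(\([a-z]*\)).*/\1/p'
}
```

For example: journalctl -u ceph-mon@gm268-1 | mon_state | tail -n 1
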
Node repair complete.
On node 2:

ceph-mon --cluster ceph -i gm268-2 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf

Change to the /var/lib/ceph directory (the parent of mon/)
and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-2.service

$ systemctl status ceph-mon@gm268-2.service
● ceph-mon@gm268-2.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-10-18 13:09:42 CST; 51min ago
 Main PID: 157382 (ceph-mon)
    Tasks: 27
   Memory: 587.1M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-2.service
           └─157382 /usr/bin/ceph-mon -f --cluster ceph --id gm268-2 --setuser ceph --setgroup ceph
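
Since step 4 is the same on both nodes apart from the monitor id, the per-node commands can be generated from one template. This sketch only prints them for review; rebuild_mon_cmds is an illustrative name, and it uses an absolute path for chown so the result does not depend on the current directory.

```shell
# rebuild_mon_cmds NODE
# Prints the commands from step 4 for the given monitor id.
rebuild_mon_cmds() {
    node="$1"
    cat <<EOF
ceph-mon --cluster ceph -i $node --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf
chown -R ceph:ceph /var/lib/ceph/mon
systemctl start ceph-mon@$node.service
systemctl status ceph-mon@$node.service
EOF
}
```

For example, on the second node: rebuild_mon_cmds gm268-2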
|
|