|
|
Recovery procedure when all mon nodes fail, or when a single mon node fails

Ceph could not run the ceph -s command successfully.

The Ceph distributed storage cluster kept reporting: monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

2024-10-17T22:33:47.295+0800 7f20fe7fc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2024-10-17T22:33:47.297+0800 7f20ff7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

In this environment only gm268-3 was still intact, because it had hung on a failed reboot. The mon data on gm268-1 and gm268-2 was already damaged, so the only option was to recover from gm268-3.

Resolution steps:

1. On gm268-3, generate the monmap file:

$ monmaptool --create --clobber --fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 --add gm268-3 10.12.3.2:6789 --add gm268-2 10.12.2.2:6789 --add gm268-1 10.12.1.2:6789 /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: set fsid to ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
monmaptool: writing epoch 0 to /tmp/monmap (3 monitors)

When generating the monmap, list the healthy node first and then add all of the damaged nodes after it. Note that monmaptool --create builds a new monmap from the arguments given rather than exporting the one stored on the monitor, so the fsid and addresses must match the existing cluster.

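The monmaptool invocation above can be assembled from a list of monitors, which may help when the cluster has more than three of them. This is only a sketch: the MONS list format (name=address pairs) and the RUN switch are conventions invented for this example, while the fsid, names and addresses are the ones from this post.

```shell
#!/bin/sh
set -eu

FSID="ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9"
MONMAP="/tmp/monmap"
# name=address pairs, healthy monitor first as the post suggests
MONS="gm268-3=10.12.3.2:6789 gm268-2=10.12.2.2:6789 gm268-1=10.12.1.2:6789"

# Print the monmaptool command line for the monitors listed in MONS.
build_monmap_cmd() {
  cmd="monmaptool --create --clobber --fsid $FSID"
  for entry in $MONS; do
    # ${entry%%=*} is the monitor name, ${entry#*=} is its address
    cmd="$cmd --add ${entry%%=*} ${entry#*=}"
  done
  printf '%s\n' "$cmd $MONMAP"
}

if [ "${RUN:-0}" = "1" ]; then
  # The values contain no spaces or shell metacharacters,
  # so plain word splitting of the printed command is safe here.
  $(build_monmap_cmd)
  monmaptool --print "$MONMAP"
else
  # Default: only show what would be executed.
  build_monmap_cmd
fi
```

Run without RUN=1, the script only prints the command so it can be reviewed before being executed on the healthy node.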
Inspect the generated file:

$ monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 0
fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
last_changed 2024-10-18T13:17:03.645872+0800
created 2024-10-18T13:17:03.645872+0800
min_mon_release 0 (unknown)
0: v1:10.12.1.2:6789/0 mon.gm268-1
1: v1:10.12.2.2:6789/0 mon.gm268-2
2: v1:10.12.3.2:6789/0 mon.gm268-3

2. On gm268-1 and gm268-2, locate the /var/lib/ceph/mon directory, back it up, and then delete it. The files in it had been modified and were corrupt, which is what caused the authentication failure. Do not delete the existing /etc/ceph/ directory.

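The backup-then-delete step can be done as a single rename, so that the old data is kept and the original path is left empty. The following is a minimal sketch under assumptions: the helper name backup_mon_dir and the .bak.<timestamp> suffix are invented for this example, and recreating the empty directory is a precaution for the later --mkfs step rather than something the post states is required.

```shell
#!/bin/sh
set -eu

# Move a monitor data directory aside instead of deleting it outright.
# $1: directory to back up (the post uses /var/lib/ceph/mon)
# Prints the path of the backup on success.
backup_mon_dir() {
  src="${1%/}"                      # drop a trailing slash, if any
  if [ ! -d "$src" ]; then
    echo "no such directory: $src" >&2
    return 1
  fi
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  mv "$src" "$dest"                 # keeps the damaged data for later inspection
  mkdir "$src"                      # leave an empty directory at the original path
  printf '%s\n' "$dest"
}

# Example, to be run on gm268-1 and gm268-2 with the mon service stopped:
#   backup_mon_dir /var/lib/ceph/mon
```

Renaming rather than deleting keeps the option of comparing the damaged store with the rebuilt one afterwards.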
3. Copy the keyring from the healthy node, together with the generated monmap file, to the other two nodes:

scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-2:/tmp/
scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-1:/tmp/

scp /tmp/monmap gm268-1:/tmp/
scp /tmp/monmap gm268-2:/tmp/

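With several files and several target nodes it is easy to send a file to the same host twice and miss another one. A small loop avoids that. This is a sketch only: the TARGETS variable and the RUN switch are made up for the example, and the paths and host names are the ones used in this post.

```shell
#!/bin/sh
set -eu

SRC_KEYRING="/var/lib/ceph/mon/ceph-gm268-3/keyring"
SRC_MONMAP="/tmp/monmap"
TARGETS="gm268-1 gm268-2"   # the damaged nodes

# Emit one scp command per (file, host) pair.
copy_cmds() {
  for host in $TARGETS; do
    for f in "$SRC_KEYRING" "$SRC_MONMAP"; do
      printf 'scp %s %s:/tmp/\n' "$f" "$host"
    done
  done
}

if [ "${RUN:-0}" = "1" ]; then
  copy_cmds | sh -e          # execute the generated commands, stop on first error
else
  copy_cmds                  # default: only list them
fi
```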
4. Rebuild the mon on gm268-1 and gm268-2

On gm268-1:

ceph-mon --cluster ceph -i gm268-1 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf

Change to the /var/lib/ceph directory (the parent of mon/) and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-1.service

Check the service:

$ systemctl status ceph-mon@gm268-1.service
● ceph-mon@gm268-1.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-10-18 13:21:24 CST; 38min ago
 Main PID: 664542 (ceph-mon)
    Tasks: 27
   Memory: 286.0M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-1.service
           └─664542 /usr/bin/ceph-mon -f --cluster ceph --id gm268-1 --setuser ceph --setgroup ceph

Oct 18 13:21:24 gm268-1 systemd[1]: Started Ceph cluster monitor daemon.
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: 2024-10-18T13:21:24.793+0800 7fcc5f804700 -1 mon.gm268-1@0(probing) e11 stashing newest monmap 11 for next startup
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setuser ceph since I am not root
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setgroup ceph since I am not root

The node is now repaired.
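
Since the original symptom was that ceph -s never completed, one way to confirm the repair is to retry that command with a connection timeout until it succeeds. The sketch below is not part of the original procedure: the helper name wait_for_ceph and the RETRY_DELAY variable are invented here, and it assumes the node has a working /etc/ceph/ceph.conf and admin keyring.

```shell
#!/bin/sh
set -eu

# Retry "ceph -s" a few times; $1 is the number of attempts (default 5).
# RETRY_DELAY (seconds between attempts) is a hypothetical knob for this sketch.
wait_for_ceph() {
  attempts="${1:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --connect-timeout keeps the client from hunting for a monitor indefinitely
    if ceph --connect-timeout 10 -s; then
      return 0
    fi
    i=$((i + 1))
    sleep "${RETRY_DELAY:-5}"
  done
  echo "ceph -s still failing after $attempts attempts" >&2
  return 1
}

# Example:
#   wait_for_ceph 5
```

Once ceph -s returns, its output shows which monitors are in quorum, which is a quick check that the rebuilt mon has joined the surviving one.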
On node 2 (gm268-2):

ceph-mon --cluster ceph -i gm268-2 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf

Change to the /var/lib/ceph directory (the parent of mon/) and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-2.service

$ systemctl status ceph-mon@gm268-2.service
● ceph-mon@gm268-2.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-10-18 13:09:42 CST; 51min ago
 Main PID: 157382 (ceph-mon)
    Tasks: 27
   Memory: 587.1M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-2.service
           └─157382 /usr/bin/ceph-mon -f --cluster ceph --id gm268-2 --setuser ceph --setgroup ceph
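
The per-node rebuild in step 4 is the same sequence of commands with only the monitor ID changed, so it can be written once and parameterised. This is a rough sketch under assumptions: the function name rebuild_cmds and the RUN switch are invented for the example, the chown uses an absolute path instead of changing directory first, and the files are assumed to be in /tmp as in step 3.

```shell
#!/bin/sh
set -eu

# Print the rebuild commands for one monitor; $1 is the monitor ID (e.g. gm268-1).
rebuild_cmds() {
  id="$1"
  cat <<EOF
ceph-mon --cluster ceph -i ${id} --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf
chown -R ceph:ceph /var/lib/ceph/mon/
systemctl start ceph-mon@${id}.service
systemctl status ceph-mon@${id}.service --no-pager
EOF
}

if [ "${RUN:-0}" = "1" ]; then
  # Execute on the node being rebuilt, as root; stop at the first failing command.
  rebuild_cmds "${1:?usage: RUN=1 $0 <mon-id>}" | sh -e
else
  # Default: only show the commands for review.
  rebuild_cmds "${1:-gm268-1}"
fi
```

Printing the commands first makes it easy to check that the monitor ID matches the host before anything is written to /var/lib/ceph/mon.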
|
|