|
|
Recovery procedure when all mon nodes, or a single mon node, have failed

Symptom: ceph -s never runs successfully.

The Ceph distributed storage reports: monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2024-10-17T22:33:47.295+0800 7f20fe7fc700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2024-10-17T22:33:47.297+0800 7f20ff7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
Only gm268-3 was still healthy in this environment, because it had hung on a failed reboot; gm268-1 and gm268-2 were both corrupted. The only way forward was to work from gm268-3.

Resolution:

1. On gm268-3, generate the monmap file:
$ monmaptool --create --clobber --fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 --add gm268-3 10.12.3.2:6789 --add gm268-2 10.12.2.2:6789 --add gm268-1 10.12.1.2:6789 /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: set fsid to ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
monmaptool: writing epoch 0 to /tmp/monmap (3 monitors)
When generating the monmap, list the healthy node first, then add all of the broken nodes after it.
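As a rough sketch of the rule above (healthy node first, then the broken ones), the command line can be assembled from a list of monitors instead of being typed by hand. The helper name print_monmap_command and its name=address argument format are made up for this illustration; the monmaptool flags are only the ones already used in the command above. The helper prints the command and does not run it.

```shell
# Hypothetical helper: print (not run) a monmaptool --create command.
# $1 = cluster fsid, $2 = output monmap path,
# remaining args = name=ip:port pairs, healthy monitor first.
print_monmap_command() {
    fsid=$1
    out=$2
    shift 2
    cmd="monmaptool --create --clobber --fsid $fsid"
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first '='
        addr=${pair#*=}    # text after the first '='
        cmd="$cmd --add $name $addr"
    done
    printf '%s %s\n' "$cmd" "$out"
}

# Example call with the values from this post:
#   print_monmap_command ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9 /tmp/monmap \
#       gm268-3=10.12.3.2:6789 gm268-2=10.12.2.2:6789 gm268-1=10.12.1.2:6789
```

Printing the command first makes it easy to review the node order and addresses before anything is written to /tmp/monmap.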
Inspect the generated file:
$ monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 0
fsid ce68aab8-8f46-11ed-88c0-ac1f6b3a30b9
last_changed 2024-10-18T13:17:03.645872+0800
created 2024-10-18T13:17:03.645872+0800
min_mon_release 0 (unknown)
0: v1:10.12.1.2:6789/0 mon.gm268-1
1: v1:10.12.2.2:6789/0 mon.gm268-2
2: v1:10.12.3.2:6789/0 mon.gm268-3
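To double-check the printed map without reading it by eye, a small filter can confirm that every expected monitor name appears in the output. This is only a minimal sketch: missing_mons is a hypothetical helper, and it assumes each monitor line ends in mon.<name>, as in the output above.

```shell
# Hypothetical helper: read `monmaptool --print` output on stdin and
# print every expected monitor name that does NOT appear in it.
# Arguments: the expected monitor names.
missing_mons() {
    input=$(cat)
    for name in "$@"; do
        # A monitor line is assumed to end with " mon.<name>".
        if ! printf '%s\n' "$input" | grep -q " mon\.${name}\$"; then
            printf '%s\n' "$name"
        fi
    done
}

# Example call on the real file (assumes monmaptool is installed):
#   monmaptool --print /tmp/monmap | missing_mons gm268-1 gm268-2 gm268-3
```

Empty output means all listed monitors were found; any name printed is missing from the map.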
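2. On gm268-1 and gm268-2, locate the /var/lib/ceph/mon directory, back it up, then delete it. The files in it had been modified and were left in an abnormal state, which is what broke authentication. Do not delete the existing /etc/ceph/ directory.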
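One possible way to do the backup-then-delete step is sketched below. The function name backup_mon_dir, the tar.gz archive format, and the timestamped file name are choices made for this example, not something the original procedure specifies. It also re-creates an empty directory afterwards, on the assumption that the later rebuild expects the parent directory to exist. Stopping the ceph-mon service on the node before running it is assumed.

```shell
# Hypothetical helper: archive a mon data directory, then empty it.
# $1 = mon data root (this post uses /var/lib/ceph/mon)
# $2 = directory to store the backup archive in
# Prints the path of the archive on success.
backup_mon_dir() {
    mon_root=$1
    backup_dir=$2
    stamp=$(date +%Y%m%d%H%M%S)
    archive="$backup_dir/ceph-mon-backup-$stamp.tar.gz"

    [ -d "$mon_root" ] || { echo "no such directory: $mon_root" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1

    # Archive first; the original is removed only if the archive was written.
    tar -czf "$archive" -C "$(dirname "$mon_root")" "$(basename "$mon_root")" || return 1
    rm -rf "$mon_root"
    # Assumption: leave an empty directory behind for the rebuild step.
    mkdir -p "$mon_root" || return 1
    echo "$archive"
}

# Example call on a broken node:
#   backup_mon_dir /var/lib/ceph/mon /root/mon-backup
```

Because the removal only happens after tar succeeds, a failed backup leaves the damaged data in place for later inspection.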
3. Copy the keyring from the healthy node, together with the generated monmap file, to the other two nodes:
scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-2:/tmp/
scp /var/lib/ceph/mon/ceph-gm268-3/keyring gm268-1:/tmp/

scp /tmp/monmap gm268-2:/tmp/
scp /tmp/monmap gm268-1:/tmp/
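With more than a couple of target nodes, a loop keeps the copy list in one place. The sketch below only prints the scp commands so they can be reviewed first; print_copy_commands is a hypothetical helper name, and passing both files to a single scp call per node is a choice made for this example.

```shell
# Hypothetical helper: print (not run) one scp command per target node.
# $1 = keyring path, $2 = monmap path, remaining args = target hostnames.
print_copy_commands() {
    keyring=$1
    monmap=$2
    shift 2
    for node in "$@"; do
        # scp accepts several source files followed by one destination.
        printf 'scp %s %s %s:/tmp/\n' "$keyring" "$monmap" "$node"
    done
}

# Example call with the paths and hosts from this post:
#   print_copy_commands /var/lib/ceph/mon/ceph-gm268-3/keyring /tmp/monmap gm268-1 gm268-2
```

Once the printed lines look right, they can be piped to sh to run them.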
4. Rebuild the mon on gm268-1 and gm268-2. On gm268-1:
ceph-mon --cluster ceph -i gm268-1 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf
Change to the /var/lib/ceph directory (the parent of the mon directory) and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-1.service
Check the service:
$ systemctl status ceph-mon@gm268-1.service
● ceph-mon@gm268-1.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-10-18 13:21:24 CST; 38min ago
 Main PID: 664542 (ceph-mon)
    Tasks: 27
   Memory: 286.0M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-1.service
           └─664542 /usr/bin/ceph-mon -f --cluster ceph --id gm268-1 --setuser ceph --setgroup ceph

Oct 18 13:21:24 gm268-1 systemd[1]: Started Ceph cluster monitor daemon.
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: 2024-10-18T13:21:24.793+0800 7fcc5f804700 -1 mon.gm268-1@0(probing) e11 stashing newest monmap 11 for next startup
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setuser ceph since I am not root
Oct 18 13:21:24 gm268-1 ceph-mon[664542]: ignoring --setgroup ceph since I am not root
gm268-1 is now repaired.

On the second node (gm268-2):
ceph-mon --cluster ceph -i gm268-2 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring -c /etc/ceph/ceph.conf
Change to the /var/lib/ceph directory (the parent of the mon directory) and run:
chown -R ceph:ceph mon/

Start the mon service:
systemctl start ceph-mon@gm268-2.service
$ systemctl status ceph-mon@gm268-2.service
● ceph-mon@gm268-2.service - Ceph cluster monitor daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-10-18 13:09:42 CST; 51min ago
 Main PID: 157382 (ceph-mon)
    Tasks: 27
   Memory: 587.1M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@gm268-2.service
           └─157382 /usr/bin/ceph-mon -f --cluster ceph --id gm268-2 --setuser ceph --setgroup ceph
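Since the rebuild is the same on both nodes apart from the mon id, the per-node commands can be generated from one template. This is a sketch rather than a tested recovery script: print_rebuild_commands is a hypothetical helper, the chown uses the absolute path so no directory change is needed, and --no-pager is added to the status call so it does not wait for input when run from a pipe. Everything else is taken from the commands in this post. It prints the commands and does not execute them.

```shell
# Hypothetical helper: print (not run) the rebuild commands for one mon.
# $1 = mon id (e.g. gm268-1)
# $2 = monmap path (default /tmp/monmap)
# $3 = keyring path (default /tmp/keyring)
print_rebuild_commands() {
    id=$1
    monmap=${2:-/tmp/monmap}
    keyring=${3:-/tmp/keyring}
    cat <<EOF
ceph-mon --cluster ceph -i $id --mkfs --monmap $monmap --keyring $keyring -c /etc/ceph/ceph.conf
chown -R ceph:ceph /var/lib/ceph/mon
systemctl start ceph-mon@$id.service
systemctl status --no-pager ceph-mon@$id.service
EOF
}

# Example call, run on the node being rebuilt:
#   print_rebuild_commands gm268-2
```

After both mons are running again, ceph -s, the command that failed at the start, is the natural check that the cluster answers.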
|
|