Original poster | Posted on 2018-9-28 12:35:27
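Adding mon nodes
The number of Ceph monitors should be 2n+1 (n >= 0), with at least 3 in production. As long as at least n+1 monitors are healthy, Ceph's Paxos algorithm keeps the system running normally. With 3 monitors, therefore, at most one can be down at any time.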
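The majority rule above can be worked through in a few lines. This is only a minimal sketch of the arithmetic, not anything Ceph ships; the function names `majority` and `max_failures` are made up for illustration.

```python
def majority(total_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if total_mons < 1:
        raise ValueError("a cluster needs at least one monitor")
    return total_mons // 2 + 1


def max_failures(total_mons: int) -> int:
    """How many monitors can be down while a majority still remains."""
    return total_mons - majority(total_mons)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} mons: quorum needs {majority(n)}, "
              f"tolerates {max_failures(n)} down")
```

For 2n+1 monitors this gives a majority of n+1 and n tolerated failures, so 3 monitors tolerate 1 failure. It also shows why an even count buys nothing: 4 monitors still tolerate only 1 failure.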
The current Ceph cluster contains only 1 mon node; we will expand it to 3 mon nodes.
Check the current Ceph cluster status:
[root@ceph-osd-1 ceph-cluster]# ceph -s
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_OK
     monmap e5: 1 mons at {ceph-osd-1=10.10.200.163:6789/0}, election epoch 69, quorum 0 ceph-osd-1
     osdmap e220: 7 osds: 7 up, 7 in
      pgmap v473: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36109 MB used, 11870 GB / 11905 GB avail
                 256 active+clean

Now add two nodes to the Ceph cluster as mons: ceph-osd-2 and ceph-osd-3.
First, modify the configuration file as follows, adding public_network:
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.10.200.163, 10.10.200.164
mon_initial_members = ceph-osd-1, ceph-osd-2
fsid = 9d717e10-a708-482d-b91c-4bd21f4ae36c
public_network = 10.10.200.0/24

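Before pushing a configuration like the one above, it can help to confirm that every address in mon_host actually falls inside public_network. The following is a rough sketch using only the Python standard library. It assumes mon_host holds plain IPv4 addresses (no hostnames or ports), and the helper name `mons_outside_public_network` is hypothetical.

```python
import configparser
import ipaddress
from typing import List

# Copy of the [global] section shown in the post.
SAMPLE_CONF = """\
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.10.200.163, 10.10.200.164
mon_initial_members = ceph-osd-1, ceph-osd-2
fsid = 9d717e10-a708-482d-b91c-4bd21f4ae36c
public_network = 10.10.200.0/24
"""


def mons_outside_public_network(conf_text: str) -> List[str]:
    """Return the mon_host addresses that are not inside public_network."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["global"]
    network = ipaddress.ip_network(section["public_network"].strip())
    hosts = [h.strip() for h in section["mon_host"].split(",") if h.strip()]
    return [h for h in hosts if ipaddress.ip_address(h) not in network]


if __name__ == "__main__":
    bad = mons_outside_public_network(SAMPLE_CONF)
    print("all mon addresses are inside public_network" if not bad else bad)
```

An empty list means every listed monitor address sits on the public network; anything returned is worth fixing before running ceph-deploy.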
Add the mon nodes:
[root@ceph-osd-1 ceph-cluster]# ceph-deploy mon create ceph-osd-2 ceph-osd-3
After the mon nodes are added, check the mon quorum status:
[root@ceph-osd-1 ceph-cluster]# ceph quorum_status --format json-pretty
{ "election_epoch": 72,
  "quorum": [
        0,
        1,
        2],
  "quorum_names": [
        "ceph-osd-1",
        "ceph-osd-2",
        "ceph-osd-3"],
  "quorum_leader_name": "ceph-osd-1",
  "monmap": { "epoch": 7,
      "fsid": "9d717e10-a708-482d-b91c-4bd21f4ae36c",
      "modified": "2014-11-14 09:10:28.111133",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-osd-1",
              "addr": "10.10.200.163:6789\/0"},
            { "rank": 1,
              "name": "ceph-osd-2",
              "addr": "10.10.200.164:6789\/0"},
            { "rank": 2,
              "name": "ceph-osd-3",
              "addr": "10.10.200.165:6789\/0"}]}}
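The quorum_status output above can also be checked programmatically instead of by eye. Here is a minimal sketch that compares the monmap against quorum_names. The sample is a trimmed copy of the JSON in this post, and `summarize_quorum` is a hypothetical helper name, not part of any Ceph tooling.

```python
import json

# Trimmed copy of the quorum_status output shown in the post.
SAMPLE_STATUS = r"""
{ "election_epoch": 72,
  "quorum": [0, 1, 2],
  "quorum_names": ["ceph-osd-1", "ceph-osd-2", "ceph-osd-3"],
  "quorum_leader_name": "ceph-osd-1",
  "monmap": { "epoch": 7,
      "fsid": "9d717e10-a708-482d-b91c-4bd21f4ae36c",
      "mons": [
            { "rank": 0, "name": "ceph-osd-1", "addr": "10.10.200.163:6789\/0"},
            { "rank": 1, "name": "ceph-osd-2", "addr": "10.10.200.164:6789\/0"},
            { "rank": 2, "name": "ceph-osd-3", "addr": "10.10.200.165:6789\/0"}]}}
"""


def summarize_quorum(status_json: str) -> dict:
    """Report which monitors from the monmap are missing from the quorum."""
    status = json.loads(status_json)
    all_mons = [mon["name"] for mon in status["monmap"]["mons"]]
    in_quorum = set(status["quorum_names"])
    return {
        "total": len(all_mons),
        "in_quorum": len(in_quorum),
        "missing": [name for name in all_mons if name not in in_quorum],
        "leader": status["quorum_leader_name"],
    }


if __name__ == "__main__":
    print(summarize_quorum(SAMPLE_STATUS))
```

On a live cluster the same function could be fed the text printed by the `ceph quorum_status --format json-pretty` command used above; a non-empty "missing" list points at the monitors that have dropped out.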
Check the Ceph cluster status at this point:
[root@ceph-osd-1 ceph-cluster]# ceph -s
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_WARN clock skew detected on mon.ceph-osd-3
     monmap e7: 3 mons at {ceph-osd-1=10.10.200.163:6789/0,ceph-osd-2=10.10.200.164:6789/0,ceph-osd-3=10.10.200.165:6789/0}, election epoch 72, quorum 0,1,2 ceph-osd-1,ceph-osd-2,ceph-osd-3
     osdmap e220: 7 osds: 7 up, 7 in
      pgmap v475: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36109 MB used, 11870 GB / 11905 GB avail
                 256 active+clean

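This shows that the clock on mon.ceph-osd-3 is out of sync with mon.ceph-osd-1; synchronize the time across all mon nodes.
The mon nodes have now been added. Next, simulate a failure of the mon on ceph-osd-1 to see whether the Ceph cluster keeps working, then check the cluster information: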
[root@ceph-osd-2 ~]# ceph -s
2014-11-14 09:27:28.582467 7f9cd8712700 0 -- :/1014338 >> 10.10.200.163:6789/0 pipe(0x7f9cd4024230 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cd40244c0).fault
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_WARN 256 pgs degraded; 256 pgs stuck unclean; 256 pgs undersized; 1/7 in osds are down; 1 mons down, quorum 1,2 ceph-osd-2,ceph-osd-3
     monmap e7: 3 mons at {ceph-osd-1=10.10.200.163:6789/0,ceph-osd-2=10.10.200.164:6789/0,ceph-osd-3=10.10.200.165:6789/0}, election epoch 88, quorum 1,2 ceph-osd-2,ceph-osd-3
     osdmap e263: 7 osds: 6 up, 7 in
      pgmap v542: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36112 MB used, 11870 GB / 11905 GB avail
                 256 active+undersized+degraded

Because the ceph-osd-1 host runs one mon and one OSD, one OSD in the cluster is reported down as well.
As stated at the beginning of this post, a 3-node mon cluster tolerates only 1 mon being down. So what happens when 2 mons are down? Take down the ceph-osd-2 node as well.
Check the Ceph cluster status with ceph -s:
[root@ceph-osd-3 ~]# ceph -s
2014-11-14 09:30:23.483264 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f6778023290 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6778023520).fault
2014-11-14 09:30:26.483313 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c000e90).fault
2014-11-14 09:30:29.483664 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f676c0030e0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c003370).fault
2014-11-14 09:30:32.483904 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c003a00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c003c90).fault
2014-11-14 09:30:35.484221 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f676c0031b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c002570).fault
2014-11-14 09:30:38.484476 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c002a60 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c002cf0).fault

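As the output above shows, the Ceph cluster can no longer work normally: the client keeps retrying the two unreachable mons and never gets a status back, because the single remaining mon cannot form a majority. So in a 3-node mon cluster, only 1 mon node may be down.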