Original poster | Posted on 2018-9-28 12:35:27
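Adding mon nodes
The number of Ceph monitors should be 2n+1 (n >= 0), with at least 3 in production. As long as at least n+1 monitors are healthy, Ceph's Paxos algorithm keeps the system running. With 3 monitors, therefore, only one may be down at a time.
The current Ceph cluster contains only 1 mon node; the steps here expand it to 3 mon nodes.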
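The 2n+1 rule above comes down to simple majority arithmetic. The following is a minimal sketch of that arithmetic only; the function names `quorum_size` and `tolerable_failures` are made up for illustration and are not part of any Ceph API.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest strict majority of total_mons monitors (n+1 when total is 2n+1)."""
    if total_mons < 1:
        raise ValueError("need at least one monitor")
    return total_mons // 2 + 1


def tolerable_failures(total_mons: int) -> int:
    """How many monitors may be down while a majority is still up."""
    return total_mons - quorum_size(total_mons)


if __name__ == "__main__":
    # Print: total monitors, quorum size, failures tolerated.
    for total in (1, 3, 4, 5):
        print(total, quorum_size(total), tolerable_failures(total))
```

Under this arithmetic, 4 monitors tolerate the same single failure as 3, which is one way to see why an odd count is the usual choice.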
Check the current Ceph cluster status:
[root@ceph-osd-1 ceph-cluster]# ceph -s
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_OK
     monmap e5: 1 mons at {ceph-osd-1=10.10.200.163:6789/0}, election epoch 69, quorum 0 ceph-osd-1
     osdmap e220: 7 osds: 7 up, 7 in
      pgmap v473: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36109 MB used, 11870 GB / 11905 GB avail
                 256 active+clean
Now add two nodes to the Ceph cluster as monitors: ceph-osd-2 and ceph-osd-3.
First, edit the configuration file as follows, adding public_network:
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.10.200.163, 10.10.200.164
mon_initial_members = ceph-osd-1, ceph-osd-2
fsid = 9d717e10-a708-482d-b91c-4bd21f4ae36c
public_network = 10.10.200.0/24

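Before deploying, it can help to sanity-check the mon-related settings in the file above. The sketch below uses only the Python standard library and assumes the file parses as plain INI, as this one does. The helper `check_mon_settings` is hypothetical, and the two checks (every `mon_host` address lies inside `public_network`, and the host and member lists have the same length) are illustrative choices, not rules enforced by Ceph.

```python
import configparser
import ipaddress

# Copy of the [global] section shown above.
CEPH_CONF = """\
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.10.200.163, 10.10.200.164
mon_initial_members = ceph-osd-1, ceph-osd-2
fsid = 9d717e10-a708-482d-b91c-4bd21f4ae36c
public_network = 10.10.200.0/24
"""


def split_list(value):
    """Split a comma-separated option value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_mon_settings(conf_text):
    """Return the mon hosts/members and two simple consistency checks."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["global"]
    hosts = split_list(section["mon_host"])
    members = split_list(section["mon_initial_members"])
    network = ipaddress.ip_network(section["public_network"])
    outside = [h for h in hosts if ipaddress.ip_address(h) not in network]
    return {
        "hosts": hosts,
        "members": members,
        "outside_public_network": outside,
        "counts_match": len(hosts) == len(members),
    }


if __name__ == "__main__":
    print(check_mon_settings(CEPH_CONF))
```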
Add the mon nodes:
[root@ceph-osd-1 ceph-cluster]# ceph-deploy mon create ceph-osd-2 ceph-osd-3

After adding the mon nodes, check the mon quorum status:
[root@ceph-osd-1 ceph-cluster]# ceph quorum_status --format json-pretty
{ "election_epoch": 72,
  "quorum": [
        0,
        1,
        2],
  "quorum_names": [
        "ceph-osd-1",
        "ceph-osd-2",
        "ceph-osd-3"],
  "quorum_leader_name": "ceph-osd-1",
  "monmap": { "epoch": 7,
      "fsid": "9d717e10-a708-482d-b91c-4bd21f4ae36c",
      "modified": "2014-11-14 09:10:28.111133",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-osd-1",
              "addr": "10.10.200.163:6789\/0"},
            { "rank": 1,
              "name": "ceph-osd-2",
              "addr": "10.10.200.164:6789\/0"},
            { "rank": 2,
              "name": "ceph-osd-3",
              "addr": "10.10.200.165:6789\/0"}]}}
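The JSON above can also be checked programmatically instead of by eye. This is a rough sketch that assumes the field names seen in this particular output (`quorum_names`, `quorum_leader_name`, `monmap.mons`); other Ceph releases may lay the output out differently. `summarize_quorum` is a hypothetical helper, and the sample text is a shortened copy of the output above.

```python
import json

# Shortened copy of the quorum_status output; a raw string keeps the "\/" escapes intact.
QUORUM_STATUS = r"""
{ "election_epoch": 72,
  "quorum": [0, 1, 2],
  "quorum_names": ["ceph-osd-1", "ceph-osd-2", "ceph-osd-3"],
  "quorum_leader_name": "ceph-osd-1",
  "monmap": { "epoch": 7,
      "mons": [
            { "rank": 0, "name": "ceph-osd-1", "addr": "10.10.200.163:6789\/0"},
            { "rank": 1, "name": "ceph-osd-2", "addr": "10.10.200.164:6789\/0"},
            { "rank": 2, "name": "ceph-osd-3", "addr": "10.10.200.165:6789\/0"}]}}
"""


def summarize_quorum(status_text):
    """Compare the monitors in the monmap with those currently in quorum."""
    status = json.loads(status_text)
    all_mons = [mon["name"] for mon in status["monmap"]["mons"]]
    in_quorum = status["quorum_names"]
    missing = [name for name in all_mons if name not in in_quorum]
    return {
        "leader": status["quorum_leader_name"],
        "in_quorum": in_quorum,
        "missing": missing,
        # A strict majority of the monmap must be in quorum.
        "has_majority": len(in_quorum) > len(all_mons) // 2,
    }


if __name__ == "__main__":
    print(summarize_quorum(QUORUM_STATUS))
```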
Check the Ceph cluster status at this point:
[root@ceph-osd-1 ceph-cluster]# ceph -s
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_WARN clock skew detected on mon.ceph-osd-3
     monmap e7: 3 mons at {ceph-osd-1=10.10.200.163:6789/0,ceph-osd-2=10.10.200.164:6789/0,ceph-osd-3=10.10.200.165:6789/0}, election epoch 72, quorum 0,1,2 ceph-osd-1,ceph-osd-2,ceph-osd-3
     osdmap e220: 7 osds: 7 up, 7 in
      pgmap v475: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36109 MB used, 11870 GB / 11905 GB avail
                 256 active+clean

The clock on mon.ceph-osd-3 is out of sync with mon.ceph-osd-1, so synchronize the time across all mon nodes.
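When watching for this warning across several runs, the affected monitors can be pulled out of the health line with a small parser. The sketch below is based only on the message format seen in the output above; the assumption that several affected mons would appear as a comma-separated list is mine, and `mons_with_clock_skew` is a hypothetical helper.

```python
import re


def mons_with_clock_skew(health_line):
    """Return mon names listed after 'clock skew detected on' in a health line."""
    # Take the text up to the next ';' (the separator between health items) or end of line.
    match = re.search(r"clock skew detected on (.+?)(?:;|$)", health_line)
    if not match:
        return []
    # Assumption: each affected monitor is written as mon.<name>.
    return re.findall(r"mon\.([\w.-]+)", match.group(1))


if __name__ == "__main__":
    line = "health HEALTH_WARN clock skew detected on mon.ceph-osd-3"
    print(mons_with_clock_skew(line))
```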
The mon nodes have now been added. Next, simulate a failure of the mon on ceph-osd-1 and check whether the Ceph cluster still works. The cluster status is now:
[root@ceph-osd-2 ~]# ceph -s
2014-11-14 09:27:28.582467 7f9cd8712700 0 -- :/1014338 >> 10.10.200.163:6789/0 pipe(0x7f9cd4024230 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cd40244c0).fault
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_WARN 256 pgs degraded; 256 pgs stuck unclean; 256 pgs undersized; 1/7 in osds are down; 1 mons down, quorum 1,2 ceph-osd-2,ceph-osd-3
     monmap e7: 3 mons at {ceph-osd-1=10.10.200.163:6789/0,ceph-osd-2=10.10.200.164:6789/0,ceph-osd-3=10.10.200.165:6789/0}, election epoch 88, quorum 1,2 ceph-osd-2,ceph-osd-3
     osdmap e263: 7 osds: 6 up, 7 in
      pgmap v542: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36112 MB used, 11870 GB / 11905 GB avail
                 256 active+undersized+degraded

Because the ceph-osd-1 node hosts one mon and one OSD, one OSD in the OSD cluster is also down. The remaining two mons, ceph-osd-2 and ceph-osd-3, still form a quorum, so the cluster keeps responding.
As noted at the start of this post, with 3 mon nodes Ceph allows only 1 mon to be down. What happens when 2 mons are down? Take down the ceph-osd-2 node as well.
Check the cluster status with ceph -s:
[root@ceph-osd-3 ~]# ceph -s
2014-11-14 09:30:23.483264 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f6778023290 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6778023520).fault
2014-11-14 09:30:26.483313 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c000e90).fault
2014-11-14 09:30:29.483664 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f676c0030e0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c003370).fault
2014-11-14 09:30:32.483904 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c003a00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c003c90).fault
2014-11-14 09:30:35.484221 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f676c0031b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c002570).fault
2014-11-14 09:30:38.484476 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c002a60 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c002cf0).fault

The output above shows that the Ceph cluster can no longer work: ceph -s prints only connection faults for the two stopped mons and no cluster status. So in a 3-node mon cluster, only 1 mon node may be down.
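The experiment above can be replayed on paper with the three mon names from the monmap. This is a small illustrative model of the majority rule only. It does not talk to a cluster, and `has_quorum` and `replay` are hypothetical names.

```python
MONMAP = ["ceph-osd-1", "ceph-osd-2", "ceph-osd-3"]


def has_quorum(all_mons, down):
    """True if the monitors still up form a strict majority of the monmap."""
    up = [name for name in all_mons if name not in down]
    return len(up) >= len(all_mons) // 2 + 1


def replay(all_mons, failures):
    """Stop monitors one by one and record whether a quorum survives each step."""
    down = set()
    results = []
    for name in failures:
        down.add(name)
        results.append((name, has_quorum(all_mons, down)))
    return results


if __name__ == "__main__":
    # Same order as in the post: first ceph-osd-1, then ceph-osd-2.
    for stopped, ok in replay(MONMAP, ["ceph-osd-1", "ceph-osd-2"]):
        print(stopped, "stopped ->", "quorum kept" if ok else "quorum lost")
```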