Original poster | Posted on 2018-9-28 12:35:27
Adding mon nodes
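A Ceph cluster should run 2n+1 monitors (n >= 0), and at least 3 in production. As long as at least n+1 monitors are healthy, Ceph's Paxos algorithm keeps the cluster running. With 3 monitors, therefore, only one may be down at any time.
The current Ceph cluster contains only 1 mon node; the steps below expand it to 3.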
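The 2n+1 rule comes down to simple majority arithmetic. A minimal sketch in Python, not taken from Ceph itself; the function names `majority` and `tolerated_failures` are made up for illustration:

```python
def majority(total_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return total_mons // 2 + 1


def tolerated_failures(total_mons: int) -> int:
    """How many monitors can be down while a majority is still up."""
    return total_mons - majority(total_mons)


# Print: total monitors, monitors needed for quorum, failures tolerated.
for total in (1, 3, 4, 5):
    print(total, majority(total), tolerated_failures(total))
```

An even count does not help: 4 monitors still tolerate only 1 failure, the same as 3, which is why odd counts are used.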
Check the current Ceph cluster status:
[root@ceph-osd-1 ceph-cluster]# ceph -s
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_OK
     monmap e5: 1 mons at {ceph-osd-1=10.10.200.163:6789/0}, election epoch 69, quorum 0 ceph-osd-1
     osdmap e220: 7 osds: 7 up, 7 in
      pgmap v473: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36109 MB used, 11870 GB / 11905 GB avail
                 256 active+clean
Two nodes, ceph-osd-2 and ceph-osd-3, will now be added to the Ceph cluster as monitors.
First edit the configuration file as follows, adding public_network:
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.10.200.163, 10.10.200.164
mon_initial_members = ceph-osd-1, ceph-osd-2
fsid = 9d717e10-a708-482d-b91c-4bd21f4ae36c
public_network = 10.10.200.0/24
Add the mon nodes:
[root@ceph-osd-1 ceph-cluster]# ceph-deploy mon create ceph-osd-2 ceph-osd-3
After adding the mon nodes, check the mon quorum status:
[root@ceph-osd-1 ceph-cluster]# ceph quorum_status --format json-pretty
{ "election_epoch": 72,
  "quorum": [
        0,
        1,
        2],
  "quorum_names": [
        "ceph-osd-1",
        "ceph-osd-2",
        "ceph-osd-3"],
  "quorum_leader_name": "ceph-osd-1",
  "monmap": { "epoch": 7,
      "fsid": "9d717e10-a708-482d-b91c-4bd21f4ae36c",
      "modified": "2014-11-14 09:10:28.111133",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-osd-1",
              "addr": "10.10.200.163:6789\/0"},
            { "rank": 1,
              "name": "ceph-osd-2",
              "addr": "10.10.200.164:6789\/0"},
            { "rank": 2,
              "name": "ceph-osd-3",
              "addr": "10.10.200.165:6789\/0"}]}}
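Comparing "quorum_names" against the monitors listed in "monmap" can also be scripted. The Python sketch below assumes the JSON layout of the quorum_status output above (a trimmed copy is embedded as `SAMPLE`); `mons_outside_quorum` is a hypothetical helper name, not part of any Ceph tool.

```python
import json

# Trimmed copy of the quorum_status output; a raw string keeps the "\/" escapes intact.
SAMPLE = r'''
{ "election_epoch": 72,
  "quorum": [0, 1, 2],
  "quorum_names": ["ceph-osd-1", "ceph-osd-2", "ceph-osd-3"],
  "quorum_leader_name": "ceph-osd-1",
  "monmap": { "epoch": 7,
      "mons": [
        { "rank": 0, "name": "ceph-osd-1", "addr": "10.10.200.163:6789\/0"},
        { "rank": 1, "name": "ceph-osd-2", "addr": "10.10.200.164:6789\/0"},
        { "rank": 2, "name": "ceph-osd-3", "addr": "10.10.200.165:6789\/0"}]}}
'''


def mons_outside_quorum(status: dict) -> list:
    """Return names of monitors that are in the monmap but not in the quorum."""
    in_quorum = set(status["quorum_names"])
    return [m["name"] for m in status["monmap"]["mons"] if m["name"] not in in_quorum]


status = json.loads(SAMPLE)
print(mons_outside_quorum(status))  # empty list when every monitor has joined
```

On a live cluster the same function could be fed the output of the ceph quorum_status command instead of the embedded sample.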
Check the Ceph cluster status at this point:
[root@ceph-osd-1 ceph-cluster]# ceph -s
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_WARN clock skew detected on mon.ceph-osd-3
     monmap e7: 3 mons at {ceph-osd-1=10.10.200.163:6789/0,ceph-osd-2=10.10.200.164:6789/0,ceph-osd-3=10.10.200.165:6789/0}, election epoch 72, quorum 0,1,2 ceph-osd-1,ceph-osd-2,ceph-osd-3
     osdmap e220: 7 osds: 7 up, 7 in
      pgmap v475: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36109 MB used, 11870 GB / 11905 GB avail
                 256 active+clean
The warning shows that the clock on mon.ceph-osd-3 is out of sync with the clock on mon.ceph-osd-1, so synchronize the time on all mon nodes.
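When several monitors are involved, the affected ones can be pulled out of the health line with a few lines of Python. This is only a sketch: it assumes the warning text has the form seen in the ceph -s output ("clock skew detected on mon.<name>"), which may differ in other Ceph releases, and `skewed_mons` is a hypothetical helper name.

```python
import re


def skewed_mons(health_line: str) -> list:
    """Extract monitor names from a 'clock skew detected on ...' health item."""
    names = []
    # Health items are separated by ';' in the ceph -s summary line.
    for item in health_line.split(";"):
        if "clock skew detected on" in item:
            names.extend(re.findall(r"mon\.([\w-]+)", item))
    return names


line = "health HEALTH_WARN clock skew detected on mon.ceph-osd-3"
print(skewed_mons(line))
```

The returned names are the nodes whose time service should be checked first.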
The mon nodes have now been added. Next, simulate a failure of the mon on ceph-osd-1 and check whether the Ceph cluster still works. The cluster information at this point:
[root@ceph-osd-2 ~]# ceph -s
2014-11-14 09:27:28.582467 7f9cd8712700 0 -- :/1014338 >> 10.10.200.163:6789/0 pipe(0x7f9cd4024230 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cd40244c0).fault
    cluster 9d717e10-a708-482d-b91c-4bd21f4ae36c
     health HEALTH_WARN 256 pgs degraded; 256 pgs stuck unclean; 256 pgs undersized; 1/7 in osds are down; 1 mons down, quorum 1,2 ceph-osd-2,ceph-osd-3
     monmap e7: 3 mons at {ceph-osd-1=10.10.200.163:6789/0,ceph-osd-2=10.10.200.164:6789/0,ceph-osd-3=10.10.200.165:6789/0}, election epoch 88, quorum 1,2 ceph-osd-2,ceph-osd-3
     osdmap e263: 7 osds: 6 up, 7 in
      pgmap v542: 256 pgs, 1 pools, 0 bytes data, 0 objects
            36112 MB used, 11870 GB / 11905 GB avail
                 256 active+undersized+degraded
Because ceph-osd-1 hosts both a mon and an OSD, one OSD in the cluster is also reported as down.
As noted at the start of this post, a 3-monitor Ceph cluster tolerates only 1 mon being down. To see what happens when 2 mons are down, take down ceph-osd-2 as well.
Check the Ceph cluster status with ceph -s:
[root@ceph-osd-3 ~]# ceph -s
2014-11-14 09:30:23.483264 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f6778023290 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6778023520).fault
2014-11-14 09:30:26.483313 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c000e90).fault
2014-11-14 09:30:29.483664 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f676c0030e0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c003370).fault
2014-11-14 09:30:32.483904 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c003a00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c003c90).fault
2014-11-14 09:30:35.484221 7f677c28b700 0 -- :/1014680 >> 10.10.200.163:6789/0 pipe(0x7f676c0031b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c002570).fault
2014-11-14 09:30:38.484476 7f677c18a700 0 -- :/1014680 >> 10.10.200.164:6789/0 pipe(0x7f676c002a60 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f676c002cf0).fault
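As the output above shows, the Ceph cluster can no longer work: with only one of three monitors left there is no majority, and ceph -s prints nothing but connection faults to the two unreachable monitors. In a 3-node mon cluster, therefore, only 1 mon node may be down.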
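The two failure steps in this test can be replayed as a small model. This is a simplified Python sketch of the majority rule only, not of Ceph's actual Paxos implementation; `has_quorum` is a hypothetical helper.

```python
MONS = ["ceph-osd-1", "ceph-osd-2", "ceph-osd-3"]


def has_quorum(all_mons, down) -> bool:
    """True if the monitors still up form a strict majority of all monitors."""
    up = [m for m in all_mons if m not in down]
    return len(up) > len(all_mons) // 2


# Take the monitors down in the same order as in the test above.
down = set()
for failed in ["ceph-osd-1", "ceph-osd-2"]:
    down.add(failed)
    print(sorted(down), "quorum" if has_quorum(MONS, down) else "no quorum")
```

With one monitor down the model still reports a quorum; with two down it does not, matching the behaviour observed on the cluster.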