Current Ceph and CentOS versions:
[root@ceph1 ceph]# ceph -v
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
[root@ceph1 ceph]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

1. Config file content differs between nodes
Running ceph-deploy mon create-initial to gather the keys generates several key files in the current directory (in my case ~/etc/ceph/), but it fails with the error below. The error means that the config file on the two nodes that failed differs from the one on the current node, and it suggests using the --overwrite-conf option to overwrite the mismatched config files.
[root@ceph1 ceph]# ceph-deploy mon create-initial
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
...
Run the following command instead (I have three nodes here, ceph1~3):
[root@ceph1 ceph]# ceph-deploy --overwrite-conf mon create ceph{3,1,2}
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph2][DEBUG ] create the mon path if it does not exist
[ceph2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph2/done
...
After that the configuration succeeded, and I could go on to initialize the disks.
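The error in this section comes down to a content comparison between the local ceph.conf and the copy already present on each node. The following is only a rough sketch of such a check, assuming the remote copies have already been fetched as strings by some other means; it is not how ceph-deploy itself does the comparison, and the function names and sample contents are made up for illustration.

```python
import hashlib


def config_digest(text: str) -> str:
    """Return a SHA-256 hex digest of a config file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def nodes_with_different_conf(local_conf: str, remote_confs: dict) -> list:
    """List the nodes whose ceph.conf content differs from the local copy.

    remote_confs maps a node name to the text of that node's ceph.conf,
    fetched beforehand (for example with scp); fetching is out of scope here.
    """
    local_digest = config_digest(local_conf)
    return sorted(
        node
        for node, text in remote_confs.items()
        if config_digest(text) != local_digest
    )


# Made-up config contents, for illustration only.
local = "[global]\nfsid = example\nmon_initial_members = ceph1, ceph2, ceph3\n"
remote = {
    "ceph1": local,
    "ceph2": "[global]\nfsid = example\n",
    "ceph3": "[global]\nfsid = other\n",
}
print(nodes_with_different_conf(local, remote))
```

Nodes reported by a check like this are the ones where --overwrite-conf would replace the existing file, so it is worth looking at what differs before overwriting.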
2. too few PGs per OSD (21 < min 30) warning
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     32 active+clean

The cluster status above shows that the number of PGs per OSD is 21, below the minimum of 30. There are 32 PGs in total, and because I had configured 2 replicas, with 3 OSDs each OSD ends up with about 32 ÷ 3 × 2 = 21 PGs, which is below the minimum of 30 and triggers the warning above.
Storing and operating on data while the cluster is in this state may cause it to hang and stop responding to I/O, and may also take a large number of OSDs down.
Solution: increase the number of PGs.
Each of my pools has 8 PGs, so I need to add two more pools to bring the PG count per OSD to 48 ÷ 3 × 2 = 32, above the minimum of 30.
[root@ceph1 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph1 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 48 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     48 active+clean

The cluster health status now shows as normal.
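The arithmetic used in this section can be written down as a small helper. This is just a sketch of the calculation described above (total PGs × replicas ÷ OSDs, rounded down); the function name is my own, and the threshold of 30 is simply the value shown in the warning text.

```python
def pgs_per_osd(total_pgs: int, replicas: int, num_osds: int) -> int:
    """Average number of PG copies placed on each OSD, rounded down."""
    return total_pgs * replicas // num_osds


MIN_PGS_PER_OSD = 30  # the "min 30" value from the warning message

# 4 pools and 6 pools of 8 PGs each, 2 replicas, 3 OSDs (the setup in this post).
for pool_count in (4, 6):
    total = pool_count * 8
    per_osd = pgs_per_osd(total, replicas=2, num_osds=3)
    status = "ok" if per_osd >= MIN_PGS_PER_OSD else "too few"
    print(f"{pool_count} pools, {total} pgs -> {per_osd} per OSD ({status})")
```

Adding pools is what I did here; raising pg_num on an existing pool with ceph osd pool set <pool-name> pg_num <n> is another way to get above the threshold.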
3. Cluster status is HEALTH_WARN application not enabled on 1 pool(s)
If at this point the cluster status shows HEALTH_WARN application not enabled on 1 pool(s):
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
[root@ceph1 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
Running ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since the instance I added earlier was RGW, I tag mytest with rgw as the hint suggests:
[root@ceph1 ceph]# ceph osd pool application enable mytest rgw
enabled application 'rgw' on pool 'mytest'

Checking the cluster status again shows it is back to normal:
[root@ceph1 ceph]# ceph health
HEALTH_OK
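When several pools are affected, picking the pool names out of the ceph health detail text by hand gets tedious. Here is a minimal sketch that extracts them and prints the matching enable commands; the function names are hypothetical, the sample text is copied from the output above, and it only builds command strings without running anything.

```python
import re


def pools_without_application(health_detail: str) -> list:
    """Return pool names from "application not enabled on pool '<name>'" lines."""
    pattern = re.compile(r"application not enabled on pool '([^']+)'")
    return pattern.findall(health_detail)


def enable_commands(pools: list, app: str) -> list:
    """Build one 'ceph osd pool application enable' command per pool."""
    return [f"ceph osd pool application enable {pool} {app}" for pool in pools]


# Sample text taken from the 'ceph health detail' output in this section.
sample = """\
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
"""
pools = pools_without_application(sample)
for command in enable_commands(pools, "rgw"):
    print(command)
```

Which application name to pass (cephfs, rbd, rgw, or a custom one) still depends on what the pool is actually used for.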

4. Error when deleting a pool
Taking the mytest pool as an example: running ceph osd pool rm mytest fails, saying that the pool name must be written a second time after the pool name in the original command, followed by the --yes-i-really-really-mean-it option.
[root@ceph1 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
Repeating the pool name and adding the option as instructed still fails:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool

The error says that pool deletion is disabled: before deleting, the mon_allow_pool_delete option must first be added to the ceph.conf config file and set to true. So I logged in to every node and edited its config file, as follows:
[root@ceph1 ceph]# vi ceph.conf
[root@ceph1 ceph]# systemctl restart ceph-mon.target

Add the following parameter at the bottom of ceph.conf and set it to true; after saving and exiting, restart the service with systemctl restart ceph-mon.target.
[mon]
mon allow pool delete = true

Do the same on the other nodes.
[root@ceph2 ceph]# vi ceph.conf
[root@ceph2 ceph]# systemctl restart ceph-mon.target
[root@ceph3 ceph]# vi ceph.conf
[root@ceph3 ceph]# systemctl restart ceph-mon.target

Deleting again now succeeds and the mytest pool is removed.
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed
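Editing ceph.conf by hand on every node is easy to get slightly wrong, so the edit described in this section can also be scripted. The sketch below uses Python's standard configparser on the file content as a string; the function name is my own, it assumes the file is plain INI with = as the delimiter, and note that configparser drops comments when it writes the file back.

```python
import configparser
import io

OPTION = "mon allow pool delete"


def allow_pool_delete(conf_text: str) -> str:
    """Return conf_text with 'mon allow pool delete = true' set under [mon]."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.read_string(conf_text)
    if not parser.has_section("mon"):
        parser.add_section("mon")
    # Drop other spellings (underscores or dashes) so the option appears once.
    for key in list(parser["mon"]):
        if key.replace("_", " ").replace("-", " ") == OPTION:
            parser.remove_option("mon", key)
    parser.set("mon", OPTION, "true")
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Minimal made-up ceph.conf content, for illustration only.
sample = "[global]\nfsid = 13430f9a-ce0d-4d17-a215-272890f47f28\n"
print(allow_pool_delete(sample))
```

The monitors still need to be restarted afterwards with systemctl restart ceph-mon.target, as in the steps above.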
5. Troubleshooting node recovery after cluster nodes went down
I shut down and restarted each of the three nodes in the Ceph cluster, then checked the cluster status:
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            324/702 objects misplaced (46.154%)
            Reduced data availability: 126 pgs inactive
            Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:creating}
    osd: 3 osds: 3 up, 3 in; 162 remapped pgs

  data:
    pools:   8 pools, 288 pgs
    objects: 234 objects, 2.8 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     43.750% pgs not active
             144/702 objects degraded (20.513%)
             324/702 objects misplaced (46.154%)
             162 active+clean+remapped
             123 undersized+peered
             3   undersized+degraded+peered

Check the details:
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph1(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
    pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
    pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
    pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
    pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
    pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
    ....
    pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
    pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
    pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
    pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
    pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
    ....

So during data recovery some PGs are stuck inactive and undersized, which is not normal.
Solution:
① Handling the inactive PGs:
Restarting the OSD service is enough.
[root@ceph1 ~]# systemctl restart ceph-osd.target
Checking the cluster status again shows that the inactive PGs have recovered; only the undersized PGs are left.
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 filesystem is degraded
            241/723 objects misplaced (33.333%)
            Degraded data redundancy: 59 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:rejoin}
    osd: 3 osds: 3 up, 3 in; 229 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.4 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     241/723 objects misplaced (33.333%)
             224 active+clean+remapped
             59  active+undersized
             5   active+clean

  io:
    client:   1.2 KiB/s rd, 1 op/s rd, 0 op/s wr

② Handling the undersized PGs:

Make a habit of checking the health details first when something goes wrong. A close look shows that although the configured replica count is 3, each PG 12.x has only two copies, stored on two of OSDs 0~2.
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
    pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
    pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
    pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
    pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]

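The observation above (replica count 3, but only two OSDs in each acting set) can be checked mechanically instead of by eye. Below is a rough sketch that parses the pg lines of ceph health detail and reports every PG whose acting set is smaller than the pool size. The regular expression is written against the line format shown in this post, and the function name is my own, so treat it as an assumption rather than a stable interface.

```python
import re

# Matches lines such as:
#   pg 12.8 is stuck undersized for 1910.001993, current state
#   active+undersized, last acting [2,0]
PG_LINE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<stuck>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[^,\s]+), last acting \[(?P<acting>[\d,]*)\]"
)


def short_acting_sets(health_detail: str, pool_size: int) -> dict:
    """Map PG id -> acting OSD ids for PGs with fewer than pool_size copies."""
    result = {}
    for match in PG_LINE.finditer(health_detail):
        acting = [int(osd) for osd in match.group("acting").split(",") if osd]
        if len(acting) < pool_size:
            result[match.group("pgid")] = acting
    return result


# A few lines copied from the 'ceph health detail' output above.
sample = """\
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
"""
for pgid, acting in short_acting_sets(sample, pool_size=3).items():
    print(f"pg {pgid}: {len(acting)} of 3 copies, on OSDs {acting}")
```

Every PG listed this way has one copy missing, which points at a placement problem rather than at a single broken OSD.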
Looking further at the cluster's OSD tree shows that after ceph2 and ceph3 went down and came back, osd.1 and osd.2 are no longer listed under ceph2 and ceph3; both now sit under a host named centos7evcloud.
[root@ceph1 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9       0.16159     host centos7evcloud
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
 2   hdd 0.08080         osd.2               up  1.00000 1.00000
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5             0     host ceph2
-7             0     host ceph3

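The same reading of the OSD tree can be done with a few lines of code. This is a minimal sketch that groups OSDs by host from the plain-text ceph osd tree output and lists hosts left without any OSD; the function name is hypothetical and the parsing assumes the column layout shown above. In practice the JSON form (ceph osd tree -f json) would be a sturdier input than splitting text columns.

```python
def osds_by_host(osd_tree: str) -> dict:
    """Group OSD names under the host bucket they are listed beneath."""
    hosts = {}
    current = None
    for line in osd_tree.splitlines():
        tokens = line.split()
        if "host" in tokens:
            index = tokens.index("host")
            if index + 1 < len(tokens):
                current = tokens[index + 1]
                hosts[current] = []
        elif current is not None:
            hosts[current].extend(t for t in tokens if t.startswith("osd."))
    return hosts


# Copied from the 'ceph osd tree' output above.
sample = """\
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9       0.16159     host centos7evcloud
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
 2   hdd 0.08080         osd.2               up  1.00000 1.00000
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5             0     host ceph2
-7             0     host ceph3
"""
layout = osds_by_host(sample)
empty_hosts = [host for host, osds in layout.items() if not osds]
print(layout)
print("hosts without OSDs:", empty_hosts)
```

With both osd.1 and osd.2 under a single host, a pool that wants three copies on three different hosts can only place two, which matches the two-OSD acting sets seen earlier.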
Check the status of the osd.1 and osd.2 services on each node.
Solution:
Log in to ceph2 and ceph3 and restart the osd.1 and osd.2 services respectively, so that the two services are mapped back onto ceph2 and ceph3.
[root@ceph1 ~]# ssh ceph2
[root@ceph2 ~]# systemctl restart ceph-osd@1.service
[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@2.service

Finally, the OSD tree shows that the two services are mapped back onto ceph2 and ceph3.
[root@ceph3 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9             0     host centos7evcloud
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5       0.08080     host ceph2
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
-7       0.08080     host ceph3
 2   hdd 0.08080         osd.2               up  1.00000 1.00000
The cluster status also shows the long-awaited HEALTH_OK.
[root@ceph3 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.6 KiB
    usage:   3.1 GiB used, 245 GiB / 248 GiB avail
    pgs:     288 active+clean

6. Error when remounting CephFS after unmounting
The mount command is:
mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

Remounting CephFS after unmounting it fails with: mount error(2): No such file or directory
Note: first check that the /mnt/mycephfs/ directory exists and is accessible. Mine existed, yet I still got No such file or directory. Restarting the OSD service unexpectedly fixed it, and CephFS could be mounted normally again.
[root@ceph1 ~]# systemctl restart ceph-osd.target
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

The mount succeeded:
[root@ceph1 ~]# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/vda2                                              48G  7.5G   41G  16% /
devtmpfs                                              1.9G     0  1.9G   0% /dev
tmpfs                                                 2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                                 2.0G   17M  2.0G   1% /run
tmpfs                                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                                 2.0G   24K  2.0G   1% /var/lib/ceph/osd/ceph-0
tmpfs                                                 396M     0  396M   0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/  249G  3.1G  246G   2% /mnt/mycephfs

More to come...
=========================================================================
Summary:
When the cluster status shows an error or warning, the ceph health detail command usually shows the fix suggested by the system. Following these suggestions is generally enough to resolve most problems the cluster runs into.