|
|
0. Current Ceph and CentOS versions:
[root@ceph1 ceph]# ceph -v
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
[root@ceph1 ceph]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

1. Configuration file content differs between nodes
Running ceph-deploy mon create-initial to gather the keys should generate several keys in the current directory (in my case ~/etc/ceph/), but it fails with the error below. The message means that the configuration file on the two nodes that failed differs from the one on the current node, and it suggests using the --overwrite-conf option to overwrite the mismatched files.
[root@ceph1 ceph]# ceph-deploy mon create-initial
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
...

Run the following command (I have three nodes here, ceph1 to ceph3):
[root@ceph1 ceph]# ceph-deploy --overwrite-conf mon create ceph{3,1,2}
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph2][DEBUG ] create the mon path if it does not exist
[ceph2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph2/done
...

The configuration then succeeds, and you can go on to initialize the disks.

2. too few PGs per OSD (21 < min 30) warning
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     32 active+clean

The status above shows that the number of PGs per OSD is 21, below the minimum of 30. The cluster has 32 PGs in total, and because I configured 2 replicas, with 3 OSDs each OSD holds on average 32 ÷ 3 × 2 ≈ 21 PGs, which is below the minimum of 30 and triggers the warning above.
If you store and operate on data while the cluster is in this state, the cluster may hang and stop responding to I/O, and large numbers of OSDs may go down.
Solution: increase the number of PGs
Each of my pools has 8 PGs, so I need to add two more pools to bring the number of PGs per OSD to 48 ÷ 3 × 2 = 32, above the minimum of 30.
[root@ceph1 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph1 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 48 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     48 active+clean

The cluster health is now reported as OK.
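The per-OSD arithmetic used in this section can be checked with a small shell helper. This is only a minimal sketch: pgs_per_osd is a hypothetical function name (not a Ceph command), it assumes every pool uses the same replica count, and it relies on integer division, which rounds down like the 21 in the warning.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): average number of PG replicas
# per OSD, assuming all pools share the same replica count.
# Usage: pgs_per_osd TOTAL_PGS REPLICAS OSDS
pgs_per_osd() {
    total_pgs=$1
    replicas=$2
    osds=$3
    # Integer division rounds down, like the "21" reported by ceph -s.
    echo $(( total_pgs * replicas / osds ))
}

pgs_per_osd 32 2 3   # 4 pools x 8 PGs, before adding the two pools
pgs_per_osd 48 2 3   # 6 pools x 8 PGs, after adding the two pools
```

With the numbers from this section, the first call gives the value below the minimum of 30 and the second call gives the value above it.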
3. Cluster status is HEALTH_WARN application not enabled on 1 pool(s)
At this point the cluster status may show HEALTH_WARN application not enabled on 1 pool(s):
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

[root@ceph1 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since the instance I added earlier is an RGW, tag mytest with rgw as the hint suggests:
[root@ceph1 ceph]# ceph osd pool application enable mytest rgw
enabled application 'rgw' on pool 'mytest'
Checking the cluster status again shows that it is back to normal:
[root@ceph1 ceph]# ceph health
HEALTH_OK

4. Error when deleting a pool
Taking the mytest pool as an example: running ceph osd pool rm mytest fails, and the message says that the pool name must be written a second time after the first, followed by the --yes-i-really-really-mean-it option.
[root@ceph1 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

Repeating the pool name and adding the option as instructed still fails:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool

The error says that pool deletion is disabled and that the mon_allow_pool_delete option must be set to true first. To do that, add the option to the ceph.conf configuration file before deleting: log in to each node in turn and edit its configuration file, as follows:
[root@ceph1 ceph]# vi ceph.conf
[root@ceph1 ceph]# systemctl restart ceph-mon.target

Add the following parameter at the bottom of ceph.conf, set to true, save and exit, then restart the service with systemctl restart ceph-mon.target.
[mon]
mon allow pool delete = true

Do the same on the remaining nodes.
[root@ceph2 ceph]# vi ceph.conf
[root@ceph2 ceph]# systemctl restart ceph-mon.target
[root@ceph3 ceph]# vi ceph.conf
[root@ceph3 ceph]# systemctl restart ceph-mon.target

Deleting again now succeeds, and the mytest pool is removed.
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed
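The ceph.conf edit in this section could also be scripted instead of done by hand in vi. The following is a minimal sketch, not the author's procedure: enable_pool_delete is a hypothetical helper name, and it assumes the file has no existing [mon] section that the new line should be merged into. The monitor restart shown above is still needed afterwards.

```shell
#!/bin/sh
# Sketch: add "mon allow pool delete = true" to a ceph.conf-style file
# unless the option is already present (spelled with spaces or underscores).
# Assumption: there is no existing [mon] section to merge into.
enable_pool_delete() {
    conf=$1
    if grep -Eq '^[[:space:]]*mon[ _]allow[ _]pool[ _]delete' "$conf"; then
        echo "option already present in $conf"
        return 0
    fi
    printf '\n[mon]\nmon allow pool delete = true\n' >> "$conf"
    echo "updated $conf"
}

# Example on a scratch copy (hypothetical path), not the live config:
# cp /etc/ceph/ceph.conf /tmp/ceph.conf.test
# enable_pool_delete /tmp/ceph.conf.test
```

Running it twice on the same file leaves a single copy of the option, which makes it safe to repeat on every node.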
5. Troubleshooting node recovery after cluster nodes go down
After I shut down and restarted each of the three nodes in the Ceph cluster, the cluster status was as follows:
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            324/702 objects misplaced (46.154%)
            Reduced data availability: 126 pgs inactive
            Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:creating}
    osd: 3 osds: 3 up, 3 in; 162 remapped pgs

  data:
    pools:   8 pools, 288 pgs
    objects: 234 objects, 2.8 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     43.750% pgs not active
             144/702 objects degraded (20.513%)
             324/702 objects misplaced (46.154%)
             162 active+clean+remapped
             123 undersized+peered
             3   undersized+degraded+peered

Check the health details:
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph1(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
    pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
    pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
    pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
    pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
    pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
....
    pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
    pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
    pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
    pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
    pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
....

So during data recovery, PGs show up in the inactive and undersized states, which is abnormal.
Solution:
① Handle the inactive PGs:
Restarting the OSD service is enough:
[root@ceph1 ~]# systemctl restart ceph-osd.target

Checking the cluster status again shows that the inactive PGs are back to normal; only the undersized PGs remain.
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 filesystem is degraded
            241/723 objects misplaced (33.333%)
            Degraded data redundancy: 59 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:rejoin}
    osd: 3 osds: 3 up, 3 in; 229 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.4 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     241/723 objects misplaced (33.333%)
             224 active+clean+remapped
             59  active+undersized
             5   active+clean

  io:
    client:   1.2 KiB/s rd, 1 op/s rd, 0 op/s wr

② Handle the undersized PGs:

When something goes wrong, get into the habit of checking the health details first. A close look shows that although the replica count is set to 3, the PGs 12.x have only two copies each, stored on two of OSDs 0 to 2.
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
    pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
    pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
    pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
    pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]
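Counting the OSDs in each "last acting" set by eye gets tedious with 59 PGs. The following is a rough sketch of how that check might be automated: undersized_pgs is a hypothetical helper, the expected replica count is passed in by hand rather than queried from the cluster, and the parsing assumes the ceph health detail line format shown above.

```shell
#!/bin/sh
# Sketch: read `ceph health detail` text on stdin and print every PG that is
# stuck undersized and whose acting set holds fewer OSDs than expected.
# Usage: undersized_pgs EXPECTED_REPLICAS
undersized_pgs() {
    expected=$1
    awk -v want="$expected" '
        /is stuck undersized/ {
            pgid = $2
            acting = $NF                 # last field, e.g. "[2,0]"
            gsub(/\[/, "", acting)       # strip the brackets -> "2,0"
            gsub(/\]/, "", acting)
            n = split(acting, osds, ",") # number of OSDs in the acting set
            if (n < want)
                printf "pg %s has %d of %d copies (osds %s)\n", pgid, n, want, acting
        }'
}

# Usage against a live cluster (not run here):
# ceph health detail | undersized_pgs 3
```

Lines about other stuck states, such as "is stuck inactive", are ignored, so the output lists only the PGs that are short of copies.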

Looking further at the OSD tree shows that after ceph2 and ceph3 went down and came back, the osd.1 and osd.2 daemons are no longer placed under ceph2 and ceph3.
[root@ceph1 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9       0.16159     host centos7evcloud
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
 2   hdd 0.08080         osd.2               up  1.00000 1.00000
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5             0     host ceph2
-7             0     host ceph3
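To make a misplaced OSD stand out, the tree can be reduced to one "OSD, host" pair per line. This is a sketch under assumptions: osd_hosts is a hypothetical helper, and it relies on the plain-text column layout of ceph osd tree as shown above, with a CLASS value present on every OSD row.

```shell
#!/bin/sh
# Sketch: read `ceph osd tree` text on stdin and print "osd.N host" pairs,
# so an OSD sitting under an unexpected host bucket is easy to spot.
osd_hosts() {
    awk '
        # Host bucket row, e.g. "-9 0.16159 host centos7evcloud"
        $3 == "host" { host = $4; next }
        # OSD row, e.g. "1 hdd 0.08080 osd.1 up 1.00000 1.00000"
        $4 ~ /^osd\.[0-9]+$/ { print $4, host }
    '
}

# Usage against a live cluster (not run here):
# ceph osd tree | osd_hosts
```

For the tree above, osd.1 and osd.2 would be reported under centos7evcloud rather than under ceph2 and ceph3.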

Check the status of the osd.1 and osd.2 services in turn.
Solution:
Log in to ceph2 and ceph3 and restart the osd.1 and osd.2 services respectively, which maps the two services back onto ceph2 and ceph3.
[root@ceph1 ~]# ssh ceph2
[root@ceph2 ~]# systemctl restart ceph-osd@1.service
[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@2.service

Finally, the OSD tree shows that both services are mapped back onto ceph2 and ceph3.
[root@ceph3 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9             0     host centos7evcloud
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5       0.08080     host ceph2
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
-7       0.08080     host ceph3
 2   hdd 0.08080         osd.2               up  1.00000 1.00000

The cluster status also shows the long-awaited HEALTH_OK.
[root@ceph3 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.6 KiB
    usage:   3.1 GiB used, 245 GiB / 248 GiB avail
    pgs:     288 active+clean

6. Error when remounting CephFS after unmounting
The mount command is:
mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==
Remounting CephFS after unmounting it fails with: mount error(2): No such file or directory
Note: first check that the /mnt/mycephfs/ directory exists and is accessible. Mine existed, yet the No such file or directory error persisted. Restarting the OSD service unexpectedly fixed it, and CephFS could be mounted normally again.
[root@ceph1 ~]# systemctl restart ceph-osd.target
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

The mount succeeded:
[root@ceph1 ~]# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/vda2                                              48G  7.5G   41G  16% /
devtmpfs                                              1.9G     0  1.9G   0% /dev
tmpfs                                                 2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                                 2.0G   17M  2.0G   1% /run
tmpfs                                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                                 2.0G   24K  2.0G   1% /var/lib/ceph/osd/ceph-0
tmpfs                                                 396M     0  396M   0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/  249G  3.1G  246G   2% /mnt/mycephfs
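The first step mentioned in the note above, confirming that the mount point exists and is accessible, could be scripted roughly as follows. This is a sketch only: check_mountpoint is a hypothetical helper, and it covers just the directory check, not the OSD-side problem that the restart resolved.

```shell
#!/bin/sh
# Sketch: report whether a mount point exists as a directory and is
# readable and searchable by the current user.
check_mountpoint() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Usage (path from the article):
# check_mountpoint /mnt/mycephfs
```

If this reports "ok" and the mount still fails with error 2, the cause lies elsewhere, as it did in this case.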

To be continued...
=========================================================================
Summary:
When the cluster status shows an error or warning, the ceph health detail command usually shows the handling advice given by the system. Following that advice is generally enough to resolve most cluster problems.
|
|