0. Current Ceph and CentOS versions:
[root@ceph1 ceph]# ceph -v
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
[root@ceph1 ceph]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

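1. Error: configuration file contents differ between nodes
Running ceph-deploy mon create-initial to gather the keys should generate several keyrings in the current directory (in my case ~/etc/ceph/), but instead it failed with the error below. It means that the configuration files on the two nodes that failed differ from the one on the current node, and it suggests using the --overwrite-conf option to overwrite the mismatched files.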
[root@ceph1 ceph]# ceph-deploy mon create-initial
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
...

Run the following command instead (I configured three nodes in total, ceph1 to ceph3):
[root@ceph1 ceph]# ceph-deploy --overwrite-conf mon create ceph{3,1,2}
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph2][DEBUG ] create the mon path if it does not exist
[ceph2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph2/done
...

After that the monitors are created successfully, and you can go on to initialize the disks.
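Before reaching for --overwrite-conf, it can be worth confirming which nodes really hold a different ceph.conf, so that a deliberate change on one node is not silently overwritten. The sketch below assumes you have already copied each node's /etc/ceph/ceph.conf to a local file (for example with scp); find_stale_confs is a hypothetical helper name, not part of ceph-deploy. Once the stale nodes are known, `ceph-deploy --overwrite-conf config push ceph2 ceph3` is one way to push the admin node's copy to them.

```bash
# Hypothetical helper: print every file whose contents differ from the reference file.
# Usage: find_stale_confs REFERENCE_FILE OTHER_FILE...
find_stale_confs() {
    ref=$1
    shift
    for f in "$@"; do
        # cmp -s prints nothing and exits non-zero when the files differ (or one is missing).
        if ! cmp -s "$ref" "$f"; then
            echo "$f"
        fi
    done
}

# Assumed workflow, not taken from the original steps:
#   scp ceph2:/etc/ceph/ceph.conf /tmp/ceph2.conf
#   scp ceph3:/etc/ceph/ceph.conf /tmp/ceph3.conf
#   find_stale_confs ./ceph.conf /tmp/ceph2.conf /tmp/ceph3.conf
```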
2. Warning: too few PGs per OSD (21 < min 30)
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     32 active+clean

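The status above shows 21 PGs per OSD, below the minimum of 30. The cluster has 32 PGs, and because I had configured 2 replicas, the 3 OSDs each carry on average 32 × 2 ÷ 3 ≈ 21 PGs, which triggers the warning.
Storing and operating on data while the cluster is in this state can make it hang and stop responding to I/O, and can also take a large number of OSDs down.
Solution: increase the number of PGs.
Each of my pools has 8 PGs, so I need to add two more pools to raise the per-OSD count to 48 × 2 ÷ 3 = 32, above the minimum of 30.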
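The arithmetic above can be written as: average PGs per OSD = total PGs × replicas ÷ number of OSDs. The sketch below uses shell integer division, which truncates and so matches the 21 reported by Ceph; pools_needed is a hypothetical helper that keeps adding pools of a fixed pg_num until the average reaches the threshold. The threshold of 30 comes from the warning text (in Mimic it corresponds to the mon_pg_warn_min_per_osd option).

```bash
# Average number of PG copies per OSD, using integer division (truncates):
#   total_pgs * replicas / osds
# Usage: pgs_per_osd TOTAL_PGS REPLICAS OSDS
pgs_per_osd() {
    echo $(( $1 * $2 / $3 ))
}

# Hypothetical helper: how many extra pools of PER_POOL PGs must be created
# before the per-OSD average reaches MIN.
# Usage: pools_needed TOTAL_PGS REPLICAS OSDS MIN PER_POOL
pools_needed() {
    pgs=$1 size=$2 osds=$3 min=$4 per_pool=$5
    # Guard against an endless loop when no PGs would be added.
    [ "$per_pool" -gt 0 ] || return 1
    n=0
    while [ $(( pgs * size / osds )) -lt "$min" ]; do
        pgs=$(( pgs + per_pool ))
        n=$(( n + 1 ))
    done
    echo "$n"
}

# With the numbers from this cluster:
#   pgs_per_osd 32 2 3        prints 21 (below the minimum of 30)
#   pgs_per_osd 48 2 3        prints 32 (after adding two 8-PG pools)
#   pools_needed 32 2 3 30 8  prints 2
```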
[root@ceph1 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph1 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 48 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     48 active+clean

The cluster health is back to normal.
3. Cluster status HEALTH_WARN: application not enabled on 1 pool(s)
Suppose the cluster status now reports HEALTH_WARN application not enabled on 1 pool(s):
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

[root@ceph1 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

Running ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since I had previously added an RGW instance, I follow the hint and tag mytest with rgw:
[root@ceph1 ceph]# ceph osd pool application enable mytest rgw% B! A0 R! C4 T
enabled application 'rgw' on pool 'mytest'

Checking the cluster status again shows it is back to normal:
[root@ceph1 ceph]# ceph health
HEALTH_OK
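When several pools are untagged, the pool names can be pulled out of the health detail text instead of being typed by hand. The sketch below reads `ceph health detail` output on stdin and only prints the matching enable commands, it does not run them. The function name is hypothetical, and defaulting to rgw simply mirrors this cluster; choose cephfs, rbd, rgw or a custom name to match what each pool is actually used for.

```bash
# Hypothetical helper: read `ceph health detail` output on stdin and print one
# `ceph osd pool application enable` command per untagged pool (dry run).
# Usage: ceph health detail | app_enable_cmds [APP]   (APP defaults to rgw)
app_enable_cmds() {
    app=${1:-rgw}
    # Keep only the pool name from lines like: application not enabled on pool 'mytest'
    sed -n "s/.*application not enabled on pool '\([^']*\)'.*/\1/p" |
    while read -r pool; do
        echo "ceph osd pool application enable $pool $app"
    done
}
```

Printing rather than executing leaves room to review the list before piping it to `sh`.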

4. Error when deleting a pool
Taking the mytest pool as an example, running ceph osd pool rm mytest fails. The message says the pool name must be given twice, followed by the --yes-i-really-really-mean-it flag:
[root@ceph1 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

Repeating the pool name and adding the flag as instructed still fails:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool

The error says that pool deletion is disabled and that the mon_allow_pool_delete option must first be set to true in ceph.conf before a pool can be deleted. So log in to each node and edit its configuration file as follows:
[root@ceph1 ceph]# vi ceph.conf
[root@ceph1 ceph]# systemctl restart ceph-mon.target

Add the following lines at the bottom of ceph.conf, save and exit, then restart the monitors with systemctl restart ceph-mon.target:
[mon]
mon allow pool delete = true

Do the same on the other nodes:
[root@ceph2 ceph]# vi ceph.conf
[root@ceph2 ceph]# systemctl restart ceph-mon.target
[root@ceph3 ceph]# vi ceph.conf
[root@ceph3 ceph]# systemctl restart ceph-mon.target
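Editing ceph.conf by hand on every node is easy to get slightly wrong (a duplicate [mon] section, or a leftover "= false"). The sketch below is a hypothetical helper that sets `mon allow pool delete = true` in a given file idempotently: it overwrites an existing setting, adds the line under an existing [mon] header, or appends a new [mon] section as done by hand above. It assumes GNU sed (`-i` and the one-line `a` command), which is what CentOS 7 ships.

```bash
# Hypothetical helper: make sure a ceph.conf file enables pool deletion.
# Usage: enable_pool_delete /etc/ceph/ceph.conf
enable_pool_delete() {
    conf=$1
    # Ceph treats spaces and underscores in option names alike, so match both.
    key='mon[ _]allow[ _]pool[ _]delete[[:space:]]*='
    if grep -Eq "^[[:space:]]*$key" "$conf"; then
        # The option is already present: force its value to true in place.
        sed -i -E "s/^[[:space:]]*$key.*/mon allow pool delete = true/" "$conf"
    elif grep -q '^\[mon\]' "$conf"; then
        # A [mon] section exists: add the option directly under its header.
        sed -i '/^\[mon\]/a mon allow pool delete = true' "$conf"
    else
        # No [mon] section yet: append one at the bottom of the file.
        printf '\n[mon]\nmon allow pool delete = true\n' >> "$conf"
    fi
}

# Assumed follow-up on each monitor node (not performed by the function):
#   enable_pool_delete /etc/ceph/ceph.conf && systemctl restart ceph-mon.target
```

As an aside not taken from the original steps, the option can usually also be switched on at runtime with `ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'`; a value injected that way is lost when the monitors restart, so the ceph.conf change is what makes it permanent.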

Deleting the pool again now succeeds:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed
5. Troubleshooting nodes after cluster nodes go down and come back
I shut down and restarted each of the three nodes in the Ceph cluster, and afterwards the cluster status looked like this:
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            324/702 objects misplaced (46.154%)
            Reduced data availability: 126 pgs inactive
            Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up  {0=ceph1=up:creating}
    osd: 3 osds: 3 up, 3 in; 162 remapped pgs

  data:
    pools:   8 pools, 288 pgs
    objects: 234 objects, 2.8 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     43.750% pgs not active
             144/702 objects degraded (20.513%)
             324/702 objects misplaced (46.154%)
             162 active+clean+remapped
             123 undersized+peered
             3   undersized+degraded+peered

Check the health details:
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph1(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
    pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
    pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
    pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
    pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
    pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
....
    pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
    pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
    pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
    pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
    pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
....

This shows that while the data is being recovered, some PGs are stuck inactive and undersized, which is not normal.
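With hundreds of PGs the health detail output gets long. The sketch below is a hypothetical helper that reads `ceph health detail` on stdin and counts the PGs reported as "stuck STATE", grouped by their last acting set; on the output above it would make it obvious that every inactive PG is served by osd.0 alone. It assumes the `pg X is stuck STATE for ..., last acting [...]` line format shown above.

```bash
# Hypothetical helper: count PGs reported as "stuck STATE", grouped by acting set.
# Usage: ceph health detail | stuck_by_acting inactive
stuck_by_acting() {
    awk -v s="$1" '
        $1 == "pg" && $4 == "stuck" && $5 == s {
            # The last field is the acting set, e.g. [0] or [2,0].
            count[$NF]++
        }
        END { for (a in count) print a, count[a] }
    ' | LC_ALL=C sort
}
```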
Solution:
① Handling the inactive PGs:
Simply restart the OSD service:
[root@ceph1 ~]# systemctl restart ceph-osd.target

Checking the cluster status again shows that the inactive PGs have recovered; only the undersized PGs remain:
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 filesystem is degraded
            241/723 objects misplaced (33.333%)
            Degraded data redundancy: 59 pgs undersized

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up  {0=ceph1=up:rejoin}
    osd: 3 osds: 3 up, 3 in; 229 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.4 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     241/723 objects misplaced (33.333%)
             224 active+clean+remapped
             59  active+undersized
             5   active+clean

  io:
    client:   1.2 KiB/s rd, 1 op/s rd, 0 op/s wr

② Handling the undersized PGs:

Make a habit of checking the health details first when something goes wrong. A closer look shows that although the replica count is set to 3, the PGs 12.x have only two copies each, stored on two of OSDs 0 to 2:
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
    pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
    pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
    pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
    pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]

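The same health detail lines also show how many copies each PG currently has. The sketch below is a hypothetical helper (same line-format assumption as above) that lists the stuck PGs whose acting set holds fewer OSDs than the replica count you pass in; 3 matches the replica count mentioned above, and `ceph osd pool get <pool> size` reports the actual value for a given pool.

```bash
# Hypothetical helper: print "PGID ACTING" for each stuck PG whose acting set
# contains fewer OSDs than WANT.
# Usage: ceph health detail | short_acting_sets 3
short_acting_sets() {
    awk -v want="$1" '
        $1 == "pg" && $4 == "stuck" {
            acting = $NF
            gsub(/\[|\]/, "", acting)          # [2,0] -> 2,0
            n = split(acting, osds, ",")
            # Print each PG once, even if it is listed as both inactive and undersized.
            if (n < want && !seen[$2]++) print $2, $NF
        }
    '
}
```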
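Looking further at the OSD tree shows that after ceph2 and ceph3 went down and came back, osd.1 and osd.2 are no longer under the ceph2 and ceph3 hosts; both now sit under a host named centos7evcloud: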
[root@ceph1 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9       0.16159     host centos7evcloud
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
 2   hdd 0.08080         osd.2               up  1.00000 1.00000
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5             0     host ceph2
-7             0     host ceph3

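An OSD under the wrong host is easier to spot in a flat "OSD host" listing. The sketch below is a hypothetical helper that reads `ceph osd tree` text on stdin and prints each OSD with the host bucket it sits under; it looks for the `host NAME` and `osd.N` tokens rather than fixed columns. A plausible explanation for the tree above, not confirmed by the original output, is that ceph2 and ceph3 came back up with the hostname centos7evcloud: with osd_crush_update_on_start at its default of true, an OSD moves itself in the CRUSH map to the host named after the machine's short hostname when it starts, so `hostname -s` on those nodes is worth checking before restarting the OSDs.

```bash
# Hypothetical helper: print "osd.N HOST" for every OSD in `ceph osd tree` output.
# Usage: ceph osd tree | osd_hosts
osd_hosts() {
    awk '
        {
            # A host bucket line: remember its name for the OSD lines that follow.
            for (i = 1; i < NF; i++)
                if ($i == "host") { host = $(i + 1); next }
            # An OSD line: print the OSD together with the most recent host bucket.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^osd\.[0-9]+$/) { print $i, host; next }
        }
    '
}
```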
Check the status of the osd.1 and osd.2 services on their nodes.

Solution:
Log in to ceph2 and ceph3 and restart osd.1 and osd.2 respectively, so that they are mapped back under the ceph2 and ceph3 hosts:
[root@ceph1 ~]# ssh ceph2
[root@ceph2 ~]# systemctl restart ceph-osd@1.service
[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@2.service

Finally, the OSD tree shows that the two OSDs are back under ceph2 and ceph3:
[root@ceph3 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9             0     host centos7evcloud
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0               up  1.00000 1.00000
-5       0.08080     host ceph2
 1   hdd 0.08080         osd.1               up  1.00000 1.00000
-7       0.08080     host ceph3
 2   hdd 0.08080         osd.2               up  1.00000 1.00000

The cluster status finally shows the long-awaited HEALTH_OK:
[root@ceph3 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up  {0=ceph1=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.6 KiB
    usage:   3.1 GiB used, 245 GiB / 248 GiB avail
    pgs:     288 active+clean

6. Error when remounting CephFS after unmounting it
The mount command is:
mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==
After unmounting CephFS, mounting it again fails with: mount error(2): No such file or directory
Note: first check that the /mnt/mycephfs/ directory exists and is accessible. Mine did exist, yet the mount still failed with No such file or directory. Restarting the OSD service unexpectedly fixed it, and CephFS could then be mounted normally:
[root@ceph1 ~]# systemctl restart ceph-osd.target
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

The mount succeeded:
[root@ceph1 ~]# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/vda2                                              48G  7.5G   41G  16% /
devtmpfs                                              1.9G     0  1.9G   0% /dev
tmpfs                                                 2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                                 2.0G   17M  2.0G   1% /run
tmpfs                                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                                 2.0G   24K  2.0G   1% /var/lib/ceph/osd/ceph-0
tmpfs                                                 396M     0  396M   0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/  249G  3.1G  246G   2% /mnt/mycephfs

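Passing the key with secret= leaves it in the shell history and in the process list. A variant, sketched below as a hypothetical helper, checks that the mount point exists (the first thing checked above) and then prints the same mount command using the secretfile= option of the mount.ceph helper instead. The key file path /etc/ceph/admin.secret is an assumption; such a file should contain only the base64 key, for example the output of `ceph auth get-key client.admin`.

```bash
# Hypothetical helper: print a CephFS mount command that reads the key from a file.
# Usage: cephfs_mount_cmd MOUNTPOINT SECRETFILE MON_ADDR...
cephfs_mount_cmd() {
    mnt=$1 secretfile=$2
    shift 2
    if [ ! -d "$mnt" ]; then
        echo "mount point $mnt does not exist" >&2
        return 1
    fi
    # Join the monitor addresses with commas, e.g. a:6789,b:6789,c:6789
    mons=$(printf '%s,' "$@")
    mons=${mons%,}
    echo "mount -t ceph $mons:/ $mnt -o name=admin,secretfile=$secretfile"
}

# Example with this cluster's monitors:
#   cephfs_mount_cmd /mnt/mycephfs /etc/ceph/admin.secret \
#       10.0.86.246:6789 10.0.86.221:6789 10.0.86.253:6789
```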
More to be added as I run into them...
=========================================================================
Summary:
When the cluster status reports an error or a warning, running ceph health detail usually shows the remedies Ceph itself suggests, and following them resolves most of the problems a cluster runs into.
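As a small aid for that habit, the sketch below is a hypothetical helper that keeps only the health check lines from `ceph health detail`, so the list of distinct problems can be read before the per-PG detail. It assumes, as in the outputs above, that each check starts on an unindented line beginning with an upper-case code such as POOL_APP_NOT_ENABLED or OBJECT_MISPLACED.

```bash
# Hypothetical helper: list the health checks (code plus one-line summary).
# Usage: ceph health detail | health_codes
health_codes() {
    # Keep unindented lines that start with an upper-case code containing an
    # underscore, then drop the overall HEALTH_OK/HEALTH_WARN/HEALTH_ERR line.
    grep -E '^[A-Z][A-Z0-9_]*_[A-Z0-9_]+ ' | grep -v '^HEALTH_'
}
```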