Original poster | Posted on 2022-7-22 11:11:33
First Try: Reweighting the OSDs
I previously had a similar issue where an OSD was nearfull, and running a reweight helped resolve it, so I tried the same thing again:
ceph osd reweight-by-utilization
This is what the cluster looked like before starting the reweight process.
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1 backfillfull osd(s)
            2 pool(s) backfillfull
            26199/6685016 objects misplaced (0.392%)
            Degraded data redundancy (low space): 1 pg backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 1 remapped pgs

  data:
    pools:   2 pools, 256 pgs
    objects: 3264k objects, 12342 GB
    usage:   24773 GB used, 18898 GB / 43671 GB avail
    pgs:     26199/6685016 objects misplaced (0.392%)
             255 active+clean
             1   active+remapped+backfill_toofull

[root@osd1 ~]# ceph osd status
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
| id |           host          |  used | avail | wr ops | wr data | rd ops | rd data |         state          |
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
| 0  | osd1.example.com        | 1741G | 1053G |   0    |    0    |   0    |    0    | exists,up              |
| 1  | osd2.example.com        | 2034G | 760G  |   0    |    0    |   0    |    0    | exists,up              |
| 2  | osd3.example.com        | 1937G | 857G  |   0    |    0    |   0    |    0    | exists,up              |
| 3  | osd4.example.com        | 2031G | 763G  |   0    |    0    |   0    |    0    | exists,up              |
| 4  | osd1.example.com        | 2032G | 761G  |   0    |    0    |   0    |    0    | exists,up              |
| 5  | osd1.example.com        | 2033G | 761G  |   0    |    0    |   0    |    0    | exists,up              |
| 6  | osd2.example.com        | 485G  | 446G  |   0    |    0    |   0    |    0    | exists,up              |
| 7  | osd3.example.com        | 677G  | 254G  |   0    |    0    |   0    |    0    | exists,up              |
| 8  | osd3.example.com        | 869G  | 61.7G |   0    |    0    |   0    |    0    | backfillfull,exists,up |
| 9  | osd4.example.com        | 676G  | 255G  |   0    |    0    |   0    |    0    | exists,up              |
| 10 | osd4.example.com        | 194G  | 736G  |   0    |    0    |   0    |    0    | exists,up              |
| 11 | osd5.example.com        | 2806G | 2782G |   0    |    0    |   0    |    0    | exists,up              |
| 12 | osd5.example.com        | 1938G | 3650G |   0    |    0    |   0    |    0    | exists,up              |
| 13 | osd5.example.com        | 2901G | 2687G |   0    |    0    |   0    |    0    | exists,up              |
| 14 | osd5.example.com        | 2412G | 3067G |   0    |    0    |   0    |    0    | exists,up              |
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
[root@osd1 ~]# ceph osd reweight-by-utilization
moved 9 / 512 (1.75781%)
avg 34.1333
stddev 16.7087 -> 16.5484 (expected baseline 5.64427)
min osd.6 with 8 -> 8 pgs (0.234375 -> 0.234375 * mean)
max osd.13 with 60 -> 60 pgs (1.75781 -> 1.75781 * mean)
oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.5673
overload_utilization 0.6807
osd.8 weight 0.6501 -> 0.6001
osd.1 weight 0.7501 -> 0.7001
osd.5 weight 0.8852 -> 0.8353
osd.4 weight 0.9500 -> 0.9000
This process will take a while to run, depending on the size of your cluster and your configuration.
For me it took about 24 hours to complete, and it didn’t resolve my issue. I attempted another reweight, and 24 hours later I had two OSDs with a status of backfillfull, so I obviously needed to find another way to get this resolved.
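To see where the overload_utilization figure and the four reweighted OSDs come from, the arithmetic can be reproduced by hand. This is a minimal sketch, not Ceph's actual implementation: it assumes utilization is simply used / (used + avail) taken from the `ceph osd status` table, that `oload 120` means 120% of the average utilization, and that at most `max_change_osds` of the fullest OSDs are adjusted. The function and variable names are made up for illustration.

```python
# Used/avail figures (in GB) copied from the `ceph osd status` table above.
OSD_USAGE_GB = {
    0: (1741, 1053), 1: (2034, 760), 2: (1937, 857), 3: (2031, 763),
    4: (2032, 761), 5: (2033, 761), 6: (485, 446), 7: (677, 254),
    8: (869, 61.7), 9: (676, 255), 10: (194, 736), 11: (2806, 2782),
    12: (1938, 3650), 13: (2901, 2687), 14: (2412, 3067),
}


def overload_threshold(average_utilization, oload_percent):
    """Utilization above which an OSD counts as overloaded (assumed model)."""
    return average_utilization * oload_percent / 100.0


def reweight_candidates(usage, threshold, max_osds):
    """Return the fullest OSDs above the threshold, fullest first."""
    util = {osd: used / (used + avail) for osd, (used, avail) in usage.items()}
    over = [osd for osd in util if util[osd] > threshold]
    over.sort(key=lambda osd: util[osd], reverse=True)
    return over[:max_osds]


# average_utilization, oload and max_change_osds are taken from the output above.
threshold = overload_threshold(0.5673, 120)
candidates = reweight_candidates(OSD_USAGE_GB, threshold, max_osds=4)
print(f"threshold: {threshold:.3f}")
print("candidates:", candidates)
```

With the numbers from the table, the threshold comes out at about 0.681, which agrees with the reported overload_utilization of 0.6807 up to rounding, and the four fullest OSDs are osd.8, osd.1, osd.5 and osd.4, the same four that were reweighted. Eight OSDs sit above the threshold in this model, so a single run with max_change_osds 4 only touches half of them.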
Second Try: Increasing the PG Count
I did some additional checking and looked further into the issue.
I first went through the OSD troubleshooting and then the PG troubleshooting, and tracked the problem down to a PG issue.
It looks like pg 1.33 is running low on space and is not continuing with the backfill. We have misplaced objects rather than missing objects, which is good, and our cluster is still running during this process.
[root@osd1 ~]# ceph health detail
HEALTH_ERR 2 backfillfull osd(s); 2 pool(s) backfillfull; 70105/6685016 objects misplaced (1.049%); Degraded data redundancy (low space): 1 pg backfill_toofull
OSD_BACKFILLFULL 2 backfillfull osd(s)
    osd.8 is backfill full
    osd.9 is backfill full
POOL_BACKFILLFULL 2 pool(s) backfillfull
    pool 'cephfs_data' is backfillfull
    pool 'cephfs_metadata' is backfillfull
OBJECT_MISPLACED 70105/6685016 objects misplaced (1.049%)
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 1.33 is active+remapped+backfill_toofull, acting [12,4]
[root@osd1 ~]# ceph osd status
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
| id |           host          |  used | avail | wr ops | wr data | rd ops | rd data |         state          |
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
| 0  | osd1.example.com        | 1741G | 1053G |   0    |    0    |   0    |    0    | exists,up              |
| 1  | osd2.example.com        | 1937G | 856G  |   0    |    0    |   0    |    0    | exists,up              |
| 2  | osd3.example.com        | 2033G | 760G  |   0    |    0    |   0    |    0    | exists,up              |
| 3  | osd4.example.com        | 2180G | 614G  |   0    |    0    |   0    |    0    | exists,up              |
| 4  | osd1.example.com        | 1936G | 857G  |   0    |    0    |   0    |    0    | exists,up              |
| 5  | osd1.example.com        | 1840G | 954G  |   0    |    0    |   0    |    0    | exists,up              |
| 6  | osd2.example.com        | 485G  | 446G  |   0    |    0    |   0    |    0    | exists,up              |
| 7  | osd3.example.com        | 677G  | 254G  |   0    |    0    |   0    |    0    | exists,up              |
| 8  | osd3.example.com        | 869G  | 61.7G |   0    |    0    |   0    |    0    | backfillfull,exists,up |
| 9  | osd4.example.com        | 867G  | 64.3G |   0    |    0    |   0    |    0    | backfillfull,exists,up |
| 10 | osd4.example.com        | 194G  | 737G  |   0    |    0    |   0    |    0    | exists,up              |
| 11 | osd5.example.com        | 2806G | 2782G |   0    |    0    |   0    |    0    | exists,up              |
| 12 | osd5.example.com        | 1938G | 3650G |   0    |    0    |   0    |    0    | exists,up              |
| 13 | osd5.example.com        | 2901G | 2687G |   0    |    0    |   0    |    0    | exists,up              |
| 14 | osd5.example.com        | 2412G | 3067G |   0    |    0    |   0    |    0    | exists,up              |
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
We can see that I now have 2 OSDs that are backfillfull, which isn’t good, and that pg 1.33 seems to be the one giving us a problem.
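As a rough illustration of why osd.8 and osd.9 carry the backfillfull flag, the sketch below compares each OSD's fill ratio against Ceph's usual default thresholds (nearfull 0.85, backfillfull 0.90, full 0.95). Those defaults are an assumption here, since the post does not show what this cluster is actually configured with, and the `classify` helper is hypothetical.

```python
# Assumed thresholds: Ceph's common defaults, not values read from this cluster.
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90
FULL_RATIO = 0.95


def classify(used_gb, avail_gb):
    """Label an OSD by its fill ratio, computed as used / (used + avail)."""
    ratio = used_gb / (used_gb + avail_gb)
    if ratio >= FULL_RATIO:
        return "full"
    if ratio >= BACKFILLFULL_RATIO:
        return "backfillfull"
    if ratio >= NEARFULL_RATIO:
        return "nearfull"
    return "ok"


# A few rows from the second `ceph osd status` table (values in GB).
sample = {3: (2180, 614), 7: (677, 254), 8: (869, 61.7), 9: (867, 64.3)}
for osd, (used, avail) in sample.items():
    print(f"osd.{osd}: {used / (used + avail):.1%} {classify(used, avail)}")
```

Both osd.8 and osd.9 sit at roughly 93% under this calculation, above the assumed 90% backfillfull mark, while the larger drives are all well below it.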
After doing some additional research, I determined that when I set up my Ceph cluster I had fewer than 10 OSDs, and now I’m running 16 OSDs. I had made the bad assumption that there was a single OSD per server, but in fact we have 4 drives in each server, which gives us 4 OSDs per physical server. Each OSD manages an individual storage device.
According to the Ceph documentation, the number of PGs you want in your pool can be estimated as (OSDs * 100) / Replicas. In my case I now have 16 OSDs and 2 copies of each object:
16 * 100 / 2 = 800
The number of PGs should be a power of 2, and the next power of 2 above 800 is 1024. So I checked our pool’s PG count and attempted to adjust it to see if that helps.
Remember that when you change pg_num you should also increase pgp_num.
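The rule of thumb above is easy to turn into a small helper. This is only a sketch of the (OSDs * 100) / Replicas estimate rounded up to a power of 2; the function name and the `pgs_per_osd` parameter are made up for illustration, and the result is a starting point rather than a tuned value.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Estimate a pool's PG count and round it up to the next power of 2."""
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


print(suggested_pg_count(16, 2))  # 800 rounds up to 1024
print(suggested_pg_count(15, 2))  # 750 also rounds up to 1024
```

The second call uses 15 OSDs, the count reported by `ceph -s`, and lands on the same 1024.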
[root@osd1 ~]# ceph osd lspools
1 cephfs_data,2 cephfs_metadata,
[root@osd1 ~]# ceph osd pool get cephfs_data size
size: 2
[root@osd1 ~]# ceph osd pool get cephfs_data min_size
min_size: 1
[root@osd1 ~]# ceph osd pool get cephfs_data pg_num
pg_num: 128
[root@osd1 ~]# ceph osd pool get cephfs_data pgp_num
pgp_num: 128
We can see that when I created the pool I used the default of 128, not realizing that I would be adding OSDs over time and that it’s recommended to adjust pg_num and pgp_num as the number of OSDs grows. So I attempted to increase pg_num from 128 to 1024.
[root@osd1 ~]# ceph osd pool set cephfs_data pg_num 1024
Error E2BIG: specified pg_num 1024 is too large (creating 920 new PGs on ~15 OSDs exceeds per-OSD max of 32)
I’m not able to make such a radical jump from 128 to 1024, so I did a smaller increase from 128 to 256.
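The E2BIG error can be read as a simple budget: each increase may create at most 32 new PGs per OSD. The sketch below models that check under the assumption that the budget is just the per-OSD maximum times the number of OSDs, and that the number of new PGs is the plain difference between the target and current pg_num. My understanding is that on releases of this era the limit corresponds to the mon_osd_max_split_count option, but treat that name as an assumption; the helper names are hypothetical.

```python
def max_new_pgs(num_osds, per_osd_max=32):
    """Largest number of PGs a single pg_num increase may create (assumed model)."""
    return num_osds * per_osd_max


def split_allowed(current_pg_num, target_pg_num, num_osds, per_osd_max=32):
    """True if going from current to target stays within the per-step budget."""
    new_pgs = target_pg_num - current_pg_num
    return new_pgs <= max_new_pgs(num_osds, per_osd_max)


print(max_new_pgs(15))               # budget per step on 15 OSDs
print(split_allowed(128, 1024, 15))  # the rejected jump
print(split_allowed(128, 256, 15))   # the smaller step that was accepted
```

On 15 OSDs the budget is 480 new PGs per step. The error message itself reports 920 new PGs, while the plain difference 1024 - 128 gives 896; either figure is far over the budget, whereas the step to 256 creates only 128. Getting to 1024 therefore takes more than one increase, with time for the cluster to settle in between.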
[root@osd1 ~]# ceph osd pool set cephfs_data pg_num 256
set pool 1 pg_num to 256
This has initiated the changes in my pool, and it will take some time for the cluster to recover. I’m going to wait for this to complete before making any further changes.
So you can see what my Ceph health check looks like, this is where we are now after making those changes.
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            2 backfillfull osd(s)
            2 pool(s) backfillfull
            2830303/6685016 objects misplaced (42.338%)
            Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded
            Degraded data redundancy (low space): 2 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 130 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   24915 GB used, 18756 GB / 43671 GB avail
    pgs:     2/6685016 objects degraded (0.000%)
             2830303/6685016 objects misplaced (42.338%)
             253 active+clean
             120 active+remapped+backfill_wait
             8   active+remapped+backfilling
             2   active+remapped+backfill_wait+backfill_toofull
             1   active+recovery_wait+degraded

  io:
    recovery: 95900 kB/s, 24 objects/s

[root@osd1 ~]# ceph health detail
HEALTH_ERR 2 backfillfull osd(s); 2 pool(s) backfillfull; 2792612/6685016 objects misplaced (41.774%); Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded; Degraded data redundancy (low space): 2 pgs backfill_toofull
OSD_BACKFILLFULL 2 backfillfull osd(s)
    osd.8 is backfill full
    osd.9 is backfill full
POOL_BACKFILLFULL 2 pool(s) backfillfull
    pool 'cephfs_data' is backfillfull
    pool 'cephfs_metadata' is backfillfull
OBJECT_MISPLACED 2792612/6685016 objects misplaced (41.774%)
PG_DEGRADED Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded
    pg 1.3a is active+recovery_wait+degraded, acting [11,2]
PG_DEGRADED_FULL Degraded data redundancy (low space): 2 pgs backfill_toofull
    pg 1.33 is active+remapped+backfill_wait+backfill_toofull, acting [12,4]
    pg 1.a6 is active+remapped+backfill_wait+backfill_toofull, acting [7,14]
Earlier, when I started, only pg 1.33 was showing backfill_toofull, and now both pg 1.33 and pg 1.a6 are. Let’s wait for the dust to settle after our last change before making any more adjustments.
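The pool totals in the status output (256 PGs before, 384 after) also show how thinly the PGs were spread. The sketch below computes the average number of PG copies per OSD. It assumes cephfs_metadata has 128 PGs and a replica size of 2, which the post does not show directly but which matches the 256-PG total, the "moved 9 / 512" line and the "avg 34.1333" figure in the reweight output. The helper name is made up.

```python
def pg_copies_per_osd(pools, num_osds):
    """Average number of PG replicas each OSD holds across all pools."""
    total_copies = sum(pg_num * size for pg_num, size in pools.values())
    return total_copies / num_osds


# (pg_num, replica size) per pool; the metadata pool values are assumed.
before = {"cephfs_data": (128, 2), "cephfs_metadata": (128, 2)}
after = {"cephfs_data": (256, 2), "cephfs_metadata": (128, 2)}

print(f"{pg_copies_per_osd(before, 15):.4f}")  # 34.1333
print(f"{pg_copies_per_osd(after, 15):.4f}")   # 51.2000
```

With so few PGs per OSD, each PG is large, so a single misplaced PG can be enough to push a small drive over its limit; more, smaller PGs give the cluster finer-grained pieces to balance.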
The Recovery Process
After 24 hours it’s looking good. The cluster still reports HEALTH_ERR while it works through the recovery process, but we’re down from 42% to 18% objects misplaced and our OSDs no longer show the backfillfull state, so it looks like we’re on the right path.
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1235611/6685016 objects misplaced (18.483%)
            Degraded data redundancy (low space): 5 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 57 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   25062 GB used, 18609 GB / 43671 GB avail
    pgs:     1235611/6685016 objects misplaced (18.483%)
             327 active+clean
             49  active+remapped+backfill_wait
             5   active+remapped+backfill_wait+backfill_toofull
             3   active+remapped+backfilling

  io:
    recovery: 38584 kB/s, 9 objects/s

[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1235327/6685016 objects misplaced (18.479%)
            Degraded data redundancy (low space): 5 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 57 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   25063 GB used, 18608 GB / 43671 GB avail
    pgs:     1235327/6685016 objects misplaced (18.479%)
             327 active+clean
             49  active+remapped+backfill_wait
             5   active+remapped+backfill_wait+backfill_toofull
             3   active+remapped+backfilling

  io:
    recovery: 32430 kB/s, 8 objects/s

[root@osd1 ~]# ceph osd status
+----+-------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id |           host          |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+-------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | osd1.example.com        | 1789G | 1004G |   0    |    0    |   0    |    0    | exists,up |
| 1  | osd2.example.com        | 2228G | 566G  |   0    |    0    |   0    |    0    | exists,up |
| 2  | osd3.example.com        | 2270G | 524G  |   0    |    0    |   0    |    0    | exists,up |
| 3  | osd4.example.com        | 2164G | 629G  |   0    |    0    |   0    |    0    | exists,up |
| 4  | osd1.example.com        | 2069G | 725G  |   0    |    0    |   0    |    0    | exists,up |
| 5  | osd1.example.com        | 1454G | 1339G |   0    |    0    |   0    |    0    | exists,up |
| 6  | osd2.example.com        | 485G  | 446G  |   0    |    0    |   0    |    0    | exists,up |
| 7  | osd3.example.com        | 437G  | 494G  |   0    |    0    |   0    |    0    | exists,up |
| 8  | osd3.example.com        | 627G  | 303G  |   0    |    0    |   0    |    0    | exists,up |
| 9  | osd4.example.com        | 771G  | 159G  |   0    |    0    |   0    |    0    | exists,up |
| 10 | osd4.example.com        | 339G  | 591G  |   0    |    0    |   0    |    0    | exists,up |
| 11 | osd5.example.com        | 2464G | 3124G |   0    |    0    |   0    |    0    | exists,up |
| 12 | osd5.example.com        | 2174G | 3414G |   0    |    0    |   0    |    0    | exists,up |
| 13 | osd5.example.com        | 3418G | 2170G |   0    |    0    |   0    |    0    | exists,up |
| 14 | osd5.example.com        | 2367G | 3112G |   0    |    0    |   0    |    0    | exists,up |
+----+-------------------------+-------+-------+--------+---------+--------+---------+-----------+
The recovery process is looking good. I’ll check back again tomorrow to make sure it has finished and all of our alerts have cleared.
Once that is done, I’ll make one more adjustment to pg_num to bring it up to the right level for our number of OSDs.
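For a rough idea of how long "tomorrow" needs to be, the misplaced-object counts and recovery rates in the status output can be turned into an estimate. This is a back-of-the-envelope sketch: it assumes the two snapshots (2830303 and 1235611 misplaced objects) were taken about 24 hours apart, as described above, and that the recovery rate stays constant, which it rarely does. The function names are made up.

```python
def observed_rate(misplaced_before, misplaced_after, elapsed_hours):
    """Average objects recovered per second between two snapshots."""
    return (misplaced_before - misplaced_after) / (elapsed_hours * 3600)


def hours_remaining(misplaced, objects_per_second):
    """Hours needed to clear the misplaced objects at a constant rate."""
    return misplaced / objects_per_second / 3600


avg_rate = observed_rate(2830303, 1235611, 24)
at_avg = hours_remaining(1235611, avg_rate)
at_current = hours_remaining(1235611, 9)  # 9 objects/s from the latest `ceph -s`

print(f"average rate so far: {avg_rate:.1f} objects/s")
print(f"remaining at that rate: {at_avg:.1f} h")
print(f"remaining at 9 objects/s: {at_current:.1f} h")
```

The average over the first day works out to roughly 18 objects/s, which would clear the rest in about 19 hours, while the currently reported 9 objects/s would take about 38 hours. So checking back after one more day is reasonable, but the recovery may well need a second one.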