
How to resolve Ceph pool getting active+remapped+backfill_toofull

Posted on 2022-7-22 11:11:17
First Attempt: Reweighting the OSDs
I previously had a similar issue where an OSD was nearfull, and running a reweight helped resolve it:
ceph osd reweight-by-utilization
This is what the cluster looked like before starting the reweight process.
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1 backfillfull osd(s)
            2 pool(s) backfillfull
            26199/6685016 objects misplaced (0.392%)
            Degraded data redundancy (low space): 1 pg backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 1 remapped pgs

  data:
    pools:   2 pools, 256 pgs
    objects: 3264k objects, 12342 GB
    usage:   24773 GB used, 18898 GB / 43671 GB avail
    pgs:     26199/6685016 objects misplaced (0.392%)
             255 active+clean
             1   active+remapped+backfill_toofull

[root@osd1 ~]# ceph osd status
+----+------------------+-------+-------+--------+---------+--------+---------+------------------------+
| id |       host       |  used | avail | wr ops | wr data | rd ops | rd data |         state          |
+----+------------------+-------+-------+--------+---------+--------+---------+------------------------+
| 0  | osd1.example.com | 1741G | 1053G |    0   |     0   |    0   |     0   |       exists,up        |
| 1  | osd2.example.com | 2034G |  760G |    0   |     0   |    0   |     0   |       exists,up        |
| 2  | osd3.example.com | 1937G |  857G |    0   |     0   |    0   |     0   |       exists,up        |
| 3  | osd4.example.com | 2031G |  763G |    0   |     0   |    0   |     0   |       exists,up        |
| 4  | osd1.example.com | 2032G |  761G |    0   |     0   |    0   |     0   |       exists,up        |
| 5  | osd1.example.com | 2033G |  761G |    0   |     0   |    0   |     0   |       exists,up        |
| 6  | osd2.example.com |  485G |  446G |    0   |     0   |    0   |     0   |       exists,up        |
| 7  | osd3.example.com |  677G |  254G |    0   |     0   |    0   |     0   |       exists,up        |
| 8  | osd3.example.com |  869G | 61.7G |    0   |     0   |    0   |     0   | backfillfull,exists,up |
| 9  | osd4.example.com |  676G |  255G |    0   |     0   |    0   |     0   |       exists,up        |
| 10 | osd4.example.com |  194G |  736G |    0   |     0   |    0   |     0   |       exists,up        |
| 11 | osd5.example.com | 2806G | 2782G |    0   |     0   |    0   |     0   |       exists,up        |
| 12 | osd5.example.com | 1938G | 3650G |    0   |     0   |    0   |     0   |       exists,up        |
| 13 | osd5.example.com | 2901G | 2687G |    0   |     0   |    0   |     0   |       exists,up        |
| 14 | osd5.example.com | 2412G | 3067G |    0   |     0   |    0   |     0   |       exists,up        |
+----+------------------+-------+-------+--------+---------+--------+---------+------------------------+
[root@osd1 ~]# ceph osd reweight-by-utilization
moved 9 / 512 (1.75781%)
avg 34.1333
stddev 16.7087 -> 16.5484 (expected baseline 5.64427)
min osd.6 with 8 -> 8 pgs (0.234375 -> 0.234375 * mean)
max osd.13 with 60 -> 60 pgs (1.75781 -> 1.75781 * mean)
oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.5673
overload_utilization 0.6807
osd.8 weight 0.6501 -> 0.6001
osd.1 weight 0.7501 -> 0.7001
osd.5 weight 0.8852 -> 0.8353
osd.4 weight 0.9500 -> 0.9000
This process will take a while to run based on the size of your cluster and your configuration.
For me it took about 24 hours to complete, and it didn’t resolve my issue. I attempted another reweight, and 24 hours later I had two OSDs with a status of backfillfull, so I obviously needed to look into another way of getting this resolved.
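To make the reweight numbers easier to read, here is a minimal sketch of the arithmetic behind them. It assumes the overload threshold is the average utilization scaled by oload/100, which roughly matches the reported values (0.5673 * 1.20 is about 0.681, against the reported 0.6807). The helper names are hypothetical and not part of Ceph; the used and avail figures come from the osd status table.

```shell
# Utilization of one OSD as a fraction, from the "used" and "avail" columns.
osd_util() {
  awk -v used="$1" -v avail="$2" 'BEGIN { printf "%.3f\n", used / (used + avail) }'
}

# Assumed overload threshold: average utilization scaled by oload/100.
overload_threshold() {
  awk -v avg="$1" -v oload="$2" 'BEGIN { printf "%.3f\n", avg * oload / 100 }'
}

# Exit status 0 when the utilization is above the threshold.
is_overloaded() {
  awk -v u="$1" -v t="$2" 'BEGIN { exit !((u + 0) > (t + 0)) }'
}

osd_util 869 61.7                # osd.8: prints 0.934
overload_threshold 0.5673 120    # prints 0.681

# To preview a reweight without applying it, Ceph also offers a dry run:
#   ceph osd test-reweight-by-utilization
```

With these figures osd.8 sits far above the threshold, which is consistent with it being the first OSD in the list of weight changes.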
Second Attempt: Increasing the PG Count
I did some additional checking and looked further into the issue.
I went through the OSD troubleshooting guide first and then the PG troubleshooting guide, and tracked the problem down to a PG issue.
It looks like pg 1.33 is short on space and is not continuing with the backfill.  The objects are misplaced rather than missing, which is good: our cluster is still running during this process.
[root@osd1 ~]# ceph health detail
HEALTH_ERR 2 backfillfull osd(s); 2 pool(s) backfillfull; 70105/6685016 objects misplaced (1.049%); Degraded data redundancy (low space): 1 pg backfill_toofull
OSD_BACKFILLFULL 2 backfillfull osd(s)
    osd.8 is backfill full
    osd.9 is backfill full
POOL_BACKFILLFULL 2 pool(s) backfillfull
    pool 'cephfs_data' is backfillfull
    pool 'cephfs_metadata' is backfillfull
OBJECT_MISPLACED 70105/6685016 objects misplaced (1.049%)
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 1.33 is active+remapped+backfill_toofull, acting [12,4]
[root@osd1 ~]# ceph osd status
+----+------------------+-------+-------+--------+---------+--------+---------+------------------------+
| id |       host       |  used | avail | wr ops | wr data | rd ops | rd data |         state          |
+----+------------------+-------+-------+--------+---------+--------+---------+------------------------+
| 0  | osd1.example.com | 1741G | 1053G |    0   |     0   |    0   |     0   |       exists,up        |
| 1  | osd2.example.com | 1937G |  856G |    0   |     0   |    0   |     0   |       exists,up        |
| 2  | osd3.example.com | 2033G |  760G |    0   |     0   |    0   |     0   |       exists,up        |
| 3  | osd4.example.com | 2180G |  614G |    0   |     0   |    0   |     0   |       exists,up        |
| 4  | osd1.example.com | 1936G |  857G |    0   |     0   |    0   |     0   |       exists,up        |
| 5  | osd1.example.com | 1840G |  954G |    0   |     0   |    0   |     0   |       exists,up        |
| 6  | osd2.example.com |  485G |  446G |    0   |     0   |    0   |     0   |       exists,up        |
| 7  | osd3.example.com |  677G |  254G |    0   |     0   |    0   |     0   |       exists,up        |
| 8  | osd3.example.com |  869G | 61.7G |    0   |     0   |    0   |     0   | backfillfull,exists,up |
| 9  | osd4.example.com |  867G | 64.3G |    0   |     0   |    0   |     0   | backfillfull,exists,up |
| 10 | osd4.example.com |  194G |  737G |    0   |     0   |    0   |     0   |       exists,up        |
| 11 | osd5.example.com | 2806G | 2782G |    0   |     0   |    0   |     0   |       exists,up        |
| 12 | osd5.example.com | 1938G | 3650G |    0   |     0   |    0   |     0   |       exists,up        |
| 13 | osd5.example.com | 2901G | 2687G |    0   |     0   |    0   |     0   |       exists,up        |
| 14 | osd5.example.com | 2412G | 3067G |    0   |     0   |    0   |     0   |       exists,up        |
+----+------------------+-------+-------+--------+---------+--------+---------+------------------------+
We can see that I now have 2 OSDs that are backfillfull, which isn’t good, and pg 1.33 seems to be the one that is giving us a problem.
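When the health output gets longer, it can help to pull out just the affected PGs. The following is a small sketch, not an official tool: `toofull_pgs` is a hypothetical helper that reads `ceph health detail` text on stdin and assumes the line layout shown in the output above.

```shell
# Print "<pgid> <acting set>" for every PG reported as backfill_toofull.
toofull_pgs() {
  awk '$1 == "pg" && $4 ~ /backfill_toofull/ { print $2, $6 }'
}

# On a live cluster this would be: ceph health detail | toofull_pgs
# Here the input is a fragment of the output above.
toofull_pgs <<'EOF'
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
    pg 1.33 is active+remapped+backfill_toofull, acting [12,4]
EOF
# prints: 1.33 [12,4]
```

For a single PG, `ceph pg map 1.33` shows its up and acting sets, and `ceph pg 1.33 query` gives the full state.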
After doing some additional research I determined that when I set up my Ceph cluster I had fewer than 10 OSDs, and now I’m running 16 OSDs.  I had made a bad assumption that there was a single OSD per server, but in fact we have 4 drives in each server, which gives us 4 OSDs per physical server. Each OSD manages an individual storage device.
Based on the Ceph documentation, the number of PGs you want in your pool is calculated roughly as (OSDs * 100) / Replicas. In my case I now have 16 OSDs and 2 copies of each object:
16 * 100 / 2 = 800
The number of PGs should be a power of 2, so the next power of 2 above 800 is 1024. I checked our pool’s PG settings and attempted to adjust them to see if that helps.
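The calculation above can be sketched as a small shell function. `pg_target` is a hypothetical helper, and the 100 PGs per OSD figure is the rule of thumb quoted above rather than a fixed constant, so it is left as an optional third argument.

```shell
# Suggested PG count: (OSDs * per_osd) / replicas, rounded up to a power of 2.
pg_target() (
  osds=$1
  replicas=$2
  per_osd=${3:-100}
  raw=$(( osds * per_osd / replicas ))
  n=1
  while [ "$n" -lt "$raw" ]; do
    n=$(( n * 2 ))
  done
  echo "$n"
)

pg_target 16 2    # raw value 800, prints 1024
```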
Remember to increase pgp_num as well whenever you change pg_num.
[root@osd1 ~]# ceph osd lspools
1 cephfs_data,2 cephfs_metadata,
[root@osd1 ~]# ceph osd pool get cephfs_data size
size: 2
[root@osd1 ~]# ceph osd pool get cephfs_data min_size
min_size: 1
[root@osd1 ~]# ceph osd pool get cephfs_data pg_num
pg_num: 128
[root@osd1 ~]# ceph osd pool get cephfs_data pgp_num
pgp_num: 128
We can see that when I created the pool I used the default of 128, not realizing that I would be adding OSDs over time and that it’s recommended to adjust pg_num and pgp_num as the number of OSDs grows.  So I attempted to increase pg_num from 128 to 1024.
[root@osd1 ~]# ceph osd pool set cephfs_data pg_num 1024
Error E2BIG: specified pg_num 1024 is too large (creating 920 new PGs on ~15 OSDs exceeds per-OSD max of 32)
I’m not able to make such a radical jump from 128 to 1024, so I did a smaller increase from 128 to 256.
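Reading the error message literally, a single change may create at most 32 new PGs per OSD. Under that assumption (taken only from the wording of the error, with ~15 OSDs), the largest pg_num accepted in one step can be estimated as below. `max_pg_num` is a hypothetical helper, not a Ceph command.

```shell
# Largest pg_num reachable in one step: current + OSDs * per-OSD max of new PGs.
max_pg_num() (
  current=$1
  osds=$2
  per_osd_max=${3:-32}
  echo $(( current + osds * per_osd_max ))
)

max_pg_num 128 15    # 128 + 15 * 32, prints 608
```

By this estimate a jump from 128 to 256 fits comfortably under the limit, while 1024 does not.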
[root@osd1 ~]# ceph osd pool set cephfs_data pg_num 256
set pool 1 pg_num to 256
This started the changes in my pool. The cluster needs some time to recover, so I’m going to wait for that to complete before making any further adjustments.
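The transcript only shows pg_num being changed. Following the earlier reminder, pgp_num would be raised to the same value. As a cautious sketch, the hypothetical helper below only prints the pair of commands so they can be reviewed before being run by hand; the pool name and value are the ones used in this post.

```shell
# Print the two commands needed to resize a pool's PG count; nothing is executed.
pg_resize_cmds() {
  pool=$1
  n=$2
  echo "ceph osd pool set $pool pg_num $n"
  echo "ceph osd pool set $pool pgp_num $n"
}

pg_resize_cmds cephfs_data 256
```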
This is what my Ceph health check looks like after making those changes.
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            2 backfillfull osd(s)
            2 pool(s) backfillfull
            2830303/6685016 objects misplaced (42.338%)
            Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded
            Degraded data redundancy (low space): 2 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 130 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   24915 GB used, 18756 GB / 43671 GB avail
    pgs:     2/6685016 objects degraded (0.000%)
             2830303/6685016 objects misplaced (42.338%)
             253 active+clean
             120 active+remapped+backfill_wait
             8   active+remapped+backfilling
             2   active+remapped+backfill_wait+backfill_toofull
             1   active+recovery_wait+degraded

  io:
    recovery: 95900 kB/s, 24 objects/s

[root@osd1 ~]# ceph health detail
HEALTH_ERR 2 backfillfull osd(s); 2 pool(s) backfillfull; 2792612/6685016 objects misplaced (41.774%); Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded; Degraded data redundancy (low space): 2 pgs backfill_toofull
OSD_BACKFILLFULL 2 backfillfull osd(s)
    osd.8 is backfill full
    osd.9 is backfill full
POOL_BACKFILLFULL 2 pool(s) backfillfull
    pool 'cephfs_data' is backfillfull
    pool 'cephfs_metadata' is backfillfull
OBJECT_MISPLACED 2792612/6685016 objects misplaced (41.774%)
PG_DEGRADED Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded
    pg 1.3a is active+recovery_wait+degraded, acting [11,2]
PG_DEGRADED_FULL Degraded data redundancy (low space): 2 pgs backfill_toofull
    pg 1.33 is active+remapped+backfill_wait+backfill_toofull, acting [12,4]
    pg 1.a6 is active+remapped+backfill_wait+backfill_toofull, acting [7,14]
Earlier, when I started, only pg 1.33 was showing backfill_toofull; now both pg 1.33 and pg 1.a6 are.  Let’s wait for the dust to settle after our last change before making any more adjustments.
The Recovery Process
After 24 hours it’s looking good. The cluster is still going through the recovery process, but we’re down from 42% to 18% objects misplaced, and none of our OSDs are flagged backfillfull any more, so it looks like we’re on the right path. The cluster does still report HEALTH_ERR, with 5 PGs in backfill_toofull, while the backfill continues.
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1235611/6685016 objects misplaced (18.483%)
            Degraded data redundancy (low space): 5 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 57 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   25062 GB used, 18609 GB / 43671 GB avail
    pgs:     1235611/6685016 objects misplaced (18.483%)
             327 active+clean
             49  active+remapped+backfill_wait
             5   active+remapped+backfill_wait+backfill_toofull
             3   active+remapped+backfilling

  io:
    recovery: 38584 kB/s, 9 objects/s

[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1235327/6685016 objects misplaced (18.479%)
            Degraded data redundancy (low space): 5 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 57 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   25063 GB used, 18608 GB / 43671 GB avail
    pgs:     1235327/6685016 objects misplaced (18.479%)
             327 active+clean
             49  active+remapped+backfill_wait
             5   active+remapped+backfill_wait+backfill_toofull
             3   active+remapped+backfilling

  io:
    recovery: 32430 kB/s, 8 objects/s

[root@osd1 ~]# ceph osd status
+----+------------------+-------+-------+--------+---------+--------+---------+-----------+
| id |       host       |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | osd1.example.com | 1789G | 1004G |    0   |     0   |    0   |     0   | exists,up |
| 1  | osd2.example.com | 2228G |  566G |    0   |     0   |    0   |     0   | exists,up |
| 2  | osd3.example.com | 2270G |  524G |    0   |     0   |    0   |     0   | exists,up |
| 3  | osd4.example.com | 2164G |  629G |    0   |     0   |    0   |     0   | exists,up |
| 4  | osd1.example.com | 2069G |  725G |    0   |     0   |    0   |     0   | exists,up |
| 5  | osd1.example.com | 1454G | 1339G |    0   |     0   |    0   |     0   | exists,up |
| 6  | osd2.example.com |  485G |  446G |    0   |     0   |    0   |     0   | exists,up |
| 7  | osd3.example.com |  437G |  494G |    0   |     0   |    0   |     0   | exists,up |
| 8  | osd3.example.com |  627G |  303G |    0   |     0   |    0   |     0   | exists,up |
| 9  | osd4.example.com |  771G |  159G |    0   |     0   |    0   |     0   | exists,up |
| 10 | osd4.example.com |  339G |  591G |    0   |     0   |    0   |     0   | exists,up |
| 11 | osd5.example.com | 2464G | 3124G |    0   |     0   |    0   |     0   | exists,up |
| 12 | osd5.example.com | 2174G | 3414G |    0   |     0   |    0   |     0   | exists,up |
| 13 | osd5.example.com | 3418G | 2170G |    0   |     0   |    0   |     0   | exists,up |
| 14 | osd5.example.com | 2367G | 3112G |    0   |     0   |    0   |     0   | exists,up |
+----+------------------+-------+-------+--------+---------+--------+---------+-----------+
The recovery process is looking good.  I’ll check back again tomorrow to make sure it’s finished and all of our alerts have cleared.
Once that is done I’ll make one more adjustment on the pg_num to bring it up to the right level for the number of our OSDs.
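For checking back on the recovery, a rough way to track progress is to pull the misplaced-object ratio out of the status text. This is only a sketch: `misplaced_pct` is a hypothetical helper that assumes the "N/M objects misplaced" wording seen in the `ceph -s` output in this post.

```shell
# Print the misplaced-object percentage from `ceph -s` or `ceph health detail` text.
misplaced_pct() {
  awk '{
    for (i = 1; i + 2 <= NF; i++) {
      if ($i ~ /^[0-9]+\/[0-9]+$/ && $(i + 1) == "objects" && $(i + 2) ~ /^misplaced/) {
        split($i, parts, "/")
        printf "%.3f\n", 100 * parts[1] / parts[2]
        exit
      }
    }
  }'
}

# On a live cluster this would be: ceph -s | misplaced_pct
echo "pgs:     1235611/6685016 objects misplaced (18.483%)" | misplaced_pct
# prints: 18.483
```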

1

主题

0

回帖

12

积分

管理员

积分
12
QQ
 楼主| 发表于 2022-7-22 11:11:33 | 显示全部楼层
First Tried Reweighting the OSDs
4 U" b: Y/ ]2 UI previously had a similar issue were an OSD was nearfull and ran reweight to help resolve they issue- Q4 p( h4 T4 e: ]; p
ceph osd reweight-by-utilization
2 h; T% D! p) \0 wThis is what the cluster looked like before starting the reweight process.
/ Z* A, Q+ L# m8 P9 B3 I# e* f/ P; _[root@osd1 ~]# ceph -s
, }$ V* y. i, }5 o  cluster:
/ M* Q- F4 d4 M- j0 u' u    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
! _& e4 o/ S6 |( Q& \# c" {: T6 l    health: HEALTH_ERR
/ \& K' }6 q5 p, X8 S            1 backfillfull osd(s)3 N+ {& [3 T6 q' q' p- m) @
            2 pool(s) backfillfull* `: L8 Q( k% N6 ?
            26199/6685016 objects misplaced (0.392%)  V# ~" H# u( m9 U/ L
            Degraded data redundancy (low space): 1 pg backfill_toofull
& W7 p) l" @. q2 A2 M; c  g7 R
9 c% k# L+ L" D+ v  services:
4 \) U% Z' X" v2 Q    mon: 3 daemons, quorum osd1,osd2,osd3; ^* F6 S5 @  p5 d8 V
    mgr: osd1(active), standbys: osd2
6 |+ t+ }  {; q0 f0 x    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
7 L7 s6 q' }$ I8 }  N) u    osd: 15 osds: 15 up, 15 in; 1 remapped pgs
* Q( L  Y8 E1 \3 {1 H6 W $ M' J' G7 J9 J8 D' \5 g) ~* s
  data:
# {+ d3 y- R8 N( c. J! p    pools:   2 pools, 256 pgs7 T4 f. @: {( l: T+ n$ k' m
    objects: 3264k objects, 12342 GB
# P. I' i5 l* ]9 _# o9 R1 g! H1 S( a+ f    usage:   24773 GB used, 18898 GB / 43671 GB avail+ P/ |& O. P! W. M
    pgs:     26199/6685016 objects misplaced (0.392%)
% b8 p1 ]5 _- j3 G2 V5 l: |             255 active+clean
  _) y9 M! g) @2 G+ t             1   active+remapped+backfill_toofull; o! q+ B( o! Z' R+ A7 \. {

" c/ N" z& W, B( h) M: D! @; w, _/ C1 y[root@osd1 ~]# ceph osd status
- Z9 D! n/ d9 d6 ?) u+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+3 V# x4 _% S: B# d+ T1 R8 G
| id |           host          |  used | avail | wr ops | wr data | rd ops | rd data |         state          |8 T# ~& }8 Z- H/ z
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+. ?8 V% t& B0 G/ F, X
| 0  | osd1.example.com | 1741G | 1053G |    0   |     0   |    0   |     0   |       exists,up        |+ x. U  Q" E8 Z
| 1  | osd2.example.com | 2034G |  760G |    0   |     0   |    0   |     0   |       exists,up        |: z4 U& \/ k% z0 C& @0 B8 E
| 2  | osd3.example.com | 1937G |  857G |    0   |     0   |    0   |     0   |       exists,up        |
4 p8 H* M; V+ ~  \| 3  | osd4.example.com | 2031G |  763G |    0   |     0   |    0   |     0   |       exists,up        |
' v- b8 n! P8 i6 i: a| 4  | osd1.example.com | 2032G |  761G |    0   |     0   |    0   |     0   |       exists,up        |. O/ }) s, L7 g6 q
| 5  | osd1.example.com | 2033G |  761G |    0   |     0   |    0   |     0   |       exists,up        |
6 m9 v( V& f+ E# H  |9 || 6  | osd2.example.com |  485G |  446G |    0   |     0   |    0   |     0   |       exists,up        |
* o, R& k- K0 C, X* P9 K! B; U! J| 7  | osd3.example.com |  677G |  254G |    0   |     0   |    0   |     0   |       exists,up        |1 F/ O! x3 C4 u( F
| 8  | osd3.example.com |  869G | 61.7G |    0   |     0   |    0   |     0   | backfillfull,exists,up |
0 R) E! X& w/ Q! R8 D8 u, j% Y| 9  | osd4.example.com |  676G |  255G |    0   |     0   |    0   |     0   |       exists,up        |
7 r& @1 g4 Q' A( q, ~7 K. ?| 10 | osd4.example.com |  194G |  736G |    0   |     0   |    0   |     0   |       exists,up        |
" X7 `' ]" H' }1 M| 11 | osd5.example.com | 2806G | 2782G |    0   |     0   |    0   |     0   |       exists,up        |
( @$ j* Q% h+ f& b5 G- S. j| 12 | osd5.example.com | 1938G | 3650G |    0   |     0   |    0   |     0   |       exists,up        |2 i3 _  S7 X8 c+ X" B7 ~
| 13 | osd5.example.com | 2901G | 2687G |    0   |     0   |    0   |     0   |       exists,up        |1 \4 y( N) y: d
| 14 | osd5.example.com | 2412G | 3067G |    0   |     0   |    0   |     0   |       exists,up        |' E- Q+ @( L* a6 L% H
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
/ ~. H1 x2 `' V[root@osd1 ~]# ceph osd reweight-by-utilization) s" b* ?5 f0 L# `8 B6 c
moved 9 / 512 (1.75781%)2 p. R& m+ J8 R/ X4 }" p# E
avg 34.1333/ x# b0 q1 V8 M$ O* @$ I4 G
stddev 16.7087 -> 16.5484 (expected baseline 5.64427)/ W& g! v7 R9 b8 T9 ]
min osd.6 with 8 -> 8 pgs (0.234375 -> 0.234375 * mean)) K; i7 \: Y* Y, @
max osd.13 with 60 -> 60 pgs (1.75781 -> 1.75781 * mean)
3 `: ]: E3 V# v2 }! hoload 120! x4 b1 b' f& s9 e# x9 n4 u
max_change 0.05
* b) u  a5 G, T; G8 L$ m# Imax_change_osds 4
" u9 L5 @+ j5 H/ d8 K- M! V* J2 v! Z) |average_utilization 0.5673
' e0 ]; s0 @  Q) C% Xoverload_utilization 0.68076 K% Q( J8 Y8 i
osd.8 weight 0.6501 -> 0.6001
; G+ \# b3 o+ v+ r9 W$ sosd.1 weight 0.7501 -> 0.7001' b/ @1 P8 W4 ^* ~
osd.5 weight 0.8852 -> 0.8353
0 d/ l* P' q( r0 @6 s! iosd.4 weight 0.9500 -> 0.9000
- @) F. h# T4 y7 S$ cThis process will take a while to run based on the size of your cluster and your configuration.$ {) C/ o  \3 {* Z) o. W- G
For me it took about 24 hours to complete, and it didn’t resolve my issue, so I attempted another reweight, and again after 24 hours later I now have two OSDs with a status of backfillfull.  So obviously need to look into another way of getting this resolved.  W, [1 O) [- B; Y1 `; v
Second Tried Increasing PG
( V% y' o" U; d; Y% _, zI did some addition checking and looked further into the issue.( T3 I& |8 Z9 @1 N
I first checked the OSD troubleshooting and then the PG troubleshooting, I tracked down I had a pg issue.
" }9 B( L: C/ k9 [9 Q+ dLooks like pg 1.33 is getting low space and not continuing with the backfill.  We have misplaced objects and not missing objects which is good, our cluster is still running during this process.
. c# t9 p) T# k# w$ `[root@osd1 ~]# ceph health detail9 m( J& K3 S$ g
HEALTH_ERR 2 backfillfull osd(s); 2 pool(s) backfillfull; 70105/6685016 objects misplaced (1.049%); Degraded data redundancy (low space): 1 pg backfill_toofull0 x) R" ^( \- p1 k/ f" R6 S2 K
OSD_BACKFILLFULL 2 backfillfull osd(s)
9 f2 `7 M4 `( R# K( H: H( d5 ]& {4 a    osd.8 is backfill full
" U: _$ n/ B# T/ l- |" M4 _* O# O    osd.9 is backfill full
+ N) k& a! Q; [) I$ l# ?# ?' ZPOOL_BACKFILLFULL 2 pool(s) backfillfull
+ @, w7 f! X4 H- S    pool 'cephfs_data' is backfillfull
9 K7 [4 g3 Q! M/ N2 B: P: j    pool 'cephfs_metadata' is backfillfull
& R$ ]0 Y$ {+ i# ]OBJECT_MISPLACED 70105/6685016 objects misplaced (1.049%)
4 B; B0 D+ P" L4 a3 KPG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull: E# `) i3 Y* v/ b2 w0 X" V
    pg 1.33 is active+remapped+backfill_toofull, acting [12,4]8 J% s7 A1 L; f. G
[root@osd1 ~]# ceph osd status
3 z+ V0 H2 B3 ?. Z" v/ D+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+
9 t. i. p% i, e/ K| id |           host          |  used | avail | wr ops | wr data | rd ops | rd data |         state          |
% m7 d0 |; `& A, X8 L  K+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+2 q* D$ O9 O/ I* X0 T) a
| 0  | osd1.example.com | 1741G | 1053G |    0   |     0   |    0   |     0   |       exists,up        |
4 `0 w: y7 h0 Q* ~| 1  | osd2.example.com | 1937G |  856G |    0   |     0   |    0   |     0   |       exists,up        |
* v2 s  L" f" c$ ?! I| 2  | osd3.example.com | 2033G |  760G |    0   |     0   |    0   |     0   |       exists,up        |
) J5 h5 U6 \# t' D4 V| 3  | osd4.example.com | 2180G |  614G |    0   |     0   |    0   |     0   |       exists,up        |
% N# G# u; ^' T| 4  | osd1.example.com | 1936G |  857G |    0   |     0   |    0   |     0   |       exists,up        |
  k. l0 Z  `7 G) W* O# @| 5  | osd1.example.com | 1840G |  954G |    0   |     0   |    0   |     0   |       exists,up        |7 P0 @6 `9 O( ^* L" d5 G
| 6  | osd2.example.com |  485G |  446G |    0   |     0   |    0   |     0   |       exists,up        |
" a6 `$ Z$ M/ n: O) g; W| 7  | osd3.example.com |  677G |  254G |    0   |     0   |    0   |     0   |       exists,up        |
) o% w- s' z$ i5 M! V, ?| 8  | osd3.example.com |  869G | 61.7G |    0   |     0   |    0   |     0   | backfillfull,exists,up |
8 {6 j- t, P5 ^# ?| 9  | osd4.example.com |  867G | 64.3G |    0   |     0   |    0   |     0   | backfillfull,exists,up |
" r5 k% P9 d6 S; D1 y$ F: _| 10 | osd4.example.com |  194G |  737G |    0   |     0   |    0   |     0   |       exists,up        |
; B- }! q2 P4 o, C9 w: H8 z" [| 11 | osd5.example.com | 2806G | 2782G |    0   |     0   |    0   |     0   |       exists,up        |
' A3 O1 q( Q; O9 n| 12 | osd5.example.com | 1938G | 3650G |    0   |     0   |    0   |     0   |       exists,up        |
! ]+ \& u% d" v+ U, G; V* K5 Y| 13 | osd5.example.com | 2901G | 2687G |    0   |     0   |    0   |     0   |       exists,up        |
8 L. _0 z2 K& V+ @4 a  E| 14 | osd5.example.com | 2412G | 3067G |    0   |     0   |    0   |     0   |       exists,up        |: E# G8 f1 v' B# k
+----+-------------------------+-------+-------+--------+---------+--------+---------+------------------------+! ?+ ~& X! r5 `- O- w6 u
We can see that now today I have 2 OSDs that are backfillfull, which isn’t good, and I can see that pg 1.33 seems to be the one that is giving us a problem.
. X' w4 H2 Y9 N  HAfter doing some additional research I was able to determine the when I setup my Ceph cluster, I only had <10 OSDs, now I’m running 16 OSDs.  I had made a bad assumption there was a single OSD per server, but in fact we have 4 drives in each server which gives us 4 OSDs per physical server. Each OSD manages an individual storage device.! ?& _8 `/ C  w
Based on the Ceph documentation in order to determine the number of pg you want in your pool, the calculation would be something like this. (OSDs * 100) / Replicas, so in my case I now have 16 OSDs, and 2 copies of each object.
; D/ `2 Z9 X. q7 S# U) K% c16 * 100 / 2 = 800
5 t8 C# @8 Z$ [3 Z9 b5 GThe number of pg must be in powers of 2, so the next matching power of 2 would be 1024. So I checked our pool pg size and attempted to make adjustments to see if they helps.
  w5 k" g8 j9 o9 c: x% t' URemember when making changes to pg_num also increase pgp_num.
1 _& {( b* X3 x/ k: Q5 y! j[root@osd1 ~]# ceph osd lspools
: Q6 b& I; t9 c% `1 cephfs_data,2 cephfs_metadata,' E  m0 |& z6 }
[root@osd1 ~]# ceph osd pool get cephfs_data size: K+ U' T$ S0 R, a: Q/ ?
size: 2
, R* K& D) M$ T, j, r- r[root@osd1 ~]# ceph osd pool get cephfs_data min_size
# l! q1 {: }  h2 J2 \' xmin_size: 18 n' f$ d( R) O! `+ Z/ L6 `
[root@osd1 ~]# ceph osd pool get cephfs_data pg_num
, b/ h, C9 `; l, u5 npg_num: 128
+ l7 @; }# R( {/ @& b3 P/ O% ?; i[root@osd1 ~]# ceph osd pool get cephfs_data pgp_num! o) u, b8 {* y/ k/ r
pgp_num: 128
7 i8 }: B1 s0 ~# ]& MWe can see that when I created the pool I used the default of 128, not realizing that I was going to be adding OSDs over time and it’s recommended to adjust pg_num and pgp_num based on the increasing number of OSDs.  So I attempted to increase pg_num from 128 to 1024.1 U# e7 g; O3 E! B. }( i; Z
[root@osd1 ~]# ceph osd pool set cephfs_data pg_num 1024, C  W" |# s" H) X% r: i$ w
Error E2BIG: specified pg_num 1024 is too large (creating 920 new PGs on ~15 OSDs exceeds per-OSD max of 32). O5 ?' a) p1 @
I’m not able to make such a radical jump from 128 to 1024, so I did a smaller increase from 128 to 256.
, A2 a8 U/ M0 ?& k. \[root@osd1 ~]# ceph osd pool set cephfs_data pg_num 2563 h- I. a* p3 A' c9 {4 c: x
set pool 1 pg_num to 256
1 ]! u* Z& v9 E7 F, e+ KThis has initiated the changes in my pool, and before making any further adjustments it will take some time for the cluster to recover. I’m going to wait for this to complete again before making any further changes.
5 f4 X% f# g2 K3 V- p" ~So you can see what my Ceph health check looks like, this is where we are at now after making those changes.
5 K! J1 z3 D1 B+ y) F# \& f- |[root@osd1 ~]# ceph -s
# u! _1 [- w6 e' B( R. P% S  cluster:
: n: ^% o  G+ r* V/ T    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
% N4 N6 |5 G; m    health: HEALTH_ERR
! N% k. _3 g4 h- }- [            2 backfillfull osd(s)% p/ I6 Q4 `5 \
            2 pool(s) backfillfull
: u4 K2 _$ B& I6 G  n( a# f            2830303/6685016 objects misplaced (42.338%)2 {) z3 m- h# k9 y4 A" B3 D
            Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded
$ O; C3 A4 I- ^            Degraded data redundancy (low space): 2 pgs backfill_toofull* [/ v7 e- N! r8 b; j  I2 `
! i) ]( \6 P& J8 k8 }7 U
  services:: P" b; k' m8 K) s$ q
    mon: 3 daemons, quorum osd1,osd2,osd3' x9 C: m& V* I: R( r/ H+ w! f
    mgr: osd1(active), standbys: osd2% k3 g$ J" L1 I, e( V
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
  S8 s0 Z: y# u    osd: 15 osds: 15 up, 15 in; 130 remapped pgs1 \& j- S- ^% ^$ E. J$ x! L- ?
5 e4 Y: g' p9 {! G/ C
  data:
: _" U- e1 @6 b1 s    pools:   2 pools, 384 pgs; V. x# n  d; N7 e5 H2 a& {
    objects: 3264k objects, 12342 GB
& l* v  J+ [1 l7 n" B$ o' A    usage:   24915 GB used, 18756 GB / 43671 GB avail# a0 `+ I9 D9 V/ A* M$ K9 |
    pgs:     2/6685016 objects degraded (0.000%)* g/ [5 {2 l, X' T+ `
             2830303/6685016 objects misplaced (42.338%)# d" l  M* v- c( i$ B
             253 active+clean  j( j4 u5 U; T6 F" o
             120 active+remapped+backfill_wait5 |+ R3 I2 x. E- S5 S6 l
             8   active+remapped+backfilling
$ t) Y: [0 e7 f9 i- S9 J             2   active+remapped+backfill_wait+backfill_toofull
# y0 Q2 l* ^* J8 M             1   active+recovery_wait+degraded
7 e1 [; U2 E% h$ ]5 z
9 W9 W6 W1 }! @! x8 N9 f8 b  io:! O1 O' a& z3 T7 Z
    recovery: 95900 kB/s, 24 objects/s
: @8 z' T6 y) W : v% C; v' M3 s+ l
[root@osd1 ~]# ceph health detail
8 P( f4 `9 C* J8 k: M- [HEALTH_ERR 2 backfillfull osd(s); 2 pool(s) backfillfull; 2792612/6685016 objects misplaced (41.774%); Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded; Degraded data redundancy (low space): 2 pgs backfill_toofull
) z! c) A1 O# ]0 t$ n2 J8 U/ \2 IOSD_BACKFILLFULL 2 backfillfull osd(s)& @2 l6 T4 U& k6 S- Q
    osd.8 is backfill full
0 ~7 ~) t. C2 {! u6 i    osd.9 is backfill full/ y7 ]9 J6 A, w1 Y+ t9 S" D- @
POOL_BACKFILLFULL 2 pool(s) backfillfull
# |2 T( Y2 q, f! A: u& x! h    pool 'cephfs_data' is backfillfull, a! }) o( O( W& B3 k$ L! C% k- E4 \
    pool 'cephfs_metadata' is backfillfull& n: ~! K! T6 e' m- q
OBJECT_MISPLACED 2792612/6685016 objects misplaced (41.774%)
% M" T+ D1 ]% a8 jPG_DEGRADED Degraded data redundancy: 2/6685016 objects degraded (0.000%), 1 pg degraded! h0 B/ p, k1 Q; v1 K! p: H
    pg 1.3a is active+recovery_wait+degraded, acting [11,2]
5 C) G  e. U: O+ {3 `PG_DEGRADED_FULL Degraded data redundancy (low space): 2 pgs backfill_toofull
( [: z& x; {: ]1 d' G0 d/ z    pg 1.33 is active+remapped+backfill_wait+backfill_toofull, acting [12,4]" f; E% w) Q1 p/ l
    pg 1.a6 is active+remapped+backfill_wait+backfill_toofull, acting [7,14]! S. M2 L! K, k2 p
Earlier when I started only pg 1.33 was showing backfill_toofull, and now we have  pg 1.33 and 1.a6 both showing.  Lets wait for the dust to settle after our last change before making any more adjustments.
0 R9 @$ F5 v- Y( f" M0 S# zThe Recovery Process
) {& O& f4 p2 @0 x/ PAfter 24 hours it’s looking good, no errors, but it’s still got going through a recover process.  We’re down from 42% to 18% objects misplaced, and our OSDs no longer have any backfill error messages, so looks like we’re on the right path.# d+ w8 d3 K3 c' s* N6 y
[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1235611/6685016 objects misplaced (18.483%)
            Degraded data redundancy (low space): 5 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 57 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   25062 GB used, 18609 GB / 43671 GB avail
    pgs:     1235611/6685016 objects misplaced (18.483%)
             327 active+clean
             49  active+remapped+backfill_wait
             5   active+remapped+backfill_wait+backfill_toofull
             3   active+remapped+backfilling

  io:
    recovery: 38584 kB/s, 9 objects/s

[root@osd1 ~]# ceph -s
  cluster:
    id:     ffdb9e09-fdca-48bb-b7fb-cd17151d5c09
    health: HEALTH_ERR
            1235327/6685016 objects misplaced (18.479%)
            Degraded data redundancy (low space): 5 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum osd1,osd2,osd3
    mgr: osd1(active), standbys: osd2
    mds: cephfs-2/2/2 up  {0=osd1=up:active,1=osd2=up:active}, 1 up:standby
    osd: 15 osds: 15 up, 15 in; 57 remapped pgs

  data:
    pools:   2 pools, 384 pgs
    objects: 3264k objects, 12342 GB
    usage:   25063 GB used, 18608 GB / 43671 GB avail
    pgs:     1235327/6685016 objects misplaced (18.479%)
             327 active+clean
             49  active+remapped+backfill_wait
             5   active+remapped+backfill_wait+backfill_toofull
             3   active+remapped+backfilling

  io:
    recovery: 32430 kB/s, 8 objects/s

[root@osd1 ~]# ceph osd status
+----+-------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id |           host          |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+-------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | osd1.example.com | 1789G | 1004G |    0   |     0   |    0   |     0   | exists,up |
| 1  | osd2.example.com | 2228G |  566G |    0   |     0   |    0   |     0   | exists,up |
| 2  | osd3.example.com | 2270G |  524G |    0   |     0   |    0   |     0   | exists,up |
| 3  | osd4.example.com | 2164G |  629G |    0   |     0   |    0   |     0   | exists,up |
| 4  | osd1.example.com | 2069G |  725G |    0   |     0   |    0   |     0   | exists,up |
| 5  | osd1.example.com | 1454G | 1339G |    0   |     0   |    0   |     0   | exists,up |
| 6  | osd2.example.com |  485G |  446G |    0   |     0   |    0   |     0   | exists,up |
| 7  | osd3.example.com |  437G |  494G |    0   |     0   |    0   |     0   | exists,up |
| 8  | osd3.example.com |  627G |  303G |    0   |     0   |    0   |     0   | exists,up |
| 9  | osd4.example.com |  771G |  159G |    0   |     0   |    0   |     0   | exists,up |
| 10 | osd4.example.com |  339G |  591G |    0   |     0   |    0   |     0   | exists,up |
| 11 | osd5.example.com | 2464G | 3124G |    0   |     0   |    0   |     0   | exists,up |
| 12 | osd5.example.com | 2174G | 3414G |    0   |     0   |    0   |     0   | exists,up |
| 13 | osd5.example.com | 3418G | 2170G |    0   |     0   |    0   |     0   | exists,up |
| 14 | osd5.example.com | 2367G | 3112G |    0   |     0   |    0   |     0   | exists,up |
+----+-------------------------+-------+-------+--------+---------+--------+---------+-----------+
[root@osd1 ~]#
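As a rough cross-check of the table above, the used and avail columns can be turned into a fill fraction per OSD and compared against the full ratios. This is a sketch, not the author's method: it assumes the cluster still uses the Ceph default ratios (nearfull 0.85, backfillfull 0.90), the helper names are made up, and used/(used+avail) is only an approximation of the %USE figure that `ceph osd df` reports. The rows are copied from the table with the ops and state columns trimmed.

```python
import re

# Assumed Ceph defaults; a cluster may have been configured differently.
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90

# id, host, used and avail columns copied from the `ceph osd status` table above.
OSD_STATUS = """\
| 0  | osd1.example.com | 1789G | 1004G |
| 1  | osd2.example.com | 2228G |  566G |
| 2  | osd3.example.com | 2270G |  524G |
| 3  | osd4.example.com | 2164G |  629G |
| 4  | osd1.example.com | 2069G |  725G |
| 5  | osd1.example.com | 1454G | 1339G |
| 6  | osd2.example.com |  485G |  446G |
| 7  | osd3.example.com |  437G |  494G |
| 8  | osd3.example.com |  627G |  303G |
| 9  | osd4.example.com |  771G |  159G |
| 10 | osd4.example.com |  339G |  591G |
| 11 | osd5.example.com | 2464G | 3124G |
| 12 | osd5.example.com | 2174G | 3414G |
| 13 | osd5.example.com | 3418G | 2170G |
| 14 | osd5.example.com | 2367G | 3112G |
"""

# Captures the OSD id, used GB and avail GB from one table row.
ROW = re.compile(r"^\|\s*(\d+)\s*\|[^|]*\|\s*([\d.]+)G\s*\|\s*([\d.]+)G\s*\|", re.M)


def utilization(osd_status):
    """Return {osd id: used / (used + avail)} for every row in the table text."""
    return {
        int(osd): float(used) / (float(used) + float(avail))
        for osd, used, avail in ROW.findall(osd_status)
    }


def osds_over(osd_status, ratio):
    """Return the sorted ids of OSDs whose fill fraction is at or above ratio."""
    return sorted(osd for osd, frac in utilization(osd_status).items() if frac >= ratio)


fill = utilization(OSD_STATUS)
fullest = max(fill, key=fill.get)
print("fullest OSD:", fullest, "at", round(fill[fullest], 3))
print("at or above nearfull:", osds_over(OSD_STATUS, NEARFULL_RATIO))
print("at or above backfillfull:", osds_over(OSD_STATUS, BACKFILLFULL_RATIO))
```

Under those assumptions no OSD in this snapshot reaches either threshold, which fits the state column showing only exists,up.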
The recovery process is looking good. I’ll check back again tomorrow to make sure it’s finished and all of our alerts have cleared.
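For a rough idea of whether "tomorrow" is realistic, the misplaced counts from the two health outputs (2792612 before, 1235611 now) can be extrapolated. This is a back-of-the-envelope sketch: it assumes exactly 24 hours passed between the two readings and that recovery keeps the same average pace, and the function name is invented for illustration.

```python
def hours_remaining(misplaced_before, misplaced_now, hours_elapsed):
    """Linear estimate of the hours left until no objects are misplaced."""
    # Average number of misplaced objects cleared per hour so far.
    rate_per_hour = (misplaced_before - misplaced_now) / hours_elapsed
    if rate_per_hour <= 0:
        raise ValueError("no progress between the two readings")
    return misplaced_now / rate_per_hour


# Counts taken from the health output in this post; the 24 hours is an assumption.
print(round(hours_remaining(2792612, 1235611, 24), 1))
```

PGs that are still backfill_toofull will not move until space frees up on their target OSDs, so the real finish time may be later than a straight-line estimate suggests.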
Once that is done I’ll make one more adjustment on the pg_num to bring it up to the right level for the number of our OSDs.
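The post does not say what the final pg_num will be. One common starting point is the rule of thumb from the Ceph placement group documentation: about 100 PGs per OSD, divided by the replica count, rounded up to a power of two. The sketch below applies it under assumptions: a replica size of 2 is inferred from the two-OSD acting sets shown earlier, the result is a total across all pools rather than a per-pool value, and the function name is invented.

```python
def suggested_total_pgs(num_osds, replica_size, target_per_osd=100):
    """Rule-of-thumb PG total for the cluster, rounded up to a power of two."""
    raw = num_osds * target_per_osd / replica_size
    power = 1
    # Double until we reach or pass the raw figure.
    while power < raw:
        power *= 2
    return power


# 15 OSDs are reported above; a replica size of 2 is an assumption.
print(suggested_total_pgs(15, 2))
```

With these inputs the raw figure is 750, which rounds up to 1024; the next power of two down is 512. Since the cluster currently has 384 PGs across both pools, either choice is a sizeable jump, and the earlier data movement is a reason to raise pg_num in steps rather than all at once.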