1 large omap objects (ceph health detail)

Posted 2022-8-19 17:00:37
Large omap objects
# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'is_recovery' # the pool containing the large omap object
    Search the cluster log for 'Large omap object found' for more details.

ceph pg ls-by-pool is_recovery|awk '{print "ceph pg "$1 " query|grep num_large_omap_objects"}'|sh -x
ceph pg 11.0 query|grep num_large_omap_objects
ceph pg 11.1 query|grep num_large_omap_objects
ceph pg 11.2 query|grep num_large_omap_objects
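
The awk one-liner above emits a command for every line it receives, including any header line. A minimal Python sketch of the same step, assuming the PG ID is the first column of the `ceph pg ls-by-pool` text output; the helper names `pg_ids_from_listing` and `query_commands` and the sample listing are made up for illustration:

```python
import re

# A PG ID looks like "<pool id>.<hex seed>", e.g. 11.0 or 11.1e6.
PG_ID = re.compile(r"^\d+\.[0-9a-f]+$")


def pg_ids_from_listing(listing):
    """Return the PG IDs found in the first column of a text listing."""
    ids = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, headers and footers: only real PG IDs match.
        if fields and PG_ID.match(fields[0]):
            ids.append(fields[0])
    return ids


def query_commands(pg_ids):
    """Build one 'ceph pg <id> query' shell command per PG ID."""
    return ["ceph pg %s query | grep num_large_omap_objects" % pg for pg in pg_ids]


if __name__ == "__main__":
    # Hypothetical listing, not real cluster output.
    sample = "PG_STAT OBJECTS STATE\n11.0 2 active+clean\n11.1 0 active+clean\n"
    for cmd in query_commands(pg_ids_from_listing(sample)):
        print(cmd)
```

The sketch only builds the command strings; running them is left to the operator.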

[root@ceph-1 ~]# ceph daemon mds.ceph-1 flush journal
{
    "message": "",
    "return_code": 0
}
[root@ceph-1 ~]#
[root@ceph-2 ~]# ceph daemon mds.ceph-2 flush journal
"mds_not_active"
[root@ceph-2 ~]# ceph daemon mds.ceph-2 flush journal
"mds_not_active"


Original poster | Posted 2022-8-23 09:53:54
Handling large omap objects in the index pool
While stress-testing a single bucket with 20 million objects, with the shard count at the default setting of 16, a large omap warning appeared at about 18 million objects. This post describes how to locate the problem and how to handle it.

Locating the problem
The cluster status is as follows:

[root@demo123 cephuser]# ceph health detail
HEALTH_WARN 16 large omap objects
LARGE_OMAP_OBJECTS 16 large omap objects
    16 large objects found in pool 'cn-bj-test2.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.

Use a script to find the affected PGs; for the script, see the earlier article on handling large omap objects.

[root@demo123 cephuser]# python large_omap.py
Large omap objects poolname = cn-bj-test2.rgw.buckets.index
pgid=13.1f OSDs=[78, 9, 59] num_large_omap_objects=1
pgid=13.33 OSDs=[59, 79, 19] num_large_omap_objects=1
pgid=13.3c OSDs=[49, 29, 78] num_large_omap_objects=1
pgid=13.3d OSDs=[48, 69, 9] num_large_omap_objects=1
pgid=13.45 OSDs=[88, 39, 28] num_large_omap_objects=1
pgid=13.4d OSDs=[38, 29, 89] num_large_omap_objects=1
pgid=13.50 OSDs=[68, 19, 59] num_large_omap_objects=1
pgid=13.6b OSDs=[39, 79, 8] num_large_omap_objects=1
pgid=13.8e OSDs=[38, 9, 78] num_large_omap_objects=1
pgid=13.d1 OSDs=[9, 88, 38] num_large_omap_objects=1
pgid=13.d2 OSDs=[59, 88, 28] num_large_omap_objects=1
pgid=13.e1 OSDs=[19, 88, 49] num_large_omap_objects=1
pgid=13.e4 OSDs=[38, 19, 89] num_large_omap_objects=1
pgid=13.e7 OSDs=[19, 89, 38] num_large_omap_objects=1
pgid=13.ec OSDs=[89, 28, 48] num_large_omap_objects=1
pgid=13.f5 OSDs=[38, 88, 19] num_large_omap_objects=1

Searching the OSD log identifies the object name (".dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11") and shows that its omap key count has reached 2378492, which exceeds the default warning threshold.

[root@demo123 cephuser]# zcat /var/log/ceph/ceph-osd.19.log-20181231.gz |grep "omap"
2018-12-30 23:00:42.334766 7f6583f44700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 13:87443b2d:::.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11:head Key count: 2378492 Size (bytes): 491722758

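The warning line above can be picked apart with a small parser. This is a sketch only: it assumes the object field has the colon-separated layout seen in the log lines in this thread (pool id, hash, two empty fields, object name, snapshot), and `parse_large_omap_line` is a hypothetical helper name:

```python
import re

LOG_RE = re.compile(
    r"Large omap object found\. Object: (?P<object>\S+) "
    r"Key count: (?P<keys>\d+) Size \(bytes\): (?P<size>\d+)"
)


def parse_large_omap_line(line):
    """Extract pool id, object name, key count and size from one log line.

    Returns None when the line is not a 'Large omap object found' warning.
    """
    match = LOG_RE.search(line)
    if match is None:
        return None
    spec = match.group("object")
    # Assumed layout: <pool>:<hash>:<ns>:<key>:<name>:<snap>
    body, snap = spec.rsplit(":", 1)
    fields = body.split(":", 4)
    return {
        "pool_id": int(fields[0]),
        "name": fields[4],
        "snap": snap,
        "key_count": int(match.group("keys")),
        "size_bytes": int(match.group("size")),
    }
```

Feeding it the log line shown above should yield pool id 13 and the `.dir.` object name of shard 11.
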
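The default warning threshold is 2000000 keys, and 2378492 > 2000000. Changing this default is not recommended: raising it too far increases the risk of cluster problems and only hides the symptom instead of fixing it.
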
[root@demo123 cephuser]# ceph daemon /var/run/ceph/ceph-osd.19.asok config show |grep large
    "osd_bench_large_size_max_throughput": "104857600",
    "osd_deep_scrub_large_omap_object_key_threshold": "2000000",
    "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",

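To compare an object against both thresholds, the grep output above can be read into a dictionary. A rough sketch, assuming the warning fires when either value is strictly greater than its threshold (that comparison is my assumption, not something this thread confirms); `parse_settings` and `exceeds_thresholds` are hypothetical names:

```python
import re

# Matches lines of the form:   "name": "value",
SETTING_RE = re.compile(r'"(?P<name>[^"]+)":\s*"(?P<value>[^"]*)"')

KEY_LIMIT = "osd_deep_scrub_large_omap_object_key_threshold"
SIZE_LIMIT = "osd_deep_scrub_large_omap_object_value_sum_threshold"


def parse_settings(text):
    """Turn '"name": "value",' lines into a {name: value} dictionary."""
    return {m.group("name"): m.group("value") for m in SETTING_RE.finditer(text)}


def exceeds_thresholds(key_count, size_bytes, settings):
    """Return True when either the key count or the byte size is over its limit."""
    return (key_count > int(settings[KEY_LIMIT])
            or size_bytes > int(settings[SIZE_LIMIT]))
```

With the values from this thread, the object trips the key-count limit (2378492 keys) while its size (491722758 bytes) stays under the 1073741824-byte limit.
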
Next, look at the bucket whose omap has grown too large and collect its details:

[root@demo123 cephuser]# radosgw-admin bucket stats --bucket=demo1
{
    "bucket": "demo1",
    "zonegroup": "68f1dcf5-0470-4a48-8cd2-51c837a2cafb",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "afd874cd-f976-4007-a77c-be6fca298b71.34209.1", # current bucket instance ID
    "marker": "afd874cd-f976-4007-a77c-be6fca298b71.34209.1",
    "index_type": "Normal",
    "owner": "s3test",
    "ver": "0#2638037,1#2637965,2#2632835,3#2632869,4#2632799,5#2632597,6#2633289,7#2633175,8#2637227,9#2637609,10#2637997,11#2632455,12#2631337,13#2631624,14#2631983,15#2632359",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0", # 16 shards
    "mtime": "2018-11-28 16:47:45.560039",
    "max_marker": "0#00002638036.2638608.5,1#00002637964.2638536.5,2#00002632834.2649479.5,3#00002632868.2633634.5,4#00002632798.2633370.5,5#00002632596.2633168.5,6#00002633288.2633860.5,7#00002633174.2633747.5,8#00002637226.2637798.5,9#00002637608.2638181.5,10#00002637996.2638569.5,11#00002632454.2633026.5,12#00002631336.2631914.5,13#00002631623.2632195.5,14#00002631982.2632554.5,15#00002632358.2632930.5",
    "usage": {
        "rgw.main": {
            "size": 1975757355553,
            "size_actual": 2047893610496,
            "size_utilized": 1975757355553,
            "size_kb": 1929450543,
            "size_kb_actual": 1999896104,
            "size_kb_utilized": 1929450543,
            "num_objects": 19998962 # nearly 20 million objects
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

Handling the problem
Run a bucket reshard to redistribute the bucket index across more shards, here from 16 to 64. Resharding carries risk, so it is best to stop client reads and writes before running it. If you use multisite, disable the Dynamic resharding feature right away, as the official documentation instructs.

Dynamic resharding documentation: http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/

After the reshard, the old index data has to be deleted by hand; the tool prints a notice to that effect.

[root@demo123 cephuser]# radosgw-admin bucket reshard --bucket demo1 --num-shards 64
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
tenant:
bucket name: demo1
old bucket instance id: afd874cd-f976-4007-a77c-be6fca298b71.34209.1
new bucket instance id: afd874cd-f976-4007-a77c-be6fca298b71.45786.1
total entries: 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 19998962
2019-01-03 11:42:33.741314 7f74d15c6dc0  0 WARNING: RGWReshard::add failed to drop lock on demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1 ret=-2

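The 16 to 64 change can be sanity-checked with simple arithmetic on the object count reported for the bucket. This is a back-of-the-envelope sketch; the per-shard target of 100000 objects is an assumed planning figure that does not come from this thread, and both function names are hypothetical:

```python
import math


def objects_per_shard(num_objects, num_shards):
    """Average number of objects indexed by each shard."""
    return num_objects / num_shards


def shards_needed(num_objects, target_per_shard):
    """Smallest shard count that keeps the average at or below the target."""
    return math.ceil(num_objects / target_per_shard)


if __name__ == "__main__":
    num_objects = 19998962  # num_objects from the bucket stats in this thread
    print(objects_per_shard(num_objects, 16))
    print(objects_per_shard(num_objects, 64))
    # 100000 is an assumed target, used only for illustration.
    print(shards_needed(num_objects, 100000))
```

Object count per shard is only a rough proxy for omap key count: the log above reported 2378492 keys for shard 11, more than the per-shard object average at 16 shards.
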
Check the reshard result

[root@demo123 cephuser]# radosgw-admin bucket stats --bucket=demo1
{
    "bucket": "demo1",
    "zonegroup": "68f1dcf5-0470-4a48-8cd2-51c837a2cafb",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "afd874cd-f976-4007-a77c-be6fca298b71.45786.1", # the bucket instance ID has changed
    "marker": "afd874cd-f976-4007-a77c-be6fca298b71.34209.1",
    "index_type": "Normal",
    "owner": "s3test",
    "ver": "0#4920,1#4920,2#4883,3#4877,4#4882,5#4883,6#4885,7#4880,8#4882,9#4880,10#4878,11#4883,12#4923,13#4883,14#4882,15#4874,16#4878,17#4880,18#4884,19#4881,20#4882,21#4881,22#4876,23#4922,24#4883,25#4887,26#4881,27#4879,28#4879,29#4879,30#4882,31#4884,32#4880,33#4879,34#4917,35#4876,36#4883,37#4885,38#4884,39#4879,40#4883,41#4880,42#4880,43#4882,44#4884,45#4877,46#4879,47#4877,48#4881,49#4880,50#4881,51#4881,52#4883,53#4876,54#4880,55#4884,56#4881,57#4885,58#4882,59#4881,60#4881,61#4881,62#4883,63#4882", # the shard count is now 64
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0",
    "mtime": "2019-01-03 11:32:50.349905",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#,32#,33#,34#,35#,36#,37#,38#,39#,40#,41#,42#,43#,44#,45#,46#,47#,48#,49#,50#,51#,52#,53#,54#,55#,56#,57#,58#,59#,60#,61#,62#,63#",
    "usage": {
        "rgw.main": {
            "size": 1975757355553,
            "size_actual": 2047893610496,
            "size_utilized": 1975757355553,
            "size_kb": 1929450543,
            "size_kb_actual": 1999896104,
            "size_kb_utilized": 1929450543,
            "num_objects": 19998962
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

Reclaiming old data
As the tool's notice said, leftover data has to be reclaimed from two pools: index and meta.

Reclaiming the index pool data

[root@demo123 cephuser]# rados ls -p cn-bj-test2.rgw.buckets.index|grep "afd874cd-f976-4007-a77c-be6fca298b71.34209.1"
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.5
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.15
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.2
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.1
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.0
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.4
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.13
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.6
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.3
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.7
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.9
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.14
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.10
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.12
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.8

Delete the data with the rados rm command:

[root@demo123 supdev]# rados ls -p cn-bj-test2.rgw.buckets.index|grep "afd874cd-f976-4007-a77c-be6fca298b71.34209.1"|awk '{print "rados rm -p cn-bj-test2.rgw.buckets.index "$1}'|sh -x
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.5
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.15
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.2
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.1
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.0
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.4
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.13
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.6
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.3
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.7
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.9
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.14
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.10
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.12
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.8

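A plain grep on the instance ID is a substring match, so it would also catch another instance whose ID merely begins with the same characters (for example one ending in .34209.15). A stricter check, sketched here under the assumption that index objects are named `.dir.<instance id>.<shard number>` as in the listing above (both helper names are hypothetical):

```python
def old_index_objects(instance_id, num_shards):
    """List the expected index object names for every shard of one instance."""
    return [".dir.%s.%d" % (instance_id, shard) for shard in range(num_shards)]


def is_old_index_object(name, instance_id):
    """True only for '.dir.<instance_id>.<digits>', not for look-alike IDs."""
    prefix = ".dir.%s." % instance_id
    return name.startswith(prefix) and name[len(prefix):].isdigit()


if __name__ == "__main__":
    old_id = "afd874cd-f976-4007-a77c-be6fca298b71.34209.1"
    for name in old_index_objects(old_id, 16):
        print(name)
```

Comparing the generated list with the `rados ls` output before deleting anything is a cheap safeguard, since `rados rm` cannot be undone.
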
Reclaiming the meta pool data

[root@demo123 cephuser]# rados ls -p cn-bj-test2.rgw.meta --all
root    demo1
root    .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.45786.1
root    .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1 # leftover
root    my-new-container_segments
root    .bucket.meta.demo2:afd874cd-f976-4007-a77c-be6fca298b71.34353.1
root    .bucket.meta.my-new-container:afd874cd-f976-4007-a77c-be6fca298b71.7991.1
users.uid    s3test.buckets
users.uid    swiftuser
users.swift    swiftuser:swiftuser1
users.keys    SNACA4LX9DS21NGMSRX4
root    .bucket.meta.my-new-container_segments:afd874cd-f976-4007-a77c-be6fca298b71.7991.4
users.uid    s3test
root    demo2
users.keys    XP8E2452AB6EBU3RPD0C
root    my-new-container
users.uid    swiftuser.buckets
users.uid    synchronization-user

Note that this cluster runs Ceph L (Luminous), which uses namespaces, so the namespace has to be specified for the delete to work.

[root@demo123 cephuser]# rados rm  -p cn-bj-test2.rgw.meta .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1 --namespace=root
[root@demo123 cephuser]# rados ls -p cn-bj-test2.rgw.meta --all
root    demo1
root    .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.45786.1
root    my-new-container_segments
root    .bucket.meta.demo2:afd874cd-f976-4007-a77c-be6fca298b71.34353.1
root    .bucket.meta.my-new-container:afd874cd-f976-4007-a77c-be6fca298b71.7991.1
users.uid    s3test.buckets
users.uid    swiftuser
users.swift    swiftuser:swiftuser1
users.keys    SNACA4LX9DS21NGMSRX4
root    .bucket.meta.my-new-container_segments:afd874cd-f976-4007-a77c-be6fca298b71.7991.4
users.uid    s3test
root    demo2
users.keys    XP8E2452AB6EBU3RPD0C
root    my-new-container
users.uid    swiftuser.buckets
users.uid    synchronization-user

Clearing the large omap warning
Deleting the objects does not clear the warning by itself; each affected PG has to be deep-scrubbed manually, as follows:

[root@demo123 cephuser]# python large_omap.py
Large omap objects poolname = cn-bj-test2.rgw.buckets.index
pgid=13.33 OSDs=[59, 79, 19] num_large_omap_objects=1
pgid=13.3c OSDs=[49, 29, 78] num_large_omap_objects=1
pgid=13.3d OSDs=[48, 69, 9] num_large_omap_objects=1
pgid=13.45 OSDs=[88, 39, 28] num_large_omap_objects=1
pgid=13.4d OSDs=[38, 29, 89] num_large_omap_objects=1
pgid=13.50 OSDs=[68, 19, 59] num_large_omap_objects=1
pgid=13.6b OSDs=[39, 79, 8] num_large_omap_objects=1
pgid=13.8e OSDs=[38, 9, 78] num_large_omap_objects=1
pgid=13.d1 OSDs=[9, 88, 38] num_large_omap_objects=1
pgid=13.d2 OSDs=[59, 88, 28] num_large_omap_objects=1
pgid=13.e1 OSDs=[19, 88, 49] num_large_omap_objects=1
pgid=13.e4 OSDs=[38, 19, 89] num_large_omap_objects=1
pgid=13.e7 OSDs=[19, 89, 38] num_large_omap_objects=1
pgid=13.ec OSDs=[89, 28, 48] num_large_omap_objects=1
pgid=13.f5 OSDs=[38, 88, 19] num_large_omap_objects=1
[root@demo123 cephuser]# ceph pg deep-scrub 13.33
instructing pg 13.33 on osd.59 to deep-scrub

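Typing one deep-scrub command per PG gets tedious with this many PGs. A small sketch that turns the script's report lines into `ceph pg deep-scrub` commands; it assumes the exact `pgid=... OSDs=[...] num_large_omap_objects=N` line format printed above, and `deep_scrub_commands` is a hypothetical helper name:

```python
import re

PG_LINE = re.compile(
    r"^pgid=(?P<pgid>\S+) OSDs=\[(?P<osds>[^\]]*)\] "
    r"num_large_omap_objects=(?P<count>\d+)$"
)


def deep_scrub_commands(report):
    """Build a deep-scrub command for each PG that reports large omap objects."""
    commands = []
    for line in report.splitlines():
        match = PG_LINE.match(line.strip())
        # Ignore the header line and PGs whose count is zero.
        if match and int(match.group("count")) > 0:
            commands.append("ceph pg deep-scrub " + match.group("pgid"))
    return commands
```

The function only returns command strings. Deep scrubs add load, so it is probably wiser to issue them a few at a time than all at once.
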
After the command runs, a PG can be seen deep-scrubbing, and the status recovers once the scrub finishes:

[root@demo123 cephuser]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_WARN
            16 large omap objects

  services:
    mon: 3 daemons, quorum demo122,demo131,demo141
    mgr: demo141(active)
    osd: 90 osds: 90 up, 90 in
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 20.13M objects, 1.80TiB
    usage:   7.28TiB used, 408TiB / 415TiB avail
    pgs:     3711 active+clean
             1    active+clean+scrubbing+deep # deep scrub has started

  io:
    client:   5.29MiB/s rd, 935B/s wr, 69op/s rd, 28op/s wr

[root@demo123 cephuser]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_WARN
            15 large omap objects # one fewer

  services:
    mon: 3 daemons, quorum demo122,demo131,demo141
    mgr: demo141(active)
    osd: 90 osds: 90 up, 90 in
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 20.13M objects, 1.80TiB
    usage:   7.28TiB used, 408TiB / 415TiB avail
    pgs:     3712 active+clean

  io:
    client:   5.33MiB/s rd, 680B/s wr, 36op/s rd, 6op/s wr

Summary
Large omap warnings on the index pool generally fall into two categories:

One is caused by too many objects, which leads to too many index metadata entries in the corresponding shard; it can be handled with the method above.
The other is caused by too many bilog entries. The method here does not apply to it, and the bilog has to be trimmed manually; bilog will be covered in detail in a later chapter.


Original poster | Posted 2022-8-23 09:54:43
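A production multisite environment reported HEALTH_WARN 32 large omap objects. Bucket auto reshard was already set to false, so an oversized omap on a bucket index shard was ruled out as the cause. The official warning message does not point to the specific object, which led to the troubleshooting process below.

Troubleshooting process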
[root@demo supdev]# ceph health detail
HEALTH_WARN 32 large omap objects
LARGE_OMAP_OBJECTS 32 large omap objects
    32 large objects found in pool 'cn-bj-test1.rgw.log' # the pool containing the large omap objects
    Search the cluster log for 'Large omap object found' for more details.

[root@demo supdev]# ceph pg ls-by-pool cn-bj-test1.rgw.log |awk '{print "ceph pg "$1 " query|grep num_large_omap_objects"}'|sh -x
ceph pg 11.0 query|grep num_large_omap_objects
ceph pg 11.1 query|grep num_large_omap_objects
ceph pg 11.2 query|grep num_large_omap_objects
......
+ ceph pg 11.1e6 query
+ grep num_large_omap_objects
                "num_large_omap_objects": 1 # number of objects with a large omap
                    "num_large_omap_objects": 0
                    "num_large_omap_objects": 0

[root@demo supdev]# ceph pg 11.1e6 query # query the PG details
{
    "state": "active+clean",
.....
    "info": {
        "pgid": "11.1e6",
        "last_update": "10075'3051746",
        "last_complete": "10075'3051746",
        "log_tail": "10075'3050200",
        "last_user_version": 3051746,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": [],
.....

              "acting": [
                    46, # primary OSD, id=46
                    63, # replica OSD
                    23  # replica OSD
                ],
            "stat_sum": {
                "num_bytes": 40,
                "num_objects": 2,
                "num_object_clones": 0,
                "num_object_copies": 6,
                "num_objects_missing_on_primary": 0,
                "num_objects_missing": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 2,
                "num_whiteouts": 0,
                "num_read": 3055759,
                "num_read_kb": 3056162,
                "num_write": 5986011,
                "num_write_kb": 53,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 0,
                "num_bytes_recovered": 0,
                "num_keys_recovered": 0,
                "num_objects_omap": 1,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0,
                "num_flush": 0,
                "num_flush_kb": 0,
                "num_evict": 0,
                "num_evict_kb": 0,
                "num_promote": 0,
                "num_flush_mode_high": 0,
                "num_flush_mode_low": 0,
                "num_evict_mode_some": 0,
                "num_evict_mode_full": 0,
                "num_objects_pinned": 0,
                "num_legacy_snapsets": 0,
                "num_large_omap_objects": 1 # number of large omap objects
            },
            ...
                "agent_state": {}
}
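
Instead of grepping, the query output can be read as JSON. The sketch below searches the whole document for a key rather than relying on a fixed path, because the exact nesting is elided in the output above; treating the first entry of "acting" as the primary follows the annotation in this post. `find_values` and `summarize_pg_query` are hypothetical helper names:

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: the key may appear again at deeper levels.
            for found in find_values(value, key):
                yield found
    elif isinstance(node, list):
        for item in node:
            for found in find_values(item, key):
                yield found


def summarize_pg_query(text):
    """Return all large-omap counters and the first acting OSD of a PG query."""
    doc = json.loads(text)
    counts = list(find_values(doc, "num_large_omap_objects"))
    acting = next(find_values(doc, "acting"), None)
    return {
        "large_omap_counts": counts,
        "primary_osd": acting[0] if acting else None,
    }
```

Real `ceph pg query` output has no inline `#` annotations, so it parses as plain JSON; the annotated listing above would not.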

[root@demo supdev]# ceph osd find 46 # find the host for this OSD id
{
    "osd": 46,
    "ip": "100.1.1.40:6812/3691515",
    "crush_location": {
        "host": "TX-100-1-40-sata",
        "media": "site1-rack2-sata",
        "mediagroup": "site1-sata",
        "root": "default"
    }
}

[root@demo supdev]# zcat /var/log/ceph/ceph-osd.46.log-20181210.gz |grep omap # find the object name in the OSD log
2018-12-09 23:03:18.803799 7f90e9b46700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 11:67885262:::sync.error-log.3:head Key count: 2934286 Size (bytes): 657040594
# On OSD 46, the omap of the object named sync.error-log.3 exceeds the threshold

[root@demo supdev]# rados ls -p cn-bj-test1.rgw.log|grep "sync.error-log.3$" # confirm the object exists
sync.error-log.3

# Note: error log entries from the whole multisite sync process are stored as omap in sync.error-log.*
# A complaint: the error log is split across 32 shards, a number hard-coded in the source, and for now it can only be cleaned up by hand; it cannot be trimmed automatically like the other logs. As error entries kept piling up, they eventually caused today's problem.

[root@demo supdev]# radosgw-admin sync error list|more # view the error log
[
    {
        "shard_id": 0,
        "entries": [
            {
                "id": "1_1540890427.972991_36.1",
                "section": "data",
                "name": "demo2:afd874cd-f976-4007-a77c-be6fca298b71.34353.1:3",
                "timestamp": "2018-10-30 09:07:07.972991Z",
                "info": {
                    "source_zone": "afd874cd-f976-4007-a77c-be6fca298b71",
                    "error_code": 5,
                    "message": "failed to sync bucket instance: (5) Input/output error"
                }
            },
......
            {
                "id": "1_1543395420.626552_32014.1",
                "section": "data",
                "name": "demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1:0/file1205085",
                "timestamp": "2018-11-28 08:57:00.626552Z",
                "info": {
                    "source_zone": "afd874cd-f976-4007-a77c-be6fca298b71",
                    "error_code": 5,
                    "message": "failed to sync object(5) Input/output error"
                }
            }


[root@TX-97-140-6 supdev]# radosgw-admin sync error trim --start-date=2018-11-14 --end-date=2018-11-28 # trim error log records by date

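Before choosing dates for the trim, it helps to know how many entries each shard holds and what period they cover. A minimal sketch that reads the complete JSON from `radosgw-admin sync error list` (not the `|more` excerpt above), assuming the list-of-shards shape shown there; both function names are hypothetical:

```python
import json


def entries_per_shard(text):
    """Map each shard_id to the number of error entries it holds."""
    return {shard["shard_id"]: len(shard.get("entries", []))
            for shard in json.loads(text)}


def timestamp_range(text):
    """Return (oldest, newest) entry timestamps, or None when there are none.

    The timestamps are compared as strings, which works for the
    'YYYY-MM-DD HH:MM:SS' style shown in the listing above.
    """
    stamps = [entry["timestamp"]
              for shard in json.loads(text)
              for entry in shard.get("entries", [])]
    return (min(stamps), max(stamps)) if stamps else None
```

The oldest and newest timestamps give a starting point for the --start-date and --end-date values passed to the trim command.
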
Speeding up the diagnosis
I wrote a simple script that first finds the pool from the warning message and then finds the PGs in that pool that have large omap objects. It is good enough for the job, with no guarantee that it is bug-free; tested on 12.2.10.

[root@demo cephuser]# cat large_obj.py
import json
import rados

ceph_conf_path = '/etc/ceph/ceph.conf'
rados_connect_timeout = 5


class RADOSClient(object):
    # Context manager: connects on creation, disconnects on exit.
    def __init__(self, driver, pool=None):
        self.driver = driver
        self.client, self.ioctx = driver._connect_to_rados(pool)

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        self.driver._disconnect_from_rados(self.client, self.ioctx)


class RBDDriver(object):
    def __init__(self, ceph_conf_path, rados_connect_timeout, pool=None):
        self.ceph_conf_path = ceph_conf_path
        self.rados_connect_timeout = rados_connect_timeout
        self.pool = pool

    def _connect_to_rados(self, pool=None):
        # Use the pool passed in, falling back to the one given to the constructor.
        pool = pool or self.pool
        client = rados.Rados(conffile=self.ceph_conf_path)
        try:
            if self.rados_connect_timeout >= 0:
                client.connect(timeout=self.rados_connect_timeout)
            else:
                client.connect()
            ioctx = client.open_ioctx(pool) if pool is not None else None
            return client, ioctx
        except rados.Error:
            client.shutdown()
            # Raise a real exception; raising a bare string is a TypeError.
            raise RuntimeError("Error connecting to ceph cluster.")

    def _disconnect_from_rados(self, client, ioctx=None):
        if ioctx is not None:
            ioctx.close()
        client.shutdown()


class cmd_manager(object):
    def _mon_json(self, cmd):
        # Send a JSON mon command; return the parsed reply, or False on error.
        with RADOSClient(RBDDriver(ceph_conf_path, rados_connect_timeout)) as dr:
            ret, outbuf, outs = dr.client.mon_command(json.dumps(cmd), b'')
            if ret != 0:
                return False
            return json.loads(outbuf)

    def get_large_omap_obj_poolname(self):
        res_ = self._mon_json({"prefix": "health", "detail": "detail", "format": "json"})
        if not res_:
            return False
        check = res_.get("checks", {}).get("LARGE_OMAP_OBJECTS")
        if not check:
            return False
        # First detail message looks like: 1 large objects found in pool 'name'
        return check['detail'][0]['message'].split("'")[1]

    def get_pg_list_by_pool(self, poolname):
        return self._mon_json({"prefix": "pg ls-by-pool", "poolstr": poolname, "format": "json"})


cmd_ = cmd_manager()
poolname = cmd_.get_large_omap_obj_poolname()
print("Large omap objects poolname = {0}".format(poolname))
# Skip the PG lookup when no pool was reported or the command failed.
res = cmd_.get_pg_list_by_pool(poolname) if poolname else []
for i in res or []:
    if i["stat_sum"]["num_large_omap_objects"] != 0:
        print("pgid={0} OSDs={1} num_large_omap_objects={2}".format(i["pgid"], i["acting"], i["stat_sum"]["num_large_omap_objects"]))
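
The script only reads the first detail message, so it reports one pool at most. Below is a minimal sketch of pulling every pool name out of the health JSON instead. The function name pools_with_large_omap and the sample dict are made up for illustration, and the message wording is assumed to match the "N large objects found in pool '...'" line shown in the health output earlier in this thread. It runs without a cluster.

```python
import re

# Matches the per-pool detail line, e.g. "1 large objects found in pool 'is_recovery'".
POOL_RE = re.compile(r"large objects found in pool '([^']+)'")


def pools_with_large_omap(health):
    """Return every pool named in the LARGE_OMAP_OBJECTS check, in order."""
    check = health.get("checks", {}).get("LARGE_OMAP_OBJECTS")
    if not check:
        return []
    pools = []
    for item in check.get("detail", []):
        match = POOL_RE.search(item.get("message", ""))
        if match and match.group(1) not in pools:
            pools.append(match.group(1))
    return pools


# Illustrative input only, shaped like the JSON the script parses.
sample = {
    "checks": {
        "LARGE_OMAP_OBJECTS": {
            "detail": [
                {"message": "1 large objects found in pool 'is_recovery'"},
                {"message": "Search the cluster log for 'Large omap object found' for more details."},
            ]
        }
    }
}
print(pools_with_large_omap(sample))
```

Matching on the full phrase rather than splitting on quotes keeps the "Search the cluster log for ..." line, which also contains quotes, from being mistaken for a pool name.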
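One more pitfall
If you think the cluster goes back to healthy the moment the omap is cleaned up as described above, that is naive. The warning "HEALTH_WARN 32 large omap objects" just keeps sitting there awkwardly. The omap entries are gone, but the stats of the affected PGs have not been refreshed, so the warning stays until something triggers a PG stats update, by hand or otherwise. I triggered it with ceph pg deep-scrub {pg}. Note that a plain scrub does nothing; it has to be a deep-scrub, which is one more piece of upstream design logic I have to complain about (WTF!). Of course you can also leave it alone and wait for the background deep-scrub to clear it.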
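
As a rough sketch of that last step, the snippet below turns a PG list (the same shape large_obj.py iterates over) into the ceph pg deep-scrub commands to run. The helper name deep_scrub_commands and the sample data are hypothetical. It only prints the commands and does not execute anything, so review the list before running it on a real cluster.

```python
def deep_scrub_commands(pg_stats):
    """Build one `ceph pg deep-scrub <pgid>` command per PG that still reports large omap objects."""
    return [
        "ceph pg deep-scrub {0}".format(pg["pgid"])
        for pg in pg_stats
        if pg.get("stat_sum", {}).get("num_large_omap_objects", 0) > 0
    ]


# Illustrative PG entries only; real ones come from "pg ls-by-pool".
sample_pgs = [
    {"pgid": "11.0", "stat_sum": {"num_large_omap_objects": 0}},
    {"pgid": "11.1", "stat_sum": {"num_large_omap_objects": 1}},
]
for line in deep_scrub_commands(sample_pgs):
    print(line)
```

Deep-scrub adds I/O load, so on a busy cluster it may be better to issue these a few at a time.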