
Large omap objects: ceph health detail

Posted on 2022-8-19 17:00:37
Large omap objects
ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'is_recovery'  # the pool with the large omap object
    Search the cluster log for 'Large omap object found' for more details.
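If the pool name is to be fed into the next step by a script rather than read off the screen, the per-pool line can be parsed. A minimal sketch, assuming the Luminous wording shown above ("N large objects found in pool '<name>'"); other releases may word the line differently, and large_omap_pools is a hypothetical helper name.

```python
import re

# Per-pool line under LARGE_OMAP_OBJECTS, as printed above, e.g.
#     1 large objects found in pool 'is_recovery'
POOL_LINE = re.compile(r"(\d+) large objects found in pool '([^']+)'")


def large_omap_pools(health_detail_text):
    """Map pool name -> number of large omap objects from `ceph health detail` text."""
    pools = {}
    for line in health_detail_text.splitlines():
        match = POOL_LINE.search(line)
        if match:
            pools[match.group(2)] = int(match.group(1))
    return pools


if __name__ == "__main__":
    sample = """HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'is_recovery'
    Search the cluster log for 'Large omap object found' for more details."""
    print(large_omap_pools(sample))
```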

ceph pg ls-by-pool  is_recovery|awk '{print "ceph pg "$1 " query|grep num_large_omap_objects"}'|sh -x
ceph pg 11.0 query|grep num_large_omap_objects
ceph pg 11.1 query|grep num_large_omap_objects
ceph pg 11.2 query|grep num_large_omap_objects
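The awk one-liner also turns the header row of ceph pg ls-by-pool into a bogus "ceph pg <header> query" command, which simply fails. A sketch that keeps only rows whose first column looks like a PG id; the header in the sample below is illustrative only (column names vary between releases), and pg_ids and query_commands are hypothetical names.

```python
import re

# A PG id is "<pool id>.<placement seed in hex>", e.g. 11.0, 13.1f or 11.1e6.
PGID = re.compile(r"^\d+\.[0-9a-f]+$")


def pg_ids(ls_by_pool_text):
    """Return PG ids from plain-text `ceph pg ls-by-pool <pool>` output.

    Only rows whose first column looks like a PG id are kept, so the header
    row and any notes printed around the table are skipped.
    """
    ids = []
    for line in ls_by_pool_text.splitlines():
        fields = line.split()
        if fields and PGID.match(fields[0]):
            ids.append(fields[0])
    return ids


def query_commands(ids):
    """Build one query pipeline per PG, like the awk one-liner does."""
    return ["ceph pg %s query | grep num_large_omap_objects" % pgid for pgid in ids]


if __name__ == "__main__":
    sample = "PG_STAT OBJECTS STATE\n11.0 2 active+clean\n11.1e6 2 active+clean\n"
    for command in query_commands(pg_ids(sample)):
        print(command)
```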

[root@ceph-1 ~]# ceph daemon mds.ceph-1 flush journal
{
    "message": "",
    "return_code": 0
}
[root@ceph-1 ~]#
[root@ceph-2 ~]# ceph daemon mds.ceph-2 flush journal
"mds_not_active"
[root@ceph-2 ~]# ceph daemon mds.ceph-2 flush journal
"mds_not_active"

Original poster | Posted on 2022-8-23 09:53:54
Handling large omap objects in the index pool
A stress test wrote 20 million objects into a single bucket that had the default 16 index shards. Large omap warnings appeared at about 18 million objects. This post walks through locating the problem and fixing it.

Locating the problem
Cluster status:

[root@demo123 cephuser]# ceph health detail
HEALTH_WARN 16 large omap objects
LARGE_OMAP_OBJECTS 16 large omap objects
    16 large objects found in pool 'cn-bj-test2.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.

Use a script to find the affected PGs (the script is in the earlier post on handling large omap objects).

[root@demo123 cephuser]# python large_omap.py
Large omap objects poolname = cn-bj-test2.rgw.buckets.index
pgid=13.1f OSDs=[78, 9, 59] num_large_omap_objects=1
pgid=13.33 OSDs=[59, 79, 19] num_large_omap_objects=1
pgid=13.3c OSDs=[49, 29, 78] num_large_omap_objects=1
pgid=13.3d OSDs=[48, 69, 9] num_large_omap_objects=1
pgid=13.45 OSDs=[88, 39, 28] num_large_omap_objects=1
pgid=13.4d OSDs=[38, 29, 89] num_large_omap_objects=1
pgid=13.50 OSDs=[68, 19, 59] num_large_omap_objects=1
pgid=13.6b OSDs=[39, 79, 8] num_large_omap_objects=1
pgid=13.8e OSDs=[38, 9, 78] num_large_omap_objects=1
pgid=13.d1 OSDs=[9, 88, 38] num_large_omap_objects=1
pgid=13.d2 OSDs=[59, 88, 28] num_large_omap_objects=1
pgid=13.e1 OSDs=[19, 88, 49] num_large_omap_objects=1
pgid=13.e4 OSDs=[38, 19, 89] num_large_omap_objects=1
pgid=13.e7 OSDs=[19, 89, 38] num_large_omap_objects=1
pgid=13.ec OSDs=[89, 28, 48] num_large_omap_objects=1
pgid=13.f5 OSDs=[38, 88, 19] num_large_omap_objects=1

Search the OSD log to identify the object (".dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11"). Its omap key count has reached 2378492, above the default warning threshold.

[root@demo123 cephuser]# zcat /var/log/ceph/ceph-osd.19.log-20181231.gz |grep "omap"
2018-12-30 23:00:42.334766 7f6583f44700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 13:87443b2d:::.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11:head Key count: 2378492 Size (bytes): 491722758
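The object name, key count and size can be pulled out of such log lines with a regular expression, which helps when many OSD logs have to be searched. A sketch, assuming the exact message wording shown above; reading the object field as pool:hash:namespace:key:name:snap is an assumption about the format (consistent with both log lines in this thread), and parse_large_omap_line is a hypothetical helper.

```python
import re

# Warning written by the OSD during deep scrub, as in the log line above.
# Field layout pool:hash:namespace:key:name:snap is an assumption.
LARGE_OMAP_LOG = re.compile(
    r"Large omap object found\. Object: "
    r"(?P<pool>-?\d+):(?P<hash>[0-9a-fA-F]+):(?P<nspace>[^:]*):(?P<key>[^:]*):"
    r"(?P<name>.+):(?P<snap>\w+) "
    r"Key count: (?P<keys>\d+) Size \(bytes\): (?P<size>\d+)"
)


def parse_large_omap_line(line):
    """Return a dict describing the flagged object, or None for any other line."""
    match = LARGE_OMAP_LOG.search(line)
    if match is None:
        return None
    return {
        "pool_id": int(match.group("pool")),
        "namespace": match.group("nspace"),
        "object": match.group("name"),
        "key_count": int(match.group("keys")),
        "size_bytes": int(match.group("size")),
    }


if __name__ == "__main__":
    import sys
    # Usage: zcat ceph-osd.19.log-*.gz | python this_script.py
    for log_line in sys.stdin:
        info = parse_large_omap_line(log_line)
        if info:
            print(info)
```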
The default warning threshold is 2000000 keys, and 2378492 > 2000000. Changing this default is not recommended: raising it too far increases the risk of cluster problems, and it only hides the symptom instead of fixing it.

[root@demo123 cephuser]# ceph daemon /var/run/ceph/ceph-osd.19.asok config show |grep large
    "osd_bench_large_size_max_throughput": "104857600",
    "osd_deep_scrub_large_omap_object_key_threshold": "2000000",
    "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
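These two settings are the limits the deep scrub checks each object against. A small sketch that applies them to the numbers from the log lines in this thread, assuming a strict greater-than comparison; both objects exceed the key-count limit while staying under the 1 GiB size limit. The function name is hypothetical.

```python
# Defaults shown by `config show` above on this 12.2.x cluster; pass other
# values if your cluster is configured differently.
KEY_THRESHOLD = 2000000            # osd_deep_scrub_large_omap_object_key_threshold
VALUE_SUM_THRESHOLD = 1073741824   # osd_deep_scrub_large_omap_object_value_sum_threshold (1 GiB)


def exceeded_thresholds(key_count, size_bytes,
                        key_threshold=KEY_THRESHOLD,
                        value_sum_threshold=VALUE_SUM_THRESHOLD):
    """Return the reasons an object would be flagged (empty if neither limit is exceeded).

    Assumes a strict greater-than comparison against each threshold.
    """
    reasons = []
    if key_count > key_threshold:
        reasons.append("keys %d > %d" % (key_count, key_threshold))
    if size_bytes > value_sum_threshold:
        reasons.append("bytes %d > %d" % (size_bytes, value_sum_threshold))
    return reasons


if __name__ == "__main__":
    print(exceeded_thresholds(2378492, 491722758))
```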
Check the bucket whose index omap is oversized and confirm its details:

[root@demo123 cephuser]# radosgw-admin bucket stats --bucket=demo1
{
    "bucket": "demo1",
    "zonegroup": "68f1dcf5-0470-4a48-8cd2-51c837a2cafb",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "afd874cd-f976-4007-a77c-be6fca298b71.34209.1",  # current bucket instance ID
    "marker": "afd874cd-f976-4007-a77c-be6fca298b71.34209.1",
    "index_type": "Normal",
    "owner": "s3test",
    "ver": "0#2638037,1#2637965,2#2632835,3#2632869,4#2632799,5#2632597,6#2633289,7#2633175,8#2637227,9#2637609,10#2637997,11#2632455,12#2631337,13#2631624,14#2631983,15#2632359",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0",  # 16 shards
    "mtime": "2018-11-28 16:47:45.560039",
    "max_marker": "0#00002638036.2638608.5,1#00002637964.2638536.5,2#00002632834.2649479.5,3#00002632868.2633634.5,4#00002632798.2633370.5,5#00002632596.2633168.5,6#00002633288.2633860.5,7#00002633174.2633747.5,8#00002637226.2637798.5,9#00002637608.2638181.5,10#00002637996.2638569.5,11#00002632454.2633026.5,12#00002631336.2631914.5,13#00002631623.2632195.5,14#00002631982.2632554.5,15#00002632358.2632930.5",
    "usage": {
        "rgw.main": {
            "size": 1975757355553,
            "size_actual": 2047893610496,
            "size_utilized": 1975757355553,
            "size_kb": 1929450543,
            "size_kb_actual": 1999896104,
            "size_kb_utilized": 1929450543,
            "num_objects": 19998962  # nearly 20 million objects
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}
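The shard count can be read off any per-shard field (ver, master_ver, max_marker), each a comma-separated list of shard#value pairs. A sketch assuming that format; real radosgw-admin output is plain JSON without the # annotations added above, so it can go straight into json.loads. parse_shard_map and shard_count are hypothetical names.

```python
import json


def parse_shard_map(value):
    """Split a per-shard field such as "0#2638037,1#2637965" into {shard: value}."""
    shards = {}
    for item in value.split(","):
        shard, _, field = item.partition("#")
        shards[int(shard)] = field
    return shards


def shard_count(bucket_stats_json):
    """Number of index shards, from the "ver" field of `radosgw-admin bucket stats` JSON."""
    return len(parse_shard_map(json.loads(bucket_stats_json)["ver"]))


if __name__ == "__main__":
    import subprocess
    # Hypothetical usage against a live cluster:
    out = subprocess.check_output(["radosgw-admin", "bucket", "stats", "--bucket=demo1"])
    print(shard_count(out.decode()))
```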
Fixing the problem
Reshard the bucket with bucket reshard, re-splitting its index from 16 shards to 64. Resharding is risky: ideally stop client reads and writes before running it. Also, if you use multisite, disable the dynamic resharding feature right away, as the official documentation advises.

Dynamic resharding documentation: http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/
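The choice of 64 can be sanity-checked with arithmetic on the bucket stats: with 19998962 objects, 16 shards average about 1.25 million entries each and 64 shards about 312,000. Shard 11 actually held 2378492 keys, well above the 16-shard average, so the index omap evidently holds more keys than one per object and the average is only a lower bound. For comparison, dynamic resharding uses rgw_max_objs_per_shard (default 100000) as its per-shard target; by that measure this bucket would want 200 shards. A minimal sketch; the helper names are hypothetical.

```python
import math


def average_objects_per_shard(num_objects, num_shards):
    """Average bucket index entries per shard, assuming objects hash evenly."""
    return num_objects / float(num_shards)


def shards_needed(num_objects, max_objects_per_shard):
    """Smallest shard count that keeps the per-shard average at or below the target."""
    return int(math.ceil(num_objects / float(max_objects_per_shard)))


if __name__ == "__main__":
    objects = 19998962  # num_objects from bucket stats above
    for shards in (16, 64):
        print(shards, round(average_objects_per_shard(objects, shards)))
    print(shards_needed(objects, 100000))
```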

After the reshard, the old index data has to be deleted manually; the tool prints a notice saying so:

[root@demo123 cephuser]# radosgw-admin bucket reshard --bucket demo1 --num-shards 64
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
tenant:
bucket name: demo1
old bucket instance id: afd874cd-f976-4007-a77c-be6fca298b71.34209.1
new bucket instance id: afd874cd-f976-4007-a77c-be6fca298b71.45786.1
total entries: 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 19998962
2019-01-03 11:42:33.741314 7f74d15c6dc0  0 WARNING: RGWReshard::add failed to drop lock on demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1 ret=-2

Check the reshard result:
[root@demo123 cephuser]# radosgw-admin bucket stats --bucket=demo1
{
    "bucket": "demo1",
    "zonegroup": "68f1dcf5-0470-4a48-8cd2-51c837a2cafb",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "afd874cd-f976-4007-a77c-be6fca298b71.45786.1",  # the bucket instance ID has changed
    "marker": "afd874cd-f976-4007-a77c-be6fca298b71.34209.1",
    "index_type": "Normal",
    "owner": "s3test",
    "ver": "0#4920,1#4920,2#4883,3#4877,4#4882,5#4883,6#4885,7#4880,8#4882,9#4880,10#4878,11#4883,12#4923,13#4883,14#4882,15#4874,16#4878,17#4880,18#4884,19#4881,20#4882,21#4881,22#4876,23#4922,24#4883,25#4887,26#4881,27#4879,28#4879,29#4879,30#4882,31#4884,32#4880,33#4879,34#4917,35#4876,36#4883,37#4885,38#4884,39#4879,40#4883,41#4880,42#4880,43#4882,44#4884,45#4877,46#4879,47#4877,48#4881,49#4880,50#4881,51#4881,52#4883,53#4876,54#4880,55#4884,56#4881,57#4885,58#4882,59#4881,60#4881,61#4881,62#4883,63#4882",  # the shard count is now 64
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0",
    "mtime": "2019-01-03 11:32:50.349905",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#,32#,33#,34#,35#,36#,37#,38#,39#,40#,41#,42#,43#,44#,45#,46#,47#,48#,49#,50#,51#,52#,53#,54#,55#,56#,57#,58#,59#,60#,61#,62#,63#",
    "usage": {
        "rgw.main": {
            "size": 1975757355553,
            "size_actual": 2047893610496,
            "size_utilized": 1975757355553,
            "size_kb": 1929450543,
            "size_kb_actual": 1999896104,
            "size_kb_utilized": 1929450543,
            "num_objects": 19998962
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}
Reclaim the old data
Following the tool's notice, leftover data has to be cleaned up in two pools: the index pool and the meta pool.

Clean up the index pool

[root@demo123 cephuser]# rados ls -p cn-bj-test2.rgw.buckets.index|grep "afd874cd-f976-4007-a77c-be6fca298b71.34209.1"
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.5
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.15
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.2
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.1
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.0
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.4
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.13
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.6
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.3
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.7
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.9
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.14
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.10
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.12
.dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.8
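The grep above treats every "." in the instance ID as a regex wildcard and matches substrings, so it would also catch a different instance ID that merely starts with the same characters (for example a hypothetical ...34209.12). Since the next step deletes whatever is selected, an exact match is safer. A sketch that keeps only ".dir.<instance_id>.<shard>" names for exactly one instance; index_shard_objects is a hypothetical name, and checking that it returns one object per old shard (16 here) is a reasonable extra guard.

```python
import re


def index_shard_objects(object_names, instance_id):
    """Return ".dir.<instance_id>.<shard>" objects of exactly one bucket instance, sorted by shard.

    re.escape() makes the dots in the instance ID literal, and the anchors stop
    a longer ID that only starts with the same text from matching.
    """
    pattern = re.compile(r"^\.dir\.%s\.(\d+)$" % re.escape(instance_id))
    shards = []
    for raw in object_names:
        name = raw.strip()
        match = pattern.match(name)
        if match:
            shards.append((int(match.group(1)), name))
    return [name for _, name in sorted(shards)]


if __name__ == "__main__":
    import sys
    # Usage: rados ls -p cn-bj-test2.rgw.buckets.index | python this_script.py <old instance id>
    for name in index_shard_objects(sys.stdin, sys.argv[1]):
        print(name)
```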
Delete the objects with rados rm:

[root@demo123 supdev]# rados ls -p cn-bj-test2.rgw.buckets.index|grep "afd874cd-f976-4007-a77c-be6fca298b71.34209.1"|awk '{print "rados rm -p cn-bj-test2.rgw.buckets.index "$1}'|sh -x
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.5
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.15
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.2
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.1
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.0
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.4
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.11
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.13
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.6
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.3
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.7
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.9
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.14
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.10
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.12
+ rados rm -p cn-bj-test2.rgw.buckets.index .dir.afd874cd-f976-4007-a77c-be6fca298b71.34209.1.8
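The same deletion can be done through the python-rados bindings instead of piping generated commands into sh. A sketch, assuming the old instance had 16 shards (numbered 0 to 15, as the listing above shows) and that ioctx is a rados.Ioctx opened on the index pool; remove_old_index_shards is a hypothetical name, it refuses to touch the current instance, and it defaults to a dry run. Ioctx.remove_object raises rados.ObjectNotFound if a shard object is already gone.

```python
def remove_old_index_shards(ioctx, old_instance_id, current_instance_id,
                            num_shards, dry_run=True):
    """Delete ".dir.<old_instance_id>.<0..num_shards-1>" through an Ioctx-like object.

    With dry_run=True (the default) nothing is deleted; the names that would be
    removed are returned either way.
    """
    if old_instance_id == current_instance_id:
        raise ValueError("refusing to delete the index of the current bucket instance")
    names = [".dir.%s.%d" % (old_instance_id, shard) for shard in range(num_shards)]
    if not dry_run:
        for name in names:
            ioctx.remove_object(name)
    return names


# Hypothetical usage (needs python-rados and an admin keyring):
#   import rados
#   cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
#   cluster.connect()
#   ioctx = cluster.open_ioctx("cn-bj-test2.rgw.buckets.index")
#   try:
#       remove_old_index_shards(ioctx,
#                               "afd874cd-f976-4007-a77c-be6fca298b71.34209.1",
#                               "afd874cd-f976-4007-a77c-be6fca298b71.45786.1",
#                               16, dry_run=False)
#   finally:
#       ioctx.close()
#       cluster.shutdown()
```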
Clean up the meta pool

[root@demo123 cephuser]# rados ls -p cn-bj-test2.rgw.meta --all
root    demo1
root    .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.45786.1
root    .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1  # leftover
root    my-new-container_segments
root    .bucket.meta.demo2:afd874cd-f976-4007-a77c-be6fca298b71.34353.1
root    .bucket.meta.my-new-container:afd874cd-f976-4007-a77c-be6fca298b71.7991.1
users.uid    s3test.buckets
users.uid    swiftuser
users.swift    swiftuser:swiftuser1
users.keys    SNACA4LX9DS21NGMSRX4
root    .bucket.meta.my-new-container_segments:afd874cd-f976-4007-a77c-be6fca298b71.7991.4
users.uid    s3test
root    demo2
users.keys    XP8E2452AB6EBU3RPD0C
root    my-new-container
users.uid    swiftuser.buckets
users.uid    synchronization-user

Note that this cluster runs Ceph L (Luminous), where these metadata objects live in namespaces, so the namespace has to be specified for the delete to work:

[root@demo123 cephuser]# rados rm  -p cn-bj-test2.rgw.meta .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1 --namespace=root
[root@demo123 cephuser]# rados ls -p cn-bj-test2.rgw.meta --all
root    demo1
root    .bucket.meta.demo1:afd874cd-f976-4007-a77c-be6fca298b71.45786.1
root    my-new-container_segments
root    .bucket.meta.demo2:afd874cd-f976-4007-a77c-be6fca298b71.34353.1
root    .bucket.meta.my-new-container:afd874cd-f976-4007-a77c-be6fca298b71.7991.1
users.uid    s3test.buckets
users.uid    swiftuser
users.swift    swiftuser:swiftuser1
users.keys    SNACA4LX9DS21NGMSRX4
root    .bucket.meta.my-new-container_segments:afd874cd-f976-4007-a77c-be6fca298b71.7991.4
users.uid    s3test
root    demo2
users.keys    XP8E2452AB6EBU3RPD0C
root    my-new-container
users.uid    swiftuser.buckets
users.uid    synchronization-user
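Spotting the leftover entry by eye works for a small listing; with many buckets it can be filtered. A sketch, assuming rados ls --all prints one "<namespace><TAB><object>" per line and that the current instance ID is the "id" field from bucket stats; stale_bucket_meta and rm_commands are hypothetical names. Review the returned list before running any of the generated commands.

```python
def stale_bucket_meta(ls_all_text, bucket, current_instance_id):
    """Return (namespace, object) pairs of .bucket.meta entries for `bucket`
    whose instance ID differs from the current one."""
    prefix = ".bucket.meta.%s:" % bucket
    stale = []
    for line in ls_all_text.splitlines():
        nspace, sep, name = line.partition("\t")
        if not sep:  # no namespace column on this line
            nspace, name = "", line
        name = name.strip()
        if name.startswith(prefix) and name[len(prefix):] != current_instance_id:
            stale.append((nspace.strip(), name))
    return stale


def rm_commands(pool, stale):
    """Build rados rm commands with an explicit namespace, as used in the post."""
    return ["rados rm -p %s %s --namespace=%s" % (pool, name, nspace)
            for nspace, name in stale]
```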

Clear the large omap warning
Deleting the objects does not clear the warning by itself; the affected PGs have to be deep-scrubbed manually, as follows:

[root@demo123 cephuser]# python large_omap.py
Large omap objects poolname = cn-bj-test2.rgw.buckets.index
pgid=13.33 OSDs=[59, 79, 19] num_large_omap_objects=1
pgid=13.3c OSDs=[49, 29, 78] num_large_omap_objects=1
pgid=13.3d OSDs=[48, 69, 9] num_large_omap_objects=1
pgid=13.45 OSDs=[88, 39, 28] num_large_omap_objects=1
pgid=13.4d OSDs=[38, 29, 89] num_large_omap_objects=1
pgid=13.50 OSDs=[68, 19, 59] num_large_omap_objects=1
pgid=13.6b OSDs=[39, 79, 8] num_large_omap_objects=1
pgid=13.8e OSDs=[38, 9, 78] num_large_omap_objects=1
pgid=13.d1 OSDs=[9, 88, 38] num_large_omap_objects=1
pgid=13.d2 OSDs=[59, 88, 28] num_large_omap_objects=1
pgid=13.e1 OSDs=[19, 88, 49] num_large_omap_objects=1
pgid=13.e4 OSDs=[38, 19, 89] num_large_omap_objects=1
pgid=13.e7 OSDs=[19, 89, 38] num_large_omap_objects=1
pgid=13.ec OSDs=[89, 28, 48] num_large_omap_objects=1
pgid=13.f5 OSDs=[38, 88, 19] num_large_omap_objects=1
[root@demo123 cephuser]# ceph pg deep-scrub 13.33
instructing pg 13.33 on osd.59 to deep-scrub
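The post scrubs only 13.33, and ceph -s afterwards reports 15 instead of 16 large omap objects; each of the other listed PGs needs its own deep scrub before its warning clears. A sketch that turns the large_omap.py output into one ceph pg deep-scrub command per PG, assuming the "pgid=... OSDs=[...] num_large_omap_objects=N" line format shown above; deep_scrub_commands is a hypothetical name. Many simultaneous deep scrubs add load, so you may prefer to run the commands a few at a time.

```python
import re

# Line format printed by large_omap.py above, e.g.
#   pgid=13.33 OSDs=[59, 79, 19] num_large_omap_objects=1
PG_LINE = re.compile(r"pgid=(\S+) OSDs=\[[^\]]*\] num_large_omap_objects=(\d+)")


def deep_scrub_commands(script_output):
    """Return one `ceph pg deep-scrub` command per PG that still reports large omap objects."""
    commands = []
    for line in script_output.splitlines():
        match = PG_LINE.search(line)
        if match and int(match.group(2)) > 0:
            commands.append("ceph pg deep-scrub %s" % match.group(1))
    return commands


if __name__ == "__main__":
    import sys
    # Usage: python large_omap.py | python this_script.py
    print("\n".join(deep_scrub_commands(sys.stdin.read())))
```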
复制
. O0 H3 J( _' H; h8 b5 }操作完可以看到有pg进行dep-scrub,之后状态恢复
& ^4 a; m. M/ r/ A- x3 W5 [5 W, f; |6 g; S& B# x; u8 v
[root@demo123 cephuser]# ceph -s
+ f0 ]- w( y$ z5 D+ M  cluster:
# a- \% G1 f2 {! x: s% M    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed91 K! {" E7 S3 K
    health: HEALTH_WARN* _* V9 O5 c% m; S+ Q: j) \6 ~( r5 e
            16 large omap objects: }3 B2 c* k/ y. f2 c, Y5 y
% S- q$ w/ G' B9 X5 L' l  x3 m
  services:
; K6 ?0 s. ~: U( }    mon: 3 daemons, quorum demo122,demo131,demo141, L9 E6 @0 [* M# r. v
    mgr: demo141(active)3 h0 _: ^: U" {
    osd: 90 osds: 90 up, 90 in& M4 v8 q. h/ K3 p% q
    rgw: 1 daemon active
/ A! k$ L8 t! A6 n+ B: G* r
7 X' K2 R( T* D4 @8 y- j  data:4 a5 `7 K* E- j- {/ G  X1 K3 \% A
    pools:   7 pools, 3712 pgs
9 K. b- b- l; [$ U1 ^  c) R# F    objects: 20.13M objects, 1.80TiB7 A% S1 V6 [, i$ |( n0 q
    usage:   7.28TiB used, 408TiB / 415TiB avail
: T+ E9 a& i% ]6 [  F    pgs:     3711 active+clean, J) K/ n1 N! `
             1    active+clean+scrubbing+deep #开始deep scrub" }- e/ }0 e% ?$ a5 a) W- k  e

) J6 h8 \1 e9 O: a% }  io:! X8 V5 c$ N/ X! q2 `1 o/ b
    client:   5.29MiB/s rd, 935B/s wr, 69op/s rd, 28op/s wr, f7 b. n0 _5 ]% f
7 k  y4 y$ f0 t2 g* `% w
[root@demo123 cephuser]# ceph -s& W. ^( L+ K, H/ y* N: z$ K9 K4 T
  cluster:
$ O  }+ ^) G/ e. M0 f. F    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9& H" a# B' w, m
    health: HEALTH_WARN
. |$ p, m0 l# S6 v0 v7 X  J            15 large omap objects #减少了1个
5 D; e8 ^! Q, Y5 f% {3 u2 U: \; G2 @# a" C0 s# p3 a
  services:0 r! _  i& u) e8 P' \, V1 P
    mon: 3 daemons, quorum demo122,demo131,demo141
& P5 N$ X3 v. o" O    mgr: demo141(active)4 p( Y/ d; @; U9 f+ M
    osd: 90 osds: 90 up, 90 in
. {; N% d- c8 p; P5 s3 \+ c: C    rgw: 1 daemon active4 v* ]6 `0 t1 L0 {" I0 O+ W8 ?2 E) {

. g  H9 _+ X! U: j8 U0 g* v/ }  d0 g  data:
) O7 D% d! N/ _! r5 K0 O    pools:   7 pools, 3712 pgs1 T* f) s% o" Y# U% |9 N# a
    objects: 20.13M objects, 1.80TiB4 m/ L: C8 e3 O1 S( \8 o
    usage:   7.28TiB used, 408TiB / 415TiB avail$ s/ e' y8 P9 e# l+ Z( N- |& M
    pgs:     3712 active+clean
; b, T; _# `2 W4 g" v8 X# }' e; _+ A$ }) v: W
  io:  w5 x1 r$ d4 `# D1 k
    client:   5.33MiB/s rd, 680B/s wr, 36op/s rd, 6op/s wr2 N+ K2 H2 f- x3 N. j0 q' D
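
Summary
Large omap warnings in the index pool generally fall into two categories:

The first is too many objects, which gives the bucket index too many entries per shard. The method above handles this case.
The second is too many bilog entries. The method above does not apply; the bilog has to be trimmed manually. Bilogs will be covered in detail in a later post.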
Original poster | Posted on 2022-8-23 09:54:43
A production multisite environment reported HEALTH_WARN 32 large omap objects. Automatic bucket resharding was already turned off (bucket auto reshard=false), so oversized omap on the bucket index shards was ruled out as the cause. The warning Ceph prints does not identify the specific object, hence the troubleshooting below.

Troubleshooting
[root@demo supdev]# ceph health detail
HEALTH_WARN 32 large omap objects
LARGE_OMAP_OBJECTS 32 large omap objects
    32 large objects found in pool 'cn-bj-test1.rgw.log'  # the pool with large omap objects
    Search the cluster log for 'Large omap object found' for more details.

[root@demo supdev]# ceph pg ls-by-pool cn-bj-test1.rgw.log |awk '{print "ceph pg "$1 " query|grep num_large_omap_objects"}'|sh -x
ceph pg 11.0 query|grep num_large_omap_objects
ceph pg 11.1 query|grep num_large_omap_objects
ceph pg 11.2 query|grep num_large_omap_objects
......
+ ceph pg 11.1e6 query
+ grep num_large_omap_objects
                "num_large_omap_objects": 1  # number of objects with large omap
                    "num_large_omap_objects": 0
                    "num_large_omap_objects": 0

[root@demo supdev]# ceph pg 11.1e6 query  # query the PG details
{
    "state": "active+clean",
.....
    "info": {
        "pgid": "11.1e6",
        "last_update": "10075'3051746",
        "last_complete": "10075'3051746",
        "log_tail": "10075'3050200",
        "last_user_version": 3051746,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": [],
.....

              "acting": [
                    46,  # primary OSD, id=46
                    63,  # replica OSD
                    23   # replica OSD
                ],
            "stat_sum": {
                "num_bytes": 40,
                "num_objects": 2,
                "num_object_clones": 0,
                "num_object_copies": 6,
                "num_objects_missing_on_primary": 0,
                "num_objects_missing": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 2,
                "num_whiteouts": 0,
                "num_read": 3055759,
                "num_read_kb": 3056162,
                "num_write": 5986011,
                "num_write_kb": 53,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 0,
                "num_bytes_recovered": 0,
                "num_keys_recovered": 0,
                "num_objects_omap": 1,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0,
                "num_flush": 0,
                "num_flush_kb": 0,
                "num_evict": 0,
                "num_evict_kb": 0,
                "num_promote": 0,
                "num_flush_mode_high": 0,
                "num_flush_mode_low": 0,
                "num_evict_mode_some": 0,
                "num_evict_mode_full": 0,
                "num_objects_pinned": 0,
                "num_legacy_snapsets": 0,
                "num_large_omap_objects": 1  # number of large omap objects
            },
            ...
                "agent_state": {}
}
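Rather than grepping, the counter can be read from the parsed JSON. In the grep output above, the first match (value 1) is the PG's own statistics; the two zero-valued matches presumably come from the peer entries further down the output. A sketch, assuming the info.stats layout (the stats key itself is elided in the output above); it uses acting_primary when present and otherwise the first acting OSD, matching the annotation above. large_omap_from_pg_query is a hypothetical name.

```python
import json


def large_omap_from_pg_query(query_json):
    """Return (primary OSD, num_large_omap_objects) from `ceph pg <pgid> query` JSON text.

    Reads only the PG's own statistics under info.stats and ignores any other
    copies of the counter elsewhere in the output.
    """
    stats = json.loads(query_json)["info"]["stats"]
    primary = stats.get("acting_primary", stats["acting"][0])
    return primary, stats["stat_sum"]["num_large_omap_objects"]


if __name__ == "__main__":
    import subprocess
    # Hypothetical usage against a live cluster:
    out = subprocess.check_output(["ceph", "pg", "11.1e6", "query"])
    print(large_omap_from_pg_query(out.decode()))
```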

[root@demo supdev]# ceph osd find 46  # find the host that carries this OSD
{
    "osd": 46,
    "ip": "100.1.1.40:6812/3691515",
    "crush_location": {
        "host": "TX-100-1-40-sata",
        "media": "site1-rack2-sata",
        "mediagroup": "site1-sata",
        "root": "default"
    }
}

[root@demo supdev]# zcat /var/log/ceph/ceph-osd.46.log-20181210.gz |grep omap  # find the object name in the OSD log
2018-12-09 23:03:18.803799 7f90e9b46700  0 log_channel(cluster) log [WRN] : Large omap object found. Object: 11:67885262:::sync.error-log.3:head Key count: 2934286 Size (bytes): 657040594
# On OSD 46, the omap of the object sync.error-log.3 exceeds the threshold

[root@demo supdev]# rados ls -p cn-bj-test1.rgw.log|grep "sync.error-log.3$"  # confirm the object exists
sync.error-log.3

# Note: error log entries from the whole multisite sync process are stored as omap in the sync.error-log.* objects.
# A gripe: the error log is split across 32 shards, a number hard-coded in the source, and for now it can only be cleaned up by hand; unlike the other logs it is not trimmed automatically. The error entries kept piling up, which is what caused this problem.

[root@demo supdev]# radosgw-admin sync error list|more  # view the error log
[
    {
        "shard_id": 0,
        "entries": [
            {
                "id": "1_1540890427.972991_36.1",
                "section": "data",
                "name": "demo2:afd874cd-f976-4007-a77c-be6fca298b71.34353.1:3",
                "timestamp": "2018-10-30 09:07:07.972991Z",
                "info": {
                    "source_zone": "afd874cd-f976-4007-a77c-be6fca298b71",
                    "error_code": 5,
                    "message": "failed to sync bucket instance: (5) Input/output error"
                }
            },
......
            {
                "id": "1_1543395420.626552_32014.1",
                "section": "data",
                "name": "demo1:afd874cd-f976-4007-a77c-be6fca298b71.34209.1:0/file1205085",
                "timestamp": "2018-11-28 08:57:00.626552Z",
                "info": {
                    "source_zone": "afd874cd-f976-4007-a77c-be6fca298b71",
                    "error_code": 5,
                    "message": "failed to sync object(5) Input/output error"
                }
            }

[root@TX-97-140-6 supdev]# radosgw-admin sync error trim --start-date=2018-11-14 --end-date=2018-11-28  # trim error log entries within a date range
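Before trimming, it helps to see how many entries each shard holds and which dates they span, to pick --start-date and --end-date. A sketch, assuming the JSON shape shown above (a list of objects with shard_id and entries, each entry carrying a timestamp); depending on the release the list command may cap how many entries it returns, so treat the counts as a lower bound. sync_error_summary is a hypothetical name.

```python
import json


def sync_error_summary(error_list_json):
    """Summarize `radosgw-admin sync error list` JSON per shard.

    Returns {shard_id: (entry_count, oldest_timestamp, newest_timestamp)} for
    shards that have entries. Timestamps are compared as strings, which works
    for the fixed "YYYY-MM-DD HH:MM:SS.ffffffZ" format shown above.
    """
    summary = {}
    for shard in json.loads(error_list_json):
        stamps = sorted(entry["timestamp"] for entry in shard.get("entries", []))
        if stamps:
            summary[shard["shard_id"]] = (len(stamps), stamps[0], stamps[-1])
    return summary


if __name__ == "__main__":
    import subprocess
    # Hypothetical usage against a live cluster:
    out = subprocess.check_output(["radosgw-admin", "sync", "error", "list"])
    for shard_id, info in sorted(sync_error_summary(out.decode()).items()):
        print(shard_id, info)
```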

Speeding up the diagnosis
I wrote a simple script that first finds the pool from the warning and then finds the PGs in that pool that have large omap objects. It is quick and dirty with no guarantee of being bug-free; it was tested on 12.2.10.
7 P( I, Q/ G- Z
[root@demo cephuser]# cat large_obj.py
from __future__ import print_function  # lets the script run under Python 2 and 3

import json
import sys

import rados  # python-rados binding shipped with Ceph

ceph_conf_path = '/etc/ceph/ceph.conf'
rados_connect_timeout = 5


class RADOSClient(object):
    """Context manager: connect on creation, disconnect on exit."""

    def __init__(self, driver, pool=None):
        self.driver = driver
        self.client, self.ioctx = driver._connect_to_rados(pool)

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        self.driver._disconnect_from_rados(self.client, self.ioctx)


class RBDDriver(object):
    def __init__(self, ceph_conf_path, rados_connect_timeout, pool=None):
        self.ceph_conf_path = ceph_conf_path
        self.rados_connect_timeout = rados_connect_timeout
        self.pool = pool

    def _connect_to_rados(self, pool=None):
        pool = pool if pool is not None else self.pool
        client = rados.Rados(conffile=self.ceph_conf_path)
        try:
            if self.rados_connect_timeout >= 0:
                client.connect(timeout=self.rados_connect_timeout)
            else:
                client.connect()
            # only open an I/O context when a pool was requested
            ioctx = client.open_ioctx(pool) if pool else None
            return client, ioctx
        except rados.Error:
            client.shutdown()
            # raising a bare string is not allowed; wrap it in an exception
            raise RuntimeError("Error connecting to ceph cluster.")

    def _disconnect_from_rados(self, client, ioctx=None):
        if ioctx is not None:
            ioctx.close()
        client.shutdown()


class cmd_manager(object):
    def _mon_json(self, cmd):
        """Send a mon command and return its parsed JSON output, or None on error."""
        with RADOSClient(RBDDriver(ceph_conf_path, rados_connect_timeout)) as dr:
            ret, outbuf, _ = dr.client.mon_command(cmd, b'')
            return json.loads(outbuf.decode('utf-8')) if ret == 0 else None

    def get_large_omap_obj_poolname(self):
        res_ = self._mon_json('{"prefix": "health", "detail": "detail", "format": "json"}')
        if not res_:
            return None
        # use .get() so a healthy cluster does not raise KeyError
        check = res_.get("checks", {}).get('LARGE_OMAP_OBJECTS')
        if not check:
            return None
        # first detail line reads: "N large objects found in pool '<pool>'"
        return check['detail'][0]['message'].split("'")[1]

    def get_pg_list_by_pool(self, poolname):
        cmd = json.dumps({"prefix": "pg ls-by-pool", "poolstr": poolname, "format": "json"})
        res_ = self._mon_json(cmd)
        # 12.2.x returns a plain list; newer releases wrap it in {"pg_stats": [...]}
        if isinstance(res_, dict):
            res_ = res_.get("pg_stats", [])
        return res_ or []


cmd_ = cmd_manager()
poolname = cmd_.get_large_omap_obj_poolname()
if not poolname:
    print("No LARGE_OMAP_OBJECTS warning found.")
    sys.exit(0)
print("Large omap objects poolname = {0}".format(poolname))
for i in cmd_.get_pg_list_by_pool(poolname):
    if i["stat_sum"]["num_large_omap_objects"] != 0:
        print("pgid={0} OSDs={1} num_large_omap_objects={2}".format(
            i["pgid"], i["acting"], i["stat_sum"]["num_large_omap_objects"]))
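The script takes the pool name from the first detail line only (splitting on the single quote), so when several pools are affected it reports just one of them. Below is a hedged sketch that collects every pool named in the LARGE_OMAP_OBJECTS check. It assumes the detail messages keep the "... found in pool '<name>'" wording shown in the warning at the top of this thread, and that the JSON from ceph health detail -f json nests them under checks -> LARGE_OMAP_OBJECTS -> detail; pools_with_large_omap is a hypothetical helper, not part of any Ceph API.

```python
import json
import re

# Matches detail lines such as: "1 large objects found in pool 'is_recovery'"
POOL_RE = re.compile(r"found in pool '([^']+)'")


def pools_with_large_omap(health_json):
    """Return every pool named in the LARGE_OMAP_OBJECTS health check.

    Returns an empty list when the check is absent. Lines that do not name
    a pool (e.g. "Search the cluster log for ...") are skipped.
    """
    health = json.loads(health_json)
    check = health.get("checks", {}).get("LARGE_OMAP_OBJECTS")
    if not check:
        return []
    pools = []
    for item in check.get("detail", []):
        match = POOL_RE.search(item.get("message", ""))
        if match and match.group(1) not in pools:
            pools.append(match.group(1))
    return pools
```

Matching on the "found in pool" phrase rather than on the first quote also keeps the "Search the cluster log for 'Large omap object found'" line from being misread as a pool name if it ever comes first.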
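One more landmine
If you think the cluster returns to normal as soon as the omap has been cleaned up as described above, you are being naive. The warning "HEALTH_WARN 32 large omap objects" just stays there, awkwardly. Even though the omap data has been cleaned up, the stats of the affected PGs have not been refreshed, so the warning persists. Large omap objects are only counted during a deep scrub, so you have to trigger a PG stats update by hand or by some other means. I do it with ceph pg deep-scrub {pg}. Note that a plain scrub does not help; it has to be a deep-scrub. Yet another piece of upstream design logic to gripe about: WTF! Of course, you can also leave it alone and wait for the background deep-scrub to run, after which the warning clears as well.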
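The deep-scrub step can be scripted too. A minimal sketch under these assumptions: the input is the JSON from ceph pg ls-by-pool <pool> -f json (a bare list on 12.2.x, wrapped as {"pg_ready": ..., "pg_stats": [...]} on later releases), and the helper names pgs_needing_deep_scrub and deep_scrub are hypothetical. By default it only prints the commands so you can review them before running anything against a production cluster.

```python
import json
import subprocess


def pgs_needing_deep_scrub(pg_ls_json):
    """Return pgids whose stats still report large omap objects.

    Handles both the bare list (12.2.x) and the {"pg_stats": [...]}
    wrapper used by newer releases.
    """
    data = json.loads(pg_ls_json)
    if isinstance(data, dict):
        data = data.get("pg_stats", [])
    return [pg["pgid"] for pg in data
            if pg.get("stat_sum", {}).get("num_large_omap_objects", 0) > 0]


def deep_scrub(pgids, dry_run=True):
    """Build `ceph pg deep-scrub <pgid>` commands; run them only if dry_run is False."""
    cmds = [["ceph", "pg", "deep-scrub", pgid] for pgid in pgids]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
    return cmds
```

Deep scrubs are I/O heavy, so on a busy cluster it may be wiser to issue them a few at a time rather than all at once; once they finish, the PG stats should be refreshed and the warning should drop out of ceph health detail.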
易陆发现技术论坛 ( 蜀ICP备2026014127号-1 )