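Temporarily raise the OSD full threshold, then delete the RBD images that are no longer needed:
ceph tell osd.* injectargs '--mon-osd-full-ratio 0.98'
On Luminous and later releases the full ratio is stored in the OSDMap and is changed with ceph osd set-full-ratio 0.98 instead; lower it back once space has been freed.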
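A minimal Python sketch of what the threshold means: it classifies an OSD's utilization against the nearfull, backfillfull and full ratios. It assumes Ceph's stock defaults (0.85, 0.90, 0.95) and mirrors the temporary 0.98 override above; `classify` and the sample utilizations are hypothetical.

```python
# Assumed stock Ceph defaults for the three capacity ratios.
DEFAULT_RATIOS = {"nearfull": 0.85, "backfillfull": 0.90, "full": 0.95}


def classify(utilization: float, ratios: dict[str, float] = DEFAULT_RATIOS) -> str:
    """Return the most severe state whose ratio the utilization has reached."""
    if utilization >= ratios["full"]:
        return "full"
    if utilization >= ratios["backfillfull"]:
        return "backfillfull"
    if utilization >= ratios["nearfull"]:
        return "nearfull"
    return "ok"


if __name__ == "__main__":
    usage = {"osd.0": 0.62, "osd.1": 0.87, "osd.2": 0.96}  # hypothetical sample data
    raised = dict(DEFAULT_RATIOS, full=0.98)  # the temporary override
    for osd, util in usage.items():
        print(osd, classify(util), "->", classify(util, raised))
```

With the raised ratio, an OSD at 96% drops from "full" to "backfillfull", which is what lets writes (and the deletions) proceed.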
Adjust each OSD's weight so that data is redistributed:

# ceph osd crush reweight osd.10 1.05
reweighted item id 10 name 'osd.10' to 1.05 in crush map
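
The default OSD weight here is 1 (the CRUSH weight is normally derived from the disk's capacity in TiB). After a change, data is redistributed toward OSDs with higher weights: raise the weight of relatively idle OSDs so that they take on data, and lower the weight of heavily used OSDs so that they give data up.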
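A rough mental model of why raising a weight attracts data: if placement were exactly proportional to CRUSH weight, an OSD with weight w would expect total_pg_copies × w / Σw PG copies. The sketch assumes that idealised proportionality (real CRUSH placement only approximates it); `expected_pgs` and the four-OSD cluster are hypothetical.

```python
def expected_pgs(total_pg_copies: int, weights: dict[str, float]) -> dict[str, float]:
    """Expected PG copies per OSD if placement were exactly proportional to weight."""
    total_weight = sum(weights.values())
    return {osd: total_pg_copies * w / total_weight for osd, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical 4-OSD cluster with 400 PG copies; osd.10 raised, osd.13 lowered.
    weights = {"osd.10": 1.05, "osd.11": 1.0, "osd.12": 1.0, "osd.13": 0.95}
    for osd, n in expected_pgs(400, weights).items():
        print(f"{osd}: {n:.1f}")
```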
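
In Ceph the number of PGs per OSD is roughly uniform, and each PG can be assumed to hold roughly the same amount of data, so in principle equal PG counts mean roughly equal disk usage across OSDs. Because the placement algorithm cannot make the distribution perfectly even, however, some OSDs end up with noticeably more PGs than others and therefore use more space. The recommendation is to check the distribution manually once Ceph is deployed and all pools have been created, and to adjust OSD weights with the commands below to even out the PG count per OSD.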
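The unevenness is what pseudo-random placement produces even with equal weights. This sketch approximates CRUSH with an independent, uniformly random choice of distinct OSDs for each PG (an assumption: real CRUSH uses deterministic hashing) and shows that the per-OSD counts still spread out around the mean. `simulate` and its parameters are hypothetical.

```python
import random
import statistics


def simulate(num_pgs: int, num_osds: int, replicas: int, seed: int = 0) -> list[int]:
    """Count PG copies per OSD when each PG picks `replicas` distinct OSDs at random."""
    rng = random.Random(seed)
    counts = [0] * num_osds
    for _ in range(num_pgs):
        for osd in rng.sample(range(num_osds), replicas):
            counts[osd] += 1
    return counts


if __name__ == "__main__":
    counts = simulate(num_pgs=2048, num_osds=32, replicas=3)
    print("mean  ", statistics.mean(counts))  # 2048 * 3 / 32 = 192
    print("min   ", min(counts), "max", max(counts))
    print("stddev", round(statistics.pstdev(counts), 2))
```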

Count the PGs of every pool on each OSD:
ceph pg dump | awk '
  # Header row: locate the "up" column. The pattern matches the lower-case
  # headers of older releases; adjust it if your release prints upper-case ones.
  # The extra col++ accounts for the state stamp, whose date and time occupy
  # two fields in PG rows but only one in the header.
  /^pg_stat/ { col=1; while($col!="up") {col++}; col++ }
  # PG rows: the pool id is the part of the pgid before the dot.
  /^[0-9a-f]+\.[0-9a-f]+/ {
    match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0
    # Split the up set, e.g. [12,45,7], into individual OSD ids.
    up=$col; i=0; RSTART=0; RLENGTH=0; delete osds
    while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up=substr(up, RSTART+RLENGTH) }
    for(i in osds) { array[osds[i],pool]++; osdlist[osds[i]] }
  }
  END {
    printf("\n");
    printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
    for (i in poollist) printf("--------"); printf("----------------\n");
    for (i in osdlist) { printf("osd.%i\t", i); sum=0;
      for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; poollist[j]+=array[i,j] }; printf("| %i\n",sum) }
    for (i in poollist) printf("--------"); printf("----------------\n");
    # poollist[i] now holds the per-pool totals accumulated above.
    printf("SUM :\t"); for (i in poollist) printf("%s\t",poollist[i]); printf("|\n");
  }'
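The same per-pool, per-OSD table can be built in Python once the pgid and up set of each PG have been extracted. This is a sketch that assumes the input is already parsed into (pgid, up set) pairs; `pg_table` and the sample pairs are made up for illustration.

```python
from collections import defaultdict


def pg_table(pgs: list[tuple[str, list[int]]]) -> dict[int, dict[str, int]]:
    """Map OSD id -> {pool id -> number of PGs whose up set contains that OSD}."""
    table: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for pgid, up in pgs:
        pool = pgid.split(".", 1)[0]  # "23.1a" -> "23"
        for osd in up:
            table[osd][pool] += 1
    return table


if __name__ == "__main__":
    # Hypothetical (pgid, up set) pairs as they would be parsed from `ceph pg dump`.
    sample = [("23.0", [1, 4]), ("23.1", [4, 2]), ("5.0", [1, 2])]
    table = pg_table(sample)
    pools = sorted({pool for row in table.values() for pool in row})
    print("pool :\t" + "\t".join(pools) + "\t| SUM")
    for osd in sorted(table):
        counts = [table[osd].get(pool, 0) for pool in pools]
        print(f"osd.{osd}\t" + "\t".join(map(str, counts)) + f"\t| {sum(counts)}")
```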
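
The script above counts the PGs of every pool on every OSD. In practice we only care about the default.rgw.buckets.data pool, since the other pools hold very little data. ceph df shows that its pool id is 23, so a simpler command lists how many PGs of pool 23 each OSD holds, sorted by count:

ceph pg dump|grep '^23\.'|awk -F ' ' '{print $1, $15}'|awk -F "[ ]|[[]|[,]|[]]" '{print $3, $4}'|tr -s ' ' '\n'|sort|uniq -c|sort -n

Here $15 is the up-set column in our Ceph version, and $3, $4 are the first two OSDs of each up set, which fits a two-replica pool; adjust both for other versions or replica counts. Each output line is "<PG count> <OSD id>".

The output looks like this: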

95 1
95 312
96 252
99 177
99 265
99 62
101 121
101 261
......
132 102
132 105
132 179
132 253
132 256
133 111
133 115
133 151
133 203
133 259
133 271
133 292
134 257
134 302
134 61
135 220
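For reference, a Python sketch of the pool-23 count. It assumes each input line has the form `<pgid> [osd,osd,...]` (what `awk '{print $1, $15}'` produces in our version) and, unlike the one-liner, counts every OSD in the up set rather than only the first two. `pool_pg_counts` and the sample lines are hypothetical.

```python
import re
from collections import Counter


def pool_pg_counts(lines: list[str], pool: str) -> list[tuple[int, int]]:
    """Return (pg_count, osd_id) pairs for one pool, sorted ascending like `sort -n`.

    Each line is assumed to look like "<pgid> [osd,osd,...]".
    """
    counts: Counter = Counter()
    for line in lines:
        pgid, up = line.split(maxsplit=1)
        if pgid.split(".", 1)[0] != pool:
            continue
        # Count every OSD in the up set, not just the first two.
        counts.update(int(osd) for osd in re.findall(r"\d+", up))
    return sorted((n, osd) for osd, n in counts.items())


if __name__ == "__main__":
    sample = ["23.0 [1,312]", "23.1 [312,252]", "23.2 [252,312]", "7.0 [1,252]"]
    pairs = pool_pg_counts(sample, "23")
    for n, osd in pairs:
        print(n, osd)
    print(f"min: osd.{pairs[0][1]} with {pairs[0][0]}, max: osd.{pairs[-1][1]} with {pairs[-1][0]}")
```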

So the least-loaded OSD, osd.1, holds only 95 PGs, while the most-loaded, osd.220, holds 135. To rebalance the PGs:

ceph osd reweight-by-pg 105 default.rgw.buckets.data
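To illustrate how the first argument (the overload threshold, oload, here 105) selects candidates, here is a toy Python model: an OSD qualifies when its PG count exceeds oload percent of the mean. The input is four PG counts from the pool-23 listing (osd.1, osd.62, osd.61, osd.220) standing in for a whole cluster; `overloaded` is hypothetical, and the model ignores max_change and max_change_osds, so it shows which OSDs qualify, not the weights Ceph will set.

```python
def overloaded(pg_counts: dict[str, int], oload: int = 120) -> list[str]:
    """OSDs whose PG count is above oload percent of the mean (simplified)."""
    mean = sum(pg_counts.values()) / len(pg_counts)
    threshold = mean * oload / 100
    return sorted(osd for osd, n in pg_counts.items() if n > threshold)


if __name__ == "__main__":
    # Four PG counts taken from the pool-23 listing, used as a toy cluster.
    sample = {"osd.1": 95, "osd.62": 99, "osd.61": 134, "osd.220": 135}
    for oload in (105, 120):
        print(oload, overloaded(sample, oload))
```

In this toy input a threshold of 105 flags two OSDs while the default of 120 flags none, which is why a lower oload makes the command act more aggressively.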
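
Note: the value 105 did not seem to take effect; Ceph appeared to make its own adjustment. The number is the overload threshold (oload) as a percentage of the mean PG count, and each run changes at most max_change_osds OSDs by at most max_change (see the sample output in step 2), so a single run can look as if it did little. ceph osd test-reweight-by-pg takes the same arguments and only reports what would change.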

Because checking by hand after every run is tedious, you can judge the effect by computing the standard deviation of the per-OSD PG counts after each adjustment; if it keeps decreasing, the distribution is becoming more even:

ceph pg dump|grep '^23\.'|awk -F ' ' '{print $1, $15}'|awk -F "[ ]|[[]|[,]|[]]" '{print $3, $4}'|tr -s ' ' '\n'|sort|uniq -c|sort -r|awk '{printf("%s\n", $1)}'|awk '{x[NR]=$0;s+=$0;n++} END{a=s/n;for(i in x) {ss += (x[i]-a)^2} sd = sqrt(ss/n); print "SD = "sd}'
dumped all in format plain
SD = 8.71607
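The last two awk stages compute the population standard deviation (dividing by n, not n − 1). The same calculation in Python, as a sketch with made-up before/after PG counts for four OSDs; `pg_stddev` is a hypothetical helper.

```python
import math


def pg_stddev(pg_counts: list[int]) -> float:
    """Population standard deviation (divide by n), as the awk stage computes it."""
    mean = sum(pg_counts) / len(pg_counts)
    return math.sqrt(sum((x - mean) ** 2 for x in pg_counts) / len(pg_counts))


if __name__ == "__main__":
    # Made-up PG counts for the same four OSDs before and after a reweight run.
    before = [95, 99, 134, 135]
    after = [107, 110, 122, 124]
    sd_before, sd_after = pg_stddev(before), pg_stddev(after)
    print(f"SD before = {sd_before:.4f}, after = {sd_after:.4f}")
    print("more even" if sd_after < sd_before else "not improved")
```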

When data in Ceph is unevenly distributed, the PGs need to be rebalanced.

1: Check whether the data is evenly distributed

# View OSD usage

# ceph osd df tree

# Show each OSD's name, status and VAR (its utilization relative to the cluster average); column positions differ between Ceph releases

# ceph osd df tree | awk '/osd\./{print $NF" "$(NF-1)" "$(NF-3) }'

osd.0 up 0.92
osd.3 up 1.02
osd.1 up 0.90
osd.4 up 1.23
osd.2 up 0.95
osd.5 up 1.03
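The third value printed above is the VAR column, which compares each OSD's utilization with the cluster average; osd.4 at 1.23 is the only one above 1.2, i.e. above the default oload of 120. A small sketch of that ratio, assuming VAR = (OSD used / OSD size) / (total used / total size); `var_column` and the GiB figures are hypothetical.

```python
def var_column(osds: dict[str, tuple[float, float]]) -> dict[str, float]:
    """osds maps OSD name -> (used, size) in the same unit.

    VAR = the OSD's utilization divided by the cluster-wide utilization
    (total used / total size), so 1.0 means exactly average.
    """
    average = sum(u for u, _ in osds.values()) / sum(s for _, s in osds.values())
    return {name: (used / size) / average for name, (used, size) in osds.items()}


if __name__ == "__main__":
    # Hypothetical usage figures in GiB; not taken from the cluster above.
    sample = {"osd.0": (90, 1000), "osd.4": (130, 1000), "osd.5": (100, 1000)}
    for name, var in var_column(sample).items():
        flag = "  <- above 1.2, a reweight-by-utilization candidate" if var > 1.2 else ""
        print(f"{name} {var:.2f}{flag}")
```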

2: reweight-by-pg: adjust OSD weights according to the placement group (PG) distribution

# ceph osd reweight-by-pg

Example:
$ ceph osd reweight-by-pg
moved 35 / 4032 (0.868056%)   # 35 PGs are migrated
avg 115.2   # each OSD holds 115.2 PGs on average
stddev 10.378 -> 9.47418 (expected baseline 10.5787)   # after this adjustment the standard deviation drops from 10.378 to 9.47418
min osd.15 with 93 -> 92 pgs (0.807292 -> 0.798611 * mean)   # the least-loaded OSD, osd.15, holds 93 PGs and will hold 92 after this adjustment
max osd.6 with 141 -> 132 pgs (1.22396 -> 1.14583 * mean)   # the most-loaded OSD, osd.6, holds 141 PGs and will hold 132 after this adjustment
oload 120   # overload threshold: OSDs above 120% of the mean are candidates
max_change 0.05   # largest reweight change for one OSD in a single run
max_change_osds 4   # at most 4 OSDs are changed in a single run
average_utilization 21.1365
overload_utilization 25.3638
osd.6 weight 1.0000 -> 0.9500   # this run changes the reweight of osd.6, osd.23, osd.7 and osd.27
osd.23 weight 0.8500 -> 0.9000
osd.7 weight 0.9000 -> 0.9500
osd.27 weight 0.8500 -> 0.9000
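The summary lines of this report can be reproduced by hand: avg is total PG copies divided by the number of OSDs (4032 / 115.2 implies 35 OSDs), the moved percentage is 35 / 4032, and min/max are expressed as multiples of the mean (93 / 115.2 ≈ 0.807292). The "expected baseline" matches sqrt(avg × (1 − 1/n)), the standard deviation for uniformly random placement; I inferred that formula from the two sample reports (it gives 10.5787 for n = 35 and 10.8205 for n = 6), so treat it as an assumption rather than documented behaviour. `expected_baseline` and `summarize` are hypothetical helpers.

```python
import math


def expected_baseline(avg: float, num_osds: int) -> float:
    """Std. dev. of per-OSD PG copies if each copy landed on a uniformly random OSD.

    Assumption: formula inferred from the sample reports, not from Ceph docs.
    """
    return math.sqrt(avg * (1 - 1 / num_osds))


def summarize(pg_counts: dict[str, int]) -> dict[str, float]:
    """avg, stddev and expected baseline of per-OSD PG counts."""
    n = len(pg_counts)
    avg = sum(pg_counts.values()) / n
    variance = sum((c - avg) ** 2 for c in pg_counts.values()) / n
    return {"avg": avg, "stddev": math.sqrt(variance),
            "baseline": expected_baseline(avg, n)}


if __name__ == "__main__":
    # Figures from the report above: 4032 PG copies on 35 OSDs, 35 PGs moved.
    total, num_osds, moved = 4032, 35, 35
    avg = total / num_osds
    print(f"moved {moved} / {total} ({100 * moved / total:g}%)")  # 0.868056%
    print(f"avg {avg:g}")  # 115.2
    print(f"expected baseline {expected_baseline(avg, num_osds):g}")  # 10.5787
    print(f"min osd.15: {93 / avg:g} * mean, max osd.6: {141 / avg:g} * mean")
```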

3: reweight-by-utilization: adjust OSD weights according to utilization

# ceph osd reweight-by-utilization

moved 10 / 843 (1.18624%)   # 10 PGs are migrated
avg 140.5   # each OSD holds 140.5 PGs on average
stddev 8.69387 -> 12.339 (expected baseline 10.8205)   # after this adjustment the PG standard deviation goes from 8.69387 to 12.339; it can rise because this mode balances disk usage, not PG counts
min osd.3 with 127 -> 127 pgs (0.903915 -> 0.903915 * mean)   # the least-loaded OSD, osd.3, holds 127 PGs and will still hold 127
max osd.0 with 154 -> 154 pgs (1.09609 -> 1.09609 * mean)   # the most-loaded OSD, osd.0, holds 154 PGs and will still hold 154
oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.0904
overload_utilization 0.1084
osd.4 weight 0.9500 -> 0.9000
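A simplified model of how the new weights come about, based on my reading of Ceph's reweight logic rather than on documentation. Two parts are confirmed by the sample output: overload_utilization = average_utilization × oload / 100 (21.1365 × 1.2 = 25.3638), and an overloaded OSD's reweight moves toward average / utilization but by no more than max_change (osd.4: 0.95 -> 0.90; with PG counts, osd.6: max(1.0 × 115.2 / 141, 1.0 − 0.05) = 0.95). The symmetric underload rule, the plain (unweighted) mean and the furthest-from-mean ordering are assumptions; `overload_utilization`, `propose` and the sample data are hypothetical.

```python
def overload_utilization(average_util: float, oload: int = 120) -> float:
    """Threshold above which an OSD counts as overloaded."""
    return average_util * oload / 100


def propose(util: dict[str, float], weight: dict[str, float], oload: int = 120,
            max_change: float = 0.05, max_osds: int = 4) -> dict[str, float]:
    """Return {osd: new reweight} under the simplified rules described above."""
    # Simplification: plain mean of the per-OSD values, not a size-weighted one.
    avg = sum(util.values()) / len(util)
    over = overload_utilization(avg, oload)
    under = avg - (over - avg)  # assumed symmetric underload threshold
    changes: dict[str, float] = {}
    # Visit the OSDs furthest from the mean first and stop after max_osds changes.
    for osd in sorted(util, key=lambda o: abs(util[o] - avg), reverse=True):
        if len(changes) >= max_osds:
            break
        w, u = weight[osd], util[osd]
        if u >= over:
            # Scale down toward the mean, but by no more than max_change.
            new = max(w * avg / u, w - max_change)
        elif 0 < u <= under and w < 1.0:
            # Scale up toward the mean, capped by max_change and by 1.0.
            new = min(w * avg / u, w + max_change, 1.0)
        else:
            continue
        changes[osd] = round(new, 4)
    return changes


if __name__ == "__main__":
    # Hypothetical utilizations (fractions) and reweights for six OSDs.
    util = {"osd.0": 0.083, "osd.1": 0.081, "osd.2": 0.086,
            "osd.3": 0.082, "osd.4": 0.125, "osd.5": 0.085}
    weight = {osd: 1.0 for osd in util} | {"osd.4": 0.95}
    print(propose(util, weight))
```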

Check the rebalancing status:

# ceph -s

4: Restore the weights after the data has been rebalanced

# List each OSD with its REWEIGHT value

[root@node-10 ~]# ceph osd df tree | awk '/osd\./{print $NF" "$4 }'

osd.0 1.00000
osd.3 1.00000
osd.1 1.00000
osd.4 0.90002
osd.2 1.00000
osd.5 1.00000
[root@node-10 ~]#

# Reset each adjusted OSD's reweight to the default, 1.0, one at a time
# ceph osd reweight {id} {weight}
# Note: the reweight value ranges from 0 to 1

$ ceph osd reweight 4 1.0

ceph osd tree shows both the WEIGHT (CRUSH weight) and the REWEIGHT values.
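Resetting OSDs one by one is easy to get wrong on a larger cluster. A small sketch that turns the two-column REWEIGHT listing from step 4 into the matching commands; it assumes the `osd.<id> <reweight>` format, only prints the commands instead of running them, and `restore_commands` is hypothetical.

```python
def restore_commands(lines: list[str], default: float = 1.0) -> list[str]:
    r"""Build `ceph osd reweight` commands for OSDs whose REWEIGHT differs from default.

    Each line is assumed to be "osd.<id> <reweight>", the format printed by
        ceph osd df tree | awk '/osd\./{print $NF" "$4 }'
    """
    commands = []
    for line in lines:
        parts = line.split()
        # Skip blank lines, prompts and anything that is not "osd.<id> <value>".
        if len(parts) != 2 or not parts[0].startswith("osd."):
            continue
        name, reweight = parts
        # Tolerate tiny differences in the printed value before deciding to reset.
        if abs(float(reweight) - default) > 1e-3:
            commands.append(f"ceph osd reweight {name.removeprefix('osd.')} {default}")
    return commands


if __name__ == "__main__":
    output = ["osd.0 1.00000", "osd.3 1.00000", "osd.1 1.00000",
              "osd.4 0.90002", "osd.2 1.00000", "osd.5 1.00000"]
    for cmd in restore_commands(output):
        print(cmd)  # prints: ceph osd reweight 4 1.0
```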