|
|
|
Temporarily raise the OSD full threshold, then delete the RBD images that are no longer needed:

ceph tell osd.* injectargs '--mon-osd-full-ratio 0.98'

(On Luminous and later releases the full ratio is stored in the OSD map and is changed with ceph osd set-full-ratio 0.98 instead.)

Adjust the weight of each OSD so that the data is redistributed:

# ceph osd crush reweight osd.10 1.05
reweighted item id 10 name 'osd.10' to 1.05 in crush map

The default OSD weight here is 1. After an adjustment, data moves toward the OSDs with higher weights: raise the weight of lightly used OSDs so that they take on more data, and lower the weight of heavily used OSDs so that they release data.

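PGs are spread roughly evenly across the OSDs in Ceph, and each PG can be assumed to hold about the same amount of data. In principle, then, if every OSD carries the same number of PGs, disk usage is about the same on every OSD. Because the placement algorithm cannot be perfectly uniform, however, the PG count can differ considerably between OSDs, and some OSDs end up using noticeably more space. The recommendation is to inspect the distribution manually once Ceph is deployed and all pools are created, and to even out the PG count per OSD by adjusting OSD weights from the command line.

Count the PGs of all pools on each OSD: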
ceph pg dump | awk '
 # Header row: derive the field index of the up set from the position of the "up" header.
 /^pg_stat/ { col=1; while($col!="up") {col++}; col++ }
 # PG rows: the pool id is the part of the PG id before the dot.
 /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0, RSTART, RLENGTH); poollist[pool]=0;
 # Split the up set into individual OSD ids.
 up=$col; i=0; RSTART=0; RLENGTH=0; delete osds; while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up = substr(up, RSTART+RLENGTH) }
 # Count one PG for every OSD in the up set.
 for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
}
END {
 printf("\n");
 printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("| SUM \n");
 for (i in poollist) printf("--------"); printf("----------------\n");
 for (i in osdlist) { printf("osd.%i\t", i); sum=0;
 for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j]; poollist[j]+=array[i,j] }; printf("| %i\n",sum) }
 for (i in poollist) printf("--------"); printf("----------------\n");
 printf("SUM :\t"); for (i in poollist) printf("%s\t",poollist[i]); printf("|\n");
}'

The script above reports the PG count of every pool on each OSD. In practice only the default.rgw.buckets.data pool matters here, because the other pools hold very little data. ceph df shows that the pool id of default.rgw.buckets.data is 23, so a simpler command gives the sorted PG count of pool 23 on each OSD:

ceph pg dump|grep '^23\.'|awk -F ' ' '{print $1, $15}'|awk -F "[ ]|[[]|[,]|[]]" '{print $3, $4}'|tr -s ' ' '\n'|sort|uniq -c|sort -n

The output looks like this (PG count, then OSD id):

     95 1
     95 312
     96 252
     99 177
     99 265
     99 62
    101 121
    101 261
    ......
    132 102
    132 105
    132 179
    132 253
    132 256
    133 111
    133 115
    133 151
    133 203
    133 259
    133 271
    133 292
    134 257
    134 302
    134 61
    135 220

As the output shows, osd.1 carries the fewest PGs (95) and osd.220 the most (135). The command to adjust the PG distribution:

ceph osd reweight-by-pg 105 default.rgw.buckets.data

Note: the value 105 did not seem to have any effect; Ceph appears to adjust things internally. (The argument is the overload threshold as a percentage of the mean PG count; when it is omitted, Ceph uses 120.)

Checking by eye after every adjustment is tedious. Instead, compute the standard deviation of the per-OSD PG counts after each adjustment; if it shrinks each time, the distribution is becoming more even:

ceph pg dump|grep '^23\.'|awk -F ' ' '{print $1, $15}'|awk -F "[ ]|[[]|[,]|[]]" '{print $3, $4}'|tr -s ' ' '\n'|sort|uniq -c|sort -r|awk '{printf("%s\n", $1)}'|awk '{x[NR]=$0;s+=$0;n++} END{a=s/n;for(i in x) {ss += (x[i]-a)^2} sd = sqrt(ss/n); print "SD = "sd}'
dumped all in format plain
SD = 8.71607

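The standard-deviation check above can also be wrapped in a small helper that reports the spread together with the extremes. This is only a sketch: pg_count_stats is a hypothetical name, not a Ceph command, and it assumes its input is the "count osd-id" lines produced by the sort|uniq -c stage of the pipeline above.

```shell
# pg_count_stats (hypothetical helper, not part of Ceph):
# reads "<pg_count> <osd_id>" lines on stdin, as produced by "uniq -c",
# and prints the number of OSDs, min, max, mean and population standard deviation.
pg_count_stats() {
    awk '
        NF >= 2 && $1 ~ /^[0-9]+$/ {
            c = $1 + 0                      # force a numeric value
            n++
            sum   += c
            sumsq += c * c
            if (n == 1 || c < min) min = c
            if (n == 1 || c > max) max = c
        }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            var  = sumsq / n - mean * mean  # population variance
            if (var < 0) var = 0            # guard against rounding error
            printf("osds=%d min=%d max=%d mean=%.2f sd=%.2f\n", n, min, max, mean, sqrt(var))
        }
    '
}

# Two OSDs with 90 and 110 PGs:
printf '90 1\n110 2\n' | pg_count_stats
# prints: osds=2 min=90 max=110 mean=100.00 sd=10.00
```

To use it on a live cluster, feed it the same pipeline as above up to and including uniq -c. Running it before and after each reweight gives one line per run that is easy to compare.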
When data is unevenly distributed across a Ceph cluster, the PG data needs to be rebalanced.

1: Check whether the data distribution is balanced

# Show OSD usage
# ceph osd df tree

# Show osd_num, PGS, %USE (the awk field offsets depend on the column layout of the Ceph release in use)
# ceph osd df tree | awk '/osd\./{print $NF" "$(NF-1)" "$(NF-3) }'
osd.0 up 0.92
osd.3 up 1.02
osd.1 up 0.90
osd.4 up 1.23
osd.2 up 0.95
osd.5 up 1.03

2: reweight-by-pg: adjust OSD weights according to the placement group distribution

# ceph osd reweight-by-pg

Example:
$ ceph osd reweight-by-pg
moved 35 / 4032 (0.868056%)    # 35 PGs will be migrated
avg 115.2    # each OSD carries 115.2 PGs on average
stddev 10.378 -> 9.47418 (expected baseline 10.5787)    # after this adjustment the standard deviation goes from 10.378 to 9.47418
min osd.15 with 93 -> 92 pgs (0.807292 -> 0.798611 * mean)    # osd.15 is currently the least loaded OSD with only 93 PGs; after this adjustment it will carry 92
max osd.6 with 141 -> 132 pgs (1.22396 -> 1.14583 * mean)    # osd.6 is currently the most loaded OSD with 141 PGs; after this adjustment it will carry 132
oload 120
max_change 0.05
max_change_osds 4
average_utilization 21.1365
overload_utilization 25.3638
osd.6 weight 1.0000 -> 0.9500    # this adjustment changes the reweight of osd.6, osd.23, osd.7 and osd.27
osd.23 weight 0.8500 -> 0.9000
osd.7 weight 0.9000 -> 0.9500
osd.27 weight 0.8500 -> 0.9000

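The numbers in the sample output are consistent with overload_utilization = average_utilization * oload / 100 (21.1365 * 1.20 = 25.3638), and every weight change is exactly max_change (0.05). The sketch below reproduces that arithmetic. Both function names are hypothetical, and the scaling rule in clamped_reweight (weight * mean / actual, limited to one max_change step and never above 1) is an assumption read off the output, not something taken from Ceph's documentation.

```shell
# overload_threshold AVG OLOAD
# Prints AVG * OLOAD / 100 with four decimals (hypothetical helper).
overload_threshold() {
    awk -v avg="$1" -v oload="$2" 'BEGIN { printf("%.4f\n", avg * oload / 100) }'
}

# clamped_reweight OLD_WEIGHT PGS AVG_PGS MAX_CHANGE
# ASSUMPTION: the proposed weight is OLD_WEIGHT * AVG_PGS / PGS, and a single run
# moves the weight by at most MAX_CHANGE and never above 1 (hypothetical helper).
clamped_reweight() {
    awk -v old="$1" -v pgs="$2" -v avg="$3" -v step="$4" 'BEGIN {
        proposed = old * avg / pgs
        lo = old - step
        hi = old + step
        if (hi > 1) hi = 1
        if (proposed < lo) proposed = lo
        if (proposed > hi) proposed = hi
        printf("%.4f\n", proposed)
    }'
}

overload_threshold 21.1365 120        # prints 25.3638, as in the output above
clamped_reweight 1.0 141 115.2 0.05   # prints 0.9500, matching osd.6 above
```

Under this reading, an OSD far from the mean needs several runs to reach its target weight, which fits the small change in stddev reported per run.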
3: reweight-by-utilization: adjust OSD weights according to utilization

# ceph osd reweight-by-utilization
moved 10 / 843 (1.18624%)    # 10 PGs will be migrated
avg 140.5    # each OSD carries 140.5 PGs on average
stddev 8.69387 -> 12.339 (expected baseline 10.8205)    # after this adjustment the standard deviation goes from 8.69387 to 12.339
min osd.3 with 127 -> 127 pgs (0.903915 -> 0.903915 * mean)    # osd.3 is the least loaded OSD with only 127 PGs; after this adjustment it will still carry 127
max osd.0 with 154 -> 154 pgs (1.09609 -> 1.09609 * mean)    # osd.0 is the most loaded OSD with 154 PGs; after this adjustment it will still carry 154
oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.0904
overload_utilization 0.1084
osd.4 weight 0.9500 -> 0.9000

Check the rebalancing status:

# ceph -s

4: Restore the weights once the data is balanced

# List osd_num and REWEIGHT
[root@node-10 ~]# ceph osd df tree | awk '/osd\./{print $NF" "$4 }'
osd.0 1.00000
osd.3 1.00000
osd.1 1.00000
osd.4 0.90002
osd.2 1.00000
osd.5 1.00000
[root@node-10 ~]#

# Reset the reweight of each OSD to the default value, 1.0, one OSD at a time
# ceph osd reweight {id} {weight}
# Note: the OSD reweight value ranges from 0 to 1

$ ceph osd reweight 5 1.0

ceph osd tree shows both the weight and the reweight values.
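Resetting the OSDs one by one can be scripted. The sketch below turns the "name reweight" listing shown above into the matching ceph osd reweight commands. reset_reweight_commands is a hypothetical helper name, and it only prints the commands so that they can be reviewed before anything is changed.

```shell
# reset_reweight_commands (hypothetical helper, not part of Ceph):
# reads "<osd name> <reweight>" lines on stdin and prints one
# "ceph osd reweight <id> 1.0" command for every OSD whose reweight is not 1.
reset_reweight_commands() {
    awk '
        $1 ~ /^osd\.[0-9]+$/ && ($2 + 0) != 1 {
            id = substr($1, 5)              # strip the leading "osd."
            printf("ceph osd reweight %s 1.0\n", id)
        }
    '
}

# With the listing from this step, only osd.4 needs to be reset:
printf 'osd.0 1.00000\nosd.4 0.90002\nosd.5 1.00000\n' | reset_reweight_commands
# prints: ceph osd reweight 4 1.0
```

On a live cluster the input would come from the ceph osd df tree | awk command in this step. Each reset triggers data movement, so it is safer to run the printed commands one at a time and wait for ceph -s to report a healthy state in between.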
|
|