Planned node reboot maintenance
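Preparation: the cluster must be in the health: HEALTH_OK state. Then set noout (so the OSDs of the node being rebooted are not marked out) and norebalance (so data is not rebalanced while they are down):
sudo ceph -s
sudo ceph osd set noout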
sudo ceph osd set norebalance
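Before rebooting, you may want to confirm that both flags actually took effect. This is a minimal sketch. It assumes that ceph osd dump prints a line that begins with "flags" followed by a comma-separated list, as shown in the comment. flags_line_has and check_maintenance_flags are hypothetical helper names and are not part of Ceph.

```shell
#!/usr/bin/env bash
# Sketch: confirm that noout and norebalance are set before rebooting a node.
# Assumption: `ceph osd dump` contains a line such as
#   flags noout,norebalance,sortbitwise,recovery_deletes,purged_snapdirs

# Return 0 if every flag given after the line appears in the flags line.
flags_line_has() {
    local line=$1 flag
    shift
    for flag in "$@"; do
        # Wrap both sides in commas so "noout" cannot match a longer flag name.
        [[ ",${line#flags }," == *",$flag,"* ]] || return 1
    done
}

check_maintenance_flags() {
    local line
    line=$(ceph osd dump | grep '^flags')
    if flags_line_has "$line" noout norebalance; then
        echo "noout and norebalance are set"
    else
        echo "missing flag(s) in: $line" >&2
        return 1
    fi
}
```

If check_maintenance_flags exits non-zero, do not reboot yet. Instead, re-run the set commands.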
Reboot one node:
sudo reboot
After the reboot, check the cluster state. pgs: active+clean is the normal state. The health line itself stays at HEALTH_WARN while the noout and norebalance flags are set:
sudo ceph -s
If the state is normal, reboot the next node.
After every node has been rebooted in turn and the state is confirmed as active+clean, clear the flags:
sudo ceph -s
sudo ceph osd unset noout
sudo ceph osd unset norebalance
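You can script the "wait for active+clean before the next reboot" check. This is a minimal sketch. It assumes the one-line summary format of ceph pg stat shown in the comment, which you should verify on your release. all_pgs_clean and wait_for_clean are hypothetical helper names. HEALTH_OK is deliberately not used as the signal, because the maintenance flags keep the cluster in HEALTH_WARN.

```shell
#!/usr/bin/env bash
# Sketch: between node reboots, wait until every PG is active+clean.
# Assumption: `ceph pg stat` prints a summary such as
#   "128 pgs: 128 active+clean; 2.0 GiB data, 6.1 GiB used, ..."

# Return 0 only when the summary lists a single state, active+clean,
# covering all PGs.
all_pgs_clean() {
    local re='([0-9]+) pgs: ([0-9]+) active[+]clean;'
    [[ $1 =~ $re ]] && [[ ${BASH_REMATCH[1]} -eq ${BASH_REMATCH[2]} ]]
}

# Poll `ceph pg stat` up to max_tries times, interval seconds apart.
wait_for_clean() {
    local max_tries=${1:-60} interval=${2:-10} i
    for ((i = 1; i <= max_tries; i++)); do
        if all_pgs_clean "$(ceph pg stat 2>/dev/null)"; then
            echo "all PGs active+clean"
            return 0
        fi
        sleep "$interval"
    done
    echo "PGs not active+clean after $max_tries checks" >&2
    return 1
}
```

Run wait_for_clean after each node comes back and before rebooting the next one. A non-zero exit means the PGs did not settle within the allotted checks.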
Adjusting pg_num and pgp_num
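Run ceph -s to make sure the cluster is healthy.
pg_num can only be increased, not decreased. This applies before Nautilus; Nautilus and later can also decrease it by merging PGs.
Adjust it in powers of two (2^N) each time.
On a cluster that already holds data, adjust gradually rather than in one large jump.
Adjust pg_num first, and only once that has settled adjust pgp_num.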
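You can turn the "powers of two, step by step" rule into a list of intermediate targets. This is only a sketch. pg_steps is a hypothetical helper. It assumes the target is a power of two and starts from the next power of two above the current pg_num.

```shell
#!/usr/bin/env bash
# Sketch: list the pg_num values to step through when growing a pool,
# doubling each time so that every step is a power of two (2^N).
# Usage: pg_steps CURRENT TARGET   (TARGET must be a power of two)
pg_steps() {
    local current=$1 target=$2 next=1
    if (( target < 1 || (target & (target - 1)) != 0 )); then
        echo "target $target is not a power of two" >&2
        return 1
    fi
    # Smallest power of two strictly greater than the current pg_num.
    while (( next <= current )); do
        next=$(( next * 2 ))
    done
    # Every doubling up to and including the target.
    while (( next <= target )); do
        echo "$next"
        next=$(( next * 2 ))
    done
}
```

For example, pg_steps 8 64 prints 16, 32 and 64 on separate lines. Apply each value as pg_num and then pgp_num, waiting for active+clean in between.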
Adjust pg_num for all pools in one batch:
n=64
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pg_num $n
done
After adjusting, watch the status:
ceph -w
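The batch loop sets the same n on every pool. That includes pools whose pg_num is already larger, which amounts to a shrink request (refused before Nautilus). The following sketch only touches pools below n. It assumes that ceph osd pool get POOL pg_num prints "pg_num: VALUE". raise_pg_num_all is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: raise pg_num to N only on pools whose pg_num is currently below N.
# Assumption: `ceph osd pool get POOL pg_num` prints "pg_num: <value>".
raise_pg_num_all() {
    local n=$1 pool cur
    for pool in $(rados lspools); do
        cur=$(ceph osd pool get "$pool" pg_num | awk '{print $2}')
        if [[ ! $cur =~ ^[0-9]+$ ]]; then
            echo "$pool: could not read pg_num" >&2
            continue
        fi
        if (( cur < n )); then
            echo "$pool: pg_num $cur -> $n"
            ceph osd pool set "$pool" pg_num "$n"
        else
            echo "$pool: pg_num already $cur, skipping"
        fi
    done
}
```

Usage: raise_pg_num_all 64, then watch ceph -w as before.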
Adjust pgp_num for all pools in one batch. pgp_num must not exceed pg_num and should end up equal to it:
n=32
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pgp_num $n
done
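Instead of a fixed n, you can raise each pool's pgp_num to that pool's own current pg_num. This is where pgp_num should end up. This is a sketch. It assumes ceph osd pool get POOL KEY prints "KEY: VALUE". sync_pgp_num_all is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: bring each pool's pgp_num up to its current pg_num.
# Assumption: `ceph osd pool get POOL <key>` prints "<key>: <value>".
sync_pgp_num_all() {
    local pool pg pgp
    for pool in $(rados lspools); do
        pg=$(ceph osd pool get "$pool" pg_num | awk '{print $2}')
        pgp=$(ceph osd pool get "$pool" pgp_num | awk '{print $2}')
        if [[ ! $pg =~ ^[0-9]+$ || ! $pgp =~ ^[0-9]+$ ]]; then
            echo "$pool: could not read pg_num/pgp_num" >&2
            continue
        fi
        if (( pgp < pg )); then
            echo "$pool: pgp_num $pgp -> $pg"
            ceph osd pool set "$pool" pgp_num "$pg"
        else
            echo "$pool: pgp_num already $pgp"
        fi
    done
}
```

Run it only after the pg_num change has settled, in line with the "pg_num first, then pgp_num" rule.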
Delete the default pools and create pools with other names. The default pools are:
data
metadata
rbd
ceph osd pool create vmspool 8
ceph osd pool set vmspool pg_num 32
ceph osd pool set vmspool pgp_num 32
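The notes list the default pools but do not show the delete step. This is a sketch. It assumes the monitors already allow pool deletion: on Luminous and later, mon_allow_pool_delete must be true. delete_pools_if_present is a hypothetical helper. Deleting a pool permanently destroys its data.

```shell
#!/usr/bin/env bash
# Sketch: delete the named pools if they exist. THIS DESTROYS THEIR DATA.
# Assumption: mon_allow_pool_delete is already true, e.g. via
#   ceph config set mon mon_allow_pool_delete true   (Mimic and later)
delete_pools_if_present() {
    local existing pool
    existing=$(rados lspools)
    for pool in "$@"; do
        # Exact, fixed-string match on a whole line of `rados lspools`.
        if grep -Fqx -- "$pool" <<<"$existing"; then
            echo "deleting pool $pool"
            ceph osd pool delete "$pool" "$pool" --yes-i-really-really-mean-it
        else
            echo "pool $pool not found, skipping"
        fi
    done
}
```

Usage: delete_pools_if_present data metadata rbd, then create vmspool as above.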
Pull the configuration of an existing cluster into ceph-deploy:
mkdir -p cluster1
cd cluster1
ceph-deploy config pull HOST
ceph-deploy gatherkeys HOST
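Before running osd create from this directory, you can check that the pull and gatherkeys steps produced the files it relies on. This is a sketch. The file names assume ceph-deploy's defaults for a cluster named "ceph". check_deploy_dir is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: check that the ceph-deploy working directory holds the config and
# the keyrings that `osd create` needs.
# Assumption: default file names for a cluster named "ceph".
check_deploy_dir() {
    local dir=${1:-.} f missing=0
    for f in ceph.conf ceph.client.admin.keyring ceph.bootstrap-osd.keyring; do
        if [[ ! -s $dir/$f ]]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: run check_deploy_dir inside cluster1. A non-zero exit lists what is missing on stderr.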
Add a new disk, /dev/xvde, to every node:
ceph osd status
node1=host1
node2=host2
node3=host3
disk="/dev/xvde"
ceph-deploy --overwrite-conf osd create --data $disk $node1
ceph-deploy --overwrite-conf osd create --data $disk $node2
ceph-deploy --overwrite-conf osd create --data $disk $node3
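You can write the three osd create calls as one loop that stops at the first failure. This is a sketch. create_osds is a hypothetical name, and host1 to host3 and /dev/xvde are the same placeholders used above.

```shell
#!/usr/bin/env bash
# Sketch: create one OSD per node on the same device; stop on the first error.
# Usage: create_osds DISK NODE...
create_osds() {
    local disk=$1 node
    shift
    for node in "$@"; do
        echo "creating OSD on $node:$disk"
        ceph-deploy --overwrite-conf osd create --data "$disk" "$node" || return 1
    done
}
```

Usage: create_osds /dev/xvde host1 host2 host3, then confirm with ceph osd status.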
The cluster warns that a pool has too few PGs:
ceph -s
    health: HEALTH_WARN
            1 pools have many more objects per pg than average
ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)
pool=cn-south-1.rgw.buckets.data
Check the pool's current pg_num, then raise pg_num first and pgp_num after it:
ceph osd pool get $pool pg_num
ceph osd pool set $pool pg_num 64
ceph osd pool set $pool pgp_num 64
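You can derive the pool name and the next pg_num from the health detail output. This is a sketch. It assumes the detail line has the form shown above, and that ceph osd pool get POOL pg_num prints "pg_num: VALUE". pool_from_detail and double_pg_num are hypothetical helpers. Doubling keeps a power-of-two pg_num a power of two.

```shell
#!/usr/bin/env bash
# Sketch: extract the pool from a MANY_OBJECTS_PER_PG detail line and double
# its pg_num (pgp_num is left for a later step).
# Assumption: detail lines look like
#   pool NAME objects per pg (2386) is more than 21.115 times cluster average (113)
pool_from_detail() {
    local re='pool ([^ ]+) objects per pg'
    [[ $1 =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

double_pg_num() {
    local pool=$1 cur
    cur=$(ceph osd pool get "$pool" pg_num | awk '{print $2}')
    [[ $cur =~ ^[0-9]+$ ]] || return 1
    echo "$pool: pg_num $cur -> $(( cur * 2 ))"
    ceph osd pool set "$pool" pg_num "$(( cur * 2 ))"
}
```

Usage: pool=$(ceph health detail | while read -r l; do pool_from_detail "$l"; done | head -n1), then double_pg_num "$pool".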
Purging temporary data
Deprecated since version 0.52.
When you delete objects (and buckets/containers), the Gateway marks the data for removal, but it is still available to users until it is purged. Since data still resides in storage until it is purged, it may take up available storage space. To ensure that data marked for deletion isn’t taking up a significant amount of storage space, you should run the following command periodically:
radosgw-admin temp remove
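Because temp remove is marked deprecated, deleted RGW objects on current releases are reclaimed by the gateway's garbage collector. radosgw-admin gc list shows pending entries, and radosgw-admin gc process runs a collection pass by hand. This is a sketch that assumes it runs on a host with admin credentials for the gateway's cluster. run_rgw_gc is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: show pending RGW garbage-collection entries, then process them.
# Assumption: run on a host with an admin keyring for the gateway's cluster.
run_rgw_gc() {
    echo "pending gc entries:"
    radosgw-admin gc list || return 1
    radosgw-admin gc process
}
```

This could be scheduled periodically, for example from cron, in place of the deprecated command.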