Planned node reboot for maintenance
Preparation: the cluster must report health: HEALTH_OK. Check the status, then set the noout and norebalance flags:
sudo ceph -s
sudo ceph osd set noout
sudo ceph osd set norebalance
Reboot one node:
sudo reboot
After the reboot completes, check the cluster status; pgs: active+clean is the normal state:
sudo ceph -s
Once the status is normal, continue with the next node.
After all nodes have been rebooted in turn and the status is back to active+clean, clear the flags:
sudo ceph -s
sudo ceph osd unset noout
sudo ceph osd unset norebalance

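The check between reboots can be scripted. The following is a minimal sketch, not part of the original procedure: `all_pgs_clean` and `wait_for_clean` are hypothetical helper names, the retry count and delay are illustrative, and the parser assumes the `ceph pg stat` line has the shape `<n> pgs: <count> <state>[, ...]; ...`. It looks at PG states rather than overall health because the cluster reports HEALTH_WARN for as long as the noout and norebalance flags are set.

```shell
# Return 0 when a `ceph pg stat` line lists only active+clean PGs.
# Assumed line shape: "<n> pgs: <count> <state>[, <count> <state>...]; ..."
all_pgs_clean() {
    states=$(printf '%s\n' "$1" | sed -n 's/.*pgs: \([^;]*\);.*/\1/p')
    [ -n "$states" ] || return 1
    case "$states" in
        *,*) return 1 ;;                 # more than one PG state is present
        *" active+clean") return 0 ;;
        *) return 1 ;;
    esac
}

# Poll until every PG is active+clean, or give up after TRIES attempts.
# Usage: wait_for_clean [TRIES] [DELAY_SECONDS]
wait_for_clean() {
    tries=${1:-60}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if all_pgs_clean "$(sudo ceph pg stat 2>/dev/null)"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

After a node comes back, `wait_for_clean && echo "safe to reboot the next node"` gives a simple gate. The check is strict on purpose: transient states such as active+clean+scrubbing also count as "not yet clean" in this sketch.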
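Adjusting pg_num and pgp_num
Run ceph -s and make sure the cluster is healthy.
pg_num can only be increased, not decreased (this holds for releases before Nautilus, which added PG merging).
Set the value to a power of 2 each time.
On a production cluster that holds data, adjust gradually; do not make a large jump in one step.
Adjust pg_num first; once that has completed without problems, adjust pgp_num.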
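The "power of 2, in small steps" rule can be made concrete with a little shell arithmetic. This is only a sketch under stated assumptions: `is_power_of_two` and `pg_steps` are hypothetical helper names, and doubling per step is one reasonable reading of "adjust gradually", not something the rules above prescribe.

```shell
# Succeed if $1 is a positive power of two (1, 2, 4, 8, ...).
is_power_of_two() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Print the doubling steps from the current pg_num up to the target,
# one value per line, so each step can be applied and checked separately.
# Usage: pg_steps CURRENT TARGET
pg_steps() {
    current=$1
    target=$2
    [ "$current" -gt 0 ] || { echo "current pg_num must be positive" >&2; return 1; }
    is_power_of_two "$target" || { echo "target must be a power of 2" >&2; return 1; }
    [ "$target" -gt "$current" ] || { echo "pg_num can only be increased" >&2; return 1; }
    step=$current
    while [ "$step" -lt "$target" ]; do
        step=$((step * 2))
        [ "$step" -gt "$target" ] && step=$target
        echo "$step"
    done
}
```

For example, `pg_steps 8 64` prints 16, 32 and 64 on separate lines; each value would be applied to pg_num, checked, and then applied to pgp_num before moving to the next one.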
Batch-adjust pg_num for all pools

n=64
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pg_num $n
done
After the adjustment, check the status:
ceph -w

Batch-adjust pgp_num for all pools
# keep pgp_num equal to the pg_num value set above
n=64
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pgp_num $n
done

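A batch loop like this sets the value on every pool, including pools that are already at or above the target. The sketch below skips those pools. It is an illustration under assumptions: `raise_all_pools` is a hypothetical helper name, and it relies on `ceph osd pool get <pool> pg_num` printing a line of the form `pg_num: 32`.

```shell
# Raise pg_num or pgp_num to TARGET on every pool that is still below it.
# Usage: raise_all_pools pg_num|pgp_num TARGET
raise_all_pools() {
    key=$1
    target=$2
    for pool in $(rados lspools); do
        # `ceph osd pool get` prints e.g. "pg_num: 32"; keep only the number.
        current=$(ceph osd pool get "$pool" "$key" | awk '{print $2}')
        case "$current" in
            ''|*[!0-9]*) echo "skip $pool: cannot read $key" >&2; continue ;;
        esac
        if [ "$current" -ge "$target" ]; then
            echo "skip $pool: $key is already $current"
            continue
        fi
        ceph osd pool set "$pool" "$key" "$target"
    done
}
```

Keeping the order described earlier, one would run `raise_all_pools pg_num 64`, watch `ceph -w` until the cluster settles, and only then run `raise_all_pools pgp_num 64`.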
Delete the default pools and add pools with other names
Default pools:
data
metadata
rbd

ceph osd pool create vmspool 8
ceph osd pool set vmspool pg_num 32
ceph osd pool set vmspool pgp_num 32

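The notes give no command for the deletion itself. As a hedged sketch: `ceph osd pool delete` takes the pool name twice plus a confirmation flag, and on Luminous and later the monitors also need `mon_allow_pool_delete` set to true. Because deletion is irreversible, the hypothetical helper below only prints the commands for review instead of running them.

```shell
# Print (do not run) the delete command for each pool name given.
# Usage: print_pool_delete_commands POOL [POOL...]
print_pool_delete_commands() {
    for pool in "$@"; do
        echo "ceph osd pool delete $pool $pool --yes-i-really-really-mean-it"
    done
}

# Example: review the commands for the three default pools listed above.
# print_pool_delete_commands data metadata rbd
```

A pool that is still in use, for example by a CephFS file system, cannot be removed this way until it is detached.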
Collect an existing cluster's configuration into ceph-deploy
mkdir -p cluster1
cd cluster1
ceph-deploy config pull HOST
ceph-deploy gatherkeys HOST
Add a disk /dev/xvde to every node
ceph osd status
node1=host1
node2=host2
node3=host3
disk="/dev/xvde"
ceph-deploy --overwrite-conf osd create --data $disk $node1
ceph-deploy --overwrite-conf osd create --data $disk $node2
ceph-deploy --overwrite-conf osd create --data $disk $node3

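With more than a few nodes, the three near-identical commands are easier to maintain as a loop. A minimal sketch, assuming every node exposes the new disk under the same device path: `create_osds` is a hypothetical helper name, and `CEPH_DEPLOY` is a hypothetical override variable used only to allow a dry run.

```shell
# Create one OSD on the same disk of every listed node; stop at the first failure.
# Usage: create_osds DISK NODE [NODE...]
create_osds() {
    disk=$1
    shift
    for node in "$@"; do
        # CEPH_DEPLOY is a hypothetical override; set CEPH_DEPLOY=echo for a dry run.
        ${CEPH_DEPLOY:-ceph-deploy} --overwrite-conf osd create --data "$disk" "$node" || return 1
    done
}
```

For instance, `CEPH_DEPLOY=echo create_osds /dev/xvde host1 host2 host3` prints the arguments of each call without touching the cluster; dropping the override runs the real commands from the ceph-deploy working directory.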
Warning that a pool has too few PGs
ceph -s
    health: HEALTH_WARN
            1 pools have many more objects per pg than average
ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)
Check the pool's current pg_num, then raise pg_num and pgp_num:
pool=cn-south-1.rgw.buckets.data

ceph osd pool get $pool pg_num
ceph osd pool set $pool pg_num 64
ceph osd pool set $pool pgp_num 64

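The ratio in the detail line is simply the pool's objects per PG divided by the cluster average: 2386 / 113 ≈ 21.115. To my knowledge the warning is raised once that ratio exceeds `mon_pg_warn_max_object_skew`, which defaults to 10. The sketch below recomputes the ratio from a detail line; `pg_skew` is a hypothetical helper name, and the pattern assumes the message wording shown above.

```shell
# Print "<pool> <ratio>" for a MANY_OBJECTS_PER_PG detail line,
# where ratio = objects per pg / cluster average. Prints nothing
# if the line does not match the expected wording.
pg_skew() {
    printf '%s\n' "$1" | sed -n \
        's/.*pool \([^ ]*\) objects per pg (\([0-9]*\)) is more than .* cluster average (\([0-9]*\)).*/\1 \2 \3/p' |
        awk '$3 > 0 { printf "%s %.3f\n", $1, $2 / $3 }'
}

pg_skew "pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)"
# prints: cn-south-1.rgw.buckets.data 21.115
```

Raising the pool's pg_num spreads the same objects over more PGs, which lowers its objects-per-PG figure and therefore the ratio.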
Purging temporary data
Deprecated since version 0.52. When you delete objects (and buckets/containers), the Gateway marks the data for removal, but it is still available to users until it is purged. Since data still resides in storage until it is purged, it may take up available storage space. To ensure that data marked for deletion isn't taking up a significant amount of storage space, you should run the following command periodically:
radosgw-admin temp remove