Planned node reboot (maintenance)
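Preparation: the cluster must be in health: HEALTH_OK state. Check it, then set noout and norebalance so that the rebooting node's OSDs are not marked out and no data is rebalanced while the node is down:
sudo ceph -s
sudo ceph osd set noout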
sudo ceph osd set norebalance
Reboot one node:
sudo reboot
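After the reboot completes, check the cluster status. pgs: active+clean is the normal state (health itself stays at HEALTH_WARN while the noout and norebalance flags are set):
sudo ceph -s
If the status is normal, go on to reboot the next node.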
After every node has been rebooted in turn and the status is back to active+clean, clear the flags:
sudo ceph -s
sudo ceph osd unset noout
sudo ceph osd unset norebalance
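The "check, reboot, wait for active+clean, next node" loop above can be scripted. This is a minimal sketch with hypothetical helper names; it assumes the `ceph` CLI is usable by the current user without sudo, and that `ceph pg stat` prints a summary like `192 pgs: 192 active+clean; ...` (older releases prefix it with `vNNN: `). Verify that format on your release before relying on it. It checks PG states rather than HEALTH_OK because the noout/norebalance flags keep health at HEALTH_WARN during the procedure.

```shell
#!/usr/bin/env bash
# Sketch of health checks for the rolling reboot (hypothetical helper names).
# Assumption: "ceph pg stat" output looks like
#   "192 pgs: 192 active+clean; 1.0 GiB data, ..."

# Succeeds if "ceph health" starts with HEALTH_OK (use before setting noout).
cluster_is_healthy() {
    local status
    status=$(ceph health) || return 1
    [[ $status == HEALTH_OK* ]]
}

# Succeeds if every PG is active+clean. Unlike HEALTH_OK, this still works
# while the noout/norebalance flags keep the cluster in HEALTH_WARN.
all_pgs_active_clean() {
    local stat states re='^[0-9]+ active\+clean$'
    stat=$(ceph pg stat) || return 1
    states=${stat#*pgs: }   # drop everything up to "pgs: "
    states=${states%%;*}    # keep only the state list before the first ";"
    [[ $states =~ $re ]]
}

# Poll until all PGs are active+clean; give up after roughly TIMEOUT seconds.
wait_for_active_clean() {
    local timeout=${1:-1800} interval=${2:-10} waited=0
    until all_pgs_active_clean; do
        if (( waited >= timeout )); then
            echo "PGs not active+clean after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
}
```

Typical use: run `cluster_is_healthy` before `ceph osd set noout`, reboot one node, then run `wait_for_active_clean 1800 10` and move on to the next node only if it succeeds.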
Adjusting pg_num and pgp_num
Run ceph -s first to make sure the cluster is healthy.
pg_num can only be increased, not decreased (this holds before Nautilus; from Nautilus on, pg_num can also be reduced by merging PGs).
Adjust it in powers of 2 each time.
On a production cluster that holds data, increase it gradually; do not make one large jump.
Change pg_num first; once that has settled without problems, change pgp_num.
Batch-adjust pg_num for all pools:
n=64
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pg_num "$n"
done
After the adjustment, check the status:
ceph -w
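The powers-of-two and "increase gradually" rules can be turned into a step plan. This is a minimal sketch that assumes "gradually" means doubling pg_num one step at a time and letting the cluster settle between steps; `next_pow2` and `pg_num_steps` are hypothetical helper names, not Ceph commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: plan pg_num increases as successive powers of two.

# Print the smallest power of two that is >= $1 (expects $1 >= 1).
next_pow2() {
    local n=$1 p=1
    while (( p < n )); do
        p=$(( p * 2 ))
    done
    echo "$p"
}

# Print the pg_num values to apply one at a time, doubling from CURRENT
# up to TARGET (TARGET is rounded up to a power of two).
pg_num_steps() {
    local current=$1 target step
    target=$(next_pow2 "$2")
    if (( target <= current )); then
        echo "target $target is not above current pg_num $current" >&2
        return 1
    fi
    step=$(next_pow2 $(( current + 1 )))
    while (( step <= target )); do
        echo "$step"
        step=$(( step * 2 ))
    done
}
```

For example, `pg_num_steps 32 128` prints 64 and then 128. Apply each value with `ceph osd pool set POOL pg_num N` and wait for all PGs to be active+clean before applying the next one.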
Batch-adjust pgp_num for all pools (pgp_num should end up equal to pg_num):
n=64
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pgp_num "$n"
done
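Instead of a fixed n, pgp_num can be raised per pool to whatever pg_num that pool currently has, which also skips pools that are already in sync. A minimal sketch with a hypothetical helper name; it assumes `ceph osd pool get POOL pg_num` prints `pg_num: N` (and likewise for pgp_num). On Nautilus and later the manager generally moves pgp_num along with pg_num by itself, so this mainly matters on older releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise pgp_num to match pg_num on every pool where it lags.
# Assumes "ceph osd pool get POOL pg_num" prints "pg_num: N" (same for pgp_num).
sync_pgp_num() {
    local pool out pg pgp
    for pool in $(rados lspools); do
        out=$(ceph osd pool get "$pool" pg_num) || return 1
        pg=${out##*: }      # keep the number after "pg_num: "
        out=$(ceph osd pool get "$pool" pgp_num) || return 1
        pgp=${out##*: }     # keep the number after "pgp_num: "
        if (( pgp < pg )); then
            echo "pool $pool: pgp_num $pgp -> $pg"
            ceph osd pool set "$pool" pgp_num "$pg" || return 1
        fi
    done
}
```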
Delete the default pools and add a pool with a different name. The default pools are:
data
metadata
rbd
ceph osd pool create vmspool 8
ceph osd pool set vmspool pg_num 32
ceph osd pool set vmspool pgp_num 32
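The notes do not show the delete step itself. A minimal sketch, assuming a Mimic or later cluster (for `ceph config set`) and that the new pool will hold RBD images, which the name vmspool suggests but the notes do not say. Monitors refuse pool deletion unless `mon_allow_pool_delete` is true, and the pool name must be given twice plus a confirmation flag. Deleting a pool destroys its data irreversibly. `replace_default_pools` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the default pools, then create NEW_POOL with
# PG_NUM placement groups and tag it with APP (e.g. rbd).
# Assumes Mimic+ ("ceph config set"); deletion cannot be undone.
replace_default_pools() {
    local new_pool=$1 pg_num=$2 app=$3 pool rc=0
    # Monitors reject pool deletion unless this guard is switched off.
    ceph config set mon mon_allow_pool_delete true || return 1
    for pool in data metadata rbd; do
        # The name is given twice, plus the confirmation flag.
        ceph osd pool delete "$pool" "$pool" --yes-i-really-really-mean-it || rc=1
    done
    # pg_num and pgp_num set in one step at creation time.
    ceph osd pool create "$new_pool" "$pg_num" "$pg_num" || rc=1
    ceph osd pool application enable "$new_pool" "$app" || rc=1
    # Turn the safety guard back on even if a step above failed.
    ceph config set mon mon_allow_pool_delete false || rc=1
    return "$rc"
}
```

Usage: `replace_default_pools vmspool 32 rbd`.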
Collect the configuration of an existing cluster into ceph-deploy:
mkdir -p cluster1
cd cluster1
ceph-deploy config pull HOST
ceph-deploy gatherkeys HOST
Add a new disk /dev/xvde on every node:
ceph osd status
node1=host1
node2=host2
node3=host3
disk="/dev/xvde"
ceph-deploy --overwrite-conf osd create --data "$disk" "$node1"
ceph-deploy --overwrite-conf osd create --data "$disk" "$node2"
ceph-deploy --overwrite-conf osd create --data "$disk" "$node3"
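The three ceph-deploy lines can be folded into a loop over the node list that stops at the first failure, so a node that did not get its OSD is not silently skipped. A minimal sketch; `add_osd_on_nodes` is a hypothetical helper name, and it uses only the ceph-deploy invocation already shown in the notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create one OSD on DISK on each listed node,
# stopping at the first ceph-deploy failure.
add_osd_on_nodes() {
    local disk=$1 node
    shift
    for node in "$@"; do
        echo "creating OSD on ${node}:${disk}"
        ceph-deploy --overwrite-conf osd create --data "$disk" "$node" || return 1
    done
}
```

Usage: `add_osd_on_nodes /dev/xvde host1 host2 host3`, then `ceph osd status` to confirm the new OSDs are up.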
Warning: too few PGs
ceph -s
health: HEALTH_WARN
        1 pools have many more objects per pg than average
ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)
Increase pg_num, then pgp_num, for the pool named in the warning:
pool=cn-south-1.rgw.buckets.data
ceph osd pool get "$pool" pg_num
ceph osd pool set "$pool" pg_num 64
ceph osd pool set "$pool" pgp_num 64
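The flagged pool can be read from `ceph health detail` instead of being copied by hand. A minimal sketch with hypothetical helper names; it assumes the detail line keeps the form `pool NAME objects per pg (N) is more than X times cluster average (M)` seen above. One doubling roughly halves the objects per PG in that pool (2386 would drop to about 1193 here), so clearing the warning may take more than one step; set pgp_num afterwards, as in the notes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the MANY_OBJECTS_PER_PG warning.

# Print pool names from "ceph health detail" lines of the form
#   "pool NAME objects per pg (N) is more than X times cluster average (M)"
pools_with_many_objects_per_pg() {
    ceph health detail | awk '$1 == "pool" && /objects per pg/ { print $2 }'
}

# Double pg_num for one pool (one power-of-two step). pgp_num is left alone.
double_pg_num() {
    local pool=$1 out cur
    out=$(ceph osd pool get "$pool" pg_num) || return 1
    cur=${out##*: }     # keep the number after "pg_num: "
    echo "pool $pool: pg_num $cur -> $(( cur * 2 ))"
    ceph osd pool set "$pool" pg_num "$(( cur * 2 ))"
}
```

Usage: `for p in $(pools_with_many_objects_per_pg); do double_pg_num "$p"; done`.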
Purge temporary data
Deprecated since version 0.52. When you delete objects (and buckets/containers), the Gateway marks the data for removal, but it is still available to users until it is purged. Since data still resides in storage until it is purged, it may take up available storage space. To ensure that data marked for deletion isn't taking up a significant amount of storage space, you should run the following command periodically:
radosgw-admin temp remove