Planned node reboot (rolling maintenance)
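Preparation: the cluster must be in the health: HEALTH_OK state. Then run:
sudo ceph -s                      # confirm health: HEALTH_OK
sudo ceph osd set noout           # keep OSDs on the rebooting node from being marked out
sudo ceph osd set norebalance     # stop Ceph from rebalancing data while a node is down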
Reboot one node:
sudo reboot
After the reboot completes, check the cluster status. pgs: active+clean is the normal state. (While the noout and norebalance flags are set, health shows HEALTH_WARN, so check the PG states rather than waiting for HEALTH_OK.)
sudo ceph -s
If the status is normal, reboot the next node.
After every node has been rebooted in turn and the PGs are back to active+clean, clear the flags:
sudo ceph -s
sudo ceph osd unset noout
sudo ceph osd unset norebalance
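Between reboots, the "check that PGs are back to active+clean" step can be scripted. Below is a minimal bash sketch, not a tested tool: all_pgs_clean, wait_for_clean_pgs and the CEPH override variable are hypothetical names, and it assumes ceph pg stat prints one line of the form "<total> pgs: <count> <state>, ...; <usage>", as recent releases do (check the format on your version). It polls PG states instead of ceph health because the noout and norebalance flags keep health at HEALTH_WARN for the whole procedure.

```shell
#!/usr/bin/env bash
# Sketch: helpers for the "wait until active+clean" step of a rolling reboot.
# Hypothetical helper names. CEPH can be overridden (e.g. CEPH="sudo ceph");
# it defaults to "ceph".

# all_pgs_clean: read `ceph pg stat` output on stdin and succeed only if
# every PG is active+clean.
# ASSUMPTION: the output contains "<total> pgs: <count> <state>[, ...];".
all_pgs_clean() {
  local line re='([0-9]+) pgs: ([^;]*)'
  line=$(cat)
  [[ $line =~ $re ]] || return 1
  # Clean means the state list is exactly "<total> active+clean".
  [[ ${BASH_REMATCH[2]} == "${BASH_REMATCH[1]} active+clean" ]]
}

# wait_for_clean_pgs [INTERVAL] [MAX_TRIES]: poll until all PGs are
# active+clean; return 1 if that does not happen within MAX_TRIES polls.
wait_for_clean_pgs() {
  local interval=${1:-10} max_tries=${2:-360} try
  for ((try = 1; try <= max_tries; try++)); do
    if ${CEPH:-ceph} pg stat | all_pgs_clean; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

For example, after each reboot: CEPH="sudo ceph" wait_for_clean_pgs 10 360 && echo "safe to reboot the next node". With those arguments it gives up after roughly an hour (360 polls, 10 seconds apart).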
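Adjusting pg_num and pgp_num
Run ceph -s first to make sure the cluster is healthy.
pg_num can only be increased, never decreased (this holds for releases before Nautilus; Nautilus and later can merge PGs, so pg_num can also be reduced there).
Always set it to a power of two (2^N).
On a production cluster that already holds data, increase it gradually; do not make a large jump in one step.
Raise pg_num first; once that has completed without problems, raise pgp_num.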
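The power-of-two and "small steps" rules above can be turned into a small planner. This bash sketch uses hypothetical function names; it checks that both values are powers of two, refuses to decrease pg_num, and assumes one doubling per step is a reasonably gradual increase. It only prints the plan and does not touch the cluster.

```shell
#!/usr/bin/env bash
# Sketch: plan pg_num increases as successive doublings (powers of two only).
# Hypothetical helper names. ASSUMPTION: one doubling per step is gradual enough.

# is_power_of_two N: succeed if N is a positive power of two.
is_power_of_two() {
  local n=$1
  [[ $n =~ ^[1-9][0-9]*$ ]] && (( (n & (n - 1)) == 0 ))
}

# pg_num_steps CURRENT TARGET: print each intermediate pg_num value, one
# doubling at a time, ending at TARGET (an empty line if they are equal).
pg_num_steps() {
  local current=$1 target=$2 steps=()
  if ! is_power_of_two "$current" || ! is_power_of_two "$target"; then
    echo "pg_num values must be powers of two" >&2
    return 1
  fi
  if (( target < current )); then
    echo "refusing to decrease pg_num" >&2
    return 1
  fi
  while (( current < target )); do
    current=$(( current * 2 ))
    steps+=("$current")
  done
  echo "${steps[*]}"
}
```

pg_num_steps 32 256 prints 64 128 256, i.e. three rounds of "raise pg_num, wait for active+clean, raise pgp_num".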
Batch-adjust pg_num for all pools:
n=64
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pg_num "$n"
done
After the adjustment, watch the cluster status:
ceph -w
Batch-adjust pgp_num for all pools:
n=64    # pgp_num should end up equal to the pg_num set above
for poolname in $(rados lspools); do
    ceph osd pool set "$poolname" pgp_num "$n"
done
Delete the default pools and create pools with other names. The default pools are:
data
metadata
rbd
Create a new pool, then raise its pg_num and pgp_num:
ceph osd pool create vmspool 8
ceph osd pool set vmspool pg_num 32
ceph osd pool set vmspool pgp_num 32
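The deletion command itself is not given above. A minimal sketch, assuming you really want to discard all data in those pools: ceph osd pool delete takes the pool name twice plus --yes-i-really-really-mean-it, and on Luminous and later the monitors also refuse unless mon_allow_pool_delete is true (for example via ceph config set mon mon_allow_pool_delete true on Mimic and later). delete_pools and the CEPH override are hypothetical names; set CEPH="echo ceph" to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: delete pools by name. DESTRUCTIVE: all data in these pools is lost.
# Hypothetical helper. CEPH defaults to "ceph"; CEPH="echo ceph" is a dry run.
# delete_pools POOL...
delete_pools() {
  local pool
  for pool in "$@"; do
    # Ceph wants the pool name twice plus this flag as a safety check.
    ${CEPH:-ceph} osd pool delete "$pool" "$pool" --yes-i-really-really-mean-it || return 1
  done
}
```

After enabling pool deletion on the monitors, delete_pools data metadata rbd removes the three default pools listed above.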
Collect the configuration of an existing cluster into ceph-deploy:
mkdir -p cluster1
cd cluster1
ceph-deploy config pull HOST
ceph-deploy gatherkeys HOST
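Before running further ceph-deploy commands from cluster1, it can help to confirm that those two commands actually left a config and keyrings behind. A minimal bash sketch: check_deploy_dir is a hypothetical helper, and the file names (ceph.conf, ceph.client.admin.keyring, ceph.bootstrap-osd.keyring) are assumptions about what your ceph-deploy version writes, chosen because the osd create step that follows needs the bootstrap-osd key.

```shell
#!/usr/bin/env bash
# Sketch: verify that `config pull` / `gatherkeys` produced the files
# ceph-deploy needs. Hypothetical helper.
# ASSUMPTION: these file names match what your ceph-deploy version writes.
# check_deploy_dir [DIR]   (DIR defaults to the current directory)
check_deploy_dir() {
  local dir=${1:-.} f missing=0
  for f in ceph.conf ceph.client.admin.keyring ceph.bootstrap-osd.keyring; do
    if [[ ! -s $dir/$f ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_deploy_dir inside cluster1 right after gatherkeys; a non-zero exit status means a file is missing.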
Add a new disk, /dev/xvde, as an OSD on every node:
ceph osd status
node1=host1
node2=host2
node3=host3
disk="/dev/xvde"
ceph-deploy --overwrite-conf osd create --data $disk $node1
ceph-deploy --overwrite-conf osd create --data $disk $node2
ceph-deploy --overwrite-conf osd create --data $disk $node3
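With more nodes, the three near-identical lines become a loop. A minimal bash sketch (create_osds and CEPH_DEPLOY are hypothetical names) that runs the same ceph-deploy command per host and stops at the first failure, so a bad disk on one node does not go unnoticed:

```shell
#!/usr/bin/env bash
# Sketch: create one OSD on the same device on each host with ceph-deploy.
# Hypothetical helper. CEPH_DEPLOY defaults to "ceph-deploy";
# CEPH_DEPLOY="echo ceph-deploy" gives a dry run.
# create_osds DISK HOST...
create_osds() {
  local disk=$1 host
  shift
  for host in "$@"; do
    echo "creating OSD on $host:$disk" >&2
    ${CEPH_DEPLOY:-ceph-deploy} --overwrite-conf osd create --data "$disk" "$host" || return 1
  done
}
```

create_osds /dev/xvde host1 host2 host3 does the same as the three commands above; run ceph osd status again afterwards to see the new OSDs.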
Warning that a pool has too few PGs:
ceph -s
    health: HEALTH_WARN
            1 pools have many more objects per pg than average
ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
    pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)
The affected pool:
pool=cn-south-1.rgw.buckets.data
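The ratio in the message is the pool's objects per PG divided by the cluster average: 2386 / 113 ≈ 21.115. The warning is raised when this ratio exceeds mon_pg_warn_max_object_skew (10 by default). A small bash sketch (flagged_pools and object_skew are hypothetical helpers) that pulls the pool names out of ceph health detail, assuming the detail lines keep the format shown above, and reproduces the ratio:

```shell
#!/usr/bin/env bash
# Sketch: helpers for the MANY_OBJECTS_PER_PG warning. Hypothetical helpers.

# flagged_pools: read `ceph health detail` on stdin, print each flagged pool.
# ASSUMPTION: detail lines look like
#   pool <name> objects per pg (<n>) is more than <x> times cluster average (<m>)
flagged_pools() {
  awk '$1 == "pool" && $3 == "objects" && $4 == "per" && $5 == "pg" { print $2 }'
}

# object_skew POOL_OBJECTS_PER_PG CLUSTER_AVERAGE: print the ratio the
# warning reports, to three decimals.
object_skew() {
  awk -v pool="$1" -v avg="$2" 'BEGIN { printf "%.3f\n", pool / avg }'
}
```

With the output above, ceph health detail | flagged_pools prints cn-south-1.rgw.buckets.data, and object_skew 2386 113 prints 21.115.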
Check the pool's current pg_num, then raise pg_num and, once that has settled, pgp_num:
ceph osd pool get $pool pg_num
ceph osd pool set $pool pg_num 64
ceph osd pool set $pool pgp_num 64
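Rather than hard-coding 64, the next power-of-two step can be derived from the current value. A minimal bash sketch: next_pg_num and CEPH are hypothetical names, and it assumes ceph osd pool get <pool> pg_num prints a line of the form "pg_num: <n>". It only computes the value, so pg_num and pgp_num can still be raised in separate steps as recommended earlier.

```shell
#!/usr/bin/env bash
# Sketch: compute the next pg_num (current value doubled) for one pool.
# Hypothetical helper. CEPH defaults to "ceph".
# ASSUMPTION: `ceph osd pool get POOL pg_num` prints "pg_num: <n>".
next_pg_num() {
  local pool=$1 current
  current=$(${CEPH:-ceph} osd pool get "$pool" pg_num | awk '$1 == "pg_num:" { print $2 }')
  if [[ ! $current =~ ^[0-9]+$ ]]; then
    echo "could not read pg_num for pool $pool" >&2
    return 1
  fi
  echo $(( current * 2 ))
}
```

Usage:
n=$(next_pg_num "$pool")
ceph osd pool set "$pool" pg_num "$n"
ceph osd pool set "$pool" pgp_num "$n"    # after ceph -s shows active+clean again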
Purge temporary data
Deprecated since version 0.52. When you delete objects (and buckets/containers), the Gateway marks the data for removal, but it is still available to users until it is purged. Since data still resides in storage until it is purged, it may take up available storage space. To ensure that data marked for deletion isn’t taking up a significant amount of storage space, you should run the following command periodically:
radosgw-admin temp remove