Ceph optimization and operations notes

admin · posted 2021-12-5 21:50:56

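Planned node reboots for maintenance
Preparation: the cluster must report health: HEALTH_OK. Check it, then set the noout and norebalance flags so that the OSDs on the rebooting node are not marked out and no data is rebalanced while the node is down: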
sudo ceph -s
sudo ceph osd set noout
sudo ceph osd set norebalance
Reboot one node:
sudo reboot
Once the node is back up, check the cluster state; pgs: active+clean means everything is normal:
sudo ceph -s
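If the state is normal, go on to reboot the next node.
After every node has been rebooted in turn and the state is back to active+clean, clear the flags: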
sudo ceph -s
sudo ceph osd unset noout
sudo ceph osd unset norebalance
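
The check between reboots can be scripted. The sketch below is one possible approach and not part of the original procedure. It assumes that `ceph pg stat` prints a one-line summary of the form `128 pgs: 128 active+clean; ...` (the exact layout differs between releases, so verify it on your cluster first), and the function names all_pgs_clean and wait_for_clean are made up for illustration. It looks at PG states rather than overall health because the cluster stays at HEALTH_WARN for as long as noout is set.

```shell
# all_pgs_clean: read `ceph pg stat`-style text on stdin and succeed only
# if the total PG count equals the count reported as exactly active+clean.
# ASSUMPTION: the summary looks like "128 pgs: 128 active+clean; ...".
all_pgs_clean() {
    awk '
        BEGIN { total = -1; clean = 0 }
        {
            for (i = 1; i < NF; i++) {
                if ($(i + 1) ~ /^pgs:?$/) total = $i
                if ($(i + 1) ~ /^active\+clean[;,]?$/) clean = $i
            }
        }
        END { exit !(total > 0 && total == clean) }
    '
}

# wait_for_clean [TRIES] [DELAY_SECONDS]: poll until all PGs are clean.
# Returns 0 on success, 1 if the PGs are still not clean after TRIES polls.
wait_for_clean() {
    tries=${1:-60}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if ceph pg stat 2>/dev/null | all_pgs_clean; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage after a node comes back (hypothetical):
#   wait_for_clean 60 10 && echo "safe to reboot the next node"
```

Any PG in another state, including active+clean+scrubbing, makes the check fail, which errs on the side of waiting longer.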

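Adjusting pg_num and pgp_num
Run ceph -s first and make sure the cluster is healthy.
pg_num can only be increased, never decreased (this holds for releases before Nautilus; Nautilus and later can also merge PGs).
Use a power of 2 for each new value.
On a production cluster that already holds data, raise the value gradually rather than in one large jump.
Raise pg_num first; once that has settled without problems, raise pgp_num.
Set pg_num on all pools in one batch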
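
The "powers of 2, a little at a time" rule can be turned into a list of intermediate values. This is a minimal sketch under my own assumptions: the helper names next_pow2 and pg_steps are invented here, and doubling once per step is just one reasonable reading of "gradually".

```shell
# next_pow2 N: print the smallest power of two strictly greater than N.
next_pow2() {
    n=$1
    p=1
    while [ "$p" -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# pg_steps CURRENT TARGET: print the values to apply one after another,
# moving up one power of two per step and never going past TARGET.
pg_steps() {
    cur=$1
    target=$2
    while [ "$cur" -lt "$target" ]; do
        cur=$(next_pow2 "$cur")
        if [ "$cur" -gt "$target" ]; then
            cur=$target
        fi
        echo "$cur"
    done
}

# Usage (hypothetical): going from 8 to 64 yields the steps 16, 32, 64.
#   pg_steps 8 64
```

Between steps, wait until the cluster is active+clean again before applying the next value.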

n=64
for poolname in $(rados lspools); do
ceph osd pool set $poolname pg_num $n ;
done

After the change, watch the cluster state:
ceph -w

Set pgp_num on all pools in one batch
# pgp_num should end up equal to the pg_num set above
n=64
for poolname in $(rados lspools); do
ceph osd pool set $poolname pgp_num $n ;
done
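
Once pg_num and pgp_num have both been changed, it is worth confirming that they match on every pool, since newly split PGs are only redistributed once pgp_num catches up. The sketch below assumes that `ceph osd pool get <pool> pg_num` prints a line like `pg_num: 64`; the helper names pool_value and pools_with_mismatch are hypothetical.

```shell
# pool_value POOL KEY: print the numeric value of KEY (pg_num or pgp_num).
# ASSUMPTION: the command prints "KEY: VALUE" on a single line.
pool_value() {
    ceph osd pool get "$1" "$2" | awk '{ print $2 }'
}

# pools_with_mismatch POOL...: print each pool whose pg_num != pgp_num.
pools_with_mismatch() {
    for poolname in "$@"; do
        if [ "$(pool_value "$poolname" pg_num)" != "$(pool_value "$poolname" pgp_num)" ]; then
            echo "$poolname"
        fi
    done
}

# Usage (hypothetical): an empty result means every pool is consistent.
#   pools_with_mismatch $(rados lspools)
```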

Delete the default pools and create pools with other names
The default pools are:
data
metadata
rbd

ceph osd pool create vmspool 8
ceph osd pool set vmspool pg_num 32
ceph osd pool set vmspool pgp_num 32
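
The post does not show the delete commands themselves. As a hedged sketch: pool deletion takes the pool name twice plus a confirmation flag, and newer releases additionally refuse unless mon_allow_pool_delete is enabled on the monitors. Because deletion is irreversible, the hypothetical helper below only prints the commands for review and does not run them.

```shell
# print_pool_delete_cmds POOL...: print (do not execute) one delete
# command per pool so they can be reviewed before being run by hand.
print_pool_delete_cmds() {
    for poolname in "$@"; do
        echo "ceph osd pool delete $poolname $poolname --yes-i-really-really-mean-it"
    done
}

# Usage for the three default pools listed above:
#   print_pool_delete_cmds data metadata rbd
```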

Pull an existing cluster's configuration and keys into ceph-deploy
mkdir -p cluster1
cd cluster1
ceph-deploy config pull HOST
ceph-deploy gatherkeys HOST

Add one disk, /dev/xvde, to every node
ceph osd status
node1=host1
node2=host2
node3=host3
disk="/dev/xvde"
ceph-deploy --overwrite-conf osd create --data $disk $node1
ceph-deploy --overwrite-conf osd create --data $disk $node2
ceph-deploy --overwrite-conf osd create --data $disk $node3

Warning that a pool has too few PGs
ceph -s
health: HEALTH_WARN
1 pools have many more objects per pg than average
ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
pool cn-south-1.rgw.buckets.data objects per pg (2386) is more than 21.115 times cluster average (113)
pool=cn-south-1.rgw.buckets.data

ceph osd pool get $pool pg_num
ceph osd pool set $pool pg_num 64
ceph osd pool set $pool pgp_num 64
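
The ratio in the warning is simply 2386 / 113 ≈ 21.115. A rough way to pick a new pg_num is to keep doubling until the pool's objects per PG no longer exceed the warning threshold times the cluster average. The sketch below makes several assumptions: the threshold is taken to be 10 (as far as I know the default of mon_pg_warn_max_object_skew, so check your release), the cluster average is treated as fixed even though it shifts slightly when PGs are added, and the starting pg_num of 16 in the usage line is hypothetical because the post does not give the original value. The function name suggest_pg_num is made up.

```shell
# suggest_pg_num CURRENT_PG_NUM OBJECTS_PER_PG CLUSTER_AVG [MAX_SKEW]
# Double pg_num (halving objects per PG) until OBJECTS_PER_PG is no
# longer greater than MAX_SKEW * CLUSTER_AVG, then print the result.
suggest_pg_num() {
    pg=$1
    per_pg=$2
    avg=$3
    skew=${4:-10}
    # A non-positive average would make the loop below never terminate.
    [ "$avg" -gt 0 ] || return 1
    while [ "$per_pg" -gt $((skew * avg)) ]; do
        pg=$((pg * 2))
        per_pg=$((per_pg / 2))
    done
    echo "$pg"
}

# Usage with the numbers from the warning and a hypothetical pg_num of 16:
#   suggest_pg_num 16 2386 113
```

This is only an estimate for clearing the warning; the usual sizing guidance based on OSD count still applies.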

Purging temporary data

Deprecated since version 0.52.

When you delete objects (and buckets/containers), the Gateway marks the data for removal, but it is still available to users until it is purged. Since data still resides in storage until it is purged, it may take up available storage space. To ensure that data marked for deletion isn’t taking up a significant amount of storage space, you should run the following command periodically:

radosgw-admin temp remove
