Repairing a failed Ceph monitor (mon) node and rejoining it to the cluster
Stop the mon service on the faulty node:
# systemctl stop ceph-mon@host11
Delete the faulty mon's data:
# rm -rf /var/lib/ceph/mon/ceph-host11/*
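Before wiping the mon directory outright, it is safer to archive it first so the repair can be rolled back. A minimal sketch, assuming the paths from this walkthrough (the function name and backup location are illustrative, not part of the original procedure):

```shell
# backup_and_wipe: tar up a mon data directory, then empty it.
# Sketch only -- adjust the paths to your own cluster before use.
backup_and_wipe() {
    mon_dir=$1
    backup=$2
    # archive the directory relative to its parent so paths stay short
    tar -czf "$backup" -C "$(dirname "$mon_dir")" "$(basename "$mon_dir")"
    rm -rf "${mon_dir:?}"/*   # ${var:?} aborts if mon_dir is unset/empty
}

# Example (hypothetical backup path):
#   backup_and_wipe /var/lib/ceph/mon/ceph-host11 /root/mon-host11-backup.tar.gz
```

The `${mon_dir:?}` guard keeps a typo from expanding to `rm -rf /*`.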
Try to sync the mon data over from a healthy mon (host10) with rsync:
# rsync -avz root@host10:/var/lib/ceph/mon/ceph-host10/ /var/lib/ceph/mon/ceph-host11/
-bash: rsync: command not found
# dnf install -y rsync
-bash: dnf: command not found
Neither rsync nor dnf is installed on this host, so the rsync approach is unavailable.
Copy the data with scp instead:
# scp root@host10:/var/lib/ceph/mon/ceph-host10/ /var/lib/ceph/mon/ceph-host11/
scp: /var/lib/ceph/mon/ceph-host10: not a regular file
# scp -r root@host10:/var/lib/ceph/mon/ceph-host10/ /var/lib/ceph/mon/ceph-host11/
kv_backend                100%    8       9.3KB/s   00:00
LOCK                      100%    0       0.0KB/s   00:00
CURRENT                   100%   17      24.8KB/s   00:00
IDENTITY                  100%   37       2.7KB/s   00:00
OPTIONS-9023316           100% 4943       5.9MB/s   00:00
MANIFEST-9024281          100% 4822KB    49.2MB/s   00:00
OPTIONS-9024284           100% 4943       6.8MB/s   00:00
9106767.log               100%   14MB    51.4MB/s   00:00
9106769.sst               100%   57MB    55.0MB/s   00:01
keyring                   100%   77      43.2KB/s   00:00
done                      100%    0       0.0KB/s   00:00
systemd                   100%    0       0.0KB/s   00:00
min_mon_release           100%    3       0.2KB/s   00:00
The recursive copy landed in a nested ceph-host10/ directory inside the ceph-host11/ mon directory, so move its contents up one level:
# ls
done  keyring  kv_backend  min_mon_release  store.db  systemd
# mv * ..
# ls
# cd ..
# ls
ceph-host10  done  keyring  kv_backend  min_mon_release  store.db  systemd
Remove the now-empty ceph-host10 directory left over from the copy:
# rmdir ceph-host10
# ls
done  keyring  kv_backend  min_mon_release  store.db  systemd
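The `mv * ..` cleanup above was only needed because `scp -r src/ dest/` copies the source directory itself into dest. Quoting a remote glob transfers the directory's contents instead (fine here, since a mon data directory contains no dotfiles). `cp -r` has the same glob semantics, shown locally as a runnable sketch with throwaway paths:

```shell
# On the real hosts this would be (untested here; needs ssh access):
#   scp -r 'root@host10:/var/lib/ceph/mon/ceph-host10/*' /var/lib/ceph/mon/ceph-host11/
# The quoted * is expanded by the remote shell, so only the directory's
# contents are transferred -- no nested ceph-host10/ to clean up.
# The same idea demonstrated locally with cp:
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/keyring" "$src/kv_backend"
cp -r "$src"/* "$dst"/
ls "$dst"    # flat: keyring  kv_backend -- no nested directory
```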
# ll
total 12
-rw-r--r-- 1 root root   0 Mar 28 07:20 done
-rw------- 1 root root  77 Mar 28 07:20 keyring
-rw------- 1 root root   8 Mar 28 07:20 kv_backend
-rw------- 1 root root   3 Mar 28 07:20 min_mon_release
drwxr-xr-x 2 root root 157 Mar 28 07:20 store.db
-rw-r--r-- 1 root root   0 Mar 28 07:20 systemd
# cd ..
# ls
ceph-host11
# ll
total 0
drwxr-xr-x 3 ceph ceph 105 Mar 28 07:20 ceph-host11
The copied files are owned by root; hand ownership back to the ceph user:
# chown -R ceph:ceph ceph-host11/
# cd ceph-host11/
# ls
done  keyring  kv_backend  min_mon_release  store.db  systemd
# ll
total 12
-rw-r--r-- 1 ceph ceph   0 Mar 28 07:20 done
-rw------- 1 ceph ceph  77 Mar 28 07:20 keyring
-rw------- 1 ceph ceph   8 Mar 28 07:20 kv_backend
-rw------- 1 ceph ceph   3 Mar 28 07:20 min_mon_release
drwxr-xr-x 2 ceph ceph 157 Mar 28 07:20 store.db
-rw-r--r-- 1 ceph ceph   0 Mar 28 07:20 systemd
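After a recursive chown it is worth confirming nothing was missed deeper in the tree (store.db/ holds many files). A small helper of our own (not a ceph tool) that prints any path not owned by the expected user; empty output means the tree is clean:

```shell
# check_owner DIR USER: print every path under DIR not owned by USER.
# Empty output means "chown -R" took effect everywhere.
check_owner() {
    find "$1" ! -user "$2"
}

# Example against the mon directory from this walkthrough:
#   check_owner /var/lib/ceph/mon/ceph-host11 ceph
```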
Start the mon service:
# systemctl start ceph-mon@host11.service
Job for ceph-mon@host11.service failed because start of the service was attempted too often. See "systemctl status ceph-mon@host11.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed ceph-mon@host11.service" followed by "systemctl start ceph-mon@host11.service" again.
Follow the message's hint and reset the failed state (the unit hit systemd's start-rate limit):
# systemctl reset-failed ceph-mon@host11.service
Start it again:
# systemctl start ceph-mon@host11.service
Check the service status:
# systemctl status ceph-mon@host11.service
● ceph-mon@host11.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2026-03-28 07:22:00 CST; 11s ago
Main PID: 68995 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@host11.service
└─68995 /usr/bin/ceph-mon -f --cluster ceph --id host11 --setuser ceph --setgroup ceph
Mar 28 07:22:00 host11 systemd: Started Ceph cluster monitor daemon.
Mar 28 07:22:06 host11 ceph-mon: 2026-03-28 07:22:06.757 7fc8eef35700 -1 mon.host11@2(electing) e3 failed to get devid for : udev_device_new_from_sub...iled on ''
Mar 28 07:22:06 host11 ceph-mon: 2026-03-28 07:22:06.793 7fc8eef35700 -1 mon.host11@2(electing) e3 failed to get devid for : udev_device_new_from_sub...iled on ''
Hint: Some lines were ellipsized, use -l to show in full.
Verify the cluster state:
# ceph -s
  cluster:
    id:     9d22e36a-2bdd-4d2d-8394-ead777
    health: HEALTH_WARN
            3 nearfull osd(s)
            5 pool(s) nearfull
            5 daemons have recently crashed

  services:
    mon: 3 daemons, quorum host09,host10,host11 (age 22s)
    mgr: host09(active, since 6w), standbys: host11, host10
    osd: 40 osds: 40 up (since 6w), 40 in (since 6w)

  data:
    pools:   16 pools, 3072 pgs
    objects: 7.15M objects, 27 TiB
    usage:   65 TiB used, 83 TiB / 147 TiB avail
    pgs:     3069 active+clean
             3    active+clean+scrubbing+deep

  io:
    client: 16 MiB/s rd, 27 MiB/s wr, 339 op/s rd, 380 op/s wr
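Beyond eyeballing `ceph -s`, quorum membership can be checked in a script by pulling the `quorum_names` array out of `ceph quorum_status` (which emits JSON). A crude grep-based sketch, not a full JSON parser; it assumes the compact single-line JSON that ceph prints by default:

```shell
# in_quorum NAME: read "ceph quorum_status" JSON on stdin and succeed
# if NAME is listed in quorum_names. The surrounding quotes in the
# second grep make the match exact (host1 will not match host11).
in_quorum() {
    grep -o '"quorum_names":\[[^]]*\]' | grep -q "\"$1\""
}

# Live usage on the cluster:
#   ceph quorum_status | in_quorum host11 && echo "host11 is back in quorum"
```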
The mon problem is resolved; the remaining HEALTH_WARN entries (nearfull OSDs/pools and earlier daemon crashes) are separate issues.