HEALTH_WARN 1 failed cephadm daemon(s)
# ceph health detail
HEALTH_WARN 2 failed cephadm daemon(s)
CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
daemon alertmanager.controller on controller is in error state
daemon grafana.controller on controller is in error state
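To see why cephadm marked these daemons as failed, two useful starting points are the orchestrator's daemon listing and the systemd journal of the failing unit. A sketch; cephadm units follow the ceph-<fsid>@<daemon>.service naming (with the fsid from ceph status), and journalctl must be run on the host where the daemon lives:
# ceph orch ps
# journalctl -u ceph-4c1f752a-ed1a-11eb-8ce5-0025908471d6@alertmanager.controller.service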
After some digging, it looks like a Ceph cluster had been installed on this system before and was never cleaned up completely. I don't yet know how to remove everything it left behind on the new version; still testing.
# ceph status
  cluster:
    id:     4c1f752a-ed1a-11eb-8ce5-0025908471d6
    health: HEALTH_WARN
            2 failed cephadm daemon(s)
            clock skew detected on mon.compute01

  services:
    mon: 2 daemons, quorum controller,compute01 (age 3h)
    mgr: compute01.getqhn(active, since 3h), standbys: controller.kxfttd
    osd: 3 osds: 3 up (since 3h), 3 in (since 3h)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     1 active+clean
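The clock skew warning is unrelated to the failed daemons; it normally clears once the monitor hosts agree on the time. For example, with chrony as the time service (a sketch, assuming chronyd is in use):
# chronyc tracking
# systemctl restart chronyd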
On compute01, systemd also still knows about units belonging to a different fsid (1e87bca4-..., not the running cluster's 4c1f752a-...), which are the leftovers from the earlier install:
# systemctl status ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6
ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@crash.compute01.service
ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@mgr.compute01.bunbzp.service
ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@node-exporter.compute01.service
ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@osd.0.service
ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target
# systemctl disable \
    ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@crash.compute01.service \
    ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@osd.0.service \
    ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@mgr.compute01.bunbzp.service \
    ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target \
    ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@node-exporter.compute01.service
Removed /etc/systemd/system/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target.wants/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@crash.compute01.service.
Removed /etc/systemd/system/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target.wants/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@mgr.compute01.bunbzp.service.
Removed /etc/systemd/system/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target.wants/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@node-exporter.compute01.service.
Removed /etc/systemd/system/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target.wants/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@osd.0.service.
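As an aside, recent cephadm releases can do this whole cleanup in one step: cephadm rm-cluster removes every daemon and all stored data for a given fsid on the host. A sketch, and worth triple-checking that the fsid passed in is the stale 1e87bca4-... one rather than the live cluster's:
# systemctl stop ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target
# cephadm rm-cluster --fsid 1e87bca4-e7ce-11eb-aa90-0025908471d6 --force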
# cd /var/lib/ceph
# rm -rf 1e87bca4-e7ce-11eb-aa90-0025908471d6/
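Disabling the units only removes the .wants links shown above; the unit templates themselves can remain under /etc/systemd/system. A sketch for finishing the cleanup, based on cephadm's ceph-<fsid>@.service / ceph-<fsid>.target naming:
# rm -f /etc/systemd/system/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6@.service
# rm -f /etc/systemd/system/ceph-1e87bca4-e7ce-11eb-aa90-0025908471d6.target
# systemctl daemon-reload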
# ceph status
  cluster:
    id:     4c1f752a-ed1a-11eb-8ce5-0025908471d6
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum controller,compute01 (age 84s)
    mgr: compute01.getqhn(active, since 30s)
    osd: 3 osds: 3 up (since 14s), 3 in (since 4h)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     1 active+clean
To my surprise, that solved the problem.
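To double-check that nothing from the old cluster survived, the remaining ceph units and the daemons cephadm sees on this host should all carry the live 4c1f752a-... fsid (a sketch):
# systemctl list-units 'ceph-*'
# cephadm ls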
admin replied on 2021-7-25 20:59
# ceph status
  cluster:
    id:     4c1f752a-ed1a-11eb-8ce5-0025908471d6
The likely cause: a different cluster_id (fsid) had been generated at some point, so the state recorded on disk no longer matched the running cluster and the two sets of information were inconsistent. If the same problem shows up on a completely clean system, the root cause would need separate investigation.
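A quick way to check any host for this condition is to compare the live cluster's fsid against the directories under /var/lib/ceph; a directory named after any other fsid is a leftover (a sketch):
# ceph fsid
# ls /var/lib/ceph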