admin posted on 2025-4-9 17:00:42

failed to probe daemons or devices: error reported by ceph -s

CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
host private-registry.example.com `cephadm ceph-volume` failed: cephadm exited with an error code: 1, stderr:Non-zero exit code 125 from /bin/podman run --rm --ipc=host --net=host --entrypoint stat --init -e CONTAINER_IMAGE=ceph5-2-1-registry.example.com:5000/rhceph/rhceph-5-rhel8@sha256:d42c0d99ddeaa001570dce4eb90b71699e0401fe449966b935f669ffad22bd01 -e NODE_NAME=private-registry.example.com -e CEPH_USE_RANDOM_NONCE=1 ceph5-2-1-registry.example.com:5000/rhceph/rhceph-5-rhel8@sha256:d42c0d99ddeaa001570dce4eb90b71699e0401fe449966b935f669ffad22bd01 -c %u %g /var/lib/ceph
stat: stderr Trying to pull ceph5-2-1-registry.example.com:5000/rhceph/rhceph-5-rhel8@sha256:d42c0d99ddeaa001570dce4eb90b71699e0401fe449966b935f669ffad22bd01...
stat: stderr Error: initializing source docker://ceph5-2-1-registry.example.com:5000/rhceph/rhceph-5-rhel8@sha256:d42c0d99ddeaa001570dce4eb90b71699e0401fe449966b935f669ffad22bd01: reading manifest sha256:d42c0d99ddeaa001570dce4eb90b71699e0401fe449966b935f669ffad22bd01 in ceph5-2-1-registry.example.com:5000/rhceph/rhceph-5-rhel8: unauthorized: authentication required
How to eliminate this warning?
Resolution
Log in to the cephadm shell on the lead monitor node.

# cephadm shell
From inside the cephadm shell, log in to the custom registry. This single command logs in all hosts in the cluster at once, including the new ones:

# ceph cephadm registry-login --registry-url <CUSTOM_REGISTRY_NAME> --registry_username <REGISTRY_USERNAME> --registry_password <PASSWORD>
This command creates a podman-auth.json file in the /etc/ceph directory that contains the authentication details for the custom registry.

Wait 3 to 5 minutes, then check whether the daemons have started:

# watch ceph orch ls
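Instead of watching the output by eye, the wait can be scripted. The sketch below is a hedged Python example, not part of the original procedure: it assumes it runs where the ceph CLI works (for example inside cephadm shell) and that `ceph orch ls --format json` returns a list of services carrying `service_name` and `status.running` / `status.size` fields. The function names are hypothetical, and the 300-second default timeout simply mirrors the 3 to 5 minute wait above.

```python
import json
import subprocess
import time
from typing import Callable, Dict


def fetch_orch_ls() -> str:
    """Return the JSON output of `ceph orch ls` (assumes the ceph CLI is available)."""
    return subprocess.run(
        ["ceph", "orch", "ls", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def not_running(orch_ls_json: str) -> Dict[str, str]:
    """Map service_name -> 'running/size' for services that are not fully up.

    Assumes each entry has status.running and status.size counters.
    """
    pending = {}
    for svc in json.loads(orch_ls_json):
        status = svc.get("status", {})
        running, size = status.get("running", 0), status.get("size", 0)
        if running < size:
            pending[svc["service_name"]] = f"{running}/{size}"
    return pending


def wait_for_services(fetch: Callable[[], str] = fetch_orch_ls,
                      timeout: float = 300, interval: float = 15,
                      sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll until every service is running or the timeout expires.

    Returns the services still pending; an empty dict means all are running.
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = not_running(fetch())
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(interval)
```

Calling `wait_for_services()` inside the cephadm shell returns an empty dict once everything is up; any services it still returns are candidates for the optional restart step. The `fetch` and `sleep` parameters exist only so the polling logic can be exercised without a live cluster.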
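OPTIONAL: If a service is still not in the running state, restart it. Note that ceph orch restart takes a service name as listed by ceph orch ls; to restart a single daemon, use ceph orch daemon restart <DAEMON_NAME> instead.

# ceph orch restart <SERVICE_NAME>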
SPECIAL CASE:
For monitoring daemons such as node-exporter, prometheus, alertmanager, and grafana:

After logging in to the custom registry, use the ceph config command to configure the custom container images:

# ceph config set mgr mgr/cephadm/OPTION_NAME CUSTOM_REGISTRY_NAME/CONTAINER_NAME
Use the following options for OPTION_NAME:

container_image_prometheus
container_image_grafana
container_image_alertmanager
container_image_node_exporter
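Redeploy each service (ceph orch redeploy takes a service name, for example node-exporter, prometheus, alertmanager, or grafana):

# ceph orch redeploy SERVICE_NAME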
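Applied to all four monitoring services, the config and redeploy steps above can be generated together. This Python sketch only builds the command strings so they can be reviewed before being run in the cephadm shell. It assumes the services use their default names (prometheus, grafana, alertmanager, node-exporter); the registry name and image paths in the usage are hypothetical placeholders, and the function name is likewise illustrative.

```python
import shlex
from typing import Dict, List

# cephadm image option -> default service name of the daemons it controls
# (assumption: the monitoring services were deployed with default names).
MONITORING_SERVICES = {
    "container_image_prometheus": "prometheus",
    "container_image_grafana": "grafana",
    "container_image_alertmanager": "alertmanager",
    "container_image_node_exporter": "node-exporter",
}


def monitoring_image_commands(registry: str, images: Dict[str, str]) -> List[str]:
    """Build `ceph config set` commands followed by `ceph orch redeploy` commands.

    `images` maps an option name to an image path inside `registry`;
    options without an entry are left untouched.
    """
    config_cmds, redeploy_cmds = [], []
    for option, service in MONITORING_SERVICES.items():
        if option not in images:
            continue
        image = f"{registry}/{images[option]}"
        config_cmds.append(shlex.join(
            ["ceph", "config", "set", "mgr", f"mgr/cephadm/{option}", image]))
        redeploy_cmds.append(shlex.join(["ceph", "orch", "redeploy", service]))
    # Set all images first, then redeploy, matching the order of the steps above.
    return config_cmds + redeploy_cmds
```

For example, `monitoring_image_commands("registry.example.com:5000", {"container_image_prometheus": "monitoring/prometheus:v2"})` returns `["ceph config set mgr mgr/cephadm/container_image_prometheus registry.example.com:5000/monitoring/prometheus:v2", "ceph orch redeploy prometheus"]`. `shlex.join` (Python 3.8+) quotes any argument that would otherwise be split by the shell.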
Root Cause
When deploying Ceph daemons on newly added hosts, cephadm cannot find the custom registry credentials in the podman-auth.json file under the /etc/ceph directory, or the file itself is missing.

Behind the scenes, a registry login performed directly on a host only authenticates that one host. As a result, when cephadm tries to pull the image on the other hosts, the pull still fails because those hosts are not logged in.

To fix this, run the ceph cephadm registry-login command from inside the cephadm shell instead of on the host itself; run that way, it logs in all the hosts in the cluster.
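To confirm that the warning really comes from missing registry credentials, and to see which hosts are affected, the health error text (such as the CEPHADM_REFRESH_FAILED message quoted at the top of this post) can be scanned for the "unauthorized: authentication required" marker. This is a minimal Python sketch that assumes the message layout shown above (a "host NAME `cephadm ...` failed" line followed by a docker:// image reference); `AuthFailure` and `find_auth_failures` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class AuthFailure(NamedTuple):
    host: str      # host on which cephadm failed
    registry: str  # registry host:port taken from the docker:// reference


# One entry per "host NAME `cephadm ...` failed" line (leading spaces allowed).
HOST_RE = re.compile(r"^\s*host (\S+) `cephadm[^`]*` failed", re.MULTILINE)
REGISTRY_RE = re.compile(r"docker://([^/\s]+)/")
AUTH_MARKER = "unauthorized: authentication required"


def find_auth_failures(health_text: str) -> List[AuthFailure]:
    """Return the hosts whose failure message is a registry authentication error."""
    matches = list(HOST_RE.finditer(health_text))
    failures = []
    for i, match in enumerate(matches):
        # The chunk for this host runs until the next "host ..." entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(health_text)
        chunk = health_text[match.start():end]
        if AUTH_MARKER not in chunk:
            continue  # failed for some other reason
        registry = REGISTRY_RE.search(chunk)
        failures.append(AuthFailure(match.group(1),
                                    registry.group(1) if registry else "unknown"))
    return failures
```

Applied to the error quoted above, it reports host private-registry.example.com failing against ceph5-2-1-registry.example.com:5000, which is the registry the login command must target.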

Diagnostic Steps
Check whether the podman-auth.json file is present under /etc/ceph/ on each node.

# ls -l /etc/ceph/
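Besides checking that the file exists, its contents can be checked for an entry for the custom registry. This Python sketch assumes the file uses the standard containers auth.json layout (an "auths" object keyed by registry, each entry holding a base64-encoded "username:password" string under "auth") and that the key matches the registry URL passed to registry-login exactly. It returns only the username, never the password; the function names are hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import Optional


def registry_user(auth_json: str, registry: str) -> Optional[str]:
    """Return the username stored for `registry`, or None if there is no usable entry."""
    entry = json.loads(auth_json).get("auths", {}).get(registry)
    if not entry or "auth" not in entry:
        return None
    # "auth" holds base64("username:password"); keep only the username.
    user, _, _ = base64.b64decode(entry["auth"]).decode().partition(":")
    return user or None


def check_auth_file(path: Path, registry: str) -> Optional[str]:
    """Like registry_user(), but returns None when the file itself is missing."""
    if not path.is_file():
        return None
    return registry_user(path.read_text(), registry)
```

On each node, `check_auth_file(Path("/etc/ceph/podman-auth.json"), "ceph5-2-1-registry.example.com:5000")` returning None means that node is missing either the file or the registry entry, which matches the root cause described above.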
Verify the daemon status:

# ceph orch ls
# ceph orch ps
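On a cluster with many hosts, the ceph orch ps output can be grouped by host to show which nodes still have daemons that are not running; those are usually the hosts missing the registry login. This is a hedged Python sketch that assumes `ceph orch ps --format json` reports `hostname`, `daemon_type`, `daemon_id`, and `status_desc` for each daemon; the function names are hypothetical.

```python
import json
import subprocess
from collections import defaultdict
from typing import Dict, List


def fetch_orch_ps() -> str:
    """Return the JSON output of `ceph orch ps` (assumes the ceph CLI is available)."""
    return subprocess.run(
        ["ceph", "orch", "ps", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def failing_daemons_by_host(orch_ps_json: str) -> Dict[str, List[str]]:
    """Group daemons whose status_desc is not 'running' by host name."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for daemon in json.loads(orch_ps_json):
        if daemon.get("status_desc") != "running":
            name = f"{daemon['daemon_type']}.{daemon['daemon_id']}"
            by_host[daemon["hostname"]].append(name)
    return dict(by_host)
```

Inside the cephadm shell, `failing_daemons_by_host(fetch_orch_ps())` returns something like `{"host2": ["node-exporter.host2"]}`; an empty dict means every daemon reports running.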
