Problems encountered while upgrading Ceph from 4.3 to 5.3
Observing a similar kind of issue when upgrading from 4.3z1 --> 5.3 (latest). After running the cephadm-preflight playbook, the ceph services (mon, mgr, osds) failed on all the nodes, but ceph.target kept running. As a result, ceph commands hang. (A typical preflight invocation is sketched below, after the pre-upgrade state.)

# systemctl | grep ceph
  ceph-crash                   loaded active running  Ceph crash dump collector
● ceph-mgr                     loaded failed failed   Ceph Manager
● ceph-mon                     loaded failed failed   Ceph Monitor
  system-ceph\x2dcrash.slice   loaded active active   system-ceph\x2dcrash.slice
  system-ceph\x2dmds.slice     loaded active active   system-ceph\x2dmds.slice
  system-ceph\x2dmgr.slice     loaded active active   system-ceph\x2dmgr.slice
  system-ceph\x2dmon.slice     loaded active active   system-ceph\x2dmon.slice
  ceph.target                  loaded active active   ceph target allowing to start/stop all ceph*@.service instances at once

# systemctl status ceph.target
● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/etc/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Wed 2023-12-20 14:58:47 EST; 4h 5min ago

Dec 20 14:58:47 ceph-msaini-taooh8-node5 systemd: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.

======================== Upgrade logs ========================

# ceph --version
ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)

# ceph versions
{
    "mon": {
        "ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)": 12
    },
    "mds": {
        "ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)": 3
    },
    "rgw": {
        "ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)": 4
    },
    "overall": {
        "ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)": 25
    }
}

# ceph -s
  cluster:
    id:     07cd16a8-f925-4d09-a041-6d725b939582
    health: HEALTH_WARN
            1 pool(s) have non-power-of-two pg_num
            1 pools have too few placement groups
            3 pools have too many placement groups
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum ceph-msaini-taooh8-node3,ceph-msaini-taooh8-node2,ceph-msaini-taooh8-node1-installer (age 45m)
    mgr: ceph-msaini-taooh8-node1-installer(active, since 43m), standbys: ceph-msaini-taooh8-node2, ceph-msaini-taooh8-node3
    mds: cephfs:1 {0=ceph-msaini-taooh8-node2=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 38m), 12 in (since 57m)
    rgw: 4 daemons active (ceph-msaini-taooh8-node5.rgw0, ceph-msaini-taooh8-node5.rgw1, ceph-msaini-taooh8-node6.rgw0, ceph-msaini-taooh8-node6.rgw1)

  data:
    pools:   13 pools, 676 pgs
    objects: 382 objects, 456 MiB
    usage:   13 GiB used, 227 GiB / 240 GiB avail
    pgs:     676 active+clean

  io:
    client: 2.5 KiB/s rd, 2 op/s rd, 0 op/s wr

# podman ps
CONTAINER ID  IMAGE                                                             COMMAND               CREATED         STATUS         PORTS  NAMES
b4bc2bbf0671  registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6  --path.procfs=/ro...  54 minutes ago  Up 54 minutes         node-exporter
288dbf3d1416  registry.redhat.io/rhceph/rhceph-4-rhel8:latest                                        49 minutes ago  Up 49 minutes         ceph-mon-ceph-msaini-taooh8-node1-installer
e02558859efb  registry.redhat.io/rhceph/rhceph-4-rhel8:latest                                        46 minutes ago  Up 46 minutes         ceph-mgr-ceph-msaini-taooh8-node1-installer
fdc68705313e  registry.redhat.io/rhceph/rhceph-4-rhel8:latest                                        30 minutes ago  Up 30 minutes         ceph-crash-ceph-msaini-taooh8-node1-installer
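For reference, the cephadm-preflight playbook mentioned above ships with the cephadm-ansible package; a minimal sketch of the usual invocation, assuming the stock RHCS 5 layout under /usr/share/cephadm-ansible and the same inventory file used for the other playbooks here:

cd /usr/share/cephadm-ansible
# ceph_origin=rhcs selects the Red Hat repositories; adjust for your environment
ansible-playbook -i hosts cephadm-preflight.yml --extra-vars "ceph_origin=rhcs"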
# sudo ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml --extra-vars "health_osd_check_retries=50 health_osd_check_delay=30"

PLAY RECAP *********************************************************************
ceph-msaini-taooh8-node1-installer : ok=375  changed=59  unreachable=0  failed=0  skipped=633  rescued=0  ignored=0
ceph-msaini-taooh8-node2           : ok=370  changed=39  unreachable=0  failed=0  skipped=685  rescued=0  ignored=0
ceph-msaini-taooh8-node3           : ok=370  changed=39  unreachable=0  failed=0  skipped=690  rescued=0  ignored=0
ceph-msaini-taooh8-node4           : ok=252  changed=28  unreachable=0  failed=0  skipped=460  rescued=0  ignored=0
ceph-msaini-taooh8-node5           : ok=379  changed=38  unreachable=0  failed=0  skipped=625  rescued=0  ignored=0
ceph-msaini-taooh8-node6           : ok=368  changed=37  unreachable=0  failed=0  skipped=645  rescued=0  ignored=0
ceph-msaini-taooh8-node7           : ok=319  changed=38  unreachable=0  failed=0  skipped=495  rescued=0  ignored=0
localhost                          : ok=1    changed=1   unreachable=0  failed=0  skipped=1    rescued=0  ignored=0

# ansible-playbook -vvvv infrastructure-playbooks/rolling_update.yml -i hosts
  stdout: |-
    {
        "mon": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3
        },
        "mgr": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3
        },
        "osd": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 12
        },
        "mds": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3
        },
        "rgw": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 4
        },
        "rgw-nfs": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 1
        },
        "overall": {
            "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 26
        }
    }
  stdout_lines: <omitted>
META: ran handlers
META: ran handlers

PLAY RECAP *********************************************************************
ceph-msaini-taooh8-node1-installer : ok=372  changed=51  unreachable=0  failed=0  skipped=626  rescued=0  ignored=0
ceph-msaini-taooh8-node2           : ok=363  changed=27  unreachable=0  failed=0  skipped=676  rescued=0  ignored=0
ceph-msaini-taooh8-node3           : ok=364  changed=28  unreachable=0  failed=0  skipped=680  rescued=0  ignored=0
ceph-msaini-taooh8-node4           : ok=249  changed=21  unreachable=0  failed=0  skipped=453  rescued=0  ignored=0
ceph-msaini-taooh8-node5           : ok=375  changed=27  unreachable=0  failed=0  skipped=616  rescued=0  ignored=0
ceph-msaini-taooh8-node6           : ok=370  changed=27  unreachable=0  failed=0  skipped=629  rescued=0  ignored=0
ceph-msaini-taooh8-node7           : ok=317  changed=29  unreachable=0  failed=0  skipped=489  rescued=0  ignored=0
localhost                          : ok=1    changed=1   unreachable=0  failed=0  skipped=1    rescued=0  ignored=0

# ceph --version
ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)

# ceph versions
{
    "mon": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 4
    },
    "rgw-nfs": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 26
    }
}
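Note the discrepancy just above: `ceph --version` reports the locally installed client package (still a nautilus build), while `ceph versions` queries the running daemons (all pacific). So at this point the rolling update has replaced the daemon containers, but the `ceph` CLI on this node still comes from the old client RPM. A quick way to check, as a sketch (package names per the RPM listing at the end of this report):

rpm -q ceph-common            # which build the local CLI comes from
dnf list --upgrades 'ceph*'   # which client packages have pending updates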
pacific (stable)": 12 }, "mds": { "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 3 }, "rgw": { "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 4 }, "rgw-nfs": { "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 1 }, "overall": { "ceph version 16.2.10-220.el8cp (380780920862a7326df3e00903e9912b85af7d30) pacific (stable)": 26 }}# podman psCONTAINER IDIMAGE COMMAND CREATED STATUS PORTS NAMES6ca1e2071341registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.3-rhel-8-containers-candidate-88814-20231215195330 16 minutes agoUp 16 minutes ceph-mon-ceph-msaini-taooh8-node1-installerf518b6b7588dregistry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.3-rhel-8-containers-candidate-88814-20231215195330 13 minutes agoUp 13 minutes ceph-mgr-ceph-msaini-taooh8-node1-installer74a1b25bee9eregistry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.3-rhel-8-containers-candidate-88814-20231215195330 3 minutes ago Up 3 minutes ceph-crash-ceph-msaini-taooh8-node1-installer38e14828d9aeregistry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6 --path.procfs=/ro...2 minutes ago Up 2 minutes node-exporter## systemctl | grep cephceph-crash loaded active running Ceph crash dump collectorceph-mgr loaded active running Ceph Managerceph-mon loaded active running Ceph Monitorsystem-ceph\x2dcrash.slice loaded active active system-ceph\x2dcrash.slicesystem-ceph\x2dmgr.slice loaded active active system-ceph\x2dmgr.slicesystem-ceph\x2dmon.slice loaded active active system-ceph\x2dmon.sliceceph-mgr.target loaded active active ceph target allowing to start/stop all ceph-mgr@.service instances at onceceph-mon.target loaded active active ceph target allowing to start/stop all ceph-mon@.service instances at onceceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once# ansible-playbook infrastructure-playbooks/cephadm-adopt.yml -i hostsTASK ********************************************************************************************************************************************************************************************************fatal: : FAILED! 
# ansible-playbook infrastructure-playbooks/cephadm-adopt.yml -i hosts

TASK ***************************************************************************
fatal: : FAILED! => changed=false
  cmd:
  - podman
  - run
  - --rm
  - --net=host
  - -v
  - /etc/ceph:/etc/ceph:z
  - -v
  - /var/lib/ceph:/var/lib/ceph:ro
  - -v
  - /var/run/ceph:/var/run/ceph:z
  - --entrypoint=ceph
  - registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.3-rhel-8-containers-candidate-88814-20231215195330
  - --cluster
  - ceph
  - orch
  - host
  - label
  - add
  - ceph-msaini-taooh8-node2
  - ceph
  delta: '0:00:01.795436'
  end: '2023-12-20 18:15:08.390207'
  msg: non-zero return code
  rc: 22
  start: '2023-12-20 18:15:06.594771'
  stderr: 'Error EINVAL: host ceph-msaini-taooh8-node2 does not exist'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
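The failing task is running `ceph orch host label add` for a host the new cephadm orchestrator has not been told about yet, which is why it returns `Error EINVAL: host ... does not exist`. One way to verify, and a possible manual workaround before re-running the adopt playbook (a sketch using standard `ceph orch` commands; the placeholder IP is hypothetical):

ceph orch host ls                                          # list the hosts cephadm currently knows
ceph orch host add ceph-msaini-taooh8-node2 <node2-ip>     # register the missing host first
ceph orch host label add ceph-msaini-taooh8-node2 ceph     # then the label add should succeed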
# dnf install cephadm-ansible
Updating Subscription Management repositories.
Last metadata expiration check: 0:01:06 ago on Wed 20 Dec 2023 06:22:31 PM EST.
Dependencies resolved.
=================================================================================================
 Package                               Arch    Version            Repository                             Size
=================================================================================================
Installing:
 cephadm-ansible                       noarch  1.17.0-1.el8cp     rhceph-5-tools-for-rhel-8-x86_64-rpms   32 k
Installing dependencies:
 ansible-collection-ansible-posix      noarch  1.2.0-1.el8cp.1    rhceph-5-tools-for-rhel-8-x86_64-rpms  131 k
 ansible-collection-community-general  noarch  4.0.0-1.1.el8cp.1  rhceph-5-tools-for-rhel-8-x86_64-rpms  1.5 M
 ansible-core                          x86_64  2.15.3-1.el8       rhel-8-for-x86_64-appstream-rpms       3.6 M
 mpdecimal                             x86_64  2.5.1-3.el8        rhel-8-for-x86_64-appstream-rpms        93 k
 python3.11                            x86_64  3.11.5-1.el8_9     rhel-8-for-x86_64-appstream-rpms        30 k
 python3.11-cffi                       x86_64  1.15.1-1.el8       rhel-8-for-x86_64-appstream-rpms       293 k
 python3.11-cryptography               x86_64  37.0.2-5.el8       rhel-8-for-x86_64-appstream-rpms       1.1 M
 python3.11-libs                       x86_64  3.11.5-1.el8_9     rhel-8-for-x86_64-appstream-rpms        10 M
 python3.11-pip-wheel                  noarch  22.3.1-4.el8       rhel-8-for-x86_64-appstream-rpms       1.4 M
 python3.11-ply                        noarch  3.11-1.el8         rhel-8-for-x86_64-appstream-rpms       135 k
 python3.11-pycparser                  noarch  2.20-1.el8         rhel-8-for-x86_64-appstream-rpms       147 k
 python3.11-pyyaml                     x86_64  6.0-1.el8          rhel-8-for-x86_64-appstream-rpms       214 k
 python3.11-setuptools-wheel           noarch  65.5.1-2.el8       rhel-8-for-x86_64-appstream-rpms       720 k
 sshpass                               x86_64  1.09-4.el8ap       labrepo                                 30 k

Transaction Summary
=================================================================================================
Install  15 Packages

Total download size: 20 M
Installed size: 78 M
Is this ok [y/N]: y

# systemctl | grep ceph
  ceph-crash                  loaded active running  Ceph crash dump collector
● ceph-mgr                    loaded failed failed   Ceph Manager
● ceph-mon                    loaded failed failed   Ceph Monitor
  system-ceph\x2dcrash.slice  loaded active active   system-ceph\x2dcrash.slice
  system-ceph\x2dmgr.slice    loaded active active   system-ceph\x2dmgr.slice
  system-ceph\x2dmon.slice    loaded active active   system-ceph\x2dmon.slice
  ceph.target                 loaded active active   ceph target allowing to start/stop all ceph*@.service instances at once

# systemctl status ceph.target
● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/etc/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Wed 2023-12-20 14:58:46 EST; 3h 54min ago

Dec 20 14:58:46 ceph-msaini-taooh8-node1-installer systemd: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.

# ceph -s
(hangs with no output, as described above)

# systemctl -l status ceph-mgr
● ceph-mgr - Ceph Manager
   Loaded: loaded (/etc/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2023-12-20 18:34:26 EST; 37min ago
 Main PID: 110855 (code=exited, status=143)

Dec 20 18:34:24 ceph-msaini-taooh8-node1-installer ceph-mgr-ceph-msaini-taooh8-node1-installer: 2023-12-20T18:34:24.431-0500 7f0a8fddb700 0 log_channel(cluster) log : pgmap v677: 701 pgs: 701 active+clean; 456 MiB data, 2.3 G>
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: Stopping Ceph Manager...
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mgr-ceph-msaini-taooh8-node1-installer: teardown: managing teardown after SIGTERM
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mgr-ceph-msaini-taooh8-node1-installer: teardown: Sending SIGTERM to PID 54
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mgr-ceph-msaini-taooh8-node1-installer: teardown: Waiting PID 54 to terminate .
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mgr-ceph-msaini-taooh8-node1-installer: teardown: Process 54 is terminated
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer sh: f518b6b7588de6ed1793a6f58a4fa9ca41df91f58a7543dd90d97508e6f612e5
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: ceph-mgr: Main process exited, code=exited, status=143/n/a
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: ceph-mgr: Failed with result 'exit-code'.
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: Stopped Ceph Manager.

# systemctl -l status ceph-mon
● ceph-mon - Ceph Monitor
   Loaded: loaded (/etc/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2023-12-20 18:34:26 EST; 38min ago
 Main PID: 106377 (code=exited, status=143)

Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mon-ceph-msaini-taooh8-node1-installer: debug 2023-12-20T18:34:26.595-0500 7f4c0a23b880 1 rocksdb: close waiting for compaction thread to stop
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mon-ceph-msaini-taooh8-node1-installer: debug 2023-12-20T18:34:26.595-0500 7f4c0a23b880 1 rocksdb: close compaction thread to stopped
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mon-ceph-msaini-taooh8-node1-installer: debug 2023-12-20T18:34:26.595-0500 7f4c0a23b880 4 rocksdb: Shutdown: canceling all background work
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mon-ceph-msaini-taooh8-node1-installer: debug 2023-12-20T18:34:26.599-0500 7f4c0a23b880 4 rocksdb: Shutdown complete
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mon-ceph-msaini-taooh8-node1-installer: teardown: Waiting PID 86 to terminate .
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer ceph-mon-ceph-msaini-taooh8-node1-installer: teardown: Process 86 is terminated
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer sh: 6ca1e2071341cf2fa0140bced76763b36ec0f17f55ddba50794aa25a1245099e
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: ceph-mon: Main process exited, code=exited, status=143/n/a
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: ceph-mon: Failed with result 'exit-code'.
Dec 20 18:34:26 ceph-msaini-taooh8-node1-installer systemd: Stopped Ceph Monitor.
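Both units report status=143, which is 128 + 15: the container's main process exited on SIGTERM. Together with the "Stopping Ceph Manager..." and teardown lines, this points to a deliberate stop (presumably issued during the adoption/preflight run) rather than a crash. To gather more context, a sketch (the instance suffix of the templated ceph-mon@/ceph-mgr@ units is assumed to match the short hostname, per the container names above):

systemctl list-units 'ceph*' --state=failed       # enumerate exactly which instances failed
journalctl -u "ceph-mon@$(hostname -s)" -n 100    # recent mon unit logs around the stop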
# rpm -qa | grep ceph
ceph-grafana-dashboards-14.2.22-128.el8cp.noarch
libcephfs2-16.2.10-208.el8cp.x86_64
cephadm-ansible-1.17.0-1.el8cp.noarch
python3-ceph-common-16.2.10-208.el8cp.x86_64
ceph-base-16.2.10-208.el8cp.x86_64
cephadm-16.2.10-220.el9cp.noarch
python3-ceph-argparse-16.2.10-208.el8cp.x86_64
python3-cephfs-16.2.10-208.el8cp.x86_64
ceph-common-16.2.10-208.el8cp.x86_64
ceph-selinux-16.2.10-208.el8cp.x86_64
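The listing above mixes three builds: ceph-grafana-dashboards is still at 14.2.22 (nautilus), most packages are at 16.2.10-208.el8cp, and cephadm is at 16.2.10-220.el9cp, an el9 build on an el8 host. That skew between the cephadm binary and the rest of the stack is worth resolving before retrying the adoption. One way to make the spread easy to read, as a sketch:

# print name and version-release side by side so stragglers stand out
rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' 'ceph*' '*cephfs*' 'python3-ceph*' | sort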