Ceph distributed storage: ceph -s reports "1 daemons have recently crashed"

admin · posted 2021-5-25 09:51:14

The troubleshooting steps were as follows:

[root@compute02 ~]# ceph -s
  cluster:
    id:     dd1ff8b6-f7b8-47a3-890c-17f75894562a
    health: HEALTH_WARN
            Reduced data availability: 171 pgs stale
            1 daemons have recently crashed
            4 slow ops, oldest one blocked for 544 sec, osd.0 has slow ops

  services:
    mon: 3 daemons, quorum compute01,compute02,compute03 (age 11m)
    mgr: compute02(active, since 4d), standbys: compute03, compute01
    osd: 3 osds: 3 up (since 9m), 3 in (since 9m)

  data:
    pools:   4 pools, 512 pgs
    objects: 7.35k objects, 52 GiB
    usage:   35 GiB used, 1.6 TiB / 1.6 TiB avail
    pgs:     341 active+clean
             171 stale+active+clean

[root@compute02 ~]# ceph crash ls-new
ID                                                             ENTITY NEW
2021-05-24_14:59:54.039272Z_69fc0f11-81bf-4428-aece-20a18f2b03e3 osd.0 *
[root@compute02 ~]# ceph crash info 2021-05-24_14:59:54.039272Z_69fc0f11-81bf-4428-aece-20a18f2b03e3
{
"os_version_id": "7",
"utsname_machine": "x86_64",
"entity_name": "osd.0",
"io_error": true,
"backtrace": [
      "(()+0xf630) [0x7f22dcef7630]",
      "(gsignal()+0x37) [0x7f22dbcea387]",
      "(abort()+0x148) [0x7f22dbceba78]",
      "(ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x1a5) [0x56498eb2edfc]",
      "(KernelDevice::_aio_thread()+0xebe) [0x56498f17a1de]",
      "(KernelDevice::AioCompletionThread::entry()+0xd) [0x56498f17c89d]",
      "(()+0x7ea5) [0x7f22dceefea5]",
      "(clone()+0x6d) [0x7f22dbdb29fd]"
],
"io_error_optype": 8,
"io_error_length": 4096,
"assert_line": 534,
"utsname_release": "3.10.0-1160.el7.x86_64",
"io_error_offset": 288585248768,
"assert_file": "/home/miles/rpmbuild/BUILD/ceph-14.2.8/src/os/bluestore/KernelDevice.cc",
"io_error_devname": "dm-2",
"utsname_sysname": "Linux",
"os_version": "7 (Core)",
"os_id": "centos",
"assert_thread_name": "bstore_aio",
"assert_msg": "/home/miles/rpmbuild/BUILD/ceph-14.2.8/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f22d04a8700 time 2021-05-24 22:59:54.033676\n/home/miles/rpmbuild/BUILD/ceph-14.2.8/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg(\"Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!\")\n",
"assert_func": "void KernelDevice::_aio_thread()",
"ceph_version": "14.2.8-111.el7",
"io_error_path": "/var/lib/ceph/osd/ceph-0/block",
"os_name": "CentOS Linux",
"timestamp": "2021-05-24 14:59:54.039272Z",
"process_name": "ceph-osd",
"utsname_hostname": "compute01",
"crash_id": "2021-05-24_14:59:54.039272Z_69fc0f11-81bf-4428-aece-20a18f2b03e3",
"assert_condition": "abort",
"utsname_version": "#1 SMP Mon Oct 19 16:18:59 UTC 2020",
"io_error_code": -5
}
[root@compute02 ~]# ceph crash archive 2021-05-24_14:59:54.039272Z_69fc0f11-81bf-4428-aece-20a18f2b03e3
[root@compute02 ~]# ceph crash archive-all

[root@compute02 ~]# ceph -s
  cluster:
    id:     dd1ff8b6-f7b8-47a3-890c-17f75894562a
    health: HEALTH_WARN
            Reduced data availability: 171 pgs stale
            4 slow ops, oldest one blocked for 738 sec, osd.0 has slow ops

  services:
    mon: 3 daemons, quorum compute01,compute02,compute03 (age 14m)
    mgr: compute02(active, since 4d), standbys: compute03, compute01
    osd: 3 osds: 3 up (since 12m), 3 in (since 12m)

  data:
    pools:   4 pools, 512 pgs
    objects: 7.35k objects, 52 GiB
    usage:   35 GiB used, 1.6 TiB / 1.6 TiB avail
    pgs:     341 active+clean
             171 stale+active+clean

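After archiving, the "daemons have recently crashed" warning no longer appears in ceph -s. The stale PG and slow ops warnings are still present in the output above; archiving only clears the crash notice.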
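The crash report above records an I/O error (io_error_code -5) on dm-2 and its message asks to check the kernel log. A minimal sketch of that check, assuming it is run on compute01 (the utsname_hostname in the report); the helper name kernel_io_errors is hypothetical, and the optional argument simply narrows the result to lines naming one device:

```shell
# Filter kernel log lines read on stdin down to I/O error messages.
# Optional first argument: a device name (for example dm-2) to narrow the result.
# With no argument, the second grep matches every line and filters nothing.
kernel_io_errors() {
    grep -i 'i/o error' | grep -F -- "${1:-}"
}

# Usage on the host that ran the crashed OSD:
#   dmesg -T | kernel_io_errors          # all I/O errors
#   dmesg -T | kernel_io_errors dm-2     # only lines that mention dm-2
#   lsblk                                # shows which disk sits under dm-2
```

Errors may be logged against the physical disk under dm-2 rather than dm-2 itself, so the unfiltered form is usually the better starting point.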

For reference, the general procedure is described below:

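Checking the cluster status with ceph -s keeps showing the following warning, and the number of crashed daemons keeps increasing:

daemons have recently crashed

The system was checked and is currently running normally, so the warning most likely refers to past crashes. It can be handled as follows: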

List the past crashes that have not been archived yet:

ceph crash ls-new
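When the count keeps growing, it can help to see which daemon is responsible. A small sketch that tallies new crashes per entity; it assumes the three-column layout shown earlier in this post (one header line, then ID, ENTITY, NEW), and the function name crash_count_by_entity is hypothetical:

```shell
# Count new crashes per entity from `ceph crash ls-new` output read on stdin.
# Assumes one header line followed by rows of: ID ENTITY NEW
crash_count_by_entity() {
    awk 'NR > 1 && NF > 0 { n[$2]++ } END { for (e in n) print e, n[e] }' | sort
}

# Usage on a cluster node:
#   ceph crash ls-new | crash_count_by_entity
```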

Use an ID from the listing to view the details of that crash:

ceph crash info <crash-id>
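The full report is long, and in a case like the one above the interesting part is the io_error_* fields. A rough sketch that keeps only those lines plus the daemon and host names; the field names are taken from the report shown earlier, other crash types may not carry them, and the function name crash_io_fields is hypothetical:

```shell
# Keep only the daemon, host and io_error_* lines of a crash report read on stdin.
# Works line by line on the JSON text; it does not parse JSON.
crash_io_fields() {
    grep -E '"(entity_name|utsname_hostname|io_error[a-z_]*)"[[:space:]]*:'
}

# Usage on a cluster node:
#   ceph crash info <crash-id> | crash_io_fields
```

If no io_error_* lines come back, the crash was probably not a device I/O failure and the backtrace and assert_msg are the place to look.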

Archive a crash report so that it is no longer shown in the warning:

ceph crash archive <crash-id>

Archive all crash reports:

ceph crash archive-all
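archive-all clears every new report at once, including ones nobody has read yet. A more selective sketch, assuming the same ls-new layout as above, prints one archive command per crash of a single entity so they can be reviewed before being run; the function name archive_cmds_for is hypothetical:

```shell
# Print a `ceph crash archive <id>` command for each new crash of one entity.
# Reads `ceph crash ls-new` output on stdin; nothing is executed here.
archive_cmds_for() {
    awk -v entity="$1" 'NR > 1 && $2 == entity { print "ceph crash archive " $1 }'
}

# Usage on a cluster node:
#   ceph crash ls-new | archive_cmds_for osd.0        # review the commands
#   ceph crash ls-new | archive_cmds_for osd.0 | sh   # then run them
```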

The time period for what “recent” means is controlled by the option mgr/crash/warn_recent_interval (default: two weeks). These warnings can be disabled entirely with:

ceph config set mgr mgr/crash/warn_recent_interval 0
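Instead of disabling the warning, the window can be shortened. A sketch under two assumptions: that the option value is given in seconds (so the two-week default is 1209600), and that ceph config set takes the target mgr before the option name. The helper days_to_seconds is hypothetical:

```shell
# Convert a number of days to seconds.
# Assumption: mgr/crash/warn_recent_interval is expressed in seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# Usage on a cluster node, warning only about crashes from the last 7 days:
#   ceph config set mgr mgr/crash/warn_recent_interval "$(days_to_seconds 7)"
```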
