1 OSD(s) experiencing BlueFS spillover (Ceph distributed storage)

admin · posted 2025-3-27 09:17:29

ceph -s
  cluster:
    id:     5fa16469-8be4-4457-8a78-12b1910afff7
    health: HEALTH_WARN
            1 OSD(s) experiencing BlueFS spillover

ceph health detail
HEALTH_WARN 1 OSD(s) experiencing BlueFS spillover
[WRN] BLUEFS_SPILLOVER: 1 OSD(s) experiencing BlueFS spillover
   osd.18 spilled over 39 GiB metadata from 'db' device (186 GiB used of 186 GiB) to slow device
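
To pull the affected OSD ids out of the health output above, something like the following should work. It is only a minimal sketch: the function name spilled_osds is made up for illustration, and it assumes the detail lines keep the "osd.N spilled over ..." wording shown above.

```shell
# Read `ceph health detail` output on stdin and print the numeric id of
# every OSD that reports a BlueFS spillover, one id per line.
# (spilled_osds is a hypothetical helper name, not a Ceph command.)
spilled_osds() {
  sed -n 's/^[[:space:]]*osd\.\([0-9][0-9]*\) spilled over .*/\1/p'
}
```

On a live cluster it would be used as: ceph health detail | spilled_osds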

This is an odd problem; I have not run into it before.

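The official solution article says:

The code issue that causes BlueFS spillover has been resolved in RHCS 5.0 and later.
See the Bugzilla and Errata mentioned in the Root Cause section.

If you can accept the HEALTH_WARN caused by this issue and your Ceph cluster is scheduled to be upgraded to RHCS 5.3 or later soon, no action is needed.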

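If you want to clear the HEALTH_WARN, perform the following steps on each OSD (one OSD at a time).
- Compact the OSD
- SSH to the node hosting the OSD and restart the OSD
- Compact the OSD again
- SSH to the node hosting the OSD and restart the OSD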

Solution:

# ceph daemon osd.<id> compact  <--wait 2 minutes afterwards

Example:
ceph daemon osd.18 compact
{ "elapsed_time": 22.966318924999999}

# systemctl stop ceph-osd@{id}; sleep 2; systemctl start ceph-osd@{id}
Example: systemctl stop ceph-osd@18.service ;sleep 2 ;systemctl start ceph-osd@18.service

Remember to repeat both commands a second time after ~1 minute.
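
The compact + restart cycle, run twice, can be wrapped in a small helper. This is a sketch only: compact_and_restart and run_or_print are hypothetical names, the waits are the ones quoted in this post (2 minutes after compact, ~1 minute before the second pass), and it assumes the ceph-osd@<id>.service unit naming used here. By default it only prints the commands; set DRY_RUN=0 to actually run them on the OSD host.

```shell
# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Compact and restart one OSD, then do the same again a second time.
compact_and_restart() {
  osd_id=$1
  for pass in 1 2; do
    run_or_print ceph daemon "osd.${osd_id}" compact
    run_or_print sleep 120    # wait 2 minutes after the compact
    run_or_print systemctl stop "ceph-osd@${osd_id}.service"
    run_or_print sleep 2
    run_or_print systemctl start "ceph-osd@${osd_id}.service"
    if [ "$pass" = "1" ]; then
      run_or_print sleep 60   # ~1 minute before the second pass
    fi
  done
}
```

Running compact_and_restart 18 prints the command sequence for osd.18 so it can be reviewed before running it for real, one OSD at a time.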

Running ceph -s again shows the status is back to normal.
ceph -s
  cluster:
    id:     5fa16469-8be4-4457-8a78-12b1910afff7
    health: HEALTH_OK

1320503165 · posted 2025-3-27 10:42:45

ceph daemon osd.18 perf dump | grep -C 3 bluefs
      "msgr_recv_encrypted_bytes": 154032,
      "msgr_send_encrypted_bytes": 12016
},
"bluefs": {
      "db_total_bytes": 200038932480,
      "db_used_bytes": 1681915904,
      "wal_total_bytes": 200038936576,

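The two db counters in the perf dump above can be turned into a utilization figure. A rough sketch, assuming the output keeps one "key": value pair per line as shown; bluefs_db_usage is a hypothetical helper name, and the line-based awk parsing is a shortcut rather than a real JSON parser.

```shell
# Read `ceph daemon osd.<id> perf dump` output on stdin and report how much
# of the BlueFS db device is in use.
# (bluefs_db_usage is a hypothetical helper name, not a Ceph command.)
bluefs_db_usage() {
  awk -F'[:,]' '
    /"db_total_bytes"/ { total = $2 + 0 }   # size of the db device in bytes
    /"db_used_bytes"/  { used  = $2 + 0 }   # bytes currently used on it
    END {
      if (total > 0)
        printf "%.2f%% of %.1f GiB\n", 100 * used / total, total / 1073741824
    }
  '
}
```

Usage on the OSD host: ceph daemon osd.18 perf dump | bluefs_db_usage
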
admin · posted 2025-3-28 09:24:15

HEALTH_WARN 1 OSD(s) experiencing BlueFS spillover
[WRN] BLUEFS_SPILLOVER: 1 OSD(s) experiencing BlueFS spillover
osd.25 spilled over 774 MiB metadata from 'db' device (186 GiB used of 186 GiB) to slow device

admin · posted 2025-3-28 09:30:15

ceph daemon osd.25 compact

{
"elapsed_time": 23.914333896999999
}

systemctl stop ceph-osd@25.service ;sleep 2 ;systemctl start ceph-osd@25.service

ceph health detail
HEALTH_OK

ceph -s
  cluster:
    id:     5fa16469-8be4-4457-8a78-12b1910afff7
    health: HEALTH_OK
