admin posted on 2025-3-27 09:17:29

1 OSD(s) experiencing BlueFS spillover (Ceph distributed storage)

ceph -s
cluster:
id: 5fa16469-8be4-4457-8a78-12b1910afff7
health: HEALTH_WARN
1 OSD(s) experiencing BlueFS spillover

ceph health detail
HEALTH_WARN 1 OSD(s) experiencing BlueFS spillover
BLUEFS_SPILLOVER: 1 OSD(s) experiencing BlueFS spillover
osd.18 spilled over 39 GiB metadata from 'db' device (186 GiB used of 186 GiB) to slow device

This is an odd problem; I have not run into it before.

The official solution note says:

The code issue that causes BlueFS spillover has been resolved in RHCS 5.0 and later.
See the Bugzilla and Errata mentioned in the Root Cause section.

If you can live with the HEALTH_WARN caused by this issue and your Ceph cluster is scheduled to be upgraded to RHCS 5.3 or later soon, no action is needed.

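If you want to clear the HEALTH_WARN, perform the following steps on each OSD (one OSD at a time).
- Compact the OSD
- SSH to the node hosting the OSD and restart the OSD
- Compact the OSD again
- SSH to the node hosting the OSD and restart the OSD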

Solution:

# ceph daemon osd.<id> compact    <-- wait 2 minutes afterwards

Example:
ceph daemon osd.18 compact
{
"elapsed_time": 22.966318924999999
}

# systemctl stop ceph-osd@{id}; sleep 2; systemctl start ceph-osd@{id}
Example: systemctl stop ceph-osd@18.service; sleep 2; systemctl start ceph-osd@18.service

Remember to repeat both commands a second time after ~1 minute.
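
The whole sequence (compact, wait, restart, then repeat once) can be sketched as a small script. This is only a sketch under assumptions: it must run on the node that hosts the OSD, since `ceph daemon` talks to the local admin socket, and the helper names `run`, `compact_and_restart`, `fix_spillover` and the `DRY_RUN` switch are made up for illustration. The wait times are the 2 minutes and ~1 minute quoted above.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One round: compact the OSD, wait, then restart its systemd unit.
compact_and_restart() {
    osd_id=$1
    run ceph daemon "osd.${osd_id}" compact
    run sleep 120    # wait about 2 minutes after the compaction
    run systemctl stop "ceph-osd@${osd_id}.service"
    run sleep 2
    run systemctl start "ceph-osd@${osd_id}.service"
}

# Two rounds, roughly 1 minute apart, as the note recommends.
fix_spillover() {
    osd_id=$1
    compact_and_restart "$osd_id"
    run sleep 60     # wait about 1 minute before repeating
    compact_and_restart "$osd_id"
}

# Usage (one OSD at a time), previewing the commands first:
#   DRY_RUN=1; fix_spillover 18
```

Previewing with `DRY_RUN=1` before a real run seems prudent, because restarting an OSD briefly takes it out of service.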

Running ceph -s again shows the status is back to normal.
ceph -s
cluster:
id: 5fa16469-8be4-4457-8a78-12b1910afff7
health: HEALTH_OK

1320503165 posted on 2025-3-27 10:42:45

ceph daemon osd.18 perf dump | grep -C 3 bluefs
   "msgr_recv_encrypted_bytes": 154032,
   "msgr_send_encrypted_bytes": 12016
},
"bluefs": {
   "db_total_bytes": 200038932480,
   "db_used_bytes": 1681915904,
   "wal_total_bytes": 200038936576,

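The two `db_*` counters above are enough to estimate how full the DB device is. Here 200038932480 bytes is about 186.3 GiB, matching the "186 GiB" in the warning, and 1681915904 bytes used is about 1.57 GiB. A rough sketch that computes the percentage from `perf dump` text on stdin; `bluefs_db_usage` is a hypothetical helper name, and it assumes each counter sits on its own line as in the output above:

```shell
# Print the BlueFS DB device usage as a percentage (two decimals).
# Reads `ceph daemon osd.<id> perf dump` output from stdin.
bluefs_db_usage() {
    awk -F'[:,]' '
        /"db_total_bytes"/ { total = $2 + 0 }   # value follows the colon
        /"db_used_bytes"/  { used  = $2 + 0 }
        END {
            if (total > 0)
                printf "%.2f\n", used * 100 / total
        }'
}

# Usage on the OSD host:
#   ceph daemon osd.18 perf dump | bluefs_db_usage
```

For the numbers posted above this works out to roughly 0.84 percent.
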
admin posted on 2025-3-28 09:24:15

HEALTH_WARN 1 OSD(s) experiencing BlueFS spillover
BLUEFS_SPILLOVER: 1 OSD(s) experiencing BlueFS spillover
osd.25 spilled over 774 MiB metadata from 'db' device (186 GiB used of 186 GiB) to slow device
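
When more than one OSD is affected, the ids can be pulled out of the `ceph health detail` text so they can be handled one at a time. A minimal sketch, assuming the detail lines keep the `osd.N spilled over ...` shape shown above; `spillover_osds` is a hypothetical helper name, not a Ceph command:

```shell
# Print the numeric id of every OSD named in a BlueFS spillover detail line.
# Reads `ceph health detail` output from stdin.
spillover_osds() {
    # Matches lines such as:
    #   osd.25 spilled over 774 MiB metadata from 'db' device (...) to slow device
    sed -n 's/^[[:space:]]*osd\.\([0-9][0-9]*\) spilled over .*/\1/p'
}

# Usage:
#   ceph health detail | spillover_osds
```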

admin posted on 2025-3-28 09:30:15

ceph daemon osd.25 compact

{
"elapsed_time": 23.914333896999999
}

systemctl stop ceph-osd@25.service ;sleep 2 ;systemctl start ceph-osd@25.service

ceph health detail
HEALTH_OK

ceph -s
cluster:
id: 5fa16469-8be4-4457-8a78-12b1910afff7
health: HEALTH_OK
