[root@8-5 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.cephfs.gm268-2.xdsdoz(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 2718 secs
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.cephfs.gm268-2.xdsdoz(mds.0): 73 slow requests are blocked > 30 secs

When these warnings appear, the cluster stops responding to requests. The fix is to restart the Ceph services on every node:
systemctl restart ceph.target
Rebooting the servers also resolves it. When the cluster is slow to respond, a restart can be used to kick off data rebalancing again.

Observations

1. Slow OSD heartbeats
- # ceph -s
- health: HEALTH_WARN
- Slow OSD heartbeats on back (longest 6181.010ms)
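- Slow OSD heartbeats on front (longest 5953.232ms)
OSDs ping each other to measure connection latency. If the latency between two OSDs exceeds 1 s, it is too high for reliable data storage and access in the Ceph cluster. Latency is measured both over the internal network (between storage servers, "back") and over the public network (storage servers to the servers that use them, "front"). If the latency is too high, the affected OSDs are marked down, which can in turn lead to data loss in the cluster.
- # ceph health detail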
-
- [WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 11846.602ms)
- Slow OSD heartbeats on back from osd.12 [] to osd.25 [] 11846.602 msec
- Slow OSD heartbeats on back from osd.8 [] to osd.17 [] 3617.281 msec
- Slow OSD heartbeats on back from osd.16 [] to osd.27 [] 2784.517 msec
- Slow OSD heartbeats on back from osd.21 [] to osd.17 [] 1678.064 msec
- Slow OSD heartbeats on back from osd.11 [] to osd.15 [] 1675.884 msec
- Slow OSD heartbeats on back from osd.20 [] to osd.13 [] 1073.790 msec
- [WRN] OSD_SLOW_PING_TIME_FRONT: Slow OSD heartbeats on front (longest 11427.677ms)
- Slow OSD heartbeats on front from osd.12 [] to osd.25 [] 11427.677 msec
- Slow OSD heartbeats on front from osd.8 [] to osd.17 [] 3787.868 msec
- Slow OSD heartbeats on front from osd.16 [] to osd.27 [] 3465.298 msec
- Slow OSD heartbeats on front from osd.11 [] to osd.15 [] 1469.591 msec
- Slow OSD heartbeats on front from osd.21 [] to osd.17 [] 1341.135 msec
- Slow OSD heartbeats on front from osd.20 [] to osd.13 [] 1224.235 msec
- Slow OSD heartbeats on front from osd.5 [] to osd.16 [] 1101.175 msec
-
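The output above shows that the OSDs on one host have high latency to the OSDs on all other hosts. Unplugging that host's fiber-optic cable, wiping the connector clean, and plugging it back in resolved the problem.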
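When the list of slow OSD pairs is long, counting how often each OSD is mentioned can point to the common host faster than reading the list by eye. The following is only a minimal sketch, not part of the original procedure: the function name count_slow_heartbeat_osds is hypothetical, and it assumes the "Slow OSD heartbeats on back/front from osd.X [] to osd.Y []" line format shown above.

```shell
# Hypothetical helper: reads `ceph health detail` output on stdin and prints
# "<count> <osd>" lines, most frequently mentioned OSD first.
# Assumes the "Slow OSD heartbeats on back|front from osd.X ... to osd.Y ..." format.
count_slow_heartbeat_osds() {
  awk '
    /Slow OSD heartbeats on (back|front) from/ {
      # Count every field that is exactly an OSD name such as osd.17
      for (i = 1; i <= NF; i++)
        if ($i ~ /^osd\.[0-9]+$/) seen[$i]++
    }
    END { for (osd in seen) print seen[osd], osd }
  ' | sort -k1,1nr -k2,2
}

# Usage on a cluster node:
#   ceph health detail | count_slow_heartbeat_osds
# The host of a suspicious OSD can then be looked up with:
#   ceph osd find <id>
```

OSDs that sit at the top of the list on both the back and front networks are the first candidates for a cabling or NIC check.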
2. slow ops
- # ceph -s
- 21 slow ops, oldest one blocked for 29972 sec, mon.ceph1 has slow ops
First make sure the clocks on all storage servers are synchronized, then restart the monitor service on the affected host.
3. pgs not deep-scrubbed in time
- # ceph -s
- 47 pgs not deep-scrubbed in time
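This most likely happens after OSDs went offline and Ceph automatically recovered the data. Once those OSDs rejoin the cluster, the recovered copies have to be removed again, so the warning appears while that cleanup is running, and the number of affected PGs steadily decreases. After waiting for a while the cluster returns to normal; during this period CephFS performance is very poor.
4. MDS cache is too large
- ceph config set mds mds_cache_memory_limit 10GB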
-
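- ceph config dump
This warning appears when the MDS cache usage is far above the configured limit. The first command above raises the MDS cache limit, which clears the warning at the cost of more memory. The config dump command shows the configured values of all parameters.
The warning may come back some time after mds_cache_memory_limit has been raised, because the MDS cache usage has again grown past 1.5 times the new limit. In that case, consider running multiple active MDS daemons.
- # First start MDS daemons on 3 servers. Make sure these servers have enough memory, preferably more than the others.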
- ceph orch apply mds cephfs ceph106,ceph107,ceph109
- ceph fs set cephfs max_mds 3
-
- # With 3 active MDS daemons there is no standby left, so add one more host to provide a standby MDS.
- ceph orch apply mds cephfs ceph106,ceph107,ceph109,ceph110
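The 1.5x relationship mentioned above can be checked with integer arithmetic once the current cache usage is known. This is a rough sketch under stated assumptions: the function name mds_cache_over_threshold is hypothetical, the factor 1.5 is taken from the text above, and both values are expected in bytes (how the current usage is read from the MDS depends on the release and is not shown).

```shell
# Hypothetical helper: prints "over" if used_bytes > 1.5 * limit_bytes, else "ok".
# The 1.5 factor comes from the description above; both arguments are in bytes.
mds_cache_over_threshold() {
  local used_bytes=$1 limit_bytes=$2
  # used > 1.5 * limit  is equivalent to  2 * used > 3 * limit (no floating point needed)
  if [ $(( used_bytes * 2 )) -gt $(( limit_bytes * 3 )) ]; then
    echo "over"
  else
    echo "ok"
  fi
}

# Example: a 10 GiB limit with 16 GiB in use
#   mds_cache_over_threshold 17179869184 10737418240
```

If the result stays "over" even after the limit has been raised, that matches the situation where adding active MDS daemons is the next step.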
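5. Client node18 failing to respond to cache pressure
This means the responses between host node18 and the MDS service are slow. If the cluster shows HEALTH_OK again after a short while, the warning can be ignored. If it persists, unmount the Ceph file system on node18 and mount it again.
While a client is using data, the MDS caches the corresponding data in the server's memory. When the MDS needs to shrink its cache, it sends a request to the clients. If a client responds too slowly, this warning is raised. If it stays that way, the MDS cannot release its cache, and memory consumption keeps growing until the server may go down.
The following command lists the Ceph client IDs and the number of inodes each client holds (the num_caps value).
- ceph tell mds.0 session ls
Use the following commands with caution; they evict one target client or all clients.
- ceph tell mds.0 session evict id=11134635
- ceph tell mds.0 session evict
Evicting a client adds it to the blacklist. The commands below list the blacklist and remove entries from it. Even after an entry is removed, the client may still be unable to mount the Ceph file system normally, so proceed carefully.
- ceph osd blacklist ls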
- ceph osd blacklist rm 192.168.20.1:0/1498586492
- ceph osd blacklist clear
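Before evicting anything, it helps to see which client actually holds the most caps. The sketch below ranks sessions by num_caps. It rests on assumptions: the function name top_cap_sessions is hypothetical, and the parser assumes that session ls prints pretty-printed JSON with one field per line and the "id" field before "num_caps" in each session. If jq is installed, a real JSON parser is the more robust choice.

```shell
# Hypothetical helper: reads `ceph tell mds.0 session ls` output on stdin and
# prints "<num_caps> <client id>" for the N sessions holding the most caps.
# Assumes one JSON field per line, with "id" appearing before "num_caps".
top_cap_sessions() {
  awk '
    /"id":/       { gsub(/[^0-9]/, "", $2); id = $2 }
    /"num_caps":/ { gsub(/[^0-9]/, "", $2); if (id != "") print $2, id }
  ' | sort -k1,1nr | head -n "${1:-5}"
}

# Usage on a cluster node (top 3 sessions):
#   ceph tell mds.0 session ls | top_cap_sessions 3
```

The client id in the second column is the value that would be passed as id=... to session evict, should eviction really be necessary.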
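6. Reduced data availability: 4 pgs inactive, 4 pgs incomplete
When PGs are reported as incomplete, the number of live OSDs behind them is below the minimum replica count (min_size). The data in those PGs cannot be read or written and is in a reduced state, which causes problems for the MDS service and produces errors such as the following:
- 3 MDSs report slow metadata IOs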
- 2 MDSs report slow requests
- 2 MDSs behind on trimming
- Reduced data availability: 4 pgs inactive, 4 pgs incomplete
-
- pg 5.6de is incomplete, acting [254,356,222,352,111,247,100,133,351,206] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
- pg 5.6e9 is incomplete, acting [276,244,357,358,221,321,311,229,314,351] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
- pg 5.73b is incomplete, acting [186,279,351,247,293,354,359,220,181,283] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
- pg 5.eda is incomplete, acting [164,157,120,227,353,351,295,269,95,354] (reducing pool cephfs_data min_size from 8 may help; search ceph.com/docs for 'incomplete')
At this point the PGs need to be repaired.
- # Query the PG (PG id 5.6de)
- ceph pg 5.6de query
-
- # Forcibly recreate the PG (it comes back empty, so its previous contents are lost)
- ceph osd force-create-pg 5.6de --yes-i-really-mean-it
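With several incomplete PGs it is convenient to extract their ids from the health output and save the query result of each one before doing anything destructive. The following is a minimal sketch: the function name list_incomplete_pgs is hypothetical, and it assumes the "pg <id> is incomplete, acting [...]" line format shown above, as printed by ceph health detail without the leading "- ".

```shell
# Hypothetical helper: reads `ceph health detail` output on stdin and prints
# the id of every PG reported as incomplete, one per line.
list_incomplete_pgs() {
  awk '$1 == "pg" && $3 == "is" && $4 ~ /^incomplete/ { print $2 }'
}

# Usage on a cluster node: keep a copy of each PG's state for later analysis
#   ceph health detail | list_incomplete_pgs | while read -r pg; do
#     ceph pg "$pg" query > "pg-$pg-query.json"
#   done
```

Keeping the query output around costs little and preserves the only record of the PG's peering history once the PG has been recreated.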
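7. failed to probe daemons or devices stderr:Non-zero exit code 125 from /bin/podman
The podman containers on individual servers in the Ceph cluster are broken, so the corresponding services fail to start. The warning looks like this:
- [WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices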
- host ceph105 ceph-volume inventory failed: cephadm exited with an error code: 1, stderr:Non-zero exit code 125 from /bin/podman run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=ceph105 docker.io/ceph/ceph:v15 -c %u %g /var/lib/ceph
- stat:stderr Error: readlink /var/lib/containers/storage/overlay/l/HMGABIBEWBRXOSBT4JLOKQIKDA: no such file or directory
- Traceback (most recent call last):
- File "", line 6112, in
- File "", line 1299, in _infer_fsid
- File "", line 1382, in _infer_image
- File "", line 3581, in command_ceph_volume
- File "", line 1477, in make_log_dir
- File "", line 2084, in extract_uid_gid
- RuntimeError: uid/gid not found
Running the following command on the affected node produces the error above, while healthy storage nodes do not report it.
- cephadm shell
This kind of error means the container image used by podman is broken. Find the affected storage nodes:
- ceph orch ps | grep error
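To get just the host names instead of the full matching lines, the table can be filtered by column. This is a sketch under assumptions: the function name hosts_with_errored_daemons is hypothetical, and it assumes the table has a header row containing HOST and STATUS columns, with no empty columns before STATUS and a STATUS value of exactly "error" for failed daemons.

```shell
# Hypothetical helper: reads `ceph orch ps` output on stdin and prints each
# host that has at least one daemon in "error" status, one host per line.
# Column positions are taken from the header row (HOST and STATUS).
hosts_with_errored_daemons() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "HOST")   host_col = i
        if ($i == "STATUS") status_col = i
      }
      next
    }
    host_col && status_col && $status_col == "error" { print $host_col }
  ' | sort -u
}

# Usage on a cluster node:
#   ceph orch ps | hosts_with_errored_daemons
```

The resulting host list is the set of nodes on which the image needs to be pulled again.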
Pull the image again on each affected storage node:
- cephadm pull
- podman pull ceph/ceph:v15
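- # Either command does the job. The second one shows the download speed, which helps when waiting for a download of several hundred MB with no other sign of progress.
- # Pulling the image again raises the Ceph version. This does not affect normal use.
Check podman's images and containers:
- podman images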
- podman ps
Finally, reboot the server or restart the Ceph services.
8. mds.cephfs.ceph109.avzzqn(mds.1): Behind on trimming (594/128) max_segments: 128, num_segments: 594
An MDS server raises this warning:
- [WRN] MDS_TRIM: 2 MDSs behind on trimming
- mds.cephfs.ceph109.avzzqn(mds.1): Behind on trimming (594/128) max_segments: 128, num_segments: 594
- mds.cephfs.ceph106.hggsge(mds.0): Behind on trimming (259/128) max_segments: 128, num_segments: 259
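The MDS stores metadata in the form of segments (objects). When the number of segments in the MDS exceeds mds_log_max_segments (default 128), the MDS starts trimming, that is, writing the segment data back. When the number of segments exceeds twice the configured value, the Behind on trimming warning is raised. If the MDS server has enough memory, increasing mds_log_max_segments is recommended.
- ceph config set mds mds_log_max_segments 1024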
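To see how far each MDS is behind, the (num_segments/max_segments) pair in the warning can be turned into a ratio and compared with the factor of two described above. The following is a minimal sketch: the function name trimming_backlog is hypothetical, and it assumes the "<daemon>(mds.N): Behind on trimming (num/max)" line format shown in the warning.

```shell
# Hypothetical helper: reads `ceph health detail` output on stdin and prints
# "<daemon> <num_segments> <max_segments> <ratio>" for each MDS behind on trimming.
trimming_backlog() {
  awk '/Behind on trimming/ {
    # Locate the "(num/max)" pair, e.g. "(594/128)"
    if (match($0, /\([0-9]+\/[0-9]+\)/)) {
      pair = substr($0, RSTART + 1, RLENGTH - 2)
      split(pair, p, "/")
      # Daemon name: strip leading blanks or dashes and everything from "(mds."
      name = $0
      sub(/^[ -]*/, "", name)
      sub(/\(mds\..*/, "", name)
      printf "%s %d %d %.1f\n", name, p[1], p[2], p[1] / p[2]
    }
  }'
}

# Usage on a cluster node:
#   ceph health detail | trimming_backlog
```

A ratio that keeps climbing after mds_log_max_segments has been raised suggests that write-back to the metadata pool is the bottleneck rather than the segment limit itself.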
9. mds N slow requests are blocked > 30 secs
The MDS service raises this warning:
- [WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
- mds.cephfs.ceph109.avzzqn(mds.1): 29 slow requests are blocked > 30 secs
- mds.cephfs.ceph110.sfagxf(mds.2): 1 slow requests are blocked > 30 secs
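- mds.cephfs.ceph106.hggsge(mds.0): 3 slow requests are blocked > 30 secs
This warning means the MDS is responding slowly. Possible causes: the MDS service is running too slowly, a problem in the underlying PGs or OSDs leaves journal writes unacknowledged, or a bug. Setting mds_op_complaint_time to 3000 did not make the problem go away.
When the warning appeared, the OSDs reported no errors, the MDS service appeared to be running normally, and memory was sufficient. Checking the disks through the RAID controller showed that two servers each had one disk that was not detected. The likely explanation is that those disks have failed and the OSDs have not noticed yet; this remains to be confirmed by further observation.
10. insufficient standby MDS daemons available
When an MDS daemon crashes, a standby MDS takes over. Compute servers that are already connected can still access the Ceph storage normally, but new compute servers cannot mount the Ceph file system.
The fix is to ssh into the server whose MDS daemon crashed and restart its MDS service, then log in to the standby MDS server and restart its MDS service as well.
- ssh ceph107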
- systemctl restart ceph-8f1c1f24-59b1-11eb-aeb6-f4b78d05bf17@mds.cephfs.ceph106.hggsge.service
- ssh ceph102
- systemctl restart ceph-8f1c1f24-59b1-11eb-aeb6-f4b78d05bf17@mds.cephfs.ceph102.imxzno.service
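The unit names in the commands above follow the pattern ceph-<fsid>@<daemon name>.service. As a small sketch, the name can be assembled instead of typed by hand; the function name mds_unit_name is hypothetical, and the pattern is inferred from the two commands above rather than from a specification.

```shell
# Hypothetical helper: builds the systemd unit name for a cephadm-managed daemon
# from the cluster fsid and the daemon name, following the pattern used above.
mds_unit_name() {
  local fsid=$1 daemon=$2
  printf 'ceph-%s@%s.service\n' "$fsid" "$daemon"
}

# Usage on the host that runs the daemon (ceph fsid prints the cluster id):
#   systemctl restart "$(mds_unit_name "$(ceph fsid)" mds.cephfs.ceph102.imxzno)"
```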