RBD image cannot be deleted
An RBD image cannot be deleted; the error is as follows:

$ rbd rm nextcloud/mysql
2020-05-13 16:27:46.155 7f024bfff700 -1 librbd::image::RemoveRequest: 0x557a7af027a0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

$ rbd info nextcloud/mysql
rbd image 'mysql':
    size 40 GiB in 10240 objects
    order 22 (4 MiB objects)
    id: 17e006b8b4567
    block_name_prefix: rbd_data.17e006b8b4567
    format: 2
    features: layering
    op_features:
    flags:
    create_timestamp: Tue Oct 15 10:47:34 2019

Check the current status of the RBD image:

$ rbd status nextcloud/mysql
Watchers:
    watcher=10.100.21.95:0/115493307 client.67866 cookie=7

A node still has the image mapped. Log in to that machine and check:

$ rbd showmapped
id pool      image snap device
...
3  nextcloud mysql -    /dev/rbd3

Unmap it:

$ rbd unmap nextcloud/mysql

Then run the delete again:

$ rbd rm nextcloud/mysql
Removing image: 100% complete...done.

Brute-force alternative: blacklist the watcher directly, ignoring the node that still has the image mapped:

$ ceph osd blacklist add 10.100.21.95:0/115493307
$ rbd rm nextcloud/mysql
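
The watcher address passed to `ceph osd blacklist add` can be extracted from the `rbd status` output rather than copied by hand. The following is a minimal sketch that assumes the output format shown earlier in this section; the function name `watcher_addrs` is hypothetical and not part of any Ceph tool.

```shell
# Print the address of every watcher found in `rbd status` output read from stdin.
# Assumption: watcher lines look like "watcher=ADDR client.ID cookie=N".
watcher_addrs() {
    sed -n 's/^[[:space:]]*watcher=\([^[:space:]]*\).*/\1/p'
}

# Example (not run here): review the addresses before blacklisting anything.
# rbd status nextcloud/mysql | watcher_addrs
```

Blacklisting cuts the client off abruptly, so unmapping on the client remains the gentler option when the node is reachable.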

OSD latency
Check whether any OSD shows latency:

$ ceph osd perf
osd commit_latency(ms) apply_latency(ms)
  2                  0                 0
  1                  0                 0
  0                  0                 0

Defragmentation
Check fragmentation:

$ xfs_db -c frag -r /dev/mapper/VolGroup-lv_data1

Defragment (on XFS this is normally done with xfs_fsr):

Check power-on hours
Check how long a disk has been powered on:

$ smartctl -A /dev/mapper/VolGroup-lv_data1

Change the replica count
Change the replica count:

$ ceph osd pool set fs_data2 min_size 1
$ ceph osd pool set fs_data2 size 2

Add / remove a pool
Add / remove a pool:

$ ceph fs add_data_pool fs fs_data2
$ ceph fs rm_data_pool fs fs_data2

Balance data across OSDs
Balance data across OSDs:

$ ceph balancer status
$ ceph balancer on
$ ceph balancer mode crush-compat

MDS cannot be queried
The MDS cannot be queried:

$ ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/status/module.py", line 311, in handle_command
    return self.handle_fs_status(cmd)
  File "/usr/lib64/ceph/mgr/status/module.py", line 177, in handle_fs_status
    mds_versions[metadata.get('ceph_version', "unknown")].append(info['name'])
AttributeError: 'NoneType' object has no attribute 'get'

$ ceph mds metadata
[
    {
        "name": "BJ-YZ-CEPH-94-54"
    },
    {
        "name": "BJ-YZ-CEPH-94-53",
        "addr": "10.100.94.53:6825/4233274463",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-53",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    },
    {
        "name": "BJ-YZ-CEPH-94-52",
        "addr": "10.100.94.52:6800/3956121270",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-52",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    }
]

Restarting the MDS resolves this. Note that the entry for BJ-YZ-CEPH-94-54 carries only a name and no metadata, which is consistent with the NoneType error raised by the status module.

CephFS reports a healthy status but data cannot be written
When CephFS looks healthy but is unusable, an abnormal client is usually the cause. First list the sessions on the MDS, then try evicting the offending session:

$ ceph tell mds.BJ-YZ-CEPH-94-52 session ls
$ ceph tell mds.BJ-YZ-CEPH-94-52 session evict id=834283

Session IDs are specific to each MDS; a session cannot be evicted through a different MDS node.

Add an MDS to the file system
Add an MDS to the file system:

$ ceph fs set fs max_mds 2

mon time zone problem
A time zone problem on some of the mons causes the following warning:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_WARN
            9 slow ops, oldest one blocked for 211643 sec, daemons [mon.BJ-YZ-CEPH-94-53,mon.BJ-YZ-CEPH-94-54] have slow ops.

  services:
    mon: 3 daemons, quorum BJ-YZ-CEPH-94-52,BJ-YZ-CEPH-94-53,BJ-YZ-CEPH-94-54
    mgr: BJ-YZ-CEPH-94-52(active), standbys: BJ-YZ-CEPH-94-54, BJ-YZ-CEPH-94-53
    mds: fs-2/2/2 up {0=BJ-YZ-CEPH-94-52=up:active,1=BJ-YZ-CEPH-94-53=up:active}, 1 up:standby-replay
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   7 pools, 1152 pgs
    objects: 37.66 M objects, 67 TiB
    usage:   136 TiB used, 126 TiB / 262 TiB avail
    pgs:     1148 active+clean
             4    active+clean+scrubbing+deep

  io:
    client:   13 KiB/s rd, 27 MiB/s wr, 2 op/s rd, 19 op/s wr

Configure the NTP server:

$ systemctl status ntpd
$ systemctl start ntpd

Then restart ceph-mon.target on the affected mon:

$ systemctl status ceph-mon.target
$ systemctl restart ceph-mon.target

1 MDSs report slow requests
The error is as follows:

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            Degraded data redundancy: 122006/1192166 objects degraded (10.234%), 102 pgs degraded, 116 pgs undersized
            101 slow ops, oldest one blocked for 81045 sec, daemons [osd.1,osd.2] have slow ops.

Restarting the mon usually resolves it:

$ systemctl restart ceph-mon.target

If that does not help, restart the MDS:

$ systemctl restart ceph-mds@${HOSTNAME}

Reduced data availability: 38 pgs inactive
The error is as follows: https://zhuanlan.zhihu.com/p/74323736[1]

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            145 slow ops, oldest one blocked for 184238 sec, daemons [osd.1,osd.2] have slow ops.

  services:
    mon: 3 daemons, quorum master001,master002,master003
    mgr: master001(active), standbys: master002, master003
    mds: kubernetes-2/2/2 up {0=master001=up:active,1=master002=up:active}, 1 up:standby
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   9 pools, 244 pgs
    objects: 535.1 k objects, 177 GiB
    usage:   470 GiB used, 4.1 TiB / 4.6 TiB avail
    pgs:     15.574% pgs unknown
             206 active+clean
             38  unknown

  io:
    client:   35 KiB/s wr, 0 op/s rd, 2 op/s wr

This happens when PGs have lost their data and cannot recover automatically. The fix is to clear the PG data so the cluster can recreate the PGs, but this may lose data (with a pool size of 1, data loss is certain).

First identify an abnormal PG, then query it for details:

$ ceph pg 1.6e query
Error ENOENT: i don't have pgid 1.6e

The PG cannot be queried this way. List the abnormal PGs with the following command instead:

$ ceph pg dump_stuck unclean
ok
PG_STAT STATE   UP UP_PRIMARY ACTING ACTING_PRIMARY
1.74    unknown []         -1     []             -1
1.70    unknown []         -1     []             -1
1.6a    unknown []         -1     []             -1
1.2d    unknown []         -1     []             -1
1.20    unknown []         -1     []             -1
1.1e    unknown []         -1     []             -1
1.1c    unknown []         -1     []             -1
1.17    unknown []         -1     []             -1
1.9     unknown []         -1     []             -1
1.29    unknown []         -1     []             -1
1.56    unknown []         -1     []             -1
1.72    unknown []         -1     []             -1
1.45    unknown []         -1     []             -1
1.4e    unknown []         -1     []             -1
1.46    unknown []         -1     []             -1
1.22    unknown []         -1     []             -1
1.53    unknown []         -1     []             -1
1.59    unknown []         -1     []             -1
1.24    unknown []         -1     []             -1
1.55    unknown []         -1     []             -1
1.3f    unknown []         -1     []             -1
1.38    unknown []         -1     []             -1
1.a     unknown []         -1     []             -1
1.7     unknown []         -1     []             -1
1.34    unknown []         -1     []             -1
1.64    unknown []         -1     []             -1
1.6     unknown []         -1     []             -1
1.32    unknown []         -1     []             -1
1.4     unknown []         -1     []             -1
1.2e    unknown []         -1     []             -1
1.31    unknown []         -1     []             -1
1.5e    unknown []         -1     []             -1
1.0     unknown []         -1     []             -1
1.42    unknown []         -1     []             -1
1.15    unknown []         -1     []             -1
1.6e    unknown []         -1     []             -1
1.41    unknown []         -1     []             -1
1.10    unknown []         -1     []             -1

Run the following command to forcibly recreate a PG, discarding its data: https://docs.ceph.com/docs/mimic ... troubleshooting-pg/[2]

$ ceph osd force-create-pg 1.74 --yes-i-really-mean-it

# Batch run
# ceph pg dump_stuck unclean|awk '{print $1}'|xargs -i ceph osd force-create-pg {} --yes-i-really-mean-it

The cluster recovers once this completes.
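
The batch one-liner feeds every first column to `force-create-pg`, including header text such as `PG_STAT`. Because the command is destructive, a stricter filter may be worth using. The sketch below assumes the `dump_stuck` column layout shown earlier; the function name `unknown_pg_ids` is hypothetical.

```shell
# Print only the IDs of PGs in state "unknown" from `ceph pg dump_stuck` output on stdin.
# Header lines ("ok", "PG_STAT ...") are skipped because they do not match POOL.HEXID.
unknown_pg_ids() {
    awk '$2 == "unknown" && $1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }'
}

# Dry run (not executed here): print the commands first, then drop "echo" once reviewed.
# ceph pg dump_stuck unclean | unknown_pg_ids | xargs -I{} echo ceph osd force-create-pg {} --yes-i-really-mean-it
```

Printing the commands before running them gives a chance to confirm that only the intended PGs are listed.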

1 clients failing to respond to capability release
The error is as follows:

$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsmaster001(mds.0): Client master003.k8s.shileizcc-ops.com: failing to respond to capability release client_id: 284951

Evicting this client ID resolves it: https://blog.csdn.net/zuoyang1990/article/details/98530070[3]

$ ceph daemon mds.master003 session ls|grep 284951
$ ceph tell mds.master003 session evict id=284951

If the following error appears:

$ ceph tell mds.master003 session evict id=284951
2020-08-13 10:45:03.869 7f271b7fe700 0 client.306366 ms_handle_reset on 10.100.21.95:6800/1646216103
2020-08-13 10:45:03.881 7f2730ff9700 0 client.316415 ms_handle_reset on 10.100.21.95:6800/1646216103
Error EAGAIN: MDS is replaying log

the command has to be run against the mds.0 node; otherwise the client cannot be found.

Kernel tuning
Kernel tuning: https://blog.csdn.net/fuzhongfaya/article/details/80932766[4]

$ echo "8192" > /sys/block/sda/queue/read_ahead_kb
$ echo "vm.swappiness = 0" | tee -a /etc/sysctl.conf
$ sysctl -p
$ echo "deadline" > /sys/block/sd[x]/queue/scheduler

# ssd
# echo "noop" > /sys/block/sd[x]/queue/scheduler

It is best to turn swap off entirely, because the memory parameter does not always take effect.
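
The choice between `deadline` and `noop` can be derived from the kernel's `rotational` flag instead of being decided per disk by hand. This is a minimal sketch that assumes the scheduler names used in this post (newer multi-queue kernels name them `mq-deadline` and `none`); the function name `scheduler_for` is hypothetical.

```shell
# Map the value of /sys/block/DEV/queue/rotational to a scheduler name:
# 1 = spinning disk -> deadline, 0 = SSD -> noop. Any other value fails.
scheduler_for() {
    case "$1" in
        0) echo noop ;;
        1) echo deadline ;;
        *) return 1 ;;
    esac
}

# Example (not run here, needs root): apply to every sd* device.
# for q in /sys/block/sd*/queue; do
#     s=$(scheduler_for "$(cat "$q/rotational")") && echo "$s" > "$q/scheduler"
# done
```

Settings written to `/sys` do not survive a reboot, so they need to be reapplied at boot time.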

Configuration file

Configuration file for a 40-core, 128 GB machine:

[global]
fsid = 2f77b028-ed2a-4010-9b79-90fd3052afc6
mon_initial_members = BJ-YZ-CEPH-94-52, BJ-YZ-CEPH-94-53, BJ-YZ-CEPH-94-54
mon_host = 10.100.94.52,10.100.94.53,10.100.94.54
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = 10.100.94.0/24
cluster network = 10.100.94.0/24

[mon.a]
host = BJ-YZ-CEPH-94-52
mon addr = 10.100.94.52:6789

[mon.b]
host = BJ-YZ-CEPH-94-53
mon addr = 10.100.94.53:6789

[mon.c]
host = BJ-YZ-CEPH-94-54
mon addr = 10.100.94.54:6789

[mon]
mon data = /var/lib/ceph/mon/ceph-$id

# Allowed clock drift between monitors, default 0.05
mon clock drift allowed = 1

# Minimum number of OSDs that must report an OSD as down to the monitor, default 1
mon osd min down reporters = 1

# Seconds Ceph waits before marking a down OSD as out, default 300
mon osd down out interval = 600

mon_allow_pool_delete = true

[osd]
# OSD data path
osd data = /var/lib/ceph/osd/ceph-$id

# Default pg and pgp count for new pools
osd pool default pg num = 1200
osd pool default pgp num = 1200

# OSD journal size, default 5120
osd journal size = 20000

# File system type used when formatting
osd mkfs type = xfs

# Extra options passed when formatting the file system
osd mkfs options xfs = -f

# Use an object map for XATTRs; needed on ext4, also usable on XFS or Btrfs, default false
filestore xattr use omap = true

# Minimum sync interval from journal to data disk (seconds), default 0.1
filestore min sync interval = 10

# Maximum sync interval from journal to data disk (seconds), default 5
filestore max sync interval = 15

# Maximum number of operations the data disk queue accepts, default 500
filestore queue max ops = 25000

# Maximum number of bytes the data disk queue accepts (bytes), default 100
filestore queue max bytes = 10485760

# Maximum number of operations the data disk can commit, default 500
filestore queue committing max ops = 5000

# Maximum number of bytes the data disk can commit (bytes), default 100
filestore queue committing max bytes = 10485760000

# Multiplier for the maximum number of files in a subdirectory before it splits into child directories, default 2
filestore split multiple = 8

# Minimum number of files in a subdirectory before it is merged into its parent, default 10
filestore merge threshold = 40

# Object file descriptor cache size, default 128
filestore fd cache size = 1024

# Number of concurrent file system operations, default 2
filestore op threads = 32

# Maximum bytes written to the journal at once (bytes), default 1048560
journal max write bytes = 1073714824

# Maximum entries written to the journal at once, default 100
journal max write entries = 10000

# Maximum operations queued in the journal at once, default 50
journal queue max ops = 50000

# Maximum bytes queued in the journal at once (bytes), default 33554432
journal queue max bytes = 10485760000

# Maximum size of a single OSD write (MB), default 90
osd max write size = 512

# Maximum client data allowed in memory (bytes), default 100
osd client message size cap = 2147483648

# Bytes read at a time during deep scrub (bytes), default 524288
osd deep scrub stride = 1310720

# Number of concurrent file system operations, default 2
osd op threads = 32

# Threads for disk-intensive OSD operations such as recovery and scrubbing, default 1
osd disk threads = 10

# OSD map cache size (MB), default 500
osd map cache size = 10240

# In-memory OSD map cache of the OSD process (MB), default 50
osd map cache bl size = 1280

# Mount options for Ceph OSD XFS volumes, default rw,noatime,inode64
osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"

# Recovery operation priority, range 1-63; higher values use more resources, default 10
osd recovery op priority = 20

# Number of recovery requests active at the same time, default 15
osd recovery max active = 15

# Maximum number of backfills allowed per OSD, default 10
osd max backfills = 10

# Enable strict queue cut-off
osd op queue cut off = high

osd_deep_scrub_large_omap_object_key_threshold = 800000
osd_deep_scrub_large_omap_object_value_sum_threshold = 10737418240

[mds]
# MDS cache size, set to 60 GB
mds cache memory limit = 62212254726

# Timeout, default 60 seconds
mds_revoke_cap_timeout = 360

mds log max segments = 51200
mds log max expiring = 51200

mds_beacon_grace = 300

# Hard limit on directory fragment size, default 100000
# https://docs.ceph.com/docs/master/cephfs/dirfrags/
mds_bal_fragment_size_max = 500000

## Official configuration reference https://ceph.readthedocs.io/en/latest/cephfs/mds-config-ref/

[client]

# RBD cache, default true
rbd cache = true

# RBD cache size (bytes), default 335544320 (320M)
rbd cache size = 268435456

# Maximum dirty bytes allowed when the cache is write-back (bytes); 0 means write-through, default 25165824
rbd cache max dirty = 134217728

# How long dirty data stays in the cache before being flushed to disk (seconds), default 1
rbd cache max dirty age = 5

client_try_dentry_invalidate = false

[mgr]
mgr modules = dashboard

# Huawei Cloud tuning guide https://support.huaweicloud.com/ ... object_05_0008.html
# https://poph163.com/2020/02/18/c ... %E8%B0%83%E4%BC%98/

full osd
"full osd" means an OSD has reached its capacity limit: https://docs.ceph.com/en/latest/ ... no-free-drive-space[5]

$ ceph osd dump | grep full_ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Cluster status:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_ERR
            2 backfillfull osd(s)
            1 full osd(s)
            2 nearfull osd(s)
            7 pool(s) full

Check OSD disk usage. Once any OSD exceeds 95% utilization, a full osd error is raised and the cluster can no longer be used normally:

$ ceph osd df
ID CLASS  WEIGHT REWEIGHT    SIZE     USE    DATA     OMAP    META   AVAIL  %USE  VAR PGS
 0 hdd   7.27689  1.00000 7.3 TiB 4.7 TiB 4.7 TiB  918 MiB 9.1 GiB 2.5 TiB 65.15 0.84  68
 1 hdd   7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  327 MiB  11 GiB 1.2 TiB 84.07 1.09  67
 2 hdd   7.27689  1.00000 7.3 TiB 4.3 TiB 4.3 TiB  924 MiB 8.4 GiB 2.9 TiB 59.70 0.77  67
 3 hdd   7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  807 MiB 9.8 GiB 2.1 TiB 70.57 0.91  66
 4 hdd   7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  770 MiB  13 GiB 583 GiB 92.18 1.19  66
 5 hdd   7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  623 MiB  10 GiB 1.8 TiB 75.87 0.98  66
 6 hdd   7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  602 MiB  11 GiB 1.6 TiB 78.67 1.02  64
 7 hdd   7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.1 GiB  10 GiB 1.9 TiB 73.35 0.95  65
 8 hdd   7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  498 MiB  11 GiB 1.4 TiB 81.29 1.05  68
 9 hdd   7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.8 GiB 2.1 TiB 70.59 0.91  65
10 hdd   7.27689  1.00000 7.3 TiB 6.3 TiB 6.3 TiB  297 MiB  12 GiB 985 GiB 86.78 1.12  61
11 hdd   7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  923 MiB 9.7 GiB 2.1 TiB 70.56 0.91  67
12 hdd   7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  203 MiB  11 GiB 1.4 TiB 81.39 1.05  65
13 hdd   7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  799 MiB  10 GiB 1.9 TiB 73.29 0.95  66
14 hdd   7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  873 MiB 9.4 GiB 2.3 TiB 67.77 0.88  71
15 hdd   0.29999  1.00000 7.3 TiB 6.9 TiB 6.9 TiB  191 MiB  13 GiB 387 GiB 94.81 1.23  39
16 hdd   7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  548 MiB  11 GiB 1.8 TiB 75.91 0.98  69
17 hdd   7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  806 MiB  13 GiB 581 GiB 92.20 1.20  66
18 hdd   7.27689  1.00000 7.3 TiB 4.5 TiB 4.5 TiB  1.4 GiB 8.5 GiB 2.7 TiB 62.43 0.81  66
19 hdd   7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.4 GiB  10 GiB 1.9 TiB 73.28 0.95  65
20 hdd   7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  705 MiB  11 GiB 1.8 TiB 75.91 0.98  64
21 hdd   7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  911 MiB  11 GiB 1.2 TiB 84.11 1.09  62
22 hdd   7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  301 MiB  11 GiB 1.2 TiB 84.03 1.09  66
23 hdd   7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  401 MiB 9.8 GiB 1.7 TiB 75.96 0.98  67
24 hdd   7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.3 GiB 9.6 GiB 2.1 TiB 70.58 0.91  63
25 hdd   7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.7 GiB 2.1 TiB 70.56 0.91  65
26 hdd   7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  730 MiB  10 GiB 1.9 TiB 73.32 0.95  68
27 hdd   7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  818 MiB  12 GiB 1.2 TiB 84.08 1.09  62
28 hdd   7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  587 MiB 9.3 GiB 2.3 TiB 67.84 0.88  68
29 hdd   7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  215 MiB  11 GiB 1.2 TiB 84.09 1.09  66
30 hdd   7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  690 MiB  12 GiB 1.2 TiB 84.15 1.09  64
31 hdd   7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1020 MiB  10 GiB 1.8 TiB 75.94 0.98  64
32 hdd   7.27689  1.00000 7.3 TiB 6.5 TiB 6.5 TiB  616 MiB  12 GiB 786 GiB 89.45 1.16  66
33 hdd   7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  622 MiB 8.9 GiB 2.3 TiB 67.84 0.88  66
34 hdd   7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  102 MiB  11 GiB 1.6 TiB 78.56 1.02  65
35 hdd   7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  723 MiB  11 GiB 1.4 TiB 81.31 1.05  63
                    TOTAL 262 TiB 202 TiB 202 TiB   25 GiB 381 GiB  60 TiB 77.15

This can be resolved by adjusting the CRUSH weight manually:

$ ceph osd crush reweight osd.4 0.3
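
To decide which OSDs to reweight, it can help to list only those above a chosen utilization. The sketch below assumes the `ceph osd df` column layout shown earlier, in which %USE is the third field from the end; the function name `osds_over` and the 90 threshold are illustrative choices, not Ceph defaults.

```shell
# List OSDs whose %USE is at or above the given threshold,
# reading `ceph osd df` output from stdin.
# Only rows whose first field is a numeric OSD ID are considered,
# so the header and TOTAL lines are skipped.
osds_over() {
    awk -v limit="$1" '$1 ~ /^[0-9]+$/ && ($(NF-2) + 0) >= (limit + 0) { print "osd." $1, $(NF-2) }'
}

# Example (not run here):
# ceph osd df | osds_over 90
```

Counting fields from the end keeps the filter independent of how many whitespace-separated tokens the size columns produce.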

PG balancing
The default PG placement is not always well balanced. https://cloud.tencent.com/developer/article/1664655[6]

The following prints each OSD's name, PG count, and utilization (%):

$ ceph osd df tree | awk '/osd\./{print $NF" "$(NF-1)" "$(NF-3) }'
osd.0 89 71.20
osd.1 38 94.80
osd.2 92 68.44
osd.3 92 72.36
osd.4 28 76.86
osd.5 64 81.37
osd.6 62 87.90
osd.7 89 78.78
osd.8 52 86.18
osd.9 89 75.44
osd.10 37 96.33
osd.11 102 75.26
osd.12 33 91.41
osd.13 34 95.98
osd.14 59 84.97
osd.15 20 70.92
osd.16 113 89.46
osd.17 30 77.12
osd.18 124 77.11
osd.19 44 95.23
osd.20 65 84.63
osd.21 98 96.71
osd.22 34 95.93
osd.23 62 84.56
osd.24 110 76.63
osd.25 64 82.32
osd.26 59 88.26
osd.27 38 95.83
osd.28 105 79.19
osd.29 36 94.94
osd.30 94 90.79
osd.31 91 81.74
osd.32 12 42.44
osd.33 94 81.32
osd.34 46 86.51
osd.35 37 92.68

reweight-by-pg adjusts OSD weights according to the placement group distribution:

$ ceph osd reweight-by-pg
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500
reweight-by-utilization adjusts OSD weights according to utilization:

$ ceph osd reweight-by-utilization
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500

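Both reweight commands have a dry-run counterpart (`ceph osd test-reweight-by-pg` and `ceph osd test-reweight-by-utilization`) that prints the same kind of report without changing any weight. The following is a minimal sketch of a preview-then-apply wrapper. The function name `reweight_preview_then_apply` is made up for illustration, and its three arguments are assumed to correspond to the oload, max_change and max_change_osds values printed in the output above.

```shell
# Hypothetical helper: preview a utilization-based reweight, then apply it
# only after an explicit confirmation.
# Arguments (all optional): oload, max_change, max_osds.
reweight_preview_then_apply() {
    oload="${1:-120}"
    max_change="${2:-0.05}"
    max_osds="${3:-4}"

    # Dry run: prints the report but leaves every weight untouched.
    ceph osd test-reweight-by-utilization "$oload" "$max_change" "$max_osds" || return 1

    printf 'Apply these changes? [y/N] '
    read -r answer || answer=""
    case "$answer" in
        y|Y) ceph osd reweight-by-utilization "$oload" "$max_change" "$max_osds" ;;
        *)   echo "aborted, no weights changed" ;;
    esac
}
```

Calling `reweight_preview_then_apply 120 0.05 4` shows the planned weight changes first, so an unexpected result can be rejected before any data starts moving.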

Adjust the write weight (reweight) of a single OSD:

$ ceph osd reweight osd.35 0.001

View the current OSD information:

$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP    META    AVAIL   %USE  VAR  PGS
 0   hdd 7.27689  1.00000 7.3 TiB 5.2 TiB 5.2 TiB 1.0 GiB 9.4 GiB 2.0 TiB 71.96 0.86  39
 1   hdd 0.00999  0.90002 7.3 TiB 6.9 TiB 6.9 TiB 604 MiB  12 GiB 382 GiB 94.88 1.13  37
 2   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB 1.2 GiB 8.8 GiB 2.2 TiB 69.55 0.83  34
 3   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB 812 MiB 9.9 GiB 2.0 TiB 73.15 0.87  34
 4   hdd 0.29999  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 185 MiB  12 GiB 1.7 TiB 77.01 0.92  26
 5   hdd 3.00000  1.00000 7.3 TiB 6.0 TiB 5.9 TiB 443 MiB  11 GiB 1.3 TiB 81.90 0.98  36
 6   hdd 3.00000  1.00000 7.3 TiB 6.5 TiB 6.5 TiB 499 MiB  11 GiB 809 GiB 89.14 1.06  38
 7   hdd 7.27689  1.00000 7.3 TiB 5.8 TiB 5.8 TiB 1.2 GiB  11 GiB 1.4 TiB 80.10 0.96  43
 8   hdd 3.00000  1.00000 7.3 TiB 6.3 TiB 6.3 TiB 502 MiB  11 GiB 992 GiB 86.69 1.03  36
 9   hdd 7.27689  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.5 GiB 9.8 GiB 1.7 TiB 76.57 0.91  42
10   hdd 0.00999  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 295 MiB  12 GiB 267 GiB 96.41 1.15  37
11   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1.2 GiB 9.8 GiB 1.7 TiB 76.13 0.91  37
12   hdd 0.00999  1.00000 7.3 TiB 6.7 TiB 6.6 TiB  95 MiB  12 GiB 635 GiB 91.48 1.09  32
13   hdd 0.00999  1.00000 7.3 TiB 7.0 TiB 7.0 TiB 584 MiB  12 GiB 315 GiB 95.78 1.14  34
14   hdd 3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 974 MiB  11 GiB 1.0 TiB 85.86 1.02  40
15   hdd 0.00999  1.00000 7.3 TiB 5.1 TiB 5.1 TiB 116 KiB  10 GiB 2.2 TiB 70.43 0.84  20
16   hdd 7.27689  1.00000 7.3 TiB 6.6 TiB 6.6 TiB 1.2 GiB  11 GiB 697 GiB 90.64 1.08  43
17   hdd 0.29999  1.00000 7.3 TiB 5.6 TiB 5.6 TiB  40 KiB  12 GiB 1.7 TiB 76.75 0.92  26
18   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB 1.9 GiB 9.3 GiB 1.6 TiB 78.01 0.93  53
19   hdd 0.00999  0.00099 7.3 TiB 6.9 TiB 6.9 TiB 1.5 GiB  13 GiB 371 GiB 95.02 1.13  40
20   hdd 3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 744 MiB  12 GiB 1.0 TiB 85.86 1.02  37
21   hdd 7.27689  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 913 MiB  12 GiB 239 GiB 96.79 1.15  40
22   hdd 0.00999  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 283 MiB  12 GiB 298 GiB 96.00 1.14  34
23   hdd 3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 515 MiB  11 GiB 1.1 TiB 85.30 1.02  35
24   hdd 7.27689  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.4 GiB 9.8 GiB 1.6 TiB 77.63 0.93  42
25   hdd 3.00000  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.66 0.99  40
26   hdd 2.00000  1.00000 7.3 TiB 6.5 TiB 6.5 TiB 737 MiB  11 GiB 823 GiB 88.95 1.06  36
27   hdd 0.00999  0.00099 7.3 TiB 7.0 TiB 6.9 TiB 822 MiB  12 GiB 327 GiB 95.61 1.14  37
28   hdd 7.27689  1.00000 7.3 TiB 5.8 TiB 5.8 TiB 859 MiB  10 GiB 1.4 TiB 80.23 0.96  40
29   hdd 0.00999  0.00099 7.3 TiB 6.9 TiB 6.9 TiB 215 MiB  12 GiB 371 GiB 95.02 1.13  36
30   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB 1.0 GiB  12 GiB 607 GiB 91.85 1.10  47
31   hdd 7.27689  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.81 0.99  41
32   hdd 0.29999  1.00000 7.3 TiB 3.0 TiB 3.0 TiB  32 KiB 7.1 GiB 4.3 TiB 41.47 0.49  10
33   hdd 7.27689  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 827 MiB 9.7 GiB 1.3 TiB 82.06 0.98  41
34   hdd 2.00000  1.00000 7.3 TiB 6.3 TiB 6.3 TiB 308 MiB  11 GiB 976 GiB 86.90 1.04  33
35   hdd 0.00999  0.00099 7.3 TiB 6.7 TiB 6.7 TiB 613 MiB  12 GiB 540 GiB 92.75 1.11  36
                    TOTAL 262 TiB 220 TiB 219 TiB  27 GiB 391 GiB  42 TiB 83.87
MIN/MAX VAR: 0.49/1.15  STDDEV: 10.62
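
With this many OSDs it can help to filter the `ceph osd df` listing down to the disks that are close to full. The sketch below is one way to do that with awk. The function name `osds_over` is hypothetical, and it assumes the plain-text column layout shown above, where %USE is the third field from the end (followed by VAR and PGS); other Ceph releases may print additional columns.

```shell
# Hypothetical helper: read plain `ceph osd df` output on stdin and print
# every OSD whose %USE is at or above the given threshold.
# Assumes %USE is the third field from the end of each OSD row.
osds_over() {
    awk -v limit="$1" '
        # Only rows that start with a numeric OSD id; this skips the
        # header, the TOTAL row and the MIN/MAX summary line.
        $1 ~ /^[0-9]+$/ && ($(NF-2) + 0) >= (limit + 0) {
            printf "osd.%s %s\n", $1, $(NF-2)
        }'
}
```

For the table above, `ceph osd df | osds_over 96` would list osd.10, osd.21 and osd.22.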

Deleting CephFS

Stop all MDS services. This has to be done manually after logging in to each server:

$ systemctl stop ceph-mds@${HOSTNAME}

Delete the target file system:

$ ceph fs ls
$ ceph fs rm data --yes-i-really-mean-it

Using SSDs

View the current OSD device classes (related documentation: https://blog.csdn.net/kozazyh/article/details/79904219[7]):

$ ceph osd crush class ls
[
    "ssd"
]

If an OSD carries the wrong device class, change it manually. The following command removes the device class from osd.0 through osd.2:

$ for i in 0 1 2; do ceph osd crush rm-device-class osd.$i; done

Set the device class of osd.0 through osd.2 to ssd:

$ for i in 0 1 2; do ceph osd crush set-device-class ssd osd.$i; done
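
The two loops above can be folded into one reusable function. This is only a sketch: the name `set_device_class` is invented here, and it relies on the same two `ceph osd crush` subcommands already used in the loops.

```shell
# Hypothetical helper: move a list of OSDs to a new device class.
# Usage: set_device_class <class> <osd-id>...
set_device_class() {
    class="$1"
    shift
    for id in "$@"; do
        # An existing class has to be removed before a new one can be set.
        ceph osd crush rm-device-class "osd.$id"
        ceph osd crush set-device-class "$class" "osd.$id" || return 1
    done
}
```

`set_device_class ssd 0 1 2` then performs the same steps as the two loops.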

Create a CRUSH rule:

$ ceph osd crush rule create-replicated rule-ssd default host ssd
$ ceph osd crush rule ls

Then pass the rule name when creating the pools (ceph fs new takes the metadata pool first, then the data pool):

$ ceph osd pool create fs_data 96 rule-ssd
$ ceph osd pool create fs_metadata 16 rule-ssd
$ ceph fs new fs fs_metadata fs_data

Viewing the CRUSH map

Run the following commands:

$ ceph osd getcrushmap -o crushmap
$ crushtool -d crushmap -o crushmap
$ cat crushmap
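
The commands above decompile the map over the same file it was exported to. If the binary map should be kept as well, a variant with two file names could look like the sketch below. The function name `dump_crushmap` and the default file names `crushmap.bin` and `crushmap.txt` are assumptions made for this example.

```shell
# Hypothetical helper: export the CRUSH map and decompile it to text,
# keeping the binary and the text version in separate files.
# Usage: dump_crushmap [binary-file] [text-file]
dump_crushmap() {
    bin="${1:-crushmap.bin}"
    txt="${2:-crushmap.txt}"
    ceph osd getcrushmap -o "$bin" || return 1
    crushtool -d "$bin" -o "$txt" || return 1
    # Print the name of the text file so the caller can open it.
    echo "$txt"
}
```

A typical call would be `cat "$(dump_crushmap)"`.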

3 monitors have not enabled msgr2

Fix:

$ ceph mon enable-msgr2

2 daemons have recently crashed

Fix (see https://blog.csdn.net/QTM_Gitee/article/details/106004435[8]):

$ ceph crash ls
$ ceph crash archive-all
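
`ceph crash archive-all` clears the warning without looking at what crashed. As a more cautious alternative, a single report can be inspected with `ceph crash info` and then archived on its own with `ceph crash archive`. The wrapper below is a sketch only; the function name `review_crash` is hypothetical, and the crash ID is whatever `ceph crash ls` printed.

```shell
# Hypothetical helper: show the details of one crash report, then archive it.
# Usage: review_crash <crash-id>   (IDs come from `ceph crash ls`)
review_crash() {
    id="${1:-}"
    [ -n "$id" ] || { echo "usage: review_crash <crash-id>" >&2; return 2; }
    # Print the stored metadata and backtrace for this crash.
    ceph crash info "$id" || return 1
    # Mark only this report as seen so it no longer raises the warning.
    ceph crash archive "$id"
}
```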
|
|