RBD image cannot be deleted
The rbd image cannot be removed, and the following error is reported:

$ rbd rm nextcloud/mysql
2020-05-13 16:27:46.155 7f024bfff700 -1 librbd::image::RemoveRequest: 0x557a7af027a0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

$ rbd info nextcloud/mysql
rbd image 'mysql':
        size 40 GiB in 10240 objects
        order 22 (4 MiB objects)
        id: 17e006b8b4567
        block_name_prefix: rbd_data.17e006b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Tue Oct 15 10:47:34 2019

Check the current status of the image:

$ rbd status nextcloud/mysql
Watchers:
        watcher=10.100.21.95:0/115493307 client.67866 cookie=7

A node still has the image mapped. Log in to that machine and check:

$ rbd showmapped
id pool      image snap device
...
3  nextcloud mysql -    /dev/rbd3

Unmap the image:

$ rbd unmap nextcloud/mysql

Then run the delete again:

$ rbd rm nextcloud/mysql
Removing image: 100% complete...done.

Brute-force alternative: blacklist the watcher directly, so that the node holding the mapping is ignored:

$ ceph osd blacklist add 10.100.21.95:0/115493307
$ rbd rm nextcloud/mysql
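
The watcher check above can also be scripted. The following is a minimal Python sketch, assuming the plain-text `rbd status` format shown in this section; `parse_watchers` is a hypothetical helper name and not part of any Ceph tool.

```python
import re

# Matches lines such as:
#   watcher=10.100.21.95:0/115493307 client.67866 cookie=7
WATCHER_RE = re.compile(r"watcher=(\S+)\s+(client\.\d+)\s+cookie=(\S+)")


def parse_watchers(status_output):
    """Return a list of (address, client, cookie) tuples found in `rbd status` text."""
    return [m.groups() for m in WATCHER_RE.finditer(status_output)]


# Sample text copied from the transcript above.
sample = """Watchers:
        watcher=10.100.21.95:0/115493307 client.67866 cookie=7
"""

for addr, client, cookie in parse_watchers(sample):
    # The address is the value passed to `ceph osd blacklist add` in this section.
    print(addr, client, cookie)
```

An empty result means no watcher lines were found in the text, in which case the image should be removable without unmapping anything.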

OSD latency
Check whether any OSD shows latency:

$ ceph osd perf
osd commit_latency(ms) apply_latency(ms)
  2                  0                 0
  1                  0                 0
  0                  0                 0
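
On a cluster with many OSDs it can help to filter this output. A minimal Python sketch, assuming the three-column text format shown above; the function name `slow_osds`, the 50 ms threshold, and the second sample (with an invented slow OSD) are illustrative assumptions.

```python
def slow_osds(perf_output, threshold_ms=50):
    """Return {osd_id: (commit_ms, apply_ms)} for OSDs at or above threshold_ms."""
    slow = {}
    for line in perf_output.splitlines():
        fields = line.split()
        # Data rows have exactly three integer columns; the header row does not.
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        osd_id, commit_ms, apply_ms = map(int, fields)
        if max(commit_ms, apply_ms) >= threshold_ms:
            slow[osd_id] = (commit_ms, apply_ms)
    return slow


# Output copied from the transcript above: no OSD is slow.
healthy = """osd commit_latency(ms) apply_latency(ms)
  2                  0                 0
  1                  0                 0
  0                  0                 0
"""

# Invented sample in which osd.1 has a high commit latency.
degraded = """osd commit_latency(ms) apply_latency(ms)
  2                  0                 0
  1                120                 3
  0                  0                 0
"""

print(slow_osds(healthy))
print(slow_osds(degraded))
```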

Defragmentation
Check fragmentation:

$ xfs_db -c frag -r /dev/mapper/VolGroup-lv_data1

Defragment:

Power-on hours
Check how long a disk has been powered on:

$ smartctl -A /dev/mapper/VolGroup-lv_data1

Changing the replica count
Change the number of replicas of a pool:

$ ceph osd pool set fs_data2 min_size 1
$ ceph osd pool set fs_data2 size 2

Adding / removing a pool
Add a data pool to the file system, or remove it again:

$ ceph fs add_data_pool fs fs_data2
$ ceph fs rm_data_pool fs fs_data2

Balancing data across OSDs
Balance the data distribution across OSDs:

$ ceph balancer status
$ ceph balancer on
$ ceph balancer mode crush-compat

MDS cannot be queried
The MDS status cannot be queried:

$ ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/status/module.py", line 311, in handle_command
    return self.handle_fs_status(cmd)
  File "/usr/lib64/ceph/mgr/status/module.py", line 177, in handle_fs_status
    mds_versions[metadata.get('ceph_version', "unknown")].append(info['name'])
AttributeError: 'NoneType' object has no attribute 'get'

$ ceph mds metadata
[
    {
        "name": "BJ-YZ-CEPH-94-54"
    },
    {
        "name": "BJ-YZ-CEPH-94-53",
        "addr": "10.100.94.53:6825/4233274463",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-53",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    },
    {
        "name": "BJ-YZ-CEPH-94-52",
        "addr": "10.100.94.52:6800/3956121270",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-52",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    }
]

In this listing the entry for BJ-YZ-CEPH-94-54 carries only a name and no other fields, which matches the missing metadata in the traceback. Restarting the MDS resolves it.
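
The traceback suggests that the metadata looked up for one MDS was `None`, so calling `.get` on it failed. The following Python sketch reproduces that situation and shows a defensive variant. It is an illustration only: `group_by_version` and its arguments are hypothetical names, and this is not the code of the mgr status module.

```python
from collections import defaultdict

VERSION = "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)"


def group_by_version(mds_names, metadata_by_name):
    """Group MDS names by ceph_version, tolerating daemons that have no metadata."""
    versions = defaultdict(list)
    for name in mds_names:
        metadata = metadata_by_name.get(name)  # None when no metadata is registered
        if metadata is None:
            # Calling metadata.get(...) here would raise the AttributeError shown above.
            versions["unknown"].append(name)
            continue
        versions[metadata.get("ceph_version", "unknown")].append(name)
    return dict(versions)


names = ["BJ-YZ-CEPH-94-54", "BJ-YZ-CEPH-94-53", "BJ-YZ-CEPH-94-52"]
metadata = {
    "BJ-YZ-CEPH-94-53": {"ceph_version": VERSION},
    "BJ-YZ-CEPH-94-52": {"ceph_version": VERSION},
    # BJ-YZ-CEPH-94-54 is intentionally absent, mirroring the name-only entry.
}

print(group_by_version(names, metadata))
```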

CephFS reports a healthy status but cannot be written to
When CephFS looks healthy but is unusable, the cause is usually a misbehaving client. First list the sessions held by the MDS, then try evicting the offending session:

$ ceph tell mds.BJ-YZ-CEPH-94-52 session ls
$ ceph tell mds.BJ-YZ-CEPH-94-52 session evict id=834283

Session IDs are specific to each MDS, so a session cannot be evicted through a different MDS node.

Adding an MDS to the file system
Increase the number of active MDS daemons of the file system:

$ ceph fs set fs max_mds 2

mon time zone problem
Because of a time zone problem on some of the mons, the following warning is reported:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_WARN
            9 slow ops, oldest one blocked for 211643 sec, daemons [mon.BJ-YZ-CEPH-94-53,mon.BJ-YZ-CEPH-94-54] have slow ops.

  services:
    mon: 3 daemons, quorum BJ-YZ-CEPH-94-52,BJ-YZ-CEPH-94-53,BJ-YZ-CEPH-94-54
    mgr: BJ-YZ-CEPH-94-52(active), standbys: BJ-YZ-CEPH-94-54, BJ-YZ-CEPH-94-53
    mds: fs-2/2/2 up {0=BJ-YZ-CEPH-94-52=up:active,1=BJ-YZ-CEPH-94-53=up:active}, 1 up:standby-replay
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   7 pools, 1152 pgs
    objects: 37.66 M objects, 67 TiB
    usage:   136 TiB used, 126 TiB / 262 TiB avail
    pgs:     1148 active+clean
             4    active+clean+scrubbing+deep

  io:
    client:   13 KiB/s rd, 27 MiB/s wr, 2 op/s rd, 19 op/s wr

Make sure the NTP service is running:

$ systemctl status ntpd
$ systemctl start ntpd

Then restart ceph-mon.target on the affected mons:

$ systemctl status ceph-mon.target
$ systemctl restart ceph-mon.target

1 MDSs report slow requests
The warning is as follows:

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            Degraded data redundancy: 122006/1192166 objects degraded (10.234%), 102 pgs degraded, 116 pgs undersized
            101 slow ops, oldest one blocked for 81045 sec, daemons [osd.1,osd.2] have slow ops.

Restarting the mons resolves it:

$ systemctl restart ceph-mon.target

If that does not help, restart the MDS:

$ systemctl restart ceph-mds@${HOSTNAME}

Reduced data availability: 38 pgs inactive
The warning is as follows (see https://zhuanlan.zhihu.com/p/74323736 [1]):

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            145 slow ops, oldest one blocked for 184238 sec, daemons [osd.1,osd.2] have slow ops.

  services:
    mon: 3 daemons, quorum master001,master002,master003
    mgr: master001(active), standbys: master002, master003
    mds: kubernetes-2/2/2 up {0=master001=up:active,1=master002=up:active}, 1 up:standby
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   9 pools, 244 pgs
    objects: 535.1 k objects, 177 GiB
    usage:   470 GiB used, 4.1 TiB / 4.6 TiB avail
    pgs:     15.574% pgs unknown
             206 active+clean
             38  unknown

  io:
    client:   35 KiB/s wr, 0 op/s rd, 2 op/s wr

This happens when PGs have lost their data and cannot recover automatically. The fix is to discard the PG data so that the PGs can be recreated, but this may cause data loss (with a pool size of 1, data loss is certain).

First, look for the abnormal PGs:

Then run query on one of them to inspect it:

$ ceph pg 1.6e query
Error ENOENT: i don't have pgid 1.6e

The query cannot find the PG. List the abnormal PGs with the following command instead:

$ ceph pg dump_stuck unclean
ok
PG_STAT STATE   UP UP_PRIMARY ACTING ACTING_PRIMARY
1.74    unknown []         -1     []             -1
1.70    unknown []         -1     []             -1
1.6a    unknown []         -1     []             -1
1.2d    unknown []         -1     []             -1
1.20    unknown []         -1     []             -1
1.1e    unknown []         -1     []             -1
1.1c    unknown []         -1     []             -1
1.17    unknown []         -1     []             -1
1.9     unknown []         -1     []             -1
1.29    unknown []         -1     []             -1
1.56    unknown []         -1     []             -1
1.72    unknown []         -1     []             -1
1.45    unknown []         -1     []             -1
1.4e    unknown []         -1     []             -1
1.46    unknown []         -1     []             -1
1.22    unknown []         -1     []             -1
1.53    unknown []         -1     []             -1
1.59    unknown []         -1     []             -1
1.24    unknown []         -1     []             -1
1.55    unknown []         -1     []             -1
1.3f    unknown []         -1     []             -1
1.38    unknown []         -1     []             -1
1.a     unknown []         -1     []             -1
1.7     unknown []         -1     []             -1
1.34    unknown []         -1     []             -1
1.64    unknown []         -1     []             -1
1.6     unknown []         -1     []             -1
1.32    unknown []         -1     []             -1
1.4     unknown []         -1     []             -1
1.2e    unknown []         -1     []             -1
1.31    unknown []         -1     []             -1
1.5e    unknown []         -1     []             -1
1.0     unknown []         -1     []             -1
1.42    unknown []         -1     []             -1
1.15    unknown []         -1     []             -1
1.6e    unknown []         -1     []             -1
1.41    unknown []         -1     []             -1
1.10    unknown []         -1     []             -1

Run the following command to forcibly recreate a PG, discarding its data (see https://docs.ceph.com/docs/mimic ... troubleshooting-pg/ [2]):

$ ceph osd force-create-pg 1.74 --yes-i-really-mean-it

# Batch execution; the awk pattern keeps only PG ids, so the PG_STAT header row is not passed on
# ceph pg dump_stuck unclean | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}' | xargs -I{} ceph osd force-create-pg {} --yes-i-really-mean-it

Once this completes, the cluster recovers.
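
Because force-create-pg discards data, it may be worth reviewing the list of PG ids before running the batch command. A minimal Python sketch, assuming the `ceph pg dump_stuck unclean` text format shown above; `stuck_pgids` is a hypothetical helper, and the script only prints the commands rather than executing them.

```python
import re

# A PG id is "<pool>.<hex>", for example 1.74 or 1.a
PGID_RE = re.compile(r"^\d+\.[0-9a-f]+$")


def stuck_pgids(dump_output, state="unknown"):
    """Return PG ids whose STATE column equals `state`, skipping 'ok' and the header."""
    pgids = []
    for line in dump_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and PGID_RE.match(fields[0]) and fields[1] == state:
            pgids.append(fields[0])
    return pgids


# A short excerpt of the output above.
sample = """ok
PG_STAT STATE   UP UP_PRIMARY ACTING ACTING_PRIMARY
1.74    unknown []         -1     []             -1
1.a     unknown []         -1     []             -1
"""

for pgid in stuck_pgids(sample):
    # Print for review only; nothing is executed here.
    print(f"ceph osd force-create-pg {pgid} --yes-i-really-mean-it")
```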

1 clients failing to respond to capability release
The warning is as follows:

$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsmaster001(mds.0): Client master003.k8s.shileizcc-ops.com: failing to respond to capability release client_id: 284951

Evicting this client ID resolves it (see https://blog.csdn.net/zuoyang1990/article/details/98530070 [3]):

$ ceph daemon mds.master003 session ls|grep 284951
$ ceph tell mds.master003 session evict id=284951

If the following error is reported:

$ ceph tell mds.master003 session evict id=284951
2020-08-13 10:45:03.869 7f271b7fe700 0 client.306366 ms_handle_reset on 10.100.21.95:6800/1646216103
2020-08-13 10:45:03.881 7f2730ff9700 0 client.316415 ms_handle_reset on 10.100.21.95:6800/1646216103
Error EAGAIN: MDS is replaying log

then the command has to be run against the node holding mds.0; otherwise the client cannot be found.
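
The MDS rank and the client ID needed for the eviction are both present in the health detail line. A minimal Python sketch for extracting them, assuming the line format shown above; `late_release_clients` is a hypothetical helper name.

```python
import re

# Matches e.g.
#   mdsmaster001(mds.0): Client <host>: failing to respond to capability release client_id: 284951
LATE_RE = re.compile(
    r"\(mds\.(\d+)\): Client (\S+?):? failing to respond to capability release client_id: (\d+)"
)


def late_release_clients(health_detail):
    """Return a list of (mds_rank, client_host, client_id) tuples."""
    return [
        (int(rank), host, int(client_id))
        for rank, host, client_id in LATE_RE.findall(health_detail)
    ]


# Text copied from the `ceph health detail` output above.
sample = """HEALTH_WARN 1 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsmaster001(mds.0): Client master003.k8s.shileizcc-ops.com: failing to respond to capability release client_id: 284951
"""

for rank, host, client_id in late_release_clients(sample):
    print(f"rank mds.{rank} holds the session of {host}, evict id={client_id}")
```

The rank in the first field indicates which MDS holds the session, which is the daemon the evict command has to be addressed to.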

Kernel tuning
Kernel tuning (see https://blog.csdn.net/fuzhongfaya/article/details/80932766 [4]):

$ echo "8192" > /sys/block/sda/queue/read_ahead_kb
$ echo "vm.swappiness = 0" | tee -a /etc/sysctl.conf
$ sysctl -p
$ echo "deadline" > /sys/block/sd[x]/queue/scheduler

# ssd
# echo "noop" > /sys/block/sd[x]/queue/scheduler

It is best to turn swap off entirely; setting the memory parameters alone does not always take effect.

Configuration file

Configuration file for a node with 40 cores and 128 GB of memory:

[global]
fsid = 2f77b028-ed2a-4010-9b79-90fd3052afc6
mon_initial_members = BJ-YZ-CEPH-94-52, BJ-YZ-CEPH-94-53, BJ-YZ-CEPH-94-54
mon_host = 10.100.94.52,10.100.94.53,10.100.94.54
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.100.94.0/24
cluster network = 10.100.94.0/24

[mon.a]
host = BJ-YZ-CEPH-94-52
mon addr = 10.100.94.52:6789

[mon.b]
host = BJ-YZ-CEPH-94-53
mon addr = 10.100.94.53:6789

[mon.c]
host = BJ-YZ-CEPH-94-54
mon addr = 10.100.94.54:6789

[mon]
mon data = /var/lib/ceph/mon/ceph-$id

# Clock drift allowed between monitors, default 0.05
mon clock drift allowed = 1

# Minimum number of OSDs that must report an OSD as down to the monitor, default 1
mon osd min down reporters = 1

# Seconds Ceph waits before marking a down OSD as out, default 300
mon osd down out interval = 600

mon_allow_pool_delete = true

[osd]
# OSD data path
osd data = /var/lib/ceph/osd/ceph-$id

# Default pg and pgp counts for a pool
osd pool default pg num = 1200
osd pool default pgp num = 1200

# Size of the OSD journal, default 5120
osd journal size = 20000

# File system type used when formatting
osd mkfs type = xfs

# Extra options passed when formatting the file system
osd mkfs options xfs = -f

# Use an object map for XATTRs; used with EXT4, also usable with XFS or btrfs, default false
filestore xattr use omap = true

# Minimum interval (seconds) between syncs from the journal to the data disk, default 0.1
filestore min sync interval = 10

# Maximum interval (seconds) between syncs from the journal to the data disk, default 5
filestore max sync interval = 15

# Maximum number of operations the data disk accepts, default 500
filestore queue max ops = 25000

# Maximum number of bytes the data disk can commit (bytes), default 100
filestore queue max bytes = 10485760

# Number of operations the data disk can commit, default 500
filestore queue committing max ops = 5000

# Maximum number of bytes the data disk can commit (bytes), default 100
filestore queue committing max bytes = 10485760000

# Maximum number of files in a subdirectory before it is split into child directories, default 2
filestore split multiple = 8

# Minimum number of files in a subdirectory before it is merged into its parent, default 10
filestore merge threshold = 40

# Size of the object file-descriptor cache, default 128
filestore fd cache size = 1024

# Number of concurrent file system operations, default 2
filestore op threads = 32

# Maximum number of bytes written to the journal at once (bytes), default 1048560
journal max write bytes = 1073714824

# Maximum number of entries written to the journal at once, default 100
journal max write entries = 10000

# Maximum number of operations queued in the journal at once, default 50
journal queue max ops = 50000

# Maximum number of bytes queued in the journal at once (bytes), default 33554432
journal queue max bytes = 10485760000

# Maximum size of a single OSD write (MB), default 90
osd max write size = 512

# Maximum amount of client data allowed in memory (bytes), default 100
osd client message size cap = 2147483648

# Number of bytes read at a time during a deep scrub (bytes), default 524288
osd deep scrub stride = 1310720

# Number of concurrent file system operations, default 2
osd op threads = 32

# Threads for disk-intensive OSD work such as recovery and scrubbing, default 1
osd disk threads = 10

# Size of the retained OSD map cache (MB), default 500
osd map cache size = 10240

# In-memory OSD map cache of the OSD process (MB), default 50
osd map cache bl size = 1280

# Mount options for Ceph OSD xfs, default rw,noatime,inode64
osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"

# Priority of recovery operations, range 1-63; higher values use more resources, default 10
osd recovery op priority = 20

# Number of recovery requests active at the same time, default 15
osd recovery max active = 15

# Maximum number of backfills allowed per OSD, default 10
osd max backfills = 10

# Enable the strict queue cut-off
osd op queue cut off = high

osd_deep_scrub_large_omap_object_key_threshold = 800000
osd_deep_scrub_large_omap_object_value_sum_threshold = 10737418240

[mds]
# MDS cache size, set to roughly 60 GB
mds cache memory limit = 62212254726

# Timeout, default 60 seconds
mds_revoke_cap_timeout = 360

mds log max segments = 51200
mds log max expiring = 51200
mds_beacon_grace = 300

# Hard limit on the size of a directory fragment, default 100000
# https://docs.ceph.com/docs/master/cephfs/dirfrags/
mds_bal_fragment_size_max = 500000

## Official configuration reference https://ceph.readthedocs.io/en/latest/cephfs/mds-config-ref/

[client]

# RBD cache, default true
rbd cache = true

# RBD cache size (bytes), default 335544320 (320M)
rbd cache size = 268435456

# Maximum number of dirty bytes allowed when the cache is write-back (bytes); 0 means write-through, default 25165824
rbd cache max dirty = 134217728

# Time (seconds) dirty data stays in the cache before being flushed to disk, default 1
rbd cache max dirty age = 5

client_try_dentry_invalidate = false

[mgr]
mgr modules = dashboard

# Huawei Cloud tuning guide https://support.huaweicloud.com/ ... object_05_0008.html
# https://poph163.com/2020/02/18/c ... %E8%B0%83%E4%BC%98/
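
Many values in this file are raw byte counts, which are hard to read at a glance. The following Python sketch converts a few of them to binary units as a sanity check. `human_bytes` is a hypothetical helper, and the dictionary simply repeats values from the configuration above.

```python
def human_bytes(n):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value drops below 1024, or at the last unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# Values copied from the configuration file above.
settings = {
    "filestore queue max bytes": 10485760,
    "journal max write bytes": 1073714824,
    "osd client message size cap": 2147483648,
    "mds cache memory limit": 62212254726,
    "rbd cache size": 268435456,
    "rbd cache max dirty": 134217728,
}

for key, value in settings.items():
    print(f"{key:30s} {value:>12d}  {human_bytes(value)}")
```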

full osd
full osd means that an OSD has reached its full threshold (see https://docs.ceph.com/en/latest/ ... no-free-drive-space [5]):

$ ceph osd dump | grep full_ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Cluster status:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_ERR
            2 backfillfull osd(s)
            1 full osd(s)
            2 nearfull osd(s)
            7 pool(s) full

Check the disk usage of each OSD. Once any OSD exceeds 95% utilization it is reported as a full osd, and the cluster can no longer be used normally:

$ ceph osd df
ID CLASS  WEIGHT REWEIGHT    SIZE     USE    DATA     OMAP    META   AVAIL  %USE  VAR PGS
 0   hdd 7.27689  1.00000 7.3 TiB 4.7 TiB 4.7 TiB  918 MiB 9.1 GiB 2.5 TiB 65.15 0.84  68
 1   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  327 MiB  11 GiB 1.2 TiB 84.07 1.09  67
 2   hdd 7.27689  1.00000 7.3 TiB 4.3 TiB 4.3 TiB  924 MiB 8.4 GiB 2.9 TiB 59.70 0.77  67
 3   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  807 MiB 9.8 GiB 2.1 TiB 70.57 0.91  66
 4   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  770 MiB  13 GiB 583 GiB 92.18 1.19  66
 5   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  623 MiB  10 GiB 1.8 TiB 75.87 0.98  66
 6   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  602 MiB  11 GiB 1.6 TiB 78.67 1.02  64
 7   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.1 GiB  10 GiB 1.9 TiB 73.35 0.95  65
 8   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  498 MiB  11 GiB 1.4 TiB 81.29 1.05  68
 9   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.8 GiB 2.1 TiB 70.59 0.91  65
10   hdd 7.27689  1.00000 7.3 TiB 6.3 TiB 6.3 TiB  297 MiB  12 GiB 985 GiB 86.78 1.12  61
11   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  923 MiB 9.7 GiB 2.1 TiB 70.56 0.91  67
12   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  203 MiB  11 GiB 1.4 TiB 81.39 1.05  65
13   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  799 MiB  10 GiB 1.9 TiB 73.29 0.95  66
14   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  873 MiB 9.4 GiB 2.3 TiB 67.77 0.88  71
15   hdd 0.29999  1.00000 7.3 TiB 6.9 TiB 6.9 TiB  191 MiB  13 GiB 387 GiB 94.81 1.23  39
16   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  548 MiB  11 GiB 1.8 TiB 75.91 0.98  69
17   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  806 MiB  13 GiB 581 GiB 92.20 1.20  66
18   hdd 7.27689  1.00000 7.3 TiB 4.5 TiB 4.5 TiB  1.4 GiB 8.5 GiB 2.7 TiB 62.43 0.81  66
19   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.4 GiB  10 GiB 1.9 TiB 73.28 0.95  65
20   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  705 MiB  11 GiB 1.8 TiB 75.91 0.98  64
21   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  911 MiB  11 GiB 1.2 TiB 84.11 1.09  62
22   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  301 MiB  11 GiB 1.2 TiB 84.03 1.09  66
23   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  401 MiB 9.8 GiB 1.7 TiB 75.96 0.98  67
24   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.3 GiB 9.6 GiB 2.1 TiB 70.58 0.91  63
25   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.7 GiB 2.1 TiB 70.56 0.91  65
26   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  730 MiB  10 GiB 1.9 TiB 73.32 0.95  68
27   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  818 MiB  12 GiB 1.2 TiB 84.08 1.09  62
28   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  587 MiB 9.3 GiB 2.3 TiB 67.84 0.88  68
29   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  215 MiB  11 GiB 1.2 TiB 84.09 1.09  66
30   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  690 MiB  12 GiB 1.2 TiB 84.15 1.09  64
31   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1020 MiB  10 GiB 1.8 TiB 75.94 0.98  64
32   hdd 7.27689  1.00000 7.3 TiB 6.5 TiB 6.5 TiB  616 MiB  12 GiB 786 GiB 89.45 1.16  66
33   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  622 MiB 8.9 GiB 2.3 TiB 67.84 0.88  66
34   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  102 MiB  11 GiB 1.6 TiB 78.56 1.02  65
35   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  723 MiB  11 GiB 1.4 TiB 81.31 1.05  63
                    TOTAL 262 TiB 202 TiB 202 TiB   25 GiB 381 GiB  60 TiB 77.15

This can be resolved by lowering the CRUSH weight of the affected OSD manually:

$ ceph osd crush reweight osd.4 0.3
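
To see which OSDs are approaching the limits, the %USE column can be compared against the three ratios reported by `ceph osd dump`. A minimal Python sketch under two assumptions: the ratios are the ones shown above (0.85 / 0.9 / 0.95), and reaching a ratio exactly counts as crossing it, which the source does not specify. `classify` is a hypothetical helper name.

```python
def classify(use_percent, nearfull=0.85, backfillfull=0.90, full=0.95):
    """Map an OSD %USE value to the most severe threshold it has reached."""
    ratio = use_percent / 100.0
    if ratio >= full:
        return "full"
    if ratio >= backfillfull:
        return "backfillfull"
    if ratio >= nearfull:
        return "nearfull"
    return "ok"


# A few (osd id, %USE) pairs taken from the `ceph osd df` output above.
usage = [(0, 65.15), (4, 92.18), (10, 86.78), (15, 94.81), (17, 92.20), (32, 89.45)]

for osd_id, use in usage:
    print(f"osd.{osd_id:<3d} {use:6.2f}%  {classify(use)}")
```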

PG balancing
The default PG distribution is uneven in places (see https://cloud.tencent.com/developer/article/1664655 [6]). The command below prints the OSD name, its PG count, and its %USE:

$ ceph osd df tree | awk '/osd\./{print $NF" "$(NF-1)" "$(NF-3) }'
osd.0 89 71.20
osd.1 38 94.80
osd.2 92 68.44
osd.3 92 72.36
osd.4 28 76.86
osd.5 64 81.37
osd.6 62 87.90
osd.7 89 78.78
osd.8 52 86.18
osd.9 89 75.44
osd.10 37 96.33
osd.11 102 75.26
osd.12 33 91.41
osd.13 34 95.98
osd.14 59 84.97
osd.15 20 70.92
osd.16 113 89.46
osd.17 30 77.12
osd.18 124 77.11
osd.19 44 95.23
osd.20 65 84.63
osd.21 98 96.71
osd.22 34 95.93
osd.23 62 84.56
osd.24 110 76.63
osd.25 64 82.32
osd.26 59 88.26
osd.27 38 95.83
osd.28 105 79.19
osd.29 36 94.94
osd.30 94 90.79
osd.31 91 81.74
osd.32 12 42.44
osd.33 94 81.32
osd.34 46 86.51
osd.35 37 92.68
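
The spread of PG counts is easier to judge with a few summary numbers. A minimal Python sketch, assuming the three-column "name pgs %use" lines produced by the awk command above; `pg_stats` is a hypothetical helper, and the sample is a five-line excerpt of that output.

```python
import statistics


def pg_stats(text):
    """Summarize PG counts from lines of the form 'osd.N <pgs> <%use>'."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].startswith("osd."):
            rows.append((parts[0], int(parts[1]), float(parts[2])))
    pgs = [row[1] for row in rows]
    return {
        "mean": statistics.mean(pgs),
        "stdev": statistics.pstdev(pgs),        # population standard deviation
        "min": min(rows, key=lambda r: r[1]),   # OSD with the fewest PGs
        "max": max(rows, key=lambda r: r[1]),   # OSD with the most PGs
    }


# Excerpt of the output above.
sample = """osd.0 89 71.20
osd.1 38 94.80
osd.2 92 68.44
osd.18 124 77.11
osd.32 12 42.44
"""

stats = pg_stats(sample)
print(stats["mean"], stats["min"], stats["max"])
```

A large standard deviation relative to the mean indicates that PGs are spread unevenly across OSDs.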

reweight-by-pg adjusts the OSD weights according to the placement group distribution:

$ ceph osd reweight-by-pg
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500
reweight-by-utilization adjusts OSD weights according to disk utilization:

$ ceph osd reweight-by-utilization
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500

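Both reweight commands apply their changes immediately. Ceph also provides dry-run variants, `ceph osd test-reweight-by-pg` and `ceph osd test-reweight-by-utilization`, which print the same kind of report without changing anything. The sketch below assumes the report format shown above; the helper name `proposed_reweights` is made up for illustration. It pulls out only the proposed weight changes so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: read a reweight report on stdin and print
# "<osd> <old weight> <new weight>" for each proposed change.
# Assumes change lines look like "osd.19 weight 1.0000 -> 0.9500".
proposed_reweights() {
    awk '$1 ~ /^osd\.[0-9]+$/ && $2 == "weight" && $4 == "->" { print $1, $3, $5 }'
}

# Usage (dry run; the arguments mirror oload, max_change and
# max_change_osds from the report above):
#   ceph osd test-reweight-by-utilization 120 0.05 4 | proposed_reweights
```

If the proposed changes look reasonable, the same arguments can be passed to the non-test command.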
Adjust the reweight value of a single OSD by hand (a lower value steers data away from it):

$ ceph osd reweight osd.35 0.001

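To decide which OSDs need a lower reweight value, it can help to list those above a usage threshold. The sketch below filters the plain-text output of `ceph osd df`. It assumes that %USE is the third column from the end (followed by VAR and PGS), as in the output in this post; some releases append further columns such as STATUS, in which case the field index needs adjusting. The helper name `osds_over` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd df` output on stdin and print
# "osd.<id> <%USE>" for every OSD at or above the given threshold.
# Assumes the last three columns are %USE, VAR and PGS.
osds_over() {
    threshold="$1"
    awk -v t="$threshold" \
        '$1 ~ /^[0-9]+$/ && $(NF-2) + 0 >= t + 0 { print "osd." $1, $(NF-2) }'
}

# Usage:
#   ceph osd df | osds_over 95
```

Header and summary lines are skipped because their first field is not a numeric OSD ID.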
View the current OSD information:

$ ceph osd df
ID CLASS  WEIGHT REWEIGHT    SIZE     USE    DATA    OMAP    META   AVAIL  %USE  VAR PGS
 0   hdd 7.27689  1.00000 7.3 TiB 5.2 TiB 5.2 TiB 1.0 GiB 9.4 GiB 2.0 TiB 71.96 0.86  39
 1   hdd 0.00999  0.90002 7.3 TiB 6.9 TiB 6.9 TiB 604 MiB  12 GiB 382 GiB 94.88 1.13  37
 2   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB 1.2 GiB 8.8 GiB 2.2 TiB 69.55 0.83  34
 3   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB 812 MiB 9.9 GiB 2.0 TiB 73.15 0.87  34
 4   hdd 0.29999  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 185 MiB  12 GiB 1.7 TiB 77.01 0.92  26
 5   hdd 3.00000  1.00000 7.3 TiB 6.0 TiB 5.9 TiB 443 MiB  11 GiB 1.3 TiB 81.90 0.98  36
 6   hdd 3.00000  1.00000 7.3 TiB 6.5 TiB 6.5 TiB 499 MiB  11 GiB 809 GiB 89.14 1.06  38
 7   hdd 7.27689  1.00000 7.3 TiB 5.8 TiB 5.8 TiB 1.2 GiB  11 GiB 1.4 TiB 80.10 0.96  43
 8   hdd 3.00000  1.00000 7.3 TiB 6.3 TiB 6.3 TiB 502 MiB  11 GiB 992 GiB 86.69 1.03  36
 9   hdd 7.27689  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.5 GiB 9.8 GiB 1.7 TiB 76.57 0.91  42
10   hdd 0.00999  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 295 MiB  12 GiB 267 GiB 96.41 1.15  37
11   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1.2 GiB 9.8 GiB 1.7 TiB 76.13 0.91  37
12   hdd 0.00999  1.00000 7.3 TiB 6.7 TiB 6.6 TiB  95 MiB  12 GiB 635 GiB 91.48 1.09  32
13   hdd 0.00999  1.00000 7.3 TiB 7.0 TiB 7.0 TiB 584 MiB  12 GiB 315 GiB 95.78 1.14  34
14   hdd 3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 974 MiB  11 GiB 1.0 TiB 85.86 1.02  40
15   hdd 0.00999  1.00000 7.3 TiB 5.1 TiB 5.1 TiB 116 KiB  10 GiB 2.2 TiB 70.43 0.84  20
16   hdd 7.27689  1.00000 7.3 TiB 6.6 TiB 6.6 TiB 1.2 GiB  11 GiB 697 GiB 90.64 1.08  43
17   hdd 0.29999  1.00000 7.3 TiB 5.6 TiB 5.6 TiB  40 KiB  12 GiB 1.7 TiB 76.75 0.92  26
18   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB 1.9 GiB 9.3 GiB 1.6 TiB 78.01 0.93  53
19   hdd 0.00999  0.00099 7.3 TiB 6.9 TiB 6.9 TiB 1.5 GiB  13 GiB 371 GiB 95.02 1.13  40
20   hdd 3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 744 MiB  12 GiB 1.0 TiB 85.86 1.02  37
21   hdd 7.27689  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 913 MiB  12 GiB 239 GiB 96.79 1.15  40
22   hdd 0.00999  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 283 MiB  12 GiB 298 GiB 96.00 1.14  34
23   hdd 3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 515 MiB  11 GiB 1.1 TiB 85.30 1.02  35
24   hdd 7.27689  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.4 GiB 9.8 GiB 1.6 TiB 77.63 0.93  42
25   hdd 3.00000  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.66 0.99  40
26   hdd 2.00000  1.00000 7.3 TiB 6.5 TiB 6.5 TiB 737 MiB  11 GiB 823 GiB 88.95 1.06  36
27   hdd 0.00999  0.00099 7.3 TiB 7.0 TiB 6.9 TiB 822 MiB  12 GiB 327 GiB 95.61 1.14  37
28   hdd 7.27689  1.00000 7.3 TiB 5.8 TiB 5.8 TiB 859 MiB  10 GiB 1.4 TiB 80.23 0.96  40
29   hdd 0.00999  0.00099 7.3 TiB 6.9 TiB 6.9 TiB 215 MiB  12 GiB 371 GiB 95.02 1.13  36
30   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB 1.0 GiB  12 GiB 607 GiB 91.85 1.10  47
31   hdd 7.27689  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.81 0.99  41
32   hdd 0.29999  1.00000 7.3 TiB 3.0 TiB 3.0 TiB  32 KiB 7.1 GiB 4.3 TiB 41.47 0.49  10
33   hdd 7.27689  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 827 MiB 9.7 GiB 1.3 TiB 82.06 0.98  41
34   hdd 2.00000  1.00000 7.3 TiB 6.3 TiB 6.3 TiB 308 MiB  11 GiB 976 GiB 86.90 1.04  33
35   hdd 0.00999  0.00099 7.3 TiB 6.7 TiB 6.7 TiB 613 MiB  12 GiB 540 GiB 92.75 1.11  36
                    TOTAL 262 TiB 220 TiB 219 TiB  27 GiB 391 GiB  42 TiB 83.87
MIN/MAX VAR: 0.49/1.15 STDDEV: 10.62

Deleting CephFS
Stop all MDS services. This has to be done manually by logging in to each server:

$ systemctl stop ceph-mds@${HOSTNAME}

Remove the file system that is no longer needed:

$ ceph fs ls
$ ceph fs rm data --yes-i-really-mean-it

Using SSDs
Check the device classes currently known to the cluster (related article: https://blog.csdn.net/kozazyh/article/details/79904219 [7]):

$ ceph osd crush class ls
[
    "ssd"
]

If an SSD was assigned the wrong device class, set it manually. First remove the existing class from osd.0 through osd.2:

$ for i in 0 1 2; do ceph osd crush rm-device-class osd.$i; done

Then set the class of osd.0 through osd.2 to ssd:

$ for i in 0 1 2; do ceph osd crush set-device-class ssd osd.$i; done

Create a replicated CRUSH rule that is restricted to the ssd class, with default as the root and host as the failure domain:

$ ceph osd crush rule create-replicated rule-ssd default host ssd
$ ceph osd crush rule ls

Then pass the rule name when creating the pools, and create the file system (ceph fs new takes the metadata pool first, then the data pool):

$ ceph osd pool create fs_data 96 rule-ssd
$ ceph osd pool create fs_metadata 16 rule-ssd
$ ceph fs new fs fs_metadata fs_data

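After creating the pools, it is worth confirming that they actually picked up the SSD rule. The sketch below assumes that `ceph osd pool get <pool> crush_rule` prints a line of the form `crush_rule: <name>`; the helper name `pool_uses_rule` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the pool's CRUSH rule matches
# the expected rule name.
# Assumes the command prints "crush_rule: <name>".
pool_uses_rule() {
    pool="$1"
    expected="$2"
    actual=$(ceph osd pool get "$pool" crush_rule | awk '{ print $2 }')
    [ "$actual" = "$expected" ]
}

# Usage:
#   pool_uses_rule fs_data rule-ssd && echo "fs_data uses rule-ssd"
#   pool_uses_rule fs_metadata rule-ssd && echo "fs_metadata uses rule-ssd"
```

If a pool turns out to be on the wrong rule, `ceph osd pool set <pool> crush_rule <rule>` changes it without recreating the pool.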
Viewing the CRUSH map
Run the following commands to export and decompile it (the decompiled text overwrites the exported binary file):

$ ceph osd getcrushmap -o crushmap
$ crushtool -d crushmap -o crushmap
$ cat crushmap

3 monitors have not enabled msgr2
Resolve it as follows:

$ ceph mon enable-msgr2

2 daemons have recently crashed
Resolve it as follows (see https://blog.csdn.net/QTM_Gitee/article/details/106004435 [8]):

$ ceph crash ls
$ ceph crash archive-all
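`ceph crash archive-all` silences the warning for every recorded crash at once. To review each crash before archiving it, a loop along the following lines could be used. This is only a sketch: it assumes that every crash line printed by `ceph crash ls` starts with a timestamp-based ID (beginning with a digit) and that any header line does not. The helper name `crash_ids` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of the crashes listed by `ceph crash ls`.
# Assumes crash IDs start with a digit and header lines do not.
crash_ids() {
    ceph crash ls | awk '$1 ~ /^[0-9]/ { print $1 }'
}

# Usage: inspect each crash, then archive it individually.
#   for id in $(crash_ids); do
#       ceph crash info "$id"
#       ceph crash archive "$id"
#   done
```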
|
|