RBD image cannot be deleted
Deleting an rbd image fails with the following error:

$ rbd rm nextcloud/mysql
2020-05-13 16:27:46.155 7f024bfff700 -1 librbd::image::RemoveRequest: 0x557a7af027a0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

$ rbd info nextcloud/mysql
rbd image 'mysql':
        size 40 GiB in 10240 objects
        order 22 (4 MiB objects)
        id: 17e006b8b4567
        block_name_prefix: rbd_data.17e006b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Tue Oct 15 10:47:34 2019

Check the current status of the image:

$ rbd status nextcloud/mysql
Watchers:
        watcher=10.100.21.95:0/115493307 client.67866 cookie=7

A node still has the image mapped. Log in to that machine and check:

$ rbd showmapped
id pool      image snap device
...
3  nextcloud mysql -    /dev/rbd3

Unmap it:

$ rbd unmap nextcloud/mysql

Then run the delete again:

$ rbd rm nextcloud/mysql
Removing image: 100% complete...done.

Brute-force alternative: blacklist the watcher address directly, so the node that still has the image mapped is ignored:

$ ceph osd blacklist add 10.100.21.95:0/115493307
$ rbd rm nextcloud/mysql
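
When several clients watch the same image, picking the addresses out of the rbd status output by hand is error-prone. The following is a minimal sketch, assuming the output layout shown above (one watcher=... line per client). watcher_addrs is a hypothetical helper name, not an rbd subcommand.

```shell
# Hypothetical helper: read `rbd status <pool>/<image>` output on stdin
# and print the address (ip:port/nonce) of every watcher, one per line.
watcher_addrs() {
    sed -n 's/^[[:space:]]*watcher=\([^[:space:]]*\).*/\1/p'
}
```

Piping rbd status nextcloud/mysql into watcher_addrs prints one address per line, which can be reviewed before anything is handed to ceph osd blacklist add.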

OSD latency
Check whether any osd shows latency:

$ ceph osd perf
osd commit_latency(ms) apply_latency(ms)
  2                  0                 0
  1                  0                 0
  0                  0                 0
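
On a cluster with many OSDs the table is easier to scan when only the slow entries are printed. The following is a rough sketch, assuming the three-column layout shown above. slow_osds and its default threshold of 50 ms are illustrative choices, not Ceph settings.

```shell
# Hypothetical helper: read `ceph osd perf` output on stdin and print the
# OSDs whose commit or apply latency exceeds a threshold in milliseconds.
# $1 = threshold (the default of 50 is an arbitrary example value).
slow_osds() {
    awk -v limit="${1:-50}" '
        $1 ~ /^[0-9]+$/ && ($2 + 0 > limit + 0 || $3 + 0 > limit + 0) {
            print "osd." $1
        }'
}
```

Usage would look like: ceph osd perf | slow_osds 100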

Defragmentation
Check fragmentation:

$ xfs_db -c frag -r /dev/mapper/VolGroup-lv_data1

Defragment (on XFS this is normally done with xfs_fsr):

Power-on hours
Check how long a disk has been powered on:

$ smartctl -A /dev/mapper/VolGroup-lv_data1

Change the replica count
Change the replica count of a pool:

$ ceph osd pool set fs_data2 min_size 1
$ ceph osd pool set fs_data2 size 2

Add / remove a pool
Add a data pool to the fs, or remove it:

$ ceph fs add_data_pool fs fs_data2
$ ceph fs rm_data_pool fs fs_data2

Balancing data across OSDs
Balance data across the osds:

$ ceph balancer status
$ ceph balancer on
$ ceph balancer mode crush-compat

mds status cannot be queried
Querying the mds status fails:

$ ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/status/module.py", line 311, in handle_command
    return self.handle_fs_status(cmd)
  File "/usr/lib64/ceph/mgr/status/module.py", line 177, in handle_fs_status
    mds_versions[metadata.get('ceph_version', "unknown")].append(info['name'])
AttributeError: 'NoneType' object has no attribute 'get'

$ ceph mds metadata
[
    {
        "name": "BJ-YZ-CEPH-94-54"
    },
    {
        "name": "BJ-YZ-CEPH-94-53",
        "addr": "10.100.94.53:6825/4233274463",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-53",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    },
    {
        "name": "BJ-YZ-CEPH-94-52",
        "addr": "10.100.94.52:6800/3956121270",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-52",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    }
]

Restarting the mds fixes it. Note that BJ-YZ-CEPH-94-54 above reports a name but no metadata, which matches the NoneType error in the traceback.

cephfs reports a healthy status but data cannot be written
When cephfs looks healthy but cannot be used, the cause is usually a misbehaving client. First list the client sessions held by the mds, then try evicting the offending session:

$ ceph tell mds.BJ-YZ-CEPH-94-52 session ls
$ ceph tell mds.BJ-YZ-CEPH-94-52 session evict id=834283

Session ids are per-mds: an id listed by one mds cannot be evicted through another node.

Adding an mds to the fs
Raise the number of active mds daemons for the fs:

$ ceph fs set fs max_mds 2

mon time zone anomaly
The mons report the following because the time (zone) on some of them is off:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_WARN
            9 slow ops, oldest one blocked for 211643 sec, daemons [mon.BJ-YZ-CEPH-94-53,mon.BJ-YZ-CEPH-94-54] have slow ops.

  services:
    mon: 3 daemons, quorum BJ-YZ-CEPH-94-52,BJ-YZ-CEPH-94-53,BJ-YZ-CEPH-94-54
    mgr: BJ-YZ-CEPH-94-52(active), standbys: BJ-YZ-CEPH-94-54, BJ-YZ-CEPH-94-53
    mds: fs-2/2/2 up  {0=BJ-YZ-CEPH-94-52=up:active,1=BJ-YZ-CEPH-94-53=up:active}, 1 up:standby-replay
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   7 pools, 1152 pgs
    objects: 37.66 M objects, 67 TiB
    usage:   136 TiB used, 126 TiB / 262 TiB avail
    pgs:     1148 active+clean
             4    active+clean+scrubbing+deep

  io:
    client:   13 KiB/s rd, 27 MiB/s wr, 2 op/s rd, 19 op/s wr

Configure the NTP server and make sure ntpd is running:

$ systemctl status ntpd
$ systemctl start ntpd

Then restart ceph-mon.target on the affected mon nodes:

$ systemctl status ceph-mon.target
$ systemctl restart ceph-mon.target

1 MDSs report slow requests
The error looks like this:

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            Degraded data redundancy: 122006/1192166 objects degraded (10.234%), 102 pgs degraded, 116 pgs undersized
            101 slow ops, oldest one blocked for 81045 sec, daemons [osd.1,osd.2] have slow ops.

Restarting the mon usually resolves it:

$ systemctl restart ceph-mon.target

If that does not help, restart the mds:

$ systemctl restart ceph-mds@${HOSTNAME}

Reduced data availability: 38 pgs inactive
The error looks like this: https://zhuanlan.zhihu.com/p/74323736[1]

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            145 slow ops, oldest one blocked for 184238 sec, daemons [osd.1,osd.2] have slow ops.

  services:
    mon: 3 daemons, quorum master001,master002,master003
    mgr: master001(active), standbys: master002, master003
    mds: kubernetes-2/2/2 up  {0=master001=up:active,1=master002=up:active}, 1 up:standby
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   9 pools, 244 pgs
    objects: 535.1 k objects, 177 GiB
    usage:   470 GiB used, 4.1 TiB / 4.6 TiB avail
    pgs:     15.574% pgs unknown
             206 active+clean
             38  unknown

  io:
    client:   35 KiB/s wr, 0 op/s rd, 2 op/s wr

This happens when pgs have lost their data and cannot recover on their own. The fix is to recreate the affected pgs empty so that the cluster can repair itself, but this may lose data (with size 1, data loss is certain).

First look up the abnormal pgs:

Then query one of them for details:

$ ceph pg 1.6e query
Error ENOENT: i don't have pgid 1.6e

The query cannot find the pg. List the stuck pgs with the following command instead:

$ ceph pg dump_stuck unclean
ok
PG_STAT STATE   UP UP_PRIMARY ACTING ACTING_PRIMARY
1.74    unknown []         -1     []             -1
1.70    unknown []         -1     []             -1
1.6a    unknown []         -1     []             -1
1.2d    unknown []         -1     []             -1
1.20    unknown []         -1     []             -1
1.1e    unknown []         -1     []             -1
1.1c    unknown []         -1     []             -1
1.17    unknown []         -1     []             -1
1.9     unknown []         -1     []             -1
1.29    unknown []         -1     []             -1
1.56    unknown []         -1     []             -1
1.72    unknown []         -1     []             -1
1.45    unknown []         -1     []             -1
1.4e    unknown []         -1     []             -1
1.46    unknown []         -1     []             -1
1.22    unknown []         -1     []             -1
1.53    unknown []         -1     []             -1
1.59    unknown []         -1     []             -1
1.24    unknown []         -1     []             -1
1.55    unknown []         -1     []             -1
1.3f    unknown []         -1     []             -1
1.38    unknown []         -1     []             -1
1.a     unknown []         -1     []             -1
1.7     unknown []         -1     []             -1
1.34    unknown []         -1     []             -1
1.64    unknown []         -1     []             -1
1.6     unknown []         -1     []             -1
1.32    unknown []         -1     []             -1
1.4     unknown []         -1     []             -1
1.2e    unknown []         -1     []             -1
1.31    unknown []         -1     []             -1
1.5e    unknown []         -1     []             -1
1.0     unknown []         -1     []             -1
1.42    unknown []         -1     []             -1
1.15    unknown []         -1     []             -1
1.6e    unknown []         -1     []             -1
1.41    unknown []         -1     []             -1
1.10    unknown []         -1     []             -1

Force-recreate the pg, discarding its data, with the following command: https://docs.ceph.com/docs/mimic ... troubleshooting-pg/[2]

$ ceph osd force-create-pg 1.74 --yes-i-really-mean-it

# Batch version (the awk filter skips the "ok" and header lines)
# ceph pg dump_stuck unclean | awk '$1 ~ /^[0-9]+\./ {print $1}' | xargs -I{} ceph osd force-create-pg {} --yes-i-really-mean-it

Once this completes, the cluster recovers.
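
Because force-create-pg discards data, it can help to review the exact commands before running any of them. The following is a minimal dry-run sketch, assuming the dump_stuck layout shown above, with the pg id in the first column. print_force_create_cmds is a hypothetical helper name.

```shell
# Hypothetical dry-run helper: read `ceph pg dump_stuck unclean` output on
# stdin and print, without executing, one force-create-pg command per pg id.
# Lines whose first field is not a pg id (such as "ok" and the header)
# are skipped.
print_force_create_cmds() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {
        print "ceph osd force-create-pg " $1 " --yes-i-really-mean-it"
    }'
}
```

Piping ceph pg dump_stuck unclean into print_force_create_cmds only prints the commands. Nothing is changed until the reviewed lines are run by hand.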

1 clients failing to respond to capability release
The error looks like this:

$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsmaster001(mds.0): Client master003.k8s.shileizcc-ops.com: failing to respond to capability release client_id: 284951

Evict this client id: https://blog.csdn.net/zuoyang1990/article/details/98530070[3]

$ ceph daemon mds.master003 session ls|grep 284951
$ ceph tell mds.master003 session evict id=284951

If you get the following error:

$ ceph tell mds.master003 session evict id=284951
2020-08-13 10:45:03.869 7f271b7fe700  0 client.306366 ms_handle_reset on 10.100.21.95:6800/1646216103
2020-08-13 10:45:03.881 7f2730ff9700  0 client.316415 ms_handle_reset on 10.100.21.95:6800/1646216103
Error EAGAIN: MDS is replaying log

then the command has to be run against the mds.0 node; otherwise this client cannot be found.
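
The client id to evict can be pulled straight out of the health output. The following is a small sketch, assuming the message format shown above. late_release_client_ids is a hypothetical helper name.

```shell
# Hypothetical helper: read `ceph health detail` output on stdin and print
# the client_id of every client reported as failing to respond to
# capability release, one id per line.
late_release_client_ids() {
    sed -n 's/.*failing to respond to capability release client_id: \([0-9][0-9]*\).*/\1/p'
}
```

Usage would look like: ceph health detail | late_release_client_ids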
2 l. U; _ l4 X4 Y0 n& u9 z6 m! O: O! `
内核优化: T9 N3 i2 f9 {, n! V# D
内核优化:https://blog.csdn.net/fuzhongfaya/article/details/80932766[4]6 z/ i" Y0 B$ e0 F8 V# l L- C
2 W* O& c1 z6 F7 _' ^$ s5 s% i: @+ k: f
$ echo "8192" > /sys/block/sda/queue/read_ahead_kb3 m; E3 v1 t5 P" d* o9 ?) z K
$ echo "vm.swappiness = 0" | tee -a /etc/sysctl.conf
( h, w; P9 N8 x9 E1 I+ i$ sysctl -p
' i+ u y5 v& J- f$ echo "deadline" > /sys/block/sd[x]/queue/scheduler
0 A/ D6 n/ L. S6 r" p4 l& N0 U* Y u6 c" Z- I6 q! i
# ssd
7 g- w2 r! A* {6 Z* V5 @7 x3 ^, x% h# echo "noop" > /sys/block/sd[x]/queue/scheduler% P, w) O# Q* z" |1 T3 s7 }3 Y
复制
$ v( y r" I& L+ [+ rswap 最好是直接关闭,配置内存参数在一定程度上不会生效。
& K6 c; G. o( E, ^% a& a+ V2 L
Configuration file

Configuration file for a 40-core, 128 GB machine:

[global]
fsid = 2f77b028-ed2a-4010-9b79-90fd3052afc6
mon_initial_members = BJ-YZ-CEPH-94-52, BJ-YZ-CEPH-94-53, BJ-YZ-CEPH-94-54
mon_host = 10.100.94.52,10.100.94.53,10.100.94.54
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = 10.100.94.0/24
cluster network = 10.100.94.0/24

[mon.a]
host = BJ-YZ-CEPH-94-52
mon addr = 10.100.94.52:6789

[mon.b]
host = BJ-YZ-CEPH-94-53
mon addr = 10.100.94.53:6789

[mon.c]
host = BJ-YZ-CEPH-94-54
mon addr = 10.100.94.54:6789

[mon]
mon data = /var/lib/ceph/mon/ceph-$id

# clock drift allowed between monitors, default 0.05
mon clock drift allowed = 1

# minimum number of OSDs that must report an OSD as down to the monitor, default 1
mon osd min down reporters = 1

# seconds ceph waits before marking a down OSD as out, default 300
mon osd down out interval = 600

mon_allow_pool_delete = true

[osd]
# osd data path
osd data = /var/lib/ceph/osd/ceph-$id

# default pg and pgp count for new pools
osd pool default pg num = 1200
osd pool default pgp num = 1200

# osd journal size, default 5120
osd journal size = 20000

# filesystem type used when formatting
osd mkfs type = xfs

# extra options passed when formatting the filesystem
osd mkfs options xfs = -f

# use an object map for XATTRs; needed on EXT4, can also be used on XFS or btrfs, default false
filestore xattr use omap = true

# minimum sync interval from journal to data disk (seconds), default 0.1
filestore min sync interval = 10

# maximum sync interval from journal to data disk (seconds), default 5
filestore max sync interval = 15

# maximum number of operations the data disk queue accepts, default 500
filestore queue max ops = 25000

# maximum number of bytes the data disk can commit at once (bytes), default 100
filestore queue max bytes = 10485760

# number of operations the data disk can commit, default 500
filestore queue committing max ops = 5000

# maximum number of bytes the data disk can commit (bytes), default 100
filestore queue committing max bytes = 10485760000

# maximum number of files in a subdirectory before it is split into child directories, default 2
filestore split multiple = 8

# minimum number of files in a subdirectory before it is merged into its parent, default 10
filestore merge threshold = 40

# object file descriptor cache size, default 128
filestore fd cache size = 1024

# number of concurrent filesystem operations, default 2
filestore op threads = 32

# maximum bytes the journal writes at once (bytes), default 1048560
journal max write bytes = 1073714824

# maximum entries the journal writes at once, default 100
journal max write entries = 10000

# maximum operations in the journal queue at once, default 50
journal queue max ops = 50000

# maximum bytes in the journal queue at once (bytes), default 33554432
journal queue max bytes = 10485760000

# maximum size of a single OSD write (MB), default 90
osd max write size = 512

# maximum client data allowed in memory (bytes), default 100
osd client message size cap = 2147483648

# bytes read at a time during deep scrub (bytes), default 524288
osd deep scrub stride = 1310720

# number of concurrent filesystem operations, default 2
osd op threads = 32

# threads for disk-intensive OSD work such as recovery and scrubbing, default 1
osd disk threads = 10

# size of the retained OSD map cache (MB), default 500
osd map cache size = 10240

# in-memory OSD map cache of the OSD process (MB), default 50
osd map cache bl size = 1280

# mount options for Ceph OSD xfs, default rw,noatime,inode64
osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"

# recovery operation priority, range 1-63, higher values use more resources, default 10
osd recovery op priority = 20

# number of recovery requests active at the same time, default 15
osd recovery max active = 15

# maximum number of backfills allowed per OSD, default 10
osd max backfills = 10

# enable the strict op queue cut-off
osd op queue cut off = high
osd_deep_scrub_large_omap_object_key_threshold = 800000
osd_deep_scrub_large_omap_object_value_sum_threshold = 10737418240

[mds]
# mds cache size, set to about 60 GB
mds cache memory limit = 62212254726

# timeout, default 60 seconds
mds_revoke_cap_timeout = 360
mds log max segments = 51200
mds log max expiring = 51200

mds_beacon_grace = 300

# hard limit on directory fragment size, default 100000
# https://docs.ceph.com/docs/master/cephfs/dirfrags/
mds_bal_fragment_size_max = 500000

## official configuration reference https://ceph.readthedocs.io/en/latest/cephfs/mds-config-ref/
[client]
# RBD cache, default true
rbd cache = true
# RBD cache size (bytes), default 335544320 (320M)
rbd cache size = 268435456

# maximum dirty bytes allowed when the cache is write-back (bytes); 0 means write-through; default 25165824
rbd cache max dirty = 134217728

# seconds dirty data stays in the cache before being flushed to disk, default 1
rbd cache max dirty age = 5

client_try_dentry_invalidate = false

[mgr]
mgr modules = dashboard
# Huawei Cloud tuning guide https://support.huaweicloud.com/ ... object_05_0008.html
# https://poph163.com/2020/02/18/c ... %E8%B0%83%E4%BC%98/
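
With a file this long it is handy to check what value a given section really sets. The following is a rough sketch of an INI-style lookup in awk, not a replacement for Ceph's own config parser. conf_get is a hypothetical helper name. It treats spaces and underscores in option names as equivalent, as Ceph does, and ignores comment lines.

```shell
# Hypothetical helper: print the value of one option from ceph.conf-style
# text read on stdin.  $1 = section name (without brackets), $2 = option.
# Spaces and underscores in option names are treated as equivalent.
conf_get() {
    awk -v sec="$1" -v key="$2" '
        BEGIN { gsub(/[ _]+/, "_", key) }
        /^[ \t]*[#;]/ { next }                      # skip comment lines
        /^[ \t]*\[/ {                               # section header
            s = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", s)
            in_sec = (s == sec)
            next
        }
        in_sec && index($0, "=") {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            gsub(/[ _]+/, "_", k)
            if (k == key) print v
        }'
}
```

Usage would look like: conf_get mds mds_beacon_grace < /etc/ceph/ceph.conf (the path is the conventional default location and is an assumption here).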

full osd
full osd means an osd has reached its full limit: https://docs.ceph.com/en/latest/ ... no-free-drive-space[5]

$ ceph osd dump | grep full_ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Cluster status:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_ERR
            2 backfillfull osd(s)
            1 full osd(s)
            2 nearfull osd(s)
            7 pool(s) full

Check the per-osd disk usage. Once any osd exceeds 95% usage it is reported as a full osd, and the cluster can no longer be used normally:

$ ceph osd df
ID CLASS  WEIGHT REWEIGHT    SIZE     USE    DATA     OMAP    META   AVAIL  %USE  VAR PGS
 0   hdd 7.27689  1.00000 7.3 TiB 4.7 TiB 4.7 TiB  918 MiB 9.1 GiB 2.5 TiB 65.15 0.84  68
 1   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  327 MiB  11 GiB 1.2 TiB 84.07 1.09  67
 2   hdd 7.27689  1.00000 7.3 TiB 4.3 TiB 4.3 TiB  924 MiB 8.4 GiB 2.9 TiB 59.70 0.77  67
 3   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  807 MiB 9.8 GiB 2.1 TiB 70.57 0.91  66
 4   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  770 MiB  13 GiB 583 GiB 92.18 1.19  66
 5   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  623 MiB  10 GiB 1.8 TiB 75.87 0.98  66
 6   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  602 MiB  11 GiB 1.6 TiB 78.67 1.02  64
 7   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.1 GiB  10 GiB 1.9 TiB 73.35 0.95  65
 8   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  498 MiB  11 GiB 1.4 TiB 81.29 1.05  68
 9   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.8 GiB 2.1 TiB 70.59 0.91  65
10   hdd 7.27689  1.00000 7.3 TiB 6.3 TiB 6.3 TiB  297 MiB  12 GiB 985 GiB 86.78 1.12  61
11   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  923 MiB 9.7 GiB 2.1 TiB 70.56 0.91  67
12   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  203 MiB  11 GiB 1.4 TiB 81.39 1.05  65
13   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  799 MiB  10 GiB 1.9 TiB 73.29 0.95  66
14   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  873 MiB 9.4 GiB 2.3 TiB 67.77 0.88  71
15   hdd 0.29999  1.00000 7.3 TiB 6.9 TiB 6.9 TiB  191 MiB  13 GiB 387 GiB 94.81 1.23  39
16   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  548 MiB  11 GiB 1.8 TiB 75.91 0.98  69
17   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  806 MiB  13 GiB 581 GiB 92.20 1.20  66
18   hdd 7.27689  1.00000 7.3 TiB 4.5 TiB 4.5 TiB  1.4 GiB 8.5 GiB 2.7 TiB 62.43 0.81  66
19   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.4 GiB  10 GiB 1.9 TiB 73.28 0.95  65
20   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  705 MiB  11 GiB 1.8 TiB 75.91 0.98  64
21   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  911 MiB  11 GiB 1.2 TiB 84.11 1.09  62
22   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  301 MiB  11 GiB 1.2 TiB 84.03 1.09  66
23   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  401 MiB 9.8 GiB 1.7 TiB 75.96 0.98  67
24   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.3 GiB 9.6 GiB 2.1 TiB 70.58 0.91  63
25   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.7 GiB 2.1 TiB 70.56 0.91  65
26   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  730 MiB  10 GiB 1.9 TiB 73.32 0.95  68
27   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  818 MiB  12 GiB 1.2 TiB 84.08 1.09  62
28   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  587 MiB 9.3 GiB 2.3 TiB 67.84 0.88  68
29   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  215 MiB  11 GiB 1.2 TiB 84.09 1.09  66
30   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  690 MiB  12 GiB 1.2 TiB 84.15 1.09  64
31   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1020 MiB  10 GiB 1.8 TiB 75.94 0.98  64
32   hdd 7.27689  1.00000 7.3 TiB 6.5 TiB 6.5 TiB  616 MiB  12 GiB 786 GiB 89.45 1.16  66
33   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  622 MiB 8.9 GiB 2.3 TiB 67.84 0.88  66
34   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  102 MiB  11 GiB 1.6 TiB 78.56 1.02  65
35   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  723 MiB  11 GiB 1.4 TiB 81.31 1.05  63
                    TOTAL 262 TiB 202 TiB 202 TiB   25 GiB 381 GiB  60 TiB 77.15

This can be worked around by lowering the crush weight of an overfull osd by hand:

$ ceph osd crush reweight osd.4 0.3
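
To see at a glance which osds are approaching the ratios above, the %USE column can be filtered. The following is a minimal sketch, assuming the column layout shown here, where %USE is the third field from the end. Releases that append further columns would need a different offset. osds_over and its default of 85 are illustrative.

```shell
# Hypothetical helper: read `ceph osd df` output on stdin and print every
# OSD whose %USE is at or above a limit.  $1 = limit in percent (the
# default of 85 mirrors the nearfull_ratio shown above).
# Assumes %USE is the third field from the end of each OSD row.
osds_over() {
    awk -v limit="${1:-85}" '
        $1 ~ /^[0-9]+$/ && $(NF-2) + 0 >= limit + 0 {
            print "osd." $1, $(NF-2)
        }'
}
```

Usage would look like: ceph osd df | osds_over 90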

pg balancing
The default pg distribution is not always reasonable. https://cloud.tencent.com/developer/article/1664655[6]

$ ceph osd df tree | awk '/osd\./{print $NF" "$(NF-1)" "$(NF-3) }'
osd.0 89 71.20
osd.1 38 94.80
osd.2 92 68.44
osd.3 92 72.36
osd.4 28 76.86
osd.5 64 81.37
osd.6 62 87.90
osd.7 89 78.78
osd.8 52 86.18
osd.9 89 75.44
osd.10 37 96.33
osd.11 102 75.26
osd.12 33 91.41
osd.13 34 95.98
osd.14 59 84.97
osd.15 20 70.92
osd.16 113 89.46
osd.17 30 77.12
osd.18 124 77.11
osd.19 44 95.23
osd.20 65 84.63
osd.21 98 96.71
osd.22 34 95.93
osd.23 62 84.56
osd.24 110 76.63
osd.25 64 82.32
osd.26 59 88.26
osd.27 38 95.83
osd.28 105 79.19
osd.29 36 94.94
osd.30 94 90.79
osd.31 91 81.74
osd.32 12 42.44
osd.33 94 81.32
osd.34 46 86.51
osd.35 37 92.68
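
The three-column listing above (osd name, pg count, %USE) can be reduced to the osds that carry far more pgs than average. The following is a rough sketch over that listing. pg_outliers and the factor of 1.5 are illustrative choices, not Ceph defaults.

```shell
# Hypothetical helper: read "name pgs %use" lines on stdin (the awk output
# shown above) and print the OSDs whose pg count exceeds factor * mean.
# $1 = factor (the default of 1.5 is an arbitrary example value).
pg_outliers() {
    awk -v factor="${1:-1.5}" '
        NF >= 2 { name[n] = $1; pgs[n] = $2 + 0; sum += $2; n++ }
        END {
            if (n == 0) exit
            mean = sum / n
            for (i = 0; i < n; i++)
                if (pgs[i] > mean * factor) print name[i], pgs[i]
        }'
}
```

Appending | pg_outliers to the ceph osd df tree pipeline above prints only the heavily loaded osds.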
reweight-by-pg adjusts OSD weights according to the placement group (PG) distribution:

$ ceph osd reweight-by-pg
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)
oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500
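The numbers in this output are related: overload_utilization is average_utilization scaled by oload percent (18.2677 × 120 / 100 = 21.9212). As far as I understand it, only OSDs above that threshold are candidates for a lower weight, each by at most max_change and at most max_change_osds of them per run. A minimal sketch of the arithmetic follows; overload_threshold is a hypothetical helper name, not a Ceph command:

```shell
# Hypothetical helper (not part of Ceph): recompute the value printed as
# overload_utilization from average_utilization and oload.
overload_threshold() {
    # $1 = average_utilization, $2 = oload in percent (120 if omitted)
    awk -v avg="$1" -v oload="${2:-120}" \
        'BEGIN { printf "%.4f\n", avg * oload / 100 }'
}

# Values taken from the output above.
overload_threshold 18.2677 120
```

Lowering oload (for example to 110) tightens the threshold, so more OSDs qualify for adjustment.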
reweight-by-utilization adjusts OSD weights according to utilization:

$ ceph osd reweight-by-utilization
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)
oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500
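Because a reweight moves data as soon as it is applied, it can help to preview the change first. Ceph provides a dry-run variant, ceph osd test-reweight-by-utilization, which takes the same optional arguments (oload, max_change, max_osds). The wrapper below is only a sketch: the function name and the APPLY switch are my own, not part of Ceph.

```shell
# Sketch: preview a utilization-based reweight and apply it only on request.
# preview_then_reweight and APPLY are illustrative names, not Ceph features.
preview_then_reweight() {
    # Dry run: prints the planned weight changes without applying them.
    ceph osd test-reweight-by-utilization "${1:-120}" "${2:-0.05}" "${3:-4}" || return 1
    # Apply only when the caller explicitly opts in with APPLY=yes.
    if [ "${APPLY:-no}" = "yes" ]; then
        ceph osd reweight-by-utilization "${1:-120}" "${2:-0.05}" "${3:-4}"
    fi
}
```

Usage: run preview_then_reweight 120 0.05 4 to see the plan, then APPLY=yes preview_then_reweight 120 0.05 4 once the plan looks reasonable.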
Adjust the override weight (REWEIGHT) of a single OSD to reduce the data written to it:

$ ceph osd reweight osd.35 0.001
View the current OSD information:

$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP    META    AVAIL   %USE  VAR  PGS
 0 hdd   7.27689 1.00000  7.3 TiB 5.2 TiB 5.2 TiB 1.0 GiB 9.4 GiB 2.0 TiB 71.96 0.86 39
 1 hdd   0.00999 0.90002  7.3 TiB 6.9 TiB 6.9 TiB 604 MiB  12 GiB 382 GiB 94.88 1.13 37
 2 hdd   7.27689 1.00000  7.3 TiB 5.1 TiB 5.1 TiB 1.2 GiB 8.8 GiB 2.2 TiB 69.55 0.83 34
 3 hdd   7.27689 1.00000  7.3 TiB 5.3 TiB 5.3 TiB 812 MiB 9.9 GiB 2.0 TiB 73.15 0.87 34
 4 hdd   0.29999 1.00000  7.3 TiB 5.6 TiB 5.6 TiB 185 MiB  12 GiB 1.7 TiB 77.01 0.92 26
 5 hdd   3.00000 1.00000  7.3 TiB 6.0 TiB 5.9 TiB 443 MiB  11 GiB 1.3 TiB 81.90 0.98 36
 6 hdd   3.00000 1.00000  7.3 TiB 6.5 TiB 6.5 TiB 499 MiB  11 GiB 809 GiB 89.14 1.06 38
 7 hdd   7.27689 1.00000  7.3 TiB 5.8 TiB 5.8 TiB 1.2 GiB  11 GiB 1.4 TiB 80.10 0.96 43
 8 hdd   3.00000 1.00000  7.3 TiB 6.3 TiB 6.3 TiB 502 MiB  11 GiB 992 GiB 86.69 1.03 36
 9 hdd   7.27689 1.00000  7.3 TiB 5.6 TiB 5.6 TiB 1.5 GiB 9.8 GiB 1.7 TiB 76.57 0.91 42
10 hdd   0.00999 0.00099  7.3 TiB 7.0 TiB 7.0 TiB 295 MiB  12 GiB 267 GiB 96.41 1.15 37
11 hdd   7.27689 1.00000  7.3 TiB 5.5 TiB 5.5 TiB 1.2 GiB 9.8 GiB 1.7 TiB 76.13 0.91 37
12 hdd   0.00999 1.00000  7.3 TiB 6.7 TiB 6.6 TiB  95 MiB  12 GiB 635 GiB 91.48 1.09 32
13 hdd   0.00999 1.00000  7.3 TiB 7.0 TiB 7.0 TiB 584 MiB  12 GiB 315 GiB 95.78 1.14 34
14 hdd   3.00000 1.00000  7.3 TiB 6.2 TiB 6.2 TiB 974 MiB  11 GiB 1.0 TiB 85.86 1.02 40
15 hdd   0.00999 1.00000  7.3 TiB 5.1 TiB 5.1 TiB 116 KiB  10 GiB 2.2 TiB 70.43 0.84 20
16 hdd   7.27689 1.00000  7.3 TiB 6.6 TiB 6.6 TiB 1.2 GiB  11 GiB 697 GiB 90.64 1.08 43
17 hdd   0.29999 1.00000  7.3 TiB 5.6 TiB 5.6 TiB  40 KiB  12 GiB 1.7 TiB 76.75 0.92 26
18 hdd   7.27689 1.00000  7.3 TiB 5.7 TiB 5.7 TiB 1.9 GiB 9.3 GiB 1.6 TiB 78.01 0.93 53
19 hdd   0.00999 0.00099  7.3 TiB 6.9 TiB 6.9 TiB 1.5 GiB  13 GiB 371 GiB 95.02 1.13 40
20 hdd   3.00000 1.00000  7.3 TiB 6.2 TiB 6.2 TiB 744 MiB  12 GiB 1.0 TiB 85.86 1.02 37
21 hdd   7.27689 0.00099  7.3 TiB 7.0 TiB 7.0 TiB 913 MiB  12 GiB 239 GiB 96.79 1.15 40
22 hdd   0.00999 0.00099  7.3 TiB 7.0 TiB 7.0 TiB 283 MiB  12 GiB 298 GiB 96.00 1.14 34
23 hdd   3.00000 1.00000  7.3 TiB 6.2 TiB 6.2 TiB 515 MiB  11 GiB 1.1 TiB 85.30 1.02 35
24 hdd   7.27689 1.00000  7.3 TiB 5.6 TiB 5.6 TiB 1.4 GiB 9.8 GiB 1.6 TiB 77.63 0.93 42
25 hdd   3.00000 1.00000  7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.66 0.99 40
26 hdd   2.00000 1.00000  7.3 TiB 6.5 TiB 6.5 TiB 737 MiB  11 GiB 823 GiB 88.95 1.06 36
27 hdd   0.00999 0.00099  7.3 TiB 7.0 TiB 6.9 TiB 822 MiB  12 GiB 327 GiB 95.61 1.14 37
28 hdd   7.27689 1.00000  7.3 TiB 5.8 TiB 5.8 TiB 859 MiB  10 GiB 1.4 TiB 80.23 0.96 40
29 hdd   0.00999 0.00099  7.3 TiB 6.9 TiB 6.9 TiB 215 MiB  12 GiB 371 GiB 95.02 1.13 36
30 hdd   7.27689 1.00000  7.3 TiB 6.7 TiB 6.7 TiB 1.0 GiB  12 GiB 607 GiB 91.85 1.10 47
31 hdd   7.27689 1.00000  7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.81 0.99 41
32 hdd   0.29999 1.00000  7.3 TiB 3.0 TiB 3.0 TiB  32 KiB 7.1 GiB 4.3 TiB 41.47 0.49 10
33 hdd   7.27689 1.00000  7.3 TiB 6.0 TiB 6.0 TiB 827 MiB 9.7 GiB 1.3 TiB 82.06 0.98 41
34 hdd   2.00000 1.00000  7.3 TiB 6.3 TiB 6.3 TiB 308 MiB  11 GiB 976 GiB 86.90 1.04 33
35 hdd   0.00999 0.00099  7.3 TiB 6.7 TiB 6.7 TiB 613 MiB  12 GiB 540 GiB 92.75 1.11 36
                    TOTAL 262 TiB 220 TiB 219 TiB  27 GiB 391 GiB  42 TiB 83.87
MIN/MAX VAR: 0.49/1.15 STDDEV: 10.62
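With this many OSDs it is easy to miss the ones that are nearly full. The sketch below filters plain ceph osd df output for OSDs whose %USE exceeds a limit. It assumes the column layout shown above, where the last three fields of each OSD row are %USE, VAR and PGS; other Ceph releases may append further columns, in which case the field offset needs adjusting. osds_over is a hypothetical helper name.

```shell
# Sketch: print "osd.<id> <%USE>" for every OSD above a utilization limit.
# Reads `ceph osd df` text on stdin. Assumes %USE is the third field from
# the end, as in the output shown above.
osds_over() {
    # $1 = limit in percent; only rows whose first field is a numeric OSD id
    awk -v limit="$1" \
        '$1 ~ /^[0-9]+$/ && $(NF-2) + 0 > limit { print "osd." $1, $(NF-2) }'
}
```

Usage: ceph osd df | osds_over 95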
Deleting CephFS
Stop all MDS services. This has to be done manually after logging in to each server:

$ systemctl stop ceph-mds@${HOSTNAME}

Then remove the file system you no longer need:

$ ceph fs ls
$ ceph fs rm data --yes-i-really-mean-it
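On releases that provide ceph fs fail (I am assuming Nautilus or later here, since msgr2 comes up further down), the file system can be marked failed from any admin node instead of stopping every MDS by hand. A minimal sketch, where remove_cephfs is a hypothetical wrapper name:

```shell
# Sketch: take a CephFS file system down and remove it.
# remove_cephfs is an illustrative wrapper, not a Ceph command.
remove_cephfs() {
    fs_name="${1:-}"
    [ -n "$fs_name" ] || { echo "usage: remove_cephfs <fs_name>" >&2; return 2; }
    # Fail all MDS ranks and keep standbys from taking over.
    ceph fs fail "$fs_name" &&
    # Remove the file system entry; the underlying pools are left in place.
    ceph fs rm "$fs_name" --yes-i-really-mean-it
}
```

Usage: remove_cephfs data. The data and metadata pools still exist afterwards and have to be deleted separately if they are no longer needed.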
Using SSDs
View the device classes currently known to CRUSH (related document: https://blog.csdn.net/kozazyh/article/details/79904219[7]):

$ ceph osd crush class ls
[
    "ssd"
]

If an OSD carries the wrong device class, change it manually. An existing class has to be removed before a new one can be set, so first remove the class from osd.0 through osd.2:

$ for i in 0 1 2;do ceph osd crush rm-device-class osd.$i;done

Then set the device class of osd.0 through osd.2 to ssd:

$ for i in 0 1 2;do ceph osd crush set-device-class ssd osd.$i;done
Create a CRUSH rule that places replicas only on ssd-class devices, with host as the failure domain under the default root:

$ ceph osd crush rule create-replicated rule-ssd default host ssd
$ ceph osd crush rule ls

Then pass the rule name when creating the pools, and create the file system on them (ceph fs new takes the metadata pool first, then the data pool):

$ ceph osd pool create fs_data 96 rule-ssd
$ ceph osd pool create fs_metadata 16 rule-ssd
$ ceph fs new fs fs_metadata fs_data
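After creating the pools it is worth confirming that they really picked up the SSD rule. ceph osd pool get <pool> crush_rule reports the rule as a line of the form "crush_rule: <name>". The check below is a sketch built on that; pool_uses_rule is a hypothetical helper name.

```shell
# Sketch: succeed only if a pool is bound to the expected CRUSH rule.
# pool_uses_rule is an illustrative helper, not a Ceph command.
pool_uses_rule() {
    # $1 = pool name, $2 = expected rule name
    actual=$(ceph osd pool get "$1" crush_rule | awk '{ print $2 }')
    [ "$actual" = "$2" ]
}
```

Usage: pool_uses_rule fs_data rule-ssd || echo "fs_data is not on rule-ssd". An existing pool can be moved onto the rule with ceph osd pool set <pool> crush_rule rule-ssd, which triggers data movement.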
Viewing the crushmap
Run the following commands to export the compiled CRUSH map, decompile it to text and print it:

$ ceph osd getcrushmap -o crushmap
$ crushtool -d crushmap -o crushmap
$ cat crushmap
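The commands above decompile the map over the file that was just exported. If the map is going to be edited and loaded back, it is safer to keep the binary and text versions in separate files. The sketch below shows the round trip with crushtool -c and ceph osd setcrushmap -i; the function names and the .bin/.txt/.new suffixes are my own choices.

```shell
# Sketch: export/decompile and compile/import a CRUSH map.
# $1 is a path prefix, e.g. /tmp/crushmap (an example path, adjust as needed).
crushmap_export() {
    ceph osd getcrushmap -o "$1.bin" &&       # compiled map from the cluster
    crushtool -d "$1.bin" -o "$1.txt"         # decompile to editable text
}

crushmap_import() {
    crushtool -c "$1.txt" -o "$1.new" &&      # compile the edited text
    ceph osd setcrushmap -i "$1.new"          # load it into the cluster
}
```

Usage: crushmap_export /tmp/crushmap, edit /tmp/crushmap.txt, then crushmap_import /tmp/crushmap. Loading a changed map can move a lot of data, so review the edit carefully first.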
3 monitors have not enabled msgr2
This health warning is cleared by enabling the v2 messenger protocol on the monitors:

$ ceph mon enable-msgr2
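To confirm the change took effect, the monitor map can be checked for v2 addresses. The sketch below assumes ceph mon dump prints one line per monitor of the form "0: [v2:ADDR:3300/0,v1:ADDR:6789/0] mon.a", which is what I have seen on Nautilus; treat that format as an assumption. mons_without_v2 is a hypothetical helper name.

```shell
# Sketch: list monitors whose `ceph mon dump` line has no v2 address.
# Reads the dump on stdin; assumes monitor lines start with "<rank>: "
# and end with the monitor name.
mons_without_v2() {
    awk '/^[0-9]+: / && !/v2:/ { print $NF }'
}
```

Usage: ceph mon dump | mons_without_v2. Empty output means every monitor advertises a v2 address. Clients also need to be able to reach the v2 port (3300 by default).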
2 daemons have recently crashed
Archiving the crash reports clears this warning (see https://blog.csdn.net/QTM_Gitee/article/details/106004435[8]):

$ ceph crash ls
$ ceph crash archive-all
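Archiving everything at once hides the reports from the health check, so it can be worth reading them first. The sketch below prints ceph crash info for each crash that has not been archived yet, using ceph crash ls-new. It assumes crash IDs start with a timestamp (YYYY-MM-DD...), which is how it skips any header line; review_new_crashes is a hypothetical helper name.

```shell
# Sketch: show details for every crash report not yet archived.
# Assumes crash IDs begin with a date such as 2020-05-13_...
review_new_crashes() {
    ceph crash ls-new |
        awk '$1 ~ /^[0-9][0-9][0-9][0-9]-/ { print $1 }' |
        while read -r id; do
            # </dev/null keeps the command from consuming the ID list.
            ceph crash info "$id" </dev/null
        done
}
```

Usage: run review_new_crashes, then archive individual reports with ceph crash archive <id> or all of them with ceph crash archive-all.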
|
|