RBD image cannot be deleted
An RBD image cannot be deleted and reports the following error:

$ rbd rm nextcloud/mysql
2020-05-13 16:27:46.155 7f024bfff700 -1 librbd::image::RemoveRequest: 0x557a7af027a0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

$ rbd info nextcloud/mysql
rbd image 'mysql':
        size 40 GiB in 10240 objects
        order 22 (4 MiB objects)
        id: 17e006b8b4567
        block_name_prefix: rbd_data.17e006b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Tue Oct 15 10:47:34 2019

Check the current status of the image:

$ rbd status nextcloud/mysql
Watchers:
        watcher=10.100.21.95:0/115493307 client.67866 cookie=7

A node still has the image mapped. Log in to that machine and check:

$ rbd showmapped
id pool      image snap device
...
3  nextcloud mysql -    /dev/rbd3

Unmap the image:

$ rbd unmap nextcloud/mysql

Then run the delete again:

$ rbd rm nextcloud/mysql
Removing image: 100% complete...done.

As a brute-force alternative, add the watcher to the blacklist so that the node holding the mapping is ignored:

$ ceph osd blacklist add 10.100.21.95:0/115493307
$ rbd rm nextcloud/mysql
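When this check is scripted, the watcher address can be pulled out of the `rbd status` text before deciding which node to log in to or which address to blacklist. The following is a minimal Python sketch, not part of the original notes: the function name `parse_watchers` is hypothetical, and the line format is assumed to match the sample output shown in this post.

```python
import re

# Matches lines such as:
#   watcher=10.100.21.95:0/115493307 client.67866 cookie=7
WATCHER_RE = re.compile(r"watcher=(\S+)\s+client\.(\d+)\s+cookie=(\d+)")


def parse_watchers(status_output):
    """Return (address, client_id, cookie) tuples found in `rbd status` text."""
    watchers = []
    for line in status_output.splitlines():
        match = WATCHER_RE.search(line)
        if match:
            address, client_id, cookie = match.groups()
            watchers.append((address, int(client_id), int(cookie)))
    return watchers


# Sample text copied from the `rbd status` output in this post.
sample = """Watchers:
        watcher=10.100.21.95:0/115493307 client.67866 cookie=7
"""

for address, client_id, cookie in parse_watchers(sample):
    print(address, client_id, cookie)
```

The first element of each tuple is the value that the blacklist command above takes as its argument.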
OSD latency
Check whether any OSD shows latency:

$ ceph osd perf
osd commit_latency(ms) apply_latency(ms)
  2                  0                 0
  1                  0                 0
  0                  0                 0

Defragmentation
Check fragmentation:

$ xfs_db -c frag -r /dev/mapper/VolGroup-lv_data1

Defragment:

Check power-on hours
Check how long a disk has been powered on:

$ smartctl -A /dev/mapper/VolGroup-lv_data1

Change the replica count
Change the replica count of a pool:

$ ceph osd pool set fs_data2 min_size 1
$ ceph osd pool set fs_data2 size 2

Add / remove a pool
Add or remove a CephFS data pool:

$ ceph fs add_data_pool fs fs_data2
$ ceph fs rm_data_pool fs fs_data2

Balance data across OSDs
Balance data distribution across OSDs:

$ ceph balancer status
$ ceph balancer on
$ ceph balancer mode crush-compat
MDS cannot be queried
The MDS status cannot be queried:

$ ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/lib64/ceph/mgr/status/module.py", line 311, in handle_command
    return self.handle_fs_status(cmd)
  File "/usr/lib64/ceph/mgr/status/module.py", line 177, in handle_fs_status
    mds_versions[metadata.get('ceph_version', "unknown")].append(info['name'])
AttributeError: 'NoneType' object has no attribute 'get'
$ ceph mds metadata
[
    {
        "name": "BJ-YZ-CEPH-94-54"
    },
    {
        "name": "BJ-YZ-CEPH-94-53",
        "addr": "10.100.94.53:6825/4233274463",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-53",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    },
    {
        "name": "BJ-YZ-CEPH-94-52",
        "addr": "10.100.94.52:6800/3956121270",
        "arch": "x86_64",
        "ceph_release": "mimic",
        "ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
        "ceph_version_short": "13.2.10",
        "cpu": "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "BJ-YZ-CEPH-94-52",
        "kernel_description": "#1 SMP Sat Dec 10 18:16:05 EST 2016",
        "kernel_version": "4.4.38-1.el7.elrepo.x86_64",
        "mem_swap_kb": "67108860",
        "mem_total_kb": "131914936",
        "os": "Linux"
    }
]

The entry for BJ-YZ-CEPH-94-54 carries only a name and no metadata, which is consistent with the AttributeError above. Restarting the MDS resolves the problem.

CephFS reports a healthy status but data cannot be written
When CephFS looks healthy but cannot be used, the cause is usually an abnormal client. First check whether the MDS still holds a session for it, then try evicting that session:

$ ceph tell mds.BJ-YZ-CEPH-94-52 session ls
$ ceph tell mds.BJ-YZ-CEPH-94-52 session evict id=834283

Session IDs are specific to each MDS; a session cannot be evicted through a different MDS node.

Add an MDS to the file system
Increase the number of active MDS daemons for the file system:

$ ceph fs set fs max_mds 2
mon time zone anomaly
Because the time settings on some monitors are wrong, the following warning is reported:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_WARN
            9 slow ops, oldest one blocked for 211643 sec, daemons [mon.BJ-YZ-CEPH-94-53,mon.BJ-YZ-CEPH-94-54] have slow ops.

  services:
    mon: 3 daemons, quorum BJ-YZ-CEPH-94-52,BJ-YZ-CEPH-94-53,BJ-YZ-CEPH-94-54
    mgr: BJ-YZ-CEPH-94-52(active), standbys: BJ-YZ-CEPH-94-54, BJ-YZ-CEPH-94-53
    mds: fs-2/2/2 up  {0=BJ-YZ-CEPH-94-52=up:active,1=BJ-YZ-CEPH-94-53=up:active}, 1 up:standby-replay
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   7 pools, 1152 pgs
    objects: 37.66 M objects, 67 TiB
    usage:   136 TiB used, 126 TiB / 262 TiB avail
    pgs:     1148 active+clean
             4    active+clean+scrubbing+deep

  io:
    client:   13 KiB/s rd, 27 MiB/s wr, 2 op/s rd, 19 op/s wr

Make sure the NTP service is running:

$ systemctl status ntpd
$ systemctl start ntpd

Then restart ceph-mon.target on the affected monitors:

$ systemctl status ceph-mon.target
$ systemctl restart ceph-mon.target
1 MDSs report slow requests
The error is as follows:

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            Degraded data redundancy: 122006/1192166 objects degraded (10.234%), 102 pgs degraded, 116 pgs undersized
            101 slow ops, oldest one blocked for 81045 sec, daemons [osd.1,osd.2] have slow ops.

Restarting the monitors resolves it:

$ systemctl restart ceph-mon.target

If that does not help, restart the MDS:

$ systemctl restart ceph-mds@${HOSTNAME}

Reduced data availability: 38 pgs inactive
The error is as follows (https://zhuanlan.zhihu.com/p/74323736 [1]):

$ ceph -s
  cluster:
    id:     b313ec26-5aa0-4db2-9fb5-a38b207471ee
    health: HEALTH_WARN
            1 MDSs report slow requests
            Reduced data availability: 38 pgs inactive
            145 slow ops, oldest one blocked for 184238 sec, daemons [osd.1,osd.2] have slow ops.

  services:
    mon: 3 daemons, quorum master001,master002,master003
    mgr: master001(active), standbys: master002, master003
    mds: kubernetes-2/2/2 up  {0=master001=up:active,1=master002=up:active}, 1 up:standby
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   9 pools, 244 pgs
    objects: 535.1 k objects, 177 GiB
    usage:   470 GiB used, 4.1 TiB / 4.6 TiB avail
    pgs:     15.574% pgs unknown
             206 active+clean
             38  unknown

  io:
    client:   35 KiB/s wr, 0 op/s rd, 2 op/s wr

This problem occurs when PGs have lost their data and cannot recover on their own. The fix is to clear the PG data and let the PGs be recreated, but this may lose data (with a pool size of 1, data loss is certain).

First identify an abnormal PG, then run query on it:

$ ceph pg 1.6e query
Error ENOENT: i don't have pgid 1.6e

The query cannot find the PG. List the stuck PGs with the following command:

$ ceph pg dump_stuck unclean
ok
PG_STAT STATE   UP UP_PRIMARY ACTING ACTING_PRIMARY
1.74    unknown []         -1     []             -1
1.70    unknown []         -1     []             -1
1.6a    unknown []         -1     []             -1
1.2d    unknown []         -1     []             -1
1.20    unknown []         -1     []             -1
1.1e    unknown []         -1     []             -1
1.1c    unknown []         -1     []             -1
1.17    unknown []         -1     []             -1
1.9     unknown []         -1     []             -1
1.29    unknown []         -1     []             -1
1.56    unknown []         -1     []             -1
1.72    unknown []         -1     []             -1
1.45    unknown []         -1     []             -1
1.4e    unknown []         -1     []             -1
1.46    unknown []         -1     []             -1
1.22    unknown []         -1     []             -1
1.53    unknown []         -1     []             -1
1.59    unknown []         -1     []             -1
1.24    unknown []         -1     []             -1
1.55    unknown []         -1     []             -1
1.3f    unknown []         -1     []             -1
1.38    unknown []         -1     []             -1
1.a     unknown []         -1     []             -1
1.7     unknown []         -1     []             -1
1.34    unknown []         -1     []             -1
1.64    unknown []         -1     []             -1
1.6     unknown []         -1     []             -1
1.32    unknown []         -1     []             -1
1.4     unknown []         -1     []             -1
1.2e    unknown []         -1     []             -1
1.31    unknown []         -1     []             -1
1.5e    unknown []         -1     []             -1
1.0     unknown []         -1     []             -1
1.42    unknown []         -1     []             -1
1.15    unknown []         -1     []             -1
1.6e    unknown []         -1     []             -1
1.41    unknown []         -1     []             -1
1.10    unknown []         -1     []             -1

Run the following command to forcibly recreate a PG, discarding its data (https://docs.ceph.com/docs/mimic ... troubleshooting-pg/ [2]):

$ ceph osd force-create-pg 1.74 --yes-i-really-mean-it

# Run in bulk; the awk pattern keeps only PG IDs and skips the PG_STAT header line
# ceph pg dump_stuck unclean | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}' | xargs -I {} ceph osd force-create-pg {} --yes-i-really-mean-it

Once this completes, the cluster recovers.
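When the bulk step is driven from a script, it helps to pass only real PG IDs on to `force-create-pg` and to skip lines such as `ok` and the `PG_STAT` header. The following is a minimal Python sketch of that filtering, not taken from the original notes: the function name `stuck_pg_ids` is hypothetical, and the column layout is assumed to match the `dump_stuck` output shown in this post.

```python
import re

# A PG ID is "<pool number>.<hex suffix>", for example 1.6e or 1.a
PG_ID_RE = re.compile(r"^\d+\.[0-9a-f]+$")


def stuck_pg_ids(dump_output):
    """Return the PG IDs in `ceph pg dump_stuck` text, skipping header lines."""
    pg_ids = []
    for line in dump_output.splitlines():
        fields = line.split()
        # Keep the line only if its first column looks like a PG ID.
        if fields and PG_ID_RE.match(fields[0]):
            pg_ids.append(fields[0])
    return pg_ids


# A shortened sample in the same layout as the output in this post.
sample = """ok
PG_STAT STATE   UP UP_PRIMARY ACTING ACTING_PRIMARY
1.74    unknown []         -1     []             -1
1.a     unknown []         -1     []             -1
"""

print(stuck_pg_ids(sample))
```

Reviewing the resulting list before recreating anything is worthwhile, since the recreate step discards whatever data the PG held.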

1 clients failing to respond to capability release
The error is as follows:

$ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsmaster001(mds.0): Client master003.k8s.shileizcc-ops.com: failing to respond to capability release client_id: 284951

Evicting this client ID resolves it (https://blog.csdn.net/zuoyang1990/article/details/98530070 [3]):

$ ceph daemon mds.master003 session ls|grep 284951
$ ceph tell mds.master003 session evict id=284951

If the following error is reported:

$ ceph tell mds.master003 session evict id=284951
2020-08-13 10:45:03.869 7f271b7fe700 0 client.306366 ms_handle_reset on 10.100.21.95:6800/1646216103
2020-08-13 10:45:03.881 7f2730ff9700 0 client.316415 ms_handle_reset on 10.100.21.95:6800/1646216103
Error EAGAIN: MDS is replaying log

the command must be run against the mds.0 node; otherwise the client cannot be found.
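The client ID to evict appears at the end of the `MDS_CLIENT_LATE_RELEASE` detail line. The following is a minimal Python sketch for extracting it from `ceph health detail` text; the function name `late_release_client_ids` is hypothetical, and the line format is assumed to match the sample in this post.

```python
import re

CLIENT_ID_RE = re.compile(r"client_id:\s*(\d+)")


def late_release_client_ids(health_detail):
    """Return client IDs reported as failing to respond to capability release."""
    client_ids = []
    for line in health_detail.splitlines():
        if "failing to respond to capability release" not in line:
            continue
        match = CLIENT_ID_RE.search(line)
        if match:  # the summary lines carry no client_id and are skipped
            client_ids.append(int(match.group(1)))
    return client_ids


# Sample text copied from the `ceph health detail` output in this post.
sample = """HEALTH_WARN 1 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsmaster001(mds.0): Client master003.k8s.shileizcc-ops.com: failing to respond to capability release client_id: 284951
"""

print(late_release_client_ids(sample))
```

Each returned number is the value used as `id=` in the `session evict` command.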
Kernel tuning
Kernel tuning (https://blog.csdn.net/fuzhongfaya/article/details/80932766 [4]):

$ echo "8192" > /sys/block/sda/queue/read_ahead_kb
$ echo "vm.swappiness = 0" | tee -a /etc/sysctl.conf
$ sysctl -p
$ echo "deadline" > /sys/block/sd[x]/queue/scheduler

# ssd
# echo "noop" > /sys/block/sd[x]/queue/scheduler

It is best to turn swap off entirely; tuning the memory parameters alone does not always take effect.
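To confirm that a scheduler change took effect, the same sysfs file can be read back: on Linux it lists the available schedulers and marks the active one with square brackets. The following is a minimal Python sketch of that check; the helper names `active_scheduler` and `read_active_scheduler` are hypothetical, and the device name passed in is up to the caller.

```python
import re
from pathlib import Path


def active_scheduler(scheduler_text):
    """Return the bracketed (active) scheduler name, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    return match.group(1) if match else None


def read_active_scheduler(device):
    """Read the active scheduler for a block device such as 'sda' from sysfs."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())


# Example with a typical file content; the real content depends on the kernel.
print(active_scheduler("noop [deadline] cfq"))  # prints: deadline
```

Calling `read_active_scheduler("sda")` on a storage node after the `echo` commands shows whether the intended scheduler is in use.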
Configuration file
Configuration file for a node with 40 cores and 128 GB of memory:

[global]
fsid = 2f77b028-ed2a-4010-9b79-90fd3052afc6
mon_initial_members = BJ-YZ-CEPH-94-52, BJ-YZ-CEPH-94-53, BJ-YZ-CEPH-94-54
mon_host = 10.100.94.52,10.100.94.53,10.100.94.54
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.100.94.0/24
cluster network = 10.100.94.0/24

[mon.a]
host = BJ-YZ-CEPH-94-52
mon addr = 10.100.94.52:6789

[mon.b]
host = BJ-YZ-CEPH-94-53
mon addr = 10.100.94.53:6789

[mon.c]
host = BJ-YZ-CEPH-94-54
mon addr = 10.100.94.54:6789

[mon]
mon data = /var/lib/ceph/mon/ceph-$id

# Allowed clock drift between monitors, default 0.05
mon clock drift allowed = 1

# Minimum number of OSDs that must report an OSD as down to the monitors, default 1
mon osd min down reporters = 1

# Seconds Ceph waits before marking a down OSD as out, default 300
mon osd down out interval = 600

mon_allow_pool_delete = true

[osd]
# OSD data path
osd data = /var/lib/ceph/osd/ceph-$id

# Default pg and pgp counts for a pool
osd pool default pg num = 1200
osd pool default pgp num = 1200

# OSD journal size, default 5120
osd journal size = 20000

# File system type used when formatting
osd mkfs type = xfs

# Extra options passed when formatting the file system
osd mkfs options xfs = -f

# Use an object map for XATTRs; used with EXT4, also usable with XFS or btrfs, default false
filestore xattr use omap = true

# Minimum sync interval from journal to data disk (seconds), default 0.1
filestore min sync interval = 10

# Maximum sync interval from journal to data disk (seconds), default 5
filestore max sync interval = 15

# Maximum number of operations the data disk accepts, default 500
filestore queue max ops = 25000

# Maximum number of bytes the data disk can commit (bytes), default 100
filestore queue max bytes = 10485760

# Number of operations the data disk can commit, 500
filestore queue committing max ops = 5000

# Maximum number of bytes the data disk can commit (bytes), default 100
filestore queue committing max bytes = 10485760000

# Maximum number of files in a subdirectory before it is split into child directories, default 2
filestore split multiple = 8

# Minimum number of files in a subdirectory before it is merged into its parent, default 10
filestore merge threshold = 40

# Object file descriptor cache size, default 128
filestore fd cache size = 1024

# Number of concurrent file system operations, default 2
filestore op threads = 32

# Maximum number of bytes the journal writes at once (bytes), default 1048560
journal max write bytes = 1073714824

# Maximum number of entries the journal writes at once, default 100
journal max write entries = 10000

# Maximum number of operations in the journal queue at once, default 50
journal queue max ops = 50000

# Maximum number of bytes in the journal queue at once (bytes), default 33554432
journal queue max bytes = 10485760000

# Maximum size of a single OSD write (MB), default 90
osd max write size = 512

# Maximum client data allowed in memory (bytes), default 100
osd client message size cap = 2147483648

# Number of bytes read at a time during Deep Scrub (bytes), default 524288
osd deep scrub stride = 1310720

# Number of concurrent file system operations, default 2
osd op threads = 32

# Threads for disk-intensive OSD operations such as recovery and scrubbing, default 1
osd disk threads = 10

# OSD Map cache to keep (MB), default 500
osd map cache size = 10240

# In-memory OSD Map cache of the OSD process (MB), default 50
osd map cache bl size = 128

# Mount options for Ceph OSD XFS file systems, default rw,noatime,inode64
osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"

# Recovery operation priority, range 1-63; higher values use more resources, default 10
osd recovery op priority = 20

# Number of recovery requests active at the same time, default 15
osd recovery max active = 15

# Maximum number of backfills allowed per OSD, default 10
osd max backfills = 10

# Enable the strict op queue cut-off
osd op queue cut off = high

osd_deep_scrub_large_omap_object_key_threshold = 800000
osd_deep_scrub_large_omap_object_value_sum_threshold = 10737418240

[mds]
# MDS cache size, set to 60 GB
mds cache memory limit = 62212254726

# Timeout, default 60 seconds
mds_revoke_cap_timeout = 360

mds log max segments = 51200
mds log max expiring = 51200

mds_beacon_grace = 300

# Hard limit on the size of a directory fragment, default 100000
# https://docs.ceph.com/docs/master/cephfs/dirfrags/
mds_bal_fragment_size_max = 500000

## Official configuration reference https://ceph.readthedocs.io/en/latest/cephfs/mds-config-ref/

[client]

# RBD cache, default true
rbd cache = true

# RBD cache size (bytes), default 335544320 (320M)
rbd cache size = 268435456

# Maximum dirty bytes allowed when the cache is write-back (bytes); 0 means write-through, default 25165824
rbd cache max dirty = 134217728

# Time dirty data stays in the cache before being flushed to disk (seconds), default 1
rbd cache max dirty age = 5

client_try_dentry_invalidate = false

[mgr]
mgr modules = dashboard

# Huawei Cloud tuning guide https://support.huaweicloud.com/ ... object_05_0008.html
# https://poph163.com/2020/02/18/c ... %E8%B0%83%E4%BC%98/
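The file above mixes option names written with spaces (`mon clock drift allowed`) and with underscores (`mon_allow_pool_delete`); Ceph treats these spellings as the same option. When auditing such a file from a script, it can help to normalize the names first. The following is a minimal Python sketch using the standard `configparser` module; the function name `load_ceph_conf` is hypothetical, and the sketch only covers the plain `key = value` style used in this post.

```python
import configparser


def load_ceph_conf(text):
    """Parse ceph.conf-style text into {section: {option_with_underscores: value}}."""
    parser = configparser.ConfigParser(
        interpolation=None,           # keep values such as ceph-$id literally
        comment_prefixes=("#", ";"),  # full-line comments
        delimiters=("=",),            # ':' appears in addresses, so only '=' splits
    )
    parser.read_string(text)
    normalized = {}
    for section in parser.sections():
        normalized[section] = {
            name.replace(" ", "_").replace("-", "_"): value
            for name, value in parser.items(section)
        }
    return normalized


# A short excerpt in the same style as the file above.
sample = """[mon]
mon data = /var/lib/ceph/mon/ceph-$id
# Allowed clock drift between monitors
mon clock drift allowed = 1
mon_allow_pool_delete = true

[mds]
mds_beacon_grace = 300
"""

conf = load_ceph_conf(sample)
print(conf["mon"]["mon_clock_drift_allowed"])
print(conf["mds"]["mds_beacon_grace"])
```

Values are returned as strings, so numeric comparisons need an explicit conversion.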
full osd
full osd means that an OSD has reached its full limit (https://docs.ceph.com/en/latest/ ... no-free-drive-space [5]).

$ ceph osd dump | grep full_ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Cluster status:

$ ceph -s
  cluster:
    id:     2f77b028-ed2a-4010-9b79-90fd3052afc6
    health: HEALTH_ERR
            2 backfillfull osd(s)
            1 full osd(s)
            2 nearfull osd(s)
            7 pool(s) full

Check the disk usage of each OSD. Once an OSD exceeds 95% utilization it is reported as a full OSD, and the cluster can no longer be used normally:

$ ceph osd df
ID CLASS  WEIGHT REWEIGHT    SIZE     USE    DATA     OMAP    META   AVAIL  %USE  VAR PGS
 0   hdd 7.27689  1.00000 7.3 TiB 4.7 TiB 4.7 TiB  918 MiB 9.1 GiB 2.5 TiB 65.15 0.84  68
 1   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  327 MiB  11 GiB 1.2 TiB 84.07 1.09  67
 2   hdd 7.27689  1.00000 7.3 TiB 4.3 TiB 4.3 TiB  924 MiB 8.4 GiB 2.9 TiB 59.70 0.77  67
 3   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  807 MiB 9.8 GiB 2.1 TiB 70.57 0.91  66
 4   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  770 MiB  13 GiB 583 GiB 92.18 1.19  66
 5   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  623 MiB  10 GiB 1.8 TiB 75.87 0.98  66
 6   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  602 MiB  11 GiB 1.6 TiB 78.67 1.02  64
 7   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.1 GiB  10 GiB 1.9 TiB 73.35 0.95  65
 8   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  498 MiB  11 GiB 1.4 TiB 81.29 1.05  68
 9   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.8 GiB 2.1 TiB 70.59 0.91  65
10   hdd 7.27689  1.00000 7.3 TiB 6.3 TiB 6.3 TiB  297 MiB  12 GiB 985 GiB 86.78 1.12  61
11   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  923 MiB 9.7 GiB 2.1 TiB 70.56 0.91  67
12   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  203 MiB  11 GiB 1.4 TiB 81.39 1.05  65
13   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  799 MiB  10 GiB 1.9 TiB 73.29 0.95  66
14   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  873 MiB 9.4 GiB 2.3 TiB 67.77 0.88  71
15   hdd 0.29999  1.00000 7.3 TiB 6.9 TiB 6.9 TiB  191 MiB  13 GiB 387 GiB 94.81 1.23  39
16   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  548 MiB  11 GiB 1.8 TiB 75.91 0.98  69
17   hdd 7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB  806 MiB  13 GiB 581 GiB 92.20 1.20  66
18   hdd 7.27689  1.00000 7.3 TiB 4.5 TiB 4.5 TiB  1.4 GiB 8.5 GiB 2.7 TiB 62.43 0.81  66
19   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  1.4 GiB  10 GiB 1.9 TiB 73.28 0.95  65
20   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  705 MiB  11 GiB 1.8 TiB 75.91 0.98  64
21   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  911 MiB  11 GiB 1.2 TiB 84.11 1.09  62
22   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  301 MiB  11 GiB 1.2 TiB 84.03 1.09  66
23   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB  401 MiB 9.8 GiB 1.7 TiB 75.96 0.98  67
24   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.3 GiB 9.6 GiB 2.1 TiB 70.58 0.91  63
25   hdd 7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB  1.1 GiB 9.7 GiB 2.1 TiB 70.56 0.91  65
26   hdd 7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB  730 MiB  10 GiB 1.9 TiB 73.32 0.95  68
27   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  818 MiB  12 GiB 1.2 TiB 84.08 1.09  62
28   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  587 MiB 9.3 GiB 2.3 TiB 67.84 0.88  68
29   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  215 MiB  11 GiB 1.2 TiB 84.09 1.09  66
30   hdd 7.27689  1.00000 7.3 TiB 6.1 TiB 6.1 TiB  690 MiB  12 GiB 1.2 TiB 84.15 1.09  64
31   hdd 7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1020 MiB  10 GiB 1.8 TiB 75.94 0.98  64
32   hdd 7.27689  1.00000 7.3 TiB 6.5 TiB 6.5 TiB  616 MiB  12 GiB 786 GiB 89.45 1.16  66
33   hdd 7.27689  1.00000 7.3 TiB 4.9 TiB 4.9 TiB  622 MiB 8.9 GiB 2.3 TiB 67.84 0.88  66
34   hdd 7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB  102 MiB  11 GiB 1.6 TiB 78.56 1.02  65
35   hdd 7.27689  1.00000 7.3 TiB 5.9 TiB 5.9 TiB  723 MiB  11 GiB 1.4 TiB 81.31 1.05  63
                    TOTAL 262 TiB 202 TiB 202 TiB   25 GiB 381 GiB  60 TiB 77.15
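The three ratios from `ceph osd dump` divide OSD utilization into bands. The following is a minimal Python sketch that labels a `%USE` value against those thresholds; the function name `classify_osd` is hypothetical, the default ratios are the ones shown in this post, and the strict "greater than" comparison is a simplifying assumption rather than a statement of how Ceph evaluates the limits internally.

```python
def classify_osd(use_percent, nearfull=0.85, backfillfull=0.90, full=0.95):
    """Label an OSD by comparing its %USE value against the three ratios."""
    ratio = use_percent / 100.0
    if ratio > full:
        return "full"
    if ratio > backfillfull:
        return "backfillfull"
    if ratio > nearfull:
        return "nearfull"
    return "ok"


# %USE values taken from a few rows of the `ceph osd df` table in this post.
for osd_id, use_percent in [(0, 65.15), (10, 86.78), (4, 92.18), (15, 94.81)]:
    print(f"osd.{osd_id}: {classify_osd(use_percent)}")
```

Running it over every row gives a quick list of the OSDs that are candidates for a weight adjustment.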

This can be resolved by manually adjusting the CRUSH weight:

$ ceph osd crush reweight osd.4 0.3
PG balancing
The default PG distribution is not always reasonable (https://cloud.tencent.com/developer/article/1664655 [6]). The command below prints, for each OSD, its name, PG count and utilization (%USE):

$ ceph osd df tree | awk '/osd\./{print $NF" "$(NF-1)" "$(NF-3) }'
osd.0 89 71.20
osd.1 38 94.80
osd.2 92 68.44
osd.3 92 72.36
osd.4 28 76.86
osd.5 64 81.37
osd.6 62 87.90
osd.7 89 78.78
osd.8 52 86.18
osd.9 89 75.44
osd.10 37 96.33
osd.11 102 75.26
osd.12 33 91.41
osd.13 34 95.98
osd.14 59 84.97
osd.15 20 70.92
osd.16 113 89.46
osd.17 30 77.12
osd.18 124 77.11
osd.19 44 95.23
osd.20 65 84.63
osd.21 98 96.71
osd.22 34 95.93
osd.23 62 84.56
osd.24 110 76.63
osd.25 64 82.32
osd.26 59 88.26
osd.27 38 95.83
osd.28 105 79.19
osd.29 36 94.94
osd.30 94 90.79
osd.31 91 81.74
osd.32 12 42.44
osd.33 94 81.32
osd.34 46 86.51
osd.35 37 92.68
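How uneven the distribution is can be summarized with the mean and standard deviation of the PG counts. The following is a minimal Python sketch that computes them from a listing in the `name pgs %use` form shown above; the function name `pg_count_stats` is hypothetical, and the population standard deviation is used purely for illustration, without claiming that it matches the figure Ceph reports.

```python
import statistics


def pg_count_stats(listing):
    """Summarize PG counts from lines of the form 'osd.N <pgs> <%use>'."""
    counts = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("osd."):
            counts[fields[0]] = int(fields[1])
    values = list(counts.values())
    return {
        "mean": statistics.mean(values),
        "stddev": statistics.pstdev(values),
        "min": min(counts, key=counts.get),  # OSD with the fewest PGs
        "max": max(counts, key=counts.get),  # OSD with the most PGs
    }


# The first three lines of the listing in this post.
sample = """osd.0 89 71.20
osd.1 38 94.80
osd.2 92 68.44
"""

stats = pg_count_stats(sample)
print(stats["mean"], round(stats["stddev"], 3), stats["min"], stats["max"])
```

An empty listing raises `statistics.StatisticsError`, so callers should check that at least one OSD line was found.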

reweight-by-pg adjusts OSD weights according to the placement group distribution:

$ ceph osd reweight-by-pg
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500
reweight-by-utilization adjusts OSD weights according to utilization:

$ ceph osd reweight-by-utilization
moved 0 / 2336 (0%)
avg 64.8889
stddev 58.677 -> 58.677 (expected baseline 7.9427)
min osd.1 with 0 -> 0 pgs (0 -> 0 * mean)
max osd.18 with 168 -> 168 pgs (2.58904 -> 2.58904 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 18.2677
overload_utilization 21.9212
osd.19 weight 1.0000 -> 0.9500
osd.1 weight 1.0000 -> 0.9500
osd.27 weight 1.0000 -> 0.9500
osd.10 weight 1.0000 -> 0.9500

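The numbers in this output can be checked by hand. overload_utilization appears to be average_utilization scaled by oload percent (18.2677 * 120 / 100 ≈ 21.9212), and each listed weight drops by exactly max_change (1.0000 -> 0.9500). A minimal shell sketch of that arithmetic follows; the function name overload_threshold is hypothetical and only serves the illustration.

```shell
# Hypothetical helper (not part of Ceph): recompute the overload threshold
# from the two values printed in the output above.
overload_threshold() {
    # $1 = average_utilization, $2 = oload (in percent)
    awk -v avg="$1" -v oload="$2" 'BEGIN { printf "%.4f\n", avg * oload / 100 }'
}

overload_threshold 18.2677 120   # prints 21.9212
```

To preview the weight changes without applying them, Ceph also provides ceph osd test-reweight-by-utilization and ceph osd test-reweight-by-pg, which print the same report as a dry run.
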
Lower an OSD's reweight value so that less data is written to it:

$ ceph osd reweight osd.35 0.001

View the current OSD information:

$ ceph osd df
ID CLASS  WEIGHT REWEIGHT    SIZE     USE    DATA    OMAP    META   AVAIL  %USE  VAR PGS
 0 hdd   7.27689  1.00000 7.3 TiB 5.2 TiB 5.2 TiB 1.0 GiB 9.4 GiB 2.0 TiB 71.96 0.86  39
 1 hdd   0.00999  0.90002 7.3 TiB 6.9 TiB 6.9 TiB 604 MiB  12 GiB 382 GiB 94.88 1.13  37
 2 hdd   7.27689  1.00000 7.3 TiB 5.1 TiB 5.1 TiB 1.2 GiB 8.8 GiB 2.2 TiB 69.55 0.83  34
 3 hdd   7.27689  1.00000 7.3 TiB 5.3 TiB 5.3 TiB 812 MiB 9.9 GiB 2.0 TiB 73.15 0.87  34
 4 hdd   0.29999  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 185 MiB  12 GiB 1.7 TiB 77.01 0.92  26
 5 hdd   3.00000  1.00000 7.3 TiB 6.0 TiB 5.9 TiB 443 MiB  11 GiB 1.3 TiB 81.90 0.98  36
 6 hdd   3.00000  1.00000 7.3 TiB 6.5 TiB 6.5 TiB 499 MiB  11 GiB 809 GiB 89.14 1.06  38
 7 hdd   7.27689  1.00000 7.3 TiB 5.8 TiB 5.8 TiB 1.2 GiB  11 GiB 1.4 TiB 80.10 0.96  43
 8 hdd   3.00000  1.00000 7.3 TiB 6.3 TiB 6.3 TiB 502 MiB  11 GiB 992 GiB 86.69 1.03  36
 9 hdd   7.27689  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.5 GiB 9.8 GiB 1.7 TiB 76.57 0.91  42
10 hdd   0.00999  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 295 MiB  12 GiB 267 GiB 96.41 1.15  37
11 hdd   7.27689  1.00000 7.3 TiB 5.5 TiB 5.5 TiB 1.2 GiB 9.8 GiB 1.7 TiB 76.13 0.91  37
12 hdd   0.00999  1.00000 7.3 TiB 6.7 TiB 6.6 TiB  95 MiB  12 GiB 635 GiB 91.48 1.09  32
13 hdd   0.00999  1.00000 7.3 TiB 7.0 TiB 7.0 TiB 584 MiB  12 GiB 315 GiB 95.78 1.14  34
14 hdd   3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 974 MiB  11 GiB 1.0 TiB 85.86 1.02  40
15 hdd   0.00999  1.00000 7.3 TiB 5.1 TiB 5.1 TiB 116 KiB  10 GiB 2.2 TiB 70.43 0.84  20
16 hdd   7.27689  1.00000 7.3 TiB 6.6 TiB 6.6 TiB 1.2 GiB  11 GiB 697 GiB 90.64 1.08  43
17 hdd   0.29999  1.00000 7.3 TiB 5.6 TiB 5.6 TiB  40 KiB  12 GiB 1.7 TiB 76.75 0.92  26
18 hdd   7.27689  1.00000 7.3 TiB 5.7 TiB 5.7 TiB 1.9 GiB 9.3 GiB 1.6 TiB 78.01 0.93  53
19 hdd   0.00999  0.00099 7.3 TiB 6.9 TiB 6.9 TiB 1.5 GiB  13 GiB 371 GiB 95.02 1.13  40
20 hdd   3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 744 MiB  12 GiB 1.0 TiB 85.86 1.02  37
21 hdd   7.27689  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 913 MiB  12 GiB 239 GiB 96.79 1.15  40
22 hdd   0.00999  0.00099 7.3 TiB 7.0 TiB 7.0 TiB 283 MiB  12 GiB 298 GiB 96.00 1.14  34
23 hdd   3.00000  1.00000 7.3 TiB 6.2 TiB 6.2 TiB 515 MiB  11 GiB 1.1 TiB 85.30 1.02  35
24 hdd   7.27689  1.00000 7.3 TiB 5.6 TiB 5.6 TiB 1.4 GiB 9.8 GiB 1.6 TiB 77.63 0.93  42
25 hdd   3.00000  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.66 0.99  40
26 hdd   2.00000  1.00000 7.3 TiB 6.5 TiB 6.5 TiB 737 MiB  11 GiB 823 GiB 88.95 1.06  36
27 hdd   0.00999  0.00099 7.3 TiB 7.0 TiB 6.9 TiB 822 MiB  12 GiB 327 GiB 95.61 1.14  37
28 hdd   7.27689  1.00000 7.3 TiB 5.8 TiB 5.8 TiB 859 MiB  10 GiB 1.4 TiB 80.23 0.96  40
29 hdd   0.00999  0.00099 7.3 TiB 6.9 TiB 6.9 TiB 215 MiB  12 GiB 371 GiB 95.02 1.13  36
30 hdd   7.27689  1.00000 7.3 TiB 6.7 TiB 6.7 TiB 1.0 GiB  12 GiB 607 GiB 91.85 1.10  47
31 hdd   7.27689  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 1.2 GiB  10 GiB 1.3 TiB 82.81 0.99  41
32 hdd   0.29999  1.00000 7.3 TiB 3.0 TiB 3.0 TiB  32 KiB 7.1 GiB 4.3 TiB 41.47 0.49  10
33 hdd   7.27689  1.00000 7.3 TiB 6.0 TiB 6.0 TiB 827 MiB 9.7 GiB 1.3 TiB 82.06 0.98  41
34 hdd   2.00000  1.00000 7.3 TiB 6.3 TiB 6.3 TiB 308 MiB  11 GiB 976 GiB 86.90 1.04  33
35 hdd   0.00999  0.00099 7.3 TiB 6.7 TiB 6.7 TiB 613 MiB  12 GiB 540 GiB 92.75 1.11  36
                    TOTAL 262 TiB 220 TiB 219 TiB  27 GiB 391 GiB  42 TiB 83.87
MIN/MAX VAR: 0.49/1.15  STDDEV: 10.62

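With this many OSDs it can help to filter the table for the ones that are close to full before deciding which reweight values to change. The following is a minimal sketch, assuming the plain-text column layout shown above (13 columns, where each size value such as "7.3 TiB" splits into two whitespace-separated fields, so %USE is field 17). Other Ceph releases print extra columns, in which case the field number has to be adjusted. The function name osds_over is hypothetical.

```shell
# Hypothetical helper: read `ceph osd df` plain-text output on stdin and print
# the OSDs whose %USE is at or above the given threshold.
# Assumes the row layout shown above, where %USE is the 17th field.
osds_over() {
    threshold="$1"
    awk -v t="$threshold" '
        # Only data rows start with a numeric OSD id; the header, TOTAL and
        # MIN/MAX lines are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 19 && ($17 + 0) >= (t + 0) {
            printf "osd.%s %s\n", $1, $17
        }'
}

# Usage (on a cluster node):
#   ceph osd df | osds_over 95
```
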
Deleting CephFS
Stop all MDS services; this has to be done manually by logging in to each server:

$ systemctl stop ceph-mds@${HOSTNAME}

Remove the file system in question:

$ ceph fs ls
$ ceph fs rm data --yes-i-really-mean-it

Using SSDs
View the device classes currently assigned to the OSDs (related document: https://blog.csdn.net/kozazyh/article/details/79904219[7]):

$ ceph osd crush class ls
[
    "ssd"
]

If an SSD was detected with the wrong device class, correct it by hand. First remove the existing class from the first three OSDs (osd.0 to osd.2):

$ for i in 0 1 2;do ceph osd crush rm-device-class osd.$i;done

Then set the class of the same three OSDs to ssd:

$ for i in 0 1 2;do ceph osd crush set-device-class ssd osd.$i;done

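set-device-class refuses to overwrite a class that is already bound to an OSD, which is why the remove step comes first. One way to keep the two steps paired for an arbitrary list of OSDs is to generate the commands and review them before running anything. The sketch below only prints the commands and does not touch the cluster; the function name retag_cmds is hypothetical.

```shell
# Hypothetical helper: print the rm-device-class / set-device-class command
# pair for each OSD id given, so the list can be reviewed before execution.
retag_cmds() {
    class="$1"   # target device class, e.g. ssd
    shift        # remaining arguments are OSD ids
    for id in "$@"; do
        printf 'ceph osd crush rm-device-class osd.%s\n' "$id"
        printf 'ceph osd crush set-device-class %s osd.%s\n' "$class" "$id"
    done
}

retag_cmds ssd 0 1 2
```

Once the printed commands look right, they can be executed by piping the output to a shell, for example retag_cmds ssd 0 1 2 | sh.
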
Create a CRUSH rule:

$ ceph osd crush rule create-replicated rule-ssd default host ssd
$ ceph osd crush rule ls

Then pass the rule name when creating the pools (note that ceph fs new takes the metadata pool before the data pool):

$ ceph osd pool create fs_data 96 rule-ssd
$ ceph osd pool create fs_metadata 16 rule-ssd
$ ceph fs new fs fs_metadata fs_data

Viewing the CRUSH map
Run the following commands:

$ ceph osd getcrushmap -o crushmap
$ crushtool -d crushmap -o crushmap
$ cat crushmap

3 monitors have not enabled msgr2
Fix it as follows:

$ ceph mon enable-msgr2

2 daemons have recently crashed
Fix it as follows (see https://blog.csdn.net/QTM_Gitee/article/details/106004435[8]):

$ ceph crash ls
$ ceph crash archive-all
|
|