[root@compute1 ~]#
Message from syslogd@compute1 at Dec 13 17:56:10 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [ksoftirqd/16:89]

Message from syslogd@compute1 at Dec 13 17:56:22 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:2:918]

Message from syslogd@compute1 at Dec 13 17:57:05 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:11804]

Message from syslogd@compute1 at Dec 13 17:57:17 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [ksoftirqd/16:89]

kernel:NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s!
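
When many of these messages scroll past, it can help to see which CPUs are reported most often. The following is a minimal sketch, assuming the message format shown above; `count_soft_lockups` is a hypothetical helper name, and it relies on `grep -o`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Count soft lockup reports per CPU from log text read on stdin.
# Assumes the message format "soft lockup - CPU#<n> stuck for ...".
count_soft_lockups() {
    grep -o 'soft lockup - CPU#[0-9]*' \
        | sed 's/.*CPU#//' \
        | sort -n \
        | uniq -c \
        | awk '{ print "CPU" $2 ": " $1 }'
}

# Example (reading the kernel ring buffer may require root):
# dmesg | count_soft_lockups
```

A CPU that shows up repeatedly, such as CPU#16 in the log above, is a reasonable place to start looking.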
Solution:

# Raise the watchdog threshold on the running kernel (not persistent across reboots)
echo 30 > /proc/sys/kernel/watchdog_thresh

# Check the current value
[root@git-node1 data]# tail -1 /proc/sys/kernel/watchdog_thresh
30

# Same temporary change via sysctl (takes effect immediately, lost on reboot)
sysctl -w kernel.watchdog_thresh=30

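# Cause analysis of the kernel soft lockup bug

What "soft lockup" means: the bug has not hung the system completely, but one or more processes (or kernel threads) are stuck in some state, usually in kernel space. In many cases this comes from incorrect use of kernel locks. The watchdog reports a soft lockup once a CPU has been looping in kernel mode for more than 2 * watchdog_thresh seconds without letting other tasks run, so raising watchdog_thresh from its default of 10 to 30 only makes the warning less likely to fire; it does not remove the underlying cause.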
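
The "stuck for 22s" in the messages is consistent with the kernel's documented rule that the soft lockup threshold is twice `watchdog_thresh` (20 seconds at the default of 10). Here is a small sketch of that arithmetic; `soft_lockup_threshold` and the `THRESH_FILE` variable are hypothetical names used for illustration only.

```shell
#!/bin/sh
# Print the soft lockup threshold (in seconds) implied by a watchdog_thresh value.
# Per the kernel sysctl documentation, the threshold is 2 * watchdog_thresh.
soft_lockup_threshold() {
    echo $(( $1 * 2 ))
}

# Path to the live setting; overridable so the script can be tried on a copy.
THRESH_FILE=${THRESH_FILE:-/proc/sys/kernel/watchdog_thresh}

if [ -r "$THRESH_FILE" ]; then
    current=$(cat "$THRESH_FILE")
    echo "watchdog_thresh=$current -> soft lockup reported after $(soft_lockup_threshold "$current")s"
fi
```

With the value set to 30 as in this post, a CPU would have to be stuck for about 60 seconds before the message appears.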

# Make the setting persistent across reboots by adding this line to /etc/sysctl.conf
vim /etc/sysctl.conf
kernel.watchdog_thresh=30

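
If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch: `set_sysctl_conf` is a hypothetical helper, not a standard tool, and it assumes the file uses plain `key=value` lines. It replaces an existing entry for the key or appends one, so running it twice does not create duplicates.

```shell
#!/bin/sh
# Set "key=value" in a sysctl-style config file, replacing any existing
# entry for the same key or appending one if none exists.
# set_sysctl_conf is a hypothetical helper name, not a standard tool.
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line whose key (text before "=", spaces removed) differs.
        awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
            "$file" > "$tmp"
    fi
    echo "$key=$value" >> "$tmp"
    # Write back through cat so the target keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root):
# set_sysctl_conf /etc/sysctl.conf kernel.watchdog_thresh 30
# sysctl -p    # reload /etc/sysctl.conf so the value applies without a reboot
```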