|
After downloading the plugin, upload it to the /usr/local/nagios/libexec/ directory on both the monitoring server and the monitored hosts.
Make it executable:
chmod u+x check_iostat

View its help:
[root@localhost libexec]# ./check_iostat -help

This plugin shows the I/O usage of the specified disk, using the iostat external program.
It prints three statistics: Transactions per second (tps), Kilobytes per second
read from the disk (KB_read/s) and and written to the disk (KB_written/s)

./check_iostat:
-d Device to be checked (without the full path, eg. sda)
-c ,, Sets the CRITICAL level for tps, KB_read/s and KB_written/s, respectively
-w ,, Sets the WARNING level for tps, KB_read/s and KB_written/s, respectively

As you can see, it checks how much data is read from and written to the disk per second.
The parameters are:
-d: name of the device to check, without the full path
-c: the level (KB/s) at which a CRITICAL alert is raised
-w: the level (KB/s) at which a WARNING alert is raised
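
To make the two thresholds concrete, here is a minimal sketch of how one measured value could be mapped to a state. The function name state_for is made up for illustration, and treating "at or above the threshold" as the trigger is an assumption; the plugin's own comparison was not checked. awk is used because the values are decimals, which plain shell arithmetic cannot compare.

```shell
# Hypothetical helper: map one measured value to a Nagios state.
# Assumption: a value at or above a threshold triggers that level.
state_for() {
    # $1 = measured value, $2 = warning level, $3 = critical level
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 >= c + 0)      print "CRITICAL"
        else if (v + 0 >= w + 0) print "WARNING"
        else                     print "OK"
    }'
}

state_for 26.77 1000 2000   # prints OK
state_for 1500 1000 2000    # prints WARNING
state_for 2500 1000 2000    # prints CRITICAL
```

The critical level is tested first so that a value above both thresholds is reported as CRITICAL rather than WARNING.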
Check the disk information on this machine:
[root@localhost libexec]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      128G   27G   95G  22% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                 4.0G     0  4.0G   0% /dev/shm

The output above shows sda1, so pass sda after -d.
The device may also be something other than sda, for example:
[root@li387-161 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda              79G   38G   40G  49% /
tmpfs                1009M  108K 1009M   1% /dev/shm

In this case, pass xvda after -d.

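The rule used in both cases (drop /dev/ and the partition number) can be sketched as a small helper. disk_name is a hypothetical name, and the sketch assumes devices named like sdX or xvdX. It would not give a useful answer for an LVM path such as /dev/mapper/VolGroup00-LogVol00, or for disks whose own name ends in a digit.

```shell
# Hypothetical helper (not part of check_iostat): turn a df device path
# into the bare disk name expected by -d.
# Assumption: the name follows the sdX / xvdX pattern, so dropping the
# /dev/ prefix and any trailing partition digits is enough.
disk_name() {
    printf '%s\n' "$1" | sed -e 's|^/dev/||' -e 's|[0-9]*$||'
}

disk_name /dev/sda1   # prints sda
disk_name /dev/xvda   # prints xvda
```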
Check that it runs:
[root@localhost libexec]# ./check_iostat -d sda -w 1000 -c 2000
OK - I/O stats tps=1.71 KB_read/s=2.77 KB_written/s=26.77 | 'tps'=1.71; 'KB_read/s'=2.77; 'KB_written/s'=26.77;

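If you want to pull individual numbers out of that line, for a quick sanity check in a script say, something like the following may help. It works on the sample line copied from the run above. metric_value is a hypothetical helper, and the sketch assumes the name=value layout shown in that sample.

```shell
# Sample line copied from the run above; in practice it would come from
# the plugin itself.
line="OK - I/O stats tps=1.71 KB_read/s=2.77 KB_written/s=26.77 | 'tps'=1.71; 'KB_read/s'=2.77; 'KB_written/s'=26.77;"

# Hypothetical helper: print the value of one metric from the
# human-readable part of the line (everything before the "|").
metric_value() {
    # $1 = plugin output line, $2 = metric name, e.g. KB_read/s
    printf '%s\n' "$1" | cut -d'|' -f1 | tr ' ' '\n' |
        awk -F= -v name="$2" '$1 == name { print $2 }'
}

status=${line%% *}      # first word of the line (OK in the sample)
echo "status: $status"
echo "tps: $(metric_value "$line" tps)"
echo "written: $(metric_value "$line" KB_written/s)"
```

The part after the "|" repeats the same values in the performance-data form that Nagios stores, so reading the part before it is enough here.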
If it fails with an error, first install sysstat on this machine:
[root@localhost libexec]# yum install -y sysstat
If it still fails, work through the errors one at a time based on the messages.
For example, I once got "bc: command not found"; the fix was: yum install bc

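Rather than discovering missing tools one error at a time, you could check for them up front. The sketch below only covers the two dependencies mentioned here, iostat (from sysstat) and bc; whether the plugin needs anything else is not something this post establishes. missing_commands is a hypothetical helper name.

```shell
# Hypothetical helper: print every command from the argument list that
# is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# iostat (from sysstat) and bc are the two dependencies mentioned above.
missing=$(missing_commands iostat bc)
if [ -n "$missing" ]; then
    echo "missing: $missing"
else
    echo "all dependencies found"
fi
```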
Once check_iostat runs correctly, move on to the configuration.

Nagios configuration
==========================
Monitoring the local host:
------------
Add check_iostat to commands.cfg:
define command{
        command_name    check_iostat
        command_line    $USER1$/check_iostat -d $ARG1$ -w $ARG2$ -c $ARG3$
}

This defines the check_iostat command, which takes three arguments.
Edit the local host's configuration file, say localhost.cfg,
and define a service in it:
define service{
        use                     local-service   ; Name of service template to use
        host_name               VOD-106
        service_description     Disk I/O
        check_period            24x7            ; The service can be checked at any time of the day
        max_check_attempts      3               ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval   2               ; Check the service every 2 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until a hard state can be determined
        contact_groups          admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options    w,u,c,r,f       ; Send notifications about warning, unknown, critical, recovery, and flapping events
        notification_interval   1               ; Re-send notifications every minute while the problem persists
        notification_period     24x7            ; Notifications can be sent out at any time
        check_command           check_iostat!sda!1000!2000
}
Note the check_command line above (shown in red in the original post).
Arguments are separated by exclamation marks.
There are three arguments here: sda, 1000 and 2000. They correspond to $ARG1$, $ARG2$ and $ARG3$ in the commands.cfg definition.

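To see how the pieces line up, here is a simplified imitation of that substitution in shell. It is only an illustration of the mapping, not how Nagios itself is implemented. expand_check_command is a hypothetical name, and the sketch assumes the arguments contain no characters special to sed such as | or &. $USER1$ is left untouched because its value is defined elsewhere in the Nagios configuration.

```shell
# Hypothetical helper: a simplified imitation of Nagios argument
# substitution, not the actual Nagios implementation.
expand_check_command() {
    template=$1     # command_line value from commands.cfg
    check=$2        # check_command value from the service definition

    # Split on "!": $1 becomes the command name, $2..$4 the arguments.
    old_ifs=$IFS
    IFS='!'
    set -- $check
    IFS=$old_ifs

    printf '%s\n' "$template" |
        sed -e 's|\$ARG1\$|'"$2"'|g' \
            -e 's|\$ARG2\$|'"$3"'|g' \
            -e 's|\$ARG3\$|'"$4"'|g'
}

expand_check_command '$USER1$/check_iostat -d $ARG1$ -w $ARG2$ -c $ARG3$' \
    'check_iostat!sda!1000!2000'
# prints: $USER1$/check_iostat -d sda -w 1000 -c 2000
```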
Reload the configuration:
service nagios reload

Monitoring a remote host:
------------

On the monitoring server, edit the configuration file for the remote server, for example remote.cfg.
Define the service:
define service{
        use                     generic-service ; Name of service template to use
        host_name               JP_VPS_2G
        service_description     Disk I/O
        check_command           check_nrpe!check_iostat
}

The check is performed by check_nrpe, which calls a command on the remote server. The command to run remotely is the argument to check_nrpe, that is, the part after the exclamation mark: check_iostat
So make sure the check_iostat plugin is present on the monitored machine. Install it the same way as before.
Also make sure check_nrpe can reach the remote machine. You can test this with:
[root@localhost libexec]# ./check_nrpe -H 111.111.44.111
NRPE v2.13

Then edit /usr/local/nagios/etc/nrpe.cfg on the monitored machine
and add the command:
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000
Restart the service on the monitored host:
service xinetd restart

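If you manage several monitored hosts, you may want to add that line from a script without duplicating it on a second run. The following is a rough sketch of one way to do that. add_nrpe_command is a hypothetical helper, and it is demonstrated on a scratch file rather than the real nrpe.cfg.

```shell
# Hypothetical helper: add a command[...] line to an NRPE config file
# only if no definition with that name is there yet.
add_nrpe_command() {
    cfg=$1      # path to nrpe.cfg
    name=$2     # command name, e.g. check_iostat
    cmdline=$3  # full command line to run

    if grep -q "^command\[$name]=" "$cfg"; then
        echo "command[$name] already defined in $cfg"
    else
        printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
        echo "added command[$name] to $cfg"
    fi
}

# Demonstrated on a scratch file so nothing real is modified.
cfg=$(mktemp)
add_nrpe_command "$cfg" check_iostat \
    "/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000"
add_nrpe_command "$cfg" check_iostat \
    "/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000"
cat "$cfg"      # the command line appears once
rm -f "$cfg"
```

The service still has to be restarted after the file changes, as described above.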
|