|
After downloading the plugin, upload it directly to the /usr/local/nagios/libexec/ directory on both the monitoring server and the monitored hosts.
Give it execute permission:
chmod u+x check_iostat

View its help:
[root@localhost libexec]# ./check_iostat -help

This plugin shows the I/O usage of the specified disk, using the iostat external program.
It prints three statistics: Transactions per second (tps), Kilobytes per second
read from the disk (KB_read/s) and and written to the disk (KB_written/s)

./check_iostat:
-d Device to be checked (without the full path, eg. sda)
-c ,, Sets the CRITICAL level for tps, KB_read/s and KB_written/s, respectively
-w ,, Sets the WARNING level for tps, KB_read/s and KB_written/s, respectively

As the help text shows, the plugin checks how much data is read from and written to a disk per second.
The parameters are:
-d: name of the device to check, without the full path
-c: threshold at which a CRITICAL alert is raised (per the help text, levels for tps, KB_read/s and KB_written/s)
-w: threshold at which a WARNING alert is raised (same three metrics)

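To make the threshold semantics concrete, here is a minimal sketch of how a single measured value could be mapped to a Nagios state. The `classify` function is a hypothetical helper, not code from check_iostat, and it assumes that a value at or above a level trips that level; the plugin's own comparison may differ.

```shell
#!/bin/sh
# Hypothetical helper, not taken from check_iostat itself.
# Assumption: a value at or above a threshold triggers that level.
classify() {
    # $1 = measured value, $2 = WARNING level, $3 = CRITICAL level
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 >= c + 0)      print "CRITICAL"
        else if (v + 0 >= w + 0) print "WARNING"
        else                     print "OK"
    }'
}

classify 26.77 1000 2000   # a quiet disk
classify 1500 1000 2000    # above the WARNING level only
```

awk is used because plain sh arithmetic cannot compare decimal values such as 26.77.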
Check the disks on this machine:
[root@localhost libexec]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      128G   27G   95G  22% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                 4.0G     0  4.0G   0% /dev/shm

The output above shows sda1, so pass sda after -d.
The device is not always sda, for example:
[root@li387-161 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda              79G   38G   40G  49% /
tmpfs                1009M  108K 1009M   1% /dev/shm

In this case, pass xvda after -d.

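The mapping from a df device path to the -d argument can be sketched as a small shell function. `disk_name` is a hypothetical helper; it assumes sdX or xvdX style names where trailing digits are a partition number, and it does not handle names such as nvme0n1p1 or LVM mapper paths.

```shell
#!/bin/sh
# Hypothetical helper: turn a df device path into the name passed to -d.
# Assumption: sdX / xvdX style names, where trailing digits are a partition number.
# Names such as nvme0n1p1 or /dev/mapper/... are NOT handled.
disk_name() {
    name=${1##*/}                              # drop the leading /dev/
    printf '%s\n' "$name" | sed 's/[0-9]*$//'  # drop a trailing partition number
}

disk_name /dev/sda1
disk_name /dev/xvda
```

For the two df listings above this prints sda and xvda.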
Check that the plugin runs:
[root@localhost libexec]# ./check_iostat -d sda -w 1000 -c 2000
OK - I/O stats tps=1.71 KB_read/s=2.77 KB_written/s=26.77 | 'tps'=1.71; 'KB_read/s'=2.77; 'KB_written/s'=26.77;

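A Nagios plugin prints a status text, then a `|`, then performance data, and reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). As a rough sketch, the sample line above can be split into those parts with plain shell; the variable names are illustrative only.

```shell
#!/bin/sh
# Sample line copied from the run above; a real script would capture the plugin's output.
line="OK - I/O stats tps=1.71 KB_read/s=2.77 KB_written/s=26.77 | 'tps'=1.71; 'KB_read/s'=2.77; 'KB_written/s'=26.77;"

status=${line%% *}      # first word: the state name
text=${line%% |*}       # everything before the pipe
perfdata=${line#*| }    # everything after the pipe

# Pull one number out of the human-readable part.
written=$(printf '%s\n' "$line" | sed -n 's/.*KB_written\/s=\([0-9.]*\) |.*/\1/p')

echo "status:   $status"
echo "text:     $text"
echo "perfdata: $perfdata"
echo "written:  $written KB/s"
```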
If it fails with an error, first install sysstat on this machine:
[root@localhost libexec]# yum install -y sysstat
If it still fails, work through the error messages one at a time.
For example, I ran into: bc: command not found. The fix: yum install bc

Once check_iostat runs correctly as shown above, move on to the configuration.

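The two dependencies mentioned above can be checked up front instead of waiting for the plugin to fail. This is a small sketch; `missing_deps` is a hypothetical helper name, and it only tests whether the commands are on PATH.

```shell
#!/bin/sh
# Print every named command that cannot be found on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# iostat is provided by the sysstat package; bc is a package of its own.
missing=$(missing_deps iostat bc)
if [ -n "$missing" ]; then
    echo "missing commands:" $missing
else
    echo "iostat and bc are both available"
fi
```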
Nagios configuration
==========================
Monitoring the local host:
------------
Add check_iostat to commands.cfg:
define command{
        command_name    check_iostat
        command_line    $USER1$/check_iostat -d $ARG1$ -w $ARG2$ -c $ARG3$
}

This defines the check_iostat command, which takes three arguments.

Next, edit the configuration file for the local host, say localhost.cfg,
and define a service in it:
define service{
        use                     local-service   ; Name of service template to use
        host_name               VOD-10
        service_description     Disk I/O
        check_period            24x7            ; The service can be checked at any time of the day
        max_check_attempts      3               ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval   2               ; Check the service every 2 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until a hard state can be determined
        contact_groups          admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options    w,u,c,r,f       ; Send notifications about warning, unknown, critical, recovery, and flapping events
        notification_interval   1               ; Re-send notifications every minute while the problem persists
        notification_period     24x7            ; Notifications can be sent out at any time
        check_command           check_iostat!sda!1000!2000
}

The line to note is check_command.
Arguments are separated by exclamation marks.
There are three arguments here: sda, 1000 and 2000. They correspond to $ARG1$, $ARG2$ and $ARG3$ in the command defined in commands.cfg.

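The substitution can be illustrated with a few lines of shell. This only mimics how Nagios maps the `!`-separated fields onto the `$ARGn$` macros; it does not involve Nagios itself, and it assumes `$USER1$` points at /usr/local/nagios/libexec (normally set in resource.cfg).

```shell
#!/bin/sh
# Illustration only: mimic how Nagios maps check_command fields onto $ARGn$ macros.
# Assumption: $USER1$ points at /usr/local/nagios/libexec (normally set in resource.cfg).
user1=/usr/local/nagios/libexec
check_command='check_iostat!sda!1000!2000'

# Split on "!": the first field is the command name, the rest are ARG1..ARG3.
IFS='!' read -r name arg1 arg2 arg3 <<EOF
$check_command
EOF

expanded="$user1/$name -d $arg1 -w $arg2 -c $arg3"
echo "$expanded"
```

The printed line matches the command that was run by hand earlier: check_iostat -d sda -w 1000 -c 2000, prefixed with the plugin directory.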
Reload the configuration:
service nagios reload

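A more cautious variant is to validate the configuration before reloading, using the `-v` option of the nagios binary. The sketch below assumes the binary and main config live under /usr/local/nagios, matching the plugin path used in this post; `reload_if_valid` is a hypothetical wrapper name, and both paths can be overridden.

```shell
#!/bin/sh
# Hypothetical wrapper: reload Nagios only if the configuration check passes.
# Assumption: default paths follow the /usr/local/nagios prefix used in this post.
reload_if_valid() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}
    if "$nagios_bin" -v "$nagios_cfg" >/dev/null; then
        service nagios reload
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}
```

Calling `reload_if_valid` with no arguments uses the assumed default paths; on a failed check, run the nagios -v command by hand to see the reported errors.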
Monitoring a remote host:
------------

On the monitoring server, edit the configuration file for the remote server, for example remote.cfg.

Define the service:
define service{
        use                     generic-service ; Name of service template to use
        host_name               JP_VPS_2G
        service_description     Disk I/O
        check_command           check_nrpe!check_iostat
}

This check goes through check_nrpe, which invokes a command on the remote server. The command to run remotely is the argument to check_nrpe, i.e. the part after the exclamation mark: check_iostat.
So make sure check_iostat is present on the monitored machine. Install it the same way as before.
Also make sure check_nrpe can reach the remote machine. You can test this with:
[root@localhost libexec]# ./check_nrpe -H 111.111.44.111
NRPE v2.13

Then edit /usr/local/nagios/etc/nrpe.cfg on the monitored machine
and add the command:
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000
Restart the service on the monitored host (NRPE runs under xinetd here):
service xinetd restart

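Once NRPE has been restarted, the whole chain can be tested from the monitoring server by asking check_nrpe to run the new command with its `-c` option. The sketch below wraps that call; `remote_check` is a hypothetical helper, and the default check_nrpe path is an assumption based on the plugin directory used in this post.

```shell
#!/bin/sh
# Hypothetical helper: run the remote check_iostat command through NRPE.
# Assumption: check_nrpe lives in /usr/local/nagios/libexec on the monitoring server.
remote_check() {
    host=$1
    nrpe=${2:-/usr/local/nagios/libexec/check_nrpe}
    # -H selects the monitored host, -c the command name defined in its nrpe.cfg
    "$nrpe" -H "$host" -c check_iostat
}
```

For example, `remote_check 111.111.44.111` should print the same kind of I/O stats line as the local run if everything is wired up correctly.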
|