I. Installation environment and preparation

Hardware: four virtual machines:
master1: 192.168.110.20
master2: 192.168.110.21
slave1: 192.168.110.22
slave2: 192.168.110.23
System: CentOS 6.5 (Red Hat family)
Hadoop version: hadoop-2.0.0-alpha, the latest release at the time of writing; the package is hadoop-2.0.0-alpha.tar.gz
Download address: http://apache.etoak.com/hadoop/common/hadoop-2.0.0-alpha/
JDK version: jdk-6u6-linux-i586.bin (JDK 1.6 is the minimum requirement)
Installing the virtual machines and Linux is not covered here; guides are easy to find with Google.
Create the required directories:
mkdir /usr/hadoop (Hadoop installation directory)
mkdir /usr/java (JDK installation directory)

II. Install the JDK (same on all nodes)
1. Upload the downloaded jdk-6u6-linux-i586.bin to /usr/java over SSH.
2. Go to the JDK installation directory with cd /usr/java and run chmod +x jdk-6u6-linux-i586.bin
3. Run ./jdk-6u6-linux-i586.bin (press Enter throughout and answer yes to every yes/no prompt; the installer prints "done" at the end when it succeeds).
4. Configure the environment variables: run cd /etc, then vi profile, and append the following at the end of the file (JAVA_HOME must match the directory the installer actually created):
export JAVA_HOME=/usr/java/jdk1.6.0_27
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH
5. Run chmod +x profile to make the file executable.
6. Run source profile so that the settings take effect immediately.
7. Run java -version to check that the installation succeeded.

III. Change the hostname (configure every node the same way)
1. Connect to the master node 192.168.110.20 and edit the network file: run cd /etc/sysconfig, then vi network, and set HOSTNAME=master1 (on the other nodes, use that node's own name).
2. Edit the hosts file: run cd /etc, then vi hosts, and append the following at the end of the file:
192.168.110.20 master1
192.168.110.21 master2
192.168.110.22 slave1
192.168.110.23 slave2
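Since the same four entries have to be added on every node, the edit can be scripted instead of done by hand in vi. The following is only a minimal sketch: add_host_entry and add_cluster_hosts are hypothetical helper names introduced here for illustration, and the check assumes each hosts line ends with the hostname, as in the entries above.

```shell
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless some line in FILE already ends with NAME.
# (Hypothetical helper for illustration; not part of the original post.)
add_host_entry() {
    hosts_file=$1
    host_ip=$2
    host_name=$3
    # Require whitespace before NAME so that "master1" does not match "xmaster1"
    if grep -Eq "[[:space:]]${host_name}\$" "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$host_ip" "$host_name" >> "$hosts_file"
}

# add_cluster_hosts FILE
# Adds the four nodes used in this tutorial to FILE.
add_cluster_hosts() {
    add_host_entry "$1" 192.168.110.20 master1
    add_host_entry "$1" 192.168.110.21 master2
    add_host_entry "$1" 192.168.110.22 slave1
    add_host_entry "$1" 192.168.110.23 slave2
}

# Usage (as root, on each node):
#   add_cluster_hosts /etc/hosts
```

Running it a second time leaves the file unchanged, so it is safe to repeat on nodes that were already configured.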
3. Run hostname master1
4. Run exit and reconnect; the new hostname should now be shown.

IV. Configure passwordless SSH login
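1. How passwordless SSH works, in brief: first, a key pair consisting of a public key and a private key is generated on the master, and the public key is copied to every slave.
Then, when the master connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master.
Finally, the master decrypts it with its private key and sends the decrypted number back to the slave; once the slave confirms that the number is correct, it lets the master connect without entering a password.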
2. Detailed steps:
(1) Run ssh-keygen -t rsa and press Enter at every prompt. To look at the newly generated passphrase-less key pair, run cd .ssh and then ll
(2) Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Set the permissions: chmod 600 ~/.ssh/authorized_keys
(4) Make sure /etc/ssh/sshd_config (view it with cat /etc/ssh/sshd_config) contains the following:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If you had to change anything, restart the SSH service so that it takes effect: service sshd restart
(5) Copy the public key to every slave machine: scp ~/.ssh/id_rsa.pub 192.168.110.22:~/ then type yes, and finally enter the slave's password.
(6) On the slave, create the .ssh folder: mkdir ~/.ssh and then chmod 700 ~/.ssh (skip the creation if the folder already exists).
(7) Append the key to the authorized_keys file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys and then chmod 600 ~/.ssh/authorized_keys
(8) Repeat step (4).
(9) Verify: on the master, run ssh 192.168.110.22; if the hostname in the prompt changes from master1 to slave1, it worked. Finally, delete the copied id_rsa.pub file: rm -r id_rsa.pub
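The slave-side part of the steps above (creating .ssh, appending the key, fixing permissions) is easy to get subtly wrong, for example by appending the same key twice. As a rough sketch only, it could be wrapped in a helper like the one below. authorize_key is a hypothetical name introduced for illustration, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
# authorize_key PUBKEY_FILE [SSH_DIR]
# Adds the key in PUBKEY_FILE to SSH_DIR/authorized_keys (default: ~/.ssh),
# creating the directory if needed and applying the 700/600 permissions
# used in the steps above. (Hypothetical helper, not from the original post.)
authorize_key() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth_file=$ssh_dir/authorized_keys

    if [ ! -r "$pubkey_file" ]; then
        echo "cannot read $pubkey_file" >&2
        return 1
    fi

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" || return 1

    # Append only if this exact key line is not already present
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Usage on a slave, after the scp in step 5:
#   authorize_key ~/id_rsa.pub && rm ~/id_rsa.pub
```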
3. Follow the steps above on master1, master2, slave1, and slave2 in turn, so that every master and every slave can log in to each other without a password.

V. Install Hadoop (same on all nodes)
1. Upload hadoop-2.0.0-alpha.tar.gz to the Hadoop installation directory /usr/hadoop
2. Unpack the archive: tar -zxvf hadoop-2.0.0-alpha.tar.gz
3. Create the tmp folder: mkdir /usr/hadoop/tmp
4. Configure the environment variables: vi /etc/profile
export HADOOP_DEV_HOME=/usr/hadoop/hadoop-2.0.0-alpha
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
5. Configure Hadoop
The configuration files are in /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop
(1) Create and configure hadoop-env.sh
vi /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop/hadoop-env.sh and append export JAVA_HOME=/usr/java/jdk1.6.0_27 at the end
(2) Configure core-site.xml by adding the following property inside its configuration element:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/hadoop/tmp</value>
</property>
(3) Create and configure slaves: vi slaves and add the following
192.168.110.22
192.168.110.23
(4) Configure hdfs-site.xml
<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/hadoop/hdfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.federation.nameservice.id</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.namenode.backup.address.ns1</name>
  <value>192.168.110.23:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address.ns1</name>
  <value>192.168.110.23:50105</value>
</property>
<property>
  <name>dfs.federation.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>192.168.110.20:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>192.168.110.21:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1</name>
  <value>192.168.110.20:23001</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns2</name>
  <value>192.168.110.21:13001</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/hadoop/hdfs/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns1</name>
  <value>192.168.110.20:23002</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns2</name>
  <value>192.168.110.21:23002</value>
</property>
<!-- The next two properties repeat the two keys above with port 23003, as in the original post. Only one value per key can take effect, so keep the pair you intend to use. -->
<property>
  <name>dfs.namenode.secondary.http-address.ns1</name>
  <value>192.168.110.20:23003</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns2</name>
  <value>192.168.110.21:23003</value>
</property>
</configuration>
(5) Configure yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>192.168.110.20:18040</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>192.168.110.20:18030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>192.168.110.20:18088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>192.168.110.20:18025</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>192.168.110.20:18141</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
</configuration>

VI. Start the Hadoop cluster and test WordCount
1. Format the NameNode: run the following on both masters: hadoop namenode -format -clusterid eric
2. Start Hadoop: on master1, run start-all.sh, or run start-dfs.sh followed by start-yarn.sh
3. Run jps on every node; output like the following means the daemons started successfully:
[root@master1 hadoop]# jps
1956 Bootstrap
4183 Jps
3938 ResourceManager
3845 SecondaryNameNode
3652 NameNode
[root@master2 ~]# jps
3778 Jps
1981 Bootstrap
3736 SecondaryNameNode
3633 NameNode
[root@slave1 ~]# jps
3766 Jps
3675 NodeManager
3551 DataNode
[root@slave2 ~]# jps
3675 NodeManager
3775 Jps
3551 DataNode
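Comparing jps listings by eye across four machines is error-prone, so the check can be automated. The sketch below assumes that jps prints one "pid name" pair per line, as in the listings above. check_daemons is a hypothetical helper name introduced here, and the daemon names in the usage lines are the ones from this tutorial's output.

```shell
# check_daemons "JPS_OUTPUT" NAME...
# Succeeds only if every NAME appears as a process name in JPS_OUTPUT;
# prints each missing name to stderr otherwise.
# (Hypothetical helper for illustration; not part of the original post.)
check_daemons() {
    jps_output=$1
    shift
    missing=0
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second field exactly
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on a running node:
#   check_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager   # master1
#   check_daemons "$(jps)" NameNode SecondaryNameNode                   # master2
#   check_daemons "$(jps)" DataNode NodeManager                         # slaves
```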
4. On master1, create the input directory: hadoop fs -mkdir hdfs://192.168.110.20:9000/input
5. Copy all the txt files under /usr/hadoop/hadoop-2.0.0-alpha/ into that directory on HDFS with the following command:
hadoop fs -put /usr/hadoop/hadoop-2.0.0-alpha/*.txt hdfs://192.168.110.20:9000/input
6. On master1, run the wordcount example that ships with Hadoop:
cd /usr/hadoop/hadoop-2.0.0-alpha/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.0-alpha.jar wordcount hdfs://192.168.110.20:9000/input hdfs://192.168.110.20:9000/output
7. On master1, check the results as follows:
[root@master1 hadoop]# hadoop fs -ls hdfs://192.168.110.20:9000/output
Found 2 items
-rw-r--r-- 2 root supergroup 0 2012-06-29 22:59 hdfs://192.168.110.20:9000/output/_SUCCESS
-rw-r--r-- 2 root supergroup 8739 2012-06-29 22:59 hdfs://192.168.110.20:9000/output/part-r-00000
[root@master1 hadoop]# hadoop fs -ls hdfs://192.168.110.20:9000/input
Found 3 items
-rw-r--r-- 2 root supergroup 15164 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/LICENSE.txt
-rw-r--r-- 2 root supergroup 101 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/README.txt
[root@master1 hadoop]# hadoop fs -cat hdfs://192.168.110.20:9000/output/part-r-00000
The last command prints the count for each word.
8. The NameNode status page can be opened in a browser such as IE: http://192.168.110.20:23001/dfshealth.jsp
This completes the whole process.
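For repeated test runs, steps 4 to 7 of the WordCount test can be collected into a single function. This is only a sketch under stated assumptions: run_wordcount and the variables HADOOP_CMD, HADOOP_HOME_DIR, and NAMENODE_URI are names introduced here for illustration, with defaults taken from the paths and addresses used in this tutorial.

```shell
# Defaults mirror this tutorial; override them in the environment if needed.
# (All three variable names are hypothetical, chosen for this sketch.)
HADOOP_CMD=${HADOOP_CMD:-hadoop}
HADOOP_HOME_DIR=${HADOOP_HOME_DIR:-/usr/hadoop/hadoop-2.0.0-alpha}
NAMENODE_URI=${NAMENODE_URI:-hdfs://192.168.110.20:9000}

# run_wordcount
# Creates the input directory, uploads the *.txt files from the Hadoop
# directory, runs the bundled wordcount example, and prints the result.
# Stops at the first failing command.
run_wordcount() {
    examples_jar=$HADOOP_HOME_DIR/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-alpha.jar

    "$HADOOP_CMD" fs -mkdir "$NAMENODE_URI/input" || return 1
    "$HADOOP_CMD" fs -put "$HADOOP_HOME_DIR"/*.txt "$NAMENODE_URI/input" || return 1
    "$HADOOP_CMD" jar "$examples_jar" wordcount \
        "$NAMENODE_URI/input" "$NAMENODE_URI/output" || return 1
    "$HADOOP_CMD" fs -cat "$NAMENODE_URI/output/part-r-00000"
}

# Usage on master1, once the cluster is running:
#   run_wordcount
```

Note that a MapReduce job refuses to write into an output directory that already exists, so the output directory has to be removed before the function is run a second time.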