I. Environment and preparation
Hardware: four virtual machines: master1 (192.168.110.20), master2 (192.168.110.21), slave1 (192.168.110.22), slave2 (192.168.110.23)
OS: Red Hat CentOS 6.5
Hadoop version: hadoop-2.0.0-alpha, the latest release at the time of writing; the package is hadoop-2.0.0-alpha.tar.gz
Download address: http://apache.etoak.com/hadoop/common/hadoop-2.0.0-alpha/
JDK version: jdk-6u6-linux-i586.bin (JDK 1.6 is the minimum requirement)
Installing the virtual machines and Linux is not covered here; plenty of guides turn up on Google.
Create the directories: mkdir /usr/hadoop (Hadoop install directory) and mkdir /usr/java (JDK install directory)

II. Install the JDK (same on every node)
1. Upload the downloaded jdk-6u6-linux-i586.bin to /usr/java over SSH.
2. Go to the JDK install directory with cd /usr/java and run chmod +x jdk-6u6-linux-i586.bin
3. Run ./jdk-6u6-linux-i586.bin (press Enter at each prompt and answer yes to every yes/no question; the installer prints "done" when it has finished successfully)
4. Set the environment variables: run cd /etc, then vi profile, and append the following lines at the end of the file (adjust JAVA_HOME if the installer created a different directory under /usr/java):
export JAVA_HOME=/usr/java/jdk1.6.0_27
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH
5. Run chmod +x profile to make the file executable.
6. Run source profile so the settings take effect immediately.
7. Run java -version to check that the installation succeeded.

III. Change the hostname (same procedure on every node)
1. Connect to the master node 192.168.110.20 and edit the network file: run cd /etc/sysconfig, then vi network, and set HOSTNAME=master1 (on the other nodes, use that node's own name)
2. Edit the hosts file: run cd /etc, then vi hosts, and append the following lines at the end:
192.168.110.20 master1
192.168.110.21 master2
192.168.110.22 slave1
192.168.110.23 slave2
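Because the same four entries have to be added on every node, the edit in step 2 can be scripted so that running it twice does not duplicate lines. The following is a minimal sketch; add_host_entry is a hypothetical helper name, and the third argument exists only so the function can be tried on a scratch file before touching /etc/hosts.

```shell
#!/usr/bin/env bash
# Append "ip name" to a hosts file unless that host name is already listed.
# add_host_entry is a hypothetical helper; the file defaults to /etc/hosts.
add_host_entry() {
  local ip=$1 name=$2 file=${3:-/etc/hosts}
  # Look for the name in any non-comment line, in column 2 or later.
  if ! awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}
```

As root on each node, the four calls would be add_host_entry 192.168.110.20 master1, add_host_entry 192.168.110.21 master2, add_host_entry 192.168.110.22 slave1 and add_host_entry 192.168.110.23 slave2.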
3. Run hostname master1
4. Run exit and reconnect; the prompt now shows the new hostname.

IV. Configure passwordless SSH login
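1. How passwordless SSH works, in brief: first, a key pair consisting of a public key and a private key is generated on the master, and the public key is copied to every slave.
Then, when the master connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master.
Finally, the master decrypts it with its private key and sends the result back; once the slave has confirmed that the decrypted number is correct, it lets the master connect without a password.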
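The exchange described above can be imitated locally with openssl to see why only the holder of the private key can answer the challenge. This is a sketch of the simplified description, not of the SSH wire protocol: current OpenSSH (protocol 2) proves possession of the private key with a signature rather than by decrypting a random number. The function name and the file names inside the scratch directory are made up for the demo, and sshd never reads them.

```shell
#!/usr/bin/env bash
# Illustration of the challenge-response idea described above, using openssl.
# Both "master" and "slave" roles are played by this one script.
challenge_response_demo() {
  local workdir challenge response
  workdir=$(mktemp -d) || return 1

  # Master: generate a key pair; private.pem never leaves the master.
  openssl genrsa -out "$workdir/private.pem" 2048 2>/dev/null
  openssl rsa -in "$workdir/private.pem" -pubout -out "$workdir/public.pem" 2>/dev/null

  # Slave: create a random challenge and encrypt it with the master's public key.
  challenge=$(openssl rand -hex 16)
  printf '%s' "$challenge" |
    openssl pkeyutl -encrypt -pubin -inkey "$workdir/public.pem" -out "$workdir/challenge.enc"

  # Master: decrypt the challenge with the private key and send it back.
  response=$(openssl pkeyutl -decrypt -inkey "$workdir/private.pem" -in "$workdir/challenge.enc")

  rm -rf "$workdir"

  # Slave: allow the login only if the returned value matches the challenge.
  if [ -n "$response" ] && [ "$response" = "$challenge" ]; then
    echo "login allowed"
  else
    echo "login refused"
    return 1
  fi
}

challenge_response_demo
```

Decrypting with a different private key would fail or yield a different value, so the comparison on the slave side would refuse the login.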
2. Detailed steps:
1) Run ssh-keygen -t rsa and press Enter at every prompt. To see the key pair that was just generated (it has no passphrase), run cd .ssh and then ll
2) Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3) Fix the permissions: chmod 600 ~/.ssh/authorized_keys
4) Run cat /etc/ssh/sshd_config and make sure the file contains the following lines:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If you had to change the file, restart the SSH service so the change takes effect: service sshd restart
5) Copy the public key to every slave: scp ~/.ssh/id_rsa.pub 192.168.110.22:~/ then answer yes and enter the slave's password.
6) On the slave, create the .ssh directory: mkdir ~/.ssh and then chmod 700 ~/.ssh (skip the mkdir if the directory already exists)
7) Append the key to the authorized_keys file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys and then chmod 600 ~/.ssh/authorized_keys
8) Repeat step 4) on the slave.
9) Verify: on the master, run ssh 192.168.110.22; if the hostname in the prompt changes from master1 to slave1, it worked. Finally, delete the copied id_rsa.pub file: rm -r id_rsa.pub
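Steps 6) and 7) are repeated for every master/slave pair, so it can help to wrap them in one function that is safe to run more than once. The sketch below assumes the public key has already been copied to the slave as in step 5); authorize_key is a hypothetical helper name, not part of OpenSSH, and the second argument exists only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch of steps 6) and 7) as one repeatable function (run on the slave).
authorize_key() {
  local pubkey_file=$1              # e.g. ~/id_rsa.pub copied over in step 5)
  local ssh_dir=${2:-$HOME/.ssh}    # target directory, defaults to ~/.ssh
  local auth_file=$ssh_dir/authorized_keys

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  # Append the key only if that exact line is not already present.
  if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}
```

On the slave this would be called as authorize_key ~/id_rsa.pub. OpenSSH also ships ssh-copy-id, which performs the copy and the append from the master in one command (for example ssh-copy-id root@192.168.110.22), if it is installed on the system.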
3. Follow the steps above on master1, master2, slave1 and slave2 so that every master and every slave can log in to each other without a password.

V. Install Hadoop (same on every node)
1. Upload hadoop-2.0.0-alpha.tar.gz to the Hadoop install directory /usr/hadoop
2. Unpack the archive: tar -zxvf hadoop-2.0.0-alpha.tar.gz
3. Create the tmp directory: mkdir /usr/hadoop/tmp
4. Set the environment variables: run vi /etc/profile and add:
export HADOOP_DEV_HOME=/usr/hadoop/hadoop-2.0.0-alpha
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
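A misspelled variable name in /etc/profile fails silently, so a quick check after source /etc/profile can save time later. The sketch below lists the variables the Hadoop 2.x scripts read, including HADOOP_MAPRED_HOME, and reports any that are unset in the current shell; check_hadoop_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report which of the Hadoop environment variables are set in this shell.
# Returns 0 when all are set and 1 when at least one is missing.
check_hadoop_env() {
  local name missing=0
  for name in HADOOP_DEV_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
              HADOOP_HDFS_HOME YARN_HOME \
              HADOOP_CONF_DIR HDFS_CONF_DIR YARN_CONF_DIR; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name"
      missing=1
    else
      echo "$name=${!name}"
    fi
  done
  return "$missing"
}
```

Running check_hadoop_env in a fresh login shell on each node shows whether the profile changes were picked up there.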
5. Configure Hadoop
The configuration files are under /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop
1) Create and configure hadoop-env.sh:
Run vi /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop/hadoop-env.sh and append export JAVA_HOME=/usr/java/jdk1.6.0_27 at the end.
2) Configure core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/hadoop/tmp</value>
</property>
3) Create and configure slaves: run vi slaves and add the following lines:
192.168.110.22
192.168.110.23
4) Configure hdfs-site.xml:
<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/hadoop/hdfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.federation.nameservice.id</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.namenode.backup.address.ns1</name>
  <value>192.168.110.23:50100</value>
</property>
<property>
  <name>dfs.namenode.backup.http-address.ns1</name>
  <value>192.168.110.23:50105</value>
</property>
<property>
  <name>dfs.federation.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>192.168.110.20:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>192.168.110.21:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1</name>
  <value>192.168.110.20:23001</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns2</name>
  <value>192.168.110.21:13001</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/hadoop/hdfs/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns1</name>
  <value>192.168.110.20:23002</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns2</name>
  <value>192.168.110.21:23002</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns1</name>
  <value>192.168.110.20:23003</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns2</name>
  <value>192.168.110.21:23003</value>
</property>
</configuration>
5) Configure yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>192.168.110.20:18040</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>192.168.110.20:18030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>192.168.110.20:18088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>192.168.110.20:18025</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>192.168.110.20:18141</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
</configuration>

VI. Start the Hadoop cluster and test WordCount
1. Format the NameNodes: on each of the two masters, run hadoop namenode -format -clusterid eric
2. Start Hadoop: on master1, run start-all.sh, or run start-dfs.sh followed by start-yarn.sh
3. Run jps on every node; output like the following means the daemons started successfully:
[root@master1 hadoop]# jps
1956 Bootstrap
4183 Jps
3938 ResourceManager
3845 SecondaryNameNode
3652 NameNode
[root@master2 ~]# jps
3778 Jps
1981 Bootstrap
3736 SecondaryNameNode
3633 NameNode
[root@slave1 ~]# jps
3766 Jps
3675 NodeManager
3551 DataNode
[root@slave1 ~]# jps
3675 NodeManager
3775 Jps
3551 DataNode
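The comparison in step 3 can be scripted instead of read by eye. The sketch below defines two hypothetical helpers: expect_daemons reads jps output on stdin and reports any expected daemon that is absent, and duplicate_properties lists property names that appear more than once in a site file. The second check is a companion for when a daemon is missing: a repeated name is easy to overlook and leaves it unclear which value was intended, and the hdfs-site.xml listing above does set dfs.namenode.secondary.http-address.ns1 and dfs.namenode.secondary.http-address.ns2 twice with different ports. The duplicate check assumes each <name> element sits on one line, as in the listings.

```shell
#!/usr/bin/env bash
# Check that every expected daemon name appears in jps output read from stdin.
# Usage: jps | expect_daemons NameNode SecondaryNameNode
expect_daemons() {
  local output daemon status=0
  output=$(cat)
  for daemon in "$@"; do
    # jps prints "<pid> <main class>", so compare against column 2.
    if ! printf '%s\n' "$output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "not running: $daemon"
      status=1
    fi
  done
  return "$status"
}

# List property names that occur more than once in a Hadoop *-site.xml file.
# Assumes each <name>...</name> element is written on a single line.
duplicate_properties() {
  grep -o '<name>[^<]*</name>' "$1" |
    sed -e 's/<name>//' -e 's/<\/name>//' |
    sort | uniq -d
}
```

Going by the output shown in step 3, the calls would be jps | expect_daemons NameNode SecondaryNameNode ResourceManager on master1, jps | expect_daemons NameNode SecondaryNameNode on master2, and jps | expect_daemons DataNode NodeManager on each slave. The site files can be scanned with duplicate_properties /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop/hdfs-site.xml.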
4. On master1, create the input directory: hadoop fs -mkdir hdfs://192.168.110.20:9000/input
5. Copy all the txt files under /usr/hadoop/hadoop-2.0.0-alpha/ into that directory on HDFS:
hadoop fs -put /usr/hadoop/hadoop-2.0.0-alpha/*.txt hdfs://192.168.110.20:9000/input
6. On master1, run the wordcount example that ships with Hadoop:
cd /usr/hadoop/hadoop-2.0.0-alpha/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.0-alpha.jar wordcount hdfs://192.168.110.20:9000/input hdfs://192.168.110.20:9000/output
7. On master1, check the results with the following commands:
[root@master1 hadoop]# hadoop fs -ls hdfs://192.168.110.20:9000/output
Found 2 items
-rw-r--r-- 2 root supergroup 0 2012-06-29 22:59 hdfs://192.168.110.20:9000/output/_SUCCESS
-rw-r--r-- 2 root supergroup 8739 2012-06-29 22:59 hdfs://192.168.110.20:9000/output/part-r-00000
[root@master1 hadoop]# hadoop fs -ls hdfs://192.168.110.20:9000/input
Found 3 items
-rw-r--r-- 2 root supergroup 15164 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/LICENSE.txt
-rw-r--r-- 2 root supergroup 101 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/README.txt
[root@master1 hadoop]# hadoop fs -cat hdfs://192.168.110.20:9000/output/part-r-00000
The last command prints the count for each word.
8. The cluster can also be viewed in a browser such as IE: http://192.168.110.20:23001/dfshealth.jsp
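The part-r-00000 file from step 7 holds one word and its count per line, separated by a tab, in word order. To see the most frequent words first, the output can be piped through sort. The sketch below is one way to do that; top_words is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the N most frequent words from WordCount output read on stdin.
# Each input line is "<word><TAB><count>"; N defaults to 10.
top_words() {
  local n=${1:-10}
  # Sort by the count column (numeric, descending), then by word for ties.
  sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}
```

For example: hadoop fs -cat hdfs://192.168.110.20:9000/output/part-r-00000 | top_words 20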
That completes the whole process.