|
|
I. Installation environment and preparation
Hardware: 4 virtual machines: master1 (192.168.110.20), master2 (192.168.110.21), slave1 (192.168.110.22), slave2 (192.168.110.23)
OS: CentOS 6.5 (Red Hat family)
Hadoop version: hadoop-2.0.0-alpha, the latest release at the time of writing; the package is hadoop-2.0.0-alpha.tar.gz
Download address: http://apache.etoak.com/hadoop/common/hadoop-2.0.0-alpha/
JDK version: jdk-6u6-linux-i586.bin (JDK 1.6 is the minimum requirement)
Installing the virtual machines and Linux is not covered here; guides are easy to find on Google.
Create the required directories: mkdir /usr/hadoop (Hadoop install directory) and mkdir /usr/java (JDK install directory)

II. Install the JDK (same on every node)
1. Upload the downloaded jdk-6u6-linux-i586.bin to /usr/java over SSH
2. Change to the JDK install directory with cd /usr/java and run chmod +x jdk-6u6-linux-i586.bin
3. Run ./jdk-6u6-linux-i586.bin (press Enter throughout and answer yes to every yes/no prompt; the installer prints "done" at the end when it succeeds)
4. Configure the environment variables: run cd /etc, then vi profile, and append the following at the end of the file (JAVA_HOME must match the directory the installer actually created):
export JAVA_HOME=/usr/java/jdk1.6.0_27
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH
5. Run chmod +x profile to make the file executable
6. Run source profile so the settings take effect immediately
7. Run java -version to check that the installation succeeded

III. Change the hostname (configure every node the same way)
1. Connect to the master node 192.168.110.20 and edit the network file: run cd /etc/sysconfig, then vi network, and set HOSTNAME=master1 (on the other nodes, use that node's own name)
2. Edit the hosts file: run cd /etc, then vi hosts, and append the following at the end of the file:
192.168.110.20 master1
192.168.110.21 master2
192.168.110.22 slave1
192.168.110.23 slave2
3. Run hostname master1
4. Run exit and reconnect; the prompt should now show the new hostname

IV. Configure passwordless SSH login
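1. How passwordless SSH works, in brief: the master first generates a key pair (a public key and a private key) and copies the public key to every slave.
When the master later connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master.
The master decrypts it with its private key and sends the decrypted value back. Once the slave has confirmed that the value is correct, it lets the master connect without entering a password.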
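To make the exchange above concrete, here is a toy walk-through in plain shell arithmetic. It is only a sketch: the tiny RSA numbers (n = 3233, e = 17, d = 2753), the fixed challenge value and the helper name modpow are all illustrative assumptions. The numbers are far too small for real use, and this is not what sshd literally runs. It just mirrors the encrypt, decrypt and compare sequence described.

```shell
#!/bin/sh
# Toy RSA key pair (illustrative numbers only; real keys are thousands of bits).
n=3233   # modulus (61 * 53), shared by both halves of the key pair
e=17     # public exponent:  the "public key" copied to the slave
d=2753   # private exponent: the "private key" that stays on the master

# modpow BASE EXP MOD: square-and-multiply modular exponentiation.
modpow() {
    mp_base=$(( $1 % $3 ))
    mp_exp=$2
    mp_mod=$3
    mp_acc=1
    while [ "$mp_exp" -gt 0 ]; do
        if [ $(( mp_exp % 2 )) -eq 1 ]; then
            mp_acc=$(( mp_acc * mp_base % mp_mod ))
        fi
        mp_base=$(( mp_base * mp_base % mp_mod ))
        mp_exp=$(( mp_exp / 2 ))
    done
    echo "$mp_acc"
}

# Slave: pick a challenge (fixed here; random in practice) and encrypt it
# with the master's public key.
challenge=1234
encrypted=$(modpow "$challenge" "$e" "$n")

# Master: decrypt with the private key and send the value back.
response=$(modpow "$encrypted" "$d" "$n")

# Slave: allow the login only if the returned value matches the challenge.
if [ "$response" -eq "$challenge" ]; then
    verdict="login allowed"
else
    verdict="login refused"
fi
echo "$verdict"
```

Changing d to any other value makes the comparison fail, which is the point: only the holder of the private key can return the right number.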
2. Steps:
1) Run ssh-keygen -t rsa and press Enter at every prompt. To look at the newly generated passphrase-less key pair, run cd .ssh and then ll
2) Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3) Fix the permissions: chmod 600 ~/.ssh/authorized_keys
4) Make sure /etc/ssh/sshd_config (view it with cat /etc/ssh/sshd_config) contains the following:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If the file had to be changed, restart the SSH service afterwards so the change takes effect: service sshd restart
5) Copy the public key to every slave machine: scp ~/.ssh/id_rsa.pub 192.168.110.22:~/ then answer yes and enter the slave's password
6) On the slave, create the .ssh directory: mkdir ~/.ssh, then chmod 700 ~/.ssh (skip the mkdir if the directory already exists)
7) Append the key to authorized_keys: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys, then chmod 600 ~/.ssh/authorized_keys
8) Repeat step 4) (on the slave)
9) Verify: on the master run ssh 192.168.110.22; if the hostname in the prompt changes from master1 to slave1, it worked. Finally, delete the copied id_rsa.pub on the slave: rm -r id_rsa.pub
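Steps 6 and 7 on the slave can be wrapped in a small function, so that running them a second time does not add the same key twice. This is only a sketch: the function name authorize_key and the placeholder key line in the demo are made up for illustration, and the demo runs in a scratch directory rather than a real home directory.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE [HOME_DIR]
# Create HOME_DIR/.ssh (mode 700) and append the key to authorized_keys
# (mode 600), skipping keys that are already listed.
authorize_key() {
    pubkey_file=$1
    home_dir=${2:-$HOME}
    ssh_dir=$home_dir/.ssh
    auth_file=$ssh_dir/authorized_keys

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" || return 1
    chmod 600 "$auth_file" || return 1

    # -x whole line, -F fixed strings, -f read the pattern(s) from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Demo in a scratch directory with a placeholder (not a real) key line.
demo_home=$(mktemp -d)
echo "ssh-rsa PLACEHOLDER-KEY-MATERIAL root@master1" > "$demo_home/id_rsa.pub"
authorize_key "$demo_home/id_rsa.pub" "$demo_home"
authorize_key "$demo_home/id_rsa.pub" "$demo_home"   # second run adds nothing
cat "$demo_home/.ssh/authorized_keys"
```

On a real slave the call would simply be authorize_key ~/id_rsa.pub, run as the user that the master logs in as.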
3. Repeat the steps above on master1, master2, slave1 and slave2 as needed, so that passwordless login works between every master and every slave

V. Install Hadoop (same on every node)
1. Upload hadoop-2.0.0-alpha.tar.gz to the Hadoop install directory /usr/hadoop
2. Unpack the archive: tar -zxvf hadoop-2.0.0-alpha.tar.gz
3. Create the tmp directory: mkdir /usr/hadoop/tmp
4. Configure the environment variables: vi /etc/profile
export HADOOP_DEV_HOME=/usr/hadoop/hadoop-2.0.0-alpha
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
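After running source /etc/profile, a quick check that these variables really are set can save some debugging later. A minimal sketch: the function name check_hadoop_env is made up here, and the list checks HADOOP_MAPRED_HOME, which is the spelling Hadoop's own scripts read.

```shell
#!/bin/sh
# check_hadoop_env: print every expected variable that is unset or empty, and
# complain if HADOOP_CONF_DIR is set but is not a directory.
# Returns 0 only when nothing is wrong.
check_hadoop_env() {
    problems=0
    for var in HADOOP_DEV_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
               HADOOP_HDFS_HOME YARN_HOME HADOOP_CONF_DIR HDFS_CONF_DIR \
               YARN_CONF_DIR; do
        eval "value=\${$var:-}"          # indirect lookup of the variable named in $var
        if [ -z "$value" ]; then
            echo "unset: $var"
            problems=$((problems + 1))
        fi
    done
    if [ -n "${HADOOP_CONF_DIR:-}" ] && [ ! -d "$HADOOP_CONF_DIR" ]; then
        echo "not a directory: $HADOOP_CONF_DIR"
        problems=$((problems + 1))
    fi
    [ "$problems" -eq 0 ]
}

check_hadoop_env || echo "Hadoop environment is incomplete"
```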
5. Configure Hadoop
The configuration files are in /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop
1) Create and configure hadoop-env.sh
vi /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop/hadoop-env.sh and append export JAVA_HOME=/usr/java/jdk1.6.0_27 at the end
2) Configure core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/hadoop/tmp</value>
</property>
3) Create and configure slaves: vi slaves and add the following
192.168.110.22
192.168.110.23
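The start scripts log in to every host listed in slaves over SSH, so it can help to confirm that passwordless login works for each entry before going further. A sketch under assumptions: the function names check_slaves and ssh_probe are invented here. The default probe relies on ssh's BatchMode and ConnectTimeout options, so it fails instead of prompting for a password, and a different probe can be passed in for a dry run.

```shell
#!/bin/sh
# Default probe: succeeds only if key-based login to the host works
# without any interactive prompt.
ssh_probe() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# check_slaves SLAVES_FILE [PROBE]
# Run PROBE once per host in SLAVES_FILE (blank lines and '#' comments are
# skipped) and print the hosts for which it fails. Returns 0 if all succeed.
check_slaves() {
    slaves_file=$1
    probe=${2:-ssh_probe}
    failed=0
    while read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the probe (ssh) from swallowing the rest of the file
        if ! "$probe" "$host" </dev/null; then
            echo "unreachable: $host"
            failed=1
        fi
    done < "$slaves_file"
    [ "$failed" -eq 0 ]
}

# Example (run from the configuration directory on a master):
#   check_slaves slaves
```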
4) Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/hadoop/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.federation.nameservice.id</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.namenode.backup.address.ns1</name>
    <value>192.168.110.23:50100</value>
  </property>
  <property>
    <name>dfs.namenode.backup.http-address.ns1</name>
    <value>192.168.110.23:50105</value>
  </property>
  <property>
    <name>dfs.federation.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>192.168.110.20:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>192.168.110.21:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>192.168.110.20:23001</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>192.168.110.21:13001</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/hadoop/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>192.168.110.20:23002</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>192.168.110.21:23002</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>192.168.110.20:23003</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>192.168.110.21:23003</value>
  </property>
</configuration>
5) Configure yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.110.20:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.110.20:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.110.20:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.110.20:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.110.20:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>

VI. Start the Hadoop cluster and test wordcount
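Before starting the daemons, it may be worth scanning the XML files edited above for property names that appear more than once, as happens with the secondary http-address entries in the hdfs-site.xml listing. A rough sketch: the function name list_duplicate_props is made up, the demo file is a cut-down stand-in for the real one, and the sed pattern assumes at most one <name> element per line.

```shell
#!/bin/sh
# list_duplicate_props FILE
# Print each <name>...</name> value that occurs more than once in FILE.
list_duplicate_props() {
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1" | sort | uniq -d
}

# Demo on a small stand-in file.
demo_xml=$(mktemp)
cat > "$demo_xml" <<'EOF'
<configuration>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>192.168.110.20:9000</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns1</name>
  <value>192.168.110.20:23002</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address.ns1</name>
  <value>192.168.110.20:23003</value>
</property>
</configuration>
EOF
list_duplicate_props "$demo_xml"
```

Against the real files the call would be list_duplicate_props /usr/hadoop/hadoop-2.0.0-alpha/etc/hadoop/hdfs-site.xml; an empty result means no name is repeated.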
1. Format the NameNode: on each of the two masters, run hadoop namenode -format -clusterid eric
2. Start Hadoop: on master1, run start-all.sh, or run start-dfs.sh followed by start-yarn.sh
3. Run jps on every node; output like the following means the daemons started successfully:
[root@master1 hadoop]# jps
1956 Bootstrap
4183 Jps
3938 ResourceManager
3845 SecondaryNameNode
3652 NameNode
[root@master2 ~]# jps
3778 Jps
1981 Bootstrap
3736 SecondaryNameNode
3633 NameNode
[root@slave1 ~]# jps
3766 Jps
3675 NodeManager
3551 DataNode
[root@slave1 ~]# jps
3675 NodeManager
3775 Jps
3551 DataNode
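Reading jps output by eye on four machines gets tedious, so the comparison can be scripted. A small sketch: the function name missing_daemons is invented here, and the sample text is the master1 output shown above.

```shell
#!/bin/sh
# missing_daemons "JPS_OUTPUT" NAME...
# Print each expected daemon NAME that does not appear in the jps output.
missing_daemons() {
    jps_output=$1
    shift
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second column exactly so
        # that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v want="$daemon" '$2 == want { found = 1 } END { exit !found }'
        then
            echo "$daemon"
        fi
    done
}

sample='1956 Bootstrap
4183 Jps
3938 ResourceManager
3845 SecondaryNameNode
3652 NameNode'

# Nothing is printed for the daemons a master should run ...
missing_daemons "$sample" NameNode SecondaryNameNode ResourceManager
# ... while the slave daemons are reported as missing from this output.
missing_daemons "$sample" DataNode NodeManager
```

On a live node the first argument would be "$(jps)", for example missing_daemons "$(jps)" DataNode NodeManager on a slave.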
4. On master1, create the input directory: hadoop fs -mkdir hdfs://192.168.110.20:9000/input
5. Copy all the txt files in /usr/hadoop/hadoop-2.0.0-alpha/ into that directory on HDFS with the following command:
hadoop fs -put /usr/hadoop/hadoop-2.0.0-alpha/*.txt hdfs://192.168.110.20:9000/input
6. On master1, run the wordcount example that ships with Hadoop:
cd /usr/hadoop/hadoop-2.0.0-alpha/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.0-alpha.jar wordcount hdfs://192.168.110.20:9000/input hdfs://192.168.110.20:9000/output
7. On master1, inspect the results with the following commands:
[root@master1 hadoop]# hadoop fs -ls hdfs://192.168.110.20:9000/output
Found 2 items
-rw-r--r-- 2 root supergroup 0 2012-06-29 22:59 hdfs://192.168.110.20:9000/output/_SUCCESS
-rw-r--r-- 2 root supergroup 8739 2012-06-29 22:59 hdfs://192.168.110.20:9000/output/part-r-00000
[root@master1 hadoop]# hadoop fs -ls hdfs://192.168.110.20:9000/input
Found 3 items
-rw-r--r-- 2 root supergroup 15164 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/LICENSE.txt
-rw-r--r-- 2 root supergroup 101 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2012-06-29 22:55 hdfs://192.168.110.20:9000/input/README.txt
[root@master1 hadoop]# hadoop fs -cat hdfs://192.168.110.20:9000/output/part-r-00000
The last command prints the count for each word.
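To get a feel for what the job computes, roughly the same tally can be produced locally with standard tools and compared against part-r-00000. This is only an approximation of the example job: the function name local_wordcount is made up, and it assumes words are split on whitespace and listed in byte order, one "word<TAB>count" pair per line.

```shell
#!/bin/sh
# local_wordcount FILE...
# Split the input on whitespace, count identical tokens, and print
# "word<TAB>count" sorted by word.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |      # one token per line
        grep -v '^$' |                # drop the empty line a leading blank leaves
        LC_ALL=C sort |
        uniq -c |                     # "  <count> <word>"
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Demo on a two-line scratch file: prints hadoop 1, hello 3, yarn 1.
sample_txt=$(mktemp)
printf 'hello hadoop\nhello  yarn hello\n' > "$sample_txt"
local_wordcount "$sample_txt"
```

Running it over the three .txt files that were uploaded gives a local list to set beside the output of hadoop fs -cat.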
8. The NameNode status page can also be opened in a browser (the author used IE): http://192.168.110.20:23001/dfshealth.jsp
That completes the whole process. |
|