Fully distributed installation (at least three hosts):
Required software:
CentOS7
hadoop-2.7.3.tar.gz
jdk-8u102-linux-x64.tar.gz
Preparation before installation:
- Edit the /etc/hosts file (on every host)
vim /etc/hosts
Contents:
192.168.10.11 bigdata1
192.168.10.12 bigdata2
192.168.10.13 bigdata3
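The three host entries above have to be present on every node. As a rough sketch of how they could be added without creating duplicates, assuming bash and awk are available (the function name add_host_entry is made up for this illustration and is not part of any tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP hostname" to a hosts file only if the
# hostname is not already mapped on a non-comment line.
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Look for the hostname as a whole field (field 2 or later) on any
    # line that does not start with '#'.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run as root on each node, add_host_entry 192.168.10.11 bigdata1 (and likewise for bigdata2 and bigdata3) should leave /etc/hosts unchanged when the entry is already there.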
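- Configure passwordless (key-based) SSH login
cd
ssh-keygen -t rsa
Press Enter at every prompt until the command finishes, then copy the public key to each host (including the local one):
ssh-copy-id -i .ssh/id_rsa.pub bigdata1
ssh-copy-id -i .ssh/id_rsa.pub bigdata2
ssh-copy-id -i .ssh/id_rsa.pub bigdata3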
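The key distribution above can also be written as a loop, which is handy when there are more than three nodes. This is only a sketch: the names CLUSTER_HOSTS, COPY_CMD, distribute_key and check_passwordless are invented for this example, and it assumes the key pair was generated at the default location ~/.ssh/id_rsa.

```shell
#!/usr/bin/env bash
# Host names as defined in /etc/hosts (assumption: same three nodes).
CLUSTER_HOSTS="bigdata1 bigdata2 bigdata3"

# Copy the public key to every host. COPY_CMD is a hypothetical override
# so the loop can be dry-run, e.g. COPY_CMD=echo distribute_key
distribute_key() {
    local pubkey="${1:-$HOME/.ssh/id_rsa.pub}" host
    for host in $CLUSTER_HOSTS; do
        ${COPY_CMD:-ssh-copy-id} -i "$pubkey" "$host" || return 1
    done
}

# Report hosts that still ask for a password. BatchMode=yes makes ssh
# fail instead of prompting.
check_passwordless() {
    local host
    for host in $CLUSTER_HOSTS; do
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            || echo "no passwordless login to $host"
    done
}
```

Calling distribute_key prompts once for each host's password; afterwards check_passwordless should print nothing if every host accepts the key.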
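- Synchronize time
Keep the clocks of all hosts in sync with a scheduled task:
vim /etc/crontab
0 0 1 root ntpdate -s time.windows.com
(/etc/crontab requires five time fields, namely minute, hour, day of month, month and day of week, before the user name; fill the fields not shown here with *.)
Alternatively, deploy a dedicated time server for synchronization; that is not covered in detail here.

Hadoop configuration files (in /root/training/hadoop-2.7.3/etc/hadoop; add the properties inside the <configuration> element of each file):

(*) hdfs-site.xml
    <!-- Block replication factor; the default is 3 -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <!-- Whether HDFS permission checking is enabled; default: true -->
    <!--
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
    -->

(*) core-site.xml
    <!-- Address of the NameNode -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://bigdata1:9000</value>
    </property>
    <!-- Directory where HDFS keeps its data; by default it lives under the Linux /tmp directory -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/root/training/hadoop-2.7.3/tmp</value>
    </property>

(*) mapred-site.xml
    <!-- MapReduce programs run on Yarn -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

(*) yarn-site.xml
    <!-- Host of the ResourceManager -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>bigdata1</value>
    </property>
    <!-- Auxiliary service the NodeManager uses to run MapReduce tasks -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>

Format the NameNode (on bigdata1):
hdfs namenode -format
Log: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.

Copy the configured installation to the other two hosts:
scp -r /root/training/hadoop-2.7.3 bigdata2:/root/training/hadoop-2.7.3
scp -r /root/training/hadoop-2.7.3 bigdata3:/root/training/hadoop-2.7.3

Start: start-all.sh = start-dfs.sh + start-yarn.sh

Verification
(*) Command line: hdfs dfsadmin -report
(*) Web UI (the NameNode and the ResourceManager both run on bigdata1, 192.168.10.11):
    HDFS: http://192.168.10.11:50070/
    Yarn: http://192.168.10.11:8088
(*) Demo: test a MapReduce program
    example: /root/training/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
    Run from the directory that contains the jar; /input/data.txt must already exist in HDFS and /output/wc1204 must not:
    hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc1204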
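The command-line verification step can be turned into a small check that fails when too few DataNodes have registered. The sketch below assumes that the output of hdfs dfsadmin -report contains a line of the form "Live datanodes (N):", which is how Hadoop 2.x reports normally look; the function names live_datanodes and check_datanodes are invented for this example.

```shell
#!/usr/bin/env bash
# Read an `hdfs dfsadmin -report` listing on stdin and print the number
# of live DataNodes (assumes a "Live datanodes (N):" line is present).
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Succeed only if at least $1 DataNodes are live.
check_datanodes() {
    local expected="$1" live
    live="$(live_datanodes)"
    [ -n "$live" ] && [ "$live" -ge "$expected" ]
}
```

A possible use is hdfs dfsadmin -report | check_datanodes 2 && echo "cluster looks healthy", where the expected count depends on how many of the hosts actually run a DataNode.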
Original article by Maggie-Hunter. If you repost it, please credit the source: https://blog.ytso.com/tech/opensource/193661.html