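Big data is defined as very large volumes of unstructured data: the volume has to be large, and the data has to be unstructured.
The Hadoop setup here has three parts: 1. HDFS, the Hadoop Distributed File System; 2. MapReduce, the distributed computation framework; 3. Hive, a data warehouse tool that runs SQL-like queries (HiveQL) over data stored in HDFS by compiling them into MapReduce jobs. Strictly speaking, Hive is a separate Apache project built on Hadoop, not a storage layer.
Operating system: CentOS 6.5, 64-bit
Environment setup: 1. install Hadoop, 2. install MySQL, 3. install Hive, 4. install the JDK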
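The MapReduce idea can be shown with a purely local word count built from Unix pipes. This is not Hadoop. The map step emits a (word, 1) pair for each word, `sort` stands in for the shuffle that groups equal keys together, and the reduce step sums each group. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Local illustration of the MapReduce word-count flow; this does not use Hadoop.

# map: split stdin into words and emit "word<TAB>1" for each one
map_words() {
  tr -s '[:space:]' '\n' | grep -v '^$' | awk '{ print $0 "\t" 1 }'
}

# reduce: input must already be grouped by key; sum the counts per word
reduce_counts() {
  awk -F '\t' '
    $1 != prev && NR > 1 { print prev "\t" sum; sum = 0 }
    { prev = $1; sum += $2 }
    END { if (NR > 0) print prev "\t" sum }
  '
}

# sort plays the role of the shuffle phase that brings equal keys together
word_count() {
  map_words | LC_ALL=C sort | reduce_counts
}

# usage: printf 'a b a\n' | word_count
```

In real Hadoop, the map and reduce functions run on many nodes in parallel and the framework performs the shuffle across the network.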
-
1. Hadoop installation:
Download Hadoop 1.0.4 (hadoop-1.0.4.tar.gz):
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
Extract it into /usr:
tar xzvf hadoop-1.0.4.tar.gz -C /usr
Edit three configuration files:
cd /usr/hadoop-1.0.4/conf/
vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/longlong/temp/log1,/home/longlong/temp/log2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/longlong/temp/data1,/home/longlong/temp/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
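The three edits above can also be scripted instead of done in vi. This is a minimal sketch that assumes bash. `write_hadoop_conf` is a hypothetical helper. It overwrites the three files with the same values shown above, and the parent directory of the name and data directories is passed in as an argument.

```shell
#!/usr/bin/env bash
# Sketch: write core-site.xml, mapred-site.xml and hdfs-site.xml without vi.
# write_hadoop_conf is a hypothetical helper, not part of Hadoop. Existing files are overwritten.
write_hadoop_conf() {
  local conf_dir="$1"   # e.g. /usr/hadoop-1.0.4/conf
  local base="$2"       # e.g. /home/longlong/temp (parent of the name/data dirs)
  mkdir -p "$conf_dir" || return 1

  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>$base/log1,$base/log2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$base/data1,$base/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
}

# usage: write_hadoop_conf /usr/hadoop-1.0.4/conf /home/longlong/temp
```

The sketch keeps dfs.replication at 2, as in the original. A single-node setup runs only one DataNode, though, so a factor of 2 cannot actually be met and HDFS will report blocks as under-replicated. 1 is the usual value in this mode.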
Set the environment variables. The Java, Hive and Hadoop variables are all shown here:
vi /etc/profile
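JAVA_HOME=/home/Hadoop/jdk1.6.0_45
CLASSPATH=$JAVA_HOME/jre/lib/rt.jar
HADOOP_HOME=/usr/hadoop-1.0.4
HIVE_HOME=/usr/hive
PATH=$HADOOP_HOME/bin:$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin
export JAVA_HOME CLASSPATH HADOOP_HOME HIVE_HOME PATH
Then run source /etc/profile (or log in again) so the current shell picks up the changes.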
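After `source /etc/profile`, a quick sanity check can confirm that each *_HOME variable is set and points at a real directory. This is a sketch that assumes bash, because it uses indirect expansion `${!name}`. `check_homes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: verify that the named variables are set and point at existing directories.
# check_homes is a hypothetical helper; it returns non-zero if any check fails.
check_homes() {
  local name dir missing=0
  for name in "$@"; do
    dir="${!name}"            # bash indirect expansion: value of the variable called $name
    if [ -z "$dir" ]; then
      echo "$name is not set" >&2
      missing=1
    elif [ ! -d "$dir" ]; then
      echo "$name=$dir does not exist" >&2
      missing=1
    fi
  done
  return "$missing"
}

# usage: check_homes JAVA_HOME HADOOP_HOME HIVE_HOME && echo "environment OK"
```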
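Install the JDK:
Download and run the installer from /home/Hadoop. The self-extracting .bin unpacks jdk1.6.0_45 into the current directory, which matches JAVA_HOME above: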
wget http://download.oracle.com/otn-pub/java/jdk/6u45-b06/jdk-6u45-linux-x64.bin
chmod +x jdk-6u45-linux-x64.bin
./jdk-6u45-linux-x64.bin
Edit conf/hadoop-env.sh (vi hadoop-env.sh) and append the JDK home at the end:
export JAVA_HOME=/home/Hadoop/jdk1.6.0_45
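When you append lines by hand, it is easy to add the same line twice if the setup is repeated. The following is a small sketch of an idempotent append, assuming bash plus the standard `grep` and `tail`. `append_once` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper, not part of Hadoop.
append_once() {
  local file="$1" line="$2"
  touch "$file" || return 1
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  grep -qxF -- "$line" "$file" && return 0
  # avoid gluing the new line onto a last line that lacks a trailing newline
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# usage: append_once /usr/hadoop-1.0.4/conf/hadoop-env.sh 'export JAVA_HOME=/home/Hadoop/jdk1.6.0_45'
```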
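In the bin directory, edit hadoop-config.sh (vi hadoop-config.sh) and add the lines below. HADOOP_HOME_WARN_SUPPRESS=1 silences the "Warning: $HADOOP_HOME is deprecated." message that Hadoop 1.x prints when HADOOP_HOME is set: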
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_HOME_WARN_SUPPRESS=1
Format the NameNode (run from the bin directory):
./hadoop namenode -format
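A successful format initialises every directory listed in dfs.name.dir, and each one should then contain a current/VERSION file. The following is a minimal check that takes the comma-separated list from hdfs-site.xml above. `check_name_dirs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: confirm each dfs.name.dir entry was formatted (has current/VERSION).
# check_name_dirs is a hypothetical helper; pass the comma-separated list as one argument.
check_name_dirs() {
  local list="$1" dir ok=0
  local IFS=','               # split the list on commas only
  for dir in $list; do
    if [ -f "$dir/current/VERSION" ]; then
      echo "formatted: $dir"
    else
      echo "NOT formatted: $dir" >&2
      ok=1
    fi
  done
  return "$ok"
}

# usage: check_name_dirs /home/longlong/temp/log1,/home/longlong/temp/log2
```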
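2. MySQL installation
3. Hive installation:
Download it into /usr, so that the renamed directory matches HIVE_HOME=/usr/hive:
cd /usr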
wget http://mirrors.cnnic.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
tar -xf apache-hive-0.13.1-bin.tar.gz
mv apache-hive-0.13.1-bin hive
Replace the default embedded Derby metastore database with MySQL:
cd /usr/hive/conf/
touch hive-site.xml
vi hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
</configuration>
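hive-site.xml can also be generated from a script. This is a sketch that assumes bash. `write_hive_site` is a hypothetical helper, and user and password default to the root/root used above. The URL here differs from the original in one way: it adds the MySQL Connector/J option `createDatabaseIfNotExist=true`, so the `hive` database is created on first connection instead of having to exist beforehand.

```shell
#!/usr/bin/env bash
# Sketch: write hive-site.xml for a MySQL-backed metastore.
# write_hive_site is a hypothetical helper. Values are inserted without XML escaping,
# so avoid &, < and > in the user name and password.
write_hive_site() {
  local conf_dir="$1" user="${2:-root}" pass="${3:-root}"
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>$user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>$pass</value>
  </property>
</configuration>
EOF
}

# usage: write_hive_site /usr/hive/conf            (root/root)
#        write_hive_site /usr/hive/conf hive secret
```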
Download the MySQL JDBC driver:
wget http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz
tar -xzvf mysql-connector-java-5.0.8.tar.gz
cd mysql-connector-java-5.0.8
cp mysql-connector-java-5.0.8-bin.jar /usr/hive/lib/
Starting the services
1. Start Hadoop (start-all.sh starts both the HDFS and the MapReduce daemons)
cd /usr/hadoop-1.0.4/bin
./start-all.sh
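When prompted for a password, enter the operating-system login password (the start scripts launch daemons over SSH to localhost, and passwordless SSH is not set up here).
2. Start MySQL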
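Once start-all.sh returns, `jps` (shipped in the JDK's bin directory) lists the running Java processes. The sketch below assumes the pseudo-distributed layout above, where start-all.sh should bring up five daemons. `check_daemons` is a hypothetical helper that takes the jps output as an argument.

```shell
#!/usr/bin/env bash
# Sketch: report which Hadoop 1.x daemons are missing from jps output.
# check_daemons is a hypothetical helper; it returns non-zero if any daemon is absent.
check_daemons() {
  local out="$1" d missing=0
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # exact match on the second column, so "NameNode" does not match "SecondaryNameNode"
    if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# usage: check_daemons "$(jps)"
```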
service mysqld start
mysql -uroot -proot
3. Start Hive
cd /usr/hive/bin
./hive
Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/tech/opensource/196857.html