这篇文章主要介绍“sqoop1.4.7环境搭建及mysql数据导入导出到hive的方法”,在日常操作中,相信很多人在sqoop1.4.7环境搭建及mysql数据导入导出到hive的方法问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答”sqoop1.4.7环境搭建及mysql数据导入导出到hive的方法”的疑惑有所帮助!接下来,请跟着小编一起来学习吧!
sqoop文档:http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_prerequisites
在hive创建表和导入数据时必须添加分隔符,否则数据导出时会报错
1.下载安装
[root@node1 ~]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@node1 ~]# tar xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/
[root@node1 ~]# cd /opt/
[root@node1 opt]# mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
[root@node1 opt]# vim /etc/profile
export SQOOP_HOME=/opt/sqoop-1.4.7
export HADOOP_HOME=/opt/hadoop-2.8.5
export HADOOP_CLASSPATH=/opt/hive-2.3.4/lib/*
export HCAT_HOME=/opt/sqoop-1.4.7/testdata/hcatalog
export ACCUMULO_HOME=/opt/sqoop-1.4.7/src/java/org/apache/sqoop/accumulo
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$SQOOP_HOME/bin
[root@node1 opt]# source /etc/profile
[root@node1 opt]# sqoop help –帮助信息
[root@node1 opt]# sqoop import –help –参数帮助信息
2.修改yarn配置文件
[root@node1 ~]# vim /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml
<property> <name>yarn.nodemanager.resource.memory-mb</name> <value>2048</value> </property> <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>2</value> </property>
[root@node1 ~]# scp /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml node2:/opt/hadoop-2.8.5/etc/hadoop/ –将配置文件复制到各节点
yarn-site.xml 100% 1414 804.3KB/s 00:00
[root@node1 ~]# scp /opt/hive-2.3.4/conf/hive-site.xml /opt/sqoop-1.4.7/conf/ –hive的配置文件也要放在sqoop下面,因为sqoop要调用hive
[root@node1 ~]# stop-all.sh
[root@node1 ~]# start-all.sh
3.将mysql数据导入到HDFS
参数解释:
–append 追加数据
–as-textfile 导入后形成文本文件
–columns 只导入哪些字段
–delete-target-dir –如果导入的目录存在先删除再导入
–fetch-size <n> –每次读多少数据
-m –起多少任务
-e –查询语句(select)
–table <table-name> –表名
–target-dir dir –指定HDFS目录
–warehouse-dir dir –导入的表将在此目录之下(表名与目录名一至)
–where where clause –where条件
-z –数据压缩
–direct –绕过mysql数据库,直接导入(忧化参数)
[root@node1 ~]# sqoop import –connect jdbc:mysql://172.16.9.100/hive –username hive –password system –table TBL_PRIVS –target-dir /user/sqoop –direct -m 1 –fields-terminated-by '/t'
[root@node1 ~]# hdfs dfs -ls /user/sqoop –查看导入的目录
Found 2 items
-rw-r–r– 3 root supergroup 0 2019-03-19 12:43 /user/sqoop/_SUCCESS
-rw-r–r– 3 root supergroup 176 2019-03-19 12:43 /user/sqoop/part-m-00000
[root@node1 ~]# hdfs dfs -cat /user/sqoop/part-m-00000 –查看导入的数据
6,1552878877,1,root,USER,root,USER,INSERT,6
7,1552878877,1,root,USER,root,USER,SELECT,6
8,1552878877,1,root,USER,root,USER,UPDATE,6
9,1552878877,1,root,USER,root,USER,DELETE,6
[root@node1 ~]#
4.将mysql数据导入到hive中
参数详解:
–hive-home dir 指定hive目录
–hive-import 导入到hive
–hive-database 导入指定的库
–hive-overwrite 覆盖到hive
–create-hive-table 在hive中创建表
–hive-table table-name 指定hive表名
–hive-partition-value v hive分区
[root@node1 ~]# sqoop import –connect jdbc:mysql://172.16.9.100/hive –username hive –password system –table TBL_PRIVS –target-dir /user/tmp –hive-import –hive-table tt -m 1 –create-hive-table –delete-target-dir –direct –fields-terminated-by '/t'
[root@node1 conf]# hive
Logging initialized using configuration in jar:file:/opt/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
OK
tt
Time taken: 11.464 seconds, Fetched: 1 row(s)
hive> select * from tt;
OK
6 1552878877 1 root USER root USER INSERT 6
7 1552878877 1 root USER root USER SELECT 6
8 1552878877 1 root USER root USER UPDATE 6
9 1552878877 1 root USER root USER DELETE 6
Time taken: 3.978 seconds, Fetched: 4 row(s)
hive>
5.将mysql数据导入到hive指定的库中
[root@node1 ~]# sqoop import –connect jdbc:mysql://172.16.9.100/hive –username hive –password system –table TABLE_PARAMS –hive-import –hive-table tt1 -m 1 –create-hive-table –hive-database tong –direct –fields-terminated-by '/t'
[root@node1 conf]# hive
Logging initialized using configuration in jar:file:/opt/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> use tong;
OK
Time taken: 14.34 seconds
hive> show tables;
OK
tt1
Time taken: 0.374 seconds, Fetched: 1 row(s)
hive> select * from tt1;
OK
6 numFiles 1
6 numRows 0
6 rawDataSize 0
6 totalSize 8
6 transient_lastDdlTime 1552878901
11 comment Imported by sqoop on 2019/03/19 15:36:21
11 numFiles 1
11 numRows 0
11 rawDataSize 0
11 totalSize 176
11 transient_lastDdlTime 1552981011
16 comment Imported by sqoop on 2019/03/19 16:04:22
16 numFiles 1
16 numRows 0
16 rawDataSize 0
16 totalSize 239
16 transient_lastDdlTime 1552982688
Time taken: 3.004 seconds, Fetched: 17 row(s)
hive>
6.将HDFS的数据导入到mysql中
[root@node1 ~]# hdfs dfs -cat /user/tmp/part-m-00000
1 2
3 4
5 6
[root@node1 ~]# sqoop export –connect jdbc:mysql://172.16.9.100/tong –username tong –password system –export-dir /user/tmp/part-m-00000 –table t1 –direct –fields-terminated-by '/t'
[root@node1 ~]# mysql -u root -psystem
Welcome to the MariaDB monitor. Commands end with ; or /g.
Your MySQL connection id is 1006876
Server version: 5.6.35 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '/h' for help. Type '/c' to clear the current input statement.
MySQL [(none)]> use tong
MySQL [tong]> select * from t1;
+——+——+
| a | b |
+——+——+
| 3 | 4 |
| 5 | 6 |
| 1 | 2 |
+——+——+
3 rows in set (0.00 sec)
MySQL [tong]>
报错信息:(卡在Running job不动,不向下执行)
19/03/19 11:20:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552965562217_0001
19/03/19 11:20:10 INFO impl.YarnClientImpl: Submitted application application_1552965562217_0001
19/03/19 11:20:10 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552965562217_0001/
19/03/19 11:20:10 INFO mapreduce.Job: Running job: job_1552965562217_0001
解决方法:
[root@node1 ~]# vim /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml –限制内存,cpu的资源,并将配置文件同步到其它node,重启hadoop服务
<property> <name>yarn.nodemanager.resource.memory-mb</name> <value>2048</value> </property> <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>2</value> </property>
[root@node1 ~]#
报错信息:(mysql导入到hive中)
19/03/19 14:34:25 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/19 14:34:25 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/03/19 14:34:25 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
解决方法:
[root@node1 ~]# vim /etc/profile –添加lib变量
export HADOOP_CLASSPATH=/opt/hive-2.3.4/lib/*
[root@node1 ~]# source /etc/profile
报错信息:(是因为sqoop和hive的jackson包冲突)
19/03/19 15:32:11 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/19 15:32:11 INFO ql.Driver: Executing command(queryId=root_20190319153153_63feddd9-a2c8-4217-97d4-23dd9840a54b): CREATE TABLE `tt` ( `TBL_GRANT_ID` BIGINT, `CREATE_TIME` INT,
`GRANT_OPTION` INT, `GRANTOR` STRING, `GRANTOR_TYPE` STRING, `PRINCIPAL_NAME` STRING, `PRINCIPAL_TYPE` STRING, `TBL_PRIV` STRING, `TBL_ID` BIGINT) COMMENT 'Imported by sqoop on 2019/03/19
15:31:49' ROW FORMAT DELIMITED FIELDS TERMINATED BY '/001' LINES TERMINATED BY '/012' STORED AS TEXTFILE
19/03/19 15:32:11 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
19/03/19 15:32:12 ERROR exec.DDLTask: java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
at org.apache.hadoop.hive.common.StatsSetupConst$ColumnStatsAccurate.<clinit>(StatsSetupConst.java:165)
at org.apache.hadoop.hive.common.StatsSetupConst.parseStatsAcc(StatsSetupConst.java:297)
at org.apache.hadoop.hive.common.StatsSetupConst.setBasicStatsState(StatsSetupConst.java:230)
at org.apache.hadoop.hive.common.StatsSetupConst.setBasicStatsStateForCreateTable(StatsSetupConst.java:292)
解决方法:
[root@node1 ~]# mv /opt/sqoop-1.4.7/lib/jackson-* /home/
[root@node1 ~]# cp -a /opt/hive-2.3.4/lib/jackson-* /opt/sqoop-1.4.7/lib/
报错信息:
19/03/19 18:38:40 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
19/03/19 18:38:40 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore
19/03/19 18:38:40 ERROR tool.ImportTool: Import failed: java.io.IOException: Hive CliDriver exited with status=1
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:355)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
解决方法:
create table t1(a int,b int) row format delimited fields terminated by '/t'; –创建表时必须加分隔符
sqoop import –connect jdbc:mysql://172.16.9.100/hive –username hive –password system –table TBL_PRIVS –target-dir /user/sqoop –direct -m 1 –fields-terminated-by '/t'
到此,关于“sqoop1.4.7环境搭建及mysql数据导入导出到hive的方法”的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注亿速云网站,小编会继续努力为大家带来更多实用的文章!
原创文章,作者:745907710,如若转载,请注明出处:https://blog.ytso.com/229161.html