生产背景:
在从mysql导入到hive中,遇到如下问题:
从hive导出到mysql中,遇到如下问题:
sqoop 缺点:
1 基于命令行的操作方式,易出错,且不安全。
2 数据传输和数据格式是紧耦合的,这使得connector无法支持所有的数据格式
3 用户名和密码暴漏出来
4 sqoop安装需要root权限
Sqoop优点:
1 高效可控的利用资源,任务并行度,超时时间。
2 数据类型映射与转化,可自动进行,用户也可自定义 .
3 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 。
Sqoop原理:
Sqoop的inport原理:
Sqoop的export原理:
验证sqoop的各种报错:
1 mysql字段太短
2 hive的空字段转换
3 分隔符错误
4 mysql的网络不在集群网络中
5 mysql停止服务
6 mysql utf8编码只是3个字节,可能是因为某些unicode字符转成utf8之后变成了4个字节,需要mysql支持utf8mb4
7 Sqoop调式信息
8 修改生成的Java类,重新打包。
Sqoop命令行说明
生产背景:
在从mysql导入到hive中,遇到如下问题:
1) 源mysql和集群机器不在同一个网段中,导致执行导入命令,网络连接失败。
2) 某些字符导入到hive中,出现报错终止。
2.1 sqoop使用的JDBC-connector 版本太低(更换版本)。
从hive导出到mysql中,遇到如下问题:
1)某些字符插入mysql,出现报错终止。
1.1 可能mysql本身编码的限制,某些字符不支持,比如uft8和utf8mb4
1.2 sqoop使用的JDBC-connector 版本太低(更换版本)。
sqoop 缺点:
1 基于命令行的操作方式,易出错,且不安全。
2 数据传输和数据格式是紧耦合的,这使得connector无法支持所有的数据格式
3 用户名和密码暴漏出来
4 sqoop安装需要root权限
Sqoop优点:
1 高效可控的利用资源,任务并行度,超时时间。
2 数据类型映射与转化,可自动进行,用户也可自定义 .
3 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 。
Sqoop原理:
Sqoop的inport原理:
Sqoop在import时,需要制定split-by参数。Sqoop根据不同的split-by参数值来进行切分,然后将切分出来的区域分配到不同map中。每个map中再处理数据库中获取的一行一行的值,写入到HDFS中。同时split-by根据不同的参数类型有不同的切分方法,如比较简单的int型,Sqoop会取最大和最小split-by字段值,然后根据传入的num-mappers来确定划分几个区域。 比如select max(split_by),min(split-by) from得到的max(split-by)和min(split-by)分别为1000和1,而num-mappers为2的话,则会分成两个区域(1,500)和(501-100),同时也会分成2个sql给2个map去进行导入操作,分别为select XXX from table where split-by>=1 and split-by<500和select XXX from table where split-by>=501 and split-by<=1000。最后每个map各自获取各自SQL中的数据进行导入工作。
Sqoop的export原理:根据mysql表名称,生成一个以表名称命名的Java类,该类继承了sqoopRecord的,是一个只有Map的MR,且自定义了输出字段。
sqoop export –connect jdbc:mysql://$url:3306/$3?characterEncoding=utf8 –username $username –password $password –table $1 –export-dir $2 –input-fields-terminated-by ‘|’ –null-non-string ‘0’ –null-string ‘0’;
验证sqoop的各种报错:
mysql表
create table dm_trlog (
plat varchar(20),
user_id varchar(20),
click_time varchar(20),
click_url varchar(200)
)
hive 表
CREATE TABLE TRLOG
(
PLATFORM string,
USER_ID int,
CLICK_TIME string,
CLICK_URL string
)
row format delimited
fields terminated by ‘/t’;
sqoop list-databases –connect jdbc:mysql://192.168.119.129:3306/ –username li72 –password 123
1 mysql字段太短
sqoop export –connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 –username li72 –password 123 –table dm_trlog –export-dir /home/bigdata/hive/data/db1.db/trlog –input-fields-terminated-by ‘/t’ –null-non-string ‘0’ –null-string ‘0’;
Warning: $HADOOP_HOME is deprecated.
14/11/06 01:42:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 01:42:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 01:42:32 INFO tool.CodeGenTool: Beginning code generation
14/11/06 01:42:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:42:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:42:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/bigdata/hadoop
Note: /tmp/sqoop-root/compile/d5e37c20a9231b3253c97fc27d16d8a9/dm_trlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/06 01:42:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/d5e37c20a9231b3253c97fc27d16d8a9/dm_trlog.jar
14/11/06 01:42:38 INFO mapreduce.ExportJobBase: Beginning export of dm_trlog
14/11/06 01:42:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:42:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:42:43 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/06 01:42:43 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/06 01:42:44 INFO mapred.JobClient: Running job: job_201411060114_0001
14/11/06 01:42:45 INFO mapred.JobClient: map 0% reduce 0%
14/11/06 01:43:15 INFO mapred.JobClient: Task Id : attempt_201411060114_0001_m_000000_0, Status : FAILED
java.io.IOException: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column ‘click_time’ at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:192)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column ‘click_time’ at row 1
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4118)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4052)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2794)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
mysql字段太短了
drop table dm_trlog;
create table dm_trlog (
plat varchar(20),
user_id varchar(20),
click_time varchar(200),
click_url varchar(200)
)
2 hive的空字段转换
由于Hive的NULL用/N来表示,字段用/001来分割,换行用/n来换行,导出分隔符一定要和hive表保持一致,如果为空可以指定转换为0,有些mysql数字字段不能插入/N
加上两个参数:–input-null-string ‘//N’ –input-null-non-string ‘//N’,多加一个‘/’,是为转义
sqoop export –connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 –username li72 –password 123 –table dm_trlog –export-dir /home/bigdata/hive/data/db1.db/trlog –input-fields-terminated-by ‘/t’ –null-non-string ‘0’ –null-string ‘0’;
14/10/23 04:53:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/915d24128af59c9e517580e0f07411d4/dm_pc_play_kpi.jar
14/10/23 04:53:47 INFO mapreduce.ExportJobBase: Beginning export of dm_pc_play_kpi
14/10/23 04:53:48 INFO input.FileInputFormat: Total input paths to process : 1
14/10/23 04:53:48 INFO input.FileInputFormat: Total input paths to process : 1
14/10/23 04:53:48 WARN snappy.LoadSnappy: Snappy native library is available
14/10/23 04:53:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/10/23 04:53:48 INFO snappy.LoadSnappy: Snappy native library loaded
14/10/23 04:53:48 INFO mapred.JobClient: Running job: job_201408301703_84117
14/10/23 04:53:49 INFO mapred.JobClient: map 0% reduce 0%
14/10/23 04:55:45 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_0, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NumberFormatException: For input string: “N”
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
at java.lang.Float.valueOf(Float.java:417)
at dm_pc_play_kpi.__loadFromFields(dm_pc_play_kpi.java:335)
at dm_pc_play_kpi.parse(dm_pc_play_kpi.java:282)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
… 10 more
14/10/23 04:55:53 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_1, Status : FAILED
14/10/23 04:55:58 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_2, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NumberFormatException: For input string: “N”
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
at java.lang.Float.valueOf(Float.java:417)
at dm_pc_play_kpi.__loadFromFields(dm_pc_play_kpi.java:335)
at dm_pc_play_kpi.parse(dm_pc_play_kpi.java:282)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
… 10 more
3 分隔符错误
Hive中的分隔符是’/t’ 但是导出写成’|’
sqoop export –connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 –username li72 –password 123 –table dm_trlog –export-dir /home/bigdata/hive/data/db1.db/trlog –input-fields-terminated-by ‘|’ –null-non-string ‘0’ –null-string ‘0’;
为了测试,特意把分隔符改成 “|”
14/11/06 01:50:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 01:50:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 01:50:19 INFO tool.CodeGenTool: Beginning code generation
14/11/06 01:50:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:50:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:50:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/bigdata/hadoop
Note: /tmp/sqoop-root/compile/e474b3f8292f91dd4134b302ae35df19/dm_trlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/06 01:50:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e474b3f8292f91dd4134b302ae35df19/dm_trlog.jar
14/11/06 01:50:25 INFO mapreduce.ExportJobBase: Beginning export of dm_trlog
14/11/06 01:50:45 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:50:45 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:50:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/06 01:50:45 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/06 01:50:51 INFO mapred.JobClient: Running job: job_201411060114_0003
14/11/06 01:50:52 INFO mapred.JobClient: map 0% reduce 0%
14/11/06 01:51:20 INFO mapred.JobClient: Task Id : attempt_201411060114_0003_m_000000_0, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at dm_trlog.__loadFromFields(dm_trlog.java:252)
at dm_trlog.parse(dm_trlog.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
… 10 more
14/11/06 01:51:21 INFO mapred.JobClient: Task Id : attempt_201411060114_0003_m_000001_0, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at dm_trlog.__loadFromFields(dm_trlog.java:252)
at dm_trlog.parse(dm_trlog.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
… 10 more
4 mysql的网络不在集群网络中
sqoop export –connect jdbc:mysql://192.168.119.1:3306/student?characterEncoding=utf8 –username li72 –password 123 –table dm_trlog –export-dir /home/bigdata/hive/data/db1.db/trlog –input-fields-terminated-by ‘/t’ –null-non-string ‘0’ –null-string ‘0’;
Warning: $HADOOP_HOME is deprecated.
14/11/06 02:04:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 02:04:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 02:04:30 INFO tool.CodeGenTool: Beginning code generation
14/11/06 02:07:40 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:822)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:605)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:628)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:235)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:219)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:283)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1255)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1072)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
… 32 more
5 mysql停止服务
14/11/06 04:55:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
14/11/06 04:55:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
14/11/06 04:55:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
14/11/06 04:55:25 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.MySQLManager@8a0d5d
14/11/06 04:55:25 INFO tool.CodeGenTool: Beginning code generation
14/11/06 04:55:26 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/11/06 04:55:27 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:822)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:605)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:628)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:235)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:219)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:283)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1255)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1072)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
… 32 more
14/11/06 04:55:27 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/11/06 04:55:27 ERROR manager.CatalogQueryManager: Failed to list columns from query: SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = (SELECT SCHEMA()) AND TABLE_NAME = ‘dm_trlog’
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:355)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:822)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.getColumnNames(CatalogQueryManager.java:147)
at org.apache.sqoop.orm.ClassWriter.getColumnNames(ClassWriter.java:1222)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1074)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:305)
… 28 more
14/11/06 04:55:27 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
6 mysql utf8编码只是3个字节,可能是因为某些unicode字符转成utf8之后变成了4个字节,需要mysql支持utf8mb4
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/11/09 07:00:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/09 07:00:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/09 07:00:33 INFO tool.CodeGenTool: Beginning code generation
14/11/09 07:00:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_go_snger` AS t LIMIT 1
14/11/09 07:00:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_go_snger` AS t LIMIT 1
14/11/09 07:00:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop/hadoop-1.1.2
Note: /tmp/sqoop-hadoop/compile/211b9679d4ac771d3ff710cbbd1c7277/dm_go_snger.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/09 07:00:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/211b9679d4ac771d3ff710cbbd1c7277/dm_go_snger.jar
14/11/09 07:00:35 INFO mapreduce.ExportJobBase: Beginning export of dm_go_snger
14/11/09 07:00:36 INFO input.FileInputFormat: Total input paths to process : 1
14/11/09 07:00:36 INFO input.FileInputFormat: Total input paths to process : 1
14/11/09 07:00:36 WARN snappy.LoadSnappy: Snappy native library is available
14/11/09 07:00:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/09 07:00:36 INFO snappy.LoadSnappy: Snappy native library loaded
14/11/09 07:00:37 INFO mapred.JobClient: Running job: job_201408301703_121362
14/11/09 07:00:38 INFO mapred.JobClient: map 0% reduce 0%
14/11/09 07:00:52 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_0, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: ‘xF3x90x8Cx92xEFxBF…’ for column ‘singer’ at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
… 10 more
Caused by: java.sql.SQLException: Incorrect string value: ‘xF3x90x8Cx92xEFxBF…’ for column ‘singer’ at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
14/11/09 07:00:59 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_1, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: ‘xF3x90x8Cx92xEFxBF…’ for column ‘singer’ at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
… 10 more
Caused by: java.sql.SQLException: Incorrect string value: ‘xF3x90x8Cx92xEFxBF…’ for column ‘singer’ at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
14/11/09 07:01:07 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_2, Status : FAILED
java.io.IOException: Can’t export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: ‘xF3x90x8Cx92xEFxBF…’ for column ‘singer’ at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
… 10 more
Caused by: java.sql.SQLException: Incorrect string value: ‘xF3x90x8Cx92xEFxBF…’ for column ‘singer’ at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
14/11/09 07:01:20 INFO mapred.JobClient: Job complete: job_201408301703_121362
14/11/09 07:01:20 INFO mapred.JobClient: Counters: 8
14/11/09 07:01:20 INFO mapred.JobClient: Job Counters
14/11/09 07:01:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31500
14/11/09 07:01:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/11/09 07:01:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/11/09 07:01:20 INFO mapred.JobClient: Rack-local map tasks=3
14/11/09 07:01:20 INFO mapred.JobClient: Launched map tasks=4
14/11/09 07:01:20 INFO mapred.JobClient: Data-local map tasks=1
14/11/09 07:01:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/11/09 07:01:20 INFO mapred.JobClient: Failed map tasks=1
14/11/09 07:01:20 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 44.4052 seconds (0 bytes/sec)
14/11/09 07:01:20 INFO mapreduce.ExportJobBase: Exported 0 records.
14/11/09 07:01:20 ERROR tool.ExportTool: Error during export: Export job failed!
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/hadoop/hadoop/hive-0.10.0.20140629/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201411090701_1423933997.txt
7 Sqoop调式信息
增加关键字--verbose
sqoop export –connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 –username li72 –password 123 –verbose –table dm_trlog –export-dir /home/bigdata/hive/data/db1.db/trlog –input-fields-terminated-by ‘/t’ –null-non-string ‘0’ –null-string ‘0’;
8 修改生成的Java类,重新打包。
每次通过sqoop导入MySql的时,都会生成一个以MySql表命名的.java文件,然后打成JAR包,给sqoop提交给hadoop 的MR来解析Hive表中的数据。那可以根据报的错误,找到对应的行,改写该文件,编译,重新打包,sqoop可以通过 -jar-file ,–class-name 组合让我们指定运行自己的jar包中的某个class。来解析该hive表中的每行数据。脚本如下:一个完整的例子如下:
./bin/sqoop export –connect “jdbc:mysql://192.168.119.129:3306/student?useUnicode=true&characterEncoding=utf-8”
–username li72 –password 123 –table dm_trlog
–export-dir /hive/warehouse/trlog –input-fields-terminated-by ‘/t’
–input-null-string ‘//N’ –input-null-non-string ‘//N’
–class-name com.li72.trlog
–jar-file /tmp/sqoopTempjar/trlog.jar
上面–jar-file 参数指定jar包的路径。–class-name 指定jar包中的class。
这样就可以解决所有解析异常了。
Sqoop命令行说明
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
|
Common arguments: --connect < jdbc-uri > Specify JDBC connect string --connection-manager < class-name > Specify connection manager class name --connection-param-file < properties-file > Specify connection parameters file --driver < class-name > Manually specify JDBC driver class to use --hadoop-home < hdir > Override $HADOOP_MAPRED_HOME_ARG --hadoop-mapred-home < dir > Override $HADOOP_MAPRED_HOME_ARG --help Print usage instructions -P Read password from console --password < password > Set authentication password --password-file < password-file > Set authentication password file path --relaxed-isolation Use read-uncommitted isolation for imports --skip-dist-cache Skip copying jars to distributed cache --username < username > Set authentication username --verbose Print more information while working Import control arguments: --append Imports data in append mode --as-avrodatafile Imports data to Avro data files --as-sequencefile Imports data to SequenceFile s --as-textfile Imports data as plain text (default) --boundary-query < statement > Set boundary query for retrieving max and min value of the primary key --columns < col ,col,col...> Columns to import from table --compression-codec < codec > Compression codec to use for import --delete-target-dir Imports data in delete mode --direct Use direct import fast path --direct-split-size < n > Split the input stream every 'n' bytes when importing in direct mode -e,--query < statement > Import results of SQL 'statement' --fetch-size < n > Set number 'n' of rows to fetch from the database when more rows are needed --inline-lob-limit < n > Set the maximum size for an inline LOB -m,--num-mappers < n > Use 'n' map tasks to import in parallel --mapreduce-job-name < name > Set name for generated mapreduce job --split-by < column-name > Column of the table used to split work units --table < table-name > Table to read --target-dir < dir > HDFS plain table destination --validate Validate the copy using the configured validator --validation-failurehandler < validation-failurehandler > Fully qualified class name for ValidationFa ilureHandler --validation-threshold < validation-threshold > Fully qualified class name for ValidationTh reshold --validator < validator > Fully qualified class name for the Validator --warehouse-dir < dir > HDFS parent for table destination --where < where clause> WHERE clause to use during import -z,--compress Enable compression Incremental import arguments: --check-column < column > Source column to check for incremental change --incremental < import-type > Define an incremental import of type 'append' or 'lastmodified' --last-value < value > Last imported value in the incremental check column Output line formatting arguments: --enclosed-by < char > Sets a required field enclosing character --escaped-by < char > Sets the escape character --fields-terminated-by < char > Sets the field separator character --lines-terminated-by < char > Sets the end-of-line character --mysql-delimiters Uses MySQL's default delimiter set: fields: , lines: /n escaped-by: / optionally-enclosed-by: ' --optionally-enclosed-by < char > Sets a field enclosing character Input parsing arguments: --input-enclosed-by < char > Sets a required field encloser --input-escaped-by < char > Sets the input escape character --input-fields-terminated-by < char > Sets the input field separator --input-lines-terminated-by < char > Sets the input end-of-line char --input-optionally-enclosed-by < char > Sets a field enclosing character Hive arguments: --create-hive-table Fail if the target hive table exists --hive-database < database-name > Sets the database name to use when importing to hive --hive-delims-replacement < arg > Replace Hive record /0x01 and row delimiters (/n/r) from imported string fields with user-defined string --hive-drop-import-delims Drop Hive record /0x01 and row delimiters (/n/r) from imported string fields --hive-home < dir > Override $HIVE_HOME --hive-import Import tables into Hive (Uses Hive's default delimiters if none are set.) --hive-overwrite Overwrite existing data in the Hive table --hive-partition-key < partition-key > Sets the partition key to use when importing to hive --hive-partition-value < partition-value > Sets the partition value to use when importing to hive --hive-table < table-name > Sets the table name to use when importing to hive --map-column-hive < arg > Override mapping for specific column to hive types. HBase arguments: --column-family < family > Sets the target column family for the import --hbase-bulkload Enables HBase bulk loading --hbase-create-table If specified, create missing HBase tables --hbase-row-key < col > Specifies which input column to use as the row key --hbase-table < table > Import to < table > in HBase HCatalog arguments: --hcatalog-database < arg > HCatalog database name --hcatalog-home < hdir > Override $HCAT_HOME --hcatalog-table < arg > HCatalog table name --hive-home < dir > Override $HIVE_HOME --hive-partition-key < partition-key > Sets the partition key to use when importing to hive --hive-partition-value < partition-value > Sets the partition value to use when importing to hive --map-column-hive < arg > Override mapping for specific column to hive types. HCatalog import specific options: --create-hcatalog-table Create HCatalog before import --hcatalog-storage-stanza < arg > HCatalog storage stanza for table creation Accumulo arguments: --accumulo-batch-size < size > Batch size in bytes --accumulo-column-family < family > Sets the target column family for the import --accumulo-create-table If specified, create missing Accumulo tables --accumulo-instance < instance > Accumulo instance name. --accumulo-max-latency < latency > Max write latency in milliseconds --accumulo-password < password > Accumulo password. --accumulo-row-key < col > Specifies which input column to use as the row key --accumulo-table < table > Import to < table > in Accumulo --accumulo-user < user > Accumulo user name. --accumulo-visibility < vis > Visibility token to be applied to all rows imported --accumulo-zookeepers < zookeepers > Comma-separated list of zookeepers (host:port) Code generation arguments: --bindir < dir > Output directory for compiled objects --class-name < name > Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. --input-null-non-string < null-str > Input null non-string representation --input-null-string < null-str > Input null string representation --jar-file < file > Disable code generation; use specified jar --map-column-java < arg > Override mapping for specific columns to java types --null-non-string < null-str > Null non-string representation --null-string < null-str > Null string representation --outdir < dir > Output directory for generated code --package-name < name > Put auto-generated classes in this package |
原创文章,作者:Maggie-Hunter,如若转载,请注明出处:https://blog.ytso.com/191009.html