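Pitfalls encountered:
1. Hive jobs move data from a temporary directory into the data warehouse directory. By default Hive uses /tmp as the temporary directory, and users typically use /user/hive/warehouse/ as the warehouse directory. Under Federated HDFS, /tmp and /user are treated as two different ViewFS mount points, so the Hive job has to move data between them. Federated HDFS does not support renames across mount points, so the job fails.
Error message: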
ERROR : Failed with exception Unable to move source viewfs://cluster9/tmp/.hive-staging_hive_2015-07-29_12-34-11_306_6082682065011532871-5/-ext-10002 to destination viewfs://cluster9/user/hive/warehouse/tandem.db/cust_loss_alarm_unit
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source viewfs://cluster9/tmp/warehouse/.hive-staging_hive_2015-07-29_12-34-11_306_6082682065011532871-5/-ext-10002 to destination viewfs://cluster9/user/hive/warehouse/tandem.db/cust_loss_alarm_unit
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2521)
    at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1640)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1399)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1044)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
    at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Renames across Mount points not supported
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.rename(ViewFileSystem.java:444)
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2509)
    ... 21 more
Relevant code:
org.apache.hadoop.fs.viewfs.ViewFileSystem
    /**
    // Alternate 1: renames within same file system - valid but we disallow
    // Alternate 2: (as described in next para - valid but we have disallowed it
    //
    // Note we compare the URIs. the URIs include the link targets.
    // hence we allow renames across mount links as long as the mount links
    // point to the same target.
    if (!resSrc.targetFileSystem.getUri().equals(
            resDst.targetFileSystem.getUri())) {
      throw new IOException("Renames across Mount points not supported");
    }
    */
    //
    // Alternate 3 : renames ONLY within the the same mount links.
    //
    if (resSrc.targetFileSystem != resDst.targetFileSystem) {
      throw new IOException("Renames across Mount points not supported");
    }
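To make the effect of this check concrete, here is a minimal, self-contained sketch of the same rule. It is a toy model and not the Hadoop implementation: the class MountPointRenameSketch, its methods, and the hdfs://ns1 and hdfs://ns2 targets are hypothetical names chosen for illustration. A path is resolved to its mount point by longest prefix, and a rename is rejected whenever source and destination resolve to different mount points.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a ViewFS-style mount table; NOT the real Hadoop classes.
class MountPointRenameSketch {
    // Mount point -> backing namespace (hypothetical values, kept for illustration only).
    private final Map<String, String> mountTable = new LinkedHashMap<>();

    void addLink(String mountPoint, String target) {
        mountTable.put(mountPoint, target);
    }

    // Return the mount point with the longest prefix that matches the path.
    String resolve(String path) throws IOException {
        String best = null;
        for (String mp : mountTable.keySet()) {
            boolean matches = path.equals(mp) || path.startsWith(mp + "/");
            if (matches && (best == null || mp.length() > best.length())) {
                best = mp;
            }
        }
        if (best == null) {
            throw new IOException("No mount point for " + path);
        }
        return best;
    }

    // Same spirit as "Alternate 3": allow renames only within one mount link.
    void rename(String src, String dst) throws IOException {
        if (!resolve(src).equals(resolve(dst))) {
            throw new IOException("Renames across Mount points not supported");
        }
        // A real file system would move the data here.
    }

    public static void main(String[] args) throws IOException {
        MountPointRenameSketch fs = new MountPointRenameSketch();
        fs.addLink("/tmp", "hdfs://ns1/tmp");
        fs.addLink("/user", "hdfs://ns2/user");

        // Staging directory under /user: same mount point as the warehouse, so this passes.
        fs.rename("/user/hive/warehouse/staging/part-0", "/user/hive/warehouse/t/part-0");

        // Staging directory under /tmp: different mount point, so this is rejected.
        try {
            fs.rename("/tmp/.hive-staging/part-0", "/user/hive/warehouse/t/part-0");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the rename succeeds only when the staging path and the warehouse path sit under the same mount point, which is the property any fix for this problem has to establish.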
Workaround:
a. Create the /user/hive/warehouse/staging directory in HDFS and give it 777 permissions,
then add the following configuration:
<property>
<name>hive.exec.stagingdir</name>
<value>/user/hive/warehouse/staging/.hive-staging</value>
</property>
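b. Create only a single mount point, e.g. /cluser, create /tmp, /user and the other directories under that mount point, and finally change the default values of the Hive-related directories to match.
2. When a query returns a very large result set, the beeline client hangs or runs out of memory.
Error message: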
org.apache.thrift.TException: Error in calling method FetchResults
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1271)
    at com.sun.proxy.$Proxy0.FetchResults(Unknown Source)
    at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:363)
    at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
    at org.apache.hive.beeline.Commands.execute(Commands.java:806)
    at org.apache.hive.beeline.Commands.sql(Commands.java:665)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
    at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.Double.valueOf(Double.java:521)
Workaround:
Reading the source shows that beeline fetches a result set in one of two modes: incremental mode or buffered mode.
org.apache.hive.beeline.BeeLine
int print(ResultSet rs) throws SQLException {
    String format = getOpts().getOutputFormat();
    OutputFormat f = (OutputFormat) formats.get(format);
    if (f == null) {
        error(loc("unknown-format", new Object[] {
            format, formats.keySet()}));
        f = new TableOutputFormat(this);
    }
    Rows rows;
    if (getOpts().getIncremental()) {
        rows = new IncrementalRows(this, rs); // incremental mode
    } else {
        rows = new BufferedRows(this, rs); // buffered mode
    }
    return f.print(rows);
}
org.apache.hive.beeline.BeeLineOpts
private boolean incremental = false; // buffered mode is the default
However, beeline --help does not list any related option:
beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --help                          display this message
Beeline version 1.1.0-cdh6.4.3 by Apache Hive
That does not matter, though. Starting beeline with
beeline -u jdbc:hive2://10.17.28.173:10000 -n xxxx -p xxxx --incremental=true
still switches it into incremental mode.
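Why incremental mode avoids the out-of-memory error can be illustrated with a small, self-contained sketch. This is a toy model and not the BeeLine source: the class RowFetchSketch, the rowSource helper, and the row format are hypothetical. The buffered variant copies every row into a list before printing the first one, so client memory grows with the size of the result set, while the incremental variant hands each row to the printer as soon as it arrives.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Toy comparison of buffered vs. incremental row printing; NOT the real BeeLine classes.
class RowFetchSketch {
    // Hypothetical lazy row source that stands in for a server-side result set.
    static Iterator<String> rowSource(final int total) {
        return new Iterator<String>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < total;
            }

            @Override
            public String next() {
                if (produced >= total) {
                    throw new NoSuchElementException();
                }
                return "row-" + produced++;
            }
        };
    }

    // Buffered: hold every row in memory first, then print.
    // Returns the peak number of rows held at once.
    static int printBuffered(Iterator<String> rows, Consumer<String> printer) {
        List<String> buffer = new ArrayList<>();
        while (rows.hasNext()) {
            buffer.add(rows.next());
        }
        for (String row : buffer) {
            printer.accept(row);
        }
        return buffer.size();
    }

    // Incremental: print each row as it arrives, holding one row at a time.
    // Returns the peak number of rows held at once.
    static int printIncremental(Iterator<String> rows, Consumer<String> printer) {
        int peak = 0;
        while (rows.hasNext()) {
            printer.accept(rows.next());
            peak = 1;
        }
        return peak;
    }

    public static void main(String[] args) {
        int buffered = printBuffered(rowSource(3), System.out::println);
        int incremental = printIncremental(rowSource(3), System.out::println);
        System.out.println("peak rows held, buffered: " + buffered);
        System.out.println("peak rows held, incremental: " + incremental);
    }
}
```

The trade-off in this model is that the buffered variant sees the whole result before printing, while the incremental variant only ever sees the current row.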
Original article by 奋斗. If you repost it, please credit the source: https://blog.ytso.com/197756.html