Ambari 2.7.8: Hive on Spark configuration issues and fixes

Ambari's default execution engine is Tez, so the general recommendation is to simply use Tez.

The Hive on Spark integration bundled with Ambari is not particularly well adapted; this post records the problems I ran into and how I resolved them.


Versions

Hadoop 3.1.1

Hive 3.1.0

Spark2 2.3.0


Summary of fixes

# Fix: Hive fails immediately when started with the Spark engine. Copy the Spark jars into the Hive lib directory so Hive can use the Spark engine.
cp /usr/hdp/current/spark2-client/jars/spark-core_*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/scala-library*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-network-common*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-unsafe*.jar /usr/hdp/current/hive-server2-hive/lib/

cp /usr/hdp/current/spark2-client/jars/scala-reflect-*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-launcher*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-yarn*.jar /usr/hdp/current/hive-server2-hive/lib/

# Custom hive-site configuration (without this you would have to modify the Spark lib directory on every node, which is more cumbersome)
spark.yarn.jars=hdfs://dwh-test01:8020/spark-jars/*

# Upload the Spark jars to HDFS
sudo -u hdfs hdfs dfs -mkdir /spark-jars
sudo -u hdfs hdfs dfs -chmod 777 /spark-jars
hadoop fs -put /usr/hdp/current/spark2-client/jars/*.jar /spark-jars/

# Run hdp-select versions and put the result into Custom mapred-site. Some posts online configure this under yarn, hive, or spark instead, but none of those took effect for me.
hdp.version=3.1.5.0-152
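
A quick sketch of that step: run the command below on a cluster node and enter the exact string it prints as hdp.version under Custom mapred-site in the Ambari UI.

# Check the active HDP stack version on a node
hdp-select versions
# On this cluster it reports 3.1.5.0-152, which is the value used above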

# Remove the incorrect Hive jars bundled with Spark
hadoop fs -rm /spark-jars/hive*.jar
hadoop fs -rm /spark-jars/spark-hive*.jar
# Note: (affects the INSERT syntax of Hive on YARN)
hadoop fs -rm /spark-jars/hive-exec-1.21.2.3.1.5.0-152.jar
# Note: (affects the GROUP BY syntax of Hive on YARN)
hadoop fs -rm /spark-jars/orc-core-1.4.4-nohive.jar

# Uncheck this in hive-site (affects the JOIN ... ON syntax of Hive on YARN)
hive.mapjoin.optimized.hashtable=false;

# TODO: make Spark the default engine (does not take effect; for now it can only be set manually in SQL)
hive.execution.engine=spark
# In Advanced hive-interactive-site, remove the hive.execution.engine restriction (so that only hive.execution.mode remains)
Restricted session configs=hive.execution.mode

# Test: set these manually when running SQL
set hive.execution.engine=spark;
set hive.mapjoin.optimized.hashtable=false;
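
A minimal end-to-end smoke test from the shell. The HiveServer2 address and the table names below are assumptions for illustration; substitute your own.

# Hypothetical smoke test: exercises JOIN ... ON and GROUP BY on the Spark engine
beeline -u jdbc:hive2://dwh-test01:10000 -n hive -e "
  set hive.execution.engine=spark;
  set hive.mapjoin.optimized.hashtable=false;
  select o.user_id, count(*)
  from   test_db.orders o
  join   test_db.users  u on o.user_id = u.id
  group by o.user_id;"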


Error while processing statement: FAILED: Execution Error, return code 3

[42000][3] Error while processing statement: FAILED: 
Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
Spark job failed during runtime. Please check stacktrace for the root cause.


Fix

  1. If the job has already been submitted to YARN, find the Spark history / YARN application logs and dig further from there (see the sketch below).
  2. If Hive cannot even get the job onto YARN, check the default log path of the Ambari-managed Hive service, /var/log/hive/.., and dig further from there.
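
A sketch of where to look. The application id and log file name below are placeholders rather than values from this cluster.

# YARN side: pull the aggregated application logs (substitute your own application id)
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | less
# HiveServer2 side: the Ambari-managed Hive logs (exact file name may vary)
tail -n 200 /var/log/hive/hiveserver2.log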


Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'

2024-03-05T18:02:35,272 ERROR [HiveServer2-Background-Pool: Thread-177]: operation.Operation (:()) - Error running hive query: 
2024-03-05T18:07:09,694 ERROR [HiveServer2-Background-Pool: Thread-104]: client.SparkClientImpl (:()) - Error while waiting for client to connect.
2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: ql.Driver (:()) - FAILED: command has been interrupted: during query execution: 
2024-03-05T18:09:07,775 ERROR [HiveServer2-Background-Pool: Thread-136]: client.SparkClientImpl (:()) - Timed out waiting for client to connect.
2024-03-05T18:09:07,779 ERROR [HiveServer2-Background-Pool: Thread-136]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f)'
2024-03-05T18:09:07,779 ERROR [HiveServer2-Background-Pool: Thread-136]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f)'
2024-03-05T18:09:07,780 ERROR [HiveServer2-Background-Pool: Thread-136]: ql.Driver (:()) - FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f
2024-03-05T18:09:07,792 ERROR [HiveServer2-Background-Pool: Thread-136]: operation.Operation (:()) - Error running hive query: 
2024-03-05T18:21:49,052 ERROR [HiveServer2-Background-Pool: Thread-109]: client.SparkClientImpl (:()) - Timed out waiting for client to connect.
2024-03-05T18:21:49,072 ERROR [HiveServer2-Background-Pool: Thread-109]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b62a3365-ca4a-4ac8-aece-0a8db5a90cdf)'
2024-03-05T18:21:49,072 ERROR [HiveServer2-Background-Pool: Thread-109]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b62a3365-ca4a-4ac8-aece-0a8db5a90cdf)'


Fix

# Fix: Hive fails immediately when started with the Spark engine. Copy the Spark jars into the Hive lib directory so Hive can use the Spark engine.
cp /usr/hdp/current/spark2-client/jars/spark-core_*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/scala-library*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-network-common*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-unsafe*.jar /usr/hdp/current/hive-server2-hive/lib/



java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS

24/03/06 17:54:33 ERROR ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
    at scala.concurrent.impl.Promise$.resolver(Promise.scala:59)
    at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:51)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:157)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:739)
Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
    at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:48)
    at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:138)
    at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:536)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
24/03/06 17:54:33 INFO ApplicationMaster: Deleting staging directory hdfs://HA-Namespace/user/hive/.sparkStaging/application_1709716929078_0004
24/03/06 17:54:33 INFO ShutdownHookManager: Shutdown hook called


Fix

## In the Ambari web UI, set this in Custom hive-site so that Hive on Spark uses the jars on HDFS
spark.yarn.jars=hdfs://dwh-test01:8020/spark-jars/*

# Upload the Spark2 jars to HDFS
hadoop fs -put /usr/hdp/current/spark2-client/jars/*.jar /spark-jars/

# Remove the incorrect Hive jars bundled with Spark2
hadoop fs -rm /spark-jars/hive*.jar
hadoop fs -rm /spark-jars/spark-hive*.jar
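
A quick sanity check, assuming the jars were uploaded to /spark-jars as above:

# Anything Hive-related still left under /spark-jars should show up here
hadoop fs -ls /spark-jars/ | grep -i hive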


java.lang.NoSuchMethodError: org.apache.orc.OrcFile

24/03/07 12:00:33 ERROR RemoteDriver: Failed to run job 3ecd93be-704b-4f42-aa50-6fa7aec5d9cd
java.lang.NoSuchMethodError: org.apache.orc.OrcFile$ReaderOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$ReaderOptions;
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$ReaderOptions.useUTCTimestamp(OrcFile.java:94)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile$ReaderOptions.<init>(OrcFile.java:70)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.readerOptions(OrcFile.java:100)
    at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormatFile(AcidUtils.java:2344)
    at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormat(AcidUtils.java:2339)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1037)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:1028)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getChildState(AcidUtils.java:1347)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1163)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.processForWriteIds(HiveInputFormat.java:641)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.processPathsForMmRead(HiveInputFormat.java:605)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:495)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:789)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:552)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267)
    at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65)
    at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:215)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:114)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:359)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)


Fix

The fix is the same as in the previous step.

Root cause: the wrong-version Hive jars bundled with Spark2 create a jar conflict on the classpath.
Deleting orc-core-1.4.4-nohive.jar from HDFS is enough (provided spark.yarn.jars has already been configured in hive-site).
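
For reference, the removal command (same as in the summary above, assuming the jars live under /spark-jars):

hadoop fs -rm /spark-jars/orc-core-1.4.4-nohive.jar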



org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException

java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer

24/03/07 10:31:32 INFO DAGScheduler: ResultStage 9 (Map 1) failed in 0.178 s due to Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 27, dwh-test03, executor 2): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:203)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
    at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2190)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2190)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerStringOperator.process(VectorMapJoinInnerStringOperator.java:384)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:136)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:180)
    ... 15 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.commonSetup(VectorMapJoinInnerGenerateResultOperator.java:119)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerStringOperator.process(VectorMapJoinInnerStringOperator.java:109)
    ... 27 more


24/03/07 10:31:32 WARN TaskSetManager: Lost task 1.0 in stage 9.0 (TID 22, dwh-test02, executor 1): java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:124)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.setUpHashTable(VectorMapJoinCommonOperator.java:493)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.completeInitializationOp(VectorMapJoinCommonOperator.java:462)
    at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:469)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:399)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:115)


Fix

The failing code is in src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java:

private void setUpHashTable() {

  HashTableImplementationType hashTableImplementationType = vectorDesc.getHashTableImplementationType();
  switch (vectorDesc.getHashTableImplementationType()) {
  case OPTIMIZED:
    {
      // Create our vector map join optimized hash table variation *above* the
      // map join table container.
      vectorMapJoinHashTable = VectorMapJoinOptimizedCreateHashTable.createHashTable(conf,
              mapJoinTables[posSingleVectorMapJoinSmallTable]);
    }
    break;

  case FAST:
    {
      // Get our vector map join fast hash table variation from the
      // vector map join table container.
      VectorMapJoinTableContainer vectorMapJoinTableContainer =
              (VectorMapJoinTableContainer) mapJoinTables[posSingleVectorMapJoinSmallTable];
      vectorMapJoinHashTable = vectorMapJoinTableContainer.vectorMapJoinHashTable();
    }
    break;
  default:
    throw new RuntimeException("Unknown vector map join hash table implementation type " + hashTableImplementationType.name());
  }
  LOG.info("Using " + vectorMapJoinHashTable.getClass().getSimpleName() + " from " + this.getClass().getSimpleName());
}

The case FAST branch is the problematic one, and it turns out the FAST path can be avoided by changing a configuration parameter:

set hive.mapjoin.optimized.hashtable=false;

Checking the parameters in the Ambari web UI, I found that this setting is exposed there.

Its description reads: hive.mapjoin.optimized.hashtable
Whether Hive should use memory-optimized hash table for MapJoin. Only works on Tez,
because memory-optimized hashtable cannot be serialized.

In other words, the setting is meant specifically for the Tez engine, so simply uncheck it and restart Hive.
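
After the restart, a quick check from the shell confirms the effective value (the HiveServer2 address is an assumption):

# A bare "set <key>;" prints the current effective value
beeline -u jdbc:hive2://dwh-test01:10000 -n hive -e "set hive.mapjoin.optimized.hashtable;"
# expected output: hive.mapjoin.optimized.hashtable=false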


References

Hive on Spark official documentation

It offers some guidance, but does not fully solve these problems on its own.
