Ambari 2.7.8: fixing Hive on Spark configuration problems
Since Tez is Ambari's default engine, using Tez directly is the recommended option.
The Hive on Spark combination bundled with Ambari is not particularly well adapted, so this is a record of the problems hit and how they were resolved.
Versions
Hadoop 3.1.1
Hive 3.1.0
Spark2 2.3.0
Summary of fixes
# Fix the immediate error when Hive runs a query with the Spark engine: copy the Spark jars into the Hive lib path so that Hive can use Spark
cp /usr/hdp/current/spark2-client/jars/spark-core_*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/scala-library*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-network-common*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-unsafe*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/scala-reflect-*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-launcher*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-yarn*.jar /usr/hdp/current/hive-server2-hive/lib/
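After copying, a quick sanity check (assuming the same HDP 3.1.5 paths as above) confirms the jars are visible to HiveServer2; restart HiveServer2 from Ambari afterwards so they are picked up:
# The copied Spark/Scala jars should now show up in the Hive lib directory
ls /usr/hdp/current/hive-server2-hive/lib/ | grep -E 'spark-(core|launcher|network-common|unsafe|yarn)|scala-(library|reflect)'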
#Custom hive-site setting (without it you would have to change the Spark lib on every machine, which is even more troublesome)
spark.yarn.jars=hdfs://dwh-test01:8020/spark-jars/*
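The HDFS URI here must be one the cluster actually resolves (with NameNode HA you would normally point at the nameservice rather than a single host and port); one way to check the configured default filesystem:
# Print the default filesystem configured for the cluster
hdfs getconf -confKey fs.defaultFS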
#Upload the Spark jars to HDFS
sudo -u hdfs hdfs dfs -mkdir /spark-jars
sudo -u hdfs hdfs dfs -chmod 777 /spark-jars
hadoop fs -put /usr/hdp/current/spark2-client/jars/*.jar /spark-jars/
# Run hdp-select versions and put the result into Custom mapred-site. Some people online configure it under yarn/hive/spark instead, but none of those took effect for me.
hdp.version=3.1.5.0-152
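The value comes straight from hdp-select; running it on any node shows the string to use (the output below is from this cluster and will differ elsewhere):
# Look up the installed HDP stack version
hdp-select versions
# 3.1.5.0-152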
#Delete the wrong Hive jars that ship with Spark
hadoop fs -rm /spark-jars/hive*.jar
hadoop fs -rm /spark-jars/spark-hive*.jar
#Note: (affects INSERT statements for Hive on YARN)
hadoop fs -rm /spark-jars/hive-exec-1.21.2.3.1.5.0-152.jar
#Note: (affects GROUP BY statements for Hive on YARN)
hadoop fs -rm /spark-jars/orc-core-1.4.4-nohive.jar
# Uncheck in hive-site (affects JOIN ... ON statements for Hive on YARN)
hive.mapjoin.optimized.hashtable=false;
#TODO: make Spark the default engine (does not take effect; for now it can only be set manually in the SQL session)
hive.execution.engine=spark
#Advanced hive-interactive-site: remove the hive.execution.engine restriction from the restricted session configs
Restricted session configs=hive.execution.mode
#Test: run the SQL manually
set hive.execution.engine=spark;
set hive.mapjoin.optimized.hashtable=false;
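For a repeatable smoke test, the two set statements can be bundled with a small join query via beeline; the JDBC URL, database and table names below are placeholders, not part of the original setup:
# Hypothetical smoke test -- adjust the URL, database and tables to your environment
beeline -u "jdbc:hive2://dwh-test01:10000/default" -e "
set hive.execution.engine=spark;
set hive.mapjoin.optimized.hashtable=false;
select count(*) from test_db.t_order o join test_db.t_user u on o.user_id = u.id;
"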
Error while processing statement: FAILED: Execution Error, return code 3
[42000][3] Error while processing statement: FAILED:
Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
Spark job failed during runtime. Please check stacktrace for the root cause.
Solution
- If the job already made it onto YARN, find a way to look at the Spark history server logs and investigate from there.
- If Hive cannot yet get the job onto YARN, check the default log path of the Ambari-managed service
/var/log/hive/..
and investigate from there.
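If the job did reach YARN, its aggregated logs can usually be pulled with the YARN CLI (the application id below is only an example, taken from a later run in this note):
# List failed applications, then fetch the logs of the one to inspect
yarn application -list -appStates FAILED,KILLED
yarn logs -applicationId application_1709716929078_0004 | less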
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
2024-03-05T18:02:35,272 ERROR [HiveServer2-Background-Pool: Thread-177]: operation.Operation (:()) - Error running hive query:
2024-03-05T18:07:09,694 ERROR [HiveServer2-Background-Pool: Thread-104]: client.SparkClientImpl (:()) - Error while waiting for client to connect.
2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: ql.Driver (:()) - FAILED: command has been interrupted: during query execution:
2024-03-05T18:09:07,775 ERROR [HiveServer2-Background-Pool: Thread-136]: client.SparkClientImpl (:()) - Timed out waiting for client to connect.
2024-03-05T18:09:07,779 ERROR [HiveServer2-Background-Pool: Thread-136]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f)'
2024-03-05T18:09:07,779 ERROR [HiveServer2-Background-Pool: Thread-136]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f)'
2024-03-05T18:09:07,780 ERROR [HiveServer2-Background-Pool: Thread-136]: ql.Driver (:()) - FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f
2024-03-05T18:09:07,792 ERROR [HiveServer2-Background-Pool: Thread-136]: operation.Operation (:()) - Error running hive query:
2024-03-05T18:21:49,052 ERROR [HiveServer2-Background-Pool: Thread-109]: client.SparkClientImpl (:()) - Timed out waiting for client to connect.
2024-03-05T18:21:49,072 ERROR [HiveServer2-Background-Pool: Thread-109]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b62a3365-ca4a-4ac8-aece-0a8db5a90cdf)'
2024-03-05T18:21:49,072 ERROR [HiveServer2-Background-Pool: Thread-109]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b62a3365-ca4a-4ac8-aece-0a8db5a90cdf)'
Solution
#Fix the immediate error when Hive runs a query with the Spark engine: copy the Spark jars into the Hive lib path so that Hive can use Spark
cp /usr/hdp/current/spark2-client/jars/spark-core_*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/scala-library*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-network-common*.jar /usr/hdp/current/hive-server2-hive/lib/
cp /usr/hdp/current/spark2-client/jars/spark-unsafe*.jar /usr/hdp/current/hive-server2-hive/lib/
java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
24/03/06 17:54:33 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:59)
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:51)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:157)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:739)
Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:48)
at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:138)
at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:536)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
24/03/06 17:54:33 INFO ApplicationMaster: Deleting staging directory hdfs://HA-Namespace/user/hive/.sparkStaging/application_1709716929078_0004
24/03/06 17:54:33 INFO ShutdownHookManager: Shutdown hook called
Solution
##In the Ambari web UI, add a Custom hive-site property so that Hive on Spark uses the jars on HDFS
spark.yarn.jars=hdfs://dwh-test01:8020/spark-jars/*
#Upload the Spark2 jars to HDFS
hadoop fs -put /usr/hdp/current/spark2-client/jars/*.jar /spark-jars/
#Delete the wrong Hive jars that ship with Spark2
hadoop fs -rm /spark-jars/hive*.jar
hadoop fs -rm /spark-jars/spark-hive*.jar
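To confirm the cleanup (assuming the /spark-jars directory used above), the directory should no longer contain any hive-* or spark-hive-* jars:
# Expect no output once the conflicting jars are gone
hadoop fs -ls /spark-jars/ | grep -iE 'hive'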
java.lang.NoSuchMethodError: org.apache.orc.OrcFile
24/03/07 12:00:33 ERROR RemoteDriver: Failed to run job 3ecd93be-704b-4f42-aa50-6fa7aec5d9cd
java.lang.NoSuchMethodError: org.apache.orc.OrcFile$ReaderOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$ReaderOptions;
at org.apache.hadoop.hive.ql.io.orc.OrcFile$ReaderOptions.useUTCTimestamp(OrcFile.java:94)
at org.apache.hadoop.hive.ql.io.orc.OrcFile$ReaderOptions.<init>(OrcFile.java:70)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.readerOptions(OrcFile.java:100)
at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormatFile(AcidUtils.java:2344)
at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormat(AcidUtils.java:2339)
at org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1037)
at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:1028)
at org.apache.hadoop.hive.ql.io.AcidUtils.getChildState(AcidUtils.java:1347)
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1163)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.processForWriteIds(HiveInputFormat.java:641)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.processPathsForMmRead(HiveInputFormat.java:605)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:495)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:789)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:552)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267)
at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65)
at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45)
at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:215)
at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:114)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:359)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Solution
The fix is the same as in the previous step.
Cause: the wrong-version Hive jars bundled with Spark2 produce a jar conflict.
Deleting orc-core-1.4.4-nohive.jar from HDFS is enough (provided spark.yarn.jars has already been configured in hive-site).
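For completeness, this is the same removal already listed in the summary at the top:
# Remove the conflicting nohive ORC jar from the HDFS jar directory
hadoop fs -rm /spark-jars/orc-core-1.4.4-nohive.jar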
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
24/03/07 10:31:32 INFO DAGScheduler: ResultStage 9 (Map 1) failed in 0.178 s due to Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 27, dwh-test03, executor 2): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:203)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2190)
at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerStringOperator.process(VectorMapJoinInnerStringOperator.java:384)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:136)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:180)
... 15 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.commonSetup(VectorMapJoinInnerGenerateResultOperator.java:119)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerStringOperator.process(VectorMapJoinInnerStringOperator.java:109)
... 27 more
24/03/07 10:31:32 WARN TaskSetManager: Lost task 1.0 in stage 9.0 (TID 22, dwh-test02, executor 1): java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:124)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.setUpHashTable(VectorMapJoinCommonOperator.java:493)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.completeInitializationOp(VectorMapJoinCommonOperator.java:462)
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:399)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:115)
Solution
The source of the error is in: src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
private void setUpHashTable() {
  HashTableImplementationType hashTableImplementationType = vectorDesc.getHashTableImplementationType();
  switch (vectorDesc.getHashTableImplementationType()) {
  case OPTIMIZED:
    {
      // Create our vector map join optimized hash table variation *above* the
      // map join table container.
      vectorMapJoinHashTable = VectorMapJoinOptimizedCreateHashTable.createHashTable(conf,
          mapJoinTables[posSingleVectorMapJoinSmallTable]);
    }
    break;
  case FAST:
    {
      // Get our vector map join fast hash table variation from the
      // vector map join table container.
      VectorMapJoinTableContainer vectorMapJoinTableContainer =
          (VectorMapJoinTableContainer) mapJoinTables[posSingleVectorMapJoinSmallTable];
      vectorMapJoinHashTable = vectorMapJoinTableContainer.vectorMapJoinHashTable();
    }
    break;
  default:
    throw new RuntimeException("Unknown vector map join hash table implementation type " + hashTableImplementationType.name());
  }
  LOG.info("Using " + vectorMapJoinHashTable.getClass().getSimpleName() + " from " + this.getClass().getSimpleName());
}
The code in the FAST branch is the problematic one, and the FAST path can be avoided by changing a configuration parameter:
set hive.mapjoin.optimized.hashtable=false;
Looking the parameter up in the Ambari web UI shows it is exposed there, with the following description:
hive.mapjoin.optimized.hashtable
Whether Hive should use memory-optimized hash table for MapJoin. Only works on Tez,
because memory-optimized hashtable cannot be serialized.
It is meant specifically for the Tez engine, so simply uncheck it and restart Hive.
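If you would rather not change the cluster-wide default, the same effect can be obtained per session from beeline; the JDBC URL and table names below are the same placeholders used in the smoke test earlier:
# Per-session override instead of changing the Ambari-managed default
beeline -u "jdbc:hive2://dwh-test01:10000/default" \
  --hiveconf hive.execution.engine=spark \
  --hiveconf hive.mapjoin.optimized.hashtable=false \
  -e "select count(*) from test_db.t_order o join test_db.t_user u on o.user_id = u.id;"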
References
These were somewhat helpful, but none of them fully solved the problem.