select * from tmp.l_table a join tmp.r_table b on a.dt = b.dt and a.dt = '2021-11-09' and b.dt = '2021-11-09'
----- Execution plan
spark.sql(" explain select * from tmp.l_table a join tmp.r_table b on a.dt = b.dt and a.dt = '2021-11-09' and b.dt = '2021-11-09' ").show(100,false)
== Physical Plan ==
*(3) SortMergeJoin [dt#35], [dt#38], Inner
:- *(1) Sort [dt#35 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(dt#35, 200)
:     +- Scan hive tmp.l_table [l_id#33, l_name#34, dt#35], HiveTableRelation `tmp`.`l_table`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [l_id#33, l_name#34], [dt#35], [isnotnull(dt#35), (dt#35 = 2021-11-09)]
+- *(2) Sort [dt#38 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(dt#38, 200)
      +- Scan hive tmp.r_table [r_id#36, r_name#37, dt#38], HiveTableRelation `tmp`.`r_table`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [r_id#36, r_name#37], [dt#38], [isnotnull(dt#38), (dt#38 = 2021-11-09)]