  • select * from tmp.l_table a join tmp.r_table b on a.dt = b.dt and a.dt = '2021-11-09' and b.dt = '2021-11-09'

    ----- Execution plan
    spark.sql(" explain select * from tmp.l_table a join tmp.r_table b on a.dt = b.dt and a.dt = '2021-11-09' and b.dt = '2021-11-09' ").show(100,false)
    == Physical Plan ==
    *(3) SortMergeJoin [dt#35], [dt#38], Inner
    :- *(1) Sort [dt#35 ASC NULLS FIRST], false, 0
    :  +- Exchange hashpartitioning(dt#35, 200)
    :     +- Scan hive tmp.l_table [l_id#33, l_name#34, dt#35], HiveTableRelation `tmp`.`l_table`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [l_id#33, l_name#34], [dt#35], [isnotnull(dt#35), (dt#35 = 2021-11-09)]
    +- *(2) Sort [dt#38 ASC NULLS FIRST], false, 0
       +- Exchange hashpartitioning(dt#38, 200)
          +- Scan hive tmp.r_table [r_id#36, r_name#37, dt#38], HiveTableRelation `tmp`.`r_table`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [r_id#36, r_name#37], [dt#38], [isnotnull(dt#38), (dt#38 = 2021-11-09)]
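In the plan above, each side is shuffled by `Exchange hashpartitioning(dt, 200)`, sorted on `dt` by its `Sort` node, and then the `SortMergeJoin` node walks the two sorted inputs in step. The following is a minimal in-memory sketch of that merge phase in plain Scala, with no Spark dependency. The object name and the `LRow`/`RRow` case classes are hypothetical stand-ins for `tmp.l_table` and `tmp.r_table`. Spark's real operator also handles spilling, code generation, and outer joins, and none of that is modeled here.

```scala
// Minimal sketch of an inner sort-merge join on dt (hypothetical types, not Spark's code).
object SortMergeJoinSketch {
  // Hypothetical row shapes mirroring l_table(l_id, l_name, dt) and r_table(r_id, r_name, dt).
  final case class LRow(lId: Int, lName: String, dt: String)
  final case class RRow(rId: Int, rName: String, dt: String)

  /** Sort both inputs by the join key, then advance two cursors and pair up equal-key runs. */
  def join(left: Seq[LRow], right: Seq[RRow]): Vector[(LRow, RRow)] = {
    val l = left.sortBy(_.dt).toVector  // plays the role of *(1) Sort
    val r = right.sortBy(_.dt).toVector // plays the role of *(2) Sort
    val out = Vector.newBuilder[(LRow, RRow)]
    var i = 0
    var j = 0
    while (i < l.length && j < r.length) {
      val cmp = l(i).dt.compareTo(r(j).dt)
      if (cmp < 0) i += 1      // left key is smaller: advance left cursor
      else if (cmp > 0) j += 1 // right key is smaller: advance right cursor
      else {
        val key = l(i).dt
        // Find the end of the run of rows sharing this key on each side.
        var iEnd = i
        while (iEnd < l.length && l(iEnd).dt == key) iEnd += 1
        var jEnd = j
        while (jEnd < r.length && r(jEnd).dt == key) jEnd += 1
        // Emit every pairing of the two equal-key runs.
        for (a <- i until iEnd; b <- j until jEnd) out += ((l(a), r(b)))
        i = iEnd
        j = jEnd
      }
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val left = Seq(LRow(1, "a", "2021-11-09"), LRow(2, "b", "2021-11-09"), LRow(3, "c", "2021-11-08"))
    val right = Seq(RRow(10, "x", "2021-11-09"), RRow(11, "y", "2021-11-10"))
    join(left, right).foreach(println)
  }
}
```

With the sample rows in `main`, only the two left rows dated 2021-11-09 match, and each pairs with the single right row of that date. The same logic shows what happens in the query above. Both scans are already restricted to `dt = '2021-11-09'`, so every row on both sides carries the same key. All of them hash into one of the 200 shuffle partitions, and the join there yields every left row paired with every right row.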

  • I actually didn't understand this at all!

  • Hasn't memory management been dynamic since 1.6? Then there's no need to set spark.storage.memoryFraction anymore.
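Some context for the comment above. Since Spark 1.6, execution and storage share one unified region instead of fixed slices. `spark.storage.memoryFraction` is only honored in legacy mode (`spark.memory.useLegacyMode=true`). The Spark tuning guide sizes the shared region as M = (heap - 300 MB) × `spark.memory.fraction` and the part where cached blocks are safe from eviction as R = M × `spark.memory.storageFraction`. The plain-Scala sketch below only performs that arithmetic and calls no Spark API. The 4096 MB heap is an illustrative assumption. The defaults used (0.6 and 0.5) are those of Spark 2.x and later; 1.6 shipped with `spark.memory.fraction` = 0.75.

```scala
// Plain-Scala arithmetic for Spark's unified memory model (Spark >= 1.6).
// Formulas follow the Spark tuning guide; no Spark API is called here.
object UnifiedMemorySketch {
  // Fixed amount Spark reserves for its own internal objects.
  val ReservedMb: Double = 300.0

  /** Shared execution + storage region: M = (heap - 300 MB) * spark.memory.fraction. */
  def unifiedPoolMb(heapMb: Double, memoryFraction: Double = 0.6): Double =
    (heapMb - ReservedMb) * memoryFraction

  /** Part R of M whose cached blocks execution cannot evict: M * spark.memory.storageFraction. */
  def protectedStorageMb(heapMb: Double,
                         memoryFraction: Double = 0.6,
                         storageFraction: Double = 0.5): Double =
    unifiedPoolMb(heapMb, memoryFraction) * storageFraction

  def main(args: Array[String]): Unit = {
    val heapMb = 4096.0 // illustrative 4 GB executor heap (assumption)
    val m = unifiedPoolMb(heapMb)
    val r = protectedStorageMb(heapMb)
    // Storage and execution borrow unused space from each other inside M;
    // execution can evict cached blocks only until storage drops to R.
    println(f"M = $m%.1f MB, R = $r%.1f MB, M - R = ${m - r}%.1f MB")
  }
}
```

For a 4096 MB heap this gives M = 3796 × 0.6 ≈ 2277.6 MB and R ≈ 1138.8 MB. The split between storage and execution inside M moves at runtime, which is why the static `spark.storage.memoryFraction` knob no longer applies outside legacy mode.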
