Tags: performance* + spark*


  1. However, you said you are doing an outer join. If it is a left join and the right side is larger than the left, do an inner join first, then do your left join against that result. The result will most likely be broadcast for the left join. This is a pattern that Holden described at Strata this year in one of her sessions.
    2018-07-12, by klotz
  2. set hive.exec.orc.split.strategy=ETL; -- this helps only when scanning for specific values; if a full table scan is required anyway, use the default (HYBRID) or BI.
    2017-06-12, by klotz
  3. df.repartition(2, $"key").sortWithinPartitions()
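
The inner-join-then-left-join pattern from the first bookmark could be sketched roughly as follows. This is one possible reading of that excerpt, not the speaker's own code: the object name `InnerThenLeftJoin`, the helper `innerThenLeft`, and the single join column passed as `key` are all hypothetical, and the sketch assumes the matched subset of the right side is small enough to broadcast.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper illustrating the pattern; names are not from the source.
object InnerThenLeftJoin {
  def innerThenLeft(left: DataFrame, right: DataFrame, key: String): DataFrame = {
    // Step 1: inner join the large right side against the distinct left keys,
    // so only right rows that can actually match survive.
    val matched = right.join(left.select(key).distinct(), Seq(key), "inner")

    // Step 2: left join against the reduced set. The broadcast hint is an
    // assumption that `matched` fits in executor memory.
    left.join(broadcast(matched), Seq(key), "left")
  }
}
```

Unmatched left rows still appear in the output with nulls for the right-side columns, as in a plain left join. If the reduced set is not reliably small, dropping the `broadcast` hint and letting Spark choose the join strategy is the safer option.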


