SemanticScuttle - klotz.me » klotz: orc+spark+hadoop

klotz: orc* + spark* + hadoop*

Generic Load/Save Functions - Spark 3.2.0 Documentation

usersDF.write.format("orc")
  .option("orc.bloom.filter.columns", "favorite_color")
  .option("orc.dictionary.key.threshold", "1.0")
  .option("orc.column.encoding.direct", "name")
  .save("users_with_options.orc")

Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo.

2021-12-01 Tags: spark, orc, bloom filter, parquet, hadoop by klotz
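
The three ORC options in the bookmarked snippet are plain string key/value pairs, so they can be built once and reused across writes. The following is a minimal sketch in Python, not code from the Spark documentation: the helper name `orc_write_options` and the DataFrame name `users_df` are hypothetical, and the commented-out line assumes PySpark is installed.

```python
def orc_write_options(bloom_filter_columns, direct_encoding_columns,
                      dictionary_key_threshold=1.0):
    """Build the ORC writer options from the snippet as a dict of strings.

    Hypothetical helper for illustration only. Column lists are joined
    with commas, which is how ORC expects multiple column names.
    """
    return {
        "orc.bloom.filter.columns": ",".join(bloom_filter_columns),
        "orc.dictionary.key.threshold": str(dictionary_key_threshold),
        "orc.column.encoding.direct": ",".join(direct_encoding_columns),
    }


# Same values as the bookmarked example.
options = orc_write_options(["favorite_color"], ["name"])

# Assumption: with PySpark available and a DataFrame named users_df,
# the dict could be handed to the writer in one call:
# users_df.write.format("orc").options(**options).save("users_with_options.orc")
```

Keeping the options in a dict makes it easy to apply the same bloom filter and encoding settings to several tables without repeating the `.option(...)` chain.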
ORC Creation Best Practices - Hortonworks

-set hive.exec.orc.split.strategy=ETL; -- this helps only for scans of specific values; if a full table scan is required anyway, use the default (HYBRID) or BI.

2017-06-12 Tags: spark, hadoop, orc, hive, performance, tuning, etl, bi, hybrid by klotz
