-set hive.exec.orc.split.strategy=ETL; -- this helps only when the query scans for specific values; if a full table scan is required anyway, use the default (HYBRID) or BI.
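As a rough sketch of how the setting might be switched per query: the table `events_orc` and column `event_id` below are hypothetical names used only for illustration, and whether ETL actually helps depends on the data layout and Hive version. Per the Hive configuration documentation, ETL always reads the ORC file footers before generating splits, BI generates per-file splits quickly without reading any data from HDFS, and HYBRID chooses between the two based on file count and average file size.

```sql
-- Hypothetical table and column names, for illustration only.

-- Selective lookup on specific values: ETL reads the ORC footers
-- before generating splits.
SET hive.exec.orc.split.strategy=ETL;
SELECT * FROM events_orc WHERE event_id = 12345;

-- Full table scan: return to the default strategy.
SET hive.exec.orc.split.strategy=HYBRID;
SELECT COUNT(*) FROM events_orc;
```

A `SET` issued this way applies to the current session only, so the strategy can be changed between queries without touching the cluster-wide configuration.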
To create a cluster that supports Shark, we launch an Amazon EMR cluster with Hive installed and then use a bootstrap action to install Spark and Shark.
Every five minutes, the ad server pushes a JSON file containing the latest set of logged data to Amazon S3. Pushing logs at five-minute intervals lets us analyze them in a timely manner.