To create a cluster that supports Shark, we launch an Amazon EMR cluster with Hive installed and then use a bootstrap action to install Spark and Shark.
Every five minutes, the ad server pushes a JSON file containing the latest set of logged data to Amazon S3. Pushing logs at five-minute intervals keeps the data in S3 fresh enough for a timely analysis of the logs.
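To make the five-minute push concrete, here is a minimal sketch of how an ad server might name and serialize each batch. The key layout (`logs/YYYY/MM/DD/HHMM.json`), the function names, and the record fields are all assumptions made for illustration; the actual ad server's file naming and JSON layout are not described here.

```python
import json
from datetime import datetime, timezone

INTERVAL_SECONDS = 5 * 60  # the five-minute push interval


def window_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its five-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % INTERVAL_SECONDS, tz=timezone.utc)


def log_key(ts: datetime, prefix: str = "logs") -> str:
    """Build a hypothetical S3 object key for the window that contains ts."""
    start = window_start(ts)
    return f"{prefix}/{start:%Y/%m/%d/%H%M}.json"


def serialize(records: list) -> str:
    """Serialize one batch of log records as a single JSON document (assumed format)."""
    return json.dumps(records)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    batch = [{"ad_id": "example", "event": "impression"}]  # made-up record
    print(log_key(now))
    print(serialize(batch))
    # The upload itself could then use boto3, for example:
    #   boto3.client("s3").put_object(Bucket=..., Key=log_key(now), Body=serialize(batch))
```

Keying each file by the start of its window means every timestamp within the same five minutes maps to the same object name, so a downstream job can list a date prefix and pick up only the batches it has not yet processed.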