  1. import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()

    // Register a custom codec that reads files with a .gz.tmp extension as ordinary gzip
    conf.set("spark.hadoop.io.compression.codecs", "smx.ananke.spark.util.codecs.TmpGzipCodec")

    val sc = new SparkContext(conf)

    // Read every object under the given hour prefix as lines of text
    val data = sc.textFile("s3n://my-data-bucket/2015/09/21/13/*")
  3. A Directed Acyclic Word Graph, or DAWG, is a data structure that permits extremely fast word searches. The entry point into the graph represents the starting letter in the search. Each node represents a letter, and from each node you can travel to one of two other nodes, depending on whether the node's letter matches the one you are searching for.
    2013-10-04
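
    The traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `Node`, `child`, `next`, `endOfWord`, and `WordGraph` are hypothetical, and the sketch builds the unminimized form of the graph (a trie in child/sibling layout). It omits the suffix-sharing step that makes a real DAWG compact. A lookup follows `child` when the letter matches and `next` when it does not.

```scala
// Hypothetical node type; all names here are illustrative assumptions.
final class Node(val letter: Char) {
  var endOfWord: Boolean = false   // true if a stored word ends at this node
  var child: Option[Node] = None   // followed when the letter matches
  var next: Option[Node] = None    // followed when the letter does not match
}

object WordGraph {
  // Walk a sibling chain looking for the node that holds `c`.
  @annotation.tailrec
  private def findSibling(head: Option[Node], c: Char): Option[Node] = head match {
    case Some(n) if n.letter == c => Some(n)
    case Some(n)                  => findSibling(n.next, c)
    case None                     => None
  }

  // Insert a word. `root` is a sentinel whose child chain holds the first letters.
  // Note: this does not merge shared suffixes, so the result is not minimized.
  def insert(root: Node, word: String): Unit = {
    var parent = root
    for (c <- word) {
      val node = findSibling(parent.child, c).getOrElse {
        val fresh = new Node(c)
        fresh.next = parent.child    // prepend to the sibling chain
        parent.child = Some(fresh)
        fresh
      }
      parent = node
    }
    if (word.nonEmpty) parent.endOfWord = true
  }

  // Search: on a match move to `child` and advance; on a mismatch move to `next`.
  def contains(root: Node, word: String): Boolean = {
    if (word.isEmpty) return false
    var current = root.child
    var i = 0
    while (current.isDefined) {
      val n = current.get
      if (n.letter == word(i)) {
        if (i == word.length - 1) return n.endOfWord
        i += 1
        current = n.child
      } else {
        current = n.next
      }
    }
    false
  }
}
```

    For example, after inserting "cat", "car", and "dog" into a sentinel root, `WordGraph.contains(root, "car")` walks c, a, then steps sideways through the sibling chain under "a" until it reaches the "r" node. Each step inspects exactly one node and moves to one of its two neighbours, which is the two-way branching the description refers to.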

