The Lambda architecture consists of a serving layer that is updated infrequently from the batch layer, and a speed layer that computes real-time analytics to compensate for the slow batch layer. Essentially, Hadoop computes analytics in batches and, between batch runs, the speed layer incrementally updates metrics by processing events as a stream.

Both Spark and Storm can operate in a Hadoop cluster and access Hadoop storage. Storm-YARN is Yahoo’s open source implementation of Storm and Hadoop convergence, while Spark provides native Hadoop integration. In both cases, integration with Hadoop is achieved through YARN (NextGen MapReduce). Running real-time analytics alongside Hadoop-based systems allows better utilization of cluster resources through computational elasticity, and because both workloads share a cluster, network transfers can be kept minimal.
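To make the interplay of the batch, serving, and speed layers concrete, here is a minimal in-memory sketch in Python. It is not tied to Hadoop, Storm, or Spark: the class `LambdaCounts` and its methods are hypothetical names chosen for illustration, and the "metric" is assumed to be a simple per-key event count.

```python
from collections import Counter


class LambdaCounts:
    """Toy Lambda architecture that tracks per-key event counts.

    All names here are illustrative assumptions, not part of any real framework.
    """

    def __init__(self):
        self.master_log = []            # append-only record of every event
        self.batch_view = Counter()     # serving layer: result of the last batch run
        self.realtime_view = Counter()  # speed layer: events since the last batch run

    def ingest(self, key):
        """Record an event and update the speed layer incrementally."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from scratch over the full log.

        Simplification: the batch run is treated as instantaneous, so the
        speed layer can be cleared as soon as the new view is published.
        """
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, key):
        """Merge the (possibly stale) batch view with the real-time delta."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    counts = LambdaCounts()
    for event in ["click", "view", "click"]:
        counts.ingest(event)
    counts.run_batch()           # infrequent batch recomputation
    counts.ingest("click")       # arrives between batch runs
    print(counts.query("click"))
```

In a real deployment the batch recomputation takes time, so events that arrive while it runs would have to stay in the speed layer until the next batch view covers them; the sketch ignores that overlap to keep the merge logic visible.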
Hive, Pig, Scalding, Scoobi, Scrunch and Spark