Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF capable of loading 1B triples in under one hour on a 15 node cluster. Bigdata operates in both a single machine mode (Journal), highly available replication cluster mode (HAJournalServer), and a horizontally sharded cluster mode (BigdataFederation). The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion triples / quads. The HAJournalServer adds replication, online backup, horizontal scaling of query, and high availability. The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth. Both platforms support fully concurrent readers with snapshot isolation.
Jena Grande is a collection of utilities, experiments and examples on how to use MapReduce, Pig, HBase or Giraph to process data in RDF format.