Stars
Alluxio, data orchestration for analytics and machine learning in the cloud
Cache File System optimized for columnar formats and object stores
SQL-based streaming analytics platform at scale
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
A GPU-powered real-time analytics storage and query engine.
A load balancer / proxy / gateway for prestodb
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka f…
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Example project to show how to use Spark to read and write Avro/Parquet files
Please visit https://github.com/h2oai/h2o-3 for latest H2O
The official home of the Presto distributed SQL query engine for big data
dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard a…
A free electronic book about Apache Hive. The book is geared towards SQL-knowledgeable business users with some advanced tips for devops.
Scalable machine learning library for Apache Hive/Spark/Pig
High performance JSON IP and GeoIP REST API (IP Geolocation)
Fault-tolerant job scheduler for Mesos which handles dependencies and ISO 8601-based schedules
Historical VoltDB example schemas, procedures, and client apps for demonstration and educational purposes. No longer maintained.

