Alluxio, data orchestration for analytics and machine learning in the cloud

Java 7,166 2,950 Updated Apr 29, 2025

Cache File System optimized for columnar formats and object stores

Java 187 72 Updated Aug 11, 2022

SQL-based streaming analytics platform at scale

Java 1,226 282 Updated Jun 21, 2020

UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy

Java 64 29 Updated Dec 17, 2023

A GPU-powered real-time analytics storage and query engine.

Go 3,083 234 Updated Jul 13, 2024

A testing framework for Presto

Java 62 31 Updated Mar 6, 2026

A load balancer / proxy / gateway for prestodb

JavaScript 358 157 Updated Jul 25, 2024

A framework for rapid reporting API development; with out of the box support for high cardinality dimension lookups with druid.

Scala 131 57 Updated Jan 17, 2025

Shell 70 62 Updated Mar 28, 2019

KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka f…

Scala 1,184 393 Updated Jan 5, 2017

Shazam in Java

Java 685 238 Updated Mar 16, 2016

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Scala 1,047 315 Updated Mar 1, 2026

Example project to show how to use Spark to read and write Avro/Parquet files

50 44 Updated Aug 21, 2013

kmeans

Java 62 96 Updated Jul 10, 2024

HiBench is a big data benchmark suite.

Java 1,490 769 Updated Dec 15, 2025

Please visit https://github.com/h2oai/h2o-3 for latest H2O

Java 2,319 553 Updated Oct 24, 2024

The official home of the Presto distributed SQL query engine for big data

Java 16,667 5,535 Updated Mar 11, 2026

SQL Windowing Functions for Hadoop

Java 65 17 Updated Jun 20, 2022

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard a…

Java 314 144 Updated Mar 10, 2026

A free electronic book about Apache Hive. The book is geared towards SQL-knowledgeable business users with some advanced tips for devops.

HTML 104 40 Updated Sep 26, 2017

Pig Visualization framework

JavaScript 465 132 Updated Mar 24, 2023

Scalable machine learning library for Apache Hive/Spark/Pig

501 149 Updated Dec 2, 2016

High performance JSON IP and GeoIP REST API (IP Geolocation)

Go 875 137 Updated Jun 24, 2025

Example code for running R on Hadoop

R 132 62 Updated Oct 17, 2012

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules

Scala 4,382 521 Updated Jun 29, 2022

Crunchbase startups analysis

R 4 1 Updated Jul 25, 2013

Historical VoltDB example schemas, procedures, and client apps for demonstration and educational purposes. No longer maintained.

JavaScript 14 3 Updated Jan 4, 2015

ZeroTurnaround Process Executor

Java 912 114 Updated Jul 12, 2025