Stars
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
An Open Standard for lineage metadata collection
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Notes on the design and implementation of Apache Spark
A library that provides an embeddable, persistent key-value store for fast storage.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL.
Event streaming platform for agents, apps, and analytics. Continuously ingest, transform, and serve event data in real time, at scale.
Apache DolphinScheduler is the modern data orchestration platform. Agile creation of high-performance workflows with low code.
Apache Doris is an easy-to-use, high performance and unified analytics database.
Golang implementation of the Raft consensus protocol
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Flink CDC is a streaming data integration tool
Apache Spark - A unified analytics engine for large-scale data processing
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
TensorFlow code and pre-trained models for BERT
🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.
APM, Application Performance Monitoring System
ClickHouse® is a real-time analytics database management system
The official home of the Presto distributed SQL query engine for big data
Apache Pulsar - distributed pub-sub messaging system
A feature complete and high performance multi-group Raft library in Go.
The Prometheus monitoring system and time series database.
A fast and reliable .NET Rules Engine with extensive Dynamic expression support
Dapr is a portable runtime for building distributed applications across cloud and edge, combining event-driven architecture with workflow orchestration.