Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++ 16,481 4,002 Updated Feb 4, 2026

An Open Standard for lineage metadata collection

Java 2,296 423 Updated Feb 3, 2026

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Java 1,107 377 Updated Feb 2, 2026

Notes talking about the design and implementation of Apache Spark

5,356 1,837 Updated Apr 2, 2024

A library that provides an embeddable, persistent key-value store for fast storage.

C++ 31,483 6,722 Updated Feb 3, 2026

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

Java 3,178 1,268 Updated Feb 4, 2026

The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL.

Rust 6,225 494 Updated Feb 4, 2026

Event streaming platform for agents, apps, and analytics. Continuously ingest, transform, and serve event data in real time, at scale.

Rust 8,760 730 Updated Feb 4, 2026

Apache DolphinScheduler is the modern data orchestration platform. Agile creation of high-performance workflows with low code.

Java 14,143 4,991 Updated Feb 4, 2026

Apache Doris is an easy-to-use, high performance and unified analytics database.

Java 14,985 3,700 Updated Feb 4, 2026

Golang implementation of the Raft consensus protocol

Go 8,913 1,053 Updated Jan 14, 2026

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 12,509 3,461 Updated Feb 4, 2026

Flink CDC is a streaming data integration tool

Java 6,339 2,123 Updated Feb 4, 2026

Apache Spark - A unified analytics engine for large-scale data processing

Scala 42,747 29,050 Updated Feb 4, 2026

DataX is the open-source version of Alibaba Cloud DataWorks Data Integration.

Java 17,105 5,665 Updated Jul 1, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 156,129 31,942 Updated Feb 3, 2026

TensorFlow code and pre-trained models for BERT

Python 39,834 9,712 Updated Jul 23, 2024

🚀 gnet is a high-performance, lightweight, non-blocking, event-driven networking framework written in pure Go.

Go 11,072 1,108 Updated Jan 22, 2026

APM, Application Performance Monitoring System

Java 24,696 6,635 Updated Jan 27, 2026

ClickHouse® is a real-time analytics database management system

C++ 45,634 8,035 Updated Feb 4, 2026

The official home of the Presto distributed SQL query engine for big data

Java 16,641 5,521 Updated Feb 4, 2026

Mirror of Apache Kudu

C++ 1,896 663 Updated Feb 3, 2026

A beginner's guide to big data ⭐

Java 16,862 4,319 Updated Jan 5, 2024

Apache Pulsar - distributed pub-sub messaging system

Java 15,075 3,700 Updated Feb 3, 2026

Alibaba's MySQL binlog incremental subscription and consumption component

Java 29,610 7,675 Updated Jan 28, 2026

A feature complete and high performance multi-group Raft library in Go.

Go 5,298 565 Updated Jul 23, 2025

The Prometheus monitoring system and time series database.

Go 62,545 10,147 Updated Feb 3, 2026

A fast and reliable .NET Rules Engine with extensive Dynamic expression support

C# 4,167 615 Updated Nov 19, 2025

Dapr is a portable runtime for building distributed applications across cloud and edge, combining event-driven architecture with workflow orchestration.

Go 25,469 2,030 Updated Feb 2, 2026

Documentation and Samples for the Official HN API

12,939 716 Updated Jan 1, 2025