Facebook, San Francisco
Stars
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
A real-time serving engine for data-intensive generative AI applications
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
A flexible, adaptive classification system for dynamic text classification
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
OCR toolbox from Davar-Lab
A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updated.
Fault-tolerant async actors for Rust that scale seamlessly
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Mod…
Guideline-following large language model for information extraction
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
A cloud native embedded storage engine built on object storage.
WIP: allows you to create DSPy pipelines using ComfyUI
Durable workflow automation in just a few lines of code
Query your PDF documents and get more insights from them
In-memory vector store with efficient read and write performance for semantic caching and retrieval systems. Redis for semantic caching.
LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a chain of LLMs to find the answer. The user can see the progres…
SQLSync is a collaborative offline-first wrapper around SQLite. It is designed to synchronize web application state between users, devices, and the edge.
Rust bindings for the C++ API of PyTorch.
Rate limiting, caching, and request prioritization for modern workloads
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.