diptanu
  • Facebook
  • San Francisco

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

Go 1,340 52 Updated Jan 23, 2026

A realtime serving engine for Data-Intensive Generative AI Applications

Rust 1,093 142 Updated Jan 25, 2026

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

Python 8,684 643 Updated Jan 25, 2026

A flexible, adaptive classification system for dynamic text classification

Python 526 36 Updated Oct 7, 2025

Table Structure Recognition

Python 28 2 Updated Jul 25, 2024

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 496 60 Updated Jul 20, 2025
Jupyter Notebook 391 58 Updated Jan 7, 2024

OCR toolbox from Davar-Lab

Python 9 2 Updated Jan 8, 2024

A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updated.

224 11 Updated Sep 9, 2024

Rust actor framework

Rust 1,934 116 Updated Dec 16, 2025

Fault-tolerant async actors for Rust that scale seamlessly

Rust 1,192 64 Updated Jan 19, 2026

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,814 200 Updated Apr 9, 2025

A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Mod…

231 23 Updated Dec 17, 2025

Guideline following Large Language Model for Information Extraction

Python 425 27 Updated Oct 27, 2024

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 1,949 147 Updated Apr 14, 2025

CBOR: Concise Binary Object Representation

Rust 83 10 Updated Nov 30, 2025

A cloud native embedded storage engine built on object storage.

Rust 2,668 181 Updated Jan 24, 2026

WIP - Allows you to create DSPy pipelines using ComfyUI

Python 200 11 Updated Dec 1, 2024

Durable workflow automation in just a few lines of code

Go 1,084 41 Updated Jan 25, 2026

Query your PDF documents and get more insights from them

Python 5 Updated Apr 28, 2024

In-memory vector store with efficient read and write performance for semantic caching and retrieval system. Redis for Semantic Caching.

Rust 376 13 Updated Nov 29, 2024

LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a chain of LLMs to find the answer. The user can see the progres…

Go 5,965 372 Updated Dec 11, 2025

SQLSync is a collaborative offline-first wrapper around SQLite. It is designed to synchronize web application state between users, devices, and the edge.

Rust 2,864 42 Updated Nov 19, 2025

Rust bindings for the C++ api of PyTorch.

Rust 5,237 411 Updated Jan 22, 2026

CodeXGLUE

C# 1,800 390 Updated Apr 23, 2024

Postgres-native columnar storage extension

C 3,010 99 Updated Feb 10, 2025

Rate limiting, caching, and request prioritization for modern workloads

Go 725 35 Updated Dec 21, 2025

RiteRaft - A raft framework, for regular people

Rust 333 24 Updated Feb 18, 2024

gamedev blog

3,313 147 Updated Mar 8, 2021

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 333,281 54,188 Updated Nov 3, 2025