CodeCreator

Organizations

@princeton-nlp

AI-Native Studio for everything audio. Sound Design/Audio Editing/AI Producer/Fast Exports. Built for web, runs anywhere.

TypeScript 102 4 Updated Jan 10, 2026

Interactive JSON filter using jq

Rust 5,940 72 Updated Mar 17, 2026
Python 1,192 123 Updated Feb 28, 2026

The HELMET Benchmark

Jupyter Notebook 204 39 Updated Feb 26, 2026

[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents

Python 600 114 Updated Mar 16, 2026

Minhashing done in rust

Rust 6 4 Updated Nov 17, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 9,048 1,123 Updated Feb 9, 2026

Ring attention implementation with flash attention

Python 996 95 Updated Sep 10, 2025

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Jupyter Notebook 79 7 Updated May 2, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 68,651 8,366 Updated Mar 18, 2026

A PyTorch native platform for training generative AI models

Python 5,155 749 Updated Mar 18, 2026

Fully open reproduction of DeepSeek-R1

Python 25,953 2,418 Updated Nov 24, 2025

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Rust 61,067 2,435 Updated Feb 27, 2026

Aioli: A unified optimization framework for language model data mixing

Jupyter Notebook 32 4 Updated Jan 17, 2025

A bibliography and survey of the papers surrounding o1

TeX 1,213 51 Updated Nov 16, 2024

16-fold memory access reduction with nearly no loss

Python 108 9 Updated Mar 26, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 24,699 4,881 Updated Mar 18, 2026

TUI for the Slurm Workload Manager

Rust 465 24 Updated Mar 7, 2026

Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training

Python 222 10 Updated Aug 19, 2024

Cache efficient implementation for Latent Dirichlet Allocation

C++ 165 54 Updated Jan 4, 2019

Convert PDF to markdown + JSON quickly with high accuracy

Python 32,780 2,269 Updated Mar 10, 2026

DataComp for Language Models

HTML 1,427 131 Updated Sep 9, 2025

Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]

Python 79 9 Updated Nov 14, 2024

Windows alt-tab on macOS

Swift 15,131 511 Updated Mar 15, 2026

[ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models

Python 60 Updated Jul 23, 2024

Emerge is a browser-based interactive codebase and dependency visualization tool for many different programming languages. It supports some basic code quality and graph metrics and provides a simpl…

Python 1,039 71 Updated Oct 26, 2024
Python 1,545 224 Updated Jun 26, 2025

SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]

Python 18,776 2,026 Updated Mar 16, 2026

[ICML 2024] Selecting High-Quality Data for Training Language Models

Python 201 14 Updated Dec 8, 2025

The official implementation of Self-Play Fine-Tuning (SPIN)

Python 1,234 104 Updated May 8, 2024