Stars
AI-native studio for everything audio: sound design, audio editing, AI production, and fast exports. Built for the web, runs anywhere.
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
DeepEP: an efficient expert-parallel communication library
Ring attention implementation with flash attention
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A PyTorch native platform for training generative AI models
Fully open reproduction of DeepSeek-R1
ripgrep recursively searches directories for a regex pattern while respecting your gitignore
Aioli: A unified optimization framework for language model data mixing
A bibliography and survey of the papers surrounding o1
A 16-fold reduction in memory accesses with nearly no accuracy loss
SGLang is a high-performance serving framework for large language models and multimodal models.
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
Cache efficient implementation for Latent Dirichlet Allocation
Convert PDF to markdown + JSON quickly with high accuracy
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
[ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models
Emerge is a browser-based interactive codebase and dependency visualization tool for many different programming languages. It supports some basic code quality and graph metrics and provides a simpl…
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
[ICML 2024] Selecting High-Quality Data for Training Language Models
The official implementation of Self-Play Fine-Tuning (SPIN)