Projects – UC Berkeley Sky Computing Lab

StringSight

Turning Model Conversations and Agentic Traces into Actionable Insights

DS-Serve

A Framework for Efficient and Scalable Neural Retrieval

SkyLight

Advancing the Frontier of Sparse Attention Research

Interruptible LRMs

Are Large Reasoning Models Interruptible?

rLLM

Democratizing Reinforcement Learning for LLMs

ADRS

AI-Driven Research Systems

kvcached

Elastic KV Cache for Dynamic GPU Sharing and Efficient Multi-LLM Inference

GEPA

System Optimization through Reflective Text Evolution

MiniScope

A Least Privilege Framework for Authorizing Tool Calling Agents

AgentThink

A Systematic Evaluation Framework that Automatically Identifies Failure Patterns in LLMs

vCache

Reliable and Efficient Semantic Prompt Caching

DeepScholar-Bench

A Live Benchmark for Generative Research Synthesis

LEANN

Fast, Accurate, and 100% Private RAG on your Laptop

REVERSE

Retrospective Verification and Self-Correction

GSO

Challenging Software Optimization Tasks for Evaluating SWE-Agents

Matryoshka

Semantic-Aware Parsing for Security Logs

Search Arena

A Crowdsourced In-The-Wild Evaluation Platform for Search-Augmented LLM Systems Based on Human Preference

SkyRL

Online RL Training for Real-World Long-Horizon Agents

SkyServe

Serving AI Models across Regions and Clouds with Spot Instances

Myco

Unlocking Polylogarithmic Accesses in Metadata-Private Messaging

R2E-Gym

Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

MAST

Multi-Agent System Failure Taxonomy

Agentica Project

Building Generalist Agents That Scale

BARE

A Method for Combining Base Language Models and Instruction-Tuned Language Models for Better Synthetic Data Generation

Ember

A Compositional Framework for Building and Deploying Large Inference-Time Scaling Architectures and Strategies

TAG-Bench

A Benchmark for Table-Augmented Generation

UCCL

An Efficient Collective Communication Library for GPUs

VibeCheck

Give Your Generative Models a Vibe Check 😀

NovaSky

Next-Generation Open Vision and AI

Sky-T1

Train Your Own o1 Preview Model Within $450

multilspy

LSP Client Library in Python for Building Applications Around Language Servers

Spatialyze

A New Framework for End-to-End Querying of Geospatial Videos

Compass

Encrypted Semantic Search with High Accuracy

DSPy

The Framework for Programming—Not Prompting—Foundation Models

LOTUS

Easily Build Knowledge-Intensive LLM Applications That Reason Over Your Data With LOTUS!

S-LoRA

Serving Thousands of Concurrent LoRA Adapters

RouteLLM

A Framework for Serving and Evaluating LLM Routers – Save LLM Costs Without Compromising Quality!

Stylus

Automatic Adapter Selection for Diffusion Models

VideoArena

The First Dynamic Leaderboard for SOTA Text-To-Video Generation Models

Gorilla OpenFunctions

Elevating LLM Function Calling with Versatile API Integration

Berkeley Function-Calling Leaderboard

Measuring Function-Calling Capabilities of Different LLMs

R2E

A Dynamic Framework for Evaluating AI Coding Systems

Rollbaccine

A General Solution to Rollback Attacks in TEEs

Auto-Whittaker

Automatically Rewriting Distributed Protocols for Scalability

Scrooge

Enabling Replicated State Machines to Communicate Efficiently

SVR3

Secret Key Recovery in a Global-Scale End-to-End Encryption System

Skydentity

Let Orchestrators Run Your Workloads on Your Cloud Resources Without Handing Over Your Cloud Credentials and Data

Flock

A Framework for Deploying On-Demand Distributed Trust

POET

Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

SkyPIE

A Fast & Accurate Oracle for Object Placement

LiveCodeBench

Holistic and Contamination Free Evaluation of Large Language Models for Code

Skyplane

Blazing Fast Bulk Data Transfers Between Any Cloud

RAFT

“Retrieval-Augmented Fine-Tuning” combines the benefits of Retrieval-Augmented Generation and Fine-Tuning for better domain adaptation

Arena Hard

An Automatic Pipeline to Build High-Quality LLM Benchmarks with High Separability and Agreement to Human Preference from Live Data

SGLang

A Fast Serving Framework For Large Language Models and Vision Language Models

vLLM

Building the fastest and easiest-to-use inference engine for LLMs

Vicuna

An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

Chatbot Arena

An Open Platform for Evaluating LLMs by Human Preference

GoEx

A Runtime for LLM-Generated Actions like Code, API Calls, and More

Embarcadero

A Totally Ordered, High Throughput, Pub/Sub System with Disaggregated Memory
