
PARE: Proactive Agent Research Environment


PARE is a Python research framework for evaluating proactive AI assistants through active user simulation. Built on top of Meta-ARE, it provides a realistic mobile-phone simulation environment where a proactive assistant must observe user behavior, infer goals, and intervene helpfully -- without being asked.

What is PARE?

Proactive assistants need to decide when to help and what to do -- all from passively observing user activity. Evaluating this requires simulating realistic users in realistic environments, which is what PARE provides:

  • 9 domain apps modeled as finite state machines: Apartment, Cab, Calendar, Contacts, Email, Messaging, Note, Reminder, and Shopping
  • 2 core system apps: HomeScreenSystemApp for navigation (open, switch, go home) and PAREAgentUserInterface for proposal management (accept/reject)
  • 143 benchmark scenarios spanning multi-app orchestration, goal inference, and intervention timing
  • Observe-Execute agent architecture with configurable models per stage
  • Oracle validation to automatically verify task completion
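The finite-state-machine modeling of domain apps can be pictured with a minimal sketch. The class, screen, and action names below are purely illustrative, not PARE's actual API: each app exposes only the actions valid on its current screen, and actions move the app between screens.

```python
# Illustrative FSM sketch of a domain app. All names here are
# hypothetical and do not reflect PARE's real classes or schemas.

class FSMApp:
    def __init__(self, name, transitions, start):
        self.name = name
        self.transitions = transitions  # {screen: {action: next_screen}}
        self.screen = start

    def available_actions(self):
        # A simulated user only sees actions valid on the current screen.
        return sorted(self.transitions.get(self.screen, {}))

    def perform(self, action):
        next_screen = self.transitions.get(self.screen, {}).get(action)
        if next_screen is None:
            raise ValueError(f"{action!r} not available on screen {self.screen!r}")
        self.screen = next_screen
        return self.screen

# A toy Messaging app: home -> thread list -> compose -> back to threads.
messaging = FSMApp(
    name="Messaging",
    transitions={
        "home": {"open_threads": "threads"},
        "threads": {"open_thread": "compose", "back": "home"},
        "compose": {"send": "threads", "back": "threads"},
    },
    start="home",
)
```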

PARE framework overview

How It Works

PARE orchestrates a two-agent simulation: a user agent that navigates the phone realistically, and a proactive agent that observes and intervenes.

The key insight is asymmetric interfaces. The user agent sees only the tools available on the current screen (just like a real user tapping through apps), while the proactive agent gets flat API access to all apps for efficient task execution. This forces realistic user behavior without handicapping the assistant.
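The asymmetry can be sketched in a few lines. This is a hypothetical illustration of the idea, not PARE's implementation; the data layout and function names are invented for clarity:

```python
# Hypothetical sketch of asymmetric interfaces: the user agent sees only
# the tools on the screen it is currently viewing, while the proactive
# agent gets a flat view of every tool in every app.

def user_visible_tools(phone, current_app, current_screen):
    # User agent: restricted to the current screen, like a real user.
    return phone[current_app]["screens"][current_screen]

def assistant_visible_tools(phone):
    # Proactive agent: flat API access across all apps at once.
    return [
        tool
        for app in phone.values()
        for screen in app["screens"].values()
        for tool in screen
    ]

phone = {
    "Messaging": {"screens": {"threads": ["open_thread"], "compose": ["send_message"]}},
    "Calendar": {"screens": {"month": ["open_day"], "day": ["create_event"]}},
}
```

Under this scheme the user must first navigate to the right screen before an action becomes visible, while the assistant can call any tool directly.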

FSM-based navigation vs flat API access

Sending a message requires navigating through screens for the user (right), but a single API call for the assistant (left).

Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager

Installation

git clone git@github.com:deepakn97/pare.git
cd pare
make install

Configure API Keys

Copy the example environment file and fill in your API keys:

cp .env.example .env

Edit .env with the keys for the providers you plan to use:

# Required for GPT models (gpt-5, gpt-5-mini, gpt-4o, etc.)
OPENAI_API_KEY=your_openai_api_key_here

# Required for Hugging Face model access
HF_TOKEN=your_hf_token_here

# Required for AWS Bedrock models (llama-4-scout, llama-4-maverick, etc.)
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION_NAME="us-east-1"
# Or use the new Bedrock API key:
AWS_BEARER_TOKEN_BEDROCK=new_aws_api_key_here

# Scenario configuration (defaults to benchmark/)
PARE_SCENARIOS_DIR=benchmark

# Path to environment augmentation data (relative to project root)
ENV_AUGMENTATION_DATA_PATH="data/metaare_augmentation_data.json"

Run a Single Scenario

pare benchmark sweep -s email_notification -om gpt-5 -em gpt-5

Run the Full Benchmark

pare benchmark sweep --split full -om gpt-5 -em gpt-5 --runs 3

Model Sweep

Model pairs are zipped (not crossed). Each --observe-model is paired with the corresponding --execute-model:

pare benchmark sweep --split full \
  -om gpt-5 -om claude-4.5-sonnet \
  -em gpt-5 -em claude-4.5-sonnet \
  --runs 3

Results

Results are saved in a structured directory under results/:

results/
  {experiment}_{split}_user_{model}_mt_{turns}_umi_..._omi_..._emi_.../
    obs_{model}_exec_{model}_..._result.json
    obs_{model}_exec_{model}_..._report.txt
    combined_report.txt

Use pare benchmark sweep --help for the full list of configuration options.
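For post-hoc analysis, the per-run `*_result.json` files can be collected with a small helper. This is a hypothetical sketch that only assumes the directory layout above; the contents and schema of the result JSON are not specified here, so the snippet merely loads each file as-is:

```python
# Hypothetical helper for gathering sweep result files. Assumes only the
# results/<run_dir>/*_result.json layout shown above; the JSON schema is
# not assumed beyond being valid JSON.
import json
from pathlib import Path

def collect_results(results_dir):
    runs = []
    for path in sorted(Path(results_dir).glob("*/*_result.json")):
        with open(path) as f:
            runs.append({"file": path.name, "data": json.load(f)})
    return runs
```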

Other CLI Commands

pare annotation sample -t <traces_dir> -n <size>   # Sample decision points for human eval
pare annotation launch                               # Launch annotation UI
pare cache status                                    # Show cache location and entry count
pare cache invalidate                                # Clear cached results

Note

macOS users: The --executor-type process option may fail due to a known Python multiprocessing issue with the 'spawn' method on macOS. Use the default --executor-type thread instead.

Documentation

Full API reference and architecture docs are available at deepakn97.github.io/pare.

Contributing

See CONTRIBUTING.md for development setup, code style guidelines, and how to submit pull requests.

License

This project is licensed under the terms of the MIT License.

Citation

If you use PARE in your research, please cite:

@misc{nathani2026proactiveagentresearchenvironment,
      title={Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants},
      author={Deepak Nathani and Cheng Zhang and Chang Huan and Jiaming Shan and Yinfei Yang and Alkesh Patel and Zhe Gan and William Yang Wang and Michael Saxon and Xin Eric Wang},
      year={2026},
      eprint={2604.00842},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.00842},
}
