PARE is a Python research framework for evaluating proactive AI assistants through active user simulation. Built on top of Meta-ARE, it provides a realistic mobile-phone simulation environment where a proactive assistant must observe user behavior, infer goals, and intervene helpfully -- without being asked.
- Paper: PARE: Simulating Active Users to Evaluate Proactive Assistants
- Documentation: deepakn97.github.io/pare
Proactive assistants need to decide when to help and what to do -- all from passively observing user activity. Evaluating this requires simulating realistic users in realistic environments, which is what PARE provides:
- 9 domain apps modeled as finite state machines: Apartment, Cab, Calendar, Contacts, Email, Messaging, Note, Reminder, and Shopping
- 2 core system apps: `HomeScreenSystemApp` for navigation (open, switch, go home) and `PAREAgentUserInterface` for proposal management (accept/reject)
- 143 benchmark scenarios spanning multi-app orchestration, goal inference, and intervention timing
- Observe-Execute agent architecture with configurable models per stage
- Oracle validation to automatically verify task completion
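To make the finite-state-machine idea concrete, here is a minimal toy sketch of a screen-based app. All names here (`ToyEmailApp`, the screens, the tools) are invented for illustration and are not PARE's actual API:

```python
# Toy sketch of an app modeled as a finite state machine.
# The current screen is the FSM state; each tool is a transition.
class ToyEmailApp:
    def __init__(self):
        self.screen = "inbox"  # current FSM state
        # state -> {tool_name: next_state}
        self.transitions = {
            "inbox": {"open_compose": "compose", "open_email": "reading"},
            "compose": {"send": "inbox", "discard": "inbox"},
            "reading": {"back": "inbox"},
        }

    def available_tools(self):
        """Only tools valid on the current screen are exposed."""
        return sorted(self.transitions[self.screen])

    def use(self, tool):
        if tool not in self.transitions[self.screen]:
            raise ValueError(f"{tool!r} not available on {self.screen!r}")
        self.screen = self.transitions[self.screen][tool]

app = ToyEmailApp()
print(app.available_tools())  # tools exposed on the inbox screen
app.use("open_compose")
print(app.screen)
```

Because invalid transitions raise, a simulated user can only take actions a real user could take on the current screen.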
PARE orchestrates a two-agent simulation: a user agent that navigates the phone realistically, and a proactive agent that observes and intervenes.
The key insight is asymmetric interfaces. The user agent sees only the tools available on the current screen (just like a real user tapping through apps), while the proactive agent gets flat API access to all apps for efficient task execution. This forces realistic user behavior without handicapping the assistant.
Sending a message requires navigating through screens for the user (right), but a single API call for the assistant (left).
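The asymmetry can be sketched in a few lines of Python. This is a toy illustration with invented names, not PARE's real interfaces: the user agent's path stands in for a sequence of screen-scoped tool calls, while the assistant gets one flat API call.

```python
# Toy illustration of asymmetric interfaces (invented names; not PARE's API).
class ToyMessagingApp:
    def __init__(self):
        self.outbox = []

    # Flat API call, as exposed to the proactive agent:
    def send_message(self, to, body):
        self.outbox.append((to, body))

# Screen-scoped navigation, as the user agent experiences it:
def user_agent_send(app, to, body):
    # Each entry stands in for one screen-scoped tool call.
    trace = ["home.open_messaging", "messaging.select_contact",
             "compose.type_body", "compose.tap_send"]
    app.send_message(to, body)  # effect of the final tap
    return trace

app = ToyMessagingApp()
steps = user_agent_send(app, "Alice", "Running late")  # four UI steps
app.send_message("Bob", "On my way")                   # one API call
print(len(steps), len(app.outbox))  # 4 2
```

Both paths produce the same state change; only the number of actions differs, which is exactly what makes user behavior realistic without slowing down the assistant.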
- Python 3.12+
- uv package manager
```bash
git clone git@github.com:deepakn97/pare.git
cd pare
make install
```

Copy the example environment file and fill in your API keys:

```bash
cp .env.example .env
```

Edit `.env` with the keys for the providers you plan to use:
```bash
# Required for GPT models (gpt-5, gpt-5-mini, gpt-4o, etc.)
OPENAI_API_KEY=your_openai_api_key_here

# Required for Hugging Face model access
HF_TOKEN=your_hf_token_here

# Required for AWS Bedrock models (llama-4-scout, llama-4-maverick, etc.)
AWS_ACCESS_KEY_ID=aws_access_key_id
AWS_SECRET_ACCESS_KEY=secret_access_key_id
AWS_REGION_NAME="us-east-1"
# Or use the new Bedrock API key:
AWS_BEARER_TOKEN_BEDROCK=new_aws_api_key_here

# Scenario configuration (defaults to benchmark/)
PARE_SCENARIOS_DIR=benchmark

# Path to environment augmentation data (relative to project root)
ENV_AUGMENTATION_DATA_PATH="data/metaare_augmentation_data.json"
```

Run a single scenario:

```bash
pare benchmark sweep -s email_notification -om gpt-5 -em gpt-5
```

Run the full benchmark split:

```bash
pare benchmark sweep --split full -om gpt-5 -em gpt-5 --runs 3
```

Model pairs are zipped (not crossed). Each `--observe-model` (`-om`) is paired with the corresponding `--execute-model` (`-em`):
```bash
pare benchmark sweep --split full \
  -om gpt-5 -om claude-4.5-sonnet \
  -em gpt-5 -em claude-4.5-sonnet \
  --runs 3
```

Results are saved in a structured directory under `results/`:
```
results/
  {experiment}_{split}_user_{model}_mt_{turns}_umi_..._omi_..._emi_.../
    obs_{model}_exec_{model}_..._result.json
    obs_{model}_exec_{model}_..._report.txt
    combined_report.txt
```
Use `pare benchmark sweep --help` for the full list of configuration options.
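The zipped (rather than crossed) pairing of model lists described above can be sketched in plain Python. This is a toy illustration of the pairing semantics, not PARE's actual code:

```python
from itertools import product

# Model lists as passed via repeated -om / -em flags.
observe_models = ["gpt-5", "claude-4.5-sonnet"]
execute_models = ["gpt-5", "claude-4.5-sonnet"]

# Zipped: positional pairing -> 2 sweep configurations.
zipped = list(zip(observe_models, execute_models))
print(zipped)  # [('gpt-5', 'gpt-5'), ('claude-4.5-sonnet', 'claude-4.5-sonnet')]

# A cross product would instead run every combination -> 4 configurations.
crossed = list(product(observe_models, execute_models))
print(len(crossed))  # 4
```

So to sweep mixed pairs (e.g. observe with one model, execute with another), list them at matching positions rather than expecting every combination to run.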
```bash
pare annotation sample -t <traces_dir> -n <size>  # Sample decision points for human eval
pare annotation launch                            # Launch annotation UI
pare cache status                                 # Show cache location and entry count
pare cache invalidate                             # Clear cached results
```

> [!NOTE]
> macOS users: the `--executor-type process` option may fail due to a known Python multiprocessing issue with the `spawn` start method on macOS. Use the default `--executor-type thread` instead.
Full API reference and architecture docs are available at deepakn97.github.io/pare.
See CONTRIBUTING.md for development setup, code style guidelines, and how to submit pull requests.
This project is licensed under the terms of the MIT License.
If you use PARE in your research, please cite:
@misc{nathani2026proactiveagentresearchenvironment,
title={Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants},
author={Deepak Nathani and Cheng Zhang and Chang Huan and Jiaming Shan and Yinfei Yang and Alkesh Patel and Zhe Gan and William Yang Wang and Michael Saxon and Xin Eric Wang},
year={2026},
eprint={2604.00842},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2604.00842},
}
