Build AI Agents That Understand, Simulate, and Act in the Real World
Quick Start • Features • Arena • Docs • Contributing • Discord
Spatial Agents is an open-source framework for building AI agents with true spatial intelligence: agents that can understand the physical world, simulate environments, and take actions in reality.
```
┌───────────────────────────────────────────────────────────────────┐
│                         🧠 SPATIAL AGENTS                         │
│                                                                   │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐      │
│   │  UNDERSTAND  │ ──▶ │   SIMULATE   │ ──▶ │     ACT      │      │
│   │              │     │              │     │              │      │
│   │  3D Scenes   │     │  Physics     │     │  Navigation  │      │
│   │  Objects     │     │  Worlds      │     │  Manipulation│      │
│   │  Spatial     │     │  Scenarios   │     │  Real-world  │      │
│   │  Relations   │     │  What-ifs    │     │  Execution   │      │
│   └──────────────┘     └──────────────┘     └──────────────┘      │
│                                                                   │
│   ┌───────────────────────────────────────────────────────────┐   │
│   │                    🌍 QAI EARTH ENGINE                    │   │
│   │   Real-world geospatial data • Physics • 3D environments  │   │
│   └───────────────────────────────────────────────────────────┘   │
└───────────────────────────────────────────────────────────────────┘
```
Current AI is blind to the physical world. Spatial Agents gives AI the ability to:
- **See**: understand 3D scenes, object relationships, and spatial layouts
- **Think**: reason about physics, predict outcomes, plan paths
- **Simulate**: test actions in virtual environments before execution
- **Act**: navigate, manipulate, and operate in the real world
- **Understand**: perceive and comprehend spatial environments through 3D scene understanding, object recognition, spatial relationship extraction, and depth perception.
- **Simulate**: run physics-accurate simulations, test hypothetical scenarios, predict outcomes, and train agents in virtual worlds before real deployment.
- **Act**: execute real-world actions, including navigation, object manipulation, robotic control, and autonomous decision-making in physical environments.
- **Arena**: benchmark and compare AI models on spatial reasoning challenges across navigation, physics, and 3D understanding.
- **Any model**: plug in any LLM/VLM, whether OpenAI, Anthropic, Google, DeepSeek, Qwen, or your own custom model.
- **QAI Earth Engine**: connect to real-world geospatial data, maps, terrain, and environmental information.
```bash
pip install spatial-agents
```

```python
from spatial_agents import SpatialAgent, Environment
from spatial_agents.capabilities import Vision, Navigation, Manipulation

# Create an agent with spatial capabilities
agent = SpatialAgent(
    model="claude-sonnet-4-20250514",
    capabilities=[Vision, Navigation, Manipulation],
)

# Connect to an environment
env = Environment.from_location("san_francisco", radius_km=5)

# The agent understands, simulates, and acts
observation = agent.perceive(env)
plan = agent.reason("Navigate to the nearest coffee shop avoiding traffic")
result = agent.execute(plan, simulate_first=True)
```

```python
from spatial_agents import SpatialAgent

agent = SpatialAgent(model="gpt-4o")

# Understand a 3D scene from images
scene = agent.understand(
    images=["room_view_1.jpg", "room_view_2.jpg"],
    query="Where is the laptop relative to the window?",
)

print(scene.objects)        # Detected objects with 3D positions
print(scene.relationships)  # Spatial relationships between objects
print(scene.answer)         # "The laptop is 2m left of the window, on the desk"
```

```python
from spatial_agents import SpatialAgent, Simulation

agent = SpatialAgent(model="claude-opus-4-5-20250514")

# Plan a complex action
action = agent.plan("Move the robotic arm to pick up the red cube")

# Simulate first to verify safety
sim = Simulation(physics=True)
outcome = sim.run(action, steps=100)

if outcome.success and outcome.collision_free:
    agent.execute(action)  # Safe to execute in the real world
```

Test and compare spatial intelligence across different AI models.

```python
from spatial_agents import Arena, Agent

arena = Arena(challenge="urban_navigation")

results = arena.compete([
    Agent(model="claude-opus-4-5-20250514", name="Claude"),
    Agent(model="gpt-4o", name="GPT-4o"),
    Agent(model="gemini-ultra", name="Gemini"),
])

print(results.leaderboard())
```

| Rank | Model | Navigation | Object Reasoning | Physics | 3D Understanding | Overall |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.5 | 94.2 | 91.8 | 89.5 | 92.1 | 91.9 |
| 🥈 | GPT-4o | 92.1 | 90.3 | 88.7 | 89.4 | 90.1 |
| 🥉 | Gemini Ultra | 89.8 | 88.9 | 90.2 | 87.6 | 89.1 |
| 4 | DeepSeek V3 | 87.2 | 86.5 | 85.8 | 84.9 | 86.1 |
| 5 | Qwen 2.5 | 85.4 | 84.2 | 83.1 | 82.8 | 83.9 |
```
┌───────────────────────────────────────────────────────────────┐
│                        SPATIAL AGENTS                         │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐  │
│  │   Perception    │ │   Simulation    │ │     Action      │  │
│  │                 │ │                 │ │                 │  │
│  │ • 3D Vision     │ │ • Physics       │ │ • Navigation    │  │
│  │ • Depth         │ │ • Scenarios     │ │ • Manipulation  │  │
│  │ • Object Det.   │ │ • Prediction    │ │ • Control       │  │
│  │ • Scene Graph   │ │ • Training      │ │ • Execution     │  │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘  │
│                                                               │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐  │
│  │     Models      │ │      Arena      │ │   Benchmarks    │  │
│  │                 │ │                 │ │                 │  │
│  │ • OpenAI        │ │ • Competitions  │ │ • SpatialIQ     │  │
│  │ • Anthropic     │ │ • Challenges    │ │ • NavScore      │  │
│  │ • Google        │ │ • Leaderboards  │ │ • PhysicsBench  │  │
│  │ • DeepSeek/Qwen │ │ • Evaluation    │ │ • 3D-Reason     │  │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘  │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                    QAI EARTH ENGINE                     │  │
│  │  Real-world geospatial data • Physics • 3D environments │  │
│  │         Terrain • Buildings • Sensor simulation         │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
**Perception**: 3D scene understanding, object detection, depth estimation, spatial relationship extraction, multi-view reconstruction.
**Navigation**: path planning, obstacle avoidance, SLAM, multi-destination routing, real-world wayfinding.
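To make the path-planning and obstacle-avoidance ideas concrete, here is a toy grid planner in plain Python. It is an illustrative sketch only, not the framework's planner: the grid encoding, 4-connectivity, and breadth-first search are all assumptions chosen for brevity.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest obstacle-free path on a 4-connected grid via BFS.

    grid[r][c] == 1 marks an obstacle; returns a list of (row, col)
    cells from start to goal, or None when the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:  # reconstruct the route by walking parents back
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None  # fully blocked

# A wall across the middle row forces a detour along the right edge
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(plan_path(grid, (0, 0), (2, 2)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```

A production planner would use A* with a distance heuristic and real costmaps, but the route-reconstruction pattern is the same.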
**Spatial reasoning**: spatial logic, physics prediction, object permanence, "what's behind X?" queries, relative positioning.
**Physics**: force understanding, trajectory prediction, stability analysis, collision detection, material properties.
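At its simplest, trajectory prediction is numerical integration of the equations of motion. The sketch below uses symplectic Euler stepping for a ballistic launch in plain Python; it is an illustration of the idea, not the framework's physics engine, and the time step and gravity constant are arbitrary choices.

```python
import math

def predict_trajectory(pos, vel, dt=0.01, g=9.81, max_steps=10_000):
    """Predict a 2D ballistic trajectory with symplectic Euler steps.

    Returns the list of (x, y) points up to and including ground contact.
    """
    (x, y), (vx, vy) = pos, vel
    points = [(x, y)]
    for _ in range(max_steps):
        vy -= g * dt  # update velocity first (symplectic Euler)
        x += vx * dt
        y += vy * dt
        points.append((x, y))
        if y <= 0.0:  # hit the ground
            break
    return points

# Launch at 10 m/s and 45 degrees; analytic range is v^2*sin(2*theta)/g ~ 10.19 m
v, theta = 10.0, math.pi / 4
traj = predict_trajectory((0.0, 0.0), (v * math.cos(theta), v * math.sin(theta)))
print(round(traj[-1][0], 2))  # lands close to the analytic 10.19 m
```

Comparing the numerical landing point against the closed-form range is a quick sanity check for any integrator before trusting it on scenarios with no analytic answer.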
**Manipulation**: robotic manipulation, motion planning, grasping, assembly, human-robot interaction.
**Geospatial**: real-world maps, satellite imagery, terrain analysis, urban environments, global positioning.
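Global positioning ultimately comes down to spherical geometry. As a flavor of the underlying math, here is a haversine great-circle distance in plain Python; the coordinates and mean Earth radius are illustrative, and this is not part of the QAI Earth Engine API.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 \
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# San Francisco to Los Angeles along the great circle
print(round(haversine_km(37.7749, -122.4194, 34.0522, -118.2437)))  # 559
```

Real geospatial stacks refine this with ellipsoidal models (e.g. WGS84), but haversine is usually accurate to within about 0.5%.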
```python
from spatial_agents.benchmarks import SpatialIQBenchmark

benchmark = SpatialIQBenchmark()
results = benchmark.evaluate(your_model, split="test")

print(results.to_latex())
print(results.statistical_analysis())
```

Cite our work:

```bibtex
@software{spatial_agents_2025,
  title={Spatial Agents: AI Agents That Understand, Simulate, and Act in the Real World},
  author={Qian, Dr. and QAI Lab},
  year={2025},
  url={https://github.com/qai-lab/spatial-agents}
}
```

We welcome contributions! Whether you're:
- 🔍 **Adding perception**: new vision models, depth estimators
- 🌍 **Building simulations**: physics engines, virtual environments
- 🎯 **Creating actions**: robotics integrations, control systems
- 🏗️ **Designing challenges**: new arena benchmarks
- 📚 **Improving docs**: help others build spatial agents
Check out our Contributing Guide to get started.
```bash
git clone https://github.com/qai-lab/spatial-agents.git
cd spatial-agents
pip install -e ".[dev]"
pytest
```

- 💬 Discord: discord.gg/qai-lab
- 💼 LinkedIn: QAI Lab
- 🐦 X/Twitter: @qai_lab
- 📧 Issues: GitHub Issues
Built with ❤️ by Dr. Qian and QAI Lab

⭐ Star us on GitHub: it helps the project grow!
AI agents that see, think, simulate, and act in the physical world

