Browser Automation with AI - Transform natural language instructions into browser actions seamlessly.
For higher-quality media (open the image to see the video): https://i.imgur.com/hf9XgAi.mp4
- AI-Powered: Uses Google Gemini or OpenAI to understand natural language instructions
- Simple Syntax: Write automation in plain English
- Smart Retry: Automatic retry with error context for robust execution
- Function System: Define and reuse instruction blocks
- Clean API: Both CLI and Python API available
- Stealth Mode: Advanced anti-detection for realistic browsing
- Error Screenshots: Automatic screenshots on failure
- Caching: Smart prompt caching for faster execution
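
To make "retry with error context" concrete, here is a minimal sketch of the general pattern: when a step fails, the error text is handed back to the code generator on the next attempt. This is not bAUTO's actual implementation. `run_with_retry`, `fake_generate`, and `fake_execute` are hypothetical names used only for illustration, and the default of 3 attempts mirrors the `retry_attempts` setting shown in the configuration.

```python
from typing import Callable, List, Optional


def run_with_retry(
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], None],
    instruction: str,
    retry_attempts: int = 3,
) -> List[str]:
    """Try an instruction up to retry_attempts times.

    generate(instruction, last_error) returns code for the instruction and
    execute(code) runs it, raising on failure. Both are hypothetical
    stand-ins for the AI call and the browser step.
    Returns the error messages collected before the successful attempt.
    """
    errors: List[str] = []
    last_error: Optional[str] = None
    for _ in range(retry_attempts):
        code = generate(instruction, last_error)
        try:
            execute(code)
            return errors  # success: stop retrying
        except Exception as exc:
            # Keep the failure text so the next generation can see it.
            last_error = f"{type(exc).__name__}: {exc}"
            errors.append(last_error)
    raise RuntimeError(
        f"Gave up after {retry_attempts} attempts; last error: {last_error}"
    )


def fake_generate(instruction: str, last_error: Optional[str]) -> str:
    # Pretend the model corrects its selector once it has seen an error.
    return "click('#search')" if last_error else "click('#serch')"


def fake_execute(code: str) -> None:
    # Pretend the misspelled selector matches nothing on the page.
    if "#serch" in code:
        raise LookupError("no element matches #serch")


errors = run_with_retry(fake_generate, fake_execute, "Click the search box")
print(errors)
```

The first attempt fails, its error message is passed to the second call to `generate`, and the second attempt succeeds.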
# Clone the repository
git clone https://github.com/SwintexD/bAUTO.git
cd bAUTO
# Install dependencies
pip install -r requirements.txt
# Or install from PyPI (coming soon)
pip install bauto

Get a free Google Gemini API key from Google AI Studio
# Interactive setup
python -m bauto.cli setup
# Or create .env file manually
echo "GOOGLE_API_KEY=your_api_key_here" > .env

python quick_start.py

# Run automation from file
python -m bauto.cli run instructions.yaml
# Quick automation without file
python -m bauto.cli quick "https://google.com" "Search for AI automation"
# Check system info
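python -m bauto.cli info

# BrowserConfig and AutomationConfig are used below; importing them from bauto is an assumption
from bauto import BrowserAutomator, Config, ModelConfig, BrowserConfig, AutomationConfig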
# Simple usage
automator = BrowserAutomator()
automator.run("Go to google.com and search for Python")
# With custom configuration
config = Config(
    model=ModelConfig(model_name="models/gemini-2.0-flash"),
    browser=BrowserConfig(headless=True),
    automation=AutomationConfig(retry_attempts=3),
)
automator = BrowserAutomator(config)
automator.run("Navigate to https://example.com")

Create a YAML file with your instructions:
# my_task.yaml
instructions: |
  # Simple task
  Navigate to https://google.com
  Wait 2 seconds
  Find the search box
  Type "AI automation" in the search box
  Press Enter
  Wait 3 seconds
  Take a screenshot and save as "result.png"

Run it:
python -m bauto.cli run my_task.yaml

Define reusable functions:
instructions: |
  # Define a login function
  DEFINE_FUNCTION login
  Navigate to https://example.com/login
  Type "username" in username field
  Type "password" in password field
  Click login button
  Wait 2 seconds
  END_FUNCTION
  # Use the function
  CALL login
  Navigate to dashboard
  Take screenshot

bauto/
├── core/                   # Core automation logic
│   ├── automator.py        # Main orchestrator
│   ├── ai_interface.py     # AI provider interface
│   ├── code_generator.py   # Code generation
│   └── parser.py           # Instruction parser
├── engine/                 # Execution engine
│   ├── browser.py          # Browser management
│   ├── action_engine.py    # Action execution
│   └── memory.py           # Memory system
├── config/                 # Configuration system
│   └── settings.py         # Config dataclasses
├── utils/                  # Utilities
│   ├── logger.py           # Logging
│   └── file_utils.py       # File operations
└── examples/               # Example instruction files
tests/                      # Comprehensive test suite
quick_start.py              # Quick demo script
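
To illustrate what an instruction parser has to do with the DEFINE_FUNCTION / END_FUNCTION / CALL syntax, here is a rough sketch that expands function calls into a flat list of steps. It is not the code in core/parser.py. `expand_functions` is a hypothetical helper, and treating lines that start with `#` as comments is an assumption.

```python
from typing import Dict, List, Optional


def expand_functions(text: str) -> List[str]:
    """Expand DEFINE_FUNCTION / END_FUNCTION / CALL into a flat step list."""
    functions: Dict[str, List[str]] = {}
    steps: List[str] = []
    current: Optional[str] = None  # name of the function being defined, if any

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        if line.startswith("DEFINE_FUNCTION "):
            if current is not None:
                raise ValueError("nested DEFINE_FUNCTION is not supported here")
            current = line.split(maxsplit=1)[1]
            functions[current] = []
        elif line == "END_FUNCTION":
            if current is None:
                raise ValueError("END_FUNCTION without DEFINE_FUNCTION")
            current = None
        elif line.startswith("CALL "):
            name = line.split(maxsplit=1)[1]
            if name not in functions or name == current:
                raise KeyError(f"unknown or recursive function: {name}")
            # Inline a copy of the function body where the call appears.
            target = functions[current] if current is not None else steps
            target.extend(list(functions[name]))
        elif current is not None:
            functions[current].append(line)  # body line of a definition
        else:
            steps.append(line)  # ordinary top-level step

    if current is not None:
        raise ValueError(f"function {current!r} is missing END_FUNCTION")
    return steps


example = """
# Define a login function
DEFINE_FUNCTION login
Navigate to https://example.com/login
Type "username" in username field
Type "password" in password field
Click login button
Wait 2 seconds
END_FUNCTION
# Use the function
CALL login
Navigate to dashboard
Take screenshot
"""

steps = expand_functions(example)
for step in steps:
    print(step)
```

Inlining at parse time keeps the later stages simple: whatever turns steps into browser actions only ever sees a flat list.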
Check out the bauto/examples/ directory for complete examples:
- wikipedia_example.yaml - Simple Wikipedia search
- shopping_example.yaml - E-commerce workflow
- social_media_example.yaml - Social media automation with functions
- advanced_example.yaml - Complex GitHub workflow
- form_filling_example.yaml - Form automation
GOOGLE_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key # Alternative

Create config.yaml:
model:
  provider: gemini
  model_name: models/gemini-2.0-flash
  temperature: 0.0
browser:
  headless: false
  stealth_mode: true
  profile_dir: browser_profile
automation:
  retry_attempts: 3
  action_delay: 0.5
  screenshot_on_error: true
log_level: INFO

# Install dev dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=bauto --cov-report=html
# Run specific tests
pytest tests/test_parser.py -v
# Run linting
black bauto/ tests/
ruff check bauto/ tests/

The framework provides a clean interface over Selenium:
env.navigate(url) # Navigate to URL
env.find_element_by_text("text") # Find element by text
env.click(element) # Click element
env.type_text(element, "text") # Type text
env.screenshot("filename.png") # Take screenshot
env.scroll("down") # Scroll page
env.wait(seconds) # Wait

- Navigation: Navigate, go to, visit
- Interaction: Click, type, press enter, scroll
- Waiting: Wait X seconds, pause
- Screenshots: Take screenshot, capture page
- Forms: Fill form, select option, check checkbox
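
As an illustration of how plain-English steps might map onto the env calls listed above, the sketch below uses a stand-in object that only records what it is asked to do, so it runs without a browser. `RecordingEnv` and `search_task` are hypothetical, the method names are copied from the list above, and the real return types (for example, what `find_element_by_text` returns) are assumptions. Pressing Enter is left out because no corresponding env call is listed.

```python
from typing import Any, List, Tuple


class RecordingEnv:
    """Hypothetical stand-in for the env object; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def navigate(self, url: str) -> None:
        self.calls.append(("navigate", url))

    def find_element_by_text(self, text: str) -> str:
        self.calls.append(("find_element_by_text", text))
        return f"<element {text!r}>"  # placeholder for a real page element

    def click(self, element: Any) -> None:
        self.calls.append(("click", element))

    def type_text(self, element: Any, text: str) -> None:
        self.calls.append(("type_text", element, text))

    def screenshot(self, filename: str) -> None:
        self.calls.append(("screenshot", filename))

    def scroll(self, direction: str) -> None:
        self.calls.append(("scroll", direction))

    def wait(self, seconds: float) -> None:
        self.calls.append(("wait", seconds))  # recorded only, no real sleep


def search_task(env: RecordingEnv) -> None:
    """One possible translation of the Google search example into env calls."""
    env.navigate("https://google.com")
    env.wait(2)
    box = env.find_element_by_text("Search")
    env.click(box)
    env.type_text(box, "AI automation")
    env.wait(3)
    env.screenshot("result.png")


env = RecordingEnv()
search_task(env)
for call in env.calls:
    print(call)
```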
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Blank Screenshots
- Solution: Add "Wait 3 seconds" after navigation before taking screenshots
Element Not Found
- Solution: Add wait times and use more specific descriptions
Browser Crashes
- Solution: Try disabling headless mode or clearing browser profile
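
Fixed waits like "Wait 3 seconds" are the simplest remedy for the first two problems, but the underlying idea is to poll until the page is ready. The sketch below shows that idea in plain Python. `wait_until` and `fake_lookup` are hypothetical helpers and not part of bAUTO; Selenium provides its own explicit waits (WebDriverWait) for the same purpose.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> T:
    """Call probe() repeatedly until it returns a non-None value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


attempts = {"count": 0}


def fake_lookup() -> Optional[str]:
    # Pretend the element only appears on the third poll.
    attempts["count"] += 1
    return "search-box" if attempts["count"] >= 3 else None


found = wait_until(fake_lookup, timeout=2.0, interval=0.01)
print(found, attempts["count"])
```

Polling returns as soon as the element shows up, so it is usually faster than a fixed delay on quick pages and more tolerant on slow ones.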
For more help, check Issues or create a new one.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Selenium
- Powered by Google Gemini
- Inspired by the need for simpler browser automation
- 8 main modules with clean architecture
- 15+ classes well documented
- 50+ methods with type hints
- Comprehensive test suite with pytest
- 5 complete examples included
- Discussions - Ask questions, share ideas
- Issues - Report bugs, request features
- Contributing - Contribute to the project
If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ by the bAUTO community
Version: 1.0.0 | Python: 3.8+ | License: MIT