🐢 Open-Source Evaluation & Testing library for LLM Agents (Python; updated Nov 18, 2025)
Deliver safe & effective language models
MIT-licensed framework for testing LLMs, RAG pipelines, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
A Python library for verifying code properties using natural language assertions.
Open-source framework for stress-testing LLMs and conversational AI. Identify hallucinations, policy violations, and edge cases with scalable, realistic simulations. Join the Discord: https://discord.gg/ssd4S37WNW
Turn plain English into Robot Framework files with AI. No dependencies, no hassle — just validated, ready-to-run tests
Prompture is an API-first library for requesting structured JSON (or any other structured) output from LLMs, validating it against a schema, and running comparative tests between models.
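The pattern behind libraries like this can be sketched without any particular API: request JSON from a model, parse it, and validate it against a schema before asserting on values. A minimal illustrative sketch, not Prompture's real API; the `call_llm` stub and the schema format here are hypothetical:

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; a structured-output
    # library would wrap the provider SDK here.
    return '{"name": "Ada", "age": 36}'

def validate(data: dict, schema: dict) -> list[str]:
    """Return a list of schema violations (empty list means valid)."""
    errors = []
    for key, expected_type in schema.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            errors.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return errors

schema = {"name": str, "age": int}
raw = call_llm("Extract the person as JSON with keys name (str) and age (int).")
data = json.loads(raw)
errors = validate(data, schema)
assert errors == [], errors  # fail the test before asserting on values
```

Real libraries add retries, provider adapters, and richer schema languages (e.g. JSON Schema or Pydantic models) on top of this loop.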
🚀 First multimodal AI-powered visual testing plugin for Claude Code. AI that can SEE your UI! 10x faster frontend development with closed-loop testing, browser automation, and Claude 4.5 Sonnet vision.
Ethical AI Governance Platform | Bias Detection | Compliance | Fairness Testing for ML, LLM & Multimodal AI | Open Source
Finder for the LMArena anonymous model codenamed "riftrunner" (community‑dubbed Gemini 3.0 Pro RC), using automated prompts and fingerprinting on lmarena.ai.
Integration of OpenAI with Pytest to automate API test generation.
🚀 ARM64 Browser Automation for Claude Code - SaaS testing on 80 Raspberry Pi budget. The first solution that works where Playwright/Puppeteer fail on ARM64. Autonomous testing without human debugging.
Multi-agent simulation using LLMs. Agents autonomously decide actions for survival, reproduction, and social behavior in a grid world. This project aims to replicate a paper published in 2025 (arXiv:2508.12920).
A lightweight dashboard to view and analyze test automation results. Built with Streamlit + PostgreSQL, and powered by AI (Gemini) to help debug test failures faster.
An automated approach for exploring and testing conversational agents using large language models. TRACER discovers chatbot functionalities, generates user profiles, and creates comprehensive test suites for conversational AI systems.
An AI Testing Framework
🚀 Comprehensive testing framework for LLM applications with semantic assertions, multi-provider support, RAG testing, and prompt optimization. Test AI the right way!
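A "semantic assertion" passes when the model's answer means the same thing as the expected answer rather than matching it byte-for-byte. Frameworks of this kind typically use embeddings or an LLM judge; as an illustration of the idea only, here is a crude token-overlap version (all names are hypothetical, not this framework's API):

```python
def semantic_assert(actual: str, expected: str, threshold: float = 0.5) -> None:
    """Crude stand-in for a semantic check: Jaccard overlap of word sets.
    Production frameworks would use embeddings or an LLM judge instead."""
    a = set(actual.lower().split())
    b = set(expected.lower().split())
    overlap = len(a & b) / len(a | b)
    assert overlap >= threshold, f"semantic mismatch (overlap={overlap:.2f})"

# Passes: the word sets overlap fully even though the strings differ.
semantic_assert("the capital of France is Paris", "Paris is the capital of France")
```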
A new package that helps developers integration-test AI and LLM applications by validating structured outputs. It takes a user's test scenario or prompt as input, sends it to an LLM, and uses pattern matching to validate the structured output.
An MCP server that automatically fixes broken Selenium test locators when UI changes occur, reducing test maintenance overhead.
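Locator "self-healing" of this kind generally works by comparing the attributes of the element the stale locator last matched against candidate elements in the new DOM and picking the closest match. A toy sketch of that matching step (no Selenium or MCP specifics; all names and data here are illustrative):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] via difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()

def heal_locator(old_attrs: dict, candidates: list[dict]) -> dict:
    """Pick the candidate element whose attributes best match the
    last-known attributes of the element the stale locator pointed at."""
    def score(cand: dict) -> float:
        keys = set(old_attrs) | set(cand)
        return sum(similarity(old_attrs.get(k, ""), cand.get(k, "")) for k in keys) / len(keys)
    return max(candidates, key=score)

old = {"id": "submit-btn", "class": "btn primary", "text": "Submit"}
dom = [
    {"id": "cancel-btn", "class": "btn", "text": "Cancel"},
    {"id": "submit-button", "class": "btn primary", "text": "Submit"},
]
best = heal_locator(old, dom)
print(best["id"])  # → submit-button
```

A real tool would extract the candidate attributes from the live DOM and rewrite the test's locator with the healed one.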