This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
Kavach AI provides robust, multi-layered content moderation and safety guardrails for AI systems. It helps protect your AI applications from harmful content, jailbreak attempts, prompt injections, and other security vulnerabilities.
A companion repository for llm-router containing a collection of pipeline-ready plugins. Features a masking interface for anonymizing sensitive data and a guardrail system for validating input/output safety against defined policy rules.
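To illustrate the general pattern such pipeline-ready plugins tend to follow, here is a minimal Python sketch of a masking step chained with a policy-based guardrail check. The class names, method signatures, and policy rules below are assumptions for illustration only, not the repository's actual plugin API.

```python
# Illustrative sketch only: the plugin interface, class names, and policy rules
# below are assumptions, not the actual llm-router plugin API.
import re


class MaskingPlugin:
    """Anonymizes sensitive data (here, just email addresses) before routing."""

    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def process(self, text: str) -> str:
        # Replace each detected email with a placeholder token.
        return self.EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class GuardrailPlugin:
    """Validates input/output against simple policy rules (a blocklist here)."""

    def __init__(self, blocked_terms: list[str]):
        self.blocked_terms = [t.lower() for t in blocked_terms]

    def process(self, text: str) -> str:
        lowered = text.lower()
        if any(term in lowered for term in self.blocked_terms):
            raise ValueError("Policy violation: blocked term found in text")
        return text


def run_pipeline(text: str, plugins) -> str:
    # Each plugin transforms or validates the text in turn.
    for plugin in plugins:
        text = plugin.process(text)
    return text


if __name__ == "__main__":
    pipeline = [MaskingPlugin(), GuardrailPlugin(blocked_terms=["password"])]
    print(run_pipeline("Contact me at alice@example.com", pipeline))
    # -> "Contact me at [REDACTED_EMAIL]"
```

In this pattern, masking runs before validation so that sensitive values are already anonymized by the time policy checks (and any downstream model calls) see the text.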