This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
Kavach AI provides robust, multi-layered content moderation and safety guardrails for AI systems. It helps protect your AI applications from harmful content, jailbreak attempts, prompt injections, and other security vulnerabilities.
A companion repository for llm-router containing a collection of pipeline-ready plugins. Features a masking interface for anonymizing sensitive data and a guardrail system for validating input/output safety against defined policy rules.
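To illustrate the general pattern such pipeline-ready plugins tend to follow, here is a minimal Python sketch of a masking step chained with a policy-based guardrail check. The class names, method signatures, and policy rules below are assumptions for illustration only, not the repository's actual plugin API.

```python
# Illustrative sketch only: the plugin interface, class names, and policy rules
# below are assumptions, not the actual llm-router plugin API.
import re


class MaskingPlugin:
    """Anonymizes sensitive data (here, just email addresses) before routing."""

    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def process(self, text: str) -> str:
        # Replace each detected email with a placeholder token.
        return self.EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class GuardrailPlugin:
    """Validates input/output against simple policy rules (a blocklist here)."""

    def __init__(self, blocked_terms: list[str]):
        self.blocked_terms = [t.lower() for t in blocked_terms]

    def process(self, text: str) -> str:
        lowered = text.lower()
        if any(term in lowered for term in self.blocked_terms):
            raise ValueError("Policy violation: blocked term found in text")
        return text


def run_pipeline(text: str, plugins) -> str:
    # Each plugin transforms or validates the text in turn.
    for plugin in plugins:
        text = plugin.process(text)
    return text


if __name__ == "__main__":
    pipeline = [MaskingPlugin(), GuardrailPlugin(blocked_terms=["password"])]
    print(run_pipeline("Contact me at alice@example.com", pipeline))
    # -> "Contact me at [REDACTED_EMAIL]"
```

In this pattern, masking runs before validation so that sensitive values are already anonymized by the time policy checks (and any downstream model calls) see the text.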