    Repositories list

    • Python
      0500 Updated Jan 4, 2026
    • JAIL-CON

      Public
      [NeurIPS'25] Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency (https://arxiv.org/abs/2510.21189)
      Python
      0200 Updated Dec 24, 2025
    • Python
      0200 Updated Dec 9, 2025
    • Official Website of JADES
      SCSS
      0000 Updated Sep 12, 2025
    • T-GPS

      Public
      Python
      0300 Updated Sep 7, 2025
    • JADES

      Public
      This is the public code repository for the paper 'JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring'
      0600 Updated Aug 27, 2025
    • GPTracker

      Public
      [S&P'25] GPTracker: A Large-Scale Measurement of Misused GPTs
      Python
      1900 Updated Jul 25, 2025
    • SaferVLM

      Public
      Python
      0700 Updated Jul 19, 2025
    • Python
      78400 Updated Jun 8, 2025
    • [ACL2025] Official repository for "Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media"
      Python
      1800 Updated May 29, 2025
    • This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models'
      Python
      11000 Updated May 21, 2025
    • HateBench

      Public
      [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
      31300 Updated Mar 1, 2025
    • [Usenix Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
      Python
      1500 Updated Jan 29, 2025
    • [Usenix Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
      Python
      0510 Updated Jan 29, 2025
    • 0100 Updated Jan 28, 2025
    • ModSCAN

      Public
      An official public repository of the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://arxiv.org/abs/2410.06967).
      Python
      1300 Updated Jan 8, 2025
    • ICL-MIA

      Public
      Python
      0510 Updated Dec 19, 2024
    • Python
      0900 Updated Dec 18, 2024
    • JavaScript
      0810 Updated Oct 30, 2024
    • ZeroFake

      Public
      Python
      21110 Updated Oct 30, 2024
    • homepage

      Public
      JavaScript
      0000 Updated Oct 14, 2024
    • 0000 Updated Aug 28, 2024
    • ML-Doctor

      Public
      Code for ML Doctor
      Python
      0600 Updated Aug 14, 2024
    • Code for Voice Jailbreak Attacks Against GPT-4o.
      Python
      13610 Updated May 31, 2024
    • easy-bib

      Public
      TeX
      1501 Updated Mar 9, 2024
    • .github

      Public
      0000 Updated Feb 28, 2024
    • Python
      0600 Updated Feb 23, 2024
    • A dataset consisting of 6,387 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
      21400 Updated Feb 21, 2024
    • Python
      0200 Updated Feb 21, 2024
    • MGTBench

      Public
      Python
      0700 Updated Feb 21, 2024