
🔮 MIRAGE

Adversarial Machine Learning Toolkit

Model Extraction • Adversarial Examples • Neural Network Probing


    ███╗   ███╗██╗██████╗  █████╗  ██████╗ ███████╗
    ████╗ ████║██║██╔══██╗██╔══██╗██╔════╝ ██╔════╝
    ██╔████╔██║██║██████╔╝███████║██║  ███╗█████╗
    ██║╚██╔╝██║██║██╔══██╗██╔══██║██║   ██║██╔══╝
    ██║ ╚═╝ ██║██║██║  ██║██║  ██║╚██████╔╝███████╗
    ╚═╝     ╚═╝╚═╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝ ╚══════╝

      [ Adversarial ML Toolkit | bad-antics ]

⚡ Overview

Mirage is a high-performance adversarial machine learning toolkit written in Julia, built for security researchers and red teamers who need to evaluate the robustness of ML models. It provides tools for:

  • Model Extraction: steal model functionality through query-based attacks
  • Adversarial Examples: generate inputs that fool classifiers
  • Neural Network Probing: analyze model internals and decision boundaries
  • Defense Evaluation: test the robustness of defensive measures

🎯 Features

🕵️ Model Extraction Attacks

| Attack          | Description                                               |
|-----------------|-----------------------------------------------------------|
| Query Synthesis | Generate synthetic queries to extract decision boundaries |
| Knockoff Nets   | Train surrogate models using API queries                  |
| JBDA            | Jacobian-Based Dataset Augmentation                       |
| ActiveThief     | Active learning for efficient extraction                  |
| CloudLeak       | MLaaS-specific extraction techniques                      |
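
All five attacks share the same skeleton: query the victim, record its outputs, and fit a surrogate to the recorded input/output pairs. A minimal sketch of that loop (not Mirage's internals), assuming a Flux surrogate and a hypothetical query(target, x) helper that returns the victim's soft labels:

using Flux

function extraction_sketch(target, query_set; epochs = 10)
    # 1. Query the victim model and record its outputs
    labels = [query(target, x) for x in query_set]

    # 2. Train a surrogate to imitate the recorded mapping
    surrogate = Chain(Dense(784 => 256, relu), Dense(256 => 10), softmax)
    opt_state = Flux.setup(Adam(1e-3), surrogate)
    for _ in 1:epochs, (x, y) in zip(query_set, labels)
        grads = Flux.gradient(m -> Flux.crossentropy(m(x), y), surrogate)
        Flux.update!(opt_state, surrogate, grads[1])
    end
    surrogate
end

The attacks differ mainly in how query_set is chosen: Query Synthesis and JBDA generate it, Knockoff Nets samples it from a public dataset, and ActiveThief selects it with active learning.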

💥 Adversarial Example Generation

| Method      | Type      | Description                     |
|-------------|-----------|---------------------------------|
| FGSM        | White-box | Fast Gradient Sign Method       |
| PGD         | White-box | Projected Gradient Descent      |
| C&W         | White-box | Carlini-Wagner L2/L∞ attack     |
| DeepFool    | White-box | Minimal perturbation finder     |
| AutoAttack  | White-box | Ensemble of strongest attacks   |
| Square      | Black-box | Query-efficient score-based     |
| HopSkipJump | Black-box | Decision-based attack           |
| Boundary    | Black-box | Decision boundary attack        |
| SimBA       | Black-box | Simple Black-box Attack         |
| QEBA        | Black-box | Query-Efficient Boundary Attack |
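
The Type column is the access model: white-box attacks differentiate through the model, while black-box attacks see only its outputs and must work within a query budget. A hedged illustration of that constraint (a hypothetical wrapper, not part of Mirage's API):

# Wrap any callable model to count how often the attacker queries it
mutable struct QueryCounter{M}
    model::M
    queries::Int
end
QueryCounter(model) = QueryCounter(model, 0)
(c::QueryCounter)(x) = (c.queries += 1; c.model(x))

Passing such a wrapper to a black-box attack makes the max_queries budgets above concrete: the attack must succeed within c.queries calls.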

🔬 Neural Network Analysis

  • Gradient Saliency: visualize input importance
  • Integrated Gradients: gradient-based attribution
  • LIME: local interpretable explanations
  • Decision Boundary Mapping: 2D/3D visualization (sketched below)
  • Neuron Activation Analysis: internal representation probing
  • Layer-wise Relevance Propagation: contribution analysis
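
Decision boundary mapping, for instance, reduces to evaluating the model over a 2D grid. A minimal sketch under assumed semantics (a hypothetical helper, not Mirage's map_decision_boundary):

# Predicted class over the plane spanned by directions d1 and d2 around origin
function boundary_grid(model, origin, d1, d2; resolution = 100)
    ts = range(-1, 1, length = resolution)
    [argmax(model(origin .+ a .* d1 .+ b .* d2)) for a in ts, b in ts]
end

The resulting matrix of class labels is the kind of data a utility like plot_boundary would render.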

🛡️ Defense Testing

  • Adversarial Training Evaluation
  • Input Preprocessing Bypass
  • Certified Defense Verification
  • Ensemble Robustness Testing
  • Detection Evasion
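
Each of these evaluations bottoms out in robust accuracy: attack the defended model and count the predictions that survive. A simplified sketch, assuming only the attack and predict calls shown in the Quick Start below:

function robust_accuracy(model, xs, ys; epsilon = 0.03)
    correct = 0
    for (x, y) in zip(xs, ys)
        adv = attack(model, x, method = :pgd, epsilon = epsilon)
        correct += predict(model, adv) == y   # survives the attack?
    end
    correct / length(xs)
end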

🚀 Quick Start

Installation

using Pkg
Pkg.add(url="https://github.com/bad-antics/mirage")

# Or clone and develop
Pkg.develop(path="/path/to/mirage")

Basic Usage

using Mirage

# Initialize
Mirage.banner()
config = Mirage.init()

# ═════════════════════════════════════════════════════════════════
# Model Extraction Attack
# ═════════════════════════════════════════════════════════════════

# Define target API
target = RemoteModel(
    "https://api.target.com/predict",
    headers = Dict("Authorization" => "Bearer TOKEN")
)

# Extract model
surrogate = extract_model(target,
    method = :knockoff,
    budget = 10000,          # Max queries
    input_shape = (28, 28),
    num_classes = 10
)

# Test fidelity
fidelity = evaluate_fidelity(surrogate, target, test_samples)
println("Extraction fidelity: $(fidelity * 100)%")

# ═════════════════════════════════════════════════════════════════
# Adversarial Example Generation
# ═════════════════════════════════════════════════════════════════

# Load local model
model = load_model("classifier.onnx")

# Generate adversarial examples
adversarial = attack(model, image,
    method = :pgd,
    epsilon = 0.03,
    iterations = 40,
    step_size = 0.01
)

# Black-box attack (only predictions available)
adversarial = attack(target, image,
    method = :square,
    epsilon = 0.05,
    max_queries = 5000
)

# Check success
original_pred = predict(model, image)
adv_pred = predict(model, adversarial)
println("Original: $original_pred โ†’ Adversarial: $adv_pred")

# ═════════════════════════════════════════════════════════════════
# Model Analysis
# ═════════════════════════════════════════════════════════════════

# Gradient saliency
saliency = gradient_saliency(model, image)
visualize_saliency(saliency)

# Decision boundary
boundary = map_decision_boundary(model, 
    samples = test_data,
    resolution = 100
)
plot_boundary(boundary)

# Neuron analysis
activations = probe_neurons(model, image, layer = 5)
top_neurons = most_activated(activations, k = 10)

📖 Attack Reference

White-Box Attacks

# FGSM - Fast Gradient Sign Method
adv = fgsm(model, x, y, epsilon = 0.03)

# PGD - Projected Gradient Descent
adv = pgd(model, x, y,
    epsilon = 0.03,
    alpha = 0.01,
    iterations = 40,
    random_start = true
)

# C&W - Carlini-Wagner
adv = carlini_wagner(model, x, y,
    confidence = 0.0,
    learning_rate = 0.01,
    max_iterations = 1000,
    binary_search_steps = 9
)

# DeepFool
adv = deepfool(model, x,
    max_iterations = 50,
    overshoot = 0.02
)

# AutoAttack (strongest combination)
adv = auto_attack(model, x, y,
    epsilon = 8/255,
    attacks = [:apgd_ce, :apgd_dlr, :fab, :square]
)
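
For intuition, what a call like fgsm computes is one step along the sign of the input gradient. A hedged sketch using Flux gradients, not Mirage's actual source:

using Flux

function fgsm_sketch(model, x, y; epsilon = 0.03)
    # Move each pixel by epsilon in the direction that increases the loss
    g = Flux.gradient(x -> Flux.crossentropy(model(x), y), x)[1]
    clamp.(x .+ epsilon .* sign.(g), 0, 1)  # stay in the valid pixel range
end

PGD is essentially this step iterated, with a projection back into the epsilon-ball around the original input after each iteration, plus an optional random start.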

Black-Box Attacks

# Square Attack (score-based)
adv = square_attack(model, x, y,
    epsilon = 0.05,
    max_queries = 5000,
    p_init = 0.8
)

# HopSkipJump (decision-based)
adv = hopskipjump(model, x,
    target_label = nothing,  # untargeted
    max_queries = 10000,
    gamma = 1.0
)

# Boundary Attack
adv = boundary_attack(model, x,
    max_iterations = 10000,
    spherical_step = 0.01,
    source_step = 0.01
)

# SimBA
adv = simba(model, x, y,
    epsilon = 0.2,
    max_queries = 10000,
    freq_dims = 28  # DCT basis
)
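
SimBA is simple enough to sketch in a few lines. Assumed semantics only (pixel basis rather than DCT, a hypothetical prob_of(model, x, y) helper returning the model's probability for label y, and simplified query accounting); Mirage's simba may differ:

function simba_sketch(model, x, y; epsilon = 0.2, max_queries = 10_000)
    adv, p = copy(x), prob_of(model, x, y)
    for _ in 1:max_queries
        q = zeros(size(x))
        q[rand(1:length(q))] = epsilon       # random one-hot direction
        for step in (q, -q)                  # try both signs
            cand = clamp.(adv .+ step, 0, 1)
            p_new = prob_of(model, cand, y)
            if p_new < p                     # keep any step that lowers
                adv, p = cand, p_new         # the true-class probability
                break
            end
        end
    end
    adv
end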

Model Extraction

# Knockoff Nets
surrogate = knockoff_nets(target,
    architecture = :resnet18,
    budget = 50000,
    batch_size = 256
)

# JBDA - Jacobian-Based Data Augmentation
surrogate = jbda_extract(target,
    seed_data = initial_samples,
    augmentation_factor = 10,
    budget = 20000
)

# Active Learning Extraction
surrogate = active_thief(target,
    strategy = :entropy,
    budget = 10000,
    batch_size = 100
)
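
The distinguishing step of JBDA is its augmentation rule: each seed is pushed along the sign of the surrogate's Jacobian, so the next round of queries lands near the victim's decision boundary. A sketch under assumed semantics, not Mirage's source:

using Flux

function jbda_augment(surrogate, x, victim_label; lambda = 0.1)
    # Gradient of the surrogate's score for the victim-assigned label
    g = Flux.gradient(x -> surrogate(x)[victim_label], x)[1]
    x .+ lambda .* sign.(g)   # new query point near the boundary
end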

๐Ÿ—๏ธ Architecture

mirage/
├── src/
│   ├── Mirage.jl           # Main module
│   ├── core/
│   │   ├── Types.jl        # Type definitions
│   │   ├── Config.jl       # Configuration
│   │   ├── Display.jl      # Terminal UI
│   │   └── Utils.jl        # Utilities
│   ├── attacks/
│   │   ├── WhiteBox.jl     # White-box attacks
│   │   ├── BlackBox.jl     # Black-box attacks
│   │   └── Extraction.jl   # Model extraction
│   ├── models/
│   │   ├── Loaders.jl      # Model loading
│   │   ├── Remote.jl       # Remote API interface
│   │   └── Surrogate.jl    # Surrogate training
│   ├── analysis/
│   │   ├── Saliency.jl     # Attribution methods
│   │   ├── Probing.jl      # Network probing
│   │   └── Boundary.jl     # Decision boundaries
│   └── defenses/
│       ├── Detection.jl    # Attack detection
│       └── Evaluation.jl   # Defense testing
├── test/
├── examples/
└── docs/

🔧 Configuration

# Configure Mirage
Mirage.configure(
    device = :cuda,           # :cpu, :cuda, :metal
    threads = 8,
    precision = Float32,
    verbose = true,
    log_queries = true,
    cache_gradients = true
)

# Attack-specific config
attack_config = AttackConfig(
    norm = :linf,             # :l2, :linf, :l0, :l1
    epsilon = 0.03,
    targeted = false,
    confidence = 0.0
)

📊 Metrics & Evaluation

# Evaluate attack success
metrics = evaluate_attack(model, clean, adversarial, labels)

# Metrics include:
# - attack_success_rate
# - average_perturbation (L2, Linf)
# - queries_used
# - confidence_drop
# - transferability (to other models)

# Evaluate model robustness
robustness = evaluate_robustness(model, test_data,
    attacks = [:fgsm, :pgd, :square],
    epsilons = [0.01, 0.03, 0.05, 0.1]
)

# Generate robustness report
report = robustness_report(model, test_data)
display_report(report)
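
The headline metrics reduce to simple formulas. A sketch of how they are typically computed (assumed semantics, not necessarily Mirage's exact definitions):

using Statistics: mean

success_rate(adv_preds, labels) = mean(adv_preds .!= labels)
linf_perturbation(clean, adv)   = maximum(abs.(adv .- clean))
l2_perturbation(clean, adv)     = sqrt(sum(abs2, adv .- clean))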

🔗 NullSec Integration

Mirage integrates seamlessly with NullSec Linux:

using Mirage

# Auto-detect NullSec environment
if Mirage.nullsec_available()
    # Use shared config
    Mirage.init_nullsec!()
    
    # Log attacks to NullSec
    Mirage.log_attack(result)
    
    # Access NullSec models
    models = Mirage.list_nullsec_models()
end

⚠️ Responsible Use

Mirage is designed for:

  • ✅ Security research and model robustness evaluation
  • ✅ Red team assessments with authorization
  • ✅ Academic research on adversarial ML
  • ✅ Testing your own models and defenses

NOT for:

  • โŒ Attacking models without authorization
  • โŒ Bypassing security in production systems
  • โŒ Any malicious purpose

📜 License

MIT License - See LICENSE for details.


Built with 🔮 by bad-antics

Part of the NullSec Security Toolkit
