
🔮 Oracle - AI-Powered Vulnerability Discovery Engine. Predictive vulnerability detection using ML models and 300+ patterns. Julia.


bad-antics/oracle


🔮 Oracle

AI-Powered Predictive Vulnerability Discovery Engine


Predict vulnerabilities before they're exploited. Oracle combines pattern matching, program analysis, and machine-learning models to flag likely vulnerabilities, and uses anomaly detection to surface unusual code as candidate 0-days.

🌟 Features

🧠 AI-Powered Analysis

  • Deep Code Embeddings - Transform code into ML-ready vectors
  • Vulnerability Prediction - Predict security issues with trained models
  • Anomaly Detection - Identify unknown 0-day patterns using isolation forests
  • Pattern Classification - Multi-class vulnerability classification
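Embedding-based pattern matching usually boils down to a nearest-neighbour search under cosine similarity. The sketch below shows that idea with Julia's standard library only; it assumes embeddings are plain `Float64` vectors, and `nearest_patterns` plus the toy 4-dimensional vectors are hypothetical stand-ins, not Oracle's API or model output.

```julia
using LinearAlgebra

# Cosine similarity between two embedding vectors (1.0 = same direction).
cosine_similarity(a::AbstractVector, b::AbstractVector) = dot(a, b) / (norm(a) * norm(b))

# Hypothetical helper: rank known-pattern embeddings by similarity to a
# query embedding and return the `k` best (name, score) pairs.
function nearest_patterns(query::AbstractVector, patterns::Dict{String,<:AbstractVector}; k::Int = 3)
    scored = [(name, cosine_similarity(query, emb)) for (name, emb) in patterns]
    sort!(scored; by = last, rev = true)   # highest similarity first
    return first(scored, k)
end

# Toy embeddings standing in for real model output.
patterns = Dict(
    "sql_concat"   => [1.0, 0.0, 0.0, 1.0],
    "shell_exec"   => [0.0, 1.0, 1.0, 0.0],
    "safe_builder" => [0.0, 0.0, 1.0, 1.0],
)
query = [0.9, 0.1, 0.0, 0.8]
matches = nearest_patterns(query, patterns; k = 2)
```

Real embeddings (for example, 128-dimensional ones) work the same way; for large pattern sets an approximate nearest-neighbour index would replace the linear scan.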

๐Ÿ” Comprehensive Analysis

  • Static Analysis - AST-based pattern matching and dangerous function detection
  • Semantic Analysis - Symbol tables, call graphs, and type inference
  • Data Flow Analysis - Reaching definitions, live variables, def-use chains
  • Control Flow Analysis - CFG construction, dominators, loop detection
  • Taint Tracking - Full source-to-sink taint propagation
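The core of source-to-sink taint propagation can be shown on straight-line code: a value becomes tainted when it comes from a source or from any tainted input, and a finding is raised when a tainted value reaches a sink. This is a simplified sketch, not Oracle's analyzer; the `Assign` statement form and the `SOURCES`/`SINKS` names are hypothetical, and a real implementation propagates taint over the CFG and def-use chains rather than a flat list.

```julia
# Hypothetical three-address-style statement: `target = callee(args...)`.
struct Assign
    target::Symbol
    args::Vector{Symbol}
    callee::Symbol
end

# Hypothetical source and sink functions.
const SOURCES = Set([:read_request_param, :getenv])
const SINKS   = Set([:sql_query, :system])

# Walk the statements in order, propagating taint forward, and return
# (statement index, sink) for every tainted value that reaches a sink.
function find_tainted_sinks(stmts::Vector{Assign})
    tainted = Set{Symbol}()
    findings = Tuple{Int,Symbol}[]
    for (idx, s) in enumerate(stmts)
        arg_tainted = any(a -> a in tainted, s.args)
        if s.callee in SINKS && arg_tainted
            push!(findings, (idx, s.callee))
        end
        if s.callee in SOURCES || arg_tainted
            push!(tainted, s.target)    # result carries taint
        else
            delete!(tainted, s.target)  # reassignment with clean data kills taint
        end
    end
    return findings
end

prog = [
    Assign(:user, Symbol[], :read_request_param),  # user = read_request_param()
    Assign(:q, [:user], :concat),                  # q = concat(user)
    Assign(:r, [:q], :sql_query),                  # r = sql_query(q)   tainted
    Assign(:q, Symbol[], :constant),               # q = constant()
    Assign(:r2, [:q], :sql_query),                 # r2 = sql_query(q)  clean
]
find_tainted_sinks(prog)
```

Only the third statement is reported: once `q` is reassigned from a constant, its taint is killed, so the second query is considered safe.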

๐Ÿ›ก๏ธ Vulnerability Coverage

  • SQL/Command Injection (CWE-89, CWE-78)
  • Cross-Site Scripting (CWE-79)
  • Buffer Overflow (CWE-120)
  • Use After Free (CWE-416)
  • Path Traversal (CWE-22)
  • Insecure Deserialization (CWE-502)
  • Authentication Bypass (CWE-287)
  • Cryptographic Weaknesses (CWE-327)
  • SSRF (CWE-918)
  • And 6 more vulnerability classes...

📊 Risk Intelligence

  • CVSS Calculation - Automatic severity scoring
  • Risk Prioritization - Smart finding prioritization
  • CVE Correlation - Link findings to known vulnerabilities
  • NVD Integration - Query the National Vulnerability Database (NVD) API
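CVSS severity scores come from a published formula. The sketch below implements the CVSS v3.1 base-score equations from the FIRST specification in plain Julia; it is not Oracle's `calculate_cvss`, whose inputs and internals may differ, and the keyword names (`av`, `ac`, `pr`, ...) are chosen here for illustration.

```julia
# CVSS v3.1 base score, following the equations in the FIRST specification.
# Metric values are the spec's single-letter codes ("N", "L", "H", ...).
const AV  = Dict("N" => 0.85, "A" => 0.62, "L" => 0.55, "P" => 0.20)  # Attack Vector
const AC  = Dict("L" => 0.77, "H" => 0.44)                            # Attack Complexity
const UI  = Dict("N" => 0.85, "R" => 0.62)                            # User Interaction
const CIA = Dict("H" => 0.56, "L" => 0.22, "N" => 0.0)                # C/I/A impact

# Privileges Required weight depends on whether Scope is changed.
function pr_weight(pr::String, scope_changed::Bool)
    pr == "N" && return 0.85
    pr == "L" && return scope_changed ? 0.68 : 0.62
    return scope_changed ? 0.50 : 0.27                                # "H"
end

# Spec-defined Roundup: smallest one-decimal value >= x, done in integer
# arithmetic to avoid floating-point artifacts.
function roundup(x::Float64)
    n = round(Int, x * 100_000)
    return n % 10_000 == 0 ? n / 100_000 : (floor(n / 10_000) + 1) / 10
end

function cvss31_base(; av, ac, pr, ui, s, c, i, a)
    changed = s == "C"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = changed ? 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02)^15 : 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight(pr, changed) * UI[ui]
    impact <= 0 && return 0.0
    raw = changed ? 1.08 * (impact + exploitability) : impact + exploitability
    return roundup(min(raw, 10.0))
end
```

For example, `cvss31_base(av="N", ac="L", pr="N", ui="N", s="U", c="H", i="H", a="H")` gives 9.8, the familiar score for a remotely exploitable, unauthenticated flaw with full confidentiality, integrity, and availability impact.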

๐Ÿ“ Reporting

  • HTML Reports - Beautiful, interactive dashboards
  • SARIF Export - CI/CD integration ready
  • JSON/Markdown - Developer-friendly formats
  • Trend Analysis - Track security posture over time

🚀 Quick Start

Installation

using Pkg
Pkg.add(url="https://github.com/bad-antics/oracle")

Basic Usage

using Oracle

# Scan a single file
result = analyze("vulnerable.c")

# Scan entire codebase
result = scan_codebase("./src")

# Generate report
generate_report(result, format="html")

Advanced Usage

using Oracle

# Configure scanner
config = ScanConfig(
    enable_ml=true,
    enable_anomaly=true,
    min_confidence=0.5,
    parallel=true
)

# Initialize scanner with custom config
scanner = Scanner(config=config)

# Scan with full analysis
result = scan(scanner, "./project")

# Prioritize findings
prioritizer = RiskPrioritizer()
prioritized = prioritize(prioritizer, result.findings)

# Correlate with CVEs
client = NVDClient(api_key=ENV["NVD_API_KEY"])
correlated = correlate_findings(client, result.findings)

# Generate comprehensive report
generator = ReportGenerator(output_dir="./reports")
generate_report(generator, result, format="html", target="MyProject")

🔬 How It Works

Analysis Pipeline

┌──────────────────────────────────────────────────────────────────┐
│                        Oracle Pipeline                           │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌────────┐    ┌───────────┐    ┌────────────┐                  │
│   │  Code  │───▶│ Tokenizer │───▶│ AST Parser │                  │
│   └────────┘    └───────────┘    └─────┬──────┘                  │
│                                        │                         │
│            ┌───────────────────┬───────┴───────────┐             │
│            ▼                   ▼                   ▼             │
│     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │
│     │   Static    │     │  Semantic   │     │  Data Flow  │      │
│     │  Analysis   │     │  Analysis   │     │  Analysis   │      │
│     └──────┬──────┘     └──────┬──────┘     └──────┬──────┘      │
│            └───────────────────┼───────────────────┘             │
│                                ▼                                 │
│                       ┌─────────────────┐                        │
│                       │ Feature Vector  │                        │
│                       └────────┬────────┘                        │
│            ┌───────────────────┼───────────────────┐             │
│            ▼                   ▼                   ▼             │
│     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │
│     │  Predictor  │     │ Classifier  │     │   Anomaly   │      │
│     │    (ML)     │     │ (Ensemble)  │     │  Detection  │      │
│     └──────┬──────┘     └──────┬──────┘     └──────┬──────┘      │
│            └───────────────────┼───────────────────┘             │
│                                ▼                                 │
│                       ┌─────────────────┐                        │
│                       │    Findings     │                        │
│                       │  & Risk Scores  │                        │
│                       └────────┬────────┘                        │
│                                ▼                                 │
│                       ┌─────────────────┐                        │
│                       │     Report      │                        │
│                       │   Generation    │                        │
│                       └─────────────────┘                        │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Machine Learning Components

  1. Code Embeddings (CodeEmbedder)

    • Tokenizes code into semantic units
    • Generates 128-dimensional embeddings
    • Supports similarity search for pattern matching
  2. Vulnerability Predictor (VulnerabilityPredictor)

    • Multi-label classification across 15 vulnerability types
    • Trained on historical vulnerability data
    • Heuristic initialization for zero-shot prediction
  3. Pattern Classifier (PatternClassifier)

    • Random forest ensemble with 10 estimators
    • Feature importance tracking
    • Probability distribution output
  4. Anomaly Detector (AnomalyDetector)

    • Isolation Forest algorithm
    • Detects code that deviates from normal patterns
    • Zero-day candidate identification
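The Isolation Forest idea behind the anomaly detector is that unusual points are separated from the rest by fewer random splits than typical ones. Below is a compact, standard-library-only sketch of the Isolation Forest scoring rule (Liu, Ting & Zhou); it is not Oracle's `AnomalyDetector`, and for brevity it grows each random tree lazily along the query point's path, which is statistically equivalent for scoring one point but less efficient than building the forest once and reusing it.

```julia
using Random, Statistics

# c(n): average path length of an unsuccessful binary-search-tree search over
# n points; used to normalise isolation depths.
avg_path_length(n) = n <= 1 ? 0.0 : 2 * sum(1 / k for k in 1:(n - 1)) - 2 * (n - 1) / n

# Depth at which `x` is isolated by one random isolation tree grown on rows of X.
# Only the branch that `x` falls into is actually built.
function isolation_depth(x, X::Matrix{Float64}, depth::Int, max_depth::Int, rng)
    n = size(X, 1)
    (n <= 1 || depth >= max_depth) && return depth + avg_path_length(n)
    f = rand(rng, 1:size(X, 2))                # pick a random feature
    lo, hi = extrema(view(X, :, f))
    lo == hi && return depth + avg_path_length(n)
    cut = lo + rand(rng) * (hi - lo)           # random threshold within its range
    left = X[:, f] .< cut
    branch = x[f] < cut ? X[left, :] : X[.!left, :]
    return isolation_depth(x, branch, depth + 1, max_depth, rng)
end

# Anomaly score in (0, 1]: values near 1 suggest an anomaly, values
# clearly below 0.5 suggest a normal point.
function anomaly_score(x, X::Matrix{Float64}; n_trees = 100, sample_size = 64,
                       rng = MersenneTwister(42))
    psi = min(sample_size, size(X, 1))
    max_depth = ceil(Int, log2(psi))
    depths = map(1:n_trees) do _
        rows = randperm(rng, size(X, 1))[1:psi]   # subsample without replacement
        isolation_depth(x, X[rows, :], 0, max_depth, rng)
    end
    return 2.0^(-mean(depths) / avg_path_length(psi))
end

rng = MersenneTwister(1)
X = randn(rng, 256, 2)                          # "normal" 2-D feature vectors
normal_score  = anomaly_score([0.0, 0.0], X)    # typical point: lower score
outlier_score = anomaly_score([8.0, 8.0], X)    # far outlier: higher score
```

The subsample size (64 here) and tree count are tunable; small subsamples are a deliberate design choice in Isolation Forest, since they keep trees shallow and reduce masking between clustered anomalies.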

๐Ÿ“ Project Structure

oracle/
├── src/
│   ├── Oracle.jl           # Main module & exports
│   ├── analyzers/
│   │   ├── static.jl       # Static analysis engine
│   │   ├── semantic.jl     # Semantic analysis
│   │   ├── dataflow.jl     # Data flow analysis
│   │   ├── controlflow.jl  # Control flow analysis
│   │   └── taint.jl        # Taint tracking
│   ├── ml/
│   │   ├── embeddings.jl   # Code embeddings
│   │   ├── predictor.jl    # Vulnerability prediction
│   │   ├── classifier.jl   # Pattern classification
│   │   └── anomaly.jl      # Anomaly detection
│   ├── patterns/
│   │   ├── database.jl     # Pattern database
│   │   └── matcher.jl      # Pattern matching
│   ├── engine/
│   │   ├── scanner.jl      # Main scanner
│   │   └── risk.jl         # Risk calculation
│   ├── reporting/
│   │   └── generator.jl    # Report generation
│   ├── integrations/
│   │   ├── nvd.jl          # NVD API integration
│   │   └── cve.jl          # CVE tracking
│   └── utils/
│       ├── helpers.jl      # Utility functions
│       └── languages.jl    # Language support
├── test/
├── docs/
├── Project.toml
└── README.md

🎯 Supported Languages

Language                Static  Semantic  Data Flow  Taint
C/C++                   ✅      ✅        ✅         ✅
Java                    ✅      ✅        ✅         ✅
Python                  ✅      ✅        ✅         ✅
JavaScript/TypeScript   ✅      ✅        ✅         ✅
PHP                     ✅      ✅        ✅         ✅
Go                      ✅      ✅        ✅         ✅
Rust                    ✅      ✅        ✅         ✅
Ruby                    ✅      ✅        ✅         ✅

🔧 Configuration

Scan Configuration

config = ScanConfig(
    # Scope
    include_patterns = ["*.c", "*.py", "*.js"],
    exclude_patterns = ["*test*", "*vendor*"],
    max_file_size = 1_000_000,
    
    # Analysis modules
    enable_static = true,
    enable_semantic = true,
    enable_dataflow = true,
    enable_taint = true,
    enable_ml = true,
    enable_anomaly = true,
    
    # Thresholds
    min_confidence = 0.5,
    max_findings_per_file = 50,
    
    # Performance
    parallel = true,
    max_workers = 8,
    timeout_seconds = 60,
    
    # Output
    verbose = false,
    generate_report = true,
    report_format = "html"
)
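The `include_patterns` and `exclude_patterns` options take shell-style globs. The sketch below shows one plausible way such filters could be evaluated; `glob_to_regex` and `should_scan` are hypothetical helpers, only `*` and `?` wildcards are handled, and matching includes against the file's base name but excludes against the whole relative path (so `*vendor*` skips an entire `vendor/` tree) is an assumption, not documented Oracle behavior.

```julia
# Hypothetical: translate a simple glob (`*` and `?` only) into an anchored Regex.
function glob_to_regex(pattern::AbstractString)
    buf = IOBuffer()
    for ch in pattern
        if ch == '*'
            print(buf, ".*")
        elseif ch == '?'
            print(buf, '.')
        else
            # Escape regex metacharacters so they match literally.
            print(buf, ch in ".^\$+()[]{}|\\" ? "\\$ch" : ch)
        end
    end
    return Regex("^" * String(take!(buf)) * "\$")
end

# Hypothetical: includes are tested on the base name, excludes on the full path.
function should_scan(path::AbstractString; include_patterns, exclude_patterns)
    included = any(p -> occursin(glob_to_regex(p), basename(path)), include_patterns)
    excluded = any(p -> occursin(glob_to_regex(p), path), exclude_patterns)
    return included && !excluded
end

include_patterns = ["*.c", "*.py", "*.js"]
exclude_patterns = ["*test*", "*vendor*"]
should_scan("src/parser.c"; include_patterns, exclude_patterns)
```

Under these rules `*.c` does not match `app.cpp`, so C++ sources would need their own pattern such as `*.cpp`.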

Environment Variables

export NVD_API_KEY="your-api-key"        # For NVD integration
export ORACLE_CACHE_DIR="$HOME/.oracle"  # Cache directory
export ORACLE_LOG_LEVEL="info"           # Logging level
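These variables can be read from Julia with `get(ENV, key, default)`. The sketch below is hypothetical: `oracle_settings` and its fallback values are assumptions, not Oracle's documented defaults. It expands a leading `~` itself so the cache path works whether or not the shell already expanded it, and it accepts any dictionary in place of `ENV` so it is easy to test.

```julia
# Hypothetical: resolve Oracle-related environment variables with fallbacks.
function oracle_settings(env = ENV)
    cache_dir = expanduser(get(env, "ORACLE_CACHE_DIR", "~/.oracle"))
    log_level = lowercase(get(env, "ORACLE_LOG_LEVEL", "info"))
    api_key   = get(env, "NVD_API_KEY", nothing)   # `nothing` => unauthenticated NVD access
    return (; cache_dir, log_level, api_key)
end

settings = oracle_settings(Dict("ORACLE_LOG_LEVEL" => "DEBUG"))
```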

📈 Performance

Metric                Value
Files/second          ~100 (parallel)
Memory usage          ~500 MB baseline
Prediction latency    <50 ms
F1 score              0.87 (on benchmark)
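The F1 figure in the table is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). A short sketch of the computation from confusion-matrix counts; the counts below are made up for illustration and are not Oracle's benchmark data.

```julia
# F1 score from true positives, false positives, and false negatives.
function f1_score(tp::Integer, fp::Integer, fn::Integer)
    precision = tp / (tp + fp)   # share of reported findings that are real
    recall    = tp / (tp + fn)   # share of real vulnerabilities that were found
    return 2 * precision * recall / (precision + recall)
end

# Illustrative counts only: 80 true findings, 20 false alarms, 10 misses.
score = f1_score(80, 20, 10)
```

Because F1 ignores true negatives, it is a better fit than plain accuracy for vulnerability detection, where clean code vastly outnumbers vulnerable code.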

🔬 Training Custom Models

using Oracle
using CSV, DataFrames  # CSV.read and DataFrame come from these packages

# Load training data
df = CSV.read("vulnerability_dataset.csv", DataFrame)

# Extract features and labels
features = extract_training_features(df)
labels = df.vuln_class

# Train predictor
predictor = VulnerabilityPredictor()
train!(predictor, features, labels, epochs=100)

# Save model
save_predictor(predictor, "custom_model.jls")

# Train classifier
classifier = PatternClassifier(n_estimators=50)
train!(classifier, features, labels)
save_classifier(classifier, "custom_classifier.jls")

# Train anomaly detector
detector = AnomalyDetector(contamination=0.05)
train!(detector, features)
save_detector(detector, "custom_detector.jls")
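A `contamination` parameter conventionally (as in scikit-learn's `IsolationForest`) states the expected fraction of anomalies in the training data and is used to place the decision threshold. A minimal sketch of that convention, assuming higher scores mean "more anomalous"; `contamination_threshold` and `flag_anomalies` are hypothetical helpers, and Oracle's `AnomalyDetector` may derive its threshold differently.

```julia
using Statistics

# Threshold above which roughly `contamination` of the training scores fall,
# i.e. the (1 - contamination) quantile of the training score distribution.
contamination_threshold(scores::AbstractVector{<:Real}, contamination::Real) =
    quantile(scores, 1 - contamination)

# Indices of new samples whose anomaly score exceeds the threshold.
flag_anomalies(scores, threshold) = findall(>(threshold), scores)

train_scores = collect(range(0.30, 0.70; length = 100))  # stand-in training scores
thr = contamination_threshold(train_scores, 0.05)
flagged = flag_anomalies([0.40, 0.69, 0.72, 0.90], thr)
```

Setting `contamination` too high inflates false positives; setting it too low hides genuine outliers, so it is worth tuning against labelled data when available.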

๐Ÿค Integration

CI/CD (GitHub Actions)

name: Security Scan
on: [push, pull_request]

permissions:
  contents: read
  security-events: write   # required to upload SARIF results

jobs:
  oracle-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
      - run: julia -e 'using Pkg; Pkg.add(url="https://github.com/bad-antics/oracle")'
      - run: |
          julia -e '
            using Oracle
            result = scan_codebase(".")
            generate_report(result, format="sarif", output_file="results.sarif")
            # Fail the job if any CRITICAL findings were reported (missing key counts as 0)
            exit(get(result.stats.findings_by_severity, CRITICAL, 0) > 0 ? 1 : 0)
          '
      - uses: github/codeql-action/upload-sarif@v3
        if: always()   # upload results even when the scan step fails the job
        with:
          sarif_file: results.sarif

IDE Integration (VS Code)

{
  "oracle.enable": true,
  "oracle.onSave": true,
  "oracle.minSeverity": "medium",
  "oracle.enableML": true
}

📚 API Reference

Core Functions

# Analyze a single file
analyze(filepath::String; language=nothing) -> AnalysisResult

# Scan entire codebase
scan_codebase(path::String; config=DEFAULT_SCAN_CONFIG) -> ScanResult

# Predict vulnerabilities
predict_vulnerabilities(code::String, language::String) -> Vector{PredictionResult}

# Generate report
generate_report(result::ScanResult; format="html") -> String

Advanced Functions

# Create custom scanner
Scanner(; config::ScanConfig) -> Scanner

# Risk calculation
calculate_risk(calc::RiskCalculator, finding::Finding) -> Float64
calculate_cvss(finding::Finding) -> CVSSScore

# CVE correlation
correlate_findings(client::NVDClient, findings::Vector{Finding}) -> Vector{CorrelatedFinding}

# Anomaly analysis
analyze_anomaly(detector::AnomalyDetector, x::Vector, ref::Matrix) -> AnomalyAnalysis

๐Ÿ›ก๏ธ Security

Oracle is designed with security in mind:

  • No code execution during analysis
  • Sandboxed pattern matching
  • Rate-limited external API calls
  • Secure credential handling
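Rate limiting matters most for the NVD integration. NVD's API documentation describes limits over a rolling 30-second window (at the time of writing, 5 requests without an API key and 50 with one; check NVD's current documentation). A minimal sliding-window limiter sketch; `SlidingWindowLimiter` and `acquire!` are hypothetical names, not Oracle's implementation, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```julia
# Allow at most `limit` calls in any `window`-second period.
mutable struct SlidingWindowLimiter
    limit::Int
    window::Float64
    stamps::Vector{Float64}   # start times of recent calls, oldest first
end
SlidingWindowLimiter(limit, window) = SlidingWindowLimiter(limit, window, Float64[])

# Block until a call is permitted, record it, and return its timestamp.
function acquire!(rl::SlidingWindowLimiter; now_fn = time, sleep_fn = sleep)
    while true
        t = now_fn()
        # Drop timestamps that have left the window.
        while !isempty(rl.stamps) && t - rl.stamps[1] >= rl.window
            popfirst!(rl.stamps)
        end
        if length(rl.stamps) < rl.limit
            push!(rl.stamps, t)
            return t
        end
        sleep_fn(rl.window - (t - rl.stamps[1]))  # wait for the oldest call to expire
    end
end
```

Typical use is `rl = SlidingWindowLimiter(5, 30.0)` for unauthenticated access, then `acquire!(rl)` immediately before each HTTP request.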

📜 License

MIT License - See LICENSE for details.

๐Ÿ™ Acknowledgments

  • NVD/NIST for vulnerability data
  • CWE/MITRE for weakness enumeration
  • The Julia community for excellent packages

Documentation • Issues • Discussions

Made with 💜 by the NullSec Team
