MapOSCAL

TL;DR

MapOSCAL automatically analyzes your source code or Kubernetes resources and generates NIST OSCAL compliance documentation using AI-powered discovery. It scans your repository for security controls, maps them to compliance frameworks like NIST SP 800-53, and produces validated OSCAL component definitions. This CLI tool saves security teams weeks of manual documentation work by automating the tedious parts while maintaining accuracy through multi-layer validation.

Overview

Cybersecurity, risk management, and regulatory compliance all hinge on accurately describing your system's working environment and configuration. This project aims to help the software industry create standardized software component definitions easily, specifically to further the interoperability of security and compliance requirements. It builds on the Open Security Controls Assessment Language (OSCAL) framework developed by the National Institute of Standards and Technology (NIST).

Creating and maintaining an OSCAL definition of your system or software is not a trivial task. Because OSCAL is a machine-readable format, it is usually accessed as JSON or XML, or through a programmatic SDK. Some UIs exist to improve human interaction, but the process remains tedious and requires significant subject matter expertise for mundane tasks. This project addresses that pain point by providing an engineering-focused CLI that dynamically drafts your OSCAL system definition using automated discovery techniques. Released under the permissive MIT License, it aims to provide core discovery functionality to as wide an audience as possible. With the generated output, your system's SMEs, whose time is highly valued, shift from weeks of writing tedious documentation to a more efficient review of automatically generated documentation. The goal is not to replace such individuals, but to let them serve where their expertise is most valuable rather than drafting documentation.

Generative AI and OSCAL Discovery

While extremely powerful, generative AI can be equally dangerous, producing false, hallucinated results if it is not implemented with proper guardrails. Its output is only valuable within a framework that lets non-generative methods verify its pattern recognition. In this open source project, care has been taken to place guardrails at a high-level view of your application. If the project grows, a future commercial version is planned to offer much more granular controls, moving from the application and file level into functions, relationships, and other finer-grained aspects.

Compliance Control Implementation Statements

Having an OSCAL-based system definition is only half of the compliance battle. To be truly effective, that definition must be distilled into accurate implementation statements that are tied to one or more compliance frameworks. This open source implementation includes a single control definition and mapping for example purposes. If the project grows, more are intended to be offered as part of a future commercial offering.

Validation and Quality Assurance

MapOSCAL includes comprehensive validation and evaluation capabilities to ensure the quality and accuracy of generated OSCAL components:

Local Validation: Fast, deterministic validation using Pydantic schemas for structural correctness
LLM-Assisted Fixes: Intelligent fixing of complex issues that require understanding context
Quality Evaluation: AI-powered assessment of control mapping quality and completeness
Comprehensive Reporting: Detailed validation failures and evaluation results

Security Overview Integration

MapOSCAL includes an advanced security overview generation system with intelligent context optimization:

Service Security Summary: Generates detailed security overviews including authentication, encryption, and audit capabilities
Selective Context Injection: NEW in v0.3.0-alpha - Intelligently includes only relevant security sections based on control type
Token Optimization: Achieves ~47% reduction in prompt tokens while maintaining quality and context relevance
Smart Control Mapping: Different NIST 800-53 control families receive targeted security context:
- Access Control (AC) → Authentication & Authorization context
- Audit & Accountability (AU) → Logging & Monitoring context
- System Protection (SC) → Encryption & Data Protection context
Enhanced Validation: Incorporates security context into critique and revision processes
Improved Accuracy: Better control status determination through focused, relevant context
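
The family-to-context mapping above can be pictured as a small lookup. The following is a minimal sketch, not MapOSCAL's actual implementation: the section names, the `FAMILY_SECTIONS` table, and the `select_context` helper are all hypothetical, and the fallback to the full overview for unlisted families is an assumption.

```python
from typing import Dict, List

# Hypothetical mapping from control family to security overview sections,
# mirroring the AC / AU / SC examples listed above.
FAMILY_SECTIONS: Dict[str, List[str]] = {
    "AC": ["authentication", "authorization"],
    "AU": ["logging", "monitoring"],
    "SC": ["encryption", "data_protection"],
}


def select_context(control_id: str, overview: Dict[str, str]) -> str:
    """Return only the overview sections relevant to the control's family."""
    family = control_id.split("-", 1)[0].upper()
    wanted = FAMILY_SECTIONS.get(family)
    if wanted is None:
        # Assumption: unknown families fall back to the full overview.
        wanted = list(overview)
    return "\n\n".join(overview[name] for name in wanted if name in overview)


overview = {
    "authentication": "OIDC login via an external identity provider.",
    "authorization": "Role checks on every API route.",
    "logging": "Structured JSON audit logs.",
    "monitoring": "Alerting on failed logins.",
    "encryption": "TLS 1.3 in transit.",
    "data_protection": "AES-256 at rest.",
}
ac_context = select_context("AC-6", overview)
```

Sending only the matching sections is what keeps the prompt smaller for each control while leaving the relevant context intact.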

Cryptographic Operations Detection

NEW in v0.3.0-alpha - MapOSCAL automatically detects and catalogs cryptographic operations in your codebase:

Multi-Language Support: Detects cryptographic patterns in Python and Go codebases
Comprehensive Analysis: Identifies encryption, hashing, signing, and key management operations
Control Integration: Automatically includes cryptographic context in relevant security control mappings
Compliance Mapping: Better accuracy for encryption-related controls (SC family) through detected crypto operations

Dockerfile Analysis & Container Security

NEW in v0.4.0 - MapOSCAL now provides comprehensive container security analysis:

Dockerfile Security Scanning: Automatically analyzes Dockerfiles for security controls and compliance features
Container Control Mapping: Maps Dockerfile instructions to NIST 800-53 controls (AC-6, CM-6, SC-7, SC-13, etc.)
Transport Security Detection: Identifies TLS/HTTPS configuration and certificate management
ENTRYPOINT Script Analysis: Analyzes container entrypoint scripts for security features
Separate FAISS Indexing: Dedicated container security analysis with dockerfile_index.faiss
Comprehensive Coverage: Supports all major Dockerfile instructions (USER, EXPOSE, ENV, COPY, RUN, etc.)
OSCAL Integration: Generates structured properties for automated compliance reporting

Kubernetes Support & Analysis

NEW in v0.5.0 - MapOSCAL now provides comprehensive Kubernetes manifest analysis:

K8s Control Mapping: Automatic NIST 800-53 control identification from Kubernetes resources
Resource Analysis: Analyzes Deployments, Services, ConfigMaps, Secrets, RBAC, and more
Security Context: Maps Kubernetes security features to compliance controls
Network Policies: Automatic detection and analysis of network security policies
RBAC Analysis: Role-based access control analysis and compliance mapping
Pod Security: Pod security standards and security context analysis
Dedicated Command: k8s-process command for Kubernetes-specific analysis

Compliance-Trestle Integration

NEW in v0.5.0 - MapOSCAL now uses industry-standard compliance-trestle for OSCAL generation:

Standards Compliance: 100% OSCAL 1.1.3 compliance across all commands
Type Safety: Strong typing and validation for all OSCAL elements
Schema Validation: Automatic OSCAL structure validation using compliance-trestle models
Future-Proof: Automatic support for OSCAL standard updates
Developer Experience: IDE autocomplete and compile-time validation
Production Ready: Enterprise-grade OSCAL generation with full validation

GPT-5 Model Family Support

NEW in v0.4.0 - MapOSCAL now provides full compatibility with OpenAI's latest models:

Automatic Parameter Detection: Smart handling of max_tokens vs max_completion_tokens parameters
Temperature Restrictions: Automatic handling of GPT-5 temperature parameter limitations
Backward Compatibility: All existing models (GPT-4, GPT-3.5) continue to work unchanged
Future-Proof: Easy to add support for new model restrictions and parameters
Performance Optimization: Automatic parameter optimization for each model type
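
As an illustration of the parameter handling described above, the sketch below builds request parameters differently per model. It is a simplified assumption, not MapOSCAL's detection logic: `build_completion_params` is a hypothetical helper, and treating every model name that starts with `gpt-5` as restricted is a stand-in for whatever rules the project actually applies.

```python
from typing import Any, Dict, Optional


def build_completion_params(
    model: str,
    max_output_tokens: int,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build chat-completion keyword arguments suited to the given model."""
    params: Dict[str, Any] = {"model": model}
    # Assumption: a simple name-prefix check identifies restricted models.
    restricted = model.lower().startswith("gpt-5")
    if restricted:
        # Newer models take max_completion_tokens instead of max_tokens.
        params["max_completion_tokens"] = max_output_tokens
        # Temperature is omitted so the API default applies.
    else:
        params["max_tokens"] = max_output_tokens
        if temperature is not None:
            params["temperature"] = temperature
    return params
```

For example, `build_completion_params("gpt-4", 512, 0.4)` keeps `max_tokens` and `temperature`, while the same call with `"gpt-5"` switches to `max_completion_tokens` and drops the temperature.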

File Metadata Tracking

MapOSCAL automatically injects metadata into all output files to provide complete audit trails:

Generation Information: Model, provider, base URL, and timing for each operation
Configuration Tracking: Which config file was used for each command
Audit Trail: Complete provenance information for compliance and debugging
Metadata Extraction: Built-in command to extract and display metadata from any output file
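
A rough sketch of what metadata injection and extraction could look like follows. The `_metadata` and `data` keys, the field names, and both helper functions are hypothetical; the real output layout may differ.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def attach_metadata(
    payload: Any,
    *,
    model: str,
    provider: str,
    base_url: str,
    config_file: str,
    started_at: datetime,
) -> Dict[str, Any]:
    """Wrap an output payload with provenance information (hypothetical layout)."""
    finished_at = datetime.now(timezone.utc)
    return {
        "_metadata": {
            "model": model,
            "provider": provider,
            "base_url": base_url,
            "config_file": config_file,
            "generated_at": finished_at.isoformat(),
            "duration_seconds": (finished_at - started_at).total_seconds(),
        },
        "data": payload,
    }


def extract_metadata(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return the provenance block, or an empty dict if none is present."""
    return document.get("_metadata", {})


started = datetime.now(timezone.utc)
document = attach_metadata(
    [{"control-id": "sc-8"}],
    model="gpt-4",
    provider="openai",
    base_url="https://api.openai.com/v1",
    config_file="config.yaml",
    started_at=started,
)
serialized = json.dumps(document, indent=2)
```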

Future Growth

The industry is currently struggling to have a clean, clear, and actionable way to describe systems for security and compliance purposes. Our view is that the ideal path forward to improve this problem space is two-fold:

Foundational open source adoption - Widespread use of OSCAL across both commercial/proprietary and commonly used open source projects is key to future, normalized usage and adoption. With such service definitions, a building-block approach can accurately describe complex systems. This movement grows every time a project is defined in OSCAL and made available for use by others.
Robust commercial support - While this project is foundational and released as open source, it requires significant investment in ongoing compliance-related content generation and maintenance. As such, commercial add-ons are desirable in the future to benefit users with turn-key compliance needs.

Recent Improvements

MapOSCAL has undergone significant improvements to enhance usability, accuracy, and maintainability:

v0.5.0 Major Features

Kubernetes Support: New k8s-process command for comprehensive Kubernetes manifest analysis
Compliance-Trestle Integration: Standards-compliant OSCAL generation with enterprise-grade validation
Enhanced OSCAL: All commands now generate 100% OSCAL 1.1.3 compliant component definitions
Dual Analysis Workflow: Separate commands for code analysis (generate) and Kubernetes analysis (k8s-process)

Security Overview Integration

New summarize command: Generates comprehensive security overviews of services
Context-aware control mapping: Uses security overview as reference for better accuracy
Enhanced validation: Incorporates security context into critique and revision processes
Improved explanations: Better control status determination through service understanding

Simplified File Management

Removed service prefixes: All files now use simple, consistent naming
Unique output directories: Each service uses a dedicated output directory for isolation
Cleaner file structure: Simplified file paths and naming conventions
Better organization: Clear separation of analysis, generation, and evaluation outputs

Enhanced CLI Experience

Consistent command interface: All commands now use config files for simplicity
Improved error handling: Better error messages and guidance for users
Streamlined workflow: Logical progression from analysis to evaluation
Better documentation: Comprehensive help text and usage examples

Improved Code Quality

Function-based architecture: Removed unnecessary class instantiations
Better error handling: More robust error handling and recovery
Enhanced logging: Improved logging throughout the codebase
Cleaner imports: Simplified import structure and dependencies

Validation and Quality Assurance

Comprehensive validation: Multi-layer validation with automatic fixes
LLM-assisted resolution: Intelligent fixing of complex validation issues
Quality evaluation: AI-powered assessment of control mapping quality
Detailed reporting: Comprehensive validation and evaluation reports

These improvements make MapOSCAL more user-friendly, accurate, and maintainable while providing better security context for control mapping operations.

Installation

Prerequisites

Python 3.8 or higher
pip (Python package installer)

Setup

OpenAI API Key: This open source configuration currently only supports OpenAI's API functionality for the LLM-based operations. You will need to set the OPENAI_API_KEY environment variable to a valid API key.
Clone the repository:

git clone https://github.com/yourusername/MapOSCAL.git
cd MapOSCAL

Create and activate a virtual environment (recommended):

python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`

Install the package:

pip install -e .

For development, install with additional dependencies:

pip install -e ".[dev]"

Usage

Configuration

Create a configuration file (e.g., config.yaml) with the following structure:

Basic Configuration

# Repository and output settings
repo_path: "/path/to/your/repository"
output_dir: ".oscalgen"

# Catalog and profile paths for OSCAL generation
catalog_path: "path/to/NIST_catalog.json"
profile_path: "path/to/NIST_profile.json"

# Analysis settings
top_k: 5
max_critique_retries: 3

# Configuration file discovery settings
config_extensions: [".yaml", ".yml", ".json", ".toml", ".ini", ".conf", ".properties"]
auto_discover_config: true
config_files: []  # Used when auto_discover_config is false

LLM Configuration (Optional)

You can specify different LLM providers and models for each command:

# LLM Configuration
llm:
  # Global LLM settings (used as defaults)
  provider: "openai"
  model: "gpt-4"
  temperature: 0.4  # Global default temperature
  
  # Command-specific LLM settings (override global settings)
  analyze:
    provider: "openai"
    model: "gpt-4o-mini"  # Fast, cost-effective for analysis
    temperature: 0.1  # Low temperature for consistent analysis
    
  summarize:
    provider: "openai"
    model: "gpt-4"  # High quality for summaries
    temperature: 0.2  # Low temperature for consistent summaries
    
  generate:
    provider: "openai"
    model: "gpt-4"  # High quality for OSCAL generation
    temperature: 0.4  # Moderate temperature for creative but structured generation
    
  evaluate:
    provider: "openai"
    model: "gpt-4"  # High quality for evaluation
    temperature: 0.0  # Very low temperature for deterministic evaluation
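
The override behavior of the `llm` section above can be sketched as a dictionary merge. This is an illustrative assumption about how command-specific settings layer over the global defaults; `resolve_llm_settings` and `COMMANDS` are hypothetical names, and the config is shown as an already-parsed dictionary.

```python
from typing import Any, Dict

# Command sections recognized inside the llm block.
COMMANDS = {"analyze", "summarize", "generate", "evaluate"}


def resolve_llm_settings(llm_config: Dict[str, Any], command: str) -> Dict[str, Any]:
    """Merge global LLM defaults with the overrides for one command."""
    # Every top-level key that is not a command section is a global default.
    settings = {k: v for k, v in llm_config.items() if k not in COMMANDS}
    # Command-specific values win over the defaults.
    settings.update(llm_config.get(command, {}))
    return settings


llm_config = {
    "provider": "openai",
    "model": "gpt-4",
    "temperature": 0.4,
    "analyze": {"model": "gpt-4o-mini", "temperature": 0.1},
}
analyze_settings = resolve_llm_settings(llm_config, "analyze")
evaluate_settings = resolve_llm_settings(llm_config, "evaluate")
```

With this merge, a command without its own section, such as `evaluate` in the sample dictionary, simply inherits the global provider, model, and temperature.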

Temperature Guidelines:

analyze: 0.1-0.3 (low) for consistent analysis and pattern recognition
summarize: 0.1-0.3 (low) for consistent and reliable summaries
generate: 0.3-0.6 (moderate) for creative but structured OSCAL generation
evaluate: 0.0-0.2 (very low) for deterministic and consistent assessment

Supported LLM Providers:

OpenAI: Any OpenAI model (e.g., gpt-4, gpt-4-turbo, gpt-3.5-turbo, gpt-4o, gpt-4o-mini)
Gemini (via OpenAI-compatible API): Any Gemini model (e.g., gemini-2.0-flash, gemini-2.5-flash, gemini-1.5-pro)

Environment Variables Required:

For OpenAI: OPENAI_API_KEY
For Gemini: GEMINI_API_KEY

Optional Base URL Overrides:

OPENAI_BASE_URL, GEMINI_BASE_URL

Setup Environment Variables:

Copy env.example to .env: cp env.example .env
Edit .env and add your API keys for the providers you plan to use
Only set the API keys you need - you don't need all of them

Configuration Options:

title: Name of your service
description: Description of your service
repo_path: Path to the repository to analyze
output_dir: Directory where analysis and generation outputs will be stored
top_k: Number of most relevant code chunks to retrieve for each control
catalog_path: Path to the OSCAL catalog file (e.g., NIST SP 800-53)
profile_path: Path to the OSCAL profile file (e.g., FedRAMP baseline)
max_critique_retries: Maximum number of validation/fix attempts (default: 3)
config_extensions: List of file extensions to treat as configuration files (when auto_discover_config is True)
auto_discover_config: Whether to auto-discover config files by extension or use manual file list (default: True)
config_files: List of specific file paths to treat as configuration files (when auto_discover_config is False)
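
To make the interaction between `config_extensions`, `auto_discover_config`, and `config_files` concrete, here is a minimal sketch of the selection logic. `discover_config_files` is a hypothetical helper, and matching dotfiles such as `.env` by full name is an assumption added so that extension lists containing `.env` behave as a reader would expect.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def discover_config_files(
    repo_path: str,
    config_extensions: Sequence[str],
    auto_discover_config: bool = True,
    config_files: Sequence[str] = (),
) -> List[Path]:
    """Pick configuration files by extension, or use an explicit list."""
    root = Path(repo_path)
    if not auto_discover_config:
        # Manual mode: trust the listed paths, relative to the repository.
        return [root / name for name in config_files]
    extensions = {ext.lower() for ext in config_extensions}
    matches = [
        path
        for path in root.rglob("*")
        if path.is_file()
        # Dotfiles like ".env" have no suffix, so also compare the full name.
        and (path.suffix.lower() in extensions or path.name.lower() in extensions)
    ]
    return sorted(matches)


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as repo:
    for name in ("app.yaml", "main.py", ".env"):
        (Path(repo) / name).write_text("placeholder\n")
    found = [p.name for p in discover_config_files(repo, [".yaml", ".env"])]
    manual = [
        p.name
        for p in discover_config_files(
            repo, [], auto_discover_config=False, config_files=["main.py"]
        )
    ]
```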

Commands

The tool provides six main commands:

Run Complete Workflow

maposcal run-all config.yaml

This command executes the complete MapOSCAL workflow in the proper sequence:

Step 1: Analyze repository and generate initial OSCAL definitions
Step 2: Generate security overview for improved control mapping
Step 3: Create validated OSCAL components with comprehensive validation
Step 4: Evaluate the quality of generated components

The command provides progress updates and continues through the pipeline even if individual steps encounter non-critical issues. This is the recommended way to run MapOSCAL for most use cases.

Analyze Repository

maposcal analyze config.yaml

This command analyzes your repository and generates initial OSCAL component definitions.

Generate Security Overview

maposcal summarize config.yaml

This command generates a comprehensive security overview of the service, including:

Service architecture and technical stack
Authentication and authorization mechanisms
Encryption and data protection measures
Audit logging and monitoring capabilities

The security overview is used as reference context for improved control mapping accuracy.

Generate OSCAL Component

maposcal generate config.yaml

This command generates the final OSCAL component definitions based on the analysis and control mappings. It includes:

Individual control validation with automatic fixes
LLM-assisted resolution of complex issues
Security overview integration for better context
Comprehensive validation reporting
Generation of validation failure logs

Evaluate OSCAL Component Quality

maposcal evaluate config.yaml

This command evaluates the quality of generated OSCAL components using AI-powered assessment:

Scores each control on 4 quality dimensions (0-2 scale)
Provides detailed justifications for scores
Offers improvement recommendations
Generates comprehensive evaluation reports

Extract File Metadata

maposcal metadata path/to/file.json

This command extracts and displays metadata from MapOSCAL output files, showing generation information including model, provider, timing, and configuration used.

Output Files

The tool generates several output files in the specified output_dir:

Analysis Files

meta.json - Code chunk metadata and embeddings
index.faiss - FAISS index for semantic search
summary_meta.json - File-level summary metadata
summary_index.faiss - FAISS index for summary search

Generated Files

implemented_requirements.json - Validated OSCAL component definitions
validation_failures.json - Detailed validation failure information
unvalidated_requirements.json - Requirements that failed validation
security_overview.md - Comprehensive service security overview

Evaluation Files

implemented_requirements_evaluation_results.json - Quality assessment results with scores and recommendations

Validation Features

Local Validation (Fast & Deterministic)

Control Status Validation: Ensures valid control-status values
Configuration Structure: Validates control-configuration format and file extensions
OSCAL Structure: Checks required fields and UUID formats
Cross-Reference Validation: Ensures consistency between status and configuration
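
The local checks can be approximated with a few deterministic rules. MapOSCAL uses Pydantic schemas for this; the sketch below uses only the standard library to stay self-contained, works on a flattened dictionary rather than full OSCAL props, and the specific cross-reference rule is an assumption. The status values are the five listed in the generation section of this document.

```python
import uuid
from typing import Any, Dict, List

ALLOWED_STATUSES = {
    "applicable and inherently satisfied",
    "applicable but only satisfied through configuration",
    "applicable but partially satisfied",
    "applicable and not satisfied",
    "not applicable",
}


def validate_requirement(req: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors: List[str] = []
    # UUID format check.
    try:
        uuid.UUID(str(req.get("uuid", "")))
    except ValueError:
        errors.append("uuid is missing or malformed")
    # Control status must be one of the allowed values.
    status = req.get("control-status")
    if status not in ALLOWED_STATUSES:
        errors.append(f"invalid control-status: {status!r}")
    # Assumed cross-reference rule: a configuration-based status needs
    # at least one configuration entry.
    config = req.get("control-configuration") or []
    if status == "applicable but only satisfied through configuration" and not config:
        errors.append("status requires at least one control-configuration entry")
    return errors
```

Because these checks are deterministic, they can run on every generated control before any LLM-assisted fix is attempted.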

LLM-Assisted Fixes

Automatic Fixes: Simple issues fixed automatically (file extensions, missing fields)
Intelligent Resolution: Complex issues sent to LLM for context-aware fixing
Security Context Integration: Uses security overview for better understanding
Retry Logic: Multiple attempts to resolve validation issues

Quality Evaluation

Status Alignment: Is the control-status correct given the explanation and configuration?
Explanation Quality: Is the control-explanation clear, accurate, and grounded?
Configuration Support: Is the control-configuration specific, correct, and valid?
Overall Consistency: Do all parts reinforce each other without contradiction?
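
Since each of the four dimensions is scored on a 0-2 scale, a control can score between 0 and 8 in total. The sketch below shows one way to validate and sum such scores; the dimension key names and the `total_score` helper are hypothetical, and summing is only one possible aggregation.

```python
from typing import Dict

# Hypothetical keys for the four quality dimensions described above.
DIMENSIONS = (
    "status_alignment",
    "explanation_quality",
    "configuration_support",
    "overall_consistency",
)


def total_score(scores: Dict[str, int]) -> int:
    """Validate per-dimension scores (0-2 each) and return their sum."""
    for name in DIMENSIONS:
        value = scores.get(name)
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")
    return sum(scores[name] for name in DIMENSIONS)


example_total = total_score(
    {
        "status_alignment": 2,
        "explanation_quality": 1,
        "configuration_support": 2,
        "overall_consistency": 1,
    }
)
```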

Example Workflow

Quick Start (Recommended)

For most use cases, you can run the complete workflow with a single command:

Create a configuration file:

title: "my_service"
description: "My security-critical service"
repo_path: "./my_service"
output_dir: ".oscalgen"
top_k: 5
catalog_path: "examples/NIST_SP-800-53_rev5_catalog.json"
profile_path: "examples/NIST_SP-800-53_rev5_HIGH-baseline_profile.json"
max_critique_retries: 3

# Optional: Configure which files to treat as configuration files
config_extensions:
  - ".yaml"
  - ".yml"
  - ".json"
  - ".env"
  - ".cfg"

Run the complete workflow:

maposcal run-all config.yaml

This single command will:

Analyze your repository and generate initial OSCAL definitions
Generate a comprehensive security overview
Create validated OSCAL components with comprehensive validation
Evaluate the quality of generated components
Provide progress updates throughout the process

Step-by-Step Workflow (Advanced)

If you prefer to run each step individually for more control:

Create a configuration file (same as above)
Run the analysis:

maposcal analyze config.yaml

Generate security overview:

maposcal summarize config.yaml

Generate the OSCAL component:

maposcal generate config.yaml

Evaluate the quality of generated components:

maposcal evaluate config.yaml

Project Structure

maposcal/ - Main package directory
- analyzer/ - Code analysis components
  - analyzer.py - Main analysis workflow
  - chunker.py - Code chunking logic
  - parser.py - File parsing utilities
  - rules.py - Security rule application
- generator/ - OSCAL generation components
  - control_mapper.py - Control mapping logic
  - profile_control_extractor.py - Profile and catalog processing
  - validation.py - Comprehensive validation schemas and functions
- llm/ - Language model integration
  - llm_handler.py - LLM API interaction
  - prompt_templates.py - LLM prompt templates for generation and evaluation
- embeddings/ - Code embedding functionality
  - faiss_index.py - FAISS vector index management
  - local_embedder.py - Local embedding generation
  - meta_store.py - Metadata storage and retrieval
- inspectors/ - Language-specific code inspection
  - inspect_lang_python.py - Python code inspection
  - inspect_lang_golang.py - Golang code inspection
- utils/ - Utility functions
  - control_hints.py - Security control hint definitions
  - control_hints_enumerator.py - Dynamic control hint discovery
  - logging_config.py - Logging configuration
  - utilities.py - General utility functions
- cli.py - Command-line interface with analyze, summarize, generate, and evaluate commands
- settings.py - Global configuration settings
tests/ - Test suite
- analyzer/ - Analyzer tests
- embeddings/ - Embedding tests
- generator/ - Generator tests
- integration/ - Integration tests
- llm/ - LLM tests
- utils/ - Utility tests
examples/ - Example configurations and outputs
- NIST_SP-800-53_rev5_catalog.json - NIST SP 800-53 Rev 5 catalog
- NIST_SP-800-53_rev5_HIGH-baseline_profile.json - NIST High baseline profile
- FedRAMP_rev5_HIGH-baseline_profile.json - FedRAMP High baseline profile
- custom_maposcal_profile.json - Custom MapOSCAL profile example
- min_baseline.json - Minimum baseline profile
- test_baseline.json - Test baseline profile
docs/ - Documentation
- diagrams/ - Architecture and workflow diagrams
  - analysis_flow.png - Analysis workflow diagram
  - generation_flow.png - Generation workflow diagram
config/ - Configuration templates

Examples

The examples/ directory contains several OSCAL catalog and profile files for testing and reference:

NIST SP 800-53 Files

NIST_SP-800-53_rev5_catalog.json - Complete NIST SP 800-53 Revision 5 control catalog
NIST_SP-800-53_rev5_HIGH-baseline_profile.json - NIST High baseline profile with control selections

FedRAMP Files

FedRAMP_rev5_HIGH-baseline_profile.json - FedRAMP High baseline profile for cloud services

Custom Profiles

custom_maposcal_profile.json - Example custom profile showing how to create targeted control sets
min_baseline.json - Minimal baseline profile for testing
test_baseline.json - Test baseline profile for development

Usage Examples

To use the NIST High baseline:

catalog_path: "examples/NIST_SP-800-53_rev5_catalog.json"
profile_path: "examples/NIST_SP-800-53_rev5_HIGH-baseline_profile.json"

To use the FedRAMP High baseline:

catalog_path: "examples/NIST_SP-800-53_rev5_catalog.json"
profile_path: "examples/FedRAMP_rev5_HIGH-baseline_profile.json"

How it works

Analysis

MapOSCAL uses a three-pass analysis system to comprehensively understand your codebase and extract security-relevant information:

Pass 1: Vector Embedding of Code/Config/Docs

The first pass processes all repository files and creates semantic vector embeddings:

File Discovery: Recursively scans the repository, excluding binary files, test files, and common non-relevant patterns
Intelligent Chunking: Breaks files into meaningful chunks based on file type:
- Code files (.py, .go, .java, .js, .ts, etc.): Chunked by function and class definitions
- Config files (.yaml, .yml, .json): Chunked by document separators
- Documentation (.md, .rst, .txt): Chunked by headers and sections
Vector Generation: Creates high-dimensional embeddings for each chunk using local embedding models
Index Creation: Builds a FAISS index for efficient similarity search across all code chunks
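
Chunking code by function and class definitions can be sketched for Python sources with the standard `ast` module. This is an illustration under assumptions, not the project's chunker: `chunk_python_source` and the chunk field names are hypothetical, only top-level definitions are handled, and embedding and indexing are left out.

```python
import ast
from typing import Any, Dict, List


def chunk_python_source(source: str, path: str) -> List[Dict[str, Any]]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[Dict[str, Any]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(
                {
                    "source_file": path,
                    "name": node.name,
                    "chunk_type": "class" if isinstance(node, ast.ClassDef) else "function",
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    # Line numbers are 1-based, list slices are 0-based.
                    "content": "\n".join(lines[node.lineno - 1 : node.end_lineno]),
                }
            )
    return chunks


sample = (
    "import os\n"
    "\n"
    "def login(user):\n"
    "    return bool(user)\n"
    "\n"
    "class Vault:\n"
    "    def get(self, key):\n"
    "        return os.environ[key]\n"
)
chunks = chunk_python_source(sample, "app.py")
```

Keeping the source file and line range on every chunk is what later allows evidence to be traced back to a specific location.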

Why FAISS? MapOSCAL uses FAISS (Facebook AI Similarity Search) for vector storage and similarity search due to its simple setup requirements - no external database dependencies or complex infrastructure needed. FAISS provides excellent performance for similarity search operations, supports both CPU and GPU acceleration, and stores indices as simple files that can be easily versioned and shared. While alternatives like pgvector, Chroma, or Pinecone offer additional features, FAISS's minimal deployment footprint and high performance make it ideal for local analysis workflows where simplicity and speed are paramount.

Pass 2: Semantic Security Summaries

The second pass generates intelligent summaries of each file using LLM analysis:

File-Level Processing: Each relevant file is processed individually
LLM Summarization: Uses specialized prompts to generate security-focused summaries that capture:
- Authentication mechanisms
- Data handling patterns
- Security controls implemented
- Configuration management
Summary Embedding: Creates vector embeddings for each file summary
Summary Index: Builds a separate FAISS index for file-level similarity search

Pass 3: Rule-Based Feature Extraction

The third pass applies deterministic security rules to extract specific security features:

Pattern Recognition: Scans code chunks for security-relevant patterns:
- TLS/HTTPS usage (uses_tls flag)
- Hardcoded secrets detection (hardcoded_secret flag)
- Authentication checks (auth_check flag)
Control Mapping: Maps detected patterns to relevant security controls:
- TLS usage → SC-8 (Transmission Confidentiality and Integrity)
- Authentication → AC-6 (Least Privilege)
Metadata Enhancement: Enriches chunk metadata with security flags and control hints
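
The rule-based pass can be illustrated with a few regular expressions that set the flags named above and attach control hints. The patterns here are deliberately simple placeholders chosen for the example, not the rules shipped in `rules.py`; `RULES`, `CONTROL_HINTS`, and `apply_rules` are hypothetical names. The two control mappings are the ones listed above.

```python
import re
from typing import Any, Dict

# Illustrative patterns only; real rules would be broader and more careful.
RULES = {
    "uses_tls": re.compile(r"https://|ssl\.|\btls\b", re.IGNORECASE),
    "hardcoded_secret": re.compile(
        r"""(password|secret|api_key)\s*=\s*["'][^"']+["']""", re.IGNORECASE
    ),
    "auth_check": re.compile(r"is_authenticated|login_required|verify_token", re.IGNORECASE),
}

# Flag-to-control hints taken from the mappings listed above.
CONTROL_HINTS = {"uses_tls": "SC-8", "auth_check": "AC-6"}


def apply_rules(chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the chunk enriched with security flags and hints."""
    flags = {name: bool(pattern.search(chunk["content"])) for name, pattern in RULES.items()}
    hints = sorted(CONTROL_HINTS[name] for name, hit in flags.items() if hit and name in CONTROL_HINTS)
    return {**chunk, **flags, "control_hints": hints}


tls_chunk = apply_rules({"content": 'requests.get("https://api.example.com", verify=True)'})
secret_chunk = apply_rules({"content": 'password = "hunter2"'})
```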

This three-pass system ensures comprehensive coverage:

Pass 1 provides semantic understanding and similarity search capabilities
Pass 2 adds human-like comprehension of security contexts
Pass 3 ensures deterministic detection of specific security patterns

The combined outputs enable the generation system to create accurate, contextually relevant OSCAL implemented requirements that reflect the actual security posture of your service and can be included in a broader component definition.

Diagram

Figure 1: Overview of MapOSCAL's analysis workflow

Generation

MapOSCAL's generation process transforms the analysis outputs into structured OSCAL implemented requirements through a sophisticated multi-step workflow:

Step 1: Control Parameter Extraction

The generation begins by extracting control information from OSCAL catalogs and profiles:

Catalog Processing: Parses NIST SP 800-53 or other security control catalogs to extract control definitions
Profile Tailoring: Applies profile-specific modifications, parameter substitutions, and control selections
Parameter Resolution: Resolves control parameters using profile-specific values or catalog defaults
Statement Extraction: Extracts control statements and requirements from the catalog structure
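
Parameter resolution can be pictured as placeholder substitution in control prose, preferring profile values over catalog defaults. The sketch assumes prose that uses OSCAL's `{{ insert: param, <id> }}` placeholder form; `resolve_parameters`, the parameter IDs, and the sample values are hypothetical.

```python
import re
from typing import Dict

# Matches placeholders such as "{{ insert: param, ac-2_prm_1 }}".
PLACEHOLDER = re.compile(r"\{\{\s*insert:\s*param,\s*([^\s}]+)\s*\}\}")


def resolve_parameters(
    prose: str,
    profile_values: Dict[str, str],
    catalog_defaults: Dict[str, str],
) -> str:
    """Replace parameter placeholders, preferring profile-specific values."""

    def substitute(match: "re.Match[str]") -> str:
        param_id = match.group(1)
        if param_id in profile_values:
            return profile_values[param_id]
        if param_id in catalog_defaults:
            return catalog_defaults[param_id]
        # Leave unknown placeholders visible rather than guessing a value.
        return match.group(0)

    return PLACEHOLDER.sub(substitute, prose)


resolved = resolve_parameters(
    "Review accounts {{ insert: param, ac-2_prm_1 }} and notify {{ insert: param, ac-2_prm_2 }}.",
    profile_values={"ac-2_prm_1": "monthly"},
    catalog_defaults={
        "ac-2_prm_1": "[Assignment: frequency]",
        "ac-2_prm_2": "[Assignment: personnel]",
    },
)
```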

Step 2: Semantic Evidence Retrieval

For each control, MapOSCAL retrieves relevant evidence from the analysis outputs:

Dual-Index Querying: Queries both chunk-level and summary-level FAISS indices using the control description as the search query
Relevance Ranking: Retrieves top-k most semantically similar chunks and file summaries
Evidence Combination: Combines and deduplicates evidence from both code chunks and file summaries
Context Preservation: Maintains source file information, line numbers, and chunk types for traceability
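
The retrieval step can be sketched without FAISS by ranking vectors with plain cosine similarity. This is a stand-in for the real index lookups, using tiny hand-written vectors; `cosine`, `top_k`, `retrieve_evidence`, and the item fields are hypothetical, and deduplicating on source file plus text is an assumption.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: List[Dict[str, Any]], k: int) -> List[Dict[str, Any]]:
    """Return the k items most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query, item["vector"]), reverse=True)
    return ranked[:k]


def retrieve_evidence(
    query: Sequence[float],
    chunk_index: List[Dict[str, Any]],
    summary_index: List[Dict[str, Any]],
    k: int = 5,
) -> List[Dict[str, Any]]:
    """Query both indices, then combine and deduplicate the results."""
    seen = set()
    evidence: List[Dict[str, Any]] = []
    for item in top_k(query, chunk_index, k) + top_k(query, summary_index, k):
        key = (item["source_file"], item["text"])
        if key not in seen:
            seen.add(key)
            evidence.append(item)
    return evidence


chunk_index = [
    {"source_file": "auth.py", "text": "def login()", "vector": [1.0, 0.0]},
    {"source_file": "util.py", "text": "def fmt()", "vector": [0.0, 1.0]},
]
summary_index = [
    {"source_file": "auth.py", "text": "Handles login", "vector": [0.9, 0.1]},
    {"source_file": "auth.py", "text": "def login()", "vector": [1.0, 0.0]},
]
evidence = retrieve_evidence([1.0, 0.0], chunk_index, summary_index, k=2)
```

In this toy data the duplicated `auth.py` entry appears in both indices and is returned only once.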

Step 3: LLM-Based Control Mapping

Each control is individually processed by the LLM to generate OSCAL implemented requirements:

Structured Prompting: Uses specialized prompts that include:
- Control ID, title, and detailed description
- Resolved parameter values and additional requirements
- Top-k relevant evidence chunks with source information
- Pre-generated UUIDs for consistency
Status Determination: LLM determines the appropriate control status from five options:
- "applicable and inherently satisfied"
- "applicable but only satisfied through configuration"
- "applicable but partially satisfied"
- "applicable and not satisfied"
- "not applicable"
Configuration Mapping: When applicable, maps specific configuration files, keys, and line numbers
Explanation Generation: Creates detailed explanations of how the control is implemented or why it's not applicable

Step 4: Individual Validation and Revision

Each generated control undergoes rigorous validation and iterative improvement:

Local Validation: Performs deterministic validation using Pydantic schemas:
- OSCAL structure compliance
- Control status validation against allowed values
- Configuration structure and file extension validation
- UUID format validation
- Cross-reference consistency checks
LLM Critique-Revise Loop: For validation failures, uses LLM to:
- Critique the specific issues without rewriting
- Revise only the flagged problems while preserving valid content
- Retry up to 3 times with error feedback
Final Validation: Performs comprehensive validation including duplicate UUID detection
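
The validate, critique, and revise cycle described above can be sketched as a bounded retry loop. The `critique_revise` function is hypothetical, and the `validate` and `revise` callables below are simple stubs standing in for the local validator and the LLM call; the default of three retries matches `max_critique_retries`.

```python
from typing import Any, Callable, Dict, List, Tuple

Requirement = Dict[str, Any]


def critique_revise(
    requirement: Requirement,
    validate: Callable[[Requirement], List[str]],
    revise: Callable[[Requirement, List[str]], Requirement],
    max_retries: int = 3,
) -> Tuple[Requirement, List[str]]:
    """Revise a requirement until it validates or retries run out."""
    errors = validate(requirement)
    attempts = 0
    while errors and attempts < max_retries:
        # Pass the current errors along so the revision targets only them.
        requirement = revise(requirement, errors)
        errors = validate(requirement)
        attempts += 1
    return requirement, errors


# Stub validator and reviser for demonstration purposes.
def validate(req: Requirement) -> List[str]:
    return [] if req.get("control-status") == "not applicable" else ["invalid control-status"]


calls: List[List[str]] = []


def revise(req: Requirement, errors: List[str]) -> Requirement:
    calls.append(list(errors))
    return {**req, "control-status": "not applicable"}


fixed, remaining = critique_revise({"control-status": "n/a"}, validate, revise)
```

A requirement that still has errors after the final attempt is returned together with those errors, which corresponds to the unvalidated requirements reported in the output files.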

Step 5: Output Generation and Documentation

The final step produces structured outputs with comprehensive documentation:

Implemented Requirements: Generates valid OSCAL JSON with all required properties:
- control-status: Current implementation status
- control-name: Human-readable control name
- control-description: Original control description
- control-explanation: Detailed implementation explanation
- control-configuration: Specific configuration references (when applicable)
- annotations: Source code references and metadata
- statements: Detailed implementation statements
Validation Reports: Creates detailed JSON files documenting:
- Validation failures with timestamps and specific issues
- Unvalidated requirements that couldn't be resolved
- Final validation results with violation details
Quality Evaluation: Optionally evaluates each control for:
- Status alignment accuracy
- Explanation quality and clarity
- Configuration support validity
- Overall consistency across all elements

This generation process ensures that each implemented requirement is:

Accurate: Based on actual code analysis and semantic understanding
Compliant: Follows OSCAL schema requirements exactly
Traceable: Links back to specific source files and configurations
Validated: Undergoes multiple validation layers before final output
Documented: Includes detailed explanations and evidence references

The result is a comprehensive set of OSCAL implemented requirements that accurately reflects your service's security posture and can be integrated into broader component definitions for compliance reporting.

Diagram

Figure 2: Overview of MapOSCAL's generation workflow

Development

Running Tests

pytest

Code Style

The project uses:

Black for code formatting
Ruff for linting
MyPy for type checking

Run the formatter, linter, and type checker:

black .
ruff check .
mypy .

GitHub Actions

This project includes comprehensive GitHub Actions workflows for continuous integration and release management.

Workflows

CI Workflow (`.github/workflows/ci.yml`)

Runs on every push to main/develop branches and pull requests:

Unit Tests: Runs pytest with coverage on Python 3.13
Code Quality: Checks code formatting (Black), linting (Ruff), and type checking (MyPy)
Security Checks: Runs Bandit security linter and Safety vulnerability scanner
Package Build: Validates that the package can be built and distributed correctly

Release Workflow (`.github/workflows/release.yml`)

Triggers when a release is created or published:

All CI checks: Runs the same validation as the CI workflow (Python 3.13 testing)
Security Analysis: Comprehensive security scanning with detailed reports
Package Publishing: Automatically publishes to PyPI when a version tag is pushed
Release Summary: Generates a comprehensive release validation report

Setup Requirements

Required Secrets

To enable PyPI publishing, add the following secret to your GitHub repository:

Go to your repository Settings → Secrets and variables → Actions
Add a new repository secret:
- Name: PYPI_API_TOKEN
- Value: Your PyPI API token (get one from PyPI account settings)

Optional Integrations

Codecov: For code coverage reporting (automatically configured in the CI workflow)
Dependabot: For automated dependency updates (recommended)

Creating Releases

Create a version tag:
```
git tag v1.0.0
git push origin v1.0.0
```
Create a GitHub release:
- Go to your repository → Releases → "Create a new release"
- Select the version tag
- Add release notes
- Publish the release
Automatic publishing: The workflow will automatically:
- Run all validation checks
- Build the package
- Publish to PyPI (if all checks pass)
- Generate a release summary

Workflow Features

Python 3.13 Testing: Runs the test suite on Python 3.13
Parallel Execution: Jobs run in parallel for faster feedback
Artifact Storage: Security reports and build artifacts are preserved
Conditional Publishing: Only publishes to PyPI for version tags (v*)
Comprehensive Reporting: Detailed validation results and release summaries
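
The conditional publishing rule can be pictured as a simple match of the pushed Git ref against the `v*` tag pattern. The sketch below approximates that logic in Python with `fnmatchcase`; it is not the workflow's actual implementation, `should_publish` is a hypothetical name, and GitHub's own filter patterns differ slightly (for example, `*` there does not match `/`).

```python
from fnmatch import fnmatchcase

TAG_PREFIX = "refs/tags/"


def should_publish(git_ref: str, pattern: str = "v*") -> bool:
    """Return True when git_ref is a tag whose name matches the pattern."""
    if not git_ref.startswith(TAG_PREFIX):
        return False  # branches and pull-request refs never publish
    tag_name = git_ref[len(TAG_PREFIX):]
    return fnmatchcase(tag_name, pattern)


print(should_publish("refs/tags/v1.0.0"))     # True
print(should_publish("refs/heads/main"))      # False
print(should_publish("refs/tags/release-1"))  # False
```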

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests and ensure code style compliance
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

ChrisRimondi/MapOSCAL
