ML Project Generator

A comprehensive shell script that generates a complete, production-ready machine learning project structure, including data pipelines, model training and evaluation, and multiple deployment options.

Features

📁 Complete ML Project Structure: Creates a well-organized directory structure following ML engineering best practices

🔧 Multiple Deployment Options:

  • FastAPI REST API
  • Streamlit UI
  • CLI interface with Typer

📊 Full ML Pipeline Support:

  • Data ingestion and preprocessing
  • Model training (scikit-learn, PyTorch, LLMs)
  • Model evaluation and metrics
  • Inference pipeline

🚀 Development Tools:

  • Docker configuration
  • GitHub Actions CI/CD
  • Makefile for common tasks
  • Comprehensive testing structure

📝 Documentation & Configuration:

  • Pre-configured logging
  • YAML-based configuration
  • Environment variable management via a .env file (see the sketch after this list)
  • Detailed code templates
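
In the generated templates, environment handling typically amounts to loading a .env file at startup. A minimal sketch using python-dotenv; the variable names here are hypothetical, not part of the generated code:

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

API_KEY = os.getenv("API_KEY", "")          # hypothetical secret, kept out of version control
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")  # hypothetical setting with a sane default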

Quick Start

  1. Clone the repository:

     git clone <repository-url>
     cd ML_Project_generator

  2. Make the script executable:

     chmod +x generate.sh

  3. Run the generator:

     ./generate.sh

  4. Follow the prompts to enter your project name (or accept the default).

Generated Project Structure

The script creates a comprehensive project structure:

my_ml_project/
├── .github/workflows/          # GitHub Actions CI/CD
├── application/                # Deployment interfaces
│   ├── api/                   #   FastAPI REST API
│   ├── cli/                   #   Command-line interface
│   └── ui/                    #   Streamlit web interface
├── config/                    # Configuration files
├── data/                      # Data storage
│   ├── raw/                   #   Raw data
│   ├── processed/             #   Processed data
│   └── external/              #   External data sources
├── notebooks/                 # Jupyter notebooks for EDA
├── outputs/                   # Model outputs and reports
│   ├── saved_models/          #   Trained models
│   ├── reports/               #   Analysis reports
│   └── visualizations/        #   Plots and charts
├── scripts/                   # Execution scripts
├── src/                       # Source code
│   ├── config/                #   Configuration management
│   ├── data_ingestion/        #   Data loading and preprocessing
│   ├── models/                #   Model training and evaluation
│   ├── pipelines/             #   ML pipelines
│   ├── inference/             #   Prediction and inference
│   └── utils/                 #   Utility functions
├── tests/                     # Unit and integration tests
├── logs/                      # Application logs
├── .gitignore                 # Git ignore rules
├── .env                       # Environment variables
├── README.md                  # Project documentation
├── requirements.txt           # Python dependencies
├── setup.py                   # Package configuration
├── Dockerfile                 # Container configuration
└── Makefile                   # Common tasks automation

Key Components

Data Pipeline

  • Data Ingestion: Download from URLs or ingest local data (see the sketch after this list)
  • Preprocessing: Data cleaning, feature engineering, train/test splits
  • Configuration-driven: Paths and parameters in YAML config
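
As an illustration, the ingestion step can be as small as a pandas call that accepts either a local path or a URL. This is a sketch, not the generator's exact template; the function name and example URL are assumptions:

from pathlib import Path

import pandas as pd

def ingest(source: str, destination: Path) -> pd.DataFrame:
    """Read a CSV from a local path or URL and cache it under data/raw/."""
    df = pd.read_csv(source)  # pandas accepts http(s):// URLs as well as local paths
    destination.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(destination, index=False)
    return df

df = ingest("https://example.com/dataset.csv", Path("data/raw/dataset.csv"))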

Model Training

  • Multiple ML Libraries: Support for scikit-learn, PyTorch, and Hugging Face Transformers (LLMs)
  • Model Selection: Baseline model comparison utilities
  • Metrics: Comprehensive evaluation metrics for classification/regression
  • Model Persistence: Save and load trained models (sketched below)
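
A minimal persistence sketch with joblib, reusing the hyperparameters from the sample config in the Configuration section; the file name is illustrative and the actual template code may differ:

import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
# model.fit(X_train, y_train) would go here
# outputs/saved_models/ already exists in the generated tree
joblib.dump(model, "outputs/saved_models/model.joblib")   # save
model = joblib.load("outputs/saved_models/model.joblib")  # load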

Inference & Deployment

  • REST API: Production-ready FastAPI service with Pydantic schemas (see the sketch after this list)
  • Streamlit UI: Interactive web interface for demonstrations
  • CLI Tool: Command-line interface for batch processing
  • Docker: Containerized deployment
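
To show the shape of such a service, here is a hypothetical version of application/api/main.py using the feature1/feature2 fields from the curl example later in this README; the generated template may differ:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    feature1: float
    feature2: float

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # Replace with a call into src/inference/; this stub just echoes a sum.
    return PredictResponse(prediction=request.feature1 + request.feature2)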

Development Tools

  • Testing: pytest-based test structure
  • CI/CD: GitHub Actions workflow
  • Logging: Configurable logging with multiple handlers (see the sketch after this list)
  • Code Quality: Linting and formatting setup
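
For reference, a console-plus-file setup along those lines can be expressed with logging.config.dictConfig; treat the format string and file name as assumptions rather than the generated config:

import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "formatters": {"default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "default"},
        # assumes the generator's logs/ directory exists
        "file": {"class": "logging.FileHandler", "formatter": "default", "filename": "logs/app.log"},
    },
    "root": {"level": "INFO", "handlers": ["console", "file"]},
})
logging.getLogger(__name__).info("logging configured")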

Usage Examples

Running the Full Pipeline

cd your_project_name
python scripts/run_pipeline.py

Training a Model

python scripts/run_train.py

Making Predictions

# Via CLI
python scripts/run_inference.py --input_json '{"feature1": 1.0, "feature2": 2.0}'

# Via API (after starting server)
uvicorn application.api.main:app --reload
curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"feature1": 1.0, "feature2": 2.0}'

# Via Streamlit UI
streamlit run application/ui/app.py

Using the Makefile

make install    # Install dependencies
make test       # Run tests
make train      # Train model
make run_api    # Start API server
make clean      # Clean cache files

Configuration

The generated project uses YAML configuration files:

# src/config/config.yaml
paths:
  raw_data: "data/raw/dataset.csv"
  processed_data: "data/processed/processed_data.parquet"
  model_output: "outputs/saved_models/"

model_params:
  random_state: 42
  test_size: 0.2

hyperparameters:
  sklearn_example:
    n_estimators: 100
    max_depth: 5
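
The templates can then consume this file with PyYAML; a minimal sketch, with illustrative variable names:

import yaml  # PyYAML

with open("src/config/config.yaml") as fh:
    config = yaml.safe_load(fh)

raw_path = config["paths"]["raw_data"]           # "data/raw/dataset.csv"
test_size = config["model_params"]["test_size"]  # 0.2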

Customization

After generation, customize the project:

  1. Update requirements.txt with your specific dependencies
  2. Modify src/config/config.yaml with your project parameters
  3. Implement your data loading logic in src/data_ingestion/
  4. Define your model architecture in src/models/
  5. Update API schemas in application/api/schemas.py

Docker Deployment

The project includes Docker configuration:

# Build the image
docker build -t your_project_name .

# Run the container
docker run -p 8000:8000 your_project_name

Testing

Run the test suite:

# All tests
pytest

# Specific test file
pytest tests/test_models.py

# With coverage
pytest --cov=src tests/
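
If you are adding tests, a self-contained example in the spirit of tests/test_models.py might look like this; the test body is hypothetical and should be adapted to your model code:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def test_model_trains_and_predicts():
    # Tiny synthetic dataset keeps the test fast and deterministic.
    X, y = make_classification(n_samples=50, n_features=4, random_state=42)
    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X, y)
    assert model.predict(X).shape == (50,)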

Requirements

  • Python 3.8+
  • Bash shell (for running the generator script)
  • Git (optional, for version control)

Supported ML Frameworks

The generated templates support:

  • scikit-learn: Traditional ML algorithms
  • PyTorch: Deep learning models
  • Hugging Face Transformers: LLMs and NLP models
  • Custom models: Extensible architecture

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is open source. Please check the LICENSE file for details.

Support

If you encounter any issues or have questions:

  1. Check the generated project's README for project-specific guidance
  2. Review the template files for implementation examples
  3. Open an issue on GitHub

Generated with the ML Project Generator, a tool for creating production-ready ML project structures.
