ML Project Generator

A comprehensive shell script that generates a complete, production-ready machine learning project structure, including data pipelines, model training and evaluation, and multiple deployment options.

Features

📁 Complete ML Project Structure: Creates a well-organized directory structure following ML engineering best practices

🔧 Multiple Deployment Options:

  • FastAPI REST API
  • Streamlit UI
  • CLI interface with Typer

📊 Full ML Pipeline Support:

  • Data ingestion and preprocessing
  • Model training (scikit-learn, PyTorch, LLMs)
  • Model evaluation and metrics
  • Inference pipeline

🚀 Development Tools:

  • Docker configuration
  • GitHub Actions CI/CD
  • Makefile for common tasks
  • Comprehensive testing structure

📝 Documentation & Configuration:

  • Pre-configured logging
  • YAML-based configuration
  • Environment variable management via a .env file (see the sketch after this list)
  • Detailed code templates
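
In the generated templates, environment handling typically amounts to loading a .env file at startup. A minimal sketch using python-dotenv; the variable names here are hypothetical, not part of the generated code:

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

API_KEY = os.getenv("API_KEY", "")          # hypothetical secret, kept out of version control
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")  # hypothetical setting with a sane default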

Quick Start

  1. Clone the repository:

     git clone <repository-url>
     cd ML_Project_generator

  2. Make the script executable:

     chmod +x generate.sh

  3. Run the generator:

     ./generate.sh

  4. Follow the prompts to enter your project name (or accept the default).

Generated Project Structure

The script creates a comprehensive project structure:

my_ml_project/
├── .github/workflows/          # GitHub Actions CI/CD
├── application/                # Deployment interfaces
│   ├── api/                   #   FastAPI REST API
│   ├── cli/                   #   Command-line interface
│   └── ui/                    #   Streamlit web interface
├── config/                    # Configuration files
├── data/                      # Data storage
│   ├── raw/                   #   Raw data
│   ├── processed/             #   Processed data
│   └── external/              #   External data sources
├── notebooks/                 # Jupyter notebooks for EDA
├── outputs/                   # Model outputs and reports
│   ├── saved_models/          #   Trained models
│   ├── reports/               #   Analysis reports
│   └── visualizations/        #   Plots and charts
├── scripts/                   # Execution scripts
├── src/                       # Source code
│   ├── config/                #   Configuration management
│   ├── data_ingestion/        #   Data loading and preprocessing
│   ├── models/                #   Model training and evaluation
│   ├── pipelines/             #   ML pipelines
│   ├── inference/             #   Prediction and inference
│   └── utils/                 #   Utility functions
├── tests/                     # Unit and integration tests
├── logs/                      # Application logs
├── .gitignore                 # Git ignore rules
├── .env                       # Environment variables
├── README.md                  # Project documentation
├── requirements.txt           # Python dependencies
├── setup.py                   # Package configuration
├── Dockerfile                 # Container configuration
└── Makefile                   # Common tasks automation

Key Components

Data Pipeline

  • Data Ingestion: Download from URLs or ingest local data (see the sketch after this list)
  • Preprocessing: Data cleaning, feature engineering, train/test splits
  • Configuration-driven: Paths and parameters in YAML config
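
As an illustration, the ingestion step can be as small as a pandas call that accepts either a local path or a URL. This is a sketch, not the generator's exact template; the function name and example URL are assumptions:

from pathlib import Path

import pandas as pd

def ingest(source: str, destination: Path) -> pd.DataFrame:
    """Read a CSV from a local path or URL and cache it under data/raw/."""
    df = pd.read_csv(source)  # pandas accepts http(s):// URLs as well as local paths
    destination.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(destination, index=False)
    return df

df = ingest("https://example.com/dataset.csv", Path("data/raw/dataset.csv"))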

Model Training

  • Multiple ML Libraries: Support for scikit-learn, PyTorch, and Hugging Face Transformers (LLMs)
  • Model Selection: Baseline model comparison utilities
  • Metrics: Comprehensive evaluation metrics for classification/regression
  • Model Persistence: Save and load trained models (sketched below)
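
A minimal persistence sketch with joblib, reusing the hyperparameters from the sample config in the Configuration section; the file name is illustrative and the actual template code may differ:

import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
# model.fit(X_train, y_train) would go here
# outputs/saved_models/ already exists in the generated tree
joblib.dump(model, "outputs/saved_models/model.joblib")   # save
model = joblib.load("outputs/saved_models/model.joblib")  # load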

Inference & Deployment

  • REST API: Production-ready FastAPI service with Pydantic schemas (see the sketch after this list)
  • Streamlit UI: Interactive web interface for demonstrations
  • CLI Tool: Command-line interface for batch processing
  • Docker: Containerized deployment
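
To show the shape of such a service, here is a hypothetical version of application/api/main.py using the feature1/feature2 fields from the curl example later in this README; the generated template may differ:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    feature1: float
    feature2: float

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # Replace with a call into src/inference/; this stub just echoes a sum.
    return PredictResponse(prediction=request.feature1 + request.feature2)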

Development Tools

  • Testing: pytest-based test structure
  • CI/CD: GitHub Actions workflow
  • Logging: Configurable logging with multiple handlers (see the sketch after this list)
  • Code Quality: Linting and formatting setup
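
For reference, a console-plus-file setup along those lines can be expressed with logging.config.dictConfig; treat the format string and file name as assumptions rather than the generated config:

import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "formatters": {"default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "default"},
        # assumes the generator's logs/ directory exists
        "file": {"class": "logging.FileHandler", "formatter": "default", "filename": "logs/app.log"},
    },
    "root": {"level": "INFO", "handlers": ["console", "file"]},
})
logging.getLogger(__name__).info("logging configured")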

Usage Examples

Running the Full Pipeline

cd your_project_name
python scripts/run_pipeline.py

Training a Model

python scripts/run_train.py

Making Predictions

# Via CLI
python scripts/run_inference.py --input_json '{"feature1": 1.0, "feature2": 2.0}'

# Via API (after starting server)
uvicorn application.api.main:app --reload
curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"feature1": 1.0, "feature2": 2.0}'

# Via Streamlit UI
streamlit run application/ui/app.py

Using the Makefile

make install    # Install dependencies
make test       # Run tests
make train      # Train model
make run_api    # Start API server
make clean      # Clean cache files

Configuration

The generated project uses YAML configuration files:

# src/config/config.yaml
paths:
  raw_data: "data/raw/dataset.csv"
  processed_data: "data/processed/processed_data.parquet"
  model_output: "outputs/saved_models/"

model_params:
  random_state: 42
  test_size: 0.2

hyperparameters:
  sklearn_example:
    n_estimators: 100
    max_depth: 5
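
The templates can then consume this file with PyYAML; a minimal sketch, with illustrative variable names:

import yaml  # PyYAML

with open("src/config/config.yaml") as fh:
    config = yaml.safe_load(fh)

raw_path = config["paths"]["raw_data"]           # "data/raw/dataset.csv"
test_size = config["model_params"]["test_size"]  # 0.2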

Customization

After generation, customize the project:

  1. Update requirements.txt with your specific dependencies
  2. Modify src/config/config.yaml with your project parameters
  3. Implement your data loading logic in src/data_ingestion/
  4. Define your model architecture in src/models/
  5. Update API schemas in application/api/schemas.py

Docker Deployment

The project includes Docker configuration:

# Build the image
docker build -t your_project_name .

# Run the container
docker run -p 8000:8000 your_project_name

Testing

Run the test suite:

# All tests
pytest

# Specific test file
pytest tests/test_models.py

# With coverage
pytest --cov=src tests/
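
If you are adding tests, a self-contained example in the spirit of tests/test_models.py might look like this; the test body is hypothetical and should be adapted to your model code:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def test_model_trains_and_predicts():
    # Tiny synthetic dataset keeps the test fast and deterministic.
    X, y = make_classification(n_samples=50, n_features=4, random_state=42)
    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X, y)
    assert model.predict(X).shape == (50,)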

Requirements

  • Python 3.8+
  • Bash shell (for running the generator script)
  • Git (optional, for version control)

Supported ML Frameworks

The generated templates support:

  • scikit-learn: Traditional ML algorithms
  • PyTorch: Deep learning models
  • Hugging Face Transformers: LLMs and NLP models
  • Custom models: Extensible architecture

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is open source. Please check the LICENSE file for details.

Support

If you encounter any issues or have questions:

  1. Check the generated project's README for project-specific guidance
  2. Review the template files for implementation examples
  3. Open an issue on GitHub

Generated with the ML Project Generator, a tool for creating production-ready ML project structures.
