A comprehensive shell script that generates a complete, production-ready machine learning project structure with best practices, including data pipelines, model training, deployment options, and more.
✨ Complete ML Project Structure: Creates a well-organized directory structure following ML engineering best practices
🔧 Multiple Deployment Options:
- FastAPI REST API
- Streamlit UI
- CLI interface with Typer
📊 Full ML Pipeline Support:
- Data ingestion and preprocessing
- Model training (scikit-learn, PyTorch, LLMs)
- Model evaluation and metrics
- Inference pipeline
🚀 Development Tools:
- Docker configuration
- GitHub Actions CI/CD
- Makefile for common tasks
- Comprehensive testing structure
📝 Documentation & Configuration:
- Pre-configured logging
- YAML-based configuration
- Environment variable management
- Detailed code templates
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd ML_Project_generator
  ```

- Make the script executable:

  ```bash
  chmod +x generate.sh
  ```

- Run the generator:

  ```bash
  ./generate.sh
  ```

- Follow the prompts to enter your project name (or use the default).
The script creates a comprehensive project structure:
```
my_ml_project/
├── .github/workflows/       # GitHub Actions CI/CD
├── application/              # Deployment interfaces
│   ├── api/                  # FastAPI REST API
│   ├── cli/                  # Command-line interface
│   └── ui/                   # Streamlit web interface
├── config/                   # Configuration files
├── data/                     # Data storage
│   ├── raw/                  # Raw data
│   ├── processed/            # Processed data
│   └── external/             # External data sources
├── notebooks/                # Jupyter notebooks for EDA
├── outputs/                  # Model outputs and reports
│   ├── saved_models/         # Trained models
│   ├── reports/              # Analysis reports
│   └── visualizations/       # Plots and charts
├── scripts/                  # Execution scripts
├── src/                      # Source code
│   ├── config/               # Configuration management
│   ├── data_ingestion/       # Data loading and preprocessing
│   ├── models/               # Model training and evaluation
│   ├── pipelines/            # ML pipelines
│   ├── inference/            # Prediction and inference
│   └── utils/                # Utility functions
├── tests/                    # Unit and integration tests
├── logs/                     # Application logs
├── .gitignore                # Git ignore rules
├── .env                      # Environment variables
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
├── setup.py                  # Package configuration
├── Dockerfile                # Container configuration
└── Makefile                  # Common tasks automation
```
Key capabilities of the generated project:

- Data Ingestion: Download from URLs or ingest local data
- Preprocessing: Data cleaning, feature engineering, train/test splits
- Configuration-driven: Paths and parameters in YAML config
- Multiple ML Libraries: Support for scikit-learn, PyTorch, and LLMs
- Model Selection: Baseline model comparison utilities
- Metrics: Comprehensive evaluation metrics for classification/regression
- Model Persistence: Save and load trained models (see the sketch below)
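To make the config-driven flow concrete, here is a minimal sketch of what a training step could look like; the paths and keys mirror the example `config.yaml` shown later in this README, while the `target` column name is a placeholder, not part of the generated template:

```python
# Hypothetical config-driven training step; "target" is a placeholder label
# column, and paths/keys mirror the example config.yaml in this README.
import yaml
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

with open("src/config/config.yaml") as f:
    cfg = yaml.safe_load(f)

# Load the processed dataset from the path declared in the config.
df = pd.read_parquet(cfg["paths"]["processed_data"])
X, y = df.drop(columns=["target"]), df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=cfg["model_params"]["test_size"],
    random_state=cfg["model_params"]["random_state"],
)

# Hyperparameters come straight from the YAML, so experiments are reproducible.
model = RandomForestClassifier(**cfg["hyperparameters"]["sklearn_example"])
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Persist the trained model to the configured output directory.
joblib.dump(model, cfg["paths"]["model_output"] + "model.joblib")
```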
- FastAPI: Production-ready REST API with Pydantic schemas (sketched after this list)
- Streamlit UI: Interactive web interface for demonstrations
- CLI Tool: Command-line interface for batch processing
- Docker: Containerized deployment
- Testing: pytest-based test structure
- CI/CD: GitHub Actions workflow
- Logging: Configurable logging with multiple handlers
- Code Quality: Linting and formatting setup
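As an illustration of the API layer, a Pydantic schema plus predict endpoint might look like the sketch below; the field names match the curl example later in this README, but the model path and response shape are placeholders rather than the exact generated code:

```python
# Illustrative sketch of application/api/main.py; the model path and
# feature fields are placeholders, not the exact generated template.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="ML Project API")
model = joblib.load("outputs/saved_models/model.joblib")  # placeholder path

class PredictRequest(BaseModel):
    feature1: float
    feature2: float

class PredictResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # Feature order must match the columns the model was trained on.
    y = model.predict([[req.feature1, req.feature2]])
    return PredictResponse(prediction=float(y[0]))
```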
Once generated, the project is ready to run:

```bash
cd your_project_name

# Run the full pipeline
python scripts/run_pipeline.py

# Train the model
python scripts/run_train.py

# Via CLI
python scripts/run_inference.py --input_json '{"feature1": 1.0, "feature2": 2.0}'
```
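The CLI is built with Typer, as noted in the features above. A minimal sketch of what `scripts/run_inference.py` could look like, with the model path and feature handling as placeholders:

```python
# Hypothetical sketch of scripts/run_inference.py using Typer; the model
# path and feature ordering are placeholders, not the generated template.
import json

import joblib
import typer

app = typer.Typer()

@app.command()
def predict(
    input_json: str = typer.Option(..., "--input_json", help="Feature values as a JSON object"),
):
    features = json.loads(input_json)
    model = joblib.load("outputs/saved_models/model.joblib")  # placeholder path
    prediction = model.predict([list(features.values())])
    typer.echo(f"Prediction: {prediction[0]}")

if __name__ == "__main__":
    app()
```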
```bash
# Via API (after starting the server)
uvicorn application.api.main:app --reload

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"feature1": 1.0, "feature2": 2.0}'
```
```bash
# Via Streamlit UI
streamlit run application/ui/app.py
```

The Makefile wraps the common development tasks:

```bash
make install   # Install dependencies
make test      # Run tests
make train     # Train model
make run_api   # Start API server
make clean     # Clean cache files
```

The generated project uses YAML configuration files:
```yaml
# src/config/config.yaml
paths:
  raw_data: "data/raw/dataset.csv"
  processed_data: "data/processed/processed_data.parquet"
  model_output: "outputs/saved_models/"

model_params:
  random_state: 42
  test_size: 0.2

hyperparameters:
  sklearn_example:
    n_estimators: 100
    max_depth: 5
```

After generation, customize the project:
- Update `requirements.txt` with your specific dependencies
- Modify `src/config/config.yaml` with your project parameters
- Implement your data loading logic in `src/data_ingestion/` (a sketch follows this list)
- Define your model architecture in `src/models/`
- Update the API schemas in `application/api/schemas.py`
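The data-loading stub could start from something like the following sketch; the function names are placeholders, with all paths read from the example `config.yaml` above:

```python
# Illustrative starting point for src/data_ingestion/ (function names are
# placeholders); paths come from the example config.yaml in this README.
import yaml
import pandas as pd

def load_config(path: str = "src/config/config.yaml") -> dict:
    with open(path) as f:
        return yaml.safe_load(f)

def load_raw_data(cfg: dict) -> pd.DataFrame:
    # Read the raw CSV declared under paths.raw_data.
    return pd.read_csv(cfg["paths"]["raw_data"])

def save_processed(df: pd.DataFrame, cfg: dict) -> None:
    # Persist the cleaned frame where the training step expects it.
    df.to_parquet(cfg["paths"]["processed_data"], index=False)
```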
The project includes Docker configuration:
```bash
# Build the image
docker build -t your_project_name .

# Run the container
docker run -p 8000:8000 your_project_name
```
Run the test suite:

```bash
# All tests
pytest

# Specific test file
pytest tests/test_models.py

# With coverage
pytest --cov=src tests/
```
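A generated test could look like this minimal, self-contained sketch; it trains a tiny model on synthetic data, so it runs without any project fixtures (the test name and data are illustrative):

```python
# Illustrative tests/test_models.py; the generated template's own test
# structure may differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def test_model_fits_and_predicts():
    # Synthetic binary-classification data: label is the sign of feature 0.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(50, 2))
    y = (X[:, 0] > 0).astype(int)

    model = RandomForestClassifier(n_estimators=10, random_state=42)
    model.fit(X, y)
    preds = model.predict(X)

    assert preds.shape == (50,)
    assert set(preds) <= {0, 1}
```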
Requirements:

- Python 3.8+
- Bash shell (for running the generator script)
- Git (optional, for version control)
The generated templates support:
- scikit-learn: Traditional ML algorithms
- PyTorch: Deep learning models
- Hugging Face Transformers: LLMs and NLP models
- Custom models: Extensible architecture (see the sketch below)
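One illustrative way the extensible architecture could be expressed, not necessarily the exact generated code, is a small base class that scikit-learn, PyTorch, or Transformers wrappers all implement:

```python
# Illustrative extensibility sketch: a shared interface that wrappers for
# different ML libraries can implement (class names are placeholders).
from abc import ABC, abstractmethod
from typing import Any

class BaseModel(ABC):
    @abstractmethod
    def fit(self, X: Any, y: Any) -> "BaseModel": ...

    @abstractmethod
    def predict(self, X: Any) -> Any: ...

class SklearnModel(BaseModel):
    """Thin wrapper giving any scikit-learn estimator the shared interface."""

    def __init__(self, estimator: Any):
        self.estimator = estimator

    def fit(self, X: Any, y: Any) -> "SklearnModel":
        self.estimator.fit(X, y)
        return self

    def predict(self, X: Any) -> Any:
        return self.estimator.predict(X)
```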
Contributions are welcome:

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source. Please check the LICENSE file for details.
If you encounter any issues or have questions:
- Check the generated project's README for project-specific guidance
- Review the template files for implementation examples
- Open an issue on GitHub
Generated with the ML Project Generator - a tool for creating production-ready ML project structures.