# NBPipeliner

NBPipeliner is a Python library designed to automate the execution of Jupyter notebooks, convert them to HTML reports, and serve them via a Flask web application. It also includes scheduling capabilities to run notebooks at regular intervals.

## Features
- Automated Notebook Execution: Execute Jupyter notebooks using Papermill.
- HTML Conversion: Convert executed notebooks to HTML reports using nbconvert.
- Web Serving: Serve HTML reports via a Flask web application.
- Scheduling: Schedule notebook executions at configurable intervals.
- Logging: Comprehensive logging for monitoring and debugging.
## Requirements

- Python 3.8 or higher
- Poetry for dependency management

## Installation

Clone the repository:

```shell
git clone https://github.com/compartia/nbpipeline.git
cd nbpipeline
```

Ensure you have Poetry installed. If not, install it using:

```shell
curl -sSL https://install.python-poetry.org | python3 -
```

Then, install the project's dependencies:

```shell
poetry install
```

Activate Poetry's virtual environment:

```shell
poetry shell
```

## Configuration

NBPipeliner can be configured using environment variables. Below are the available configurations:
| Environment Variable | Description | Default Value |
|---|---|---|
| `NBP_LOG_LEVEL` | Logging level for console output | `INFO` |
| `NBP_LOG_LEVEL_FILE` | Logging level for file output | `INFO` |
| `NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES` | Interval in minutes between scheduled notebook executions | `10` |
| `NPB_SERVER_PORT` | Port number for the Flask web server | `8088` |
| `NPB_SERVER_HOST` | Host address for the Flask web server | `0.0.0.0` |
| `NPB_WORK_DIR` | Base directory for data (reports) | `./data` |
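As a sketch of how these settings map onto code, the table's defaults can be read with plain `os.getenv` lookups. This is illustrative only; the library's actual configuration loading may differ:

```python
import os

# Defaults mirror the configuration table above (illustrative, not the
# library's internal config code).
log_level = os.getenv("NBP_LOG_LEVEL", "INFO")
interval = int(os.getenv("NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES", "10"))
host = os.getenv("NPB_SERVER_HOST", "0.0.0.0")
port = int(os.getenv("NPB_SERVER_PORT", "8088"))

print(f"Serving on {host}:{port}, rerunning every {interval} min (log level {log_level})")
```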
You can set environment variables in your shell or use a .env file with tools like python-dotenv.
Example:

```shell
export NBP_LOG_LEVEL=DEBUG
export NPB_SERVER_PORT=5000
```

## Usage

Ensure that your Jupyter notebooks are placed in the `notebooks_dir` directory. The names of these notebooks should match the names defined in your pipeline stages, so that each stage can locate and execute the corresponding notebook.

For example, if you have a stage defined as `('sample_stage_1', 'stage1')`, there should be a notebook named `sample_stage_1.ipynb` in your `notebooks_dir` directory.
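The naming convention above amounts to a simple path lookup. A minimal sketch, with a hypothetical helper name (not part of the library's API):

```python
from pathlib import Path

def notebook_for_stage(notebooks_dir, stage_name):
    """Map a stage name to its notebook file.

    Hypothetical helper shown only to illustrate the naming convention:
    stage 'sample_stage_1' -> <notebooks_dir>/sample_stage_1.ipynb
    """
    path = Path(notebooks_dir) / f"{stage_name}.ipynb"
    if not path.is_file():
        raise FileNotFoundError(f"Stage '{stage_name}' expects a notebook at {path}")
    return path
```

For instance, `notebook_for_stage("notebooks", "sample_stage_1")` resolves to `notebooks/sample_stage_1.ipynb` when that file exists, and raises an error otherwise.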
Define your pipeline stages. Each stage is a tuple containing the notebook name and the URL path.

```python
from nbpipeline.run import NBPipeliner

PIPELINE_STAGES = [
    ('sample_stage_1', 'stage1'),
    ('sample_stage_2', 'stage2'),
]

notebooks_dir = 'notebooks'  # directory containing the stage notebooks

pipeline = NBPipeliner(PIPELINE_STAGES, notebooks_dir)
pipeline.start()
```

Upon running, the application will:

- Execute the defined Jupyter notebooks.
- Convert them to HTML reports.
- Serve the reports via a Flask web application accessible at `http://<host>:<port>/`.
Open your web browser and navigate to http://localhost:8088/ (replace 8088 with your configured port). You'll see a navigation page listing all the reports. Click on any link to view the corresponding HTML report.
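A navigation page like this boils down to listing the generated HTML files and linking to each one. A stdlib sketch with a hypothetical helper (the library's Flask app renders its own page; this only illustrates the idea):

```python
from pathlib import Path

def navigation_html(reports_dir):
    """Build a minimal index page linking each HTML report in reports_dir.

    Hypothetical helper, not the library's actual view function.
    """
    items = [
        f'<li><a href="{p.name}">{p.stem}</a></li>'
        for p in sorted(Path(reports_dir).glob("*.html"))
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```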
The pipeline is scheduled to execute notebooks at intervals defined by NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES. By default, it's set to run every 10 minutes. You can adjust this interval by setting the environment variable accordingly.
Example: to schedule the pipeline to run every 30 minutes:

```shell
export NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES=30
```

- `data/reports/`: Generated HTML reports will be saved here.
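Interval scheduling of this kind can be pictured with a small stdlib sketch: run the job, wait the interval, repeat until asked to stop. This is an assumption for illustration; NBPipeliner's actual scheduler implementation may differ:

```python
import threading
import time

def run_every(interval_seconds, job, stop_event):
    """Call `job`, wait `interval_seconds`, and repeat until `stop_event` is set.

    Illustrative interval scheduler, not the library's internal one.
    """
    while not stop_event.is_set():
        job()
        # wait() returns early once the event is set, so shutdown is prompt
        stop_event.wait(interval_seconds)

# Demo with a short interval so it finishes quickly.
runs = []
stop = threading.Event()
worker = threading.Thread(
    target=run_every, args=(0.01, lambda: runs.append(time.monotonic()), stop)
)
worker.start()
time.sleep(0.06)
stop.set()
worker.join()
print(f"job ran {len(runs)} times")
```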
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.

2. Create a feature branch:

   ```shell
   git checkout -b feature/YourFeature
   ```

3. Commit your changes:

   ```shell
   git commit -m "Add your feature"
   ```

4. Push to the branch:

   ```shell
   git push origin feature/YourFeature
   ```

5. Open a pull request.
## License

This project is licensed under the MIT License.