# NBPipeliner

NBPipeliner is a Python library designed to automate the execution of Jupyter notebooks, convert them to HTML reports, and serve them via a Flask web application. It also includes scheduling capabilities to run notebooks at regular intervals.

## Features
- Automated Notebook Execution: Execute Jupyter notebooks using Papermill.
- HTML Conversion: Convert executed notebooks to HTML reports using nbconvert.
- Web Serving: Serve HTML reports via a Flask web application.
- Scheduling: Schedule notebook executions at configurable intervals.
- Logging: Comprehensive logging for monitoring and debugging.
## Requirements

- Python 3.8 or higher
- Poetry for dependency management

## Installation

Clone the repository:

```shell
git clone https://github.com/compartia/nbpipeline.git
cd nbpipeline
```

Ensure you have Poetry installed. If not, install it using:

```shell
curl -sSL https://install.python-poetry.org | python3 -
```

Then, install the project's dependencies:

```shell
poetry install
```

Activate Poetry's virtual environment:

```shell
poetry shell
```

## Configuration

NBPipeliner can be configured using environment variables. Below are the available configurations:
| Environment Variable | Description | Default Value |
|---|---|---|
| `NBP_LOG_LEVEL` | Logging level for console output | `INFO` |
| `NBP_LOG_LEVEL_FILE` | Logging level for file output | `INFO` |
| `NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES` | Interval in minutes between scheduled notebook executions | `10` |
| `NPB_SERVER_PORT` | Port number for the Flask web server | `8088` |
| `NPB_SERVER_HOST` | Host address for the Flask web server | `0.0.0.0` |
| `NPB_WORK_DIR` | Base directory for data (reports) | `./data` |
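As a sketch of how these settings map onto code, the table's defaults can be read with plain `os.getenv` lookups. This is illustrative only; the library's actual configuration loading may differ:

```python
import os

# Defaults mirror the configuration table above (illustrative, not the
# library's internal config code).
log_level = os.getenv("NBP_LOG_LEVEL", "INFO")
interval = int(os.getenv("NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES", "10"))
host = os.getenv("NPB_SERVER_HOST", "0.0.0.0")
port = int(os.getenv("NPB_SERVER_PORT", "8088"))

print(f"Serving on {host}:{port}, rerunning every {interval} min (log level {log_level})")
```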
You can set environment variables in your shell or use a .env file with tools like python-dotenv.
Example:

```shell
export NBP_LOG_LEVEL=DEBUG
export NPB_SERVER_PORT=5000
```

## Usage

Ensure that your Jupyter notebooks are placed in the `notebooks_dir` directory. The names of these notebooks should match the names defined in your pipeline stages, so that each stage can locate and execute the corresponding notebook.

For example, if you have a stage defined as `('sample_stage_1', 'stage1')`, there should be a notebook named `sample_stage_1.ipynb` in your `notebooks_dir` directory.
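The naming convention above amounts to a simple path lookup. A minimal sketch, with a hypothetical helper name (not part of the library's API):

```python
from pathlib import Path

def notebook_for_stage(notebooks_dir, stage_name):
    """Map a stage name to its notebook file.

    Hypothetical helper shown only to illustrate the naming convention:
    stage 'sample_stage_1' -> <notebooks_dir>/sample_stage_1.ipynb
    """
    path = Path(notebooks_dir) / f"{stage_name}.ipynb"
    if not path.is_file():
        raise FileNotFoundError(f"Stage '{stage_name}' expects a notebook at {path}")
    return path
```

For instance, `notebook_for_stage("notebooks", "sample_stage_1")` resolves to `notebooks/sample_stage_1.ipynb` when that file exists, and raises an error otherwise.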
Define your pipeline stages. Each stage is a tuple containing the notebook name and the URL path.

```python
from nbpipeline.run import NBPipeliner

PIPELINE_STAGES = [
    ('sample_stage_1', 'stage1'),
    ('sample_stage_2', 'stage2'),
]

notebooks_dir = 'notebooks'  # directory containing the stage notebooks

pipeline = NBPipeliner(PIPELINE_STAGES, notebooks_dir)
pipeline.start()
```

Upon running, the application will:

- Execute the defined Jupyter notebooks.
- Convert them to HTML reports.
- Serve the reports via a Flask web application accessible at `http://<host>:<port>/`.
Open your web browser and navigate to http://localhost:8088/ (replace 8088 with your configured port). You'll see a navigation page listing all the reports. Click on any link to view the corresponding HTML report.
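A navigation page like this boils down to listing the generated HTML files and linking to each one. A stdlib sketch with a hypothetical helper (the library's Flask app renders its own page; this only illustrates the idea):

```python
from pathlib import Path

def navigation_html(reports_dir):
    """Build a minimal index page linking each HTML report in reports_dir.

    Hypothetical helper, not the library's actual view function.
    """
    items = [
        f'<li><a href="{p.name}">{p.stem}</a></li>'
        for p in sorted(Path(reports_dir).glob("*.html"))
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```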
The pipeline is scheduled to execute notebooks at intervals defined by NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES. By default, it's set to run every 10 minutes. You can adjust this interval by setting the environment variable accordingly.
Example: to schedule the pipeline to run every 30 minutes:

```shell
export NBP_DEFAULT_SCHEDULE_INTERVAL_MINUTES=30
```

- `data/reports/`: Generated HTML reports will be saved here.
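Interval scheduling of this kind can be pictured with a small stdlib sketch: run the job, wait the interval, repeat until asked to stop. This is an assumption for illustration; NBPipeliner's actual scheduler implementation may differ:

```python
import threading
import time

def run_every(interval_seconds, job, stop_event):
    """Call `job`, wait `interval_seconds`, and repeat until `stop_event` is set.

    Illustrative interval scheduler, not the library's internal one.
    """
    while not stop_event.is_set():
        job()
        # wait() returns early once the event is set, so shutdown is prompt
        stop_event.wait(interval_seconds)

# Demo with a short interval so it finishes quickly.
runs = []
stop = threading.Event()
worker = threading.Thread(
    target=run_every, args=(0.01, lambda: runs.append(time.monotonic()), stop)
)
worker.start()
time.sleep(0.06)
stop.set()
worker.join()
print(f"job ran {len(runs)} times")
```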
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.

2. Create a feature branch:

   ```shell
   git checkout -b feature/YourFeature
   ```

3. Commit your changes:

   ```shell
   git commit -m "Add your feature"
   ```

4. Push to the branch:

   ```shell
   git push origin feature/YourFeature
   ```

5. Open a pull request.
## License

This project is licensed under the MIT License.