nerdd-backend is the central component of NERDD - a platform providing machine learning models
for cheminformatics. The backend
- accepts HTTP and WebSocket requests,
- communicates with all components of the platform via a message broker, and
- stores persistent information on the file system or in a database.
# clone repository
git clone https://github.com/molinfo-vienna/nerdd-backend
# create conda environment
cd nerdd-backend
conda env create -f environment.yml
conda activate nerdd_backend
# install dependencies
pip install .
# run the server using development config
# (this works without any prerequisites; it uses a toy communication channel,
# a fake database backend and a basic computational module for demonstration)
python -m nerdd_backend.main
# use localhost:8000 for API requests
curl localhost:8000/modules
# see all endpoints in the API docs at http://localhost:8000/docs
xdg-open http://localhost:8000/docs

Custom options can be passed by selecting a predefined configuration (see nerdd_backend/config)
or by overriding individual options using ++. By default, the development config
(nerdd_backend/config/development.yaml) is loaded.
# switch to production config
python -m nerdd_backend.main --config-name production
# change port
python -m nerdd_backend.main ++port=7999
# turn off fake data
python -m nerdd_backend.main ++mock_infra=false
# set storage directory
python -m nerdd_backend.main ++media_root=./media
# use an existing Kafka cluster as communication channel and RethinkDB as database backend
# (see all options in the config files in the config folder)
python -m nerdd_backend.main --config-name production \
++db.host=localhost ++db.port=31562 \
++channel.broker=localhost:31624

- The subpackage nerdd_backend.routers contains all FastAPI routes accessible by the user.
- All routes have access to the FastAPI application state via request.app.state, which contains
  - state.config: custom settings (e.g. max_num_molecules_per_job) provided by the user when
    starting the server,
  - state.repository: the database access layer,
  - state.channel: an object for sending messages to the message broker, and
  - state.filesystem: an object for storing and retrieving files.
- Application settings are managed using Hydra. Predefined configurations are defined in
  /config, but individual options may be overridden when starting the server application.
- All schemas of requests, responses and records persisted to the database are declared as
  Pydantic models (deriving from pydantic.BaseModel) in nerdd_backend.models.
- The nerdd_backend.data.Repository class represents the database access layer and specifies all
  database interactions. For a concrete database, a subclass of Repository is defined that
  implements all required methods. This abstraction allows the underlying database technology to
  be replaced without changes to the application logic.
- Communication with the NERDD infrastructure is handled via a message broker. The package
  nerdd-link defines message schemas and available topics. Specifically, the nerdd_link.Action
  class allows iterating over all messages in a topic and processing them using custom logic. All
  files in the nerdd_backend.actions subpackage contain subclasses of nerdd_link.Action that
  react to messages coming from the message broker.
- Actions are executed asynchronously in the global FastAPI lifespan, i.e. they run concurrently
  with FastAPI's handlers that serve HTTP requests.
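
The repository abstraction described in the list above can be illustrated with a minimal sketch. It is not the actual nerdd_backend.data.Repository interface: the method names (create_job, get_job), the InMemoryRepository subclass and the submit_job helper are hypothetical, and the methods are synchronous only to keep the example short. The point is that application logic depends on the abstract class alone, so the storage technology can be swapped by providing another subclass.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Repository(ABC):
    """Hypothetical, simplified stand-in for the database access layer."""

    @abstractmethod
    def create_job(self, job_id: str, job: dict) -> None:
        """Persist a new job record."""

    @abstractmethod
    def get_job(self, job_id: str) -> Optional[dict]:
        """Return the job record, or None if it does not exist."""


class InMemoryRepository(Repository):
    """Concrete subclass that keeps records in a dict (illustration only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, dict] = {}

    def create_job(self, job_id: str, job: dict) -> None:
        if job_id in self._jobs:
            raise KeyError(f"job {job_id!r} already exists")
        # store a copy so later changes by the caller do not leak into the store
        self._jobs[job_id] = dict(job)

    def get_job(self, job_id: str) -> Optional[dict]:
        job = self._jobs.get(job_id)
        return None if job is None else dict(job)


def submit_job(repository: Repository, job_id: str, num_molecules: int) -> dict:
    """Application logic (hypothetical) that only uses the abstract interface."""
    record = {"id": job_id, "num_molecules": num_molecules, "status": "created"}
    repository.create_job(job_id, record)
    return repository.get_job(job_id)
```

Passing a different Repository subclass to submit_job (for example one backed by a real database) would leave the function itself unchanged, which is the property the abstraction is meant to provide.
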
# create conda environment
conda env create -f environment.yml
# install package in development mode
pip install -e .[test]
# run tests
pytest
# run tests with automatic reload on code changes
ptw
