Use Pydantic to model API requests/responses #77

shreddd · 2025-07-19T21:04:02Z

This PR is meant to make the stack Pydantic-friendly, including responses that match the Pydantic Entity models, validation, etc.

It also includes test code to enable Docker-container-based testing for the above.

The PR ended up being a lot broader in scope and covers:

Adding Pydantic models to responses
Updating the tests and test framework
Adding docker compose steps for running ingest and tests
Separating the docker-compose uv environments for tests
Adding Pydantic validation to ingest
Miscellaneous code and doc cleanup

eecavanna · 2025-07-19T23:48:36Z

Hi @shreddd, I merged in PR #76 a couple minutes ago. That has created some Git merge conflicts on this branch. In case you want me to resolve those on this branch, you can message me here or on Slack and I'll take a crack at it.

shreddd · 2025-07-20T02:36:17Z

@eecavanna - I think we should reconcile testing approaches - I'm putting stuff in the top level tests dir rather than under src

eecavanna · 2025-07-20T02:57:11Z

I'll resolve the merge conflicts this evening.

shreddd · 2025-07-22T16:43:47Z

@eecavanna - ready for your review - feel free to fix small changes directly if needed.

eecavanna · 2025-07-22T16:44:46Z

Thanks! I'll review it this afternoon.

eecavanna · 2025-07-22T17:15:49Z

When I ran docker compose up --detach on this branch (as of commit 6676ae8), the app container got stuck in a restart loop and its logs showed the following error:

error: Project virtual environment directory `/app/.venv` cannot be used because it is not a valid Python environment (no Python executable was found)

I also ran docker compose up --build --detach (i.e. with --build) and saw the same behavior.

eecavanna · 2025-07-22T17:22:42Z

Update: I think the issue was that my local (host) environment didn't have a Python virtual environment at ./.venv (which maps to /app/.venv in the container) at the time. I was able to keep the container running by creating that virtual environment locally. To me, this points to a problem: the container is sharing its Python virtual environment with the host. I will fix this in docker-compose.yml now.

eecavanna · 2025-07-22T17:39:12Z

FYI: As of commit c759e8e, tests can be run this way:

docker compose run --rm -it app \
  uv run pytest -vv

and ingest can be run this way:

docker compose run --rm -it app \
  uv run python /app/mongodb/ingest_data.py --mongo-uri "mongodb://admin:root@mongo:27017" --input /app/tests/data --clean

The dedicated test, ingest, and ingest-test services could be retired. We can discuss/do that in a separate PR.

shreddd · 2025-07-22T17:47:15Z

Hmm - I wanted specific targets so we don't have to remember or copy and paste long commands. I would like to leave those in, unless we want to switch to a Makefile model, which could also capture specific targets.

eecavanna

Hi @shreddd, thanks for implementing this. I think having the API ensure all the data it's sending and receiving fits into some Pydantic class will result in people finding the API easier to use.

Before merging, I suggest looking at the following commit, which I made just now (while waiting for an unrelated, long-running Mongo aggregation pipeline to finish running in another window):

8022e99

In that commit, I removed the requirement to run the ingest script manually before running the tests; a test fixture now runs it. As implemented, the database is destroyed and recreated between tests, so every test starts from a clean slate. This could be made more surgical down the road if people find the tests take too long. An advantage of this approach over a single ingest before the test run is that tests can freely manipulate the database (to test corner cases) without affecting subsequent tests or being affected by previous ones.
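
A minimal sketch of the per-test reset described above. This is not the code from commit 8022e99: it assumes an in-memory dictionary (`FAKE_DB`) and a hypothetical `SEED_ENTITIES` list in place of the real MongoDB database and ingest script, and it is written as a plain generator so it runs without pytest installed.

```python
from typing import Dict, Iterator, List

# Hypothetical seed data; the real tests ingest the JSON files under tests/data.
SEED_ENTITIES: List[dict] = [{"uri": "https://example.org/entity/1"}]

# In-memory stand-in for the MongoDB test database (an assumption for this sketch).
FAKE_DB: Dict[str, List[dict]] = {"entities": []}


def seeded_database() -> Iterator[Dict[str, List[dict]]]:
    """Yield a freshly seeded database, then wipe it afterwards.

    Decorating a generator like this with `@pytest.fixture` would make pytest
    run the code before `yield` as setup and the code after it as teardown.
    """
    # Setup: drop whatever a previous test left behind and re-ingest.
    FAKE_DB["entities"] = [dict(doc) for doc in SEED_ENTITIES]
    try:
        yield FAKE_DB
    finally:
        # Teardown: leave nothing behind for the next test.
        FAKE_DB["entities"] = []


# Simulate two consecutive tests by driving the generator by hand.
first = seeded_database()
db = next(first)
db["entities"].append({"uri": "https://example.org/entity/extra"})  # test mutates data
first.close()  # runs the teardown in the `finally` block

second = seeded_database()
db = next(second)
entities_seen_by_second_test = list(db["entities"])
second.close()
```

The second "test" sees only the seed data, which is the isolation property the comment above describes; the cost is one full re-ingest per test.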

Copilot

Pull Request Overview

This PR implements Pydantic model integration for API requests/responses, enhances the testing framework with Docker support, and includes various code improvements. The changes enable type-safe API responses, comprehensive testing against real MongoDB instances, and improved development workflow.

Introduces Pydantic models for structured API responses and MongoDB query validation
Adds comprehensive test suite with Docker-based MongoDB testing infrastructure
Updates import paths to use src/ directory structure for better organization

Reviewed Changes

Copilot reviewed 22 out of 27 changed files in this pull request and generated 7 comments.

File	Description
tests/test_server.py	Updates import paths to use src/ directory structure
tests/test_hello.py	Removes trivial test file
tests/test_api.py	Adds comprehensive API test suite with MongoDB integration
tests/data/*.json	Adds test data files for different BER data sources
tests/conftest.py	Implements test configuration with database patching
src/tests/conftest.py	Removes old conftest file from src directory
src/server.py	Integrates Pydantic models and improves API response types
src/models.py	Defines new Pydantic models for API requests/responses
src/bertron_client.py	Adds dependency comment for requests library
pyproject.toml	Updates dependencies and adds pytest configuration
mongodb/ingest_data.py	Replaces requests with httpx and adds Pydantic validation
docker-compose.yml	Enhances Docker setup for testing and development
Dockerfile	Adds test target with isolated virtual environment
CONTRIBUTING.md	Updates documentation for Docker-based development
.github/workflows/ci.yml	Updates CI workflow comments
.dockerignore	Adds cache directories to ignore list

Copilot · 2025-07-23T05:08:38Z

src/bertron_client.py

 """

-import requests
+import requests  # FIXME: `requests` is not listed as a dependency in `pyproject.toml`


The FIXME comment indicates that requests is not listed as a dependency in pyproject.toml. Since the ingest script was updated to use httpx, consider updating this file to use httpx as well for consistency, or add requests to the dependencies if it's still needed.

Copilot · 2025-07-23T05:08:38Z

mongodb/legacy/geo_query.py

+                "proposals": proposal_count,
+                "ess_dive": ess_dive_count,
+                "nmdc": nmdc_count,
+                "nmdc": jgi_count,


The key "nmdc" is used twice in the dataset_counts dictionary. The second occurrence should likely be "jgi" to properly categorize JGI count statistics.
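
To illustrate why this matters, here is a small sketch with placeholder counts (the numbers are made up, not taken from the project). Python accepts a dict literal with a repeated key without raising an error; the later value silently replaces the earlier one.

```python
# Placeholder counts for illustration only.
proposal_count, ess_dive_count, nmdc_count, jgi_count = 1, 2, 3, 4

# Buggy literal: the second "nmdc" entry silently replaces the first.
buggy_counts = {
    "proposals": proposal_count,
    "ess_dive": ess_dive_count,
    "nmdc": nmdc_count,
    "nmdc": jgi_count,
}

# Corrected literal: each data source gets its own key.
fixed_counts = {
    "proposals": proposal_count,
    "ess_dive": ess_dive_count,
    "nmdc": nmdc_count,
    "jgi": jgi_count,
}

print(buggy_counts)  # {'proposals': 1, 'ess_dive': 2, 'nmdc': 4}
print(fixed_counts)  # {'proposals': 1, 'ess_dive': 2, 'nmdc': 3, 'jgi': 4}
```

In the buggy version the NMDC count is lost and the JGI count is reported under the wrong name.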

Suggested change

-                "nmdc": jgi_count,
+                "jgi": jgi_count,

Copilot · 2025-07-23T05:08:38Z

tests/test_api.py

+        port=cfg.mongo_port,
+        username=cfg.mongo_username,
+        password=cfg.mongo_password,
+    )


Database credentials are being passed directly in the connection string. Consider using environment variables or a more secure method for handling database authentication in tests.

Copilot · 2025-07-23T05:08:39Z

tests/test_api.py

+    ingest_cli_args = [
+        "ingest_data.py",
+        "--mongo-uri",
+        f"mongodb://{cfg.mongo_username}:{cfg.mongo_password}@{cfg.mongo_host}:{cfg.mongo_port}",


Database credentials are being exposed in the connection string. This could potentially log sensitive information. Consider using MongoDB connection options that don't expose credentials in URLs.

Suggested change

-        f"mongodb://{cfg.mongo_username}:{cfg.mongo_password}@{cfg.mongo_host}:{cfg.mongo_port}",
+        f"mongodb://{cfg.mongo_host}:{cfg.mongo_port}",
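
One way to address the logging concern without changing how the connection is made is to mask the password before a URI is ever printed. The sketch below uses only the standard library; `redact_mongo_uri` is a hypothetical helper name and is not part of this PR or of any MongoDB driver.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_mongo_uri(uri: str) -> str:
    """Return `uri` with any password replaced by `***` (for log output only)."""
    parts = urlsplit(uri)
    # The netloc looks like "user:password@host:port" when credentials are present.
    userinfo, at_sign, host_part = parts.netloc.rpartition("@")
    if not at_sign or ":" not in userinfo:
        return uri  # no password to hide
    username = userinfo.split(":", 1)[0]
    redacted_netloc = f"{username}:***@{host_part}"
    return urlunsplit(
        (parts.scheme, redacted_netloc, parts.path, parts.query, parts.fragment)
    )


print(redact_mongo_uri("mongodb://admin:root@mongo:27017"))
# mongodb://admin:***@mongo:27017
```

The real URI is still what gets passed to the client; only the string that reaches the logs changes.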

Copilot · 2025-07-23T05:08:39Z

src/server.py

+    # Determine the names of the fields that the Entity model has.
+    model_field_names = Entity.model_fields.keys()


The Entity.model_fields.keys() is called on every document clean operation. Consider caching this value as a module-level constant since Entity fields don't change at runtime.

Suggested change

-    # Determine the names of the fields that the Entity model has.
-    model_field_names = Entity.model_fields.keys()
+    # Use the cached names of the fields that the Entity model has.
+    model_field_names = ENTITY_MODEL_FIELD_NAMES
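
A rough sketch of how such a module-level constant could be defined and used. To keep it runnable without third-party packages, a standard-library dataclass stands in for the Pydantic `Entity` model, and the three field names and the `clean_document` function name are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class Entity:
    """Stand-in for the Pydantic Entity model; these field names are assumptions."""

    uri: str
    ber_data_source: str
    data_type: str


# Computed once at import time instead of on every call.
# With a Pydantic v2 model this would be `frozenset(Entity.model_fields)`.
ENTITY_MODEL_FIELD_NAMES = frozenset(f.name for f in fields(Entity))


def clean_document(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `document` restricted to fields the Entity model defines."""
    return {
        key: value
        for key, value in document.items()
        if key in ENTITY_MODEL_FIELD_NAMES
    }


raw = {"_id": "abc123", "uri": "https://example.org/entity/1", "data_type": "sample"}
print(clean_document(raw))
# {'uri': 'https://example.org/entity/1', 'data_type': 'sample'}
```

The saving per call is small, so this is mostly a readability choice: the constant makes it explicit that the set of field names does not change at runtime.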

Copilot · 2025-07-23T05:08:39Z

mongodb/ingest_data.py

-            self.db.entities.create_index('ber_data_source')
-            self.db.entities.create_index('data_type')
-    
+            self.db.entities.create_index("uri", unique=True)


Index creation is called inside the insert_entity method, which means it will attempt to create indexes on every entity insertion. Consider moving index creation to a separate initialization method to avoid redundant operations.
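
A sketch of the suggested restructuring, under stated assumptions: `RecordingCollection` is a hypothetical stand-in that mimics the `create_index` and `insert_one` methods of a MongoDB collection so the example runs without a database, and `Ingester` is an illustrative class name rather than the one in `ingest_data.py`.

```python
from typing import Any, Dict, List, Tuple


class RecordingCollection:
    """Hypothetical stand-in for a MongoDB collection that records calls."""

    def __init__(self) -> None:
        self.index_calls: List[Tuple[str, Dict[str, Any]]] = []
        self.documents: List[Dict[str, Any]] = []

    def create_index(self, keys: str, **kwargs: Any) -> None:
        self.index_calls.append((keys, kwargs))

    def insert_one(self, document: Dict[str, Any]) -> None:
        self.documents.append(document)


class Ingester:
    """Creates indexes once at construction time, not on every insert."""

    def __init__(self, entities: RecordingCollection) -> None:
        self.entities = entities
        self._ensure_indexes()

    def _ensure_indexes(self) -> None:
        # Same index as in the diff above: one unique index on "uri".
        self.entities.create_index("uri", unique=True)

    def insert_entity(self, entity: Dict[str, Any]) -> None:
        # Insertion no longer touches index management.
        self.entities.insert_one(entity)


collection = RecordingCollection()
ingester = Ingester(collection)
for n in range(3):
    ingester.insert_entity({"uri": f"https://example.org/entity/{n}"})

print(len(collection.index_calls), len(collection.documents))  # 1 3
```

Three inserts result in a single index request, rather than one request per insert.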

Copilot · 2025-07-23T05:08:39Z

tests/test_api.py

+        "tests/data",
+        "--clean",
+    ]
+    with patch.object(sys, "argv", ingest_cli_args):


The test is patching sys.argv to invoke the ingest script. As noted in the TODO comment on lines 43-45, consider refactoring the ingest script to expose its core functionality as a callable function to eliminate the need for sys.argv patching.
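
One possible shape for that refactoring, sketched with the standard library only. The flag names `--mongo-uri`, `--input`, and `--clean` come from the ingest command quoted earlier in this thread; the `ingest` and `main` function names, their signatures, and the placeholder body are assumptions, not the script's actual code.

```python
import argparse
from typing import Any, Dict, List, Optional


def ingest(mongo_uri: str, input_dir: str, clean: bool = False) -> Dict[str, Any]:
    """Placeholder for the core ingest logic; here it only echoes its arguments."""
    return {"mongo_uri": mongo_uri, "input_dir": input_dir, "clean": clean}


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse command-line arguments and delegate to `ingest`.

    When `argv` is None, argparse falls back to `sys.argv[1:]`, so the same
    function serves both the command line and direct calls.
    """
    parser = argparse.ArgumentParser(description="Ingest data into MongoDB")
    parser.add_argument("--mongo-uri", required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--clean", action="store_true")
    args = parser.parse_args(argv)
    return ingest(args.mongo_uri, args.input, clean=args.clean)


# A test can call the core function directly, with no sys.argv patching ...
direct_result = ingest("mongodb://mongo:27017", "tests/data", clean=True)

# ... or exercise the argument parsing by passing an explicit list.
cli_result = main(["--mongo-uri", "mongodb://mongo:27017", "--input", "tests/data", "--clean"])
```

In a real script, `main()` would be called under the usual `if __name__ == "__main__":` guard, and the test fixture would import and call `ingest(...)` instead of patching `sys.argv`.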

shreddd added 6 commits July 18, 2025 12:26

updated server models for all responses

999cabe

relative import

14cd2cd

Add testing setup

e9d1277

Use separate venv for tests

ruff

eae2bb3

Merge branch 'main' into pydantic-server

d06765e

Update ci and skip failing tests

16066fa

eecavanna reviewed Jul 19, 2025

Dockerfile (resolved)

eecavanna reviewed Jul 19, 2025

.github/workflows/ci.yml (outdated, resolved)

eecavanna mentioned this pull request Jul 19, 2025

Read MongoDB connection parameters from environment variables #76

Merged

cleanup

68f02de

eecavanna requested a review from Copilot July 20, 2025 04:58

This comment was marked as outdated.

eecavanna added 3 commits July 19, 2025 22:51

Merge branch 'main' into pydantic-server

b8a4c3e

Use standard TestClient instead of transitive requests package

75320c7

Update GHA workflow to ingest data into bertron_test database

084c978

This was referenced Jul 20, 2025

Update Docker Compose file to use constant MongoDB host and port #79

Closed

Update Docker Compose file to use constant Mongo host and port #78

Open

eecavanna added 8 commits July 19, 2025 23:59

Omit -it options from docker compose run in GHA workflow

2589b52

Temporarily add "known passing" test to facilitate debugging GHA

f02d6d7

Update ingester to log the database name

2d6cb43

Remove duplicate test used for debugging

a6e0bac

Patch the config object via both import paths

e7a73db

Clarify comment

19c9cfd

Convert pytest into a dev-dependency

39ed799

Convert httpx into non-dev dependency because ingester imports it

69341cc

eecavanna assigned shreddd Jul 20, 2025

eecavanna reviewed Jul 20, 2025

src/server.py (outdated, resolved)

shreddd added 3 commits July 22, 2025 09:26

comment

7463251

ruff updates

fe0e47b

remove unused Dict

6676ae8

shreddd marked this pull request as ready for review July 22, 2025 16:43

eecavanna requested a review from Copilot July 22, 2025 16:45

This comment was marked as outdated.

Omit .venv folder from volume mount to avoid host-guest interference

c759e8e

shreddd and others added 12 commits July 22, 2025 10:50

Update CONTRIBUTING.md

8241e8b

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Relocate model to models.py

89433de

Merge branch 'pydantic-server' of https://github.com/ber-data/bertron …

5fcbce1

…into pydantic-server

Use ruff to reformat Python source files (to resolve GHA failure)

264aa54

Remove commented-out command for running ingest directly

c757a87

Resolve type ambiguities in ingest script

98e0a82

Combine two startswith calls into one using tuple syntax

da01ccb

Clarify comment about what function does

b610001

Add doctest and configure pytest to run it

caf62e2

Use ruff to reformat Python module (to resolve GHA failure)

b631d10

Re-indent command to reflect abstraction layers

35b1075

Run ingest script automatically via pytest fixture

8022e99

eecavanna approved these changes Jul 22, 2025

eecavanna requested a review from Copilot July 23, 2025 05:07

Copilot AI reviewed Jul 23, 2025

shreddd merged commit 40102d9 into main Jul 24, 2025
1 check passed

shreddd deleted the pydantic-server branch July 24, 2025 01:16