Skip to content

Conversation

@shreddd
Copy link
Contributor

@shreddd shreddd commented Jul 19, 2025

This PR is meant to make the stack Pydantic friendly, including responses that match the Pydantic Entity models, validation etc.

Also includes test code to enable docker container based testing for the above.

The PR ended up being a lot broader in scope and covers:

  • Adding Pydantic models to responses
  • Updated tests and test framework
  • docker compose steps for running ingest and test
  • separation of docker-compose uv environments for test
  • Adding Pydantic validation to ingest
  • Miscellaneous code and doc cleanup

@eecavanna
Copy link
Collaborator

eecavanna commented Jul 19, 2025

Hi @shreddd, I merged in PR #76 a couple minutes ago. That has created some Git merge conflicts on this branch. In case you want me to resolve those on this branch, you can message me here or on Slack and I'll take a crack at it.

@shreddd
Copy link
Contributor Author

shreddd commented Jul 20, 2025

@eecavanna - I think we should reconcile testing approaches - I'm putting stuff in the top level tests dir rather than under src

@eecavanna
Copy link
Collaborator

I'll resolve the merge conflicts this evening.

@eecavanna eecavanna requested a review from Copilot July 20, 2025 04:58

This comment was marked as outdated.

@shreddd shreddd marked this pull request as ready for review July 22, 2025 16:43
@shreddd
Copy link
Contributor Author

shreddd commented Jul 22, 2025

@eecavanna - ready for your review - feel free to fix small changes directly if needed.

@eecavanna
Copy link
Collaborator

Thanks! I'll review it this afternoon.

@eecavanna eecavanna requested a review from Copilot July 22, 2025 16:45

This comment was marked as outdated.

@eecavanna
Copy link
Collaborator

When I ran docker compose up --detach on this branch (as of commit 6676ae8), I got the following error in the app container logs (which remained in a reboot loop):

error: Project virtual environment directory `/app/.venv` cannot be used because it is not a valid Python environment (no Python executable was found)

I ran docker compose up build --detach (i.e. with build) also and saw the same behavior.

@eecavanna
Copy link
Collaborator

Update: I think the issue was that my local (host) environment didn't have a Python virtual environment at ./.venv (in the container, /app/.venv) at the time. I was able to get the container to stay running, by creating that Python virtual environment locally. This is a sign of a problem to me because the container is sharing the Python virtual environment with the host. I will fix this in docker-compose.yml now.

@eecavanna
Copy link
Collaborator

FYI: As of commit c759e8e, tests can be run this way:

docker compose run --rm -it app \
  uv run pytest -vv

and ingest can be run this way:

docker compose run --rm -it app \
  uv run python /app/mongodb/ingest_data.py --mongo-uri "mongodb://admin:root@mongo:27017" --input /app/tests/data --clean

The dedicated test, ingest, and ingest-test services could be retired. We can discuss/do that in a separate PR.

@shreddd
Copy link
Contributor Author

shreddd commented Jul 22, 2025

Hmm - I wanted specific targets to avoid having to remember or cut and paste long commands. I would like to leave those in there, unless we want to switch to a makefile model which could also capture specific targets.

Copy link
Collaborator

@eecavanna eecavanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shreddd, thanks for implementing this. I think having the API ensure all the data it's sending and receiving fits into some Pydantic class will result in people finding the API easier to use.

Before merging, I suggest looking at the following commit, which I made just now (while waiting for an unrelated, long-running Mongo aggregation pipeline to finish running in another window):

8022e99

In that commit, I removed the requirement of manually running the ingest script before running the tests. Now, a test fixture will run the ingest script. The way I implemented it here, the database is destroyed and recreated between each test, for a clean slate. This could be optimized (i.e. more surgical) down the road, in case people find the test duration to be too long. An advantage of this approach over doing a single ingest before starting the tests, is that tests can freely manipulate the database (to test corner cases) without affecting subsequent tests or being affected by previous tests.

@eecavanna eecavanna requested a review from Copilot July 23, 2025 05:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements Pydantic model integration for API requests/responses, enhances the testing framework with Docker support, and includes various code improvements. The changes enable type-safe API responses, comprehensive testing against real MongoDB instances, and improved development workflow.

  • Introduces Pydantic models for structured API responses and MongoDB query validation
  • Adds comprehensive test suite with Docker-based MongoDB testing infrastructure
  • Updates import paths to use src/ directory structure for better organization

Reviewed Changes

Copilot reviewed 22 out of 27 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/test_server.py Updates import paths to use src/ directory structure
tests/test_hello.py Removes trivial test file
tests/test_api.py Adds comprehensive API test suite with MongoDB integration
tests/data/*.json Adds test data files for different BER data sources
tests/conftest.py Implements test configuration with database patching
src/tests/conftest.py Removes old conftest file from src directory
src/server.py Integrates Pydantic models and improves API response types
src/models.py Defines new Pydantic models for API requests/responses
src/bertron_client.py Adds dependency comment for requests library
pyproject.toml Updates dependencies and adds pytest configuration
mongodb/ingest_data.py Replaces requests with httpx and adds Pydantic validation
docker-compose.yml Enhances Docker setup for testing and development
Dockerfile Adds test target with isolated virtual environment
CONTRIBUTING.md Updates documentation for Docker-based development
.github/workflows/ci.yml Updates CI workflow comments
.dockerignore Adds cache directories to ignore list

"""

import requests
import requests # FIXME: `requests` is not listed as a dependency in `pyproject.toml`
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FIXME comment indicates that requests is not listed as a dependency in pyproject.toml. Since the ingest script was updated to use httpx, consider updating this file to use httpx as well for consistency, or add requests to the dependencies if it's still needed.

Copilot uses AI. Check for mistakes.
"proposals": proposal_count,
"ess_dive": ess_dive_count,
"nmdc": nmdc_count,
"nmdc": jgi_count,
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key "nmdc" is used twice in the dataset_counts dictionary. The second occurrence should likely be "jgi" to properly categorize JGI count statistics.

Suggested change
"nmdc": jgi_count,
"jgi": jgi_count,

Copilot uses AI. Check for mistakes.
port=cfg.mongo_port,
username=cfg.mongo_username,
password=cfg.mongo_password,
)
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database credentials are being passed directly in the connection string. Consider using environment variables or a more secure method for handling database authentication in tests.

Copilot uses AI. Check for mistakes.
ingest_cli_args = [
"ingest_data.py",
"--mongo-uri",
f"mongodb://{cfg.mongo_username}:{cfg.mongo_password}@{cfg.mongo_host}:{cfg.mongo_port}",
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database credentials are being exposed in the connection string. This could potentially log sensitive information. Consider using MongoDB connection options that don't expose credentials in URLs.

Suggested change
f"mongodb://{cfg.mongo_username}:{cfg.mongo_password}@{cfg.mongo_host}:{cfg.mongo_port}",
f"mongodb://{cfg.mongo_host}:{cfg.mongo_port}",

Copilot uses AI. Check for mistakes.
Comment on lines +340 to +341
# Determine the names of the fields that the Entity model has.
model_field_names = Entity.model_fields.keys()
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Entity.model_fields.keys() is called on every document clean operation. Consider caching this value as a module-level constant since Entity fields don't change at runtime.

Suggested change
# Determine the names of the fields that the Entity model has.
model_field_names = Entity.model_fields.keys()
# Use the cached names of the fields that the Entity model has.
model_field_names = ENTITY_MODEL_FIELD_NAMES

Copilot uses AI. Check for mistakes.
self.db.entities.create_index('ber_data_source')
self.db.entities.create_index('data_type')
self.db.entities.create_index("uri", unique=True)
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index creation is called inside the insert_entity method, which means it will attempt to create indexes on every entity insertion. Consider moving index creation to a separate initialization method to avoid redundant operations.

Copilot uses AI. Check for mistakes.
"tests/data",
"--clean",
]
with patch.object(sys, "argv", ingest_cli_args):
Copy link

Copilot AI Jul 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is patching sys.argv to invoke the ingest script. As noted in the TODO comment on line 43-45, consider refactoring the ingest script to expose its core functionality as a callable function to eliminate the need for sys.argv patching.

Copilot uses AI. Check for mistakes.
@shreddd shreddd merged commit 40102d9 into main Jul 24, 2025
1 check passed
@shreddd shreddd deleted the pydantic-server branch July 24, 2025 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants