Use Pydantic to model API requests/responses #77
Conversation
@eecavanna - I think we should reconcile testing approaches - I'm putting stuff in the top-level `tests` dir rather than under `src`
I'll resolve the merge conflicts this evening.
@eecavanna - ready for your review - feel free to fix small changes directly if needed.
Thanks! I'll review it this afternoon.
When I ran it, I got this error:

> error: Project virtual environment directory `/app/.venv` cannot be used because it is not a valid Python environment (no Python executable was found)
Update: I think the issue was that my local (host) environment didn't have a Python virtual environment at
FYI: As of commit c759e8e, tests can be run this way:

```shell
docker compose run --rm -it app \
  uv run pytest -vv
```

and ingest can be run this way:

```shell
docker compose run --rm -it app \
  uv run python /app/mongodb/ingest_data.py --mongo-uri "mongodb://admin:root@mongo:27017" --input /app/tests/data --clean
```

The dedicated
Hmm - I wanted specific targets to avoid having to remember or cut and paste long commands. I would like to leave those in there, unless we want to switch to a makefile model, which could also capture specific targets.
Hi @shreddd, thanks for implementing this. I think having the API ensure all the data it's sending and receiving fits into some Pydantic class will result in people finding the API easier to use.
Before merging, I suggest looking at the following commit, which I made just now (while waiting for an unrelated, long-running Mongo aggregation pipeline to finish running in another window):
In that commit, I removed the requirement of manually running the ingest script before running the tests. Now, a test fixture runs the ingest script. The way I implemented it here, the database is destroyed and recreated between tests, for a clean slate. This could be optimized (i.e., made more surgical) down the road, in case people find the test duration too long. An advantage of this approach over doing a single ingest before starting the tests is that tests can freely manipulate the database (to test corner cases) without affecting subsequent tests or being affected by previous tests.
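The clean-slate-per-test pattern described above can be sketched like this. This is a minimal illustration using an in-memory dict in place of MongoDB; the names `ingest`, `FAKE_DB`, and `seeded_db` are illustrative, not the actual code in the commit:

```python
import pytest

FAKE_DB = {}  # stand-in for the MongoDB database

def ingest(db):
    """Stand-in for the ingest script: destroy and recreate the seed data."""
    db.clear()
    db.update({"nmdc:1": {"name": "sample A"}, "jgi:2": {"name": "sample B"}})

@pytest.fixture(autouse=True)
def seeded_db():
    # Runs before every test, so each test starts from a clean slate.
    ingest(FAKE_DB)
    yield FAKE_DB

def test_can_mutate_freely(seeded_db):
    seeded_db.pop("nmdc:1")  # corner-case manipulation...
    assert "nmdc:1" not in seeded_db

def test_sees_clean_slate(seeded_db):
    # ...which does not leak into subsequent tests.
    assert set(seeded_db) == {"nmdc:1", "jgi:2"}
```

The `autouse=True` flag is what removes the manual step: every test gets a freshly ingested database without requesting the fixture explicitly.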
Pull Request Overview
This PR implements Pydantic model integration for API requests/responses, enhances the testing framework with Docker support, and includes various code improvements. The changes enable type-safe API responses, comprehensive testing against real MongoDB instances, and improved development workflow.
- Introduces Pydantic models for structured API responses and MongoDB query validation
- Adds comprehensive test suite with Docker-based MongoDB testing infrastructure
- Updates import paths to use the `src/` directory structure for better organization
Reviewed Changes
Copilot reviewed 22 out of 27 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/test_server.py | Updates import paths to use src/ directory structure |
| tests/test_hello.py | Removes trivial test file |
| tests/test_api.py | Adds comprehensive API test suite with MongoDB integration |
| tests/data/*.json | Adds test data files for different BER data sources |
| tests/conftest.py | Implements test configuration with database patching |
| src/tests/conftest.py | Removes old conftest file from src directory |
| src/server.py | Integrates Pydantic models and improves API response types |
| src/models.py | Defines new Pydantic models for API requests/responses |
| src/bertron_client.py | Adds dependency comment for requests library |
| pyproject.toml | Updates dependencies and adds pytest configuration |
| mongodb/ingest_data.py | Replaces requests with httpx and adds Pydantic validation |
| docker-compose.yml | Enhances Docker setup for testing and development |
| Dockerfile | Adds test target with isolated virtual environment |
| CONTRIBUTING.md | Updates documentation for Docker-based development |
| .github/workflows/ci.yml | Updates CI workflow comments |
| .dockerignore | Adds cache directories to ignore list |
```diff
 """
-import requests
+import requests  # FIXME: `requests` is not listed as a dependency in `pyproject.toml`
```
**Copilot AI** commented on Jul 23, 2025
The FIXME comment indicates that requests is not listed as a dependency in pyproject.toml. Since the ingest script was updated to use httpx, consider updating this file to use httpx as well for consistency, or add requests to the dependencies if it's still needed.
```python
"proposals": proposal_count,
"ess_dive": ess_dive_count,
"nmdc": nmdc_count,
"nmdc": jgi_count,
```
**Copilot AI** commented on Jul 23, 2025
The key `"nmdc"` is used twice in the `dataset_counts` dictionary. The second occurrence should likely be `"jgi"` to properly categorize JGI count statistics.
Suggested change:

```diff
-"nmdc": jgi_count,
+"jgi": jgi_count,
```
```python
port=cfg.mongo_port,
username=cfg.mongo_username,
password=cfg.mongo_password,
)
```
**Copilot AI** commented on Jul 23, 2025
Database credentials are being passed directly in the connection string. Consider using environment variables or a more secure method for handling database authentication in tests.
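One common pattern — sketched here with assumed variable names, not code from this PR — is to read the credentials from environment variables and pass them to `MongoClient` as keyword arguments rather than embedding them in a URI:

```python
import os

def mongo_settings_from_env(env) -> dict:
    """Assemble MongoClient keyword arguments from environment variables.
    The variable names and defaults below are illustrative assumptions."""
    return {
        "host": env.get("MONGO_HOST", "localhost"),
        "port": int(env.get("MONGO_PORT", "27017")),
        "username": env.get("MONGO_USERNAME"),
        "password": env.get("MONGO_PASSWORD"),
    }

# Usage — pymongo accepts these directly as keyword arguments:
#   client = MongoClient(**mongo_settings_from_env(os.environ))
```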
```python
ingest_cli_args = [
    "ingest_data.py",
    "--mongo-uri",
    f"mongodb://{cfg.mongo_username}:{cfg.mongo_password}@{cfg.mongo_host}:{cfg.mongo_port}",
```
**Copilot AI** commented on Jul 23, 2025
Database credentials are being exposed in the connection string. This could potentially log sensitive information. Consider using MongoDB connection options that don't expose credentials in URLs.
Suggested change:

```diff
-f"mongodb://{cfg.mongo_username}:{cfg.mongo_password}@{cfg.mongo_host}:{cfg.mongo_port}",
+f"mongodb://{cfg.mongo_host}:{cfg.mongo_port}",
```
```python
# Determine the names of the fields that the Entity model has.
model_field_names = Entity.model_fields.keys()
```
**Copilot AI** commented on Jul 23, 2025
`Entity.model_fields.keys()` is called on every document-clean operation. Consider caching this value as a module-level constant, since `Entity` fields don't change at runtime.
Suggested change:

```diff
-# Determine the names of the fields that the Entity model has.
-model_field_names = Entity.model_fields.keys()
+# Use the cached names of the fields that the Entity model has.
+model_field_names = ENTITY_MODEL_FIELD_NAMES
```
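A minimal sketch of that caching pattern, assuming a Pydantic v2 model. The `Entity` fields and the `clean_document` helper here are stand-ins, not the project's actual definitions:

```python
from pydantic import BaseModel

class Entity(BaseModel):  # stand-in for the project's Entity model
    uri: str
    name: str

# Computed once at import time; a model's fields don't change at runtime.
ENTITY_MODEL_FIELD_NAMES = frozenset(Entity.model_fields.keys())

def clean_document(doc: dict) -> dict:
    # Drop keys the model doesn't know about, using the cached field names
    # instead of calling Entity.model_fields.keys() per document.
    return {k: v for k, v in doc.items() if k in ENTITY_MODEL_FIELD_NAMES}
```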
```python
self.db.entities.create_index('ber_data_source')
self.db.entities.create_index('data_type')
self.db.entities.create_index("uri", unique=True)
```
**Copilot AI** commented on Jul 23, 2025
Index creation is called inside the insert_entity method, which means it will attempt to create indexes on every entity insertion. Consider moving index creation to a separate initialization method to avoid redundant operations.
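A sketch of that refactor (the class and method names are assumptions; the PR's actual class may differ): create the indexes once when the store is constructed, so `insert_entity` never re-issues them.

```python
class EntityStore:
    """Sketch: indexes are created once at construction, not per insert."""

    def __init__(self, db):
        self.db = db
        self._ensure_indexes()

    def _ensure_indexes(self):
        # create_index is idempotent in MongoDB, but there is no reason to
        # round-trip to the server on every single insertion.
        self.db.entities.create_index("ber_data_source")
        self.db.entities.create_index("data_type")
        self.db.entities.create_index("uri", unique=True)

    def insert_entity(self, entity: dict):
        self.db.entities.insert_one(entity)  # no index calls here
```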
```python
    "tests/data",
    "--clean",
]
with patch.object(sys, "argv", ingest_cli_args):
```
**Copilot AI** commented on Jul 23, 2025
The test patches `sys.argv` to invoke the ingest script. As noted in the TODO comment on lines 43-45, consider refactoring the ingest script to expose its core functionality as a callable function, eliminating the need for `sys.argv` patching.
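The usual shape of that refactor is a `main` that accepts an explicit argument list, so tests can call `main([...])` directly instead of patching `sys.argv`. A sketch — the flags match the ones used in this PR's test invocation, but the function body is elided:

```python
import argparse
import sys

def main(argv=None) -> int:
    """Entry point taking an explicit argv; argparse falls back to
    sys.argv[1:] when argv is None, so CLI behavior is unchanged."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mongo-uri", required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--clean", action="store_true")
    args = parser.parse_args(argv)
    # ... run the ingest using `args` ...
    return 0

if __name__ == "__main__":
    sys.exit(main())
```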
This PR is meant to make the stack Pydantic-friendly, including responses that match the Pydantic `Entity` models, validation, etc.
It also includes test code to enable Docker-container-based testing of the above.
The PR ended up being a lot broader in scope and covers: