-
Notifications
You must be signed in to change notification settings - Fork 84
ENG-1553 #6951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENG-1553 #6951
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (84.61%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #6951 +/- ##
==========================================
- Coverage 87.32% 87.31% -0.01%
==========================================
Files 525 525
Lines 34515 34526 +11
Branches 3984 3986 +2
==========================================
+ Hits 30140 30148 +8
- Misses 3509 3511 +2
- Partials 866 867 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Greptile OverviewGreptile SummaryThis PR addresses performance bottlenecks in the dataset reference validation step by eliminating N+1 query issues identified through Datadog logs. The changes use SQLAlchemy's Key optimizations:
The refactoring in Confidence Score: 5/5
Important Files ChangedFile Analysis
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, no comments
| name=connection_key, | ||
| collections=[collection], | ||
| connection_key=connection_key, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to say that I love the style of doing validation early and return/continue for failure so that the core logic doesn't end up nested seven layers deep 👍
thabofletcher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 🚀 🚀
Co-authored-by: Jade Wibbels <jade@ethyca.com>
Ticket ENG-1553
Description Of Changes
🎯 e2e tests were taking longer than usual to finish, they were taking 7-10 minutes to finish the ‘Dataset reference validation’ step.
Using Datadog logs located some bottleneck areas:
Performance impact
Before (from logs):
18 seconds from start to dataset parsing
Multiple rounds of dataset parsing (suggesting repeated validation)
159 datasets × 2 relationships = potentially 318+ queries just for dataset loading
After (expected):
~3 queries for dataset loading (1 for datasets, 1 for connection_configs, 1 for ctl_datasets)
~2 queries for connection configs (1 for configs, 1 for datasets)
~3 queries for manual tasks (1 for tasks, 1 for configs/fields, 1 for dependencies)
~3 queries for manual webhooks (1 for webhooks, 1 for connection_configs, 1 for systems)
Total: ~11 queries instead of potentially 500+
These changes should reduce the "Dataset reference validation" step time. The optimizations address the N+1 query issues identified in the logs.
Code Changes
src/fides/api/models/manual_webhook.py- useselectinloadsrc/fides/api/service/privacy_request/request_runner_service.py- useselectinloadsrc/fides/api/task/manual/manual_task_utils.py- useselectinloadand batch loadSteps to Confirm
Pre-Merge Checklist
CHANGELOG.mdupdatedmaindowngrade()migration is correct and works