@JadeCara JadeCara commented Nov 12, 2025

Ticket ENG-1553

Description Of Changes

🎯 e2e tests were taking longer than usual to finish: the ‘Dataset reference validation’ step alone was taking 7-10 minutes.

Datadog logs pointed to several bottlenecks:

  • DatasetConfig N+1 queries (lines 497-510)
  • ConnectionConfig N+1 queries (lines 532-540)
  • Manual Task N+1 queries (create_manual_task_artificial_graphs)
  • AccessManualWebhook N+1 queries (get_manual_webhook_access_inputs)
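
To make the N+1 pattern in the list above concrete, here is a minimal sketch using only the standard-library `sqlite3` module. The table and column names are hypothetical simplifications, not the real Fides schema, and `QueryCounter` is an illustrative helper. The batched variant issues the same kind of `WHERE ... IN (...)` query that SQLAlchemy's `selectinload` emits for a relationship.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the real models.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE connection_config (id INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE dataset_config (
        id INTEGER PRIMARY KEY,
        fides_key TEXT,
        connection_config_id INTEGER REFERENCES connection_config(id)
    );
    """
)
conn.executemany(
    "INSERT INTO connection_config (id, key) VALUES (?, ?)",
    [(i, f"conn_{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO dataset_config (id, fides_key, connection_config_id) VALUES (?, ?, ?)",
    [(i, f"dataset_{i}", i) for i in range(1, 6)],
)


class QueryCounter:
    """Runs SELECTs on a connection and counts how many were issued."""

    def __init__(self, connection):
        self.connection = connection
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.connection.execute(sql, params).fetchall()


def load_lazy(db):
    """N+1: one query for the datasets, then one more per dataset."""
    datasets = db.run("SELECT id, fides_key, connection_config_id FROM dataset_config")
    result = {}
    for _id, fides_key, cc_id in datasets:
        rows = db.run("SELECT key FROM connection_config WHERE id = ?", (cc_id,))
        result[fides_key] = rows[0][0]
    return result


def load_batched(db):
    """Batched: one query for the datasets, one IN query for all related rows."""
    datasets = db.run("SELECT id, fides_key, connection_config_id FROM dataset_config")
    cc_ids = sorted({cc_id for _id, _key, cc_id in datasets})
    placeholders = ", ".join("?" for _ in cc_ids)
    rows = db.run(
        f"SELECT id, key FROM connection_config WHERE id IN ({placeholders})",
        cc_ids,
    )
    key_by_id = dict(rows)
    return {fides_key: key_by_id[cc_id] for _id, fides_key, cc_id in datasets}


lazy_db = QueryCounter(conn)
lazy_result = load_lazy(lazy_db)
batched_db = QueryCounter(conn)
batched_result = load_batched(batched_db)
print(lazy_db.count, batched_db.count)
```

With five datasets the lazy path runs six queries and the batched path runs two, and both return the same mapping. The gap widens linearly with the number of rows.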

Performance impact

Before (from logs):

  • 18 seconds from start to dataset parsing
  • Multiple rounds of dataset parsing (suggesting repeated validation)
  • 159 datasets × 2 relationships = potentially 318+ queries just for dataset loading

After (expected):

  • ~3 queries for dataset loading (1 for datasets, 1 for connection_configs, 1 for ctl_datasets)
  • ~2 queries for connection configs (1 for configs, 1 for datasets)
  • ~3 queries for manual tasks (1 for tasks, 1 for configs/fields, 1 for dependencies)
  • ~3 queries for manual webhooks (1 for webhooks, 1 for connection_configs, 1 for systems)
  • Total: ~11 queries instead of potentially 500+

These changes should reduce the "Dataset reference validation" step time. The optimizations address the N+1 query issues identified in the logs.
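
The query counts above follow from simple arithmetic, sketched here. The function names are illustrative only, and the lazy figure is an upper bound: it assumes every relationship access triggers its own query, which may not hold when related rows are shared and already loaded in the session.

```python
def lazy_query_count(rows: int, relationships: int) -> int:
    """One query for the parent rows, plus one per row per lazily loaded relationship."""
    return 1 + rows * relationships


def eager_query_count(relationships: int) -> int:
    """One query for the parent rows, plus one IN query per eagerly loaded relationship."""
    return 1 + relationships


datasets = 159
relationships = 2  # connection_config and ctl_dataset

dataset_lazy = lazy_query_count(datasets, relationships)
dataset_eager = eager_query_count(relationships)

# Expected per-area counts from the description:
# datasets, connection configs, manual tasks, manual webhooks.
expected_total = sum([3, 2, 3, 3])

print(dataset_lazy, dataset_eager, expected_total)
```

For dataset loading this gives 319 queries in the lazy case (the 318 relationship lookups plus the initial query) against 3 in the eager case, and the four optimized areas sum to the ~11 queries quoted above.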

Code Changes

  • src/fides/api/models/manual_webhook.py - use selectinload
  • src/fides/api/service/privacy_request/request_runner_service.py - use selectinload
  • src/fides/api/task/manual/manual_task_utils.py - use selectinload and batch load

Steps to Confirm

  1. Run with fidesplus pointed at this branch.
  2. Create several DSRs. There should be no change in functionality.
  3. I am going to reach out to @nrxsmith to see whether there are any appreciable test-time differences on nightly once this is merged.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

codecov bot commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.31%. Comparing base (054f7de) to head (4c3411f).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
src/fides/api/task/manual/manual_task_utils.py | 80.95% | 2 Missing and 2 partials ⚠️

❌ Your patch status has failed because the patch coverage (84.61%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6951      +/-   ##
==========================================
- Coverage   87.32%   87.31%   -0.01%     
==========================================
  Files         525      525              
  Lines       34515    34526      +11     
  Branches     3984     3986       +2     
==========================================
+ Hits        30140    30148       +8     
- Misses       3509     3511       +2     
- Partials      866      867       +1     

@JadeCara JadeCara marked this pull request as ready for review November 12, 2025 16:13
@JadeCara JadeCara requested a review from a team as a code owner November 12, 2025 16:13
@JadeCara JadeCara requested review from thabofletcher and removed request for a team November 12, 2025 16:13
greptile-apps bot commented Nov 12, 2025

Greptile Overview

Greptile Summary

This PR addresses performance bottlenecks in the dataset reference validation step by eliminating N+1 query issues identified through Datadog logs. The changes use SQLAlchemy's selectinload to eagerly load relationships and batch loading to reduce query count from potentially 500+ to approximately 11 queries.

Key optimizations:

  • DatasetConfig loading: Eager loads connection_config and ctl_dataset relationships to avoid repeated queries when building dataset graphs
  • ConnectionConfig loading: Eager loads datasets relationship to prevent N+1 queries in filter_fides_connector_datasets
  • AccessManualWebhook loading: Eager loads connection_config.system relationship to avoid lazy loading during webhook processing
  • ManualTask loading: Refactored to batch load all manual tasks with their configs.field_definitions and conditional_dependencies relationships in a single query, replacing the previous approach of querying each connection key individually

The refactoring in manual_task_utils.py extracts common collection creation logic into a helper function (_create_collection_from_manual_task) to avoid code duplication while maintaining the same functionality. All changes are focused on query optimization without altering business logic.
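
A rough sketch of the batch-and-group shape described above, in plain Python with no database. Everything here is hypothetical: the `ManualTask` dataclass, its fields, and `build_collection` (a stand-in for the kind of helper the summary mentions) are illustrative and do not reproduce the real code in manual_task_utils.py.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ManualTask:
    """Hypothetical stand-in for a manual task row with its fields preloaded."""

    connection_key: str
    field_names: list[str] = field(default_factory=list)


def group_by_connection(tasks: list[ManualTask]) -> dict[str, list[ManualTask]]:
    """Group tasks that were fetched in one batch, instead of querying per key."""
    grouped: dict[str, list[ManualTask]] = defaultdict(list)
    for task in tasks:
        grouped[task.connection_key].append(task)
    return dict(grouped)


def build_collection(task: ManualTask) -> dict:
    """Shared helper so each call site builds a collection the same way."""
    return {"name": "manual_data", "fields": list(task.field_names)}


def build_graphs(tasks: list[ManualTask]) -> list[dict]:
    """Build one graph per connection key from the already-loaded tasks."""
    graphs = []
    for connection_key, connection_tasks in group_by_connection(tasks).items():
        graphs.append(
            {
                "name": connection_key,
                "collections": [build_collection(t) for t in connection_tasks],
                "connection_key": connection_key,
            }
        )
    return graphs


tasks = [
    ManualTask("conn_a", ["email"]),
    ManualTask("conn_b", []),
    ManualTask("conn_a", ["phone"]),
]
graphs = build_graphs(tasks)
print([g["name"] for g in graphs])
```

The point of the shape is that the number of database round trips no longer depends on how many connection keys exist: the tasks arrive in one batch and all per-key work happens in memory.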

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are focused performance optimizations using well-established SQLAlchemy patterns (eager loading with selectinload). The business logic remains unchanged, only the query strategy is optimized. The refactoring in manual_task_utils.py properly maintains the same functionality while eliminating N+1 queries through batch loading. All relationship paths being eagerly loaded are valid and exist in the models.
  • No files require special attention

Important Files Changed

File Analysis

Filename | Score | Overview
src/fides/api/models/manual_webhook.py | 5/5 | Optimized get_enabled method by eagerly loading connection_config.system relationship using selectinload to prevent N+1 queries when accessing webhook system information
src/fides/api/service/privacy_request/request_runner_service.py | 5/5 | Replaced DatasetConfig.all() and ConnectionConfig.all() with eager loading using selectinload to pre-fetch related connection_config, ctl_dataset, and datasets relationships, eliminating N+1 queries during dataset validation
src/fides/api/task/manual/manual_task_utils.py | 5/5 | Refactored create_manual_task_artificial_graphs to batch load all manual tasks with eager loading of configs.field_definitions and conditional_dependencies relationships, replacing per-connection queries with a single batched query

@greptile-apps greptile-apps bot left a comment

3 files reviewed, no comments

name=connection_key,
collections=[collection],
connection_key=connection_key,
)

I just want to say that I love the style of doing validation early and return/continue for failure so that the core logic doesn't end up nested seven layers deep 👍
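
For readers unfamiliar with the style being praised, a small generic illustration of guard clauses with `continue` follows. The function and the dictionary keys are hypothetical and are not taken from the PR.

```python
def collect_field_names(tasks: list[dict]) -> dict[str, list[str]]:
    """Collect field names per connection key, skipping invalid tasks early."""
    result: dict[str, list[str]] = {}
    for task in tasks:
        connection_key = task.get("connection_key")
        if not connection_key:
            continue  # nothing to attach this task to

        fields = task.get("fields") or []
        if not fields:
            continue  # no fields, so no collection to build

        # Core logic stays at one indentation level.
        result.setdefault(connection_key, []).extend(fields)
    return result


collected = collect_field_names(
    [
        {"connection_key": "conn_a", "fields": ["email"]},
        {"connection_key": None, "fields": ["ignored"]},
        {"connection_key": "conn_b", "fields": []},
        {"connection_key": "conn_a", "fields": ["phone"]},
    ]
)
print(collected)
```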

@thabofletcher thabofletcher left a comment

🚀 🚀 🚀

@JadeCara JadeCara enabled auto-merge November 17, 2025 18:35
@JadeCara JadeCara added this pull request to the merge queue Nov 17, 2025
@JadeCara JadeCara removed this pull request from the merge queue due to a manual request Nov 17, 2025
@JadeCara JadeCara enabled auto-merge November 17, 2025 19:12
@JadeCara JadeCara disabled auto-merge November 17, 2025 19:57
@JadeCara JadeCara enabled auto-merge November 17, 2025 23:16
@JadeCara JadeCara added this pull request to the merge queue Nov 17, 2025
@JadeCara JadeCara removed this pull request from the merge queue due to a manual request Nov 17, 2025
@JadeCara JadeCara enabled auto-merge November 18, 2025 00:29
@JadeCara JadeCara added this pull request to the merge queue Nov 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 18, 2025
@JadeCara JadeCara added this pull request to the merge queue Nov 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 18, 2025
@JadeCara JadeCara added this pull request to the merge queue Nov 19, 2025
Merged via the queue into main with commit 9042667 Nov 19, 2025
68 of 69 checks passed
@JadeCara JadeCara deleted the ENG-1553-long-dataset-reference-validation-process branch November 19, 2025 20:19
jjdaurora pushed a commit that referenced this pull request Dec 5, 2025
Co-authored-by: Jade Wibbels <jade@ethyca.com>