
perf(worker): Optimize flake processing by bulk updating testruns #853

Open
sentry[bot] wants to merge 1 commit into main from seer/perf/optimize-flake-processing-db

Conversation


sentry bot commented Apr 17, 2026 (Contributor)

Fixes WORKER-X12. The issue: process_flakes_for_commit iterated over uploads, calling get_testruns once per upload, causing N+1 database queries.

  • Refactored get_testruns into get_testruns_for_uploads to fetch testruns for multiple uploads simultaneously.
  • Modified get_testruns_for_uploads to return a dictionary mapping upload IDs to their respective testruns.
  • Updated process_single_upload to accept testruns as an argument, removing individual database fetches.
  • Centralized Testrun bulk updates in process_flakes_for_commit to perform a single update for all processed testruns, reducing database operations.
  • Improved overall database efficiency by reducing the number of queries and bulk update calls during flake processing.
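The batched fetch described above can be sketched as follows. This is an illustrative plain-Python stand-in, not the worker's actual code: the real get_testruns_for_uploads would issue a single ORM query (e.g. filtering on upload_id__in=upload_ids) rather than filtering in Python, but the return shape, a mapping from upload ID to its testruns, is the same.

```python
from collections import defaultdict


def get_testruns_for_uploads(all_testruns, upload_ids):
    # One pass over the candidate testruns, grouped by upload id.
    # Replaces N per-upload lookups with a single batched fetch.
    wanted = set(upload_ids)
    grouped = defaultdict(list)
    for testrun in all_testruns:
        if testrun["upload_id"] in wanted:
            grouped[testrun["upload_id"]].append(testrun)
    return dict(grouped)
```

Callers then index into the returned dictionary instead of querying the database inside the loop.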

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13507733

Not quite right? Click here to continue debugging with Seer.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as a result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


Note

Medium Risk: changes query/update behavior in flake detection by batching Testrun fetches and centralizing bulk_update, which could affect which rows are updated if upload IDs/filters are incorrect.

Overview
Improves process_flakes_for_commit performance by replacing per-upload get_testruns queries with a single get_testruns_for_uploads(upload_ids) query that groups recent Testruns by upload.

process_single_upload now consumes pre-fetched testruns, and outcome changes are persisted via one centralized Testrun.objects.bulk_update(...) after all uploads are processed (instead of per-upload updates).
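The new control flow can be sketched in plain Python. Here bulk_update is a hypothetical callback standing in for Testrun.objects.bulk_update(all_testruns, ["outcome"]), and process_single_upload is a placeholder that only marks outcomes so the batching shape is visible; neither reflects the worker's actual flake-detection logic.

```python
def process_single_upload(upload, testruns):
    # Placeholder for the real per-upload flake-detection logic:
    # it mutates testrun outcomes in place and no longer hits the database.
    for testrun in testruns:
        testrun["outcome"] = "flaky_fail"


def process_flakes_for_commit(uploads, testruns_by_upload, bulk_update):
    # Accumulate every testrun touched by any upload, then persist the
    # outcome changes with one write instead of one write per upload.
    all_testruns = []
    for upload in uploads:
        testruns = testruns_by_upload.get(upload["id"], [])
        process_single_upload(upload, testruns)
        all_testruns.extend(testruns)
    bulk_update(all_testruns)  # single centralized write
    return all_testruns
```

The key change is that the only write happens after the loop, which is also what makes the atomicity concern raised in the review below worth checking.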

Reviewed by Cursor Bugbot for commit 4bb87c4. Bugbot is set up for automated code reviews on this repo. Configure here.

extra={"upload": upload.id},
)

Testrun.objects.bulk_update(all_testruns, ["outcome"])


Bug: An exception during the processing loop in process_flakes_for_commit will cause all pending testrun outcome updates for the commit to be lost, as the final bulk_update is no longer atomic per-upload.
Severity: MEDIUM

Suggested Fix

Wrap the processing for each upload within the main loop of process_flakes_for_commit in a with transaction.atomic(): block. This will ensure that database operations for each upload are treated as a single atomic unit, preventing partial updates and data inconsistency if an error occurs mid-process.
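In Django the fix would wrap each iteration in a with transaction.atomic(): block. Its effect can be sketched with a plain-Python analog (hypothetical names; persist stands in for a per-upload bulk_update) in which each upload's updates are committed before the next upload is processed, so a failure partway through cannot discard earlier uploads' work:

```python
def process_flakes_for_commit_safely(uploads, testruns_by_upload, persist):
    # Per-upload persistence: if processing upload N raises, uploads
    # 0..N-1 have already been written, so their outcome updates are
    # not lost. This is the plain-Python analog of wrapping each
    # iteration in `with transaction.atomic():`.
    for upload in uploads:
        testruns = testruns_by_upload.get(upload["id"], [])
        for testrun in testruns:
            testrun["outcome"] = "flaky_fail"
        persist(testruns)  # stands in for a per-upload bulk_update
```

The trade-off is one write per upload again, which is exactly what the PR set out to avoid; an alternative is to keep the single bulk_update but ensure the loop cannot exit early with unsaved state.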

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L143

Potential issue: The function `process_flakes_for_commit` was refactored to perform a
single `bulk_update` of testrun outcomes after processing all uploads for a commit.
However, the processing loop still contains individual database writes, such as
`flake.save()` within `handle_pass`. If a database exception occurs during one of these
individual writes, the function will exit prematurely. As a result, the final
`bulk_update` is never executed, causing all accumulated testrun outcome updates for
that commit to be lost. This can lead to data inconsistency, where a flake's state is
updated but the corresponding testrun outcome is not.

Did we get this right? 👍 / 👎 to inform future reviews.


sentry bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.25%. Comparing base (0ad8a0c) to head (4bb87c4).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #853   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files        1307     1307           
  Lines       48017    48025    +8     
  Branches     1636     1636           
=======================================
+ Hits        44299    44307    +8     
  Misses       3407     3407           
  Partials      311      311           
Flag Coverage Δ
workerintegration 58.52% <13.33%> (-0.03%) ⬇️
workerunit 90.39% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


codecov-notifications bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!
