perf(worker): Optimize flake processing by bulk updating testruns#853
sentry[bot] wants to merge 1 commit into main
    extra={"upload": upload.id},
)

Testrun.objects.bulk_update(all_testruns, ["outcome"])
Bug: An exception during the processing loop in process_flakes_for_commit will cause all pending testrun outcome updates for the commit to be lost, as the final bulk_update is no longer atomic per-upload.
Severity: MEDIUM
Suggested Fix
Wrap the processing for each upload within the main loop of process_flakes_for_commit in a with transaction.atomic(): block. This will ensure that database operations for each upload are treated as a single atomic unit, preventing partial updates and data inconsistency if an error occurs mid-process.
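The per-upload atomic unit suggested above can be sketched without Django. This is a minimal illustration that swaps in the standard-library `sqlite3` module, whose `with conn:` block commits on success and rolls back on an exception, playing the role `transaction.atomic()` would play in the worker. The `testrun` table, the outcome strings, and the `process_uploads` helper are assumptions made for the example, not code from this PR.

```python
import sqlite3


def process_uploads(conn, uploads):
    """Apply each upload's outcome updates in its own transaction.

    `uploads` maps a hypothetical upload id to (testrun_id, new_outcome)
    pairs. An outcome of None simulates a failure partway through an upload.
    """
    failed = []
    for upload_id, updates in uploads.items():
        try:
            # `with conn:` commits on success and rolls back on exception,
            # so one upload's writes land together or not at all.
            with conn:
                for testrun_id, outcome in updates:
                    if outcome is None:
                        raise ValueError(f"bad outcome in upload {upload_id}")
                    conn.execute(
                        "UPDATE testrun SET outcome = ? WHERE id = ?",
                        (outcome, testrun_id),
                    )
        except ValueError:
            failed.append(upload_id)
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testrun (id INTEGER PRIMARY KEY, outcome TEXT)")
conn.executemany(
    "INSERT INTO testrun VALUES (?, ?)",
    [(1, "failure"), (2, "failure"), (3, "failure")],
)
conn.commit()

# Upload 10 succeeds; upload 11 fails after its first write.
failed = process_uploads(
    conn,
    {10: [(1, "flaky_fail")], 11: [(2, "flaky_fail"), (3, None)]},
)
outcomes = dict(conn.execute("SELECT id, outcome FROM testrun"))
```

In this sketch the failed upload's first write is rolled back while the earlier upload's update stays committed, which is the partial-failure behavior the suggested fix is after.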
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: apps/worker/services/test_analytics/ta_process_flakes.py#L143
Potential issue: The function `process_flakes_for_commit` was refactored to perform a
single `bulk_update` of testrun outcomes after processing all uploads for a commit.
However, the processing loop still contains individual database writes, such as
`flake.save()` within `handle_pass`. If a database exception occurs during one of these
individual writes, the function will exit prematurely. As a result, the final
`bulk_update` is never executed, causing all accumulated testrun outcome updates for
that commit to be lost. This can lead to data inconsistency, where a flake's state is
updated but the corresponding testrun outcome is not.
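The failure mode described above can be reproduced in plain Python. The sketch below is a simulation under stated assumptions: `process_commit`, the `"boom"` marker, and the `flush` callback are hypothetical stand-ins for the processing loop, a failing per-row write such as `flake.save()`, and the final `bulk_update`.

```python
def process_commit(uploads, flush):
    """Accumulate outcome updates across uploads, then flush once at the end."""
    pending = []
    for upload_id, testruns in uploads.items():
        for testrun in testruns:
            if testrun == "boom":
                # Stand-in for a failing per-row write inside the loop.
                raise RuntimeError(f"write failed in upload {upload_id}")
            pending.append((upload_id, testrun))
    # Never reached if the loop raises, so `pending` is silently dropped.
    flush(pending)


flushed = []
try:
    process_commit({1: ["a", "b"], 2: ["boom"]}, flushed.extend)
except RuntimeError:
    pass

# A run with no failure flushes everything in one call.
flushed_ok = []
process_commit({1: ["a", "b"], 2: ["c"]}, flushed_ok.extend)
```

Here the updates collected for upload 1 are never flushed because upload 2 raises first, even though upload 1 itself was processed without error.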
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #853   +/-   ##
=======================================
  Coverage   92.25%   92.25%
=======================================
  Files        1307     1307
  Lines       48017    48025    +8
  Branches     1636     1636
=======================================
+ Hits        44299    44307    +8
  Misses       3407     3407
  Partials      311      311
Flags with carried forward coverage won't be shown.
Fixes WORKER-X12. The issue was that `process_flakes_for_commit` iterates uploads, calling `get_testruns` per upload, causing N+1 database queries.

- `get_testruns` becomes `get_testruns_for_uploads`, which fetches testruns for multiple uploads at once.
- `get_testruns_for_uploads` returns a dictionary mapping upload IDs to their respective testruns.
- `process_single_upload` accepts testruns as an argument, removing individual database fetches.
- `Testrun` bulk updates in `process_flakes_for_commit` are consolidated into a single update for all processed testruns, reducing database operations.

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13507733
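The batching described above, one query for all uploads with results grouped by upload ID, might look roughly like the following. This is a sketch and not the PR's implementation: it uses the standard-library `sqlite3` module instead of the Django ORM, and the function signature, table schema, and sample rows are assumptions. It also omits whatever filtering the real function applies when selecting testruns.

```python
import sqlite3


def get_testruns_for_uploads(conn, upload_ids):
    """Fetch testruns for many uploads in one query, grouped by upload id."""
    # Pre-seed every requested id so uploads without testruns map to [].
    grouped = {upload_id: [] for upload_id in upload_ids}
    if not upload_ids:
        return grouped
    placeholders = ", ".join("?" for _ in upload_ids)
    rows = conn.execute(
        "SELECT id, upload_id, outcome FROM testrun "
        f"WHERE upload_id IN ({placeholders}) ORDER BY id",
        list(upload_ids),
    )
    for testrun_id, upload_id, outcome in rows:
        grouped[upload_id].append((testrun_id, outcome))
    return grouped


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testrun (id INTEGER PRIMARY KEY, upload_id INTEGER, outcome TEXT)"
)
conn.executemany(
    "INSERT INTO testrun VALUES (?, ?, ?)",
    [(1, 10, "pass"), (2, 10, "failure"), (3, 11, "pass"), (4, 12, "pass")],
)
conn.commit()

# One query covers uploads 10, 11, and 13; upload 12 is not requested.
by_upload = get_testruns_for_uploads(conn, [10, 11, 13])
```

Pre-seeding the dictionary means a caller can look up any requested upload ID without a `KeyError`, which keeps the per-upload processing step free of its own database fetch.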
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note
Medium Risk
Moderate risk: changes query/update behavior in flake detection by batching `Testrun` fetches and centralizing `bulk_update`, which could affect which rows are updated if upload IDs/filters are incorrect.
Overview
Improves `process_flakes_for_commit` performance by replacing per-upload `get_testruns` queries with a single `get_testruns_for_uploads(upload_ids)` query that groups recent `Testrun`s by upload. `process_single_upload` now consumes pre-fetched testruns, and outcome changes are persisted via one centralized `Testrun.objects.bulk_update(...)` after all uploads are processed (instead of per-upload updates).
Reviewed by Cursor Bugbot for commit 4bb87c4. Bugbot is set up for automated code reviews on this repo.