fix(test-analytics): Add unique constraint to Flake model #841
sentry[bot] wants to merge 1 commit into main
Conversation
```python
migrations.AddConstraint(
    model_name="flake",
    constraint=models.UniqueConstraint(
        fields=["repoid", "test_id"],
        name="test_analytics_flake_repoid_test_id_unique",
    ),
),
```
Bug: The database migration to add a unique constraint on (repoid, test_id) is missing a step to deduplicate existing data, which will cause the migration to fail during deployment.
Severity: CRITICAL
Suggested Fix
Add a RunPython or RunSQL operation to the migration file before the AddConstraint operation. This operation should identify and remove duplicate Flake records, keeping only one instance for each unique (repoid, test_id) pair, before the unique constraint is applied.
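The suggested cleanup step can be sketched as follows. This is a minimal SQLite stand-in for the Postgres deduplication, not the actual Django `RunPython`/`RunSQL` migration; the table shape is invented for illustration and only the constraint name is taken from the diff above. The key ordering requirement is that the `DELETE` runs before the unique index is created:

```python
import sqlite3

# Illustrative stand-in for the flake table (shape assumed, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flake (id INTEGER PRIMARY KEY, repoid INTEGER, test_id TEXT)")
conn.executemany(
    "INSERT INTO flake (repoid, test_id) VALUES (?, ?)",
    [(1, "t1"), (1, "t1"), (1, "t2"), (2, "t1")],  # (1, "t1") is duplicated
)

# Dedup step: keep the lowest id per (repoid, test_id), delete the rest.
conn.execute(
    """
    DELETE FROM flake
    WHERE id NOT IN (
        SELECT MIN(id) FROM flake GROUP BY repoid, test_id
    )
    """
)

# Only after deduplication is it safe to apply the unique constraint.
conn.execute(
    "CREATE UNIQUE INDEX test_analytics_flake_repoid_test_id_unique"
    " ON flake (repoid, test_id)"
)
print(conn.execute("SELECT COUNT(*) FROM flake").fetchone()[0])  # 3 rows remain
```

In a real Django migration the same `DELETE` would go in a `migrations.RunSQL` (or a `RunPython` using the ORM) operation placed before the `AddConstraint` in the migration's `operations` list.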
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location:
libs/shared/shared/django_apps/test_analytics/migrations/0008_flake_repoid_test_id_unique.py#L10-L16
Potential issue: The new database migration in `0008_flake_repoid_test_id_unique.py`
adds a `UniqueConstraint` for the `(repoid, test_id)` fields on the `Flake` model.
However, it does not include a data migration step to remove existing duplicate entries
that were created by previous buggy code. When this migration is run on a database
containing such duplicates, PostgreSQL will raise an error because the unique constraint
cannot be applied. This will cause the entire migration process to fail, blocking
deployments and requiring manual database intervention to resolve.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit b7ed358. Configure here.
```python
        fields=["repoid", "test_id"],
        name="test_analytics_flake_repoid_test_id_unique",
    ),
),
```
Migration will fail if duplicate flake rows exist
Medium Severity
The migration adds a UniqueConstraint on (repoid, test_id) without first cleaning up any existing duplicate rows. Since the PR acknowledges the previous bulk_create used incorrect unique_fields=["id"], duplicate (repoid, test_id) pairs may already exist in production. If they do, PostgreSQL will reject the constraint with an IntegrityError, causing the migration to fail and blocking deployment. A data migration step to deduplicate existing rows is needed before adding the constraint.
```python
        fields=["repoid", "test_id"],
        name="test_analytics_flake_repoid_test_id_unique",
    ),
]
```
Redundant index duplicates new unique constraint's index
Low Severity
The new UniqueConstraint on ["repoid", "test_id"] implicitly creates a unique B-tree index in PostgreSQL. The pre-existing models.Index(fields=["repoid", "test_id"]) on the same columns is now fully redundant. Maintaining both means double the write overhead and storage for an identical index. The explicit Index on these fields can be removed (and a migration generated to drop it).
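To illustrate why the plain index becomes redundant: the unique constraint's backing index already serves equality lookups on (repoid, test_id). A minimal SQLite sketch (only the index name is taken from the migration; the table shape is assumed, and the behavior is analogous in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flake (id INTEGER PRIMARY KEY, repoid INTEGER, test_id TEXT)")
# The unique constraint's backing index.
conn.execute(
    "CREATE UNIQUE INDEX test_analytics_flake_repoid_test_id_unique"
    " ON flake (repoid, test_id)"
)

# A (repoid, test_id) lookup is already served by the unique index, so a
# second plain index on the same columns would only add write and storage cost.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM flake WHERE repoid = ? AND test_id = ?",
    (1, "t1"),
).fetchall()
print(plan)  # plan references test_analytics_flake_repoid_test_id_unique
```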
```diff
  curr_flakes.values(),
  update_conflicts=True,
- unique_fields=["id"],
+ unique_fields=["repoid", "test_id"],
```
Re-detected flake preserves stale start_date after expiry
Medium Severity
When a flake expires (30 passes → end_date set, removed from dict) and the same test subsequently fails, handle_failure creates a new in-memory Flake with a fresh start_date. With the old unique_fields=["id"], this would insert a new row with the correct start_date. With the new unique_fields=["repoid", "test_id"], it conflicts with the expired DB row and does an UPDATE — but start_date is not in update_fields, so the original (potentially months-old) start_date is silently preserved. This makes the re-detected flake appear to have started far earlier than it actually did.
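The described upsert behavior can be reproduced with a minimal SQLite sketch (the column names mirror the report, not the actual worker code, and SQLite's `ON CONFLICT ... DO UPDATE` stands in for Django's `bulk_create(update_conflicts=True)`). The second upsert shows one possible fix: including `start_date` among the updated columns when re-opening a flake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flake (
        repoid INTEGER, test_id TEXT, start_date TEXT, end_date TEXT,
        UNIQUE (repoid, test_id)
    )"""
)
# An expired flake row from months ago (end_date was set on expiry).
conn.execute("INSERT INTO flake VALUES (1, 't1', '2024-01-01', '2024-02-01')")

# Re-detected flake: the upsert conflicts on (repoid, test_id) and UPDATEs,
# but start_date is not in the update column list, so the old value survives.
conn.execute(
    """INSERT INTO flake VALUES (1, 't1', '2025-06-01', NULL)
       ON CONFLICT (repoid, test_id) DO UPDATE SET end_date = excluded.end_date"""
)
row = conn.execute("SELECT start_date, end_date FROM flake").fetchone()
print(row)  # ('2024-01-01', None): stale start_date, as the report describes

# Also updating start_date on conflict avoids the stale value.
conn.execute(
    """INSERT INTO flake VALUES (1, 't1', '2025-06-01', NULL)
       ON CONFLICT (repoid, test_id) DO UPDATE
       SET start_date = excluded.start_date, end_date = excluded.end_date"""
)
print(conn.execute("SELECT start_date FROM flake").fetchone()[0])  # '2025-06-01'
```

Note that unconditionally adding `start_date` to `update_fields` would reset it on every conflicting write, so the real fix likely needs to distinguish re-opened flakes from ordinary updates.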
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #841   +/-   ##
=======================================
  Coverage   92.25%   92.25%
=======================================
  Files        1307     1308       +1
  Lines       48017    48022       +5
  Branches     1636     1636
=======================================
+ Hits        44299    44304       +5
  Misses       3407     3407
  Partials      311      311
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


Fixes WORKER-Y89. The issue was that individual `Flake` object saves in a loop cause N+1 queries, exacerbated by incorrect `bulk_create` `unique_fields`.

- Adds a unique constraint to the `Flake` model on `repoid` and `test_id` to prevent duplicate entries.
- Updates `bulk_create` operations for `Flake` objects to use `repoid` and `test_id` as `unique_fields` for conflict resolution, aligning with the new constraint.

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13448192
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note
Medium Risk
Includes a new database uniqueness constraint and changes upsert conflict keys, which could surface existing duplicates or cause unexpected write failures during migration/deploy.
Overview
Prevents duplicate `Flake` rows by enforcing uniqueness on (`repoid`, `test_id`) and aligning write paths to upsert on that key.

Adds a Django `UniqueConstraint` plus migration for the `Flake` model, and updates the `bulk_create(..., update_conflicts=True)` calls in flake-processing workers to use `unique_fields=["repoid", "test_id"]` instead of `id` when resolving conflicts.