fix: use NOT EXISTS for superseding score set filter to prevent row m…#707

Open
bencap wants to merge 1 commit into release-2026.1.3 from bugfix/bencap/675/search-row-multiplication

Collaborator

@bencap bencap commented Apr 14, 2026

The score set search query filtered out superseded score sets using a LEFT OUTER JOIN on the superseding_score_set relationship. Because replaces_id has no unique constraint, score sets with multiple superseding versions produced N rows per original, all counted against the SQL LIMIT. This caused paginated searches to return fewer unique score sets than requested (~84 instead of 100 on prod).
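The row multiplication described above can be reproduced at the SQL level. The following is a minimal sketch using Python's built-in `sqlite3`, not the project's actual ORM query: the table layout, the `published` column, and the exact shape of the OR filter are assumptions made for illustration. Only `replaces_id` and its lack of a unique constraint come from the description.

```python
import sqlite3

# Hypothetical, simplified schema: only replaces_id is taken from the PR text.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE score_sets (
    id INTEGER PRIMARY KEY,
    replaces_id INTEGER REFERENCES score_sets(id),  -- no UNIQUE constraint
    published INTEGER NOT NULL
);
-- Score sets 1-3 are published originals.
-- Score sets 4 and 5 are two unpublished superseders of score set 1.
INSERT INTO score_sets VALUES
    (1, NULL, 1), (2, NULL, 1), (3, NULL, 1), (4, 1, 0), (5, 1, 0);
""")

# Assumed form of the old filter: keep a score set when it has no
# superseder, or when the joined superseder is not yet published.
left_join_sql = """
SELECT s.id
FROM score_sets AS s
LEFT OUTER JOIN score_sets AS sup ON sup.replaces_id = s.id
WHERE s.published = 1 AND (sup.id IS NULL OR sup.published = 0)
ORDER BY s.id
LIMIT 3
"""
joined_ids = [row[0] for row in demo.execute(left_join_sql)]

# Score set 1 matches two superseder rows, so it occupies two of the
# three LIMIT slots and score set 3 is pushed off the page.
print(joined_ids)            # [1, 1, 2]
print(len(set(joined_ids)))  # 2 unique score sets, although 3 were requested
```

The duplicates are created by the join before `LIMIT` is applied, so deduplicating the page afterwards cannot recover the missing score sets.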

Replace the LEFT JOIN + OR filter with a NOT EXISTS subquery (via .has()), which produces exactly one row per score set regardless of how many superseders exist. Also strengthen the regression test to use multiple keywords per experiment and add a new test for the multiple-superseders scenario.
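For comparison, here is a sketch of the NOT EXISTS form, again written as raw SQL against `sqlite3` rather than the `.has()` expression used in the PR. The schema, the `published` column, and the condition inside the subquery are hypothetical. The point it illustrates is that a correlated subquery filters rows without adding any.

```python
import sqlite3

# Hypothetical, simplified schema: only replaces_id is taken from the PR text.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE score_sets (
    id INTEGER PRIMARY KEY,
    replaces_id INTEGER REFERENCES score_sets(id),  -- no UNIQUE constraint
    published INTEGER NOT NULL
);
-- Score sets 1-3 are published originals.
-- Score sets 4 and 5 are two unpublished superseders of score set 1.
-- Score set 6 is a published superseder of score set 2.
INSERT INTO score_sets VALUES
    (1, NULL, 1), (2, NULL, 1), (3, NULL, 1),
    (4, 1, 0), (5, 1, 0), (6, 2, 1);
""")

# Assumed filter: hide a score set only when a published superseder exists.
not_exists_sql = """
SELECT s.id
FROM score_sets AS s
WHERE s.published = 1
  AND NOT EXISTS (
      SELECT 1
      FROM score_sets AS sup
      WHERE sup.replaces_id = s.id AND sup.published = 1
  )
ORDER BY s.id
LIMIT 3
"""
page_ids = [row[0] for row in db.execute(not_exists_sql)]

# Score set 1 appears once despite having two superseders, score set 2 is
# hidden by its published superseder, and the page holds 3 unique ids.
print(page_ids)  # [1, 3, 6]
```

Because the subquery only yields true or false for each outer row, the number of superseders no longer affects how many rows count against `LIMIT`.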

Opens #706, which tracks the root cause of this issue. This fix mitigates the consequences for the search endpoint specifically, but it does not address every problem caused by the bug.

Although it turns out the joined_loads weren't the root cause of this specific issue, I'm leaving the new select_in_loads as they still represent an improvement over the prior code and could protect us from future row multiplication.


Development

Successfully merging this pull request may close these issues.

Score set search returns fewer results than expected due to row multiplication in query