Skip to content

Require project_id on list endpoints for the four hot tables#1250

Merged
mihow merged 2 commits intomainfrom
fix/require-project-on-hot-list-endpoints
Apr 20, 2026
Merged

Require project_id on list endpoints for the four hot tables#1250
mihow merged 2 commits intomainfrom
fix/require-project-on-hot-list-endpoints

Conversation

@mihow
Copy link
Copy Markdown
Collaborator

@mihow mihow commented Apr 17, 2026

Summary

Opt-in alternative to #1133. Adds a require_project_for_list flag to ProjectMixin (default False) and opts in only the four list endpoints that cause real damage when scanned unfiltered:

  • ClassificationViewSet
  • OccurrenceViewSet
  • DetectionViewSet
  • SourceImageViewSet

Also sets require_project = True on SummaryView and removes the unfiltered else branch.

Context

On 2026-02-16 the Facebook crawler (meta-externalagent) hit /api/v2/classifications/ 13k+ times in a day without a project filter. The resulting COUNT(*) on the classifications table took ~3.6 min per request and dragged P99 latency past 2 min. The crawler is now blocked at nginx, but unfiltered list queries from any client remain a foot-gun on the four largest tables.

Why opt-in, not opt-out

The draft in #1133 flips the ProjectMixin default to require_project_for_list = True. That sounds narrower than it is: LimitOffsetPaginationWithPermissions (the global default pagination class, ami/base/pagination.py:29-30) calls view.get_active_project() on every paginated list response for any ProjectMixin viewset — not just viewsets that call it themselves. Flipping the default therefore affects ~20 viewsets, and silently breaks any of them that don't get an explicit opt-out, including:

  • GET /api/v2/projects/ (listing projects themselves — used on UI load)
  • GET /api/v2/algorithms/, /api/v2/pipelines/ (admin ML views)
  • GET /api/v2/sites/, /api/v2/devices/, /api/v2/storage/

Opt-in keeps the change scoped to the tables we actually want to protect and needs no audit of the other 15+ viewsets. The default-flip can come as a follow-up once those opt-outs are verified.

Changes

File Change
ami/base/views.py Add require_project_for_list = False to ProjectMixin; enforce in get_active_project() when action == "list"
ami/main/api/views.py Set require_project_for_list = True on Classification/Occurrence/Detection/SourceImage viewsets
ami/main/api/views.py DetectionViewSet.list() calls self.get_active_project() up front so the 400 fires before DRF pagination's queryset.count() runs a full-table scan. The other three hot viewsets already call get_active_project() inside get_queryset(), so the check lands before pagination's COUNT naturally.
ami/main/api/views.py SummaryView: require_project = True + delete unfiltered else branch
ami/main/tests.py New TestProjectRequiredOnListEndpoints: 4 hot endpoints 400/200, summary 400/200, regression guard for /projects/ and /taxa/
ami/main/tests.py Update two pre-existing tests for the new contract: TestDraftProjectPermissions.test_summary_counts asserts 400 on global /status/summary/, and TestProjectDefaultThresholdFilter.test_no_project_id_requires_project asserts 400 on unfiltered /api/v2/occurrences/

Behavior

  • GET /api/v2/classifications/ → 400 {"project_id": ["This field is required."]}
  • GET /api/v2/classifications/?project_id=1 → 200
  • GET /api/v2/classifications/123/ → 200 (detail view unaffected)
  • GET /api/v2/projects/ → 200 (unchanged)
  • GET /api/v2/taxa/ → 200 (unchanged)
  • GET /api/v2/status/summary/ → 400
  • GET /api/v2/status/summary/?project_id=1 → 200

Test plan

  • New TestProjectRequiredOnListEndpoints tests pass (4 cases) — 400 on missing project_id for the 4 hot endpoints, 200 with project_id, regression guard for /projects/ and /taxa/.
  • Updated summary / occurrence tests pass locally under docker compose -f docker-compose.ci.yml run --rm django python manage.py test ami.main.tests.TestDraftProjectPermissions.test_summary_counts ami.main.tests.TestProjectDefaultThresholdFilter ami.main.tests.TestProjectRequiredOnListEndpoints.
  • Smoke-test the UI home page (project list) loads.
  • Smoke-test occurrence/capture/classification galleries load with project_id.

Follow-ups (not in this PR)

  • Add composite index main_detection(occurrence_id, timestamp DESC) #1251: composite index main_detection(occurrence_id, timestamp DESC) for the /captures/ join hot path (separate migration PR).
  • Tighten the frontend useStatus hook typing — projectId?: stringprojectId: string — so the 400 contract on SummaryView is reflected at the call-site level (current callers already pass it).
  • Full audit of the remaining ProjectMixin viewsets if/when we flip the mixin default.

Adds `require_project_for_list` flag to ProjectMixin (default False) and
opts in ClassificationViewSet, OccurrenceViewSet, DetectionViewSet, and
SourceImageViewSet. Also sets require_project=True on SummaryView and
removes the unfiltered else branch.

Context: on 2026-02-16 the Facebook crawler hit /api/v2/classifications/
13k+ times without a project filter, causing 3.6 min COUNT(*) queries
and P99 > 2 min. The crawler is now blocked at nginx, but unfiltered
list queries from any client remain a risk.

This is the opt-in alternative to #1133: flipping the mixin default to
True has too wide a blast radius (pagination calls get_active_project
on every ProjectMixin viewset, so `/projects/`, `/algorithms/`, etc.
would all 400 without opt-outs). Opt-in keeps the change scoped to the
hot tables we actually want to protect.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 20:48
@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 17, 2026

Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit 479ad8a
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/69e2ca72eba77c0008b118b6

@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 17, 2026

Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 479ad8a
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/69e2ca7253c9ab0008cf4581

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

The pull request refactors project context enforcement across API endpoints by introducing a require_project_for_list attribute to ProjectMixin. This allows selective project requirement enforcement for list actions separately from other actions. Multiple viewsets are updated to enforce this on their list endpoints, and SummaryView is simplified to always require project context.

Changes

Cohort / File(s) Summary
ProjectMixin Enhancement
ami/base/views.py
Added require_project_for_list class attribute and modified get_active_project() to compute action-dependent project requirement logic; changed exception message from specific ("Project ID is required but was not provided or is invalid") to generic ("Project not found.") to rely on serializer validation.
ViewSet List Endpoint Configuration
ami/main/api/views.py
Added require_project_for_list = True to four viewsets (SourceImageViewSet, DetectionViewSet, OccurrenceViewSet, ClassificationViewSet); simplified SummaryView.get() to always return project-scoped aggregates by removing conditional branching and the non-project variant including last_updated field.
Test Coverage Addition
ami/main/tests.py
New TestProjectRequiredOnListEndpoints test suite verifying project requirement enforcement on list endpoints; validates that hotly-accessed endpoints return 400 without project_id and 200 with it, plus regression coverage for unrelated endpoints.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ViewSet as ViewSet<br/>(ProjectMixin)
    participant ProjectMixin as ProjectMixin<br/>.get_active_project()
    participant Serializer as Serializer<br/>(Validation)

    Client->>ViewSet: GET /api/v2/classifications/ (list)
    ViewSet->>ProjectMixin: get_active_project()<br/>(action="list")
    ProjectMixin->>ProjectMixin: Compute required flag<br/>(require_project_for_list=True)
    alt project_id provided & valid
        ProjectMixin->>Serializer: Validate project_id
        Serializer->>Serializer: Valid
        Serializer-->>ViewSet: Validated project
        ViewSet-->>Client: 200 OK<br/>(project-scoped data)
    else project_id missing or invalid
        ProjectMixin->>ViewSet: Return None or raise
        ViewSet->>Serializer: Validate required project_id
        Serializer->>Serializer: Raise ValidationError
        Serializer-->>Client: 400 BAD_REQUEST
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

backend

Poem

🐰 A fluffy hop through API flows,

Where projects now must show,

List endpoints stand with stricter ground,

No wandering without bounds!

SummaryView's logic flows more clear,

Project context, always here! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: requiring project_id on list endpoints for the four high-traffic 'hot table' viewsets (classifications, occurrences, detections, source images).
Description check ✅ Passed The PR description comprehensively covers all required template sections with clear explanations of changes, context, behavior, testing, and follow-ups.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/require-project-on-hot-list-endpoints

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in mechanism to require project_id on selected high-cost list endpoints to prevent unfiltered full-table scans, while keeping existing list endpoints unchanged unless explicitly opted in.

Changes:

  • Introduces require_project_for_list (default False) in ProjectMixin and enforces it via get_active_project() for action == "list".
  • Opts in the four “hot table” viewsets (classifications/occurrences/detections/captures) to require project_id for list requests; makes /status/summary/ require a project and removes the unscoped branch.
  • Adds API tests to assert 400/200 behavior for the protected endpoints and regression-guard unaffected list endpoints.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
ami/base/views.py Adds require_project_for_list and updates get_active_project() enforcement logic.
ami/main/api/views.py Opts specific viewsets into list-time project enforcement; requires project for SummaryView.
ami/main/tests.py Adds tests covering required/accepted behavior for protected list endpoints and guards unaffected endpoints.
Comments suppressed due to low confidence (1)

ami/main/api/views.py:926

  • DetectionViewSet opts into require_project_for_list, but the view never calls get_active_project() in list()/get_queryset(). With DRF LimitOffsetPagination, the expensive queryset.count() happens in paginate_queryset() before get_paginated_response() calls view.get_active_project(), so an unfiltered request can still trigger a full-table COUNT even though it ultimately returns 400. To actually prevent the scan, force self.get_active_project() to run before pagination (e.g., at the start of list() or inside get_queryset()).
    require_project_for_list = True  # Unfiltered list scans are too expensive on this table
    queryset = Detection.objects.exclude(NULL_DETECTIONS_FILTER).select_related("source_image", "detection_algorithm")
    serializer_class = DetectionSerializer
    filterset_fields = ["source_image", "detection_algorithm", "source_image__project"]
    ordering_fields = ["created_at", "updated_at", "detection_score", "timestamp"]

    def get_serializer_class(self):
        """
        Return different serializers for list and detail views.
        """
        if self.action == "list":
            return DetectionListSerializer
        else:
            return DetectionSerializer

    @extend_schema(parameters=[project_id_doc_param])
    def list(self, request, *args, **kwargs):
        return super().list(request, *args, **kwargs)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ami/base/views.py Outdated
@mihow
Copy link
Copy Markdown
Collaborator Author

mihow commented Apr 17, 2026

@coderabbitai ready for review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

@mihow Sure, I'll start reviewing the PR now!

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
ami/main/tests.py (2)

2479-2489: ⚠️ Potential issue | 🔴 Critical

Update this test for the guarded occurrence list.

/api/v2/occurrences/ is one of the hot list endpoints now requiring project_id; this test still expects the old unfiltered 200 response and is failing CI.

🐛 Proposed fix
-    def test_no_project_id_returns_all(self):
-        """Without project_id, threshold falls back to 0.0 and returns all occurrences"""
+    def test_no_project_id_requires_project(self):
+        """Occurrence list requires project_id to avoid unfiltered hot-table scans."""
         url = "/api/v2/occurrences/?limit=1000"
         res = self.client.get(url)
-        self.assertEqual(res.status_code, status.HTTP_200_OK)
-
-        # Check that our test occurrences are present (don't assume all in DB are ours)
-        expected_ids = {occ.pk for occ in list(self.high_occurrences) + list(self.low_occurrences)}
-        returned_ids = {o["id"] for o in res.data["results"]}
-
-        self.assertTrue(expected_ids.issubset(returned_ids), f"Missing occurrence IDs: {expected_ids - returned_ids}")
+        self.assertEqual(res.status_code, status.HTTP_400_BAD_REQUEST)
+        self.assertIn("project_id", res.data)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/tests.py` around lines 2479 - 2489, The test
test_no_project_id_returns_all must be updated because /api/v2/occurrences/ now
requires a project_id; modify the test to supply a valid project_id query param
(e.g., append ?project_id=<test_project.pk>&limit=1000 or include project_id in
the existing URL construction) and adjust assertions to only expect occurrences
for that project (build expected_ids from
self.high_occurrences/self.low_occurrences filtered by that project or use the
specific test_project occurrences), then assert the returned_ids contain that
filtered expected_ids and that the response status remains HTTP_200_OK.

2315-2354: ⚠️ Potential issue | 🔴 Critical

Update stale global summary assertions.

SummaryView.require_project=True makes global_url return 400 now, so this test’s global-summary block contradicts the new contract and is failing CI. Keep the project-scoped visibility assertions, and remove or rewrite the global comparison as an explicit “requires project_id” assertion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/tests.py` around lines 2315 - 2354, The global-summary assertions
are now stale because SummaryView.require_project=True; replace the existing
global block that calls global_url and asserts 200 with assertions that the
endpoint requires a project_id: call _auth_get(self.outsider, global_url) and
_auth_get(self.member, global_url) and assert their status_code is 400 (or the
view's error code) and optionally assert the response message indicates
"requires project_id" (check outsider_global_response.json() /
member_global_response.json() for the exact error field); leave the
project-scoped checks (project_url, draft visibility) untouched and remove any
comparisons between outsider_global_response and member_global_response that
assume successful global summaries.
ami/main/api/views.py (2)

909-926: ⚠️ Potential issue | 🟠 Major

Add missing require_project_for_list flag and wire project scoping to DetectionViewSet.

The code doesn't currently set require_project_for_list = True and lacks a get_queryset() method to enforce project scoping. Without both, the /api/v2/detections/ list endpoint bypasses project isolation despite having project_id as a filterable field.

Add the flag and implement project filtering following the established pattern in SourceImageViewSet:

Proposed fix
class DetectionViewSet(DefaultViewSet, ProjectMixin):
    """
    API endpoint that allows detections to be viewed or edited.
    """

+    require_project_for_list = True  # Unfiltered list scans are too expensive on this table
    queryset = Detection.objects.exclude(NULL_DETECTIONS_FILTER).select_related("source_image", "detection_algorithm")
    serializer_class = DetectionSerializer
    filterset_fields = ["source_image", "detection_algorithm", "source_image__project"]
    ordering_fields = ["created_at", "updated_at", "detection_score", "timestamp"]

+    def get_queryset(self):
+        qs = super().get_queryset()
+        project = self.get_active_project()
+        if project:
+            qs = qs.filter(source_image__project=project)
+        return qs
+
    def get_serializer_class(self):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/api/views.py` around lines 909 - 926, DetectionViewSet is missing
the require_project_for_list guard and a project-aware queryset; set
require_project_for_list = True and implement get_queryset(self) (or override
get_queryset in the class that currently defines
queryset/select_related/serializer_class and list logic) to read the current
project from self.request (or use the same project-extraction helper used by
SourceImageViewSet), filter Detection.objects.exclude(NULL_DETECTIONS_FILTER) by
source_image__project equal to the current project when a project is present,
preserve the existing select_related("source_image", "detection_algorithm") and
ordering/filterset behavior, and ensure list uses this filtered queryset so the
/api/v2/detections/ endpoint enforces project scoping.

1814-1822: ⚠️ Potential issue | 🟠 Major

Ensure useStatus hook contract matches API requirement: make projectId mandatory or add error handling.

The backend's SummaryView enforces require_project = True, returning a 400 ValidationError when project_id is missing. However, the useStatus hook still declares projectId?: string as optional.

Current call sites (summary.tsx and useNavItems.ts) safely pass projectId, but the hook signature invites undefined values. When projectId is undefined, getFetchUrl omits the project_id parameter, triggering a 400 response. Neither caller currently handles the error field returned by the hook.

Either make projectId required in the hook signature or ensure all callers explicitly handle the 400 error scenario.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/api/views.py` around lines 1814 - 1822, SummaryView sets
require_project = True (returns 400 if project_id missing) but the useStatus
hook declares projectId?: string optional and its getFetchUrl can omit
project_id causing 400s; update the hook signature to require projectId (change
useStatus to accept projectId: string) and update all callers (summary.tsx and
useNavItems.ts) to pass a non-null projectId, or alternatively keep the optional
signature but add explicit error handling around getFetchUrl/returned error to
handle 400 responses; ensure references to SummaryView/require_project and
getFetchUrl are accounted for when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@ami/main/api/views.py`:
- Around line 909-926: DetectionViewSet is missing the require_project_for_list
guard and a project-aware queryset; set require_project_for_list = True and
implement get_queryset(self) (or override get_queryset in the class that
currently defines queryset/select_related/serializer_class and list logic) to
read the current project from self.request (or use the same project-extraction
helper used by SourceImageViewSet), filter
Detection.objects.exclude(NULL_DETECTIONS_FILTER) by source_image__project equal
to the current project when a project is present, preserve the existing
select_related("source_image", "detection_algorithm") and ordering/filterset
behavior, and ensure list uses this filtered queryset so the /api/v2/detections/
endpoint enforces project scoping.
- Around line 1814-1822: SummaryView sets require_project = True (returns 400 if
project_id missing) but the useStatus hook declares projectId?: string optional
and its getFetchUrl can omit project_id causing 400s; update the hook signature
to require projectId (change useStatus to accept projectId: string) and update
all callers (summary.tsx and useNavItems.ts) to pass a non-null projectId, or
alternatively keep the optional signature but add explicit error handling around
getFetchUrl/returned error to handle 400 responses; ensure references to
SummaryView/require_project and getFetchUrl are accounted for when making the
change.

In `@ami/main/tests.py`:
- Around line 2479-2489: The test test_no_project_id_returns_all must be updated
because /api/v2/occurrences/ now requires a project_id; modify the test to
supply a valid project_id query param (e.g., append
?project_id=<test_project.pk>&limit=1000 or include project_id in the existing
URL construction) and adjust assertions to only expect occurrences for that
project (build expected_ids from self.high_occurrences/self.low_occurrences
filtered by that project or use the specific test_project occurrences), then
assert the returned_ids contain that filtered expected_ids and that the response
status remains HTTP_200_OK.
- Around line 2315-2354: The global-summary assertions are now stale because
SummaryView.require_project=True; replace the existing global block that calls
global_url and asserts 200 with assertions that the endpoint requires a
project_id: call _auth_get(self.outsider, global_url) and _auth_get(self.member,
global_url) and assert their status_code is 400 (or the view's error code) and
optionally assert the response message indicates "requires project_id" (check
outsider_global_response.json() / member_global_response.json() for the exact
error field); leave the project-scoped checks (project_url, draft visibility)
untouched and remove any comparisons between outsider_global_response and
member_global_response that assume successful global summaries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3110f623-f547-4a5b-b900-b3e0df9f15a9

📥 Commits

Reviewing files that changed from the base of the PR and between 2714195 and 4a6094f.

📒 Files selected for processing (3)
  • ami/base/views.py
  • ami/main/api/views.py
  • ami/main/tests.py

…re pagination COUNT, update stale tests

- ami/base/views.py: rename require_project/require_project_for_list comments to "Opt-in" so they read correctly given the False default (per Copilot).
- ami/main/api/views.py: call self.get_active_project() at the top of DetectionViewSet.list() so the missing-project_id 400 fires before DRF pagination's queryset.count() runs a full-table scan. The other three hot viewsets already call get_active_project() inside get_queryset().
- ami/main/tests.py: update two pre-existing tests that relied on the old unguarded contract — global /status/summary/ is now 400, and /api/v2/occurrences/ without project_id is now 400.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Apr 20, 2026
@mihow
Copy link
Copy Markdown
Collaborator Author

mihow commented Apr 20, 2026

Tested on staging. All UI routes to these "hot" endpoints appears to already pass project_id

@mihow mihow merged commit e4f9d46 into main Apr 20, 2026
7 checks passed
@mihow mihow deleted the fix/require-project-on-hot-list-endpoints branch April 20, 2026 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants