Conversation
📝 Walkthrough
Adds a new assessment subsystem: DB models and migration, dataset upload/validation, batch submission for OpenAI/Google, async polling/processing, SSE events, export utilities, mappers, CRUD/service layers, API routes, and router/cron integration.
Sequence Diagram
sequenceDiagram
participant Client
participant API as API Routes
participant Service as Assessment Service
participant DB as Database
participant Provider as Batch Provider
participant Storage as Object Storage
participant Cron as Cron Poller
Client->>API: POST /assessment (dataset, configs)
API->>Service: start_assessment(request)
Service->>DB: create Assessment (manager)
Service->>DB: resolve configs
loop per config
Service->>DB: create AssessmentRun
Service->>Provider: submit_batch(payload)
Provider-->>Service: batch_job_id
Service->>DB: update run -> processing
end
Service-->>Client: AssessmentResponse
Cron->>DB: fetch pending runs
Cron->>Provider: check_batch_status(job_id)
alt batch complete
Provider-->>Cron: results
Cron->>Storage: upload_raw_results()
Cron->>Service: parse_assessment_output
Cron->>DB: update run -> completed
Cron->>API: publish SSE (results_ready)
else batch failed
Cron->>DB: update run -> failed
else in-progress
Cron-->>Cron: keep processing
end
Client->>API: GET /assessment/{id}/results?format=csv|json|xlsx
API->>Service: build_export_response
Service->>Storage: fetch_results()
Service-->>Client: StreamingResponse (file)
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (15)
backend/app/assessment/utils/parsing.py-25-26 (1)
25-26: ⚠️ Potential issue | 🟡 Minor: Use explicit `None` checks to avoid masking zero token counts.
`usage.get("input_tokens") or usage.get("prompt_tokens")` will fall through to the alternate key when the primary key is present but `0`. While provider payloads with zero input/output tokens are uncommon, the fallthrough also makes the precedence between the two aliases inconsistent (only when truthy). Prefer an explicit `None` check so the `_tokens` key always wins when present.
♻️ Proposed fix
```diff
-    input_tokens = usage.get("input_tokens") or usage.get("prompt_tokens")
-    output_tokens = usage.get("output_tokens") or usage.get("completion_tokens")
+    input_tokens = (
+        usage.get("input_tokens")
+        if usage.get("input_tokens") is not None
+        else usage.get("prompt_tokens")
+    )
+    output_tokens = (
+        usage.get("output_tokens")
+        if usage.get("output_tokens") is not None
+        else usage.get("completion_tokens")
+    )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/parsing.py` around lines 25 - 26, The current token extraction uses truthy fallback which treats 0 as missing; replace the "or" fallbacks with explicit None checks so that input_tokens prefers usage["input_tokens"] when the key exists (even if 0) and only uses usage["prompt_tokens"] when usage.get("input_tokens") is None, and do the analogous change for output_tokens so usage["output_tokens"] takes precedence over usage["completion_tokens"]; update the two assignments in parsing.py that set input_tokens and output_tokens accordingly (use explicit is not None checks against usage.get(...)).
backend/app/models/user.py-81-81 (1)
81-81: ⚠️ Potential issue | 🟡 Minor: PATCH `/me` should compute and return features like GET `/me`, or document the inconsistency.
The `features` field defaults to `[]` in most endpoints but is populated by flag computation only in GET `/me`. This creates an inconsistency for PATCH `/me` (line 97), which returns the user directly without computing features, so clients calling PATCH `/me` will receive `features: []` even if the user has feature flags enabled, contradicting the behavior of GET `/me`. Recommend either:
- Computing features in PATCH `/me` before returning (see the sketch after this comment), or
- Documenting that `features` reflects the current state only after GET `/me`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/user.py` at line 81, PATCH `/me` returns the User model with features defaulting to [] because it doesn't run the same flag computation used by GET `/me`; modify the PATCH `/me` handler to compute and populate the features field before returning (reuse the same flag computation logic used by GET `/me`, or extract that logic into a helper like compute_user_features(user) and call it from both GET `/me` and PATCH `/me`) so the returned User.features matches GET `/me`.
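A minimal sketch of the shared-helper approach suggested above. The helper name `compute_user_features`, the response model, and the flag shape are illustrative assumptions, not the project's actual code:

```python
from dataclasses import dataclass, field


@dataclass
class UserOut:
    """Illustrative stand-in for the UserPublic response model."""

    id: int
    features: list[str] = field(default_factory=list)


def compute_user_features(enabled_flags: dict[str, bool]) -> list[str]:
    """Return the keys of all feature flags enabled for the user's scope."""
    return [key for key, enabled in enabled_flags.items() if enabled]


def build_user_response(user_id: int, enabled_flags: dict[str, bool]) -> UserOut:
    # Both GET /me and PATCH /me would build their response through this path,
    # so `features` is populated consistently instead of defaulting to [].
    return UserOut(id=user_id, features=compute_user_features(enabled_flags))


if __name__ == "__main__":
    # Prints ["assessments"]
    print(build_user_response(1, {"assessments": True, "beta_ui": False}).features)
```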
backend/app/api/permissions.py-92-101 (1)
92-101: ⚠️ Potential issue | 🟡 Minor: Misleading 403 detail when organization context is absent.
When `org_id is None` (caller has no organization on the auth context), the message still reads "is not enabled for this organization", which conflates "feature disabled" with "no organization scope". Consider distinguishing the two for clearer client-side handling.
🛡️ Suggested differentiation
```diff
-    if org_id is None or not is_enabled(
-        session=session,
-        flag=flag,
-        organization_id=org_id,
-        project_id=project_id,
-    ):
-        raise HTTPException(
-            status_code=403,
-            detail=f"Feature '{flag.name}' is not enabled for this organization.",
-        )
+    if org_id is None:
+        raise HTTPException(
+            status_code=403,
+            detail=f"Feature '{flag.name}' requires an organization context.",
+        )
+    if not is_enabled(
+        session=session,
+        flag=flag,
+        organization_id=org_id,
+        project_id=project_id,
+    ):
+        raise HTTPException(
+            status_code=403,
+            detail=f"Feature '{flag.name}' is not enabled for this organization.",
+        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/permissions.py` around lines 92 - 101, The current 403 message conflates missing org context and a disabled feature; update the check around org_id and is_enabled in the block containing org_id, is_enabled(...), and the raised HTTPException so that if org_id is None you raise a 403 (or appropriate 400/401 if preferred) with a clear message like "No organization context provided" (referencing org_id), otherwise keep the 403 with detail f"Feature '{flag.name}' is not enabled for this organization." (use session, project_id and flag.name as currently used); ensure the two paths are distinct and use the same HTTPException symbol for consistency.
backend/app/assessment/cron.py-144-182 (1)
144-182: ⚠️ Potential issue | 🟡 Minor: Failure-path itself is unprotected — a single bad cleanup aborts the entire poll.
`update_assessment_run_status` (commits, may raise on DB error / rollback) and `recompute_assessment_status` (raises `ValueError` if the assessment was concurrently deleted) are both invoked inside the `except` handler with no further protection. If either raises, the exception escapes the per-run loop and skips every remaining assessment in this cron tick.
Wrap the cleanup so a failure on one run can't stall the cron:
🛡️ Proposed defensive cleanup
```diff
     except Exception as e:
         logger.error(
             "[poll_all_pending_assessment_evaluations] "
             f"Failed run {run.id} | "
             ...
             exc_info=True,
         )
-        update_assessment_run_status(
-            session=session,
-            run=run,
-            status="failed",
-            error_message="Processing failed. Check server logs for details.",
-        )
-        refreshed_assessment = recompute_assessment_status(
-            session=session, assessment_id=assessment.id
-        )
+        try:
+            update_assessment_run_status(
+                session=session,
+                run=run,
+                status="failed",
+                error_message="Processing failed. Check server logs for details.",
+            )
+            refreshed_assessment = recompute_assessment_status(
+                session=session, assessment_id=assessment.id
+            )
+        except Exception as cleanup_err:
+            logger.error(
+                "[poll_all_pending_assessment_evaluations] "
+                f"Cleanup failed for run {run.id} | error={cleanup_err}",
+                exc_info=True,
+            )
+            failed += 1
+            continue
         failure_result = {
             ...
         }
         all_results.append(failure_result)
         assessment_event_broker.publish(
             _build_callback_payload(
                 refreshed_assessment,
                 run,
                 failure_result,
             )
         )
         failed += 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/cron.py` around lines 144 - 182, The except handler for run processing must be made defensive so cleanup errors don't abort the whole poll: wrap calls to update_assessment_run_status, recompute_assessment_status, and assessment_event_broker.publish (and any related cleanup logic that may commit/raise) in their own try/except block(s), catch and log any exceptions (including the run.id, run.run_name and assessment_id) with exc_info=True, ensure you still append a failure_result and increment failed for this run even if cleanup fails, and swallow the exception so the per-run loop continues to the next run.
backend/app/assessment/models.py-32-36 (1)
32-36: ⚠️ Potential issue | 🟡 Minor: `id: int = SQLField(default=None, ...)` mismatches the type — should be `int | None`.
Both `Assessment.id` and `AssessmentRun.id` are typed `int` but defaulted to `None` (until the DB assigns one on insert). Pydantic v2 + SQLModel will raise validation errors when constructing instances pre-flush, and static type-checkers will (correctly) flag every place that reads `assessment.id` assuming non-None when the row is unflushed.
🛠️ Proposed fix
```diff
-    id: int = SQLField(
+    id: int | None = SQLField(
         default=None,
         primary_key=True,
         sa_column_kwargs={"comment": "Unique identifier for the assessment"},
     )
@@
-    id: int = SQLField(
+    id: int | None = SQLField(
         default=None,
         primary_key=True,
         sa_column_kwargs={"comment": "Unique identifier for the assessment run"},
     )
```
Also applies to: 144-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/models.py` around lines 32 - 36, The id fields are declared as `id: int = SQLField(default=None, ...)` but default None means the type should allow None; update the type annotations of Assessment.id and AssessmentRun.id to be nullable (e.g., `int | None` or `Optional[int]`) to match the SQLField default and avoid Pydantic/typing errors, and then review any call sites that assume non-None to handle the unflushed case or assert/coerce as needed.
backend/app/assessment/mappers.py-10-22 (1)
10-22: ⚠️ Potential issue | 🟡 Minor: Type signature doesn't match call sites — `text` may be `None`.
`normalize_llm_text` is declared as `text: str -> str`, but it's called with `kaapi_params.get("instructions")` (lines 100, 185), which returns `Optional[str]`. The body already handles non-str via the `isinstance(text, str)` early return, so widening the annotation correctly will keep the code honest and silence type-checkers.
🛠️ Proposed fix
```diff
-def normalize_llm_text(text: str) -> str:
-    if not isinstance(text, str) or not text:
-        return text
+def normalize_llm_text(text: str | None) -> str | None:
+    if not isinstance(text, str) or not text:
+        return text
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 10 - 22, The function normalize_llm_text has a too-narrow type signature: change its annotation from def normalize_llm_text(text: str) -> str to accept and return Optional[str] (e.g., def normalize_llm_text(text: Optional[str]) -> Optional[str]) and add the necessary Optional import from typing; this matches call sites like kaapi_params.get("instructions") and the existing early-return behavior that handles non-str/None values.
backend/app/api/routes/features.py-26-30 (1)
26-30: ⚠️ Potential issue | 🟡 Minor: Missing return type annotation on helper.
`_parse_flag_or_400` returns a `FeatureFlag` enum value but has no return annotation.
As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".
🛠️ Proposed fix
```diff
-def _parse_flag_or_400(flag_key: str):
+from app.core.feature_flags import FeatureFlag as FeatureFlagEnum
+
+def _parse_flag_or_400(flag_key: str) -> FeatureFlagEnum:
     try:
         return parse_feature_flag(flag_key)
     except ValueError as exc:
         raise HTTPException(status_code=400, detail=str(exc)) from exc
```
(Use whatever import path matches your existing `parse_feature_flag` return type.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/features.py` around lines 26 - 30, Add a return type annotation to the helper _parse_flag_or_400 indicating it returns the FeatureFlag enum returned by parse_feature_flag; update the function signature to include -> FeatureFlag and import FeatureFlag from its module (or use the correct fully-qualified import used elsewhere) so the helper's return type is explicitly annotated and matches parse_feature_flag's return type.
backend/app/assessment/batch.py-107-135 (1)
107-135: ⚠️ Potential issue | 🟡 Minor: Workbook handle is leaked on exceptions.
`openpyxl.load_workbook(...)` opens a file handle (especially in `read_only=True` mode where it streams). If `next(rows_iter, None)`, the column build, or the row iteration raises, `wb.close()` is skipped and the underlying ZIP/file handle is leaked until GC. Wrap the body in `try/finally` or use a context-manager-style pattern.
🛠️ Proposed fix
```diff
-    wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
-    ws = wb.active
-    if ws is None:
-        wb.close()
-        return []
-
-    rows_iter = ws.iter_rows(values_only=True)
-    header = next(rows_iter, None)
-    if header is None:
-        wb.close()
-        return []
-
-    columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
-    result = []
-    for row in rows_iter:
-        if row and any(cell is not None for cell in row):
-            row_dict = {
-                columns[i]: str(cell) if cell is not None else ""
-                for i, cell in enumerate(row)
-                if i < len(columns)
-            }
-            result.append(row_dict)
-
-    wb.close()
-    return result
+    wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
+    try:
+        ws = wb.active
+        if ws is None:
+            return []
+
+        rows_iter = ws.iter_rows(values_only=True)
+        header = next(rows_iter, None)
+        if header is None:
+            return []
+
+        columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
+        result: list[dict[str, str]] = []
+        for row in rows_iter:
+            if row and any(cell is not None for cell in row):
+                result.append(
+                    {
+                        columns[i]: str(cell) if cell is not None else ""
+                        for i, cell in enumerate(row)
+                        if i < len(columns)
+                    }
+                )
+        return result
+    finally:
+        wb.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 107 - 135, The workbook handle opened by openpyxl.load_workbook in _parse_excel_rows can leak if any subsequent operation raises; wrap workbook usage in a try/finally or context-manager pattern so wb.close() always runs: after creating wb, enter a try block that performs ws = wb.active, header = next(rows_iter, ...), column construction and row iteration, and put wb.close() in the finally (or use openpyxl.load_workbook's context manager if available) to ensure the file/ZIP handle is closed on success or exception.
backend/app/assessment/processing.py-37-70 (1)
37-70: ⚠️ Potential issue | 🟡 Minor: `_sanitize_json_output` doesn't escape `\b` or `\f`.
JSON spec (RFC 8259) treats every U+0000–U+001F character as illegal inside string values. The current implementation handles `\n`, `\r`, `\t`, but model outputs occasionally contain literal `\b` (0x08) and `\f` (0x0C) — most often when they're hallucinated. If those slip through, the second `json.loads` will still fail.
Tightening the sanitizer to escape any `ord(ch) < 0x20` while inside a string covers all illegal control characters in one shot.
🛠️ Proposed fix
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/processing.py` around lines 37 - 70, The sanitizer _sanitize_json_output currently only replaces newline, carriage return and tab inside JSON strings and misses other control characters like \b and \f; update the loop so that when inside a string (in_string is True) and the current character is not an escaped character (escape_next is False) you check if ord(ch) < 0x20 and, if so, append a JSON-safe escape (e.g. a specific mapping for '\n','\r','\t','\b','\f' or generically append "\\u{ord(ch):04x}") instead of the raw character; keep the existing handling for backslash (escape_next) and quote toggling so previously escaped characters are preserved.backend/app/crud/feature_flag.py-37-63 (1)
37-63:⚠️ Potential issue | 🟡 MinorConcurrent create can leak
IntegrityError(500) instead of clean conflict.Between
get_feature_flag(...)andsession.commit(), two concurrent requests with the same(key, organization_id, project_id)can both pass the existence check and try to insert. The DB'suq_feature_flag_key_org_projectconstraint correctly rejects the second one, but as an uncaughtIntegrityError, which surfaces as HTTP 500 increate_feature_flag_route(it expectsNoneto translate to 409).Catching the integrity error and returning
None(or re-fetching) keeps behavior consistent with the existingNone-on-conflict contract.🛡️ Proposed fix
+from sqlalchemy.exc import IntegrityError @@ feature_flag = FeatureFlag( key=key, organization_id=organization_id, project_id=project_id, enabled=enabled, ) session.add(feature_flag) - session.commit() - session.refresh(feature_flag) - return feature_flag + try: + session.commit() + except IntegrityError: + session.rollback() + return None + session.refresh(feature_flag) + return feature_flag🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/feature_flag.py` around lines 37 - 63, The create_feature_flag function can raise an uncaught IntegrityError when two requests concurrently pass get_feature_flag and both attempt to insert the same (key, organization_id, project_id); wrap the insert/commit in a try/except catching sqlalchemy.exc.IntegrityError, call session.rollback() in the except, then return None (or optionally re-fetch via get_feature_flag and return that) so the function preserves the contract of returning None on conflict instead of letting a 500 bubble up; update create_feature_flag to import IntegrityError and handle the exception around session.commit()/session.add() accordingly.backend/app/assessment/routes.py-332-336 (1)
332-336:⚠️ Potential issue | 🟡 MinorMissing return type annotation on
event_stream.- async def event_stream(): + async def event_stream() -> AsyncIterator[str]: async for chunk in assessment_event_broker.subscribe(): yield chunk await asyncio.sleep(0)You'll need
from collections.abc import AsyncIterator(orfrom typing import AsyncIteratoron older Pythons).As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/routes.py` around lines 332 - 336, Add a return type annotation to the async generator by changing the signature of event_stream to specify AsyncIterator[bytes] (e.g. async def event_stream() -> AsyncIterator[bytes]:) and import AsyncIterator from collections.abc at the top of the module; keep the body using assessment_event_broker.subscribe() unchanged so the generator type matches the yielded chunk type.backend/app/assessment/batch.py-395-420 (1)
395-420:⚠️ Potential issue | 🟡 Minor
google_params["reasoning"]is set by the mapper but never consumed here.
map_kaapi_to_google_params(mappers.py line 205–207) writesgoogle_params["reasoning"] = reasoning, but the builder at lines 395–420 only readsinstructions,temperature,top_p,max_output_tokens,thinking_config, andoutput_schema. Thereasoningvalue is silently dropped and never included in the Gemini request.Either map
reasoninginto the appropriate Gemini field (e.g., merge withthinkingConfig) or remove the assignment from the mapper to eliminate dead code.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 395 - 420, The google_params["reasoning"] value produced by map_kaapi_to_google_params is never used in the Gemini request builder; update the builder in batch.py to consume google_params["reasoning"] by merging it into the outgoing generationConfig (e.g., if thinking_config exists, combine or extend it with reasoning and set generation_config["thinkingConfig"] accordingly; if not, set generation_config["thinkingConfig"] = reasoning) so the reasoning intent is sent as part of the request, or alternatively remove the assignment of google_params["reasoning"] in map_kaapi_to_google_params to eliminate dead code.backend/app/assessment/mappers.py-93-106 (1)
93-106:⚠️ Potential issue | 🟡 MinorAdd guard clause for
modelbefore callinglitellm.supports_reasoning().If
kaapi_paramslacks"model", the code callslitellm.supports_reasoning(model="openai/None"), which returnsFalse(or raises an error depending on litellm version). This causes misleading warnings at lines 131–134 that claim the model "does not support reasoning" when the actual problem is that no model was specified.Fix
- support_reasoning = litellm.supports_reasoning(model=f"openai/{model}") + support_reasoning = ( + litellm.supports_reasoning(model=f"openai/{model}") if model else False + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 93 - 106, The code calls litellm.supports_reasoning with model=f"openai/{model}" even when model is None; add a guard that checks whether model is present in kaapi_params before invoking litellm.supports_reasoning: if model is truthy call support_reasoning = litellm.supports_reasoning(model=f"openai/{model}"), otherwise set support_reasoning = False (and optionally log or raise a clear "model missing" message) so you don't pass "openai/None" into supports_reasoning or produce misleading reasoning-support warnings; update any subsequent logic that uses support_reasoning accordingly (look for the model variable and support_reasoning usage in this function).backend/app/assessment/utils/export.py-395-417 (1)
395-417:⚠️ Potential issue | 🟡 Minor
assessment_id=run.assessment_id or 0uses 0 as a sentinel for runs without a parent assessment, inconsistent with other models.
AssessmentExportRow.assessment_idis typed asint, forcing the conversion ofNoneto0. However,AssessmentRunPublic,AssessmentRunSummary, and the source fieldrun.assessment_idall useint | None. ChangeAssessmentExportRow.assessment_idtoint | Noneto match the source type and existing patterns in the codebase.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 395 - 417, AssessmentExportRow.assessment_id is currently coerced to 0 in the export (assessment_id=run.assessment_id or 0) and typed as int; change the AssessmentExportRow dataclass/typing to accept int | None for assessment_id and stop converting None to 0 by passing run.assessment_id directly in the export_rows append. Update the type annotation (AssessmentExportRow.assessment_id: int | None), any related imports/typing declarations, and adjust any callers that assumed 0 to handle None where necessary (references: AssessmentExportRow and the export loop building export_rows using run.assessment_id).backend/app/assessment/crud.py-275-292 (1)
275-292: ⚠️ Potential issue | 🟡 Minor: Status taxonomy is inconsistent; "in_progress" is checked but never set, causing silent data loss.
processing_runsaccepts both"processing"and"in_progress", while the other counters and the model documentation accept single literals. The documented statuses are only "pending", "processing", "completed", "failed" — yet the aggregator checks for a fifth value that is never assigned anywhere in the codebase.If any run has a status outside the expected set (typo, legacy value, or other unexpected value), it is excluded from all four counters but still contributes to
total_runs = len(runs). When_determine_assessment_statusreceives mismatched totals, it falls through to"processing", silently masking the inconsistency.Recommend (a) consolidating on a single canonical set of statuses (StrEnum) enforced at assignment time, and (b) asserting when a run's status falls outside that set.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 275 - 292, The aggregator is mixing an unexpected "in_progress" literal into processing_runs and not counting unexpected statuses, causing total_runs mismatch and silent masking in _determine_assessment_status; fix by consolidating to the canonical status set (pending, processing, completed, failed) and stop checking "in_progress" in the processing_runs calculation (only check r.status == "processing"), and add a validation/assertion step when iterating runs (e.g., validate_run_status or inline check inside the block that computes pending_runs/processing_runs/completed_runs/failed_runs) that logs or raises if r.status is not in the canonical set so invalid/legacy values are caught at assignment time rather than ignored. Ensure assessment.total_runs still equals len(runs) and that the counters sum to total_runs unless a validation error is raised; update references to processing_runs and _determine_assessment_status accordingly.
🧹 Nitpick comments (16)
backend/app/assessment/utils/parsing.py (1)
13-17: Consider hardening JSON parsing against malformed payloads.
json.loadson the array branch and per-line in the JSONL branch will raiseJSONDecodeErrorstraight to callers. If batch outputs from providers occasionally contain a malformed line, the entire result load fails. Depending on caller expectations, you may want to wrap line parsing with a try/except + logger warning so a single bad record doesn’t poison the whole run. Skip if upstream callers are already expected to handle this.🤖 Prompt for AI Agents
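A sketch of the per-line guard described above, assuming the loader receives raw JSONL text and a module-level `logger`; the function name and structure are illustrative, not the module's actual code:

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_jsonl_lenient(content: str) -> list[dict]:
    """Parse JSONL, skipping malformed lines instead of failing the whole batch."""
    records: list[dict] = []
    for lineno, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            logger.warning(
                "[parse_jsonl_lenient] Skipping malformed line %d: %s", lineno, exc
            )
    return records
```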
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/parsing.py` around lines 13 - 17, The JSON parsing currently calls json.loads directly on content (array branch using parsed) and per-line (JSONL branch), which will raise JSONDecodeError for malformed payloads; wrap the top-level json.loads(content) in a try/except JSONDecodeError and log a warning then return an empty list (or appropriate fallback), and for the JSONL branch catch JSONDecodeError around each json.loads(line) so you log the bad line and skip it instead of letting one bad record fail the whole parse; use the existing logger (or add one) to emit warnings that include the offending line and the exception.backend/app/assessment/validators.py (1)
53-67: Simplify by reading once and checkinglen(content).The
file.file.seek/tell/seekdance reaches into the underlyingSpooledTemporaryFilesynchronously, which works but bypasses Starlette's asyncUploadFileAPI and is brittle if the upload backend changes. Reading first and validating size on the bytes avoids the seek-around entirely and is also cheaper sinceread()is being called anyway. The 10 MB upper bound provides a natural memory cap.- file.file.seek(0, 2) - file_size = file.file.tell() - file.file.seek(0) - - if file_size > MAX_FILE_SIZE: + content = await file.read() + file_size = len(content) + + if file_size > MAX_FILE_SIZE: raise HTTPException( status_code=413, detail=f"File too large. Maximum size: {MAX_FILE_SIZE / (1024 * 1024):.0f}MB", ) if file_size == 0: raise HTTPException(status_code=422, detail="Empty file uploaded") - content = await file.read() return content, file_extNote: if you ever need to enforce the limit before reading (to prevent OOM on very large uploads), prefer streaming chunked reads with a running counter rather than the seek/tell approach — but for a 10 MB cap this is fine.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/validators.py` around lines 53 - 67, Replace the synchronous file.file.seek/tell/seek usage with a single async read: call await file.read() once (using UploadFile.read()), then use len(content) to enforce MAX_FILE_SIZE and detect empty uploads instead of file_size; remove references to file.file.seek/tell and keep the existing HTTPException behavior and return (content, file_ext) from the same validator function.backend/app/main.py (1)
70-75: Add a return type annotation tolifespan.Per coding guidelines for
**/*.py, all function parameters and return values should have type hints. Theapp: FastAPIparameter is annotated, but the return is not. Ruff'sARG001warning about unusedappis a false positive — FastAPI's lifespan protocol requires this signature.-@asynccontextmanager -async def lifespan(app: FastAPI): - """App startup / shutdown lifecycle.""" +@asynccontextmanager +async def lifespan(app: FastAPI) -> AsyncIterator[None]: + """App startup / shutdown lifecycle."""(Add
from collections.abc import AsyncIteratorto imports.) As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/main.py` around lines 70 - 75, The lifespan function lacks a return type hint; update its signature to annotate the return as an async iterator (e.g., AsyncIterator[None]) and add the required import from collections.abc (import AsyncIterator) so the signature becomes async def lifespan(app: FastAPI) -> AsyncIterator[None]:; keep the existing app parameter annotation (to avoid Ruff false-positive) and no other behavior changes.backend/app/models/__init__.py (1)
100-100: Consider movingAssessment/AssessmentRunre-exports to a dedicated location or removing them entirely.The import of
AssessmentandAssessmentRunfromapp.assessment.modelsat line 100 is currently safe (no circular import exists), but consolidating external model re-exports in a central__init__can create tight coupling if the assessment subpackage ever needs to import fromapp.modelsfor FK relationships. A cleaner approach:
- Remove this re-export and have callers
from app.assessment.models import Assessment, AssessmentRundirectly (preferred — keeps subpackage models self-contained).- If table discovery is required for Alembic, register models in your
env.py/metadata module instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/__init__.py` at line 100, The __init__.py currently re-exports Assessment and AssessmentRun (from app.assessment.models) which can create tight coupling; remove the import line exporting Assessment and AssessmentRun from app.assessment.models in app.models.__init__ and update callers to import directly from app.assessment.models instead (or, if you need automatic table discovery for Alembic, register those models in your Alembic env.py/metadata module rather than re-exporting them here); ensure no remaining references rely on the central re-export and run tests/migrations to verify imports still resolve.backend/app/models/feature_flag.py (1)
61-78:FeatureFlagCreateandFeatureFlagUpdateare identical — collapse them.They define the exact same fields and semantics; either alias or have one inherit from the other so that future changes to one don't silently diverge.
♻️ Proposed simplification
class FeatureFlagCreate(SQLModel): key: str = Field(min_length=1, max_length=128) organization_id: int project_id: int | None = None enabled: bool -class FeatureFlagUpdate(SQLModel): - key: str = Field(min_length=1, max_length=128) - organization_id: int - project_id: int | None = None - enabled: bool +class FeatureFlagUpdate(FeatureFlagCreate): + """Same payload shape as create; kept distinct for OpenAPI clarity."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/models/feature_flag.py` around lines 61 - 78, FeatureFlagCreate and FeatureFlagUpdate are identical; collapse them by introducing a single shared model (e.g., FeatureFlagBase or keep FeatureFlagCreate as canonical) and have the other name inherit or alias it to avoid duplication—update references to use the unified model (FeatureFlagCreate or FeatureFlagUpdate) and leave FeatureFlagDelete unchanged; ensure the shared model preserves the existing Field validations (key, organization_id, project_id, enabled) so future changes apply in one place.backend/app/assessment/events.py (1)
25-26: Use builtinTimeoutError(Ruff UP041).In Python 3.11+,
asyncio.TimeoutErroris an alias of the builtinTimeoutError. Per the project's Python 3.11+ stance and the static analysis hint, prefer the builtin.- except asyncio.TimeoutError: + except TimeoutError: yield ": keep-alive\n\n" continueAs per coding guidelines: "Use Python 3.11+ with type hints throughout the codebase".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` around lines 25 - 26, Replace the except that catches asyncio.TimeoutError with the builtin TimeoutError in the async block where you await queue.get() (the code that sets payload = await asyncio.wait_for(queue.get(), timeout=15)); locate the try/except around that await in events.py and change the exception class to TimeoutError to follow Python 3.11+ conventions and satisfy the Ruff UP041 lint.backend/app/core/feature_flags/__init__.py (1)
22-23: Misleading "unused" discard.
organization_idandproject_idare actually used below — onlyuser_idis unused. The_ = (organization_id, project_id, user_id)line falsely signals all three are unused. If the intent is to silence linters foruser_id, prefer renaming the parameter.♻️ Proposed cleanup
def is_enabled( session: Session, flag: FeatureFlag, organization_id: int, project_id: int | None = None, - user_id: int | None = None, + user_id: int | None = None, # noqa: ARG001 — reserved for future user-scoped flags ) -> bool: """Check whether *flag* is enabled for the given scope.""" - _ = (organization_id, project_id, user_id) resolved_flag = get_feature_flag_enabled(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/core/feature_flags/__init__.py` around lines 22 - 23, The tuple assignment "_ = (organization_id, project_id, user_id)" falsely marks all three parameters as unused; instead keep real usage of organization_id and project_id and only silence the unused user_id by renaming it (e.g. _user_id) or explicitly assigning _ = user_id, and remove the tuple assignment; update the function signature or the single-use discard so that organization_id and project_id are no longer masked as unused while only user_id is ignored (refer to the parameters organization_id, project_id, user_id and the surrounding check/flag function in __init__.py).backend/app/assessment/service.py (2)
57-62: RedundantUUID(str(...))round-trip.
run.config_idis already typed asUUID | Noneon the model; converting tostrand back toUUIDis a no-op (and would crash onNone, but you already validate non-None on lines 52–56).- configs.append( - AssessmentConfigRef( - config_id=UUID(str(run.config_id)), - config_version=run.config_version, - ) - ) + configs.append( + AssessmentConfigRef( + config_id=run.config_id, + config_version=run.config_version, + ) + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 57 - 62, The code unnecessarily wraps run.config_id in UUID(str(...)) before constructing AssessmentConfigRef; since run.config_id is already a UUID (and you validate it's non-None earlier in the block around lines 52–56), replace UUID(str(run.config_id)) with run.config_id when creating AssessmentConfigRef to avoid the redundant conversion and potential crash on None.
144-195: Redundantrecompute_assessment_statuscalls inside the per-run loop.
recompute_assessment_statusis invoked after each successful submit (line 178), after each failed submit (line 191), and again once at the end (line 195). For an assessment with N configs, that'sN + 1full re-aggregations of the same rows; only the final one is observable to clients (the others are immediately overwritten).Move the call out of the loop:
♻️ Proposed simplification
run = update_assessment_run_status( session=session, run=run, status="processing", batch_job_id=batch_job.id, total_items=batch_job.total_items, ) - recompute_assessment_status(session=session, assessment_id=assessment.id) except Exception as e: ... run = update_assessment_run_status( session=session, run=run, status="failed", error_message="Batch submission failed. Please try again or contact support.", ) - recompute_assessment_status(session=session, assessment_id=assessment.id) runs.append(run) recompute_assessment_status(session=session, assessment_id=assessment.id)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 144 - 195, The code currently calls recompute_assessment_status() inside the per-config loop (after successful submit and in the except block) causing redundant re-aggregations; remove the two in-loop calls to recompute_assessment_status (the ones immediately after update_assessment_run_status in both the try and except paths around submit_assessment_batch and update_assessment_run_status) and keep only the single recompute_assessment_status(session=session, assessment_id=assessment.id) that runs once after the loop finishes so aggregation runs once for the final state.backend/app/api/routes/features.py (1)
33-170: Endpoints don't load Swagger descriptions from external markdown.None of these new endpoints (
/features,/feature-flagsGET/POST/PATCH/DELETE) supplydescription=load_description("features/<action>.md"). The assessment dataset routes in this same PR (e.g.,backend/app/assessment/routes.py:93) already follow this pattern.As per coding guidelines: "Load Swagger endpoint descriptions from external markdown files instead of inline strings using
load_description("domain/action.md")".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/features.py` around lines 33 - 170, The router endpoints get_features, get_feature_flags, create_feature_flag_route, patch_feature_flag, and remove_feature_flag are missing Swagger descriptions loaded from external Markdown; update each `@router`.<method> decorator to include description=load_description("features/<action>.md") (use names like "features/get.md", "features/list.md" or "features/create.md"/"patch.md"/"delete.md" matching your docs convention) so the OpenAPI docs pull content from external files rather than inline strings; ensure you import or have access to load_description and use the exact function names (get_features, get_feature_flags, create_feature_flag_route, patch_feature_flag, remove_feature_flag) to locate the decorators to modify.backend/app/assessment/processing.py (1)
253-253: Log prefix uses run id rather than function name.
log_prefix = f"[assessment_run={run.id}]"is useful context, but the repo convention is to lead with the function name. Combining both keeps the run-scoped context while satisfying the guideline.- log_prefix = f"[assessment_run={run.id}]" + log_prefix = f"[check_and_process_assessment][run={run.id}]"As per coding guidelines: "Prefix all log messages with the function name in square brackets:
logger.info(f"[function_name] Message ...")".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/processing.py` at line 253, Update the log_prefix assignment so it begins with the current function name in square brackets followed by the existing run-scoped context (i.e., include both the function name and assessment run id); change the log_prefix variable (currently set as log_prefix = f"[assessment_run={run.id}]") to the repo-standard pattern like "[<function_name>][assessment_run=<run.id>]" (use the actual function name where this line lives rather than the literal "<function_name>") so all subsequent logger calls follow the required convention.backend/app/assessment/routes.py (1)
348-436: RepetitiveAssessmentPublic(...)/AssessmentRunPublic(...)construction.
list_assessment_managers,get_assessment_manager,list_evaluations, andget_evaluationeach manually copy 14–15 fields one by one. Since bothAssessmentPublicandAssessmentRunPublicare pure Pydantic projections of fields already on the SQLModel rows, you can replace these blocks withAssessmentPublic.model_validate(a, from_attributes=True)(Pydantic v2) — or setmodel_config = ConfigDict(from_attributes=True)on the public schemas and pass the row directly.This shrinks four endpoints to one-liners and removes the risk of drift when fields are added.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/routes.py` around lines 348 - 436, The endpoints list_assessment_managers, get_assessment_manager (and the analogous list_evaluations/get_evaluation) repetitively map fields into AssessmentPublic/AssessmentRunPublic; replace the manual field-by-field construction by either enabling attribute parsing on the Pydantic models (set model_config = ConfigDict(from_attributes=True) on AssessmentPublic and AssessmentRunPublic) and pass the SQLModel row directly, or call AssessmentPublic.model_validate(a, from_attributes=True) (and AssessmentRunPublic.model_validate(...)) when building the APIResponse; update imports if needed and replace each block in list_assessment_managers, get_assessment_manager, list_evaluations, and get_evaluation with the single model_validate (or direct model instantiation) call to return the Pydantic projection.backend/app/assessment/crud.py (2)
196-227: Constrainstatusto known values.
status: strallows any string to be persisted toAssessmentRun.status, which couples directly into the aggregation inrecompute_assessment_status(see comment above). Tightening to aLiteral["pending", "processing", "completed", "failed", ...](or the same StrEnum used elsewhere) gives compile-time safety and prevents drift between writers and the aggregator.🤖 Prompt for AI Agents
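One way to pin the writer's signature down, sketched with a `Literal` alias; a shared `StrEnum` would work equally well, and the names here are illustrative:

```python
from typing import Literal

# Single canonical set, importable by both the status writers and the aggregator.
AssessmentRunStatus = Literal["pending", "processing", "completed", "failed"]


def set_run_status_example(status: AssessmentRunStatus) -> str:
    """Type-checkers reject any value outside the canonical set at call sites."""
    return status


set_run_status_example("completed")        # OK
# set_run_status_example("in_progress")    # flagged by mypy/pyright
```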
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 196 - 227, The update_assessment_run_status function accepts status: str which lets arbitrary values be persisted to AssessmentRun.status and can drift from the aggregator logic in recompute_assessment_status; change the status parameter to the project-wide enum type (or a typing.Literal of the allowed values) used elsewhere (the same StrEnum/StatusEnum used for assessment runs) and update any call sites to pass that enum/value so only known statuses are stored and compile-time/type-checking will prevent drift; ensure type imports and annotations are updated where update_assessment_run_status is defined and referenced.
1-12: Place CRUD operations inbackend/app/crud/per project convention.This new module is a pure data-access layer for the
Assessment/AssessmentRuntables but lives atbackend/app/assessment/crud.pyinstead of the centralbackend/app/crud/package used elsewhere. Consider moving it (e.g. tobackend/app/crud/assessment.py) so future contributors can locate DB code consistently and so it composes with the existing CRUD test/fixtures conventions.Based on learnings: "Use CRUD pattern for database access operations located in
backend/app/crud/."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 1 - 12, The CRUD code for Assessment and AssessmentRun lives in app.assessment.crud but should be placed in the central CRUD package; move the module into the project's crud package (e.g., create app.crud.assessment) and update any imports to reference app.crud.assessment instead of app.assessment.crud, keeping the same symbols (Assessment, AssessmentRun, logger, and any CRUD functions/classes) and ensure the new module exports the same public API so existing callers continue to work.backend/app/assessment/utils/export.py (2)
160-171: Hard-codedbase_fields.index("result_status")will raiseValueErrorif the field is renamed.If
AssessmentExportRowis later refactored to drop or renameresult_status, every CSV/XLSX export call crashes with an opaqueValueError. Consider guarding it (fall back tolen(base_fields)) or anchor the column ordering on a stable constant declared in this module so the dependency is explicit.♻️ Suggested guard
- output_idx = base_fields.index("result_status") + 1 # after result_status + try: + output_idx = base_fields.index("result_status") + 1 # place output_keys after result_status + except ValueError: + output_idx = len(base_fields)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 160 - 171, The code uses base_fields.index("result_status") which will raise ValueError if that field is renamed; update the logic in the block that computes output_idx/fieldnames (referencing base_fields, output_idx, fieldnames, has_unparsed_output) to guard the lookup: either define and use a stable constant (e.g., RESULT_STATUS_FIELD) and check membership before calling index, or wrap the index call in a try/except and fall back to len(base_fields) (or another sensible insertion point) so exports don't crash; ensure the subsequent insertion of "output_raw" uses the guarded output_idx when computing the insert position.
200-228: Pin the Excel engine explicitly to"openpyxl".When
pd.ExcelWriter(buf)is called with aBytesIOobject, pandas infers the file format as .xlsx and selects an engine. Without an explicit engine parameter, pandas prefers xlsxwriter if installed, otherwise falls back to openpyxl. Since the codebase only declaresopenpyxl>=3.1.0as a dependency and the error message explicitly references openpyxl support, the engine should be pinned to ensure deterministic behavior that matches the stated runtime requirement.♻️ Suggested change
- with pd.ExcelWriter(buf) as writer: + with pd.ExcelWriter(buf, engine="openpyxl") as writer: data_frame.to_excel(writer, index=False, sheet_name="results")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 200 - 228, The Excel writer call should explicitly pin the engine to openpyxl to ensure deterministic .xlsx output; update the pd.ExcelWriter usage (the block that creates buf, data_frame = pd.DataFrame(expanded, columns=excel_fields) and the with pd.ExcelWriter(buf) as writer: ... block) to pass engine="openpyxl" so pandas uses openpyxl rather than xlsxwriter or another engine, keeping the rest of the export flow (_drop_empty_columns, expanded, excel_fields, data_frame.to_excel(...)) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 091c4f08-e02a-48f6-b5b8-7905058d0c10
⛔ Files ignored due to path filters (1)
backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (34)
.gitignore
backend/app/alembic/versions/050_add_assessment_manager_table.py
backend/app/alembic/versions/055_create_feature_flag_table.py
backend/app/api/main.py
backend/app/api/permissions.py
backend/app/api/routes/cron.py
backend/app/api/routes/features.py
backend/app/api/routes/users.py
backend/app/assessment/__init__.py
backend/app/assessment/batch.py
backend/app/assessment/cron.py
backend/app/assessment/crud.py
backend/app/assessment/dataset.py
backend/app/assessment/events.py
backend/app/assessment/mappers.py
backend/app/assessment/models.py
backend/app/assessment/processing.py
backend/app/assessment/routes.py
backend/app/assessment/service.py
backend/app/assessment/utils/__init__.py
backend/app/assessment/utils/export.py
backend/app/assessment/utils/parsing.py
backend/app/assessment/validators.py
backend/app/core/feature_flags/__init__.py
backend/app/core/feature_flags/constants.py
backend/app/crud/feature_flag.py
backend/app/main.py
backend/app/models/__init__.py
backend/app/models/batch_job.py
backend/app/models/feature_flag.py
backend/app/models/llm/request.py
backend/app/models/user.py
backend/app/tests/api/test_permissions.py
backend/pyproject.toml
        sa.UniqueConstraint(
            "key",
            "organization_id",
            "project_id",
            name="uq_feature_flag_key_org_project",
        ),
    )
Composite UniqueConstraint won't deduplicate org-level flags (project_id NULL).
In PostgreSQL, NULL values are not considered equal in unique constraints, so this constraint allows multiple rows with the same (key, organization_id, NULL). That breaks the implicit "one row per (key, org) when no project scope" invariant — list_feature_flags would return both, and the resolved flag value would depend on insertion order (and update_feature_flag could update an arbitrary one of them).
Add a partial unique index for the org-scoped case:
sa.UniqueConstraint(
"key",
"organization_id",
"project_id",
name="uq_feature_flag_key_org_project",
),
)
+ # PostgreSQL treats NULLs as distinct in unique constraints, so the constraint
+ # above does not prevent duplicates when project_id IS NULL. Enforce uniqueness
+ # of org-level flags via a partial unique index.
+ op.create_index(
+ "uq_feature_flag_key_org_no_project",
+ "feature_flag",
+ ["key", "organization_id"],
+ unique=True,
+ postgresql_where=sa.text("project_id IS NULL"),
+ )
op.create_index(
op.f("ix_feature_flag_key"),And mirror the drop in downgrade():
def downgrade():
op.drop_index(op.f("ix_feature_flag_organization_id"), table_name="feature_flag")
op.drop_index(op.f("ix_feature_flag_key"), table_name="feature_flag")
+ op.drop_index("uq_feature_flag_key_org_no_project", table_name="feature_flag")
op.drop_table("feature_flag")You'll likely want a matching Index(..., postgresql_where=...) declaration on the FeatureFlag model's __table_args__ so future autogenerate runs stay in sync.
Side note on indexes: with the composite unique constraint already covering `(key, organization_id, project_id)` and a separate `ix_feature_flag_key`, the standalone `ix_feature_flag_organization_id` is the only thing needed for org-scoped lookups. The redundancy is harmless but worth noting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/alembic/versions/055_create_feature_flag_table.py` around lines
75 - 81, The composite UniqueConstraint uq_feature_flag_key_org_project does not
prevent duplicate org-scoped flags because PostgreSQL treats NULLs as distinct;
add a partial unique index for the org-scoped case (unique on (key,
organization_id) WHERE project_id IS NULL) in the upgrade migration and drop it
in downgrade(), and also add a matching Index(...) with postgresql_where=... to
the FeatureFlag model's __table_args__ so future autogenerate keeps migrations
in sync; keep existing composite constraint and other indexes as-is.
from fastapi import HTTPException
from fastapi.responses import StreamingResponse

from app.assessment.batch import _load_dataset_rows
🛠️ Refactor suggestion | 🟠 Major
Avoid importing private (_-prefixed) symbols across modules.
_load_dataset_rows (line 13) and _get_batch_provider (line 290) are leading-underscore symbols imported from sibling modules. By convention these are module-private and may be renamed/moved without notice, silently breaking this exporter. Either promote them to public helpers (drop the underscore and document them) or move the export pipeline closer to where these helpers live.
Also applies to: 287-291
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/utils/export.py` at line 13, The exporter imports
module-private helpers _load_dataset_rows and _get_batch_provider from a sibling
module; change the exporter to use public helpers instead by renaming those
functions in their source to load_dataset_rows and get_batch_provider (and
updating their docs), then update the imports in this file to import
load_dataset_rows and get_batch_provider; alternatively, move the export
pipeline into the same module that currently defines
_load_dataset_rows/_get_batch_provider so it can call them privately—ensure all
references in this file are updated to the new public names (load_dataset_rows,
get_batch_provider) or removed if you relocate the pipeline.
if settings.SENTRY_DSN and settings.ENVIRONMENT != "development":
    sentry_sdk.init(dsn=str(settings.SENTRY_DSN), enable_tracing=True)
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the actual code in main.py to see the structure
cat -n backend/app/main.py | head -90
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 3531
🏁 Script executed:
# Check the full file structure around the relevant lines
wc -l backend/app/main.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 99
🏁 Script executed:
# Look at the exact lines mentioned in the review
sed -n '30,80p' backend/app/main.py | cat -n
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1850
🌐 Web query:
Sentry SDK sentry_sdk.init multiple calls production behavior documentation
💡 Result:
In production, calling sentry_sdk.init multiple times on the Sentry Python SDK leads to undefined behavior. The SDK is designed to be initialized only once during the application lifecycle. Multiple initializations cause integrations to hook into libraries and lifecycle events repeatedly, leading to interactions between initializations that result in unpredictable issues, such as duplicate event reporting or events sent to incorrect DSNs.
Citations:
- 1: Sentry can't be initialized twice getsentry/sentry-python#3059
- 2: https://develop.sentry.dev/sdk/foundations/client/
- 3: https://docs.sentry.io/platforms/python.md
- 4: setup_integrations() after client init doesnt seem to work getsentry/sentry-python#259
🌐 Web query:
Sentry SDK enable_tracing parameter traces_sample_rate equivalence
💡 Result:
In the Sentry Python SDK, there is an enable_tracing parameter (also referred to as enable_tracing in the Client constructor) that, when set to True, is equivalent to setting traces_sample_rate=1.0. This enables tracing by capturing 100% of transactions. Setting either traces_sample_rate or traces_sampler (non-None) enables tracing; if neither is set, no transactions are sampled (equivalent to traces_sample_rate=0.0). The enable_tracing parameter provides a convenient boolean toggle for full sampling. This equivalence is confirmed in the Python SDK API documentation where both parameters are listed together under tracing options, and Sentry developer guidelines note enableTracing (camelCase in spec) sets the sample rate to 1.0 implicitly. In other SDKs like .NET, EnableTracing=true explicitly sets TracesSampleRate=1.0, with past conflicts resolved to avoid overriding explicit rates. To fully disable tracing, set traces_sample_rate=None (not 0.0, as 0.0 still respects parent sampling decisions for distributed traces).
Citations:
- 1: https://getsentry.github.io/sentry-python/api.html
- 2: https://develop.sentry.dev/sdk/telemetry/traces.md
- 3: Conflict between EnableTracing and TracesSampleRate getsentry/sentry-dotnet#2366
- 4: https://docs.sentry.io/platforms/python/tracing/configure-sampling.md
- 5: https://docs.sentry.io/platforms/python/configuration/options.md
Second sentry_sdk.init call causes undefined behavior in production.
In non-development environments, this re-initializes Sentry at module load time, replacing the Hub's client with one stripped of all configuration from lines 34–56 — losing the OTEL instrumenter, before_send_transaction filter, LoggingIntegration, and disabled_integrations list. Per Sentry's developer documentation, calling Sentry.init() multiple times in production leads to undefined behavior, with integrations hooking repeatedly and causing issues like duplicate event reporting.
Additionally, enable_tracing=True is a legacy alias for traces_sample_rate=1.0, which the first init already sets; the second call provides no new functionality while discarding essential configuration.
Merge the environment check into the first init and remove the redundant second call. Also add a return type annotation to the lifespan function (line 70).
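A minimal sketch of the merged guard (assuming the option list from the first init is kept as-is; the lifespan body is elided, and `settings` is the project's existing config object):
# Sketch only: every option besides dsn stands in for the configuration already
# built in the first init block (OTEL instrumenter, before_send_transaction,
# LoggingIntegration, disabled_integrations, ...).
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

import sentry_sdk
from fastapi import FastAPI

if settings.SENTRY_DSN and settings.ENVIRONMENT != "development":
    sentry_sdk.init(
        dsn=str(settings.SENTRY_DSN),
        # ...existing options from the first init stay here unchanged...
    )
# the second, bare sentry_sdk.init(...) call is removed entirely

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    yield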
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/main.py` around lines 66 - 67, Remove the second call to
sentry_sdk.init and instead include the environment check in the initial sentry
setup so the single sentry_sdk.init call (the one that configures
traces_sample_rate, before_send_transaction, LoggingIntegration,
disabled_integrations, and OTEL instrumenter) only runs when settings.SENTRY_DSN
is set and settings.ENVIRONMENT != "development"; delete the redundant
init(dsn=..., enable_tracing=True) call that re-initializes Sentry. Also add an
explicit return type annotation to the lifespan function (lifespan) to match its
async generator type signature.
class FeatureFlag(FeatureFlagBase, table=True):
    __tablename__ = "feature_flag"
    __table_args__ = (
        UniqueConstraint(
            "key",
            "organization_id",
            "project_id",
            name="uq_feature_flag_key_org_project",
        ),
    )
🧩 Analysis chain
🌐 Web query:
PostgreSQL UNIQUE constraint NULLS NOT DISTINCT version support
💡 Result:
PostgreSQL supports the UNIQUE constraint with NULLS NOT DISTINCT starting from version 15. This feature allows unique constraints and indexes to treat NULL values as not distinct (equal), preventing multiple rows with NULL in the constrained columns. By default (NULLS DISTINCT), multiple NULLs are allowed. Syntax examples: CREATE TABLE products ( product_no integer UNIQUE NULLS NOT DISTINCT, name text ); or ALTER TABLE products ADD CONSTRAINT uq_product_no UNIQUE NULLS NOT DISTINCT (product_no); Official documentation confirms this in PostgreSQL 15 unique indexes section and current (18) constraints section. PostgreSQL 15 release notes explicitly mention: "Allow unique constraints and indexes to treat NULL values as not distinct". Prior to version 15, NULLs were always treated as distinct in unique constraints/indexes. Workarounds like partial indexes were used for similar behavior. The feature is available in all versions from 15 onwards, including the latest (18 as of 2026-02-26).
Citations:
- 1: https://www.postgresql.org/about/featurematrix/detail/unique-nulls-not-distinct/
- 2: https://www.postgresql.org/docs/15/indexes-unique.html
- 3: https://www.postgresql.org/docs/current/ddl-constraints.html
- 4: https://www.postgresql.org/docs/release/15.0/
🏁 Script executed:
# Read the full feature_flag.py file to understand the context
cat -n backend/app/models/feature_flag.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 2931
🏁 Script executed:
# Search for get_feature_flag_enabled to understand the usage pattern
rg -A 5 "get_feature_flag_enabled" --type py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1290
🏁 Script executed:
# Check if there are any alembic migrations related to feature_flag
fd -e py . alembic/versions -o git ls-files "alembic/versions/*feature*flag*"
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 201
🏁 Script executed:
# Check the FeatureFlagBase to understand all columns
rg -B 5 "class FeatureFlagBase" backend/app/models/feature_flag.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 196
🏁 Script executed:
# Get the full get_feature_flag_enabled function implementation
rg -A 20 "def get_feature_flag_enabled" backend/app/crud/feature_flag.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 578
🏁 Script executed:
# Search for all feature flag CRUD operations to understand the full context
rg "def.*feature_flag" backend/app/crud/feature_flag.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 226
🏁 Script executed:
# Check for any existing alembic migrations related to feature_flag
fd . alembic/versions -name "*feature*" -o -name "*flag*"
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 306
🏁 Script executed:
# Check the database schema or migration that created feature_flag table
rg "feature_flag" alembic/versions --type py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 132
🏁 Script executed:
# Check SQLModel version and documentation for UniqueConstraint support
rg "sqlmodel|SQLModel" pyproject.toml setup.py requirements.txt 2>/dev/null | head -20
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
# Verify the query behavior - check if there are any tests related to feature flags
fd feature_flag . --type py | rg -l "test|spec"
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 245
🏁 Script executed:
# Check the imports in feature_flag.py to confirm SQLModel/SQLAlchemy capabilities
head -20 backend/app/crud/feature_flag.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 551
🏁 Script executed:
# Check if there are any configuration files that specify PostgreSQL version
rg -l "postgres|version" . --type py | grep -E "config|settings|pyproject|requirements" | head -5
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 316
🏁 Script executed:
# Search for any existing use of postgresql_nulls_not_distinct or partial indexes
rg "postgresql_nulls_not_distinct|postgresql_where" .
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 2238
🏁 Script executed:
# Check SQLAlchemy/SQLModel imports to confirm Index support
rg "from sqlalchemy|from sqlmodel" backend/app/models/ | head -10
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 823
🏁 Script executed:
# Check for feature_flag migration file
fd "feature_flag\|041\|042\|043\|044\|045" alembic/versions --extension py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 179
🏁 Script executed:
# Search in alembic versions for feature_flag table creation
rg -l "feature_flag" alembic/versions
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 132
🏁 Script executed:
# Check if Index is imported anywhere in models for pattern precedent
rg "from sqlalchemy import.*Index" backend/app/models/
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 312
🏁 Script executed:
# Find the actual alembic directory structure
find . -type d -name "alembic" 2>/dev/null | head -5
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 97
🏁 Script executed:
# Check if feature_flag migration exists
find . -name "*feature*flag*" -type f 2>/dev/null | grep -E "alembic|migration"
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 139
🏁 Script executed:
# List alembic versions directory structure
ls -la backend/app/alembic/versions/ 2>/dev/null | tail -20
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1805
🏁 Script executed:
# Read the feature_flag migration file
cat -n backend/app/alembic/versions/055_create_feature_flag_table.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 3302
Critical: UniqueConstraint won't enforce uniqueness for org-scoped flags (project_id IS NULL).
PostgreSQL's default NULLS DISTINCT semantics (the only available behavior prior to version 15) mean two rows with the same (key, organization_id) and project_id = NULL will both be allowed. Since get_feature_flag_enabled() does .first(), duplicate org-level rows lead to nondeterministic enablement reads, and the API surface cannot reliably upsert org-scoped flags.
Pick one of these (and mirror in migration 055_create_feature_flag_table.py):
- Postgres 15+: add `postgresql_nulls_not_distinct=True` to the `UniqueConstraint`.
- Portable: replace with two partial unique indexes (one for `project_id IS NULL`, one for `project_id IS NOT NULL`).
🛡️ Option A — NULLS NOT DISTINCT (PG ≥ 15)
-class FeatureFlag(FeatureFlagBase, table=True):
- __tablename__ = "feature_flag"
- __table_args__ = (
- UniqueConstraint(
- "key",
- "organization_id",
- "project_id",
- name="uq_feature_flag_key_org_project",
- ),
- )
+class FeatureFlag(FeatureFlagBase, table=True):
+ __tablename__ = "feature_flag"
+ __table_args__ = (
+ UniqueConstraint(
+ "key",
+ "organization_id",
+ "project_id",
+ name="uq_feature_flag_key_org_project",
+ postgresql_nulls_not_distinct=True,
+ ),
+ )
🛡️ Option B — partial unique indexes (portable)
-from sqlmodel import Field, SQLModel, UniqueConstraint
+from sqlalchemy import Index, text
+from sqlmodel import Field, SQLModel
@@
- __table_args__ = (
- UniqueConstraint(
- "key",
- "organization_id",
- "project_id",
- name="uq_feature_flag_key_org_project",
- ),
- )
+ __table_args__ = (
+ Index(
+ "uq_feature_flag_key_org_proj",
+ "key", "organization_id", "project_id",
+ unique=True,
+ postgresql_where=text("project_id IS NOT NULL"),
+ ),
+ Index(
+ "uq_feature_flag_key_org_no_proj",
+ "key", "organization_id",
+ unique=True,
+ postgresql_where=text("project_id IS NULL"),
+ ),
+ )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/models/feature_flag.py` around lines 33 - 42, The
UniqueConstraint on FeatureFlag (class FeatureFlag) won’t prevent duplicate
org-scoped flags because project_id IS NULL is treated distinct; update both the
model and migration 055_create_feature_flag_table.py to use one of the suggested
fixes: either add postgresql_nulls_not_distinct=True to the UniqueConstraint
(Postgres ≥15) or replace the UniqueConstraint with two partial unique
indexes—one UNIQUE on (key, organization_id) WHERE project_id IS NULL and
another UNIQUE on (key, organization_id, project_id) WHERE project_id IS NOT
NULL—so get_feature_flag_enabled() returns deterministic results; pick and apply
the same approach in the model (FeatureFlag) and the migration.
- Added `service.py` to handle assessment evaluation orchestration, including starting and retrying assessments.
- Introduced utility functions for exporting assessment results in various formats (CSV, XLSX, JSON) in `export.py`.
- Created parsing utilities for handling batch results in `parsing.py`.
- Implemented validation for dataset file uploads in `validators.py`.
- Updated batch job model to include assessment type.
- Enhanced evaluation run model to reference parent assessments.
- Added new fields for reasoning effort and summary preferences in LLM request model.
b0f5463 to bce0944
…n model
- Introduced AssessmentRun model to replace EvaluationRun for assessment-specific runs.
- Updated CRUD operations to manage Assessment and AssessmentRun entities.
- Refactored processing logic to accommodate new AssessmentRun structure.
- Adjusted logging and error handling to reflect changes in run management.
- Modified routes and services to utilize AssessmentRun for assessment evaluations.
- Enhanced export functionality to support new AssessmentRun model.
- Removed EvaluationRun references where applicable and ensured backward compatibility.
Actionable comments posted: 10
♻️ Duplicate comments (3)
backend/app/assessment/dataset.py (2)
78-101: ⚠️ Potential issue | 🟠 Major
Resource leak:
`wb.close()` is not in a `finally` block. If
`iter_rows`/`next` or the count comprehension raises (corrupt xlsx, unexpected types), control jumps to `except` at line 99 and `wb.close()` is never reached, leaking the underlying zip file handle. Move cleanup into `finally` (or use `contextlib.closing`).
🛡️ Proposed fix
 def _count_excel_rows(content: bytes) -> int:
     """Count data rows in an Excel file (excluding header)."""
+    wb = None
     try:
         import openpyxl
         wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
         ws = wb.active
         if ws is None:
             return 0
         rows_iter = ws.iter_rows(values_only=True)
         header = next(rows_iter, None)
         if header is None:
-            wb.close()
             return 0
-        count = sum(
+        return sum(
             1 for row in rows_iter if row and any(cell is not None for cell in row)
         )
-        wb.close()
-        return count
     except Exception as e:
         logger.warning(f"[_count_excel_rows] Failed to count rows | {e}")
         return 0
+    finally:
+        if wb is not None:
+            try:
+                wb.close()
+            except Exception:
+                pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/dataset.py` around lines 78 - 101, The _count_excel_rows function can leak the workbook handle because wb.close() is only called on normal return; ensure the workbook is always closed by moving wb.close() into a finally block (or use contextlib.closing or openpyxl.Workbook context manager) so that after creating wb = openpyxl.load_workbook(...) and before returning from any path (including exceptions) the workbook is closed; update the logic around ws, header, and the count comprehension to use this guaranteed cleanup and remove duplicate close calls inside the try block.
140-163: ⚠️ Potential issue | 🟠 Major
Silent partial failure: dataset row persisted even when object-store upload fails.
`_upload_file_to_object_store` swallows exceptions and returns `None`, after which `create_evaluation_dataset` is invoked unconditionally — leaving an "EvaluationDataset" record whose `object_store_url` is `None`. Downstream batch processing (`backend/app/assessment/batch.py::_load_dataset_rows` raises `ValueError(f"Dataset {dataset.id} has no object_store_url")`) will then fail mysteriously, and the caller of `upload_dataset` has no signal of the failure. Fail loudly on upload failure rather than persisting an orphan metadata row.
🛡️ Proposed behavior
 object_store_url = _upload_file_to_object_store(
     session=session,
     project_id=project_id,
     file_content=file_content,
     file_ext=file_ext,
     dataset_name=dataset_name,
 )
+if object_store_url is None:
+    raise HTTPException(
+        status_code=502,
+        detail="Failed to upload dataset to object storage. Please retry.",
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/dataset.py` around lines 140 - 163, The code currently calls _upload_file_to_object_store and then unconditionally creates the DB row via create_evaluation_dataset even when the upload fails or returns None; change upload_dataset so it treats upload failures as fatal: call _upload_file_to_object_store inside a try/except, propagate or raise a clear exception when the upload raises or when the returned object_store_url is falsy/None, and only call create_evaluation_dataset when object_store_url is a valid non-empty string; reference _upload_file_to_object_store and create_evaluation_dataset to locate the change and ensure no EvaluationDataset is persisted if the upload did not succeed.backend/app/assessment/events.py (1)
13-48: ⚠️ Potential issue | 🟠 Major
Unbounded subscriber queues — slow/idle SSE clients can OOM the process.
`asyncio.Queue()` defaults to unlimited size, and `put_nowait` here never blocks/drops. A subscriber that connects but stops draining (e.g., its TCP send buffer is backed up) will accumulate every published assessment event indefinitely. Under cron-driven publish bursts across many subscribers this is a real memory-pressure / reliability issue. Use a bounded queue with explicit overflow handling (drop oldest, or evict the slow subscriber).
🛡️ Proposed bounded queue with overflow drop
+_QUEUE_MAXSIZE = 256
+
 class AssessmentEventBroker:
     def __init__(self) -> None:
         self._subscribers: set[asyncio.Queue[dict]] = set()

     async def subscribe(self) -> AsyncIterator[str]:
-        queue: asyncio.Queue[dict] = asyncio.Queue()
+        queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=_QUEUE_MAXSIZE)
         self._subscribers.add(queue)
@@
         for queue in list(self._subscribers):
-            queue.put_nowait(payload)
+            try:
+                queue.put_nowait(payload)
+            except asyncio.QueueFull:
+                logger.warning(
+                    "[publish] Subscriber queue full, dropping oldest | type=%s",
+                    payload.get("type"),
+                )
+                try:
+                    queue.get_nowait()
+                    queue.put_nowait(payload)
+                except (asyncio.QueueEmpty, asyncio.QueueFull):
+                    pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` around lines 13 - 48, The subscriber queues are unbounded (asyncio.Queue()) so slow/idle SSE clients can cause OOM; fix by creating bounded queues in subscribe (e.g., queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=XX)) and update publish to handle overflow: if queue.put_nowait(payload) raises asyncio.QueueFull, either drop the oldest item from that queue (queue.get_nowait() then retry put_nowait) or treat the subscriber as dead by discarding the queue from self._subscribers and logging/removing it; change symbols: modify subscribe (queue creation) and publish (the for loop over self._subscribers and use of queue.put_nowait) to implement the chosen overflow strategy and add appropriate debug/info logs.
🧹 Nitpick comments (18)
backend/app/api/routes/cron.py (2)
79-95: Be defensive about the shape of `result` before mutating it. If
`process_all_pending_evaluations` ever returns something other than a dict (e.g., `None` after an internal error path), `result["assessment"] = ...` will raise a `TypeError` outside the inner `try/except`, and the original error context will be lost. A quick `isinstance(result, dict)` guard (or initializing `result = result or {}`) keeps the merge robust.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/cron.py` around lines 79 - 95, The code mutates result assuming it's a dict which can raise a TypeError if process_all_pending_evaluations (or the variable result) is None or not a dict; guard and normalize it before merging by checking isinstance(result, dict) (or using result = result or {}) and then safely set result["assessment"] = assessment_result and update totals from assessment_result; reference the variables result and assessment_result and the surrounding merge block so the normalization happens immediately before the lines that assign result["assessment"] and update total_processed/total_failed/total_still_processing.
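For illustration, a rough sketch of the guard suggested above (the surrounding names come from the route itself; the merged key names are assumptions, not the route's actual payload):
# Sketch only: normalize `result` before merging the assessment totals into it.
result = process_all_pending_evaluations(session=session)
if not isinstance(result, dict):  # e.g. None after an internal error path
    result = {}

assessment_result = poll_all_pending_assessment_evaluations(session=session)
result["assessment"] = assessment_result
result["total_processed"] = result.get("total_processed", 0) + assessment_result.get("processed", 0)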
71-77: Lazy import inside the request handler — surface the reason or hoist it. The runtime
`from app.assessment.cron import poll_all_pending_assessment_evaluations` inside the handler is typically a circular-import workaround. If that's the case, leave a brief comment explaining why; otherwise hoist the import to module scope so it's loaded once at startup rather than on every cron invocation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/cron.py` around lines 71 - 77, The inline runtime import of poll_all_pending_assessment_evaluations inside the cron request handler should be resolved: either hoist the import to module scope so the function is imported once at startup (remove the in-handler from app.assessment.cron import ... and add it at top-level), or if this was done to avoid a circular import, add a concise comment above the inline import explaining the circular-import reason and why the import must remain lazy. Ensure you keep the existing call to poll_all_pending_assessment_evaluations(session=session) unchanged and reference that symbol in your change.backend/app/assessment/cron.py (1)
73-76: Stray adjacent string concatenation produces an awkward log line.
"[poll_all_pending_assessment_evaluations] " "No active assessments found"(and the same pattern on line 75) creates a single string with two spaces between the prefix and the message. Drop the empty piece for a cleaner log.♻️ Proposed change
-        logger.info(
-            "[poll_all_pending_assessment_evaluations] " "No active assessments found"
-        )
+        logger.info(
+            "[poll_all_pending_assessment_evaluations] No active assessments found"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/cron.py` around lines 73 - 76, The log call in poll_all_pending_assessment_evaluations uses two adjacent string literals ("[poll_all_pending_assessment_evaluations] " "No active assessments found") which concatenates into a string with double space; update the logger.info invocation in cron.py to use a single combined string (e.g., "[poll_all_pending_assessment_evaluations] No active assessments found") or an f-string so the prefix and message are one contiguous string with a single space, ensuring the message reads cleanly.backend/app/assessment/events.py (1)
26-26: Use the builtin `TimeoutError` instead of the deprecated `asyncio` alias.
asyncio.TimeoutErroris a deprecated alias of the builtinTimeoutError(since Python 3.11). Since this repo targets 3.11+, use the builtin directly.♻️ Proposed change
-        except asyncio.TimeoutError:
+        except TimeoutError:
             yield ": keep-alive\n\n"
             continue
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` at line 26, Replace the deprecated alias asyncio.TimeoutError with the builtin TimeoutError in the exception handler inside backend.app.assessment.events (the except block currently using asyncio.TimeoutError); update the except clause in the function/method where that try/except appears (look for the "except asyncio.TimeoutError:" line) to use "except TimeoutError:" so it relies on the built-in exception instead of the asyncio alias.backend/app/alembic/versions/055_add_assessment_manager_table.py (1)
121-132: Consider DB-level `server_default` for `inserted_at`/`updated_at`.
`inserted_at` and `updated_at` are NOT NULL but have no `server_default`. The model relies on SQLModel's `default_factory=now`, so any direct SQL `INSERT` (e.g., from psql, a fixture script, or an Alembic data migration) without these columns set will fail. Adding `server_default=sa.func.now()` keeps both ORM and raw inserts safe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/alembic/versions/055_add_assessment_manager_table.py` around lines 121 - 132, The migration defines NOT NULL columns inserted_at and updated_at without DB defaults, which breaks raw INSERTs; update the Column definitions for "inserted_at" and "updated_at" to include a database server default (server_default=sa.func.now()) and add server_onupdate=sa.func.now() for "updated_at" so the DB will populate timestamps for direct SQL inserts and automatic updates from the database side.backend/app/assessment/service.py (4)
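For reference, a minimal sketch of what the suggested defaults could look like in the migration (column types here are assumptions; the real definitions live in 055_add_assessment_manager_table.py):
# Sketch only: adds DB-side defaults alongside the ORM default_factory.
import sqlalchemy as sa

sa.Column("inserted_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
sa.Column("updated_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),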
10-10: Importing a private symbol (`_resolve_config`) across modules. The leading underscore signals
`_resolve_config` is `batch.py`'s implementation detail; importing it from `service.py` couples the two modules to that internal contract. Either rename it (e.g., `resolve_config`) and treat it as a real public helper, or move it to a shared `assessment._utils`/`assessment.config` module so both call sites depend on a stable name.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` at line 10, service.py is importing the private function _resolve_config from batch.py which couples modules to an internal symbol; change this by promoting the helper to a public API (rename _resolve_config to resolve_config) or move it into a shared module (e.g., assessment.config or assessment_utils) and update both callers to import the new public name, then remove the underscore import from batch.py and service.py and ensure submit_assessment_batch and any other callers reference the new resolve_config location.
145-193: `start_assessment` does multi-row creates/updates without an explicit transactional boundary — partial state on failure. Each
`create_assessment`, `create_assessment_run`, and `update_assessment_run_status` commits on its own (per the CRUD helpers). If something between iterations raises an unhandled error (e.g., a network blip during `submit_assessment_batch` that escapes the `try`, or a DB error during `recompute_assessment_status`), the parent `Assessment` plus a partial set of `AssessmentRun` rows will be persisted with the user receiving a 500. The cron will then try to process orphaned `pending`/`processing` rows.
session.begin()plus a single final commit, or (b) ensuring the failure path here marks the whole parent assessment asfailedwith a clearerror_messageso cron + the UI can reason about it. The current per-iterationtry/exceptonly handlessubmit_assessment_batchfailures, not the surrounding CRUD ops.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 145 - 193, The start_assessment flow performs multiple DB creates/updates (create_assessment, create_assessment_run, update_assessment_run_status, recompute_assessment_status) without an explicit transaction, allowing partial commits if an error occurs (e.g., during submit_assessment_batch or recompute_assessment_status); wrap the multi-step orchestration in a single transactional boundary (use session.begin() / session.begin_nested() as appropriate) so all create/update calls are atomic and commit once at the end, or alternatively, on any uncaught exception ensure you mark the parent Assessment as failed (update_assessment_run_status + update parent assessment status and error_message via the same session) before re-raising/returning to avoid leaving orphaned pending/processing runs; locate these changes in start_assessment and the exception handling around submit_assessment_batch/recompute_assessment_status to implement the chosen approach.
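A rough sketch of option (a), assuming the CRUD helpers can skip their own commit (the commit flag and exact signatures are hypothetical, not the project's API):
# Sketch only: one transaction wraps parent + child creation, so a failure
# rolls everything back instead of leaving orphaned pending/processing rows.
from sqlmodel import Session

def start_assessment_atomically(session: Session, configs: list) -> None:
    with session.begin():
        assessment = create_assessment(session=session, commit=False)
        for config in configs:
            create_assessment_run(
                session=session,
                assessment_id=assessment.id,
                config=config,
                commit=False,
            )
    # batch submission, per-run status updates, and the final
    # recompute_assessment_status happen after this commit.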
159-195: Per-iteration `recompute_assessment_status` is redundant and triples the DB writes on the parent row. Each loop iteration calls
`recompute_assessment_status` on the success path (line 178) or the failure path (line 191), then there's a final one at line 195. For an N-config assessment, that's N+1 recomputes (each does a `SELECT` over runs + an `UPDATE`/`COMMIT` on the parent). The final one alone is sufficient since the parent's status only needs to be correct after the loop completes — none of the per-iteration values are read between iterations.
♻️ Proposed change
run = update_assessment_run_status( session=session, run=run, status="processing", batch_job_id=batch_job.id, total_items=batch_job.total_items, ) - recompute_assessment_status(session=session, assessment_id=assessment.id) except Exception as e: ... run = update_assessment_run_status( session=session, run=run, status="failed", error_message="Batch submission failed. Please try again or contact support.", ) - recompute_assessment_status(session=session, assessment_id=assessment.id) runs.append(run) recompute_assessment_status(session=session, assessment_id=assessment.id)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 159 - 195, The loop currently calls recompute_assessment_status inside both the try (after update_assessment_run_status) and the except blocks, causing redundant DB writes; remove the per-iteration calls to recompute_assessment_status within the try/except and keep only the final recompute_assessment_status after the loop completes so the parent's status is updated once; locate the calls around submit_assessment_batch, update_assessment_run_status, and the except block handling the exception for run processing and delete those inner recompute_assessment_status invocations while leaving the final recompute_assessment_status(session=session, assessment_id=assessment.id) at the end.
51-62:UUID(str(run.config_id))is a redundant round-trip.
AssessmentRun.config_idis already typed asUUID | Noneper the model, and theif not run.config_idguard above ensures it's aUUIDhere.UUID(str(uuid_obj))is a no-op conversion.♻️ Proposed change
configs.append( AssessmentConfigRef( - config_id=UUID(str(run.config_id)), + config_id=run.config_id, config_version=run.config_version, ) )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/service.py` around lines 51 - 62, The code is redundantly wrapping an already-UUID config_id with UUID(str(...)) in the loop building AssessmentConfigRef; change the AssessmentConfigRef call to use the existing UUID directly (e.g. config_id=run.config_id) and if the type checker complains, assert or cast the non-None type before use (e.g. assert run.config_id is not None then use run.config_id) so you remove the unnecessary UUID(str(run.config_id)) round-trip; update the instantiation in the loop that creates AssessmentConfigRef accordingly.backend/app/assessment/mappers.py (3)
10-22: Type hint inaccuracy:normalize_llm_textcan returnNone.The signature is
(text: str) -> str, but lines 11–12 returntextunchanged whentextis falsy or non-str— so aNoneinput yieldsNone. Callers at lines 100 and 185 passkaapi_params.get("instructions")(possiblyNone) and rely on this. Tighten the annotation.♻️ Proposed change
-def normalize_llm_text(text: str) -> str: - if not isinstance(text, str) or not text: +def normalize_llm_text(text: str | None) -> str | None: + if not isinstance(text, str) or not text: return text🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 10 - 22, The type hint for normalize_llm_text is inaccurate because it can accept and return None; update its annotation to accept Optional[str] and return Optional[str] (from typing import Optional), i.e. change def normalize_llm_text(text: str) -> str to def normalize_llm_text(text: Optional[str]) -> Optional[str], and keep the existing early-return behavior so callers that pass kaapi_params.get("instructions") (which may be None) are correctly typed.
25-59: Strict-schema/strip-schema helpers don't recurse intoanyOf/oneOf/allOf.
_ensure_openai_strict_schemaand_strip_additional_propertiesonly walkpropertiesanditems. JSON Schemas that compose subschemas viaanyOf/oneOf/allOf(or nested under$defs) won't haveadditionalProperties: falseinjected (OpenAI strict mode will reject them) and won't haveadditionalPropertiesstripped on the Google side. Worth handling these for parity with the OpenAI strict-schema spec.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 25 - 59, Both _ensure_openai_strict_schema and _strip_additional_properties only recurse into "properties" and "items", so they miss subschemas under composition keywords and definitions; update both functions to also traverse "anyOf", "oneOf", "allOf" (iterating and recursively processing each subschema when they're lists) and "$defs" / "definitions" (recursively processing dict values), ensuring you treat nested dicts/lists consistently (use _ensure_openai_strict_schema in the former and _strip_additional_properties in the latter) so additionalProperties is injected or removed throughout composed/nested schemas.
5-5: Avoid depending ongoogle.genai._transformers(private module).The leading underscore signals it's not part of the public API; minor SDK upgrades can rename, restructure, or remove it without notice, breaking assessment schema conversion silently. The Google Genai SDK now natively supports JSON Schema via the documented
response_json_schemaparameter (announced Nov 2025)—pass raw JSON Schema directly without converting throught_schema. Alternatively, hand-roll the conversion in_convert_json_schema_to_googleby implementing the transformation logic (uppercase types, resolve$reffrom$defs, convertanyOf[null, X]to nullable) and pin the SDK version explicitly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` at line 5, The import of google.genai._transformers in backend.app.assessment.mappers should be removed and the assessment->GenAI schema flow updated to not rely on the private module; instead pass the raw JSON Schema into the SDK's documented response_json_schema parameter when building requests, or implement a robust internal converter _convert_json_schema_to_google that performs the necessary transformations (uppercase/openAI-style type names, resolve $ref entries from $defs, convert anyOf [ "null", X ] into a nullable representation) and pin the SDK version if you must keep an internal transformer; update any references in this file (e.g., where genai_transformers was used) to call response_json_schema or the new _convert_json_schema_to_google helper.backend/app/assessment/batch.py (2)
289-304: Shallow copy ofopenai_paramsshares nestedtext/reasoning/toolsdicts across all rows.
body = dict(openai_params)is a shallow copy: every JSONL line built in this loop ends up referencing the same nestedtext.format.json_schema,reasoning, andtoolsobjects. Today nothing mutates them after this point, but any future code that does (e.g. per-row schema override) would silently affect every other row.Use
copy.deepcopy(openai_params)here (and on the Gemini side at the equivalent assignment) if these payloads are ever expected to diverge per row.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 289 - 304, The current shallow copy "body = dict(openai_params)" causes nested objects (like text.format.json_schema, reasoning, tools) to be shared across all rows; replace that with a deep copy (use copy.deepcopy(openai_params)) when constructing body inside the loop so each JSONL row gets an independent payload; apply the same deep-copy pattern at the equivalent Gemini-side assignment as well to avoid cross-row mutation later (look for the "body" variable, "openai_params", and the jsonl_data.append block/BATCH_KEY usage to locate the spots to change).
138-161: Placeholder-in-value injection in_build_text_prompt.
prompt.replace("{col}", row[col])replaces all occurrences in sequence. If a column value itself contains another placeholder (e.g., columnnamehas the literal text{policy}), andpolicyis also a substituted column processed later, the value gets re-expanded. Equivalent to chained substitution. For Indic-text or user-uploaded data this is unlikely, but worth either documenting the contract or usingstr.format_mapwith a defaultdict to do a single-pass substitution.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/batch.py` around lines 138 - 161, The _build_text_prompt function currently does sequential replace which allows placeholder-in-value re-expansion (e.g., a column value containing "{other}" gets re-substituted when that column is processed later); update _build_text_prompt to perform a single-pass, safe substitution by building a mapping of column->normalized value and using a single-call formatting approach (e.g., str.format_map with a default/missing-safe dict) instead of iterative prompt.replace, ensuring you still call normalize_llm_text for each value before substitution; keep the original behavior for the no-template branch unchanged.backend/app/assessment/models.py (1)
32-36: Primary-key fields are typed as non-optional but default toNone.
id: int = SQLField(default=None, primary_key=True)is internally inconsistent with strict type checking — the value isNoneuntil the DB assigns it. SQLModel's idiomatic form isid: int | None = SQLField(default=None, primary_key=True).Low-priority cleanup; behavior is unchanged.
Also applies to: 144-148
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/models.py` around lines 32 - 36, The primary-key fields (e.g., the id field declared as id: int = SQLField(default=None, primary_key=True)) are typed non-optional while defaulting to None; update the type annotations to allow None (use int | None or Optional[int]) so the signature matches the SQLField(default=None) behavior, e.g., change id: int to id: int | None = SQLField(...); apply the same fix to the other primary-key field(s) with the same pattern elsewhere in the file (the block around the second id declaration).backend/app/assessment/crud.py (1)
255-315: Recompute is read‑modify‑write without locking — concurrent updates can leave stale counters.
recompute_assessment_statusreads child runs, computes counters, and writes them back to the parent. If two workers run this concurrently for the sameassessment_id(e.g. cron + an SSE-triggered recompute), the later commit overwrites the earlier one based on a possibly stale snapshot of child rows.Today's call sites appear to be single-threaded (cron + sequential processing), so this is informational. If parallel polling is ever introduced, consider
SELECT ... FOR UPDATEon the parent row, or recompute counters in SQL with a singleUPDATE ... FROM (SELECT ...)aggregation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 255 - 315, recompute_assessment_status currently does a read-modify-write on the Assessment row which can be lost under concurrent recomputes; to fix, acquire a row lock on the parent Assessment (e.g., SELECT ... FOR UPDATE) before recomputing in recompute_assessment_status or replace the logic with a single atomic UPDATE ... FROM (SELECT ...) aggregation that computes totals from AssessmentRun and updates Assessment in one statement; reference the recompute_assessment_status function and the Assessment/AssessmentRun models and ensure you commit while holding the lock or use a transaction so concurrent workers cannot overwrite each other's results.backend/app/assessment/utils/export.py (1)
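If parallel polling ever lands, the locking variant might look roughly like this (sketch only; `Assessment` is the model above and the counter recomputation is elided):
# Sketch only: take a row lock on the parent before recomputing counters.
from sqlmodel import select

def recompute_assessment_status_locked(session, assessment_id: int) -> None:
    assessment = session.exec(
        select(Assessment).where(Assessment.id == assessment_id).with_for_update()
    ).one()
    # ...recompute counters from the child AssessmentRun rows here...
    session.add(assessment)
    session.commit()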
395-417:assessment_id=run.assessment_id or 0masks orphan rows.Coercing a missing
assessment_idto0produces an export row that looks like it belongs to assessment0. If the schema invariant is that exported runs always have a parent, fail loud instead; otherwise relaxAssessmentExportRow.assessment_idtoint | Noneinmodels.pyand pass the real value through.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/utils/export.py` around lines 395 - 417, The export is masking missing parent relationships by coercing assessment_id to 0; update the export to pass the real value (use assessment_id=run.assessment_id) and either (A) if a missing parent should be an error, add an explicit check in the export routine (raise ValueError or similar when run.assessment_id is None) before building AssessmentExportRow, or (B) if exports may legitimately have no parent, update AssessmentExportRow.assessment_id type in models.py to allow None (int | None) and keep passing run.assessment_id through; locate the construction of AssessmentExportRow in export_rows.append and implement one of these two fixes around that symbol.backend/app/assessment/processing.py (1)
482-486: LGTM on the wrapper, but the local import is doing real work — call it out.Importing
poll_all_pending_assessment_evaluationsinside the function avoids a circular import withapp.assessment.cron. A short inline comment will save the next maintainer a debugging session if they "clean up" the import to module scope.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/processing.py` around lines 482 - 486, The local import of poll_all_pending_assessment_evaluations inside poll_all_pending_assessments is intentional to avoid a circular import; add a concise inline comment above the from app.assessment.cron import poll_all_pending_assessment_evaluations line explaining that the import must remain local (not module-level) to prevent circular imports and to warn future maintainers not to move it to top-level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/alembic/versions/055_add_assessment_manager_table.py`:
- Around line 15-16: The migration's revision identifier is not zero-padded and
mismatches the filename and down_revision; update the module-level variable
revision from "55" to "055" so it matches the filename and existing
down_revision = "054", keeping the three-digit convention used across migrations
and ensuring subsequent revisions align.
In `@backend/app/assessment/batch.py`:
- Around line 107-135: The workbook created in _parse_excel_rows (wb returned by
openpyxl.load_workbook) may never be closed if iter_rows, next(header) or the
row-to-dict conversion raises; wrap the workbook usage in a try/finally (or
context manager if available) so wb.close() is always called in the finally
block after opening wb and before returning, and move any early returns (e.g.,
when ws is None or header is None) into that try so they still hit the finally
cleanup; ensure variables referenced (wb, ws, rows_iter, header, result) are
initialized inside the try scope.
- Around line 481-569: The code is defaulting unknown provider_name to "openai"
via _NATIVE_PROVIDERS.get(..., "openai"), causing unknown providers to silently
use the OpenAI path; change this to fail fast by using
_NATIVE_PROVIDERS.get(provider_name) (no default) or explicitly checking if
provider_name not in _NATIVE_PROVIDERS and raising ValueError, then use
base_provider for the if/elif branches (referencing the variables base_provider
and provider_name and the mapping _NATIVE_PROVIDERS); ensure the unreachable
else that raises ValueError remains reachable so start_batch_job is only called
for known providers.
In `@backend/app/assessment/cron.py`:
- Around line 144-182: The cleanup steps inside the broad except should be
isolated so a secondary DB error can't abort the whole cron loop: wrap the calls
to update_assessment_run_status, recompute_assessment_status, the
construction/appending of failure_result, assessment_event_broker.publish (and
the failed += 1) in their own try/except block; on exception catch and
logger.error with context (include run.id/run.assessment_id/run.run_name) and
exc_info=True, then continue the for-loop without re-raising so one failed
cleanup won't stop processing remaining runs.
- Around line 41-61: The payload returned by _build_callback_payload nests
"type" inside APIResponse.success_response(...).data so events.py's subscribe()
(which reads payload.get("type", "message")) always falls back to "message"; fix
by promoting the event type to the root of the published payload: in
_build_callback_payload (or immediately after calling
APIResponse.success_response(...).model_dump()) ensure the returned dict
contains a top-level "type": "assessment.child_status_changed" key (while
keeping the detailed run/assessment info under "data"), so subscribe() can
detect the event without changing events.py.
In `@backend/app/assessment/crud.py`:
- Around line 192-223: The update_assessment_run_status function currently
treats None as "do not change", preventing callers from clearing fields; fix
this by introducing a unique sentinel (e.g., UNSET = object()) as the default
for error_message, batch_job_id, total_items, and object_store_url, then change
the checks to "is not UNSET" and assign the value (which may be None) to
run.error_message, run.batch_job_id, run.total_items, and run.object_store_url
inside update_assessment_run_status so callers can explicitly clear fields by
passing None; update any call sites accordingly to use the sentinel omission vs
explicit None semantics.
In `@backend/app/assessment/processing.py`:
- Line 253: Update the log_prefix to include the current function name so all
subsequent logger calls follow the project's convention; replace the line
setting log_prefix = f"[assessment_run={run.id}]" with something like log_prefix
= f"[{inspect.currentframe().f_code.co_name}][assessment_run={run.id}]" (and add
"import inspect" if missing) so functions like the one containing run.id will
prefix logs with the function name and the assessment_run id used in
logger.info/logger.error calls.
- Around line 198-231: In the Google/Gemini branch (the block handling
provider_name in ("google","google-native")), populate the "usage" field from
the response instead of hard-coding None (e.g., extract response.get("usage")
or, if Gemini batch responses use a different shape, parse the batch structure
to pull usage/metadata) and treat an empty string from
extract_text_from_response_dict(response) as an error by setting error="Empty
response output" (mirror the Anthropic branch behavior); update the logic around
response, text = extract_text_from_response_dict(response) and the
results.append entries to include the extracted usage and to set error when text
is empty.
In `@backend/app/assessment/utils/export.py`:
- Around line 60-66: The expansion loop in export.py silently overwrites input
columns that share names with export fields (e.g. output, error, row_id, run_id,
result_status, response_id) by setting new_row[k] from input_data then calling
new_row.update(row); modify the loop that builds new_row (the variables
row_payload, input_data, input_keys, new_row are present) to detect collisions
against a reserved_fields set and instead namespace conflicting input keys (e.g.
store as "input_<k>") or, alternatively, emit a single warning when any
collision is detected; ensure namespacing logic also avoids clobbering an
existing "input_<k>" in row and preserves original row fields when calling
new_row.update(row).
- Around line 422-433: The sort_export_rows function currently sorts by
row.row_id lexicographically which makes "row_10" come before "row_2"; update
sort_export_rows (and any callers expecting stable order like
load_export_rows_for_run / AssessmentExportRow) to extract the numeric index
from row.row_id (e.g. parse the suffix after "row_") and use that integer in the
sort key instead of the raw string, keeping the existing tie-breakers
(row.config_version or 0, row.run_id); ensure the parser handles missing/invalid
suffixes by falling back to 0 so sorting remains stable.
---
Duplicate comments:
In `@backend/app/assessment/dataset.py`:
- Around line 78-101: The _count_excel_rows function can leak the workbook
handle because wb.close() is only called on normal return; ensure the workbook
is always closed by moving wb.close() into a finally block (or use
contextlib.closing or openpyxl.Workbook context manager) so that after creating
wb = openpyxl.load_workbook(...) and before returning from any path (including
exceptions) the workbook is closed; update the logic around ws, header, and the
count comprehension to use this guaranteed cleanup and remove duplicate close
calls inside the try block.
- Around line 140-163: The code currently calls _upload_file_to_object_store and
then unconditionally creates the DB row via create_evaluation_dataset even when
the upload fails or returns None; change upload_dataset so it treats upload
failures as fatal: call _upload_file_to_object_store inside a try/except,
propagate or raise a clear exception when the upload raises or when the returned
object_store_url is falsy/None, and only call create_evaluation_dataset when
object_store_url is a valid non-empty string; reference
_upload_file_to_object_store and create_evaluation_dataset to locate the change
and ensure no EvaluationDataset is persisted if the upload did not succeed.
In `@backend/app/assessment/events.py`:
- Around line 13-48: The subscriber queues are unbounded (asyncio.Queue()) so
slow/idle SSE clients can cause OOM; fix by creating bounded queues in subscribe
(e.g., queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=XX)) and update
publish to handle overflow: if queue.put_nowait(payload) raises
asyncio.QueueFull, either drop the oldest item from that queue
(queue.get_nowait() then retry put_nowait) or treat the subscriber as dead by
discarding the queue from self._subscribers and logging/removing it; change
symbols: modify subscribe (queue creation) and publish (the for loop over
self._subscribers and use of queue.put_nowait) to implement the chosen overflow
strategy and add appropriate debug/info logs.
---
Nitpick comments:
In `@backend/app/alembic/versions/055_add_assessment_manager_table.py`:
- Around line 121-132: The migration defines NOT NULL columns inserted_at and
updated_at without DB defaults, which breaks raw INSERTs; update the Column
definitions for "inserted_at" and "updated_at" to include a database server
default (server_default=sa.func.now()) and add server_onupdate=sa.func.now() for
"updated_at" so the DB will populate timestamps for direct SQL inserts and
automatic updates from the database side.
In `@backend/app/api/routes/cron.py`:
- Around line 79-95: The code mutates result assuming it's a dict which can
raise a TypeError if process_all_pending_evaluations (or the variable result) is
None or not a dict; guard and normalize it before merging by checking
isinstance(result, dict) (or using result = result or {}) and then safely set
result["assessment"] = assessment_result and update totals from
assessment_result; reference the variables result and assessment_result and the
surrounding merge block so the normalization happens immediately before the
lines that assign result["assessment"] and update
total_processed/total_failed/total_still_processing.
- Around line 71-77: The inline runtime import of
poll_all_pending_assessment_evaluations inside the cron request handler should
be resolved: either hoist the import to module scope so the function is imported
once at startup (remove the in-handler from app.assessment.cron import ... and
add it at top-level), or if this was done to avoid a circular import, add a
concise comment above the inline import explaining the circular-import reason
and why the import must remain lazy. Ensure you keep the existing call to
poll_all_pending_assessment_evaluations(session=session) unchanged and reference
that symbol in your change.
In `@backend/app/assessment/batch.py`:
- Around line 289-304: The current shallow copy "body = dict(openai_params)"
causes nested objects (like text.format.json_schema, reasoning, tools) to be
shared across all rows; replace that with a deep copy (use
copy.deepcopy(openai_params)) when constructing body inside the loop so each
JSONL row gets an independent payload; apply the same deep-copy pattern at the
equivalent Gemini-side assignment as well to avoid cross-row mutation later
(look for the "body" variable, "openai_params", and the jsonl_data.append
block/BATCH_KEY usage to locate the spots to change).
- Around line 138-161: The _build_text_prompt function currently does sequential
replace which allows placeholder-in-value re-expansion (e.g., a column value
containing "{other}" gets re-substituted when that column is processed later);
update _build_text_prompt to perform a single-pass, safe substitution by
building a mapping of column->normalized value and using a single-call
formatting approach (e.g., str.format_map with a default/missing-safe dict)
instead of iterative prompt.replace, ensuring you still call normalize_llm_text
for each value before substitution; keep the original behavior for the
no-template branch unchanged.
In `@backend/app/assessment/cron.py`:
- Around line 73-76: The log call in poll_all_pending_assessment_evaluations
uses two adjacent string literals ("[poll_all_pending_assessment_evaluations] "
"No active assessments found") which concatenates into a string with double
space; update the logger.info invocation in cron.py to use a single combined
string (e.g., "[poll_all_pending_assessment_evaluations] No active assessments
found") or an f-string so the prefix and message are one contiguous string with
a single space, ensuring the message reads cleanly.
In `@backend/app/assessment/crud.py`:
- Around line 255-315: recompute_assessment_status currently does a
read-modify-write on the Assessment row which can be lost under concurrent
recomputes; to fix, acquire a row lock on the parent Assessment (e.g., SELECT
... FOR UPDATE) before recomputing in recompute_assessment_status or replace the
logic with a single atomic UPDATE ... FROM (SELECT ...) aggregation that
computes totals from AssessmentRun and updates Assessment in one statement;
reference the recompute_assessment_status function and the
Assessment/AssessmentRun models and ensure you commit while holding the lock or
use a transaction so concurrent workers cannot overwrite each other's results.
In `@backend/app/assessment/events.py`:
- Line 26: Replace the deprecated alias asyncio.TimeoutError with the builtin
TimeoutError in the exception handler inside backend.app.assessment.events (the
except block currently using asyncio.TimeoutError); update the except clause in
the function/method where that try/except appears (look for the "except
asyncio.TimeoutError:" line) to use "except TimeoutError:" so it relies on the
built-in exception instead of the asyncio alias.
In `@backend/app/assessment/mappers.py`:
- Around line 10-22: The type hint for normalize_llm_text is inaccurate because
it can accept and return None; update its annotation to accept Optional[str] and
return Optional[str] (from typing import Optional), i.e. change def
normalize_llm_text(text: str) -> str to def normalize_llm_text(text:
Optional[str]) -> Optional[str], and keep the existing early-return behavior so
callers that pass kaapi_params.get("instructions") (which may be None) are
correctly typed.
- Around line 25-59: Both _ensure_openai_strict_schema and
_strip_additional_properties only recurse into "properties" and "items", so they
miss subschemas under composition keywords and definitions; update both
functions to also traverse "anyOf", "oneOf", "allOf" (iterating and recursively
processing each subschema when they're lists) and "$defs" / "definitions"
(recursively processing dict values), ensuring you treat nested dicts/lists
consistently (use _ensure_openai_strict_schema in the former and
_strip_additional_properties in the latter) so additionalProperties is injected
or removed throughout composed/nested schemas.
- Line 5: The import of google.genai._transformers in
backend.app.assessment.mappers should be removed and the assessment->GenAI
schema flow updated to not rely on the private module; instead pass the raw JSON
Schema into the SDK's documented response_json_schema parameter when building
requests, or implement a robust internal converter
_convert_json_schema_to_google that performs the necessary transformations
(uppercase/openAI-style type names, resolve $ref entries from $defs, convert
anyOf [ "null", X ] into a nullable representation) and pin the SDK version if
you must keep an internal transformer; update any references in this file (e.g.,
where genai_transformers was used) to call response_json_schema or the new
_convert_json_schema_to_google helper.
In `@backend/app/assessment/models.py`:
- Around line 32-36: The primary-key fields (e.g., the id field declared as id:
int = SQLField(default=None, primary_key=True)) are typed non-optional while
defaulting to None; update the type annotations to allow None (use int | None or
Optional[int]) so the signature matches the SQLField(default=None) behavior,
e.g., change id: int to id: int | None = SQLField(...); apply the same fix to
the other primary-key field(s) with the same pattern elsewhere in the file (the
block around the second id declaration).
In `@backend/app/assessment/processing.py`:
- Around line 482-486: The local import of
poll_all_pending_assessment_evaluations inside poll_all_pending_assessments is
intentional to avoid a circular import; add a concise inline comment above the
from app.assessment.cron import poll_all_pending_assessment_evaluations line
explaining that the import must remain local (not module-level) to prevent
circular imports and to warn future maintainers not to move it to top-level.
In `@backend/app/assessment/service.py`:
- Line 10: service.py is importing the private function _resolve_config from
batch.py which couples modules to an internal symbol; change this by promoting
the helper to a public API (rename _resolve_config to resolve_config) or move it
into a shared module (e.g., assessment.config or assessment_utils) and update
both callers to import the new public name, then remove the underscore import
from batch.py and service.py and ensure submit_assessment_batch and any other
callers reference the new resolve_config location.
- Around line 145-193: The start_assessment flow performs multiple DB
creates/updates (create_assessment, create_assessment_run,
update_assessment_run_status, recompute_assessment_status) without an explicit
transaction, allowing partial commits if an error occurs (e.g., during
submit_assessment_batch or recompute_assessment_status); wrap the multi-step
orchestration in a single transactional boundary (use session.begin() /
session.begin_nested() as appropriate) so all create/update calls are atomic and
commit once at the end, or alternatively, on any uncaught exception ensure you
mark the parent Assessment as failed (update_assessment_run_status + update
parent assessment status and error_message via the same session) before
re-raising/returning to avoid leaving orphaned pending/processing runs; locate
these changes in start_assessment and the exception handling around
submit_assessment_batch/recompute_assessment_status to implement the chosen
approach.
- Around line 159-195: The loop currently calls recompute_assessment_status
inside both the try (after update_assessment_run_status) and the except blocks,
causing redundant DB writes; remove the per-iteration calls to
recompute_assessment_status within the try/except and keep only the final
recompute_assessment_status after the loop completes so the parent's status is
updated once; locate the calls around submit_assessment_batch,
update_assessment_run_status, and the except block handling the exception for
run processing and delete those inner recompute_assessment_status invocations
while leaving the final recompute_assessment_status(session=session,
assessment_id=assessment.id) at the end.
- Around line 51-62: The code is redundantly wrapping an already-UUID config_id
with UUID(str(...)) in the loop building AssessmentConfigRef; change the
AssessmentConfigRef call to use the existing UUID directly (e.g.
config_id=run.config_id) and if the type checker complains, assert or cast the
non-None type before use (e.g. assert run.config_id is not None then use
run.config_id) so you remove the unnecessary UUID(str(run.config_id))
round-trip; update the instantiation in the loop that creates
AssessmentConfigRef accordingly.
In `@backend/app/assessment/utils/export.py`:
- Around line 395-417: The export is masking missing parent relationships by
coercing assessment_id to 0; update the export to pass the real value (use
assessment_id=run.assessment_id) and either (A) if a missing parent should be an
error, add an explicit check in the export routine (raise ValueError or similar
when run.assessment_id is None) before building AssessmentExportRow, or (B) if
exports may legitimately have no parent, update
AssessmentExportRow.assessment_id type in models.py to allow None (int | None)
and keep passing run.assessment_id through; locate the construction of
AssessmentExportRow in export_rows.append and implement one of these two fixes
around that symbol.
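For the schema-traversal item in the mappers.py group above, here is a minimal sketch of what recursing through composition keywords and definitions can look like. The helper name matches the one discussed, but the body is illustrative only and is not the PR's implementation.

from typing import Any

_COMPOSITION_KEYS = ("anyOf", "oneOf", "allOf")
_DEFINITION_KEYS = ("$defs", "definitions")


def _strip_additional_properties(schema: Any) -> Any:
    """Illustrative sketch: drop additionalProperties everywhere, including composed subschemas."""
    if isinstance(schema, dict):
        schema.pop("additionalProperties", None)
        # plain object properties and named definitions are dicts of subschemas
        for key in ("properties", *_DEFINITION_KEYS):
            named = schema.get(key)
            if isinstance(named, dict):
                for subschema in named.values():
                    _strip_additional_properties(subschema)
        # array item schemas
        if "items" in schema:
            _strip_additional_properties(schema["items"])
        # composition keywords hold lists of subschemas
        for key in _COMPOSITION_KEYS:
            subs = schema.get(key)
            if isinstance(subs, list):
                for subschema in subs:
                    _strip_additional_properties(subschema)
    elif isinstance(schema, list):
        for item in schema:
            _strip_additional_properties(item)
    return schema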
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 072190d8-1f9b-4ad9-9f98-d507cb197d57
📒 Files selected for processing (21)
- backend/app/alembic/versions/055_add_assessment_manager_table.py
- backend/app/api/main.py
- backend/app/api/routes/cron.py
- backend/app/assessment/__init__.py
- backend/app/assessment/batch.py
- backend/app/assessment/cron.py
- backend/app/assessment/crud.py
- backend/app/assessment/dataset.py
- backend/app/assessment/events.py
- backend/app/assessment/mappers.py
- backend/app/assessment/models.py
- backend/app/assessment/processing.py
- backend/app/assessment/routes.py
- backend/app/assessment/service.py
- backend/app/assessment/utils/__init__.py
- backend/app/assessment/utils/export.py
- backend/app/assessment/utils/parsing.py
- backend/app/assessment/validators.py
- backend/app/models/__init__.py
- backend/app/models/batch_job.py
- backend/app/models/llm/request.py
✅ Files skipped from review due to trivial changes (5)
- backend/app/models/batch_job.py
- backend/app/assessment/__init__.py
- backend/app/api/main.py
- backend/app/assessment/utils/parsing.py
- backend/app/models/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/models/llm/request.py
- backend/app/assessment/utils/__init__.py
- backend/app/assessment/routes.py
| revision = "55" | ||
| down_revision = "054" |
There was a problem hiding this comment.
Revision ID is "55" but the filename and down_revision use zero-padded "055" / "054".
Existing migrations are zero-padded three-digit IDs, and down_revision = "054". Setting revision = "55" (without padding) breaks the visual ordering convention and creates a mismatch with the filename 055_add_assessment_manager_table.py. Use "055" to match the established scheme — the next migration (which will likely set its down_revision to this id) will be far less surprising.
🛠️ Proposed change
-revision = "55"
+revision = "055"
down_revision = "054"As per coding guidelines: "Generate database migrations using alembic revision --autogenerate -m "Description" --rev-id <number> where rev-id is the latest existing revision ID + 1".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/alembic/versions/055_add_assessment_manager_table.py` around
lines 15 - 16, The migration's revision identifier is not zero-padded and
mismatches the filename and down_revision; update the module-level variable
revision from "55" to "055" so it matches the filename and existing
down_revision = "054", keeping the three-digit convention used across migrations
and ensuring subsequent revisions align.
def _parse_excel_rows(content: bytes) -> list[dict[str, str]]:
    """Parse Excel content into list of row dicts."""
    import openpyxl

    wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
    ws = wb.active
    if ws is None:
        wb.close()
        return []

    rows_iter = ws.iter_rows(values_only=True)
    header = next(rows_iter, None)
    if header is None:
        wb.close()
        return []

    columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
    result = []
    for row in rows_iter:
        if row and any(cell is not None for cell in row):
            row_dict = {
                columns[i]: str(cell) if cell is not None else ""
                for i, cell in enumerate(row)
                if i < len(columns)
            }
            result.append(row_dict)

    wb.close()
    return result
wb.close() is not guaranteed on iteration errors.
If iter_rows/next or any conversion in the row-build raises (e.g., extremely malformed .xlsx), the workbook handle is leaked because close() is only called on the success paths. Wrap the body in try/finally to ensure cleanup.
♻️ Suggested fix
def _parse_excel_rows(content: bytes) -> list[dict[str, str]]:
"""Parse Excel content into list of row dicts."""
import openpyxl
wb = openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
- ws = wb.active
- if ws is None:
- wb.close()
- return []
-
- rows_iter = ws.iter_rows(values_only=True)
- header = next(rows_iter, None)
- if header is None:
- wb.close()
- return []
-
- columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
- result = []
- for row in rows_iter:
- if row and any(cell is not None for cell in row):
- row_dict = {
- columns[i]: str(cell) if cell is not None else ""
- for i, cell in enumerate(row)
- if i < len(columns)
- }
- result.append(row_dict)
-
- wb.close()
- return result
+ try:
+ ws = wb.active
+ if ws is None:
+ return []
+
+ rows_iter = ws.iter_rows(values_only=True)
+ header = next(rows_iter, None)
+ if header is None:
+ return []
+
+ columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
+ result = []
+ for row in rows_iter:
+ if row and any(cell is not None for cell in row):
+ row_dict = {
+ columns[i]: str(cell) if cell is not None else ""
+ for i, cell in enumerate(row)
+ if i < len(columns)
+ }
+ result.append(row_dict)
+ return result
+ finally:
+ wb.close()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/batch.py` around lines 107 - 135, The workbook created
in _parse_excel_rows (wb returned by openpyxl.load_workbook) may never be closed
if iter_rows, next(header) or the row-to-dict conversion raises; wrap the
workbook usage in a try/finally (or context manager if available) so wb.close()
is always called in the finally block after opening wb and before returning, and
move any early returns (e.g., when ws is None or header is None) into that try
so they still hit the finally cleanup; ensure variables referenced (wb, ws,
rows_iter, header, result) are initialized inside the try scope.
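As an aside, if the try/finally shape suggested above feels heavy, contextlib.closing gives the same cleanup guarantee. This is a sketch of the equivalent shape under the same assumptions as the excerpt, not a drop-in replacement for the PR's function.

import contextlib
import io

import openpyxl


def _parse_excel_rows(content: bytes) -> list[dict[str, str]]:
    """Sketch: contextlib.closing guarantees wb.close() even if iteration raises."""
    with contextlib.closing(
        openpyxl.load_workbook(io.BytesIO(content), read_only=True, data_only=True)
    ) as wb:
        ws = wb.active
        if ws is None:
            return []
        rows_iter = ws.iter_rows(values_only=True)
        header = next(rows_iter, None)
        if header is None:
            return []
        columns = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
        return [
            {
                columns[i]: str(cell) if cell is not None else ""
                for i, cell in enumerate(row)
                if i < len(columns)
            }
            for row in rows_iter
            if row and any(cell is not None for cell in row)
        ]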
base_provider = _NATIVE_PROVIDERS.get(provider_name, "openai")

if base_provider == "openai":
    mapped_params, warnings = map_kaapi_to_openai_params(params)
    if warnings:
        logger.info(f"[submit_assessment_batch] Mapper warnings: {warnings}")

    jsonl_data = build_openai_jsonl(
        rows=rows,
        text_columns=text_columns,
        attachments=attachments,
        prompt_template=prompt_template,
        openai_params=mapped_params,
    )

    # Get OpenAI client and submit
    from app.utils import get_openai_client

    openai_client = get_openai_client(
        session=session,
        org_id=organization_id,
        project_id=project_id,
    )
    provider = OpenAIBatchProvider(client=openai_client)

    batch_config = {
        "endpoint": "/v1/responses",
        "description": f"Assessment: {run.run_name}",
        "completion_window": "24h",
    }

    batch_job = start_batch_job(
        session=session,
        provider=provider,
        provider_name="openai",
        job_type=BatchJobType.ASSESSMENT,
        organization_id=organization_id,
        project_id=project_id,
        jsonl_data=jsonl_data,
        config=batch_config,
    )

elif base_provider == "google":
    mapped_params, warnings = map_kaapi_to_google_params(params)
    if warnings:
        logger.info(f"[submit_assessment_batch] Mapper warnings: {warnings}")

    jsonl_data = build_google_jsonl(
        rows=rows,
        text_columns=text_columns,
        attachments=attachments,
        prompt_template=prompt_template,
        google_params=mapped_params,
    )

    # Get Gemini client and submit
    from app.core.batch import GeminiBatchProvider
    from app.core.batch.client import GeminiClient

    gemini_client = GeminiClient.from_credentials(
        session=session,
        org_id=organization_id,
        project_id=project_id,
    )
    provider = GeminiBatchProvider(
        client=gemini_client.client,
        model=f"models/{mapped_params.get('model', 'gemini-2.5-pro')}",
    )

    batch_config = {
        "display_name": f"assessment-{run.run_name}",
        "model": f"models/{mapped_params.get('model', 'gemini-2.5-pro')}",
    }

    batch_job = start_batch_job(
        session=session,
        provider=provider,
        provider_name="google",
        job_type=BatchJobType.ASSESSMENT,
        organization_id=organization_id,
        project_id=project_id,
        jsonl_data=jsonl_data,
        config=batch_config,
    )

else:
    raise ValueError(
        f"Unsupported provider for assessment batches: {provider_name}"
    )
Unknown provider names silently fall through to the OpenAI batch path.
base_provider = _NATIVE_PROVIDERS.get(provider_name, "openai") defaults unknown providers to "openai", so the if base_provider == "openai": branch always matches and the else: raise ValueError(...) at lines 566‑569 is unreachable. A typo or new provider value (e.g., "anthropic") will silently submit through OpenAI rather than fail loudly.
♻️ Suggested fix — fail fast on unknown providers
- # Determine the base provider (openai or google)
- base_provider = _NATIVE_PROVIDERS.get(provider_name, "openai")
+ # Determine the base provider (openai or google)
+ base_provider = _NATIVE_PROVIDERS.get(provider_name)
+ if base_provider is None:
+ raise ValueError(
+ f"Unsupported provider for assessment batches: {provider_name}"
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/batch.py` around lines 481 - 569, The code is
defaulting unknown provider_name to "openai" via _NATIVE_PROVIDERS.get(...,
"openai"), causing unknown providers to silently use the OpenAI path; change
this to fail fast by using _NATIVE_PROVIDERS.get(provider_name) (no default) or
explicitly checking if provider_name not in _NATIVE_PROVIDERS and raising
ValueError, then use base_provider for the if/elif branches (referencing the
variables base_provider and provider_name and the mapping _NATIVE_PROVIDERS);
ensure the unreachable else that raises ValueError remains reachable so
start_batch_job is only called for known providers.
def _build_callback_payload(
    assessment: Assessment,
    run: AssessmentRun,
    result: dict[str, Any],
) -> dict[str, Any]:
    """Build minimal SSE payload for assessment invalidation."""
    return APIResponse.success_response(
        data={
            "type": "assessment.child_status_changed",
            "assessment_id": assessment.id,
            "assessment_status": assessment.status,
            "run": {
                "id": run.id,
                "config_id": str(run.config_id) if run.config_id else None,
                "config_version": run.config_version,
                "status": result.get("current_status"),
                "error": result.get("error"),
                "updated_at": run.updated_at.isoformat() if run.updated_at else None,
            },
        }
    ).model_dump()
SSE event: line will degrade to "message" — "type" is nested inside APIResponse.success_response(...).data, not at payload root.
The subscribe handler in backend/app/assessment/events.py at line 28 extracts the event type:
event_type = payload.get("type", "message")
yield f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"But _build_callback_payload wraps the callback dict in APIResponse.success_response(data={...}).model_dump(), producing:
{
"success": true,
"data": {
"type": "assessment.child_status_changed",
"assessment_id": "...",
...
},
"error": null,
...
}
The root-level payload.get("type") returns None and falls back to "message", so all assessment events are sent as event: message, defeating client-side listeners expecting event: assessment.child_status_changed.
Either lift "type" to the root of the published payload or update events.py::subscribe() to look at payload["data"]["type"].
Suggested fix (option A — flatten)
- return APIResponse.success_response(
- data={
- "type": "assessment.child_status_changed",
- "assessment_id": assessment.id,
- "assessment_status": assessment.status,
- "run": {
- "id": run.id,
- "config_id": str(run.config_id) if run.config_id else None,
- "config_version": run.config_version,
- "status": result.get("current_status"),
- "error": result.get("error"),
- "updated_at": run.updated_at.isoformat() if run.updated_at else None,
- },
- }
- ).model_dump()
+ envelope = APIResponse.success_response(
+ data={
+ "assessment_id": assessment.id,
+ "assessment_status": assessment.status,
+ "run": {
+ "id": run.id,
+ "config_id": str(run.config_id) if run.config_id else None,
+ "config_version": run.config_version,
+ "status": result.get("current_status"),
+ "error": result.get("error"),
+ "updated_at": run.updated_at.isoformat() if run.updated_at else None,
+ },
+ }
+ ).model_dump()
+ envelope["type"] = "assessment.child_status_changed"
+ return envelope🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/cron.py` around lines 41 - 61, The payload returned by
_build_callback_payload nests "type" inside
APIResponse.success_response(...).data so events.py's subscribe() (which reads
payload.get("type", "message")) always falls back to "message"; fix by promoting
the event type to the root of the published payload: in _build_callback_payload
(or immediately after calling APIResponse.success_response(...).model_dump())
ensure the returned dict contains a top-level "type":
"assessment.child_status_changed" key (while keeping the detailed run/assessment
info under "data"), so subscribe() can detect the event without changing
events.py.
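For completeness, a sketch of option B (resolving the event name inside the subscriber instead of flattening the published payload). The function name and shape are illustrative, not the PR's subscribe() implementation.

import json
from typing import Any


def format_sse(payload: dict[str, Any]) -> str:
    """Sketch: prefer a top-level type, fall back to the nested APIResponse data envelope."""
    data = payload.get("data") or {}
    event_type = payload.get("type") or data.get("type", "message")
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"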
except Exception as e:
    logger.error(
        "[poll_all_pending_assessment_evaluations] "
        f"Failed run {run.id} | "
        f"experiment={run.run_name} | "
        f"assessment_id={run.assessment_id} | "
        f"config_id={run.config_id} | "
        f"config_version={run.config_version} | "
        f"error={e}",
        exc_info=True,
    )
    update_assessment_run_status(
        session=session,
        run=run,
        status="failed",
        error_message="Processing failed. Check server logs for details.",
    )
    refreshed_assessment = recompute_assessment_status(
        session=session, assessment_id=assessment.id
    )
    failure_result = {
        "assessment_id": run.assessment_id,
        "run_id": run.id,
        "run_name": run.run_name,
        "config_id": str(run.config_id) if run.config_id else None,
        "config_version": run.config_version,
        "action": "failed",
        "error": "Processing failed",
        "current_status": "failed",
    }
    all_results.append(failure_result)
    assessment_event_broker.publish(
        _build_callback_payload(
            refreshed_assessment,
            run,
            failure_result,
        )
    )
    failed += 1
Cleanup inside the broad except can itself raise and abort the cron loop.
update_assessment_run_status and recompute_assessment_status both perform session.commit() and re-raise on failure (see backend/app/assessment/crud.py:192-222 and :255-315). If a transient DB error hits while we're handling an unrelated check_and_process_assessment exception, the secondary exception escapes this except and unwinds out of the for-loop, skipping every remaining assessment for this cron tick.
Wrap the cleanup ops in their own try/except (logging + continue) so one bad run can't poison the whole batch.
🛡️ Sketch
except Exception as e:
logger.error( ... )
- update_assessment_run_status(
- session=session,
- run=run,
- status="failed",
- error_message="Processing failed. Check server logs for details.",
- )
- refreshed_assessment = recompute_assessment_status(
- session=session, assessment_id=assessment.id
- )
+ try:
+ update_assessment_run_status(
+ session=session,
+ run=run,
+ status="failed",
+ error_message="Processing failed. Check server logs for details.",
+ )
+ refreshed_assessment = recompute_assessment_status(
+ session=session, assessment_id=assessment.id
+ )
+ except Exception as cleanup_err:
+ logger.error(
+ "[poll_all_pending_assessment_evaluations] "
+ f"Cleanup failed for run {run.id}: {cleanup_err}",
+ exc_info=True,
+ )
+ failed += 1
+ continue
...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/cron.py` around lines 144 - 182, The cleanup steps
inside the broad except should be isolated so a secondary DB error can't abort
the whole cron loop: wrap the calls to update_assessment_run_status,
recompute_assessment_status, the construction/appending of failure_result,
assessment_event_broker.publish (and the failed += 1) in their own try/except
block; on exception catch and logger.error with context (include
run.id/run.assessment_id/run.run_name) and exc_info=True, then continue the
for-loop without re-raising so one failed cleanup won't stop processing
remaining runs.
def update_assessment_run_status(
    session: Session,
    run: AssessmentRun,
    status: str,
    error_message: str | None = None,
    batch_job_id: int | None = None,
    total_items: int | None = None,
    object_store_url: str | None = None,
) -> AssessmentRun:
    """Update an assessment run's status and optional fields."""
    run.status = status
    run.updated_at = now()

    if error_message is not None:
        run.error_message = error_message
    if batch_job_id is not None:
        run.batch_job_id = batch_job_id
    if total_items is not None:
        run.total_items = total_items
    if object_store_url is not None:
        run.object_store_url = object_store_url

    session.add(run)
    try:
        session.commit()
        session.refresh(run)
    except Exception as e:
        session.rollback()
        logger.error(f"[update_assessment_run_status] Failed: {e}", exc_info=True)
        raise

    return run
error_message=None cannot clear an existing error.
The pattern if error_message is not None: run.error_message = error_message (and the same for batch_job_id, total_items, object_store_url) means once any of these are populated, callers cannot reset them through this function, only overwrite them. If a retry/recovery path ever needs to clear error_message on transition back to processing/completed, this will silently keep stale data.
If the intent is "merge non-default fields", this is fine — just be aware of the limitation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/crud.py` around lines 192 - 223, The
update_assessment_run_status function currently treats None as "do not change",
preventing callers from clearing fields; fix this by introducing a unique
sentinel (e.g., UNSET = object()) as the default for error_message,
batch_job_id, total_items, and object_store_url, then change the checks to "is
not UNSET" and assign the value (which may be None) to run.error_message,
run.batch_job_id, run.total_items, and run.object_store_url inside
update_assessment_run_status so callers can explicitly clear fields by passing
None; update any call sites accordingly to use the sentinel omission vs explicit
None semantics.
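A minimal sketch of the sentinel approach described in the prompt above; the parameter names mirror the reviewed function, but the types are loosened so the example stays self-contained.

from typing import Any

_UNSET: Any = object()  # distinguishes "argument not passed" from an explicit None


def update_assessment_run_status(
    session: Any,
    run: Any,
    status: str,
    error_message: Any = _UNSET,
    batch_job_id: Any = _UNSET,
) -> Any:
    """Sketch: omitting an argument leaves the field alone; passing None explicitly clears it."""
    run.status = status
    if error_message is not _UNSET:
        run.error_message = error_message  # may be None, wiping a stale error
    if batch_job_id is not _UNSET:
        run.batch_job_id = batch_job_id
    session.add(run)
    session.commit()
    session.refresh(run)
    return run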
| elif provider_name in ("google", "google-native"): | ||
| response = result.get("response") | ||
| error = result.get("error") | ||
|
|
||
| if error: | ||
| results.append( | ||
| { | ||
| "row_id": row_id, | ||
| "output": None, | ||
| "error": str(error), | ||
| "usage": None, | ||
| } | ||
| ) | ||
| continue | ||
|
|
||
| if response: | ||
| text = extract_text_from_response_dict(response) | ||
| results.append( | ||
| { | ||
| "row_id": row_id, | ||
| "output": text, | ||
| "error": None, | ||
| "usage": None, | ||
| } | ||
| ) | ||
| else: | ||
| results.append( | ||
| { | ||
| "row_id": row_id, | ||
| "output": None, | ||
| "error": "Empty response", | ||
| "usage": None, | ||
| } | ||
| ) |
Gemini branch has two unhandled cases that diverge from Anthropic behavior.
- Usage data is dropped: usage is hard-coded to None even when processing Gemini responses. The Anthropic branch extracts this via response_body.get("usage"). However, verify whether Gemini batch responses include usage metadata; the batch response structure shown in tests (candidates/content/parts) differs from direct API responses.
- Empty text is not treated as an error: When extract_text_from_response_dict() returns an empty string, the row is recorded as a success with error=None. The Anthropic branch handles this by setting error="Empty response output" when no text is extracted. Apply the same logic to the Google branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/processing.py` around lines 198 - 231, In the
Google/Gemini branch (the block handling provider_name in
("google","google-native")), populate the "usage" field from the response
instead of hard-coding None (e.g., extract response.get("usage") or, if Gemini
batch responses use a different shape, parse the batch structure to pull
usage/metadata) and treat an empty string from
extract_text_from_response_dict(response) as an error by setting error="Empty
response output" (mirror the Anthropic branch behavior); update the logic around
response, text = extract_text_from_response_dict(response) and the
results.append entries to include the extracted usage and to set error when text
is empty.
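A sketch of the Google branch with both gaps closed. extract_text_from_response_dict is the PR's existing helper (import omitted here), and the usageMetadata key is an assumption about the Gemini batch line format that should be verified against real output.

from typing import Any


def parse_google_row(result: dict[str, Any], row_id: str) -> dict[str, Any]:
    """Sketch: map one parsed Gemini batch line to a result row."""
    error = result.get("error")
    if error:
        return {"row_id": row_id, "output": None, "error": str(error), "usage": None}

    response = result.get("response") or {}
    text = extract_text_from_response_dict(response)  # existing helper in the PR
    return {
        "row_id": row_id,
        "output": text or None,
        # mirror the Anthropic branch: empty text is recorded as an error, not a success
        "error": None if text else "Empty response output",
        # assumed location of token counts in the batch line; confirm before relying on it
        "usage": response.get("usageMetadata"),
    }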
    Returns:
        Dict with status information
    """
    log_prefix = f"[assessment_run={run.id}]"
🛠️ Refactor suggestion | 🟠 Major
Log prefix should include the function name.
log_prefix = f"[assessment_run={run.id}]" is later used in logger.info/logger.error calls (lines 322‑325, 356, 456‑458), so those messages won't carry the function name as required by the project's logging convention.
♻️ Suggested adjustment
- log_prefix = f"[assessment_run={run.id}]"
+ log_prefix = f"[check_and_process_assessment] assessment_run={run.id}"As per coding guidelines: "Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/processing.py` at line 253, Update the log_prefix to
include the current function name so all subsequent logger calls follow the
project's convention; replace the line setting log_prefix =
f"[assessment_run={run.id}]" with something like log_prefix =
f"[{inspect.currentframe().f_code.co_name}][assessment_run={run.id}]" (and add
"import inspect" if missing) so functions like the one containing run.id will
prefix logs with the function name and the assessment_run id used in
logger.info/logger.error calls.
for row in row_payload:
    input_data = row.pop("input_data", None) or {}
    new_row = {}
    for k in input_keys:
        new_row[k] = input_data.get(k)
    new_row.update(row)
    expanded.append(new_row)
Dataset column name collisions silently lose user data.
new_row[k] = input_data.get(k) followed by new_row.update(row) means any input column whose name matches an AssessmentExportRow field (e.g. output, error, row_id, run_id, result_status, response_id) will be overwritten by the result column on every row, with no warning. This is easy to hit if a dataset has an output or error column.
Consider namespacing input columns (e.g. input_<col>) when they collide with reserved field names, or warn once during expansion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/utils/export.py` around lines 60 - 66, The expansion
loop in export.py silently overwrites input columns that share names with export
fields (e.g. output, error, row_id, run_id, result_status, response_id) by
setting new_row[k] from input_data then calling new_row.update(row); modify the
loop that builds new_row (the variables row_payload, input_data, input_keys,
new_row are present) to detect collisions against a reserved_fields set and
instead namespace conflicting input keys (e.g. store as "input_<k>") or,
alternatively, emit a single warning when any collision is detected; ensure
namespacing logic also avoids clobbering an existing "input_<k>" in row and
preserves original row fields when calling new_row.update(row).
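A sketch of the namespacing option from the comment above. The reserved-field set is taken from the field names the comment lists and is an assumption, not the PR's actual export schema.

from typing import Any

_RESERVED_FIELDS = {"output", "error", "row_id", "run_id", "result_status", "response_id"}


def merge_input_columns(row: dict[str, Any], input_keys: list[str]) -> dict[str, Any]:
    """Sketch: prefix colliding dataset columns so result fields cannot clobber them."""
    input_data = row.pop("input_data", None) or {}
    merged: dict[str, Any] = {}
    for key in input_keys:
        # prefix only when the dataset column collides with a reserved export field
        target = f"input_{key}" if key in _RESERVED_FIELDS else key
        merged[target] = input_data.get(key)
    merged.update(row)  # result columns keep their names; collided inputs were moved aside
    return merged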
def sort_export_rows(
    export_rows: list[AssessmentExportRow],
) -> list[AssessmentExportRow]:
    """Sort exported rows for stable downloads across runs/configs."""
    export_rows.sort(
        key=lambda row: (
            row.config_version or 0,
            row.row_id,
            row.run_id,
        )
    )
    return export_rows
row_id is sorted lexicographically — "row_10" comes before "row_2".
row_id is the string f"row_{idx}" (see processing.py parsing and load_export_rows_for_run at line 406). Lexicographic ordering will place row_10 before row_2, which scrambles CSV/XLSX/JSON exports for any dataset with ≥10 rows.
♻️ Suggested fix — natural sort on the index
+def _row_id_index(row_id: str) -> int:
+ if isinstance(row_id, str) and row_id.startswith("row_"):
+ try:
+ return int(row_id.split("_", 1)[1])
+ except (ValueError, IndexError):
+ return 0
+ return 0
+
def sort_export_rows(
export_rows: list[AssessmentExportRow],
) -> list[AssessmentExportRow]:
"""Sort exported rows for stable downloads across runs/configs."""
export_rows.sort(
key=lambda row: (
row.config_version or 0,
- row.row_id,
+ _row_id_index(row.row_id),
row.run_id,
)
)
return export_rows🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/utils/export.py` around lines 422 - 433, The
sort_export_rows function currently sorts by row.row_id lexicographically which
makes "row_10" come before "row_2"; update sort_export_rows (and any callers
expecting stable order like load_export_rows_for_run / AssessmentExportRow) to
extract the numeric index from row.row_id (e.g. parse the suffix after "row_")
and use that integer in the sort key instead of the raw string, keeping the
existing tie-breakers (row.config_version or 0, row.run_id); ensure the parser
handles missing/invalid suffixes by falling back to 0 so sorting remains stable.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/assessment/dataset.py`:
- Around line 22-26: Remove support for legacy .xls by deleting the ".xls" entry
from the _MIME_TYPES dict and from ALLOWED_EXTENSIONS in validators.py so .xls
files are rejected at validation (do not route them to openpyxl). Also update
the validation routine that checks ALLOWED_EXTENSIONS to explicitly reject
".xls" (and return a clear validation error). Finally, harden the parsing code
that calls openpyxl (the read/parse functions in dataset.py and batch.py where
openpyxl is invoked) by replacing the broad silent except that returns 0 rows
with either re-raising the caught InvalidFileException or logging the exception
and returning a proper error response so legacy-file parsing failures aren’t
swallowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7f0dd174-276b-4d14-9951-73c8a226e4b3
📒 Files selected for processing (3)
- backend/app/assessment/batch.py
- backend/app/assessment/dataset.py
- backend/app/models/llm/request.py
✅ Files skipped from review due to trivial changes (1)
- backend/app/assessment/batch.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/models/llm/request.py
_MIME_TYPES = {
    ".csv": "text/csv",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
}
🧩 Analysis chain
🌐 Web query:
Does openpyxl.load_workbook support legacy ".xls" files, or only ".xlsx/.xlsm/.xltx/.xltm" formats?
💡 Result:
No, openpyxl.load_workbook does not support legacy ".xls" files. It only supports Office Open XML formats: ".xlsx", ".xlsm", ".xltx", and ".xltm". Attempting to load a ".xls" file raises an InvalidFileException with a message recommending xlrd or conversion to .xlsx.
Citations:
- 1: https://openpyxl.readthedocs.io/en/stable/%5Fmodules/openpyxl/reader/excel.html
- 2: https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/reader/excel.html
- 3: https://openpyxl.readthedocs.io/
- 4: https://openpyxl.readthedocs.io/en/3.1/tutorial.html
.xls is routed to an openpyxl parser that does not support legacy Excel format.
Upstream validation in validators.py:15 accepts .xls files, which are then routed to openpyxl-based parsing. Since openpyxl only supports Office Open XML formats (.xlsx, .xlsm, etc.), not legacy .xls, this fails:
- In dataset.py: The broad except Exception as e: handler (line 98-100) catches the error silently and returns 0 rows, corrupting dataset metadata and causing downstream failures.
- In batch.py: The same openpyxl call (line 127) has no exception handling, causing an unhandled InvalidFileException.
Also applies to: backend/app/assessment/batch.py:103-105 and backend/app/assessment/batch.py:127
Proposed fix (reject legacy .xls at validation layer)
_MIME_TYPES = {
".csv": "text/csv",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
- ".xls": "application/vnd.ms-excel",
}
@@
def _count_rows(content: bytes, file_ext: str) -> int:
"""Count data rows in a file (CSV or Excel), excluding the header."""
- if file_ext in (".xlsx", ".xls"):
+ if file_ext == ".xlsx":
return _count_excel_rows(content)
+ if file_ext == ".xls":
+ raise HTTPException(
+ status_code=422,
+ detail="Legacy .xls files are not supported. Please upload .xlsx or .csv.",
+ )
     return _count_csv_rows(content)
Also remove .xls from ALLOWED_EXTENSIONS in validators.py:15.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/dataset.py` around lines 22 - 26, Remove support for
legacy .xls by deleting the ".xls" entry from the _MIME_TYPES dict and from
ALLOWED_EXTENSIONS in validators.py so .xls files are rejected at validation (do
not route them to openpyxl). Also update the validation routine that checks
ALLOWED_EXTENSIONS to explicitly reject ".xls" (and return a clear validation
error). Finally, harden the parsing code that calls openpyxl (the read/parse
functions in dataset.py and batch.py where openpyxl is invoked) by replacing the
broad silent except that returns 0 rows with either re-raising the caught
InvalidFileException or logging the exception and returning a proper error
response so legacy-file parsing failures aren’t swallowed.
…ssessment modules
…ous evaluation processing
Actionable comments posted: 4
🧹 Nitpick comments (7)
backend/app/assessment/events.py (2)
36-48: Tighten thepayloadtype annotation.
dictwithout parameters loses information for callers and static analyzers. Since payloads are JSON-serialized and the type field is read via.get("type"), preferdict[str, Any]to match the queue type and align with the Python 3.11+ typing convention used elsewhere in the codebase.♻️ Proposed fix
- def publish(self, payload: dict) -> None: + def publish(self, payload: dict[str, Any]) -> None:Add the import at the top:
-from collections.abc import AsyncIterator +from collections.abc import AsyncIterator +from typing import AnyOptionally also update the queue annotation on Line 13 / 16 to
asyncio.Queue[dict[str, Any]]for consistency.As per coding guidelines: "Always add type hints to all function parameters and return values in Python code" and "Use Python 3.11+ with type hints throughout the codebase".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` around lines 36 - 48, Update the publish method signature to use a precise payload type: change payload: dict to payload: dict[str, Any] and add from typing import Any to the module imports; also update any asyncio.Queue annotations used for subscribers (e.g., asyncio.Queue[dict[str, Any]] or the _subscribers collection element type) so the queue and publish agree on dict[str, Any] payloads, keeping the logic unchanged.
24-24: Use builtinTimeoutErrorinstead of the deprecatedasyncio.TimeoutErroralias.Since Python 3.11,
asyncio.TimeoutErroris a deprecated alias for the builtinTimeoutError(per Ruff UP041). Given the codebase targets Python 3.11+, prefer the builtin.♻️ Proposed fix
- except asyncio.TimeoutError: + except TimeoutError: yield ": keep-alive\n\n" continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/events.py` at line 24, Replace the deprecated asyncio.TimeoutError alias with the builtin TimeoutError in the exception handler: change the except clause currently written as "except asyncio.TimeoutError" in backend/app/assessment/events.py to "except TimeoutError" and remove any now-unused reference to asyncio if that's the only usage in this module; ensure the surrounding function/method (the exception handling block in events.py) still behaves the same.backend/app/tests/api/routes/test_cron.py (1)
28-47: Consider asserting the newassessmentmerge behavior and aligning the mock shape with the production contract.The three updated tests patch
poll_all_pending_assessment_evaluationsbut never assert on the new behavior the cron handler adds (merging assessment counts into the totals and exposingresult["assessment"]). With all assessment counts mocked to0, the existingtotal_*assertions would still pass even if the merge logic inevaluation_cron_jobregressed (e.g., assessment counts ignored, key renamed, exception swallowed intoassessment_error). Adding at least one case where assessment counts are non-zero would actually exercise the new code path this PR is testing.Additionally, the real
poll_all_pending_assessment_evaluationsalways returnstotalanddetailsalongside the three counters (seebackend/app/assessment/cron.pylines 189-195). The mock omitting those fields drifts from the production contract; if a future change readsassessment["details"]orassessment["total"], these tests will silently keep passing.♻️ Suggested tightening (apply to one or more scenarios)
with patch( "app.api.routes.cron.process_all_pending_evaluations", new=AsyncMock(return_value=mock_result), ), patch( "app.assessment.cron.poll_all_pending_assessment_evaluations", new=AsyncMock( - return_value={"processed": 0, "failed": 0, "still_processing": 0} + return_value={ + "total": 2, + "processed": 1, + "failed": 1, + "still_processing": 0, + "details": [], + } ), ): response = client.get( f"{settings.API_V1_STR}/cron/evaluations", headers={"X-API-KEY": superuser_api_key.key}, ) assert response.status_code == 200 data = response.json() assert data["status"] == "success" - assert data["total_processed"] == 5 - assert data["total_failed"] == 0 - assert data["total_still_processing"] == 1 + # Verify assessment counts are merged into the aggregated totals + assert data["total_processed"] == 5 + 1 + assert data["total_failed"] == 0 + 1 + assert data["total_still_processing"] == 1 + 0 + assert data["assessment"]["processed"] == 1 + assert data["assessment"]["failed"] == 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/test_cron.py` around lines 28 - 47, Update the tests that patch poll_all_pending_assessment_evaluations to return a realistic shape (include "processed", "failed", "still_processing" with at least one non-zero count plus the production fields "total" and "details") and then assert that evaluation_cron_job/cron handler merges those assessment counts into the response (check response.json()["assessment"] exists and that total_processed/total_failed/total_still_processing include the assessment counts). Ensure the mock uses AsyncMock(return_value=...) for poll_all_pending_assessment_evaluations and add assertions referencing the merged totals and the assessment sub-dictionary.backend/app/assessment/crud.py (1)
1-12: CRUD module should be placed underbackend/app/crud/per project convention.This file is located at
backend/app/assessment/crud.py, but the project convention places CRUD modules inbackend/app/crud/(as noted in learnings and confirmed by the existingbackend/app/crud/directory). If your team is intentionally adopting feature-folder organization for the assessment subsystem, document this exception inCLAUDE.mdor contributor guidelines to keep the convention consistent for reviewers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/crud.py` around lines 1 - 12, The CRUD module for Assessment is in the wrong location; move backend/app/assessment/crud.py into the shared CRUD package under backend/app/crud/ (keep filename crud.py) so that the Assessment and AssessmentRun operations live alongside other CRUD modules; update any imports that reference app.assessment.crud to app.crud.crud (or equivalent) and ensure logger = logging.getLogger(__name__) and references to Assessment and AssessmentRun continue to resolve; if you intentionally want feature-folder organization instead, add a clear exception note to CLAUDE.md or the contributor guidelines describing why assessment CRUD lives under app/assessment and list the impacted symbols (Assessment, AssessmentRun, functions in crud.py) so reviewers are aware.backend/app/assessment/mappers.py (3)
117-118: Dead branch —"null"is no longer a validsummaryvalue.
TextLLMParams.summarynow only accepts"auto" | "detailed" | "concise" | None. Thesummary == "null"check can never be True for validated params and only adds confusion. Drop the special-case.♻️ Suggested simplification
- if summary is not None: - reasoning_payload["summary"] = None if summary == "null" else summary + if summary is not None: + reasoning_payload["summary"] = summary🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 117 - 118, The `"null"` branch is dead because TextLLMParams.summary no longer allows "null"; inside the block that checks `if summary is not None` (where `reasoning_payload` and `summary` are used) remove the `summary == "null"` special-case and assign the `summary` value directly (e.g., set `reasoning_payload["summary"]` to `summary`), leaving None handling as-is; update any related comments if present.
96-98: Implicit aliasing ofreasoning→effortis non-obvious.
effort = kaapi_params.get("effort") or reasoningquietly promotesTextLLMParams.reasoning(Literallow/medium/high) into the OpenAIreasoning.effortslot wheneffortisn't set. The two fields have different declared enums and semantics (reasoningis described as “Reasoning configuration or instructions,” whileeffortis the OpenAI-specific reasoning effort). At minimum add a short comment explaining the fallback so future readers don’t conflate them; ideally, decide whetherreasoningis a deprecated alias foreffortand document/log it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 96 - 98, The current fallback "effort = kaapi_params.get('effort') or reasoning" silently reuses TextLLMParams.reasoning for the OpenAI-specific reasoning.effort, which conflates different enums/semantics; update the mapping in backend/app/assessment/mappers.py to make this explicit by either (a) adding a clear comment above the lines referencing kaapi_params explaining that reasoning is used as a fallback for effort and why, or (b) preferably implement an explicit conversion/logging path: if kaapi_params.get("effort") is missing but reasoning is present, map/translate reasoning to the allowed effort values (or log a deprecation/warning that reasoning is being used as an alias) before assigning to effort; reference the symbols kaapi_params, reasoning, effort and ensure any enum translation or warning message clarifies the intended semantics.
1-7: Unusedlogger.
logger = logging.getLogger(__name__)is defined but never used in this module. Either drop it, or replace the silentwarnings.append(...)calls with structured logs prefixed with the function name (per the repo logging convention) so warnings are observable in the runtime, not only in the returned tuple.As per coding guidelines: “Prefix all log messages with the function name in square brackets:
logger.info(f"[function_name] Message ...")”.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/assessment/mappers.py` around lines 1 - 7, The module defines logger = logging.getLogger(__name__) but never uses it; update the module to either remove the unused logger variable or (preferred) replace the silent warnings.append(...) calls in the functions in this file with structured logging using that logger (e.g., logger.warning) and follow repo convention by prefixing messages with the function name in square brackets (use the function's actual name in the message), e.g., logger.warning(f"[function_name] descriptive warning..."); keep the existing return tuple behavior but emit warnings via logger instead of only appending to the warnings list so runtime logs are observable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/assessment/crud.py`:
- Around line 270-289: Remove the dead "in_progress" branch from status checks:
anywhere the code computes processing_runs or inspects AssessmentRun.status
using r.status in {"processing", "in_progress"} (e.g., the processing_runs
calculation and the other conditional in the same module) change the set to just
"processing". Update the expressions that sum or filter runs (symbol:
processing_runs) and any related conditionals that pass values into
_determine_assessment_status so they only check for "processing" to eliminate
the unreachable "in_progress" case.
In `@backend/app/assessment/mappers.py`:
- Around line 203-213: Remove the silent passthrough of the unused "reasoning"
field (delete the assignment to google_params["reasoning"]) and instead emit
warnings when unsupported fields appear: check kaapi_params for "effort" and
"summary" and call the module logger's warning (e.g., logger.warning) to notify
users those params are ignored; preserve the existing handling for
thinking_level -> google_params["thinking_config"] and output_schema ->
google_params["output_schema"] (using _convert_json_schema_to_google) unchanged.
- Around line 95-108: The OpenAI mapper must validate the required model early:
check kaapi_params.get("model") (variable model) and if it's missing/None
immediately return the same error behavior used in the Google mapper (i.e.,
surface a validation error rather than continuing); do this before calling
litellm.supports_reasoning(f"openai/{model}") so you never call
supports_reasoning with None and so model will be present when building
openai_params. Locate the OpenAI mapper's use of model, support_reasoning, and
openai_params and add that early guard/return to mirror the Google mapper's
validation logic.
In `@backend/app/models/llm/request.py`:
- Around line 38-43: Update the Field description for the summary attribute in
the request model to remove the obsolete "null/" wording; change the text that
currently reads "Use null/None to disable." to something like "Use None to
disable." so callers aren’t misled into passing the string "null" — locate the
summary: Literal["auto", "detailed", "concise"] | None = Field(...) definition
and edit only the description string accordingly.
---
Nitpick comments:
In `@backend/app/assessment/crud.py`:
- Around line 1-12: The CRUD module for Assessment is in the wrong location;
move backend/app/assessment/crud.py into the shared CRUD package under
backend/app/crud/ (keep filename crud.py) so that the Assessment and
AssessmentRun operations live alongside other CRUD modules; update any imports
that reference app.assessment.crud to app.crud.crud (or equivalent) and ensure
logger = logging.getLogger(__name__) and references to Assessment and
AssessmentRun continue to resolve; if you intentionally want feature-folder
organization instead, add a clear exception note to CLAUDE.md or the contributor
guidelines describing why assessment CRUD lives under app/assessment and list
the impacted symbols (Assessment, AssessmentRun, functions in crud.py) so
reviewers are aware.
In `@backend/app/assessment/events.py`:
- Around line 36-48: Update the publish method signature to use a precise
payload type: change payload: dict to payload: dict[str, Any] and add from
typing import Any to the module imports; also update any asyncio.Queue
annotations used for subscribers (e.g., asyncio.Queue[dict[str, Any]] or the
_subscribers collection element type) so the queue and publish agree on
dict[str, Any] payloads, keeping the logic unchanged.
- Line 24: Replace the deprecated asyncio.TimeoutError alias with the builtin
TimeoutError in the exception handler: change the except clause currently
written as "except asyncio.TimeoutError" in backend/app/assessment/events.py to
"except TimeoutError" and remove any now-unused reference to asyncio if that's
the only usage in this module; ensure the surrounding function/method (the
exception handling block in events.py) still behaves the same.
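A minimal sketch of how both events.py nitpicks could land together; the `AssessmentEventBus` class name, the `_subscribers` set, and the 15-second timeout are illustrative assumptions rather than the module's actual shape:

```python
import asyncio
from typing import Any


class AssessmentEventBus:
    def __init__(self) -> None:
        # Assumed internal layout: one bounded queue per SSE subscriber.
        self._subscribers: set[asyncio.Queue[dict[str, Any]]] = set()

    def publish(self, payload: dict[str, Any]) -> None:
        for queue in self._subscribers:
            try:
                queue.put_nowait(payload)
            except asyncio.QueueFull:
                # Drop the event for a slow subscriber instead of blocking the publisher.
                pass

    async def next_event(
        self, queue: asyncio.Queue[dict[str, Any]], timeout: float = 15.0
    ) -> dict[str, Any] | None:
        try:
            return await asyncio.wait_for(queue.get(), timeout=timeout)
        except TimeoutError:  # builtin name; asyncio.TimeoutError is a deprecated alias
            return None
```

With this, the subscriber queues and `publish` agree on `dict[str, Any]` payloads without changing runtime behavior.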
In `@backend/app/assessment/mappers.py`:
- Around line 117-118: The `"null"` branch is dead because TextLLMParams.summary
no longer allows "null"; inside the block that checks `if summary is not None`
(where `reasoning_payload` and `summary` are used) remove the `summary ==
"null"` special-case and assign the `summary` value directly (e.g., set
`reasoning_payload["summary"]` to `summary`), leaving None handling as-is;
update any related comments if present.
- Around line 96-98: The current fallback "effort = kaapi_params.get('effort')
or reasoning" silently reuses TextLLMParams.reasoning for the OpenAI-specific
reasoning.effort, which conflates different enums/semantics; update the mapping
in backend/app/assessment/mappers.py to make this explicit by either (a) adding
a clear comment above the lines referencing kaapi_params explaining that
reasoning is used as a fallback for effort and why, or (b) preferably implement
an explicit conversion/logging path: if kaapi_params.get("effort") is missing
but reasoning is present, map/translate reasoning to the allowed effort values
(or log a deprecation/warning that reasoning is being used as an alias) before
assigning to effort; reference the symbols kaapi_params, reasoning, effort and
ensure any enum translation or warning message clarifies the intended semantics.
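A sketch of option (b); the `resolve_effort` helper and the translation table are hypothetical and would need to match the real enums in `TextLLMParams`:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical translation from the generic `reasoning` value to OpenAI's
# reasoning.effort vocabulary; adjust to the actual allowed values.
_REASONING_TO_EFFORT = {"low": "low", "medium": "medium", "high": "high"}


def resolve_effort(kaapi_params: dict) -> str | None:
    effort = kaapi_params.get("effort")
    if effort is not None:
        return effort
    reasoning = kaapi_params.get("reasoning")
    if reasoning is None:
        return None
    logger.warning(
        "[resolve_effort] 'effort' not provided; using 'reasoning'=%s as a fallback",
        reasoning,
    )
    return _REASONING_TO_EFFORT.get(reasoning, reasoning)
```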
- Around line 1-7: The module defines logger = logging.getLogger(__name__) but
never uses it; update the module to either remove the unused logger variable or
(preferred) replace the silent warnings.append(...) calls in the functions in
this file with structured logging using that logger (e.g., logger.warning) and
follow repo convention by prefixing messages with the function name in square
brackets (use the function's actual name in the message), e.g.,
logger.warning(f"[function_name] descriptive warning..."); keep the existing
return tuple behavior but emit warnings via logger instead of only appending to
the warnings list so runtime logs are observable.
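A sketch of that pattern; the mapper name and signature below are simplified stand-ins for the real functions in this module:

```python
import logging

logger = logging.getLogger(__name__)


def map_params_with_logged_warnings(kaapi_params: dict) -> tuple[dict, list[str]]:
    mapped: dict = {}
    warnings: list[str] = []

    def warn(message: str) -> None:
        # Keep the returned warnings list for API callers, but also emit a
        # runtime log using the "[function_name] ..." convention.
        warnings.append(message)
        logger.warning(f"[map_params_with_logged_warnings] {message}")

    if not kaapi_params.get("model"):
        warn("Missing required 'model' parameter")
    return mapped, warnings
```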
In `@backend/app/tests/api/routes/test_cron.py`:
- Around line 28-47: Update the tests that patch
poll_all_pending_assessment_evaluations to return a realistic shape (include
"processed", "failed", "still_processing" with at least one non-zero count plus
the production fields "total" and "details") and then assert that
evaluation_cron_job/cron handler merges those assessment counts into the
response (check response.json()["assessment"] exists and that
total_processed/total_failed/total_still_processing include the assessment
counts). Ensure the mock uses AsyncMock(return_value=...) for
poll_all_pending_assessment_evaluations and add assertions referencing the
merged totals and the assessment sub-dictionary.
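A sketch of what that test could look like; the patch target, endpoint path, and fixture names (`client`, `auth_headers`) are assumptions about the existing test setup:

```python
from unittest.mock import AsyncMock, patch


def test_cron_merges_assessment_counts(client, auth_headers):
    # Realistic shape assumed for the poller result: per-run counts plus
    # the production "total" and "details" fields.
    assessment_result = {
        "processed": 2,
        "failed": 1,
        "still_processing": 3,
        "total": 6,
        "details": [],
    }
    with patch(
        "app.api.routes.cron.poll_all_pending_assessment_evaluations",
        AsyncMock(return_value=assessment_result),
    ):
        response = client.get("/api/v1/cron/evaluations", headers=auth_headers)

    body = response.json()
    assert body["assessment"] == assessment_result
    # The merged totals should include the assessment counts.
    assert body["total_processed"] >= assessment_result["processed"]
    assert body["total_failed"] >= assessment_result["failed"]
    assert body["total_still_processing"] >= assessment_result["still_processing"]
```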
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 259f6741-a7b6-438c-973c-4239e35527ce
📒 Files selected for processing (7)
- backend/app/assessment/crud.py
- backend/app/assessment/events.py
- backend/app/assessment/mappers.py
- backend/app/assessment/routes.py
- backend/app/models/__init__.py
- backend/app/models/llm/request.py
- backend/app/tests/api/routes/test_cron.py
✅ Files skipped from review due to trivial changes (2)
- backend/app/models/__init__.py
- backend/app/assessment/routes.py
runs = list(session.exec(statement).all())

pending_runs = sum(1 for r in runs if r.status == "pending")
processing_runs = sum(1 for r in runs if r.status in {"processing", "in_progress"})
completed_runs = sum(1 for r in runs if r.status == "completed")
failed_runs = sum(1 for r in runs if r.status == "failed")
total_runs = len(runs)

assessment.total_runs = total_runs
assessment.pending_runs = pending_runs
assessment.processing_runs = processing_runs
assessment.completed_runs = completed_runs
assessment.failed_runs = failed_runs
assessment.status = _determine_assessment_status(
    total_runs=total_runs,
    pending_runs=pending_runs,
    processing_runs=processing_runs,
    completed_runs=completed_runs,
    failed_runs=failed_runs,
)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all writers/readers of AssessmentRun.status to confirm the canonical status vocabulary.
rg -nP --type=py -C2 '\b(AssessmentRun|run)\.status\s*=\s*'
echo '---'
rg -nP --type=py -C1 '"(pending|processing|in_progress|completed|failed|submitted|queued)"'
echo '---'
# Any place comparing against in_progress?
rg -nP --type=py -C2 '"in_progress"'
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 50387
🏁 Script executed:
# Check AssessmentRun model definition for status field
rg -n "class AssessmentRun" backend/app/assessment/models.py -A 100 | head -150Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 5234
🏁 Script executed:
# Find all places where AssessmentRun.status is explicitly written (not just run.status)
rg -n "AssessmentRun.*status\s*=" backend/app/assessment/ --type=pyRepository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
# Check update_assessment_run function and similar CRUD operations
rg -n "def.*assessment.*status|assessment.*status.*=" backend/app/assessment/crud.py -B2 -A5Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1489
🏁 Script executed:
# Find all places in assessment module where status is assigned to any value
rg -n '\.status\s*=\s*"' backend/app/assessment/ --type=py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
# Also check if "in_progress" string appears anywhere in assessment module
rg -n '"in_progress"' backend/app/assessment/ --type=pyRepository: ProjectTech4DevAI/kaapi-backend
Length of output: 311
🏁 Script executed:
# Find all status assignments in assessment module
rg -n 'status\s*=\s*"(pending|processing|completed|failed)"' backend/app/assessment/ --type=py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 837
🏁 Script executed:
# Also check for status assignments via update function calls
rg -n 'update_assessment_run_status\|status=.*pending\|status=.*processing\|status=.*completed\|status=.*failed' backend/app/assessment/ --type=py | head -100
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
# Check for "completed" assignments for AssessmentRun status
rg -n 'status.*=.*"completed"' backend/app/assessment/ --type=py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 386
🏁 Script executed:
# Check the update_assessment_run_status function signature and where it's called
rg -n 'update_assessment_run_status' backend/app/assessment/ --type=py -B2 -A10
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 11169
Remove the dead "in_progress" branch from status conditionals.
"in_progress" is never written to AssessmentRun.status—the column is documented and only assigned the canonical values "pending", "processing", "completed", and "failed". The conditionals at lines 273 and 98 that check for r.status in {"processing", "in_progress"} can be simplified to just "processing". This removes unnecessary dead code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/crud.py` around lines 270 - 289, Remove the dead
"in_progress" branch from status checks: anywhere the code computes
processing_runs or inspects AssessmentRun.status using r.status in
{"processing", "in_progress"} (e.g., the processing_runs calculation and the
other conditional in the same module) change the set to just "processing".
Update the expressions that sum or filter runs (symbol: processing_runs) and any
related conditionals that pass values into _determine_assessment_status so they
only check for "processing" to eliminate the unreachable "in_progress" case.
model = kaapi_params.get("model")
reasoning = kaapi_params.get("reasoning")
effort = kaapi_params.get("effort") or reasoning
summary = kaapi_params.get("summary")
temperature = kaapi_params.get("temperature")
top_p = kaapi_params.get("top_p")

instructions = normalize_llm_text(kaapi_params.get("instructions"))
knowledge_base_ids = kaapi_params.get("knowledge_base_ids")
max_num_results = kaapi_params.get("max_num_results")
response_format = kaapi_params.get("response_format")
output_schema = kaapi_params.get("output_schema")

support_reasoning = litellm.supports_reasoning(model=f"openai/{model}")
🧩 Analysis chain
🌐 Web query:
litellm supports_reasoning behavior when model name is invalid or contains "None"
💡 Result:
No, LiteLLM does not specifically support reasoning behavior when the model name is invalid or contains "None". LiteLLM's supports_reasoning(model) function checks if a model supports reasoning features by looking up the model's info in its internal registry (model_cost map or get_model_info), which contains a "supports_reasoning" flag set to true/false for known models that support reasoning (e.g., certain OpenAI o1/o3, Anthropic claude-3-7, DeepSeek models). For invalid model names (not in the registry), get_model_info returns None, and supports_reasoning treats this as False (does not support). There is no special handling or enabling of reasoning behavior for invalid models or those containing "None". If reasoning params like reasoning_effort are passed to unsupported models, LiteLLM raises UnsupportedParamsError unless drop_params=True.
Citations:
- 1: [Feature]: `litellm.supports_reasoning` and `drop_params` working with reasoning BerriAI/litellm#8765
- 2: [Bug]: `litellm.get_model_info` not giving `supports_xhigh_reasoning_effort` correctly BerriAI/litellm#25096
- 3: [Feat] Add `litellm.supports_reasoning()` util to track if an llm supports reasoning BerriAI/litellm#9923
- 4: https://docs.litellm.ai/docs/reasoning_content
- 5: _select_model_name_for_cost_calc returning "" instead of None BerriAI/litellm#6694
- 6: [Bug]: litellm.cost_calculator.py::response_cost_calculator - Returning None BerriAI/litellm#5610
🏁 Script executed:
# First, locate and read the mappers.py file
fd -t f 'mappers.py' backend/
# Then examine the file structure and the specific lines mentioned
wc -l backend/app/assessment/mappers.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 230
🏁 Script executed:
# Read the relevant sections of the mappers.py file
# Start with lines 90-150 to see the OpenAI mapper and model handling
sed -n '90,150p' backend/app/assessment/mappers.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 2507
🏁 Script executed:
# Also check lines 175-190 to see the Google mapper's validation at line 182
sed -n '175,190p' backend/app/assessment/mappers.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 519
🏁 Script executed:
# Check the function signatures and full context of both mappers
sed -n '70,220p' backend/app/assessment/mappers.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 5488
Add early validation for required model parameter in OpenAI mapper to match Google mapper.
The OpenAI mapper passes an unvalidated model to litellm.supports_reasoning() at line 108. When model is None/missing, f"openai/{model}" becomes the string "openai/None", which litellm.supports_reasoning() treats as an unsupported model and returns False. The code then proceeds silently, with model omitted from openai_params due to the truthiness check at line 142.
The Google mapper has explicit validation (lines 184–185) that returns an error when model is missing. The OpenAI mapper should be symmetric:
Suggested fix
 model = kaapi_params.get("model")
+if not model:
+    return {}, ["Missing required 'model' parameter"]
 reasoning = kaapi_params.get("reasoning")
 effort = kaapi_params.get("effort") or reasoning
 ...
 support_reasoning = litellm.supports_reasoning(model=f"openai/{model}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
model = kaapi_params.get("model")
if not model:
    return {}, ["Missing required 'model' parameter"]
reasoning = kaapi_params.get("reasoning")
effort = kaapi_params.get("effort") or reasoning
summary = kaapi_params.get("summary")
temperature = kaapi_params.get("temperature")
top_p = kaapi_params.get("top_p")

instructions = normalize_llm_text(kaapi_params.get("instructions"))
knowledge_base_ids = kaapi_params.get("knowledge_base_ids")
max_num_results = kaapi_params.get("max_num_results")
response_format = kaapi_params.get("response_format")
output_schema = kaapi_params.get("output_schema")

support_reasoning = litellm.supports_reasoning(model=f"openai/{model}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/mappers.py` around lines 95 - 108, The OpenAI mapper
must validate the required model early: check kaapi_params.get("model")
(variable model) and if it's missing/None immediately return the same error
behavior used in the Google mapper (i.e., surface a validation error rather than
continuing); do this before calling
litellm.supports_reasoning(f"openai/{model}") so you never call
supports_reasoning with None and so model will be present when building
openai_params. Locate the OpenAI mapper's use of model, support_reasoning, and
openai_params and add that early guard/return to mirror the Google mapper's
validation logic.
thinking_level = kaapi_params.get("thinking_level")
if thinking_level:
    google_params["thinking_config"] = {"thinking_level": thinking_level}

reasoning = kaapi_params.get("reasoning")
if reasoning:
    google_params["reasoning"] = reasoning

output_schema = kaapi_params.get("output_schema")
if output_schema is not None:
    google_params["output_schema"] = _convert_json_schema_to_google(output_schema)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect how google_params keys (reasoning, output_schema, instructions, thinking_config) are consumed downstream.
fd -t f 'batch.py' backend/app/assessment | xargs -I{} rg -nP -C3 '\b(google_params|map_kaapi_to_google_params|output_schema|response_schema|response_mime_type|thinking_config|instructions)\b' {}
rg -nP -C3 '\bmap_kaapi_to_google_params\b'
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 13862
🏁 Script executed:
cat -n backend/app/assessment/mappers.py | head -250
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 9101
🏁 Script executed:
rg -n '\b(reasoning|effort|summary)\b' backend/app/assessment/batch.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 57
🏁 Script executed:
rg -nC5 'effort|summary' backend/app/assessment/mappers.py
Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 1937
Remove silent reasoning passthrough and add warning for unsupported effort/summary fields.
The Google mapper has inconsistent parameter handling compared to the OpenAI mapper:
- `reasoning` is set but never consumed: Line 209 sets `google_params["reasoning"]`, but batch.py lines 485-510 never extract or use it. The field is silently dropped without warning.
- `effort`/`summary` lack warning messages: Unlike the OpenAI mapper (lines 132–136), the Google mapper does not warn when `effort` or `summary` are present. These fields are silently ignored, causing data loss without user notification.
- `output_schema` and `thinking_config` are correctly handled: These are properly renamed downstream to `responseSchema`/`responseMimeType` (batch.py 505–508) and `thinkingConfig` (batch.py 502–504).
Add explicit warnings for unsupported parameters to match OpenAI mapper behavior and remove the reasoning field passthrough, which has no corresponding Gemini API surface.
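A minimal sketch of the warning half of that change; the helper name and warning wording are illustrative, not taken from the existing mapper:

```python
def _warn_on_unsupported_google_params(kaapi_params: dict, warnings: list[str]) -> None:
    # Warn (instead of silently dropping) for params that have no Gemini batch
    # equivalent; `reasoning` would also stop being passed through entirely.
    for unsupported in ("effort", "summary", "reasoning"):
        if kaapi_params.get(unsupported) is not None:
            warnings.append(
                f"'{unsupported}' is not supported for Google/Gemini batch runs and will be ignored"
            )
```

The existing `thinking_level` and `output_schema` handling would stay unchanged.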
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/assessment/mappers.py` around lines 203 - 213, Remove the silent
passthrough of the unused "reasoning" field (delete the assignment to
google_params["reasoning"]) and instead emit warnings when unsupported fields
appear: check kaapi_params for "effort" and "summary" and call the module
logger's warning (e.g., logger.warning) to notify users those params are
ignored; preserve the existing handling for thinking_level ->
google_params["thinking_config"] and output_schema ->
google_params["output_schema"] (using _convert_json_schema_to_google) unchanged.
summary: Literal["auto", "detailed", "concise"] | None = Field(
    default=None,
    description=(
        "Model-specific reasoning summary preference. " "Use null/None to disable."
    ),
)
Description references obsolete "null" sentinel.
The summary Literal no longer accepts the string "null" (correctly fixed earlier). The description still says “Use null/None to disable,” which can mislead callers into passing the string "null" and getting a 422. Drop the null/ mention.
✏️ Suggested wording
 summary: Literal["auto", "detailed", "concise"] | None = Field(
     default=None,
     description=(
-        "Model-specific reasoning summary preference. " "Use null/None to disable."
+        "Model-specific reasoning summary preference. Use None to disable."
     ),
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/models/llm/request.py` around lines 38 - 43, Update the Field
description for the summary attribute in the request model to remove the
obsolete "null/" wording; change the text that currently reads "Use null/None to
disable." to something like "Use None to disable." so callers aren’t misled into
passing the string "null" — locate the summary: Literal["auto", "detailed",
"concise"] | None = Field(...) definition and edit only the description string
accordingly.
Codecov Report
❌ Patch coverage is …
📢 Thoughts on this report? Let us know!
Summary
Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?
This PR introduces a shared assessment pipeline for partner programs, replacing manual and fragmented evaluation workflows. It adds multimodal dataset ingestion, config-driven batch assessment runs, status tracking, retries, cron-based processing, SSE updates, and exportable results for easier comparison and operations.
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
`fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit