Assessment: Multi-step LLM workflow #124

@vprashrex

Description

Is your feature request related to a problem?
Currently, there is no workflow to run LLM evaluations across multiple model configurations on the same dataset, which limits users' ability to compare model performance effectively.

Describe the solution you'd like
Implement a multi-step Assessment module that includes:

  • Dataset upload
  • Column mapping
  • Prompt and config selection
  • Review
  • Results tab with retry and export support
  • Real-time status updates via SSE
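As a rough illustration of the last item, the sketch below shows how a per-run status update could be framed as a Server-Sent Events message. The wire format (`id:`, `event:`, `data:` fields, with a blank line terminating each message) follows the SSE standard; the `RunStatus` payload, its field names, and the `status` event name are assumptions made for this example and are not defined anywhere in this issue.

```python
import json
from dataclasses import dataclass, asdict
from typing import Iterable, Iterator, Optional


@dataclass
class RunStatus:
    # Hypothetical payload shape; the issue does not specify one.
    run_id: str
    config_name: str
    state: str  # e.g. "queued", "running", "done", "failed" (assumed values)
    completed_rows: int
    total_rows: int


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one SSE message; a blank line marks the end of the message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent emits no raw newlines, so one data: line is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def status_stream(updates: Iterable[RunStatus]) -> Iterator[str]:
    """Turn status updates into SSE text chunks, using a running counter as the id."""
    for i, status in enumerate(updates):
        yield format_sse("status", asdict(status), event_id=str(i))
```

A web framework's streaming response could send these chunks with the `text/event-stream` content type. Which framework the module would use is not stated here, so that part is left out.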

Why is this enhancement needed?
This enhancement enables side-by-side comparison of model configurations on the same dataset. The module is self-contained under app/assessment/, which keeps its concerns separate from the rest of the app.

Original issue

Describe the current behavior
No workflow exists to run LLM evaluations across multiple model configurations on the same dataset.

Describe the enhancement you'd like
A multi-step Assessment module covering: dataset upload, column mapping, prompt + config selection, review, and a results tab with retry and export support. Real-time status updates via SSE.
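To make the column-mapping and config-selection steps concrete, here is a minimal sketch of how mapped dataset rows might be fanned out across several model configurations, producing one pending run per (row, config) pair. Everything in it is hypothetical: `ModelConfig`, `apply_mapping`, `build_runs`, the `pending` status value, and the use of `str.format` placeholders for the prompt are illustrative choices, not the module's actual design.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical config shape for illustration only.
    name: str
    model: str
    temperature: float


def apply_mapping(row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Map prompt variable names to values, given {variable: dataset_column}."""
    return {variable: row[column] for variable, column in mapping.items()}


def build_runs(
    rows: List[Dict[str, str]],
    mapping: Dict[str, str],
    prompt_template: str,
    configs: List[ModelConfig],
) -> Iterator[dict]:
    """Yield one pending run per (row, config) pair, rows in the outer loop."""
    for (row_index, row), config in product(enumerate(rows), configs):
        variables = apply_mapping(row, mapping)
        yield {
            "row_index": row_index,
            "config": config.name,
            "prompt": prompt_template.format(**variables),
            "status": "pending",  # assumed initial state
        }
```

With this shape, two dataset rows and two configurations yield four runs, and grouping results by `row_index` gives the side-by-side view across configurations.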

Why is this enhancement needed?
Enables side-by-side comparison of model configs on the same dataset. The module is self-contained under app/assessment/, keeping its concerns separate from the rest of the app.

Metadata

Assignees

Labels

enhancement: New feature or request

Type

No type

Projects

Status

In Review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
