Is your feature request related to a problem?
Currently, there is no workflow to run LLM evaluations across multiple model configurations on the same dataset, which limits users' ability to compare model performance effectively.
Describe the solution you'd like
Implement a multi-step Assessment module that includes:
- Dataset upload
- Column mapping
- Prompt and config selection
- Review
- Results tab with retry and export support
- Real-time status updates via SSE
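The multi-step flow above could be driven by a small step state machine. A minimal sketch, assuming nothing about the actual codebase (step names and the `nextStep` helper are illustrative, not taken from the project):

```typescript
// Hypothetical step sequence for the Assessment wizard.
type AssessmentStep = "upload" | "mapping" | "config" | "review" | "results";

const STEP_ORDER: AssessmentStep[] = [
  "upload",   // dataset upload
  "mapping",  // column mapping
  "config",   // prompt and config selection
  "review",   // review before run
  "results",  // results tab with retry and export
];

// Advance only when the current step is complete; otherwise stay put.
// The final step never advances.
function nextStep(current: AssessmentStep, complete: boolean): AssessmentStep {
  const i = STEP_ORDER.indexOf(current);
  if (!complete || i === STEP_ORDER.length - 1) return current;
  return STEP_ORDER[i + 1];
}
```

Keeping the order in a single array makes it easy to render a progress indicator and to gate navigation per step.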
Why is this enhancement needed?
This enhancement enables side-by-side comparison of model configurations on the same dataset. Because the module is self-contained under app/assessment/, its concerns stay separate from the rest of the app.
Original issue
Describe the current behavior
No workflow exists to run LLM evaluations across multiple model configurations on the same dataset.
Describe the enhancement you'd like
A multi-step Assessment module covering: dataset upload, column mapping, prompt + config selection, review, and a results tab with retry and export support. Real-time status updates via SSE.
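For the real-time status updates, SSE messages arrive as newline-delimited `event:`/`data:` fields. A minimal parser sketch for one message block (the event payloads the server would emit are an assumption, not part of the issue):

```typescript
// Parsed form of one SSE message block. The server-side event names
// (e.g. "row_completed") are hypothetical.
interface StatusEvent {
  event: string;
  data: string;
}

// Parse a single SSE message block per the event-stream format:
// "event:" sets the event name, "data:" lines accumulate the payload.
function parseSseMessage(raw: string): StatusEvent {
  let event = "message"; // default event name per the SSE spec
  const dataLines: string[] = [];
  for (const line of raw.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
  }
  return { event, data: dataLines.join("\n") };
}
```

In the browser, `EventSource` handles this parsing automatically; a hand-rolled parser like this would only be needed for testing or non-browser consumers.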
Why is this enhancement needed?
Enables side-by-side comparison of model configs on the same dataset. The module is self-contained under app/assessment/, keeping its concerns separate from the rest of the app.
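One plausible layout for the self-contained module (file and folder names are hypothetical, shown only to illustrate the separation of concerns):

```
app/assessment/
  page.tsx        # wizard entry point
  upload/         # dataset upload step
  mapping/        # column mapping step
  config/         # prompt and config selection
  review/         # pre-run review
  results/        # results tab (retry, export)
```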