Evaluation: Group Export #25

Merged
AkhileshNegi merged 1 commit into main from feat/evaluation-grouped
Jan 29, 2026

@vprashrex vprashrex commented Jan 29, 2026

Target Issue: #7

Summary:

  • Add grouped format option displaying multiple LLM answers per question side-by-side

  • Add a format selector dropdown and CSV export for both formats (row and grouped); row is the default

  • Support CSV export for grouped format

Summary by CodeRabbit

  • New Features
    • Added grouped format display for evaluation results with a format selector to toggle between row and grouped layouts
    • Extended CSV export functionality to support both row and grouped format exports

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

The changes introduce support for displaying and exporting evaluation results in a grouped format alongside the existing row format. This includes backend support for an export_format query parameter, a new grouped result table component, type definitions for grouped data structures, UI controls for format selection, and CSV export functions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backend API Support: app/api/evaluations/[id]/route.ts | Added support for export_format query parameter; appends it to backend URL while preserving existing parameters like get_trace_info and resync_score. |
| Type Definitions: app/components/types.ts | Introduced GroupedTraceItem interface for grouped trace structure, added isGroupedFormat type guard to detect grouped format, and updated NewScoreObjectV2.traces to accept union of TraceItem or GroupedTraceItem arrays. |
| UI Components: app/components/DetailedResultsTable.tsx | Added new GroupedResultsTable internal component that renders a multi-column table for grouped traces with dynamic column generation, fixed column widths for horizontal scrolling, and reused color-coding and tooltip logic. Maintains backward compatibility with row format via pre-check. |
| Page Integration: app/evaluations/[id]/page.tsx | Added exportFormat state and format selector UI control, introduced three CSV export functions (exportGroupedCSV, exportRowCSV, handleExportCSV) with format detection logic, and integrated export_format parameter into evaluation fetch requests. |
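
For readers without the diff open, the type guard described under Type Definitions might look roughly like the sketch below. The interface shapes are assumptions: the field names (question, ground_truth_answer, llm_answers, trace_ids, scores) are inferred from the suggested fix later in this review, and the Score shape is hypothetical. This is not the code from app/components/types.ts.

```typescript
// Hypothetical shapes, inferred from the review comments rather than copied
// from app/components/types.ts.
interface Score {
  name: string;
  value: number | string | null;
}

interface TraceItem {
  trace_id: string;
  question: string;
  llm_answer: string;
  ground_truth_answer: string;
  scores: Score[];
}

interface GroupedTraceItem {
  question: string;
  ground_truth_answer: string;
  llm_answers: string[];
  trace_ids: string[];
  scores: Score[][]; // one list of scores per answer
}

// Type guard: treats the array as grouped when its first element carries an
// llm_answers array. An empty array is reported as row format.
function isGroupedFormat(
  traces: TraceItem[] | GroupedTraceItem[]
): traces is GroupedTraceItem[] {
  const first = traces[0];
  return (
    first !== undefined &&
    "llm_answers" in first &&
    Array.isArray(first.llm_answers)
  );
}
```

Checking only the first element keeps the guard cheap; it assumes the backend never mixes row and grouped items in one response.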

Sequence Diagram

sequenceDiagram
    participant User
    participant Page as Evaluation Page
    participant Backend as Backend API
    participant Component as DetailedResultsTable

    User->>Page: Select export format (Grouped/Row)
    Page->>Page: Update exportFormat state
    Page->>Backend: Fetch evaluation with export_format parameter
    Backend->>Backend: Process format parameter
    Backend->>Page: Return evaluation data in requested format
    Page->>Component: Pass evaluation data with traces
    
    alt Grouped Format Detected
        Component->>Component: isGroupedFormat check
        Component->>Component: Render GroupedResultsTable
        Component->>Component: Calculate max answers, column widths
        Component->>Component: Render multi-column table with grouped data
    else Row Format
        Component->>Component: normalizeToIndividualScores
        Component->>Component: Render individual score rows
    end
    
    Component->>User: Display formatted results
    User->>Page: Click export CSV
    Page->>Page: handleExportCSV orchestrates
    Page->>Page: Detect format and call appropriate exporter
    Page->>User: Download CSV file

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A grouped format hops in today,
Rows and answers side-by-side to play,
Type guards dance, components align,
Export formats arrange just fine,
With colors and scrolls, the view's divine! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Evaluation: Group Export' is vague and does not clearly convey the main changes. While it mentions evaluation and grouping/export, it lacks specificity about what features were added or what the grouped format entails. | Consider using a more descriptive title that captures the core feature, such as 'Add grouped format and CSV export options for evaluations' or 'Support grouped results display and dual-format CSV export'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@vprashrex vprashrex self-assigned this Jan 29, 2026
@vprashrex vprashrex added the enhancement New feature or request label Jan 29, 2026
@vprashrex vprashrex linked an issue Jan 29, 2026 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/evaluations/[id]/page.tsx (1)

324-354: Row export is blocked for non‑V2 score formats.

handleExportCSV now returns an error unless the score object is V2, which prevents export for NewScoreObject even though exportRowCSV() supports it via normalization. This looks like a regression for existing data.

✅ Suggested fix
-      if (!isNewScoreObjectV2(scoreObject)) {
-        toast.error('Export not available for this score format');
-        return;
-      }
-
-      const traces = scoreObject.traces;
+      if (!isNewScoreObject(scoreObject) && !isNewScoreObjectV2(scoreObject)) {
+        toast.error('Export not available for this score format');
+        return;
+      }
+
+      if (!isNewScoreObjectV2(scoreObject)) {
+        exportRowCSV();
+        return;
+      }
+
+      const traces = scoreObject.traces;
🤖 Fix all issues with AI agents
In `@app/api/evaluations/[id]/route.ts`:
- Around lines 54-61: Validate and whitelist the incoming export_format before
forwarding: read the raw value from request.nextUrl.searchParams, check it
against an explicit set of allowed values (e.g., 'row' and 'grouped', or whatever
your backend supports), and if it isn't in the whitelist default to 'row'. Then
use that sanitized value when calling url.searchParams.set('export_format',
...). Update the logic around exportFormat, searchParams, and
url.searchParams.set to enforce this whitelist so arbitrary values are never
forwarded to backendUrl.

In `@app/components/types.ts`:
- Around line 59-61: normalizeToIndividualScores currently assumes
NewScoreObjectV2.traces are TraceItem and will emit nested trace_scores when
traces are grouped; update normalizeToIndividualScores to detect grouped format
using an isGroupedFormat guard (or use existing
isNewScoreObjectV2()/isLegacyScoreObject() runtime checks), and when traces are
GroupedTraceItem[] flatten them into individual TraceItem entries before
producing trace_scores (or skip/return safely if flattening isn't applicable) so
trace_scores remains a flat array.
🧹 Nitpick comments (2)
app/evaluations/[id]/page.tsx (1)

196-257: Build grouped CSV headers from all scores, not just the first answer.

Using traces[0]?.scores[0] can miss score columns that appear only in other answers/groups, leading to dropped data in exports. Consider building a union of score names across all groups/answers.

♻️ Suggested improvement
-      const scoreNames = traces[0]?.scores[0]?.map(s => s.name) || [];
+      const scoreNames = Array.from(
+        new Set(
+          traces.flatMap(group =>
+            (group.scores ?? []).flatMap(scores => (scores ?? []).map(s => s.name))
+          )
+        )
+      );
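
To make the difference concrete, here is a small self-contained sketch of the union approach. The Score and GroupedTraceItem shapes and the helper name collectScoreNames are assumptions made for illustration, not code from the PR; it uses plain loops instead of Set and flatMap so that it runs under older compile targets.

```typescript
// Hypothetical shapes for illustration only.
interface Score {
  name: string;
  value: number | string | null;
}

interface GroupedTraceItem {
  question: string;
  llm_answers: string[];
  scores?: Score[][]; // one list of scores per answer; may be absent
}

// Collect every distinct score name across all groups and answers,
// preserving first-seen order so CSV columns stay stable.
function collectScoreNames(traces: GroupedTraceItem[]): string[] {
  const ordered: string[] = [];
  for (const group of traces) {
    for (const answerScores of group.scores ?? []) {
      for (const score of answerScores ?? []) {
        if (ordered.indexOf(score.name) === -1) {
          ordered.push(score.name);
        }
      }
    }
  }
  return ordered;
}
```

With data where a score appears only on a later answer or in a later group, reading traces[0]?.scores[0] alone would omit that column, while the union keeps it.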
app/components/DetailedResultsTable.tsx (1)

304-345: Consider de‑duplicating formatScoreValue across row/grouped tables.

The grouped renderer re-implements score formatting; extracting a shared helper would prevent divergence and simplify future changes.
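
One possible shape for such a shared helper is sketched below. The formatting rules shown (two decimal places for numbers, "N/A" for missing values, strings passed through) are assumptions chosen for illustration; the actual rules in DetailedResultsTable.tsx are not visible in this review and may differ.

```typescript
// Hypothetical value type; the real score value type may be narrower.
type ScoreValue = number | string | null | undefined;

// Single formatting helper that both the row and grouped tables could call.
// The specific rules below are illustrative assumptions.
function formatScoreValue(value: ScoreValue): string {
  if (value === null || value === undefined) {
    return "N/A";
  }
  if (typeof value === "number") {
    // Guard against NaN and Infinity so the table never shows "NaN".
    return Number.isFinite(value) ? value.toFixed(2) : "N/A";
  }
  return value;
}
```

Keeping the helper in one module means a change to rounding or to the missing-value label lands in both tables at once.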

Comment on lines +54 to +61
const searchParams = request.nextUrl.searchParams;
const exportFormat = searchParams.get('export_format') || 'row';

// Build URL with query parameters
const url = new URL(`${backendUrl}/api/v1/evaluations/${id}`);
url.searchParams.set('get_trace_info', 'true');
url.searchParams.set('resync_score', 'false');
url.searchParams.set('export_format', exportFormat);

⚠️ Potential issue | 🟡 Minor

Whitelist export_format before forwarding.

Right now any query value gets passed through; invalid values can trigger backend errors or unexpected behavior. Consider validating against supported options and defaulting to row.

✅ Suggested fix
-    const exportFormat = searchParams.get('export_format') || 'row';
+    const exportFormatParam = searchParams.get('export_format');
+    const exportFormat =
+      exportFormatParam === 'row' || exportFormatParam === 'grouped'
+        ? exportFormatParam
+        : 'row';
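
The same check can be pulled into a small pure function, which is easier to unit-test than logic inlined in the route handler. This is a sketch: the names EXPORT_FORMATS, isExportFormat, and sanitizeExportFormat are hypothetical, and the allowed values are taken from the two formats this PR describes.

```typescript
// Allowed values, taken from the two formats described in this PR.
const EXPORT_FORMATS = ["row", "grouped"] as const;
type ExportFormat = (typeof EXPORT_FORMATS)[number];

// Type predicate: true only for values in the whitelist.
function isExportFormat(value: string | null): value is ExportFormat {
  return EXPORT_FORMATS.some((format) => format === value);
}

// Falls back to 'row' for missing or unrecognised values, so nothing
// arbitrary is forwarded to the backend.
function sanitizeExportFormat(raw: string | null): ExportFormat {
  return isExportFormat(raw) ? raw : "row";
}
```

In the route handler this would be called as sanitizeExportFormat(searchParams.get('export_format')) before url.searchParams.set('export_format', ...).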

Comment on lines 59 to +61
 export interface NewScoreObjectV2 {
   summary_scores: SummaryScore[];
-  traces: TraceItem[];
+  traces: TraceItem[] | GroupedTraceItem[];

⚠️ Potential issue | 🟠 Major

Normalize grouped traces before treating them as TraceItem.

NewScoreObjectV2.traces can now be grouped, but normalizeToIndividualScores still assumes TraceItem and will emit malformed trace_scores (nested arrays) when grouped traces flow through. Guard with isGroupedFormat and flatten or return safely.

💡 Suggested fix (flatten grouped traces)
   if (isNewScoreObjectV2(score)) {
+    if (isGroupedFormat(score.traces)) {
+      return score.traces.flatMap(group =>
+        group.llm_answers.map((answer, idx) => ({
+          trace_id: group.trace_ids?.[idx] ?? '',
+          input: { question: group.question },
+          output: { answer },
+          metadata: { ground_truth: group.ground_truth_answer },
+          trace_scores: group.scores?.[idx] ?? []
+        }))
+      );
+    }
     // Convert TraceItem[] to IndividualScore[]
     return score.traces.map(trace => ({
       trace_id: trace.trace_id,
       input: { question: trace.question },
       output: { answer: trace.llm_answer },
       metadata: { ground_truth: trace.ground_truth_answer },
       trace_scores: trace.scores
     }));
   }
Based on learnings: Use type guards like `isNewScoreObjectV2()` and `isLegacyScoreObject()` for runtime type checking when working with union types.
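
A standalone version of the flattening step, runnable outside the app, might look like this. The interfaces are assumptions reconstructed from the field names in the suggested fix above, and flattenGroupedTraces is a hypothetical helper name; a loop is used instead of flatMap so the sketch does not depend on an ES2019 target.

```typescript
// Hypothetical shapes reconstructed from the suggested fix.
interface Score {
  name: string;
  value: number | string | null;
}

interface GroupedTraceItem {
  question: string;
  ground_truth_answer: string;
  llm_answers: string[];
  trace_ids?: string[];
  scores?: Score[][]; // one list of scores per answer
}

interface IndividualScore {
  trace_id: string;
  input: { question: string };
  output: { answer: string };
  metadata: { ground_truth: string };
  trace_scores: Score[];
}

// Expand each group into one entry per answer so trace_scores stays flat.
function flattenGroupedTraces(groups: GroupedTraceItem[]): IndividualScore[] {
  const rows: IndividualScore[] = [];
  for (const group of groups) {
    group.llm_answers.forEach((answer, idx) => {
      rows.push({
        trace_id: group.trace_ids?.[idx] ?? "",
        input: { question: group.question },
        output: { answer },
        metadata: { ground_truth: group.ground_truth_answer },
        trace_scores: group.scores?.[idx] ?? [],
      });
    });
  }
  return rows;
}
```

A group with two answers yields two entries that share the same question and ground truth, each with its own flat score list.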

@AkhileshNegi AkhileshNegi changed the title from "feat: add support for grouped evaluation results and CSV export format selection" to "Evaluation: Group Export" on Jan 29, 2026
@AkhileshNegi AkhileshNegi merged commit 77ceacf into main Jan 29, 2026
1 check passed
@coderabbitai coderabbitai bot mentioned this pull request Mar 16, 2026
@Ayush8923 Ayush8923 deleted the feat/evaluation-grouped branch March 20, 2026 11:36

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation UI: Export Results

3 participants