Removes fields that are unnecessary and converts full.jsonl files to transcript.jsonl files as per specification in RFD 9 Entire-Checkpoint: 7a4de317c664
Pull request overview
Adds a transcript “compaction” layer to transform checkpoint full.jsonl transcripts into a smaller transcript.jsonl-style JSONL format intended for API consumption by removing/flattening fields and splitting out user tool results.
Changes:
- Introduce Compact + helpers to convert full.jsonl → compacted JSONL (dropping certain line types, stripping thinking blocks, minimizing tool results).
- Add golden/fixture-based unit tests covering common conversion scenarios, truncation behavior, and deterministic field ordering.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| cmd/entire/cli/transcript/compact.go | Implements the compaction/conversion logic and JSON-building helpers. |
| cmd/entire/cli/transcript/compact_test.go | Adds golden tests/fixtures for compaction output, truncation, and edge cases. |
| prettyGot, _ := json.MarshalIndent(got, "", " ") | ||
| prettyWant, _ := json.MarshalIndent(want, "", " ") |
In assertJSONLines, json.MarshalIndent errors are ignored (prettyGot, _ := ...). errchkjson is enabled for test files too, so this will fail lint. Please handle these errors (e.g., fail the test) or add a narrowly-scoped //nolint:errchkjson with an explanation if you intentionally want to ignore them.
| prettyGot, _ := json.MarshalIndent(got, "", " ") | |
| prettyWant, _ := json.MarshalIndent(want, "", " ") | |
| prettyGot, err := json.MarshalIndent(got, "", " ") | |
| if err != nil { | |
| t.Fatalf("line %d: failed to marshal actual JSON for diff: %v\nvalue: %#v", i, err, got) | |
| } | |
| prettyWant, err := json.MarshalIndent(want, "", " ") | |
| if err != nil { | |
| t.Fatalf("line %d: failed to marshal expected JSON for diff: %v\nvalue: %#v", i, err, want) | |
| } |
| entryType := unquote(raw["type"]) | ||
| if droppedTypes[entryType] { | ||
| return nil | ||
| } | ||
|
|
||
| switch entryType { | ||
| case "assistant": | ||
| return convertAssistant(raw, opts) | ||
| case "user": | ||
| return convertUser(raw, opts) | ||
| default: | ||
| return nil // drop unknown types in the new format | ||
| } |
convertLine only reads the entry type from the type field. Cursor transcripts use role (and the transcript package already normalizes role→type in parse.go), so compaction would silently drop all Cursor lines. Consider falling back to role when type is missing/empty (or reusing the existing normalization logic) so both formats are supported.
| // extractUserContent separates user message content into text and tool_result entries. | ||
| func extractUserContent(contentRaw json.RawMessage) (string, []toolResultEntry) { | ||
| // String content | ||
| var str string | ||
| if json.Unmarshal(contentRaw, &str) == nil { | ||
| return str, nil | ||
| } | ||
|
|
||
| // Array content | ||
| var blocks []map[string]json.RawMessage | ||
| if json.Unmarshal(contentRaw, &blocks) != nil { | ||
| return "", nil | ||
| } | ||
|
|
||
| var texts []string | ||
| var toolResults []toolResultEntry | ||
|
|
||
| for _, block := range blocks { | ||
| blockType := unquote(block["type"]) | ||
|
|
||
| if blockType == "tool_result" { | ||
| toolResults = append(toolResults, toolResultEntry{ | ||
| toolUseID: unquote(block["tool_use_id"]), | ||
| }) | ||
| continue | ||
| } | ||
|
|
||
| if blockType == "text" { | ||
| texts = append(texts, unquote(block["text"])) | ||
| } | ||
| } | ||
|
|
||
| text := "" | ||
| if len(texts) > 0 { | ||
| text = texts[0] | ||
| for i := 1; i < len(texts); i++ { | ||
| text += "\n\n" + texts[i] | ||
| } | ||
| } | ||
|
|
||
| return text, toolResults | ||
| } |
User text extraction here doesn’t apply textutil.StripIDEContextTags, so IDE/system tags like <ide_opened_file> or Cursor’s <user_query> wrappers can leak into the compacted transcript. The rest of the codebase relies on transcript.ExtractUserContent / StripIDEContextTags to remove these; this function should strip them too (after joining text blocks).
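A rough sketch of stripping after the join. The signature and exact behavior of textutil.StripIDEContextTags are not shown on this page, so stripContextTags below is a hypothetical stand-in with assumed rules (drop whole ide_* elements, unwrap user_query), not the repository's implementation.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	// ASSUMPTION: IDE context arrives as <ide_...>...</ide_...> elements
	// whose contents should be dropped entirely.
	ideTagRe = regexp.MustCompile(`(?s)<ide_[a-z_]+>.*?</ide_[a-z_]+>`)
	// ASSUMPTION: <user_query> only wraps the real prompt, so the tags
	// are removed and the inner text is kept.
	userQueryRe = regexp.MustCompile(`</?user_query>`)
)

// stripContextTags is a hypothetical stand-in for the repo's
// textutil.StripIDEContextTags.
func stripContextTags(s string) string {
	s = ideTagRe.ReplaceAllString(s, "")
	s = userQueryRe.ReplaceAllString(s, "")
	return strings.TrimSpace(s)
}

// joinUserText joins text blocks with blank lines (the same result as
// the manual loop in extractUserContent) and strips tags afterwards.
func joinUserText(texts []string) string {
	return stripContextTags(strings.Join(texts, "\n\n"))
}
```

In the real change the stand-in would presumably be replaced by the existing textutil helper; the point of the sketch is only the ordering, join first and strip second.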
| // marshalOrdered produces a JSON object with keys in the given order. | ||
| // Pairs with nil values are omitted. | ||
| func marshalOrdered(pairs ...interface{}) []byte { | ||
| var buf bytes.Buffer | ||
| buf.WriteByte('{') | ||
| first := true | ||
| for i := 0; i < len(pairs)-1; i += 2 { | ||
| key := pairs[i].(string) | ||
| val, _ := pairs[i+1].(json.RawMessage) | ||
| if val == nil { | ||
| continue | ||
| } | ||
| if !first { | ||
| buf.WriteByte(',') | ||
| } | ||
| keyJSON, _ := json.Marshal(key) | ||
| buf.Write(keyJSON) | ||
| buf.WriteByte(':') | ||
| buf.Write(val) | ||
| first = false | ||
| } | ||
| buf.WriteByte('}') | ||
| return buf.Bytes() | ||
| } | ||
|
|
||
| func mustMarshal(v interface{}) json.RawMessage { | ||
| b, _ := json.Marshal(v) | ||
| return b | ||
| } |
marshalOrdered / mustMarshal currently ignore encoding/json errors and also ignore the ok result of a type assertion (val, _ := ...). With errchkjson and errcheck (check-type-assertions/check-blank) enabled in this repo, this will fail lint. Recommend changing these helpers to (1) avoid json.Marshal for constant keys, (2) return errors instead of discarding them, and (3) avoid comma-ok assertions if you're not going to handle the boolean.
| lineBytes, err := reader.ReadBytes('\n') | ||
| if err != nil && err != io.EOF { | ||
| return nil, err | ||
| } |
Compact returns the raw ReadBytes error directly. wrapcheck is enabled and elsewhere in this package errors are wrapped with context (e.g., parse.go). Consider wrapping this with a message like “failed to read transcript” to satisfy linting and improve diagnosability.
Entire-Checkpoint: 8ab41f9871d4
Entire-Checkpoint: 29768eb8b417
Entire-Checkpoint: d1e909851feb
Extract shared types (Options, compactMeta), entry point (Compact), line slicing (sliceFromLine), and utility functions (marshalOrdered, mustMarshal, copyField, unquote) into transcript/compact/ package. Includes stubs.go for forward-declared agent-specific functions that will be added in subsequent tasks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 17414f7f1d94
Extract OpenCode-specific conversion logic (isOpenCodeFormat, compactOpenCode, and supporting types/functions) into compact/opencode.go and remove the corresponding stubs from stubs.go. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: be2ce61d069e
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c6904e6f41c9
Extract Claude Code / Cursor / Factory AI Droid JSONL conversion functions into claudecode.go: compactJSONL entry point, convertLine, convertUser, convertAssistant, normalizeKind, and content helpers. Remove stubs.go since all agent-specific implementations now exist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 0a8d9214030a
Create gemini.go in the compact sub-package with format detection, message parsing, and conversion for Gemini CLI session transcripts. Also add Gemini support to the existing transcript/compact.go and generate the expected test fixture for the Gemini golden test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: b3fbd156ae8d
…ull-transcript-agent-support
…ull-transcript-agent-support
Entire-Checkpoint: ea1404a01011
Entire-Checkpoint: 7cbecf505ecb
Entire-Checkpoint: 77072abe499b
Entire-Checkpoint: bfc3c0ad0fa5
…b.com/entireio/cli into convert-full-transcript
Entire-Checkpoint: a424eac54176
Entire-Checkpoint: 4830a551338c
Closing so I can incrementally add support for other agents / resolve the merge conflicts more easily with the latest main.
Removes fields that are unused or unnecessary for API consumption and converts full.jsonl files to transcript.jsonl files.
Entire-Checkpoint: 7a4de317c664
Note
Medium Risk
Introduces new transcript transformation logic that affects what data is retained/omitted (e.g., tool results, thinking blocks, dropped entry types), so downstream consumers may see behavior changes if assumptions differ across agent formats.
Overview
Adds a new transcript.Compact pipeline that converts full.jsonl transcripts into a normalized transcript.jsonl stream with per-line metadata (v, agent, cli_version) and optional truncation via StartLine.
The compaction drops non-message noise entries, normalizes cross-agent schemas (supports type, role, human, gemini, and Factory AI Droid type:"message" envelopes), strips IDE context tags from user text, and removes assistant thinking blocks/tool_use.caller.
User messages containing tool_result blocks are split into a user text line plus one or more user_tool_result lines, with toolUseResult minimized to only API-relevant fields; extensive golden/edge-case tests were added to lock output semantics and field order.
Written by Cursor Bugbot for commit 0593dab.