
perf: replace subprocess zstd with in-process klauspost/compress #217

Closed

joshfriend wants to merge 2 commits into main from jfriend/in-process-zstd

Conversation

@joshfriend
Contributor

Both compress and decompress now use the klauspost Go library instead of shelling out to the zstd binary. The practical benefit isn't raw speed — zstd at 2 GB/s was never the bottleneck. It's fewer moving parts: no subprocess spawning, no kernel pipes for IPC, no scheduling jitter from coordinating across process boundaries. It also removes the runtime dependency on zstd being installed, making the binary more portable.

Benchmarked on r8id.metal-48xlarge with a 2.4 GB compressed / 4.5 GB on-disk cache bundle (334K files).

@joshfriend joshfriend requested a review from a team as a code owner March 23, 2026 18:21
@joshfriend joshfriend requested review from inez and removed request for a team March 23, 2026 18:21

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e8abce973


@joshfriend joshfriend force-pushed the jfriend/in-process-zstd branch 2 times, most recently from d03e465 to 6a22063 Compare March 23, 2026 18:59
If io.Copy returns early (e.g. upload interrupted), tar can block forever
writing to its stdout pipe. Close the read end so tar gets SIGPIPE and exits.
Also toned down the comment — we still have one kernel pipe (tar→Go), we
eliminated the second one (tar→zstd subprocess).
@joshfriend joshfriend force-pushed the jfriend/in-process-zstd branch from 6a22063 to 65df4f6 Compare March 23, 2026 19:01
@alecthomas
Collaborator

Have you benchmarked this?

@joshfriend joshfriend marked this pull request as draft March 23, 2026 21:06
@alecthomas
Collaborator

FYI the main reason for using system zstd/tar wasn't speed, it was compatibility. I'd be pretty surprised if scheduling jitter was a concern for cachew too...

@joshfriend
Contributor Author

joshfriend commented Mar 24, 2026

I discovered during playpen testing that Go klauspost/compress zstd decompression is catastrophically slow for git mirror snapshots that contain multi-GB packfiles.

A very large repo mirror snapshot has a single 6.8 GB pack file. The Go zstd decoder feeds through archive/tar's single-threaded io.CopyBuffer loop, which serializes reads and writes. This bottlenecks the entire pipeline to ~20 MB/s on one CPU core — 5+ minutes for what native zstd -d | tar x does in 8.5 seconds.

Go's io.CopyBuffer alternates between zstd Read and file Write with no pipelining, while the native subprocess pipeline benefits from OS pipe buffering that lets both processes run concurrently with SIMD-optimized C code.

For cachew's use cases (git mirror snapshots with large packfiles), the 15% small-file improvement doesn't justify a 30x regression on large files. Keeping the subprocess pipeline.

