
perf: replace subprocess zstd with in-process klauspost/compress #217

Closed

joshfriend wants to merge 2 commits into main from jfriend/in-process-zstd

Conversation

@joshfriend
Contributor

Both compress and decompress now use the klauspost Go library instead of shelling out to the zstd binary. The practical benefit isn't raw speed — zstd at 2 GB/s was never the bottleneck. It's fewer moving parts: no subprocess spawning, no kernel pipes for IPC, no scheduling jitter from coordinating across process boundaries. It also removes the runtime dependency on zstd being installed, making the binary more portable.

Benchmarked on r8id.metal-48xlarge with a 2.4 GB compressed / 4.5 GB on-disk cache bundle (334K files).

@joshfriend joshfriend requested a review from a team as a code owner March 23, 2026 18:21
@joshfriend joshfriend requested review from inez and removed request for a team March 23, 2026 18:21

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e8abce973


@joshfriend joshfriend force-pushed the jfriend/in-process-zstd branch 2 times, most recently from d03e465 to 6a22063 Compare March 23, 2026 18:59
If io.Copy returns early (e.g. upload interrupted), tar can block forever
writing to its stdout pipe. Close the read end so tar gets SIGPIPE and exits.
Also toned down the comment — we still have one kernel pipe (tar→Go), we
eliminated the second one (tar→zstd subprocess).
@joshfriend joshfriend force-pushed the jfriend/in-process-zstd branch from 6a22063 to 65df4f6 Compare March 23, 2026 19:01
@alecthomas
Collaborator

Have you benchmarked this?

@joshfriend joshfriend marked this pull request as draft March 23, 2026 21:06
@alecthomas
Collaborator

FYI the main reason for using system zstd/tar wasn't speed, it was compatibility. I'd be pretty surprised if scheduling jitter was a concern for cachew too...

@joshfriend
Contributor Author

joshfriend commented Mar 24, 2026

I discovered during playpen testing that Go klauspost/compress zstd decompression is catastrophically slow for git mirror snapshots that contain multi-GB packfiles.

A very large repo mirror snapshot has a single 6.8 GB pack file. The Go zstd decoder feeds through archive/tar's single-threaded io.CopyBuffer loop, which serializes reads and writes. This bottlenecks the entire pipeline to ~20 MB/s on one CPU core — 5+ minutes for what native zstd -d | tar x does in 8.5 seconds.

Go's io.CopyBuffer alternates between zstd Read and file Write with no pipelining, while the native subprocess pipeline benefits from OS pipe buffering that lets both processes run concurrently with SIMD-optimized C code.

For cachew's use cases (git mirror snapshots with large packfiles), the 15% small-file improvement doesn't justify a 30x regression on large files. Keeping the subprocess pipeline.

