Skip to content

fix(moq-lite): exit run_send_bandwidth when the session closes#1286

Merged
kixelated merged 1 commit into
mainfrom
fix/send-bandwidth-leak
Apr 11, 2026
Merged

fix(moq-lite): exit run_send_bandwidth when the session closes#1286
kixelated merged 1 commit into
mainfrom
fix/send-bandwidth-leak

Conversation

@kixelated

Copy link
Copy Markdown
Collaborator

Summary

  • run_send_bandwidth (introduced in Add bandwidth estimation for adaptive bitrate control #1208) spawns a detached task that clones the underlying web_transport_trait::Session, but its select! arms only wait on producer.closed() / producer.used() / producer.unused(). None of those fire when the wrapping moq_lite::Session drops — the task is the sole Producer, so producer.closed() can never transition, and the outer loop's producer.used() sits forever waiting for a new consumer on a dead session.
  • Net effect: every accepted quinn-backed session leaks a task that permanently pins the cloned session, the underlying quinn::Connection, and its StreamsState (HashMaps sized to max_concurrent_*_streams). On moq-relay with DEFAULT_MAX_STREAMS = 10_000, that's ~1.5–3 MB of live heap per connection accumulating unboundedly.
  • Confirmed via jemalloc heap profiles on use.cdn.moq.dev: ~484 MB retained after 53 min / 328 accepted connections, 79% of it in quinn_proto::endpoint::Endpoint::accept → hashbrown::HashMap::insert → RawTable::reserve_rehash inside StreamsState::new.

Fix

Wrap the existing polling logic in an outer select! that also listens on session.closed(), so the task exits as soon as the transport dies. Extract the polling loop into run_send_bandwidth_inner so all termination conditions live in one place.

Bumps moq-lite to 0.15.11.

Test plan

  • just check passes locally
  • Deploy to a single relay node and watch RSS over several hours
  • Confirm with a follow-up heap dump that quinn_proto::Connection::new is no longer dominant

🤖 Generated with Claude Code

The bandwidth polling task spawned in Session::new held a clone of the
underlying web_transport_trait::Session, but the select! arms inside
run_send_bandwidth only waited on producer.closed() / producer.used() /
producer.unused(). None of those fire when the wrapping Session drops:
the task is the sole Producer, so producer.closed() can never transition,
and an unused channel will never become used again.

As a result, once the moq_lite::Session dropped (e.g. on connection
close), the background task got stuck forever in the outer "wait for a
consumer" loop, permanently pinning the cloned session and, transitively,
the underlying quinn::Connection and its StreamsState allocations. On
relays this manifested as ~1.5-3 MB of live heap per accepted connection
growing unboundedly.

Fix by adding a top-level select! on session.closed() so the task exits
as soon as the transport dies, and extract the polling logic into
run_send_bandwidth_inner so the termination conditions sit in one place.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kixelated kixelated enabled auto-merge (squash) April 11, 2026 03:35
@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 25166fad-24b5-459e-a6c5-9f301c58aaea

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf9923 and 271ea0f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • rs/moq-lite/Cargo.toml
  • rs/moq-lite/src/session.rs

Walkthrough

The pull request includes a patch version increment for the moq-lite crate and refactors the run_send_bandwidth function in src/session.rs. The refactoring introduces an outer tokio::select! construct that races the session closure, producer closure, and polling logic. The previous loop logic is extracted into a new run_send_bandwidth_inner function, and in-loop selection branches for producer closure monitoring are removed in favor of error handling from the used() call within the inner loop.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main fix—exiting run_send_bandwidth when the session closes—which is the core change across all modified files.
Description check ✅ Passed The description comprehensively explains the memory leak issue, root cause, the implemented fix, and test plan; all directly related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/send-bandwidth-leak
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/send-bandwidth-leak

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kixelated kixelated disabled auto-merge April 11, 2026 03:50
@kixelated kixelated merged commit 6f1b147 into main Apr 11, 2026
1 of 2 checks passed
@kixelated kixelated deleted the fix/send-bandwidth-leak branch April 11, 2026 03:50
@moq-bot moq-bot Bot mentioned this pull request Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant