
feat: Add Garbage Collection (GC) and MaxArenasToKeep feature #98

Open
eeliu wants to merge 1 commit into grandecola:main from eeliu:feature/gc-setMaxArenasToKeep

eeliu commented May 26, 2026

Description

This pull request adds automatic and manual Garbage Collection (GC) to bigqueue. By default, bigqueue retains all arena files indefinitely, which can exhaust storage for long-running queues. This feature lets users clean up consumed arena files either automatically or on demand.

Key Features and Implementation Details

  1. New Configuration - SetMaxArenasToKeep(n): Users can configure the queue to keep at most n consumed arenas. Consumed arenas older than this threshold are deleted from disk.
  2. Manual GC Trigger - GC(): Added an explicit GC() method allowing users to manually trigger disk cleanup (e.g., during off-peak hours).
  3. Data Structure Refactoring: Modified the internal arenas collection in arenaManager from a []*mmap.File (slice) to a map[int]*mmap.File. This is necessary to support non-contiguous Arena IDs that arise when old files are deleted from the disk.
  4. Metadata Preservation: Adjusted the metadata synchronization logic so that when GC removes older head arenas, the global head is advanced accordingly and persisted to disk.
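
The slice-to-map change in item 3 can be illustrated with a minimal, self-contained sketch. It is not the PR's code: `arena` is a hypothetical stand-in for `*mmap.File`, and `arenaSet`, `sortedIDs`, and the `arena_%d.dat` file names are made up for illustration. The point is only that a map keyed by arena ID keeps working once low-numbered arenas have been removed, whereas a slice indexed by ID would need every position from 0 upward to stay occupied.

```go
package main

import (
	"fmt"
	"sort"
)

// arena is a hypothetical stand-in for *mmap.File.
type arena struct{ path string }

// arenaSet tracks open arenas by ID. A map tolerates the gaps that appear
// when old arenas are deleted; a slice indexed by ID would not.
type arenaSet map[int]*arena

// sortedIDs returns the IDs currently tracked, in ascending order.
func (s arenaSet) sortedIDs() []int {
	ids := make([]int, 0, len(s))
	for id := range s {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	arenas := arenaSet{}
	for id := 0; id < 5; id++ {
		// The file name pattern is an assumption made for this sketch.
		arenas[id] = &arena{path: fmt.Sprintf("arena_%d.dat", id)}
	}

	// Simulate GC removing the two oldest arenas.
	delete(arenas, 0)
	delete(arenas, 1)

	_, stillTracked := arenas[1]
	fmt.Println(stillTracked, arenas.sortedIDs()) // false [2 3 4]
}
```

Lookups by ID stay O(1) with the map, at the cost of losing the implicit ordering a slice provides, which is why the sketch sorts IDs explicitly when order matters.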

GC Workflow

flowchart TD
    A[Trigger GC] --> B[Gather Consumer Heads]
    B --> C[Calculate minHeadAid = min of all consumers]
    C --> D{Is minHeadAid valid?}
    D -- Yes --> E[Update Global Head to minHeadAid]
    D -- No --> Z[Exit GC]
    E --> F[Calculate limitAid = minHeadAid - maxArenasToKeep]
    F --> G{limitAid > 0?}
    G -- Yes --> H[Iterate aid from 0 to limitAid-1]
    H --> I[Unmap arena from memory]
    I --> J[Delete .dat file from disk]
    J --> K[Remove from in-memory arena map]
    K --> L[Repeat for next expired arena]
    L --> Z
    G -- No --> Z
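
The selection step of the flowchart can be sketched in Go as follows. This is a simplified model under stated assumptions, not the code in `arenamanager.go`: `expiredArenas` is a hypothetical helper, arenas are modeled as a `map[int]string` of file names, and "minHeadAid is valid" is assumed to mean "at least one consumer exists". Unmapping and file deletion are reduced to removing the entry from the map.

```go
package main

import (
	"fmt"
	"sort"
)

// expiredArenas returns the arena IDs a GC pass would delete, following the
// flowchart: minHeadAid is the smallest consumer head, limitAid is
// minHeadAid - maxArenasToKeep, and every arena with aid < limitAid expires.
// ok is false when there are no consumers (assumed to be the "invalid" case).
func expiredArenas(arenas map[int]string, consumerHeads []int, maxArenasToKeep int) (expired []int, minHeadAid int, ok bool) {
	if len(consumerHeads) == 0 {
		return nil, 0, false
	}
	minHeadAid = consumerHeads[0]
	for _, h := range consumerHeads[1:] {
		if h < minHeadAid {
			minHeadAid = h
		}
	}

	limitAid := minHeadAid - maxArenasToKeep
	// Filtering the tracked IDs is equivalent to iterating 0..limitAid-1
	// and skipping IDs that are already gone.
	for aid := range arenas {
		if aid < limitAid {
			expired = append(expired, aid)
		}
	}
	sort.Ints(expired)
	return expired, minHeadAid, true
}

func main() {
	arenas := map[int]string{}
	for aid := 0; aid < 6; aid++ {
		arenas[aid] = fmt.Sprintf("arena_%d.dat", aid) // hypothetical file names
	}

	// Two consumers, currently reading arenas 4 and 5; keep 2 consumed arenas.
	expired, minHeadAid, ok := expiredArenas(arenas, []int{4, 5}, 2)
	if !ok {
		return
	}
	for _, aid := range expired {
		delete(arenas, aid) // stands in for unmap + delete file + drop from map
	}
	fmt.Println(expired, minHeadAid, len(arenas)) // [0 1] 4 4
}
```

When `limitAid` is zero or negative, no arena ID satisfies `aid < limitAid`, so the pass deletes nothing, which matches the "No" branch of the `limitAid > 0?` check.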

Test Cases Introduced

  • gc_test.go: Contains basic functionality tests verifying that configuring SetMaxArenasToKeep correctly cleans up the anticipated arena files upon consumption.
  • gc_concurrency_test.go: Concurrent stress tests that run Enqueue, Dequeue, and GC simultaneously to check that no race conditions arise during the memory unmap or file deletion stages.
  • crash_recovery_test.go: Tests the resilience of the queue to unexpected shutdowns. It simulates a crash in which a GC pass has only partially completed and verifies that the queue restores itself correctly on the next start.
  • bigqueue_test.go (Updates): Validation checks ensuring that negative numbers for maxArenasToKeep return the appropriate initialization errors.

All tests are passing cleanly with expected coverage. Please let me know if there are any aspects of the implementation you would like me to adjust.

Copilot AI review requested due to automatic review settings May 26, 2026 05:32

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces configurable garbage collection (GC) for arena files in bigqueue, adds a public GC() entrypoint, and expands documentation and tests to validate cleanup, concurrency, and crash recovery behaviors.

Changes:

  • Added SetMaxArenasToKeep configuration and metadata head updating to support arena file garbage collection.
  • Implemented arena deletion logic inside arenaManager and exposed MmapQueue.GC() as a public API.
  • Added extensive GC + concurrency + crash-recovery tests and updated docs/README examples to the NewMmapQueue API.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.

Show a summary per file
| File | Description |
| --- | --- |
| metadata.go | Re-enables putHead so GC can persist the updated global head in metadata. |
| config.go | Adds maxArenasToKeep config, option setter, and validation error. |
| bigqueue.go | Exposes MmapQueue.GC() to trigger arena cleanup with queue locking. |
| arenamanager.go | Refactors arena tracking (slice→map) and implements GC deletion logic + head updates. |
| gc_test.go | Adds multi-scenario tests validating arena deletion and consumer-head semantics. |
| gc_concurrency_test.go | Adds concurrent producer/consumer test with periodic GC. |
| crash_recovery_test.go | Adds multi-process crash recovery tests for enqueue, dequeue, and torn GC state. |
| bigqueue_test.go | Adds unit test for negative SetMaxArenasToKeep validation. |
| doc.go | Updates examples to NewMmapQueue and documents GC usage. |
| README.md | Documents SetMaxArenasToKeep and manual GC() usage; updates examples to NewMmapQueue. |


Comment thread gc_test.go
Comment thread arenamanager.go
Comment thread arenamanager.go
Comment thread arenamanager.go
Comment thread arenamanager.go
Comment thread gc_test.go
Comment thread crash_recovery_test.go
Comment thread crash_recovery_test.go
Comment thread README.md
Comment thread gc_test.go
2 participants