Skip to content

Improve episodic memory ranking with reinforcement and decay#2

Merged
mcheemaa merged 2 commits intoghostwright:mainfrom
coe0718:memory-ranking-reinforcement-decay
Mar 31, 2026
Merged

Improve episodic memory ranking with reinforcement and decay#2
mcheemaa merged 2 commits intoghostwright:mainfrom
coe0718:memory-ranking-reinforcement-decay

Conversation

@coe0718
Copy link
Copy Markdown
Contributor

@coe0718 coe0718 commented Mar 31, 2026

What Changed

This PR improves episodic memory retrieval without adding a second memory pipeline.

  • adds a small deterministic ranking helper for episodic memories
  • blends semantic match with importance, reinforcement from repeated access, and decay over time
  • increments episodic access_count on recall so reinforcement reflects actual usage
  • filters stale, low-signal episodic memories before they are injected into prompt context
  • adds focused tests for ranking behavior and context filtering
  • documents the updated episodic-memory behavior

Why

This is a first focused step toward the memory strategy discussed in #1.

Phantom already has good capture and search primitives, but episodic retrieval was mostly recency-biased and prompt injection could still include old one-off memories with little durable value. This change keeps TypeScript in the plumbing lane: ranking, decay, retrieval, and prompt shaping. It does not move semantic reasoning out of the Agent SDK or judges.

How I Tested

Commands run:

/home/coemedia/.bun/bin/bun test src/memory/__tests__/ranking.test.ts
/home/coemedia/.bun/bin/bun test src/memory/__tests__/episodic.test.ts
/home/coemedia/.bun/bin/bun test src/memory/__tests__/context-builder.test.ts
bash -lc 'export PATH="$HOME/.bun/bin:$PATH" && /home/coemedia/.bun/bin/bun test'
/home/coemedia/.bun/bin/bun run typecheck
./node_modules/.bin/biome check src/memory/ranking.ts src/memory/episodic.ts src/memory/context-builder.ts src/memory/__tests__/ranking.test.ts src/memory/__tests__/episodic.test.ts src/memory/__tests__/context-builder.test.ts

Notes:

  • Full test suite passes once Bun is on PATH for subprocess-spawn tests.
  • Full repo lint currently reports pre-existing issues in src/mcp/__tests__/dynamic-handlers.test.ts, unrelated to this PR.

Checklist

  • Tests pass (bun test)
  • Lint passes (bun run lint)
  • Typecheck passes (bun run typecheck)
  • No secrets or .env files included
  • Files stay under 300 lines
  • No Cardinal Rule violations (TypeScript does plumbing only, the Agent SDK does reasoning)
  • No default exports or barrel files added

@mcheemaa mcheemaa self-requested a review March 31, 2026 04:43
Copy link
Copy Markdown
Member

@mcheemaa mcheemaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really strong first contribution. Thank you for spending time understanding the codebase before jumping into code - the issue discussion in #1, the phased approach, and the attention to the Cardinal Rule all show a deep read of the project.

A few things I appreciated:

  • The extraction to ranking.ts is the right refactor. The episodic store was accumulating scoring logic that belongs in a pure-function module. Clean separation.
  • Catching the updateAccessCounts bug (private method not incrementing access_count) is a sharp find. Reinforcement would never have worked from the recall path without this fix.
  • The math is solid. Exponential decay with configurable half-lives, log-saturating reinforcement, sensible bypass rules for high-importance episodes.
  • The context filter is exactly the right scope - a single filter() call that keeps stale low-signal episodes from burning prompt tokens.

Two optional suggestions (neither blocks merge):

  1. In calculateEpisodeContextScore, the weightedAverage call passes a dummy zero for the third argument since the function takes 3 inputs but you only need 2. A brief comment or a 2-arg helper would help readability.

  2. The hoursSince IIFE has a redundant Number.isNaN guard - Date.parse already returns NaN for invalid strings and the Number.isFinite check handles it.

Verified locally (789 tests pass, typecheck and lint clean) and deployed to a test VM. Memory recall is working correctly with the new ranking in place.

Approving as-is. Welcome to Phantom.

@mcheemaa mcheemaa merged commit bbf89ef into ghostwright:main Mar 31, 2026
1 check passed
@coe0718
Copy link
Copy Markdown
Contributor Author

coe0718 commented Mar 31, 2026 via email

mutusfa pushed a commit to mutusfa/phantom that referenced this pull request Apr 14, 2026
…ement-decay

Improve episodic memory ranking with reinforcement and decay
imonlinux added a commit to imonlinux/phantom that referenced this pull request Apr 24, 2026
Complete security review and implementation of fixes for Nextcloud Talk
integration based on comprehensive security audit findings.

HIGH PRIORITY fixes (security-critical):
- ghostwright#1: Implement replay attack protection with LRU cache (5-minute TTL)
- ghostwright#2: Add 64KB request size limit before body buffering
- ghostwright#4: Replace Date.now() with crypto.randomUUID() for unique IDs
- ghostwright#7: Fix JSON unwrap logic for ActivityStreams Note objects
- ghostwright#11: Replace 'Error:' text sniffing with runtime error events

Logic and security fixes:
- ghostwright#3: Fix msgId/msg name collision in error handling
- ghostwright#5: Improve parseConversationId to handle colons in tokens
- ghostwright#6: Reject webhooks without target.id instead of silent fallback
- ghostwright#8: Normalize emoji to avoid variation selector validation issues
- ghostwright#9: Handle 404/409 reaction responses as success conditions
- ghostwright#10: Make setReaction return boolean for proper error handling
- ghostwright#12: Improve bot loop guard with actorId checking

Best practices and polish:
- ghostwright#13: Make port configurable instead of hardcoded 3200
- ghostwright#14: Move webhookPath default normalization to constructor
- ghostwright#15: Fix health check path precedence (check webhook first)
- ghostwright#16: Add exponential backoff retry for 5xx/429 responses
- ghostwright#17: Add URL validation and encoding for talkServer config
- ghostwright#18: Document HMAC signing asymmetry (inbound vs outbound)
- ghostwright#20: Import randomUUID explicitly from node:crypto
- ghostwright#21: Add reactions: true to channel capabilities
- ghostwright#22: Namespace environment variables with NEXTCLOUD_ prefix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
imonlinux added a commit to imonlinux/phantom that referenced this pull request Apr 26, 2026
…k adapter

Implements full test suite for nextcloud.ts addressing all critical areas
identified in the nextcloud-talk-review document. 943 lines of tests
covering security, functionality, and edge cases.

Test coverage by category:

1. Signature verification (Fix ghostwright#1, ghostwright#18) - Security Critical
   - Valid HMAC signature acceptance
   - Invalid HMAC signature rejection
   - Replay attack protection via nonce cache
   - Nonce cache size limits (1000 entries, FIFO eviction)
   - Nonce expiration and periodic pruning (5-minute TTL)
   - Asymmetric signing (inbound: random+body, outbound: random+content)

2. Request size limits (Fix ghostwright#2) - Security Critical
   - Content-Length validation before buffering
   - Double-check after reading (missing Content-Length)
   - 64 KB limit enforcement (Nextcloud caps at 32k chars)

3. JSON unwrapping (Fix ghostwright#7) - Functionality Critical
   - ActivityStreams Note objects unwrap correctly
   - Plain text passes through unchanged
   - Literal JSON-like text not corrupted (only Note type unwraps)
   - Invalid JSON fallback to plain text

4. parseConversationId (Fix ghostwright#5) - Correctness Critical
   - Valid conversationId format parsing
   - Missing prefix returns null
   - Tokens containing colons handled correctly (indexOf+slice)
   - Thread-scoped ID to room token extraction

5. Bot loop guard (Fix ghostwright#12) - Multi-Bot Safety
   - Application actor filtering (actorType === "Application")
   - Self-filtering (actorId === config.botId)
   - Person messages processed normally
   - Multi-bot room scenarios

6. Retry and backoff (Fix ghostwright#16) - Resilience
   - 429 rate limiting with Retry-After header
   - 5xx server errors with exponential backoff + jitter
   - Network error retry logic
   - Non-retryable 4xx handling

7. Reaction error handling (Fix ghostwright#9)
   - 404 on remove treated as success
   - 409 on add treated as success
   - 5xx retry for reaction operations

8. URL validation and encoding (Fix ghostwright#17)
   - talkServer scheme removal (http://, https://)
   - Trailing slash removal
   - URL-encoding of roomToken and messageId

9. Target validation (Fix ghostwright#6)
   - Missing target.id rejection (no silent fallback)

10. Emoji normalization (Fix ghostwright#8)
    - Variation selector removal (U+26A0 vs U+26A0 U+FE0F)

11. Unique message IDs (Fix ghostwright#4)
    - crypto.randomUUID() vs Date.now()
    - Uniqueness across concurrent calls

12. Config normalization (Fix ghostwright#13, ghostwright#14)
    - webhookPath default in constructor
    - Configurable port
    - Session window configuration

13. Health check (Fix ghostwright#15)
    - Path precedence (webhook before health)

14. Message ID extraction
    - Numeric and string ID handling
    - Missing ID handling

15. Time-window session coalescing
    - Recent session continuation
    - New session creation
    - Parent message ID handling

16. Capabilities declaration (Fix ghostwright#21)
    - reactions: true declared

All tests use bun:test with mocked dependencies and follow existing
patterns from webhook.test.ts, slack.test.ts, and email.test.ts.

Related: nextcloud-talk-review.md Issue ghostwright#19
mcheemaa added a commit that referenced this pull request Apr 27, 2026
…rnal refs (B.1.4 fix-and-review #2)

Section A: env-only gate too strict on the agent_ready and first_dm_sent
heartbeats. SLACK_TRANSPORT === "http" is the actual signal that the
agent is in an HTTP-transport deployment with a host metadata gateway
listening; METADATA_BASE_URL alone misses the case where the channel
factory falls back to the link-local default. Fixed both call sites in
src/index.ts and src/channels/slack-introduction.ts; both now also
fall back to DEFAULT_METADATA_BASE_URL so an unset env is not a gating
failure.

Section B: heartbeat.test.ts refactored away from the
mock(...) as unknown as FetchImpl cast pattern. The wrapper carries
internal reset state that has shifted across Bun minor versions and
broke a single sibling test on Bun 1.3.13 (passes locally on 1.3.5).
Replaced makeFetchOk / makeFetchStatus / makeFetchThrows with plain
async functions typed directly as FetchImpl. No mock() involvement,
no cast, no per-runtime state. Should pass on every Bun release.

Section D: dashboard URL is now parameterized.
composeIntroductionText accepts an optional dashboardUrl, dropping the
"manage me" line entirely when unset. sendIntroductionDm reads
PHANTOM_DASHBOARD_URL via a resolveDashboardUrl helper that validates
with new URL(...) and drops a malformed value with a warning rather
than injecting unparseable text into the user-facing DM. Caller in
phantom-rootfs sets the env var so cloud deployments get the cloud URL;
self-hosters with no env var see a shorter, valid DM.

Section D (continued): scrubbed internal milestone references from
every comment new in the diff. "Phase B.1.4" prefixes dropped, "the
canary user" rephrased to "the user", "phantomd" replaced with "the
host metadata gateway" or "the host gateway", "architect plan section
12 (option A)" replaced with an inline description of the contract.
The diff against main now matches zero internal-data leak patterns
under the orchestrator's grep.

Tests:
- New slack-introduction tests assert the SLACK_TRANSPORT-based gate:
  unset / socket = no heartbeat, http = heartbeat with default URL when
  METADATA_BASE_URL is unset, http = heartbeat with set URL when both
  are set. Plus three new dashboard-URL tests covering valid URL,
  unset, and malformed.
- New heartbeat tests exercise the same FetchImpl signature without
  mock(); existing assertions intact.

Gates: bun lint clean, bun typecheck clean, 1957 / 1957 tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants