Improve episodic memory ranking with reinforcement and decay#2
Conversation
mcheemaa
left a comment
There was a problem hiding this comment.
This is a really strong first contribution. Thank you for spending time understanding the codebase before jumping into code - the issue discussion in #1, the phased approach, and the attention to the Cardinal Rule all show a deep read of the project.
A few things I appreciated:
- The extraction to
ranking.tsis the right refactor. The episodic store was accumulating scoring logic that belongs in a pure-function module. Clean separation. - Catching the
updateAccessCountsbug (private method not incrementingaccess_count) is a sharp find. Reinforcement would never have worked from the recall path without this fix. - The math is solid. Exponential decay with configurable half-lives, log-saturating reinforcement, sensible bypass rules for high-importance episodes.
- The context filter is exactly the right scope - a single
filter()call that keeps stale low-signal episodes from burning prompt tokens.
Two optional suggestions (neither blocks merge):
-
In
calculateEpisodeContextScore, theweightedAveragecall passes a dummy zero for the third argument since the function takes 3 inputs but you only need 2. A brief comment or a 2-arg helper would help readability. -
The
hoursSinceIIFE has a redundantNumber.isNaNguard -Date.parsealready returnsNaNfor invalid strings and theNumber.isFinitecheck handles it.
Verified locally (789 tests pass, typecheck and lint clean) and deployed to a test VM. Memory recall is working correctly with the new ranking in place.
Approving as-is. Welcome to Phantom.
|
I’ll look into your suggestions and add another PR for it at some point unless someone else gets it first. Sent from my iPhoneOn Mar 31, 2026, at 12:52 AM, Muhammad Ahmed Cheema ***@***.***> wrote:
Merged #2 into main.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
…ement-decay Improve episodic memory ranking with reinforcement and decay
Complete security review and implementation of fixes for Nextcloud Talk integration based on comprehensive security audit findings. HIGH PRIORITY fixes (security-critical): - ghostwright#1: Implement replay attack protection with LRU cache (5-minute TTL) - ghostwright#2: Add 64KB request size limit before body buffering - ghostwright#4: Replace Date.now() with crypto.randomUUID() for unique IDs - ghostwright#7: Fix JSON unwrap logic for ActivityStreams Note objects - ghostwright#11: Replace 'Error:' text sniffing with runtime error events Logic and security fixes: - ghostwright#3: Fix msgId/msg name collision in error handling - ghostwright#5: Improve parseConversationId to handle colons in tokens - ghostwright#6: Reject webhooks without target.id instead of silent fallback - ghostwright#8: Normalize emoji to avoid variation selector validation issues - ghostwright#9: Handle 404/409 reaction responses as success conditions - ghostwright#10: Make setReaction return boolean for proper error handling - ghostwright#12: Improve bot loop guard with actorId checking Best practices and polish: - ghostwright#13: Make port configurable instead of hardcoded 3200 - ghostwright#14: Move webhookPath default normalization to constructor - ghostwright#15: Fix health check path precedence (check webhook first) - ghostwright#16: Add exponential backoff retry for 5xx/429 responses - ghostwright#17: Add URL validation and encoding for talkServer config - ghostwright#18: Document HMAC signing asymmetry (inbound vs outbound) - ghostwright#20: Import randomUUID explicitly from node:crypto - ghostwright#21: Add reactions: true to channel capabilities - ghostwright#22: Namespace environment variables with NEXTCLOUD_ prefix Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…k adapter Implements full test suite for nextcloud.ts addressing all critical areas identified in the nextcloud-talk-review document. 943 lines of tests covering security, functionality, and edge cases. Test coverage by category: 1. Signature verification (Fix ghostwright#1, ghostwright#18) - Security Critical - Valid HMAC signature acceptance - Invalid HMAC signature rejection - Replay attack protection via nonce cache - Nonce cache size limits (1000 entries, FIFO eviction) - Nonce expiration and periodic pruning (5-minute TTL) - Asymmetric signing (inbound: random+body, outbound: random+content) 2. Request size limits (Fix ghostwright#2) - Security Critical - Content-Length validation before buffering - Double-check after reading (missing Content-Length) - 64 KB limit enforcement (Nextcloud caps at 32k chars) 3. JSON unwrapping (Fix ghostwright#7) - Functionality Critical - ActivityStreams Note objects unwrap correctly - Plain text passes through unchanged - Literal JSON-like text not corrupted (only Note type unwraps) - Invalid JSON fallback to plain text 4. parseConversationId (Fix ghostwright#5) - Correctness Critical - Valid conversationId format parsing - Missing prefix returns null - Tokens containing colons handled correctly (indexOf+slice) - Thread-scoped ID to room token extraction 5. Bot loop guard (Fix ghostwright#12) - Multi-Bot Safety - Application actor filtering (actorType === "Application") - Self-filtering (actorId === config.botId) - Person messages processed normally - Multi-bot room scenarios 6. Retry and backoff (Fix ghostwright#16) - Resilience - 429 rate limiting with Retry-After header - 5xx server errors with exponential backoff + jitter - Network error retry logic - Non-retryable 4xx handling 7. Reaction error handling (Fix ghostwright#9) - 404 on remove treated as success - 409 on add treated as success - 5xx retry for reaction operations 8. URL validation and encoding (Fix ghostwright#17) - talkServer scheme removal (http://, https://) - Trailing slash removal - URL-encoding of roomToken and messageId 9. Target validation (Fix ghostwright#6) - Missing target.id rejection (no silent fallback) 10. Emoji normalization (Fix ghostwright#8) - Variation selector removal (U+26A0 vs U+26A0 U+FE0F) 11. Unique message IDs (Fix ghostwright#4) - crypto.randomUUID() vs Date.now() - Uniqueness across concurrent calls 12. Config normalization (Fix ghostwright#13, ghostwright#14) - webhookPath default in constructor - Configurable port - Session window configuration 13. Health check (Fix ghostwright#15) - Path precedence (webhook before health) 14. Message ID extraction - Numeric and string ID handling - Missing ID handling 15. Time-window session coalescing - Recent session continuation - New session creation - Parent message ID handling 16. Capabilities declaration (Fix ghostwright#21) - reactions: true declared All tests use bun:test with mocked dependencies and follow existing patterns from webhook.test.ts, slack.test.ts, and email.test.ts. Related: nextcloud-talk-review.md Issue ghostwright#19
…rnal refs (B.1.4 fix-and-review #2) Section A: env-only gate too strict on the agent_ready and first_dm_sent heartbeats. SLACK_TRANSPORT === "http" is the actual signal that the agent is in an HTTP-transport deployment with a host metadata gateway listening; METADATA_BASE_URL alone misses the case where the channel factory falls back to the link-local default. Fixed both call sites in src/index.ts and src/channels/slack-introduction.ts; both now also fall back to DEFAULT_METADATA_BASE_URL so an unset env is not a gating failure. Section B: heartbeat.test.ts refactored away from the mock(...) as unknown as FetchImpl cast pattern. The wrapper carries internal reset state that has shifted across Bun minor versions and broke a single sibling test on Bun 1.3.13 (passes locally on 1.3.5). Replaced makeFetchOk / makeFetchStatus / makeFetchThrows with plain async functions typed directly as FetchImpl. No mock() involvement, no cast, no per-runtime state. Should pass on every Bun release. Section D: dashboard URL is now parameterized. composeIntroductionText accepts an optional dashboardUrl, dropping the "manage me" line entirely when unset. sendIntroductionDm reads PHANTOM_DASHBOARD_URL via a resolveDashboardUrl helper that validates with new URL(...) and drops a malformed value with a warning rather than injecting unparseable text into the user-facing DM. Caller in phantom-rootfs sets the env var so cloud deployments get the cloud URL; self-hosters with no env var see a shorter, valid DM. Section D (continued): scrubbed internal milestone references from every comment new in the diff. "Phase B.1.4" prefixes dropped, "the canary user" rephrased to "the user", "phantomd" replaced with "the host metadata gateway" or "the host gateway", "architect plan section 12 (option A)" replaced with an inline description of the contract. The diff against main now matches zero internal-data leak patterns under the orchestrator's grep. Tests: - New slack-introduction tests assert the SLACK_TRANSPORT-based gate: unset / socket = no heartbeat, http = heartbeat with default URL when METADATA_BASE_URL is unset, http = heartbeat with set URL when both are set. Plus three new dashboard-URL tests covering valid URL, unset, and malformed. - New heartbeat tests exercise the same FetchImpl signature without mock(); existing assertions intact. Gates: bun lint clean, bun typecheck clean, 1957 / 1957 tests pass.
What Changed
This PR improves episodic memory retrieval without adding a second memory pipeline.
access_counton recall so reinforcement reflects actual usageWhy
This is a first focused step toward the memory strategy discussed in #1.
Phantom already has good capture and search primitives, but episodic retrieval was mostly recency-biased and prompt injection could still include old one-off memories with little durable value. This change keeps TypeScript in the plumbing lane: ranking, decay, retrieval, and prompt shaping. It does not move semantic reasoning out of the Agent SDK or judges.
How I Tested
Commands run:
Notes:
src/mcp/__tests__/dynamic-handlers.test.ts, unrelated to this PR.Checklist
bun test)bun run lint)bun run typecheck).envfiles included