fix(memory): add exact-duplicate dedup to SemanticStore.store() #126
Closed
kagura-agent wants to merge 1 commit into ghostwright:main
…twright#125)

Before this change, extractFactsFromSession generated a new crypto.randomUUID() for each fact on every consolidation run. Since findContradictions() explicitly excludes same-object facts, identical-text facts accumulated as separate Qdrant points across sessions.

Add findExactDuplicate() that scrolls Qdrant for an existing valid fact with the same subject + object (both keyword-indexed). When a duplicate exists, store() merges source_episode_ids into the existing fact via updatePayload() and returns early, skipping the upsert of a new point.

Also adds an object keyword payload index to enable efficient exact-match filtering.

Includes tests for both the dedup-and-merge path and the different-object-creates-new-point path.
Closing. Based on maintainer merge patterns, external PRs aren't being merged here. The fix is available in the branch if useful. Thanks!
Problem
Closes #125.
extractFactsFromSession generates a new crypto.randomUUID() for each fact on every consolidation run. Since findContradictions() explicitly excludes same-object facts (existingObject !== newFact.object), identical-text facts accumulate as separate Qdrant points across sessions.

This leads to the system prompt's Known Facts section containing multiple copies of the same fact (e.g. four copies of "No let's not worry about being a repeat contributor...").
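The accumulation described above can be reproduced with a small in-memory model. This is only a sketch under assumptions: the Fact shape, newId() (a counter standing in for crypto.randomUUID()), the subject match inside findContradictions(), and storeWithoutDedup() are all hypothetical simplifications, and a Map stands in for the Qdrant collection.

```typescript
// Hypothetical, simplified fact shape; the real payload has more fields.
interface Fact {
  id: string;
  subject: string;
  object: string;
}

let counter = 0;

// Stand-in for crypto.randomUUID(): every call yields a fresh ID.
function newId(): string {
  counter += 1;
  return `fact-${counter}`;
}

// Mirrors the filter described in the PR: a fact with the same object is
// never treated as a contradiction. Matching on subject is an assumption.
function findContradictions(existing: Fact[], incoming: Fact): Fact[] {
  return existing.filter(
    (f) => f.subject === incoming.subject && f.object !== incoming.object,
  );
}

// Store without dedup: resolve contradictions, then always add a new point.
function storeWithoutDedup(points: Map<string, Fact>, incoming: Fact): void {
  const contradictions = findContradictions(Array.from(points.values()), incoming);
  for (const c of contradictions) {
    points.delete(c.id); // simplification: how the real store retires facts is not shown in the PR
  }
  points.set(incoming.id, incoming);
}

// Four consolidation runs extract the same fact, each with a fresh ID.
const points = new Map<string, Fact>();
for (let run = 0; run < 4; run++) {
  storeWithoutDedup(points, { id: newId(), subject: "user", object: "prefers tea" });
}

console.log(points.size); // 4
```

Because the contradiction filter skips same-object facts and each run produces a new ID, nothing in this path ever recognizes the repeat, so each run adds one more point.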
Fix
Add findExactDuplicate() to SemanticStore that scrolls Qdrant for an existing valid fact with the same subject + object (both keyword-indexed). When a duplicate exists, store() merges source_episode_ids into the existing fact via updatePayload() and returns early, skipping the upsert of a new point.

Changes

- src/memory/semantic.ts: Add object keyword payload index, add findExactDuplicate() method, add dedup check in store() between contradiction resolution and upsert
- src/memory/__tests__/semantic.test.ts: Add tests for dedup-and-merge path and different-object-creates-new-point path; update existing store test to mock the scroll endpoint

Testing
Full suite: 2368 pass, 7 fail (pre-existing, same on main).
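
For readers without access to the branch, the dedup-and-merge behavior described under Fix, and the two paths the new tests cover, can be sketched roughly as follows. This is not the PR's code: InMemorySemanticStore, the StoredFact shape, the string return value of store(), and the de-duplicating union of source_episode_ids are assumptions made for illustration. A Map lookup stands in for the Qdrant scroll and an in-place assignment stands in for updatePayload().

```typescript
// Hypothetical fact shape; only the fields named in the PR description.
interface StoredFact {
  id: string;
  subject: string;
  object: string;
  valid: boolean;
  source_episode_ids: string[];
}

class InMemorySemanticStore {
  private points = new Map<string, StoredFact>();

  // Stand-in for the Qdrant scroll: first valid fact with the same
  // subject and object, or undefined when none exists.
  findExactDuplicate(subject: string, object: string): StoredFact | undefined {
    return Array.from(this.points.values()).find(
      (f) => f.valid && f.subject === subject && f.object === object,
    );
  }

  // Returns the ID of the point that holds the fact after the call.
  // (The return type is an assumption; the PR does not state it.)
  store(fact: StoredFact): string {
    const dup = this.findExactDuplicate(fact.subject, fact.object);
    if (dup) {
      // Merge path: union the episode IDs into the existing fact and
      // return early, so no new point is created.
      dup.source_episode_ids = Array.from(
        new Set(dup.source_episode_ids.concat(fact.source_episode_ids)),
      );
      return dup.id;
    }
    // No duplicate: insert a new point (copied so callers cannot mutate it).
    this.points.set(fact.id, { ...fact, source_episode_ids: fact.source_episode_ids.slice() });
    return fact.id;
  }

  size(): number {
    return this.points.size;
  }
}

const store = new InMemorySemanticStore();
const first = store.store({ id: "a", subject: "user", object: "prefers tea", valid: true, source_episode_ids: ["ep1"] });
const second = store.store({ id: "b", subject: "user", object: "prefers tea", valid: true, source_episode_ids: ["ep2"] });
const third = store.store({ id: "c", subject: "user", object: "prefers coffee", valid: true, source_episode_ids: ["ep3"] });

console.log(first, second, third, store.size()); // a a c 2
```

The second call takes the merge path and reuses point "a", while the third has a different object and creates a new point, matching the two test cases listed under Changes.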