
fix(common-ai): pre-embed nodes so LlamaIndexEmbeddingOperator returns vectors#68488

Closed
AgentNero-ch wants to merge 1 commit into
apache:mainfrom
AgentNero-ch:fix/llamaindex-embedding-vector-none

Conversation

@AgentNero-ch


What

LlamaIndexEmbeddingOperator.execute() returns chunks with "vector": None because it relies on VectorStoreIndex to populate node.embedding as a side effect. However, VectorStoreIndex._get_node_with_embedding() attaches embeddings to copies of the nodes (via model_copy()), never to the originals, so the .embedding of each node the operator holds stays None.

Fix

Call embed_model.get_text_embedding_batch() on the original nodes before passing them to VectorStoreIndex. The index's internal embed_nodes() skips nodes whose .embedding is already set, so there are no duplicate API calls.
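To illustrate the behaviour and the fix without depending on llama-index itself, here is a minimal sketch. `FakeNode`, `FakeEmbedModel`, `fake_build_index` and `pre_embed` are hypothetical stand-ins written for this example; they only imitate the copy-and-skip behaviour described above and are not the operator's or the library's actual code.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a LlamaIndex node (not the real class)."""
    text: str
    embedding: Optional[List[float]] = None


class FakeEmbedModel:
    """Hypothetical embed model that records how many texts it embedded."""

    def __init__(self) -> None:
        self.texts_embedded = 0

    def get_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        self.texts_embedded += len(texts)
        # Toy vector: the text length, so results are easy to check.
        return [[float(len(text))] for text in texts]


def fake_build_index(nodes: List[FakeNode], embed_model: FakeEmbedModel) -> List[FakeNode]:
    """Imitates the behaviour described in this PR: embed only nodes that
    lack an embedding, and attach the results to copies, not the originals."""
    missing = [node for node in nodes if node.embedding is None]
    new_vectors = (
        embed_model.get_text_embedding_batch([node.text for node in missing])
        if missing
        else []
    )
    vector_by_id = {id(node): vec for node, vec in zip(missing, new_vectors)}
    # dataclasses.replace() returns a new object, like model_copy() would.
    return [
        replace(node, embedding=vector_by_id.get(id(node), node.embedding))
        for node in nodes
    ]


def pre_embed(nodes: List[FakeNode], embed_model: FakeEmbedModel) -> None:
    """Sketch of the fix: embed the ORIGINAL nodes before indexing."""
    vectors = embed_model.get_text_embedding_batch([node.text for node in nodes])
    for node, vector in zip(nodes, vectors):
        node.embedding = vector


# Without the fix: the originals never receive a vector.
model_a = FakeEmbedModel()
nodes_a = [FakeNode("alpha"), FakeNode("be")]
fake_build_index(nodes_a, model_a)
vectors_without_fix = [node.embedding for node in nodes_a]

# With the fix: originals carry vectors and the index embeds nothing more.
model_b = FakeEmbedModel()
nodes_b = [FakeNode("alpha"), FakeNode("be")]
pre_embed(nodes_b, model_b)
fake_build_index(nodes_b, model_b)
vectors_with_fix = [node.embedding for node in nodes_b]

print(vectors_without_fix)
print(vectors_with_fix)
print(model_b.texts_embedded)
```

In this toy setup the first list stays all `None`, the second holds one vector per node, and the embed model sees each text exactly once, which mirrors the "no duplicate API calls" claim.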

Why this works

From llama-index-core source (indices/utils.py):
```python
def embed_nodes(nodes, embed_model, ...):
    for node in nodes:
        if node.embedding is not None:
            continue  # skip already-embedded nodes
        ...
```

Verified across llama-index-core v0.10.68 through v0.14.22: all of these versions copy nodes internally, so the side-effect assumption has never held.

Testing

Updated unit tests to mock get_text_embedding_batch instead of relying on VectorStoreIndex side effects. Added a new test verifying the pre-embed step is called with correct node texts.
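For readers who want a feel for the testing approach, the following is a rough sketch of such a test using `unittest.mock`. The `pre_embed` helper and the `SimpleNamespace` nodes are hypothetical placeholders; the real tests in this PR exercise the operator and may read node text differently.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def pre_embed(nodes, embed_model):
    """Hypothetical helper: embed node texts and store vectors on the originals."""
    texts = [node.text for node in nodes]
    vectors = embed_model.get_text_embedding_batch(texts)
    for node, vector in zip(nodes, vectors):
        node.embedding = vector
    return nodes


def test_pre_embed_called_with_node_texts():
    # Plain namespaces stand in for real nodes (assumption for this sketch).
    nodes = [
        SimpleNamespace(text="first chunk", embedding=None),
        SimpleNamespace(text="second chunk", embedding=None),
    ]
    embed_model = MagicMock()
    embed_model.get_text_embedding_batch.return_value = [[0.1, 0.2], [0.3, 0.4]]

    pre_embed(nodes, embed_model)

    # The mock must have been called once, with the node texts in order.
    embed_model.get_text_embedding_batch.assert_called_once_with(
        ["first chunk", "second chunk"]
    )
    # The vectors must land on the original node objects.
    assert [node.embedding for node in nodes] == [[0.1, 0.2], [0.3, 0.4]]


test_pre_embed_called_with_node_texts()
```

Mocking the embed model directly keeps the test independent of whatever VectorStoreIndex does internally, which is the point of the change.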

Closes #68416

…s vectors

VectorStoreIndex._get_node_with_embedding() attaches embeddings to
*copies* of nodes (via model_copy()), never the originals. The
operator was relying on VectorStoreIndex populating
node.embedding as a side effect, which always yielded None.

Fix: call embed_model.get_text_embedding_batch() on the original
nodes before passing them to VectorStoreIndex. The index's internal
embed_nodes() skips nodes whose .embedding is already set, so
there are no duplicate API calls.

Closes apache#68416

Development

Successfully merging this pull request may close these issues.

LlamaIndexEmbeddingOperator returns vector=None for every chunk
