RAG is slow in ChatQnA demo on Xeon #584

@NeoZhangJianyu

I set up the ChatQnA demo (TGI) on Xeon (GNR) and tried RAG through the UI.
After uploading a PDF file (2-5 MB), I ask a question.
It takes 10-15s.

When I upload a text file with 3 lines instead, it takes 2-3s.

The customer found that the slowdown is in the embedding stage.
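
To help confirm which stage dominates, here is a minimal timing sketch in Python. It is not the ChatQnA code: `embed`, `retrieve`, and `generate` are hypothetical stand-ins that only sleep, and `timed` and `run_query` are names made up for this illustration. The idea is to wrap each stage of the real pipeline in the same kind of timer and compare the totals for the PDF and the 3-line text file.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = {}


@contextmanager
def timed(stage):
    """Add the elapsed wall-clock time of the wrapped block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


# Hypothetical stand-ins for the real pipeline stages (assumptions, not ChatQnA APIs).
def embed(chunks):
    time.sleep(0.01 * len(chunks))  # pretend cost grows with the number of chunks
    return [[float(len(chunk))] for chunk in chunks]


def retrieve(vectors):
    time.sleep(0.005)
    return vectors[:1]


def generate(context):
    time.sleep(0.005)
    return "answer"


def run_query(chunks):
    """Run the three stages in order, timing each one separately."""
    with timed("embedding"):
        vectors = embed(chunks)
    with timed("retrieval"):
        context = retrieve(vectors)
    with timed("generation"):
        answer = generate(context)
    return answer


if __name__ == "__main__":
    run_query(["chunk"] * 5)
    # Print stages from slowest to fastest.
    for stage, seconds in sorted(timings.items(), key=lambda item: -item[1]):
        print(f"{stage}: {seconds:.3f}s")
```

If the embedding total grows with document size while the other stages stay flat, that would be consistent with what the customer reported.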
