
[ChatQnA] Switch to vLLM as default llm backend on Gaudi #1404

Merged
chensuyue merged 14 commits into opea-project:main from wangkl2:vllm-default-gaudi on Jan 17, 2025


@wangkl2 wangkl2 commented Jan 16, 2025

Description

This PR switches the default LLM serving backend for the ChatQnA example on Gaudi from TGI to vLLM to improve performance. We benchmarked the LLM component on a Gaudi2 server with both backends across several ISL/OSL combinations and several num_queries/concurrency settings. On a 7B model, the geomean of the measured LLMServe results shows vLLM outperforming TGI on average total latency, average TPOT, and throughput, while the geomean of average TTFT does not increase significantly. TGI remains available as a deployment option for LLM serving. vLLM also replaces TGI as the LLM backend in the other provided E2E ChatQnA pipelines, including the without-rerank pipeline and the megaservice with guardrails. This PR also aligns the LLM service parameters in all ChatQnA test scripts with those in the README.
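For readers unfamiliar with the metrics named above, here is a minimal sketch of how TTFT, TPOT, and total latency can be derived from per-token arrival timestamps for a single request. The function name `request_metrics` and the exact formulas are illustrative assumptions; benchmarking tools differ in detail (for example, in whether the first token is included in TPOT), and this is not necessarily how GenAIEval computes them.

```python
def request_metrics(start, token_times):
    """Derive latency metrics for one request (hypothetical helper).

    start       -- time the request was sent, in seconds
    token_times -- arrival time of each output token, in seconds, ascending
    """
    if not token_times:
        raise ValueError("at least one output token is required")

    n = len(token_times)
    ttft = token_times[0] - start            # time to first token
    total_latency = token_times[-1] - start  # time until the last token

    # Assumption: TPOT averages the decode phase only, i.e. the time
    # between the first and last token divided by the remaining tokens.
    tpot = (total_latency - ttft) / (n - 1) if n > 1 else 0.0

    return {
        "ttft": ttft,
        "tpot": tpot,
        "total_latency": total_latency,
        "output_tokens": n,
    }


# Illustrative timestamps only, not measured data.
example = request_metrics(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
```

Under these definitions, total latency equals TTFT plus TPOT times the number of output tokens after the first, which is why a backend can improve total latency and TPOT while leaving TTFT roughly unchanged.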

Issues

#1213

Type of change

  • New feature (non-breaking change which adds new functionality)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

n/a

Tests

TGI-Gaudi version: 2.0.6
vLLM-fork version: 0.6.3.dev910+g3c39626f

We benchmarked and compared LLMServe performance on a Gaudi2 server with an out-of-the-box (OOB) vLLM backend and a tuned TGI backend, using GenAIEval. On a 7B LLM, the geomean results for vLLM are better than those for TGI on average total latency, average TPOT, and throughput, with no significant regression in average TTFT. The runs covered four ISL/OSL sets (128/128, 128/1024, 1024/128, 1024/1024), each measured at several num_queries/concurrency settings, including 32/8 and 128/32.
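The geomean comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the script used for this PR: the helper names `geomean` and `geomean_ratio` are hypothetical, the matrix lists only the two load settings named above, and the per-configuration numbers are placeholders rather than measured results.

```python
import math
from itertools import product


def geomean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Benchmark matrix taken from the description above.
ISL_OSL = [(128, 128), (128, 1024), (1024, 128), (1024, 1024)]
LOAD = [(32, 8), (128, 32)]  # (num_queries, concurrency)
configs = list(product(ISL_OSL, LOAD))


def geomean_ratio(vllm, tgi):
    """Geomean of per-configuration vLLM/TGI ratios for one metric.

    For latency-style metrics (total latency, TPOT, TTFT) a value below 1.0
    favors vLLM; for throughput a value above 1.0 favors vLLM.
    """
    return geomean(vllm[c] / tgi[c] for c in configs)


# Placeholder values only, NOT measured results.
placeholder_tgi = {c: 2.0 for c in configs}
placeholder_vllm = {c: 1.0 for c in configs}
ratio = geomean_ratio(placeholder_vllm, placeholder_tgi)
```

Taking the geomean of ratios rather than the arithmetic mean of raw values keeps long-sequence configurations such as 1024/1024 from dominating the summary, since each configuration contributes multiplicatively.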

Implement opea-project#1213

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
github-actions Bot commented Jan 16, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

pre-commit-ci Bot and others added 6 commits January 16, 2025 12:48
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
…Examples into vllm-default-gaudi

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
@chensuyue chensuyue merged commit 00e9da9 into opea-project:main Jan 17, 2025
chyundunovDatamonsters pushed a commit to chyundunovDatamonsters/OPEA-GenAIExamples that referenced this pull request Mar 4, 2025
…t#1404)

Switching from TGI to vLLM as the default LLM serving backend on Gaudi for the ChatQnA example to enhance the perf.

opea-project#1213
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
cogniware-devops pushed a commit to Cogniware-Inc/GenAIExamples that referenced this pull request Dec 19, 2025
…t#1404)

Switching from TGI to vLLM as the default LLM serving backend on Gaudi for the ChatQnA example to enhance the perf.

opea-project#1213
Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Signed-off-by: cogniware-devops <ambarish.desai@cogniware.ai>
4 participants