[ChatQnA] Switch to vLLM as default llm backend on Gaudi by wangkl2 · Pull Request #1404 · opea-project/GenAIExamples

wangkl2 · 2025-01-16T12:48:06Z

Description

Switch the default LLM serving backend for the ChatQnA example on Gaudi from TGI to vLLM to improve performance. The LLM component was benchmarked on a Gaudi2 server with both the vLLM and TGI backends across different input/output sequence lengths (ISL/OSL), numbers of queries, and concurrency levels. On a 7B model, the geometric mean (geomean) of the measured LLMServe metrics shows that vLLM outperforms TGI on average total latency, average time per output token (TPOT), and throughput. The geomean of average time to first token (TTFT) does not increase significantly. TGI is still offered as an optional LLM serving backend. In addition, the vLLM-based LLM service replaces the TGI-based one in the other provided end-to-end ChatQnA pipelines, including the pipeline without rerank and the megaservice with guardrails. This PR also aligns the LLM service parameters in all ChatQnA test scripts with those in the README.

Issues

#1213

Type of change

New feature (non-breaking change which adds new functionality)
Others (enhancement, documentation, validation, etc.)

Dependencies

n/a

Tests

TGI-Gaudi version: 2.0.6
vLLM-fork version: 0.6.3.dev910+g3c39626f

Benchmarked and compared LLMServe performance on a Gaudi2 server with out-of-the-box (OOB) vLLM and tuned TGI backends via GenAIEval. On a 7B LLM, vLLM's geomean performance is better than TGI's for average total latency, average TPOT, and throughput, with no significant regression in average TTFT. Measurements cover 4 ISL/OSL sets (128/128, 128/1024, 1024/128, 1024/1024) at different num_queries/concurrency settings, including 32/8 and 128/32.
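The comparison above summarizes results with a geomean across the benchmark matrix. The minimal Python sketch below shows one way such a summary could be computed. It assumes the matrix is exactly the 4 ISL/OSL sets crossed with the two num_queries/concurrency settings named above, and it takes the geomean of per-configuration vLLM/TGI ratios. The function names and data layout are hypothetical and are not part of GenAIEval; whether the PR aggregated its numbers exactly this way is an assumption.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical key layout: (isl, osl, num_queries, concurrency)
Config = Tuple[int, int, int, int]

# Matrix assumed from the Tests section: 4 ISL/OSL sets x 2 query/concurrency settings
ISL_OSL = [(128, 128), (128, 1024), (1024, 128), (1024, 1024)]
QUERIES_CONCURRENCY = [(32, 8), (128, 32)]


def benchmark_matrix() -> List[Config]:
    """Enumerate every (isl, osl, num_queries, concurrency) combination."""
    return [(isl, osl, q, c) for isl, osl in ISL_OSL for q, c in QUERIES_CONCURRENCY]


def geomean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("geomean requires a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def geomean_ratio(vllm: Dict[Config, float], tgi: Dict[Config, float]) -> float:
    """Geomean of per-configuration vLLM/TGI ratios over configs present in both.

    For latency metrics (total latency, TPOT, TTFT) a result below 1.0 favours
    vLLM; for throughput a result above 1.0 favours vLLM.
    """
    shared = sorted(set(vllm) & set(tgi))
    return geomean(vllm[k] / tgi[k] for k in shared)
```

Using ratios per configuration keeps each ISL/OSL set equally weighted, so a single long-sequence run with large absolute latencies does not dominate the summary.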


Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

github-actions · 2025-01-16T12:48:20Z

Dependency Review

✅ No vulnerabilities or license issues found.


wangkl2 added 8 commits January 15, 2025 23:21

Switch to vllm llm backend for wo-rerank and guardrails pipe

2e5bc5c

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

update ut scripts for gaudi

978aaaf

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

solve conflicts

460cff6

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

solve conflicts

64fb275

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

Update readme

c7fe0dd

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

Resolve conflicts

e77faad

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

Fix ci issues

eb5880c

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

wangkl2 requested review from letonghan and lvliang-intel as code owners January 16, 2025 12:48

pre-commit-ci Bot and others added 6 commits January 16, 2025 12:48

[pre-commit.ci] auto fixes from pre-commit.com hooks

9b78ed7

for more information, see https://pre-commit.ci

Fix ci issues

f1198f5

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

Merge branch 'vllm-default-gaudi' of https://github.com/wangkl2/GenAI…

13843ea

…Examples into vllm-default-gaudi

Merge branch 'vllm-default-gaudi' of https://github.com/wangkl2/GenAI…

97eb79b

…Examples into vllm-default-gaudi Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

Merge branch 'vllm-default-gaudi' of https://github.com/wangkl2/GenAI…

4cafaa5

…Examples into vllm-default-gaudi

Fix ci issues for guardrails

8c976b7

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

joshuayao requested review from XinyuYe-Intel and yao531441 January 17, 2025 08:13

yao531441 approved these changes Jan 17, 2025


XinyuYe-Intel approved these changes Jan 17, 2025


chensuyue merged commit 00e9da9 into opea-project:main Jan 17, 2025

wangkl2 mentioned this pull request on Mar 7, 2025 in [Feature] vLLM enablement for 8 GenAI examples #1436 (Closed, 21 tasks).

chensuyue merged 14 commits into opea-project:main from wangkl2:vllm-default-gaudi.
