[Bug] Get "can't connect to retriever" error when concurrency exceeds 32 #1556

@leslieluyu

Priority

P2-High

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source
  • Other

Deploy method

  • Docker
  • Docker Compose
  • Kubernetes Helm Charts
  • Kubernetes GMC
  • Other

Running nodes

Single Node

What's the version?

chatqna v1.2 norerank
chatqna-chatqna-ui-695995789c-sz67q opea/chatqna-ui:1.2
chatqna-data-prep-67f484b58f-xwvct opea/dataprep:1.2
chatqna-db8987c4c-slm6z opea/chatqna-without-rerank:1.2
chatqna-nginx-6d9df4b75b-swts2 opea/nginx:latest
chatqna-redis-vector-db-66c94f7fc5-csfx2 redis/redis-stack:7.2.0-v9
chatqna-retriever-usvc-5b64ff97c8-4fkd9 opea/retriever:1.2
chatqna-tei-7fc4845868-lr2wx ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
chatqna-tgi-f5fc79849-bhrsk ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-fv48f ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-jdmwj ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-jwsxb ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-nhkdj ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-q5glp ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-td5lk ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-zxtr9 ghcr.io/huggingface/tgi-gaudi:2.3.1

Description

Requests fail under heavy load.
While collecting performance data with the benchmark, the chatqna backend reports that it cannot connect to the retriever (see the raw log below) once concurrency exceeds 32.
No errors occur at concurrency 16 or below (1, 2, 4, 8, 16).
This only happens in v1.2; it did not occur in earlier versions (v1.1, v1.0, v0.8, etc.).

Reproduce steps

  1. Deploy ChatQnA v1.2 using the Helm chart.
  2. Send requests using the benchmarking scripts.
  3. Check the log of the chatqna backend.
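
For readers without the benchmarking scripts, step 2 can be approximated with a small load generator. This is a minimal sketch, not the benchmark used in this report: the URL, port, and payload shape are assumptions, and `post_question` and `run_level` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint: host and port depend on how the chatqna service is exposed.
CHATQNA_URL = "http://localhost:8888/v1/chatqna"


def post_question(url: str = CHATQNA_URL) -> int:
    """Send one blocking POST and return the HTTP status (-1 on transport error)."""
    body = json.dumps({"messages": "What is OPEA?"}).encode()  # assumed payload shape
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 500, as seen in the backend log
    except OSError:
        return -1  # connection refused, timeout, DNS failure, ...


def run_level(send, concurrency: int, total: int) -> Counter:
    """Call `send` `total` times with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # One worker thread per in-flight request; tally the returned statuses.
        return Counter(pool.map(lambda _: send(), range(total)))
```

Calling `run_level(post_question, 16, 64)` and then `run_level(post_question, 64, 256)` and comparing the counts of status 500 would show whether the failure threshold matches the one described above.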

Raw log

chatqna-5d64d99997-w7928 chatqna INFO:     100.83.122.244:57853 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR:    Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna     raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna     raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
chatqna-5d64d99997-w7928 chatqna INFO:     100.83.122.244:36813 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR:    Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna     raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna     raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
chatqna-5d64d99997-w7928 chatqna INFO:     100.83.122.244:43839 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR:    Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna     raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna     raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
chatqna-5d64d99997-w7928 chatqna INFO:     100.83.122.244:1674 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR:    Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna     raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna   File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna     raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
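
The log shows `ConnectionRefusedError` on port 7000 of `chatqna-retriever-usvc`. One way to check whether the refusals can be reproduced without the chatqna backend in the path is to open many TCP connections to the retriever at once and tally the outcomes. The following is a diagnostic sketch only, not part of OPEA: `try_connect` and `probe` are hypothetical names, and the service name resolves only from inside the cluster (for example, from a pod in the same namespace).

```python
import asyncio
from collections import Counter

# Host and port as they appear in the log above.
RETRIEVER_HOST = "chatqna-retriever-usvc"
RETRIEVER_PORT = 7000


async def try_connect(host: str, port: int, timeout: float = 5.0) -> str:
    """Open one TCP connection, close it, and classify the outcome."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except ConnectionRefusedError:
        return "refused"
    except asyncio.TimeoutError:
        return "timeout"
    except OSError:
        return "other_error"  # DNS failure, unreachable network, ...
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may already have reset the connection
    return "ok"


async def probe(host: str, port: int, attempts: int) -> Counter:
    """Start `attempts` connection attempts at once and tally the outcomes."""
    results = await asyncio.gather(
        *(try_connect(host, port) for _ in range(attempts))
    )
    return Counter(results)
```

Running `asyncio.run(probe(RETRIEVER_HOST, RETRIEVER_PORT, 64))` while the benchmark is active, and again while it is idle, would indicate whether the retriever port refuses connections only under load.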

Attachments

No response
