Priority
P2-High
OS type
Ubuntu
Hardware type
Gaudi2
Installation method
Deploy method
Running nodes
Single Node
What's the version?
chatqna v1.2 norerank
chatqna-chatqna-ui-695995789c-sz67q opea/chatqna-ui:1.2
chatqna-data-prep-67f484b58f-xwvct opea/dataprep:1.2
chatqna-db8987c4c-slm6z opea/chatqna-without-rerank:1.2
chatqna-nginx-6d9df4b75b-swts2 opea/nginx:latest
chatqna-redis-vector-db-66c94f7fc5-csfx2 redis/redis-stack:7.2.0-v9
chatqna-retriever-usvc-5b64ff97c8-4fkd9 opea/retriever:1.2
chatqna-tei-7fc4845868-lr2wx ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
chatqna-tgi-f5fc79849-bhrsk ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-fv48f ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-jdmwj ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-jwsxb ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-nhkdj ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-q5glp ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-td5lk ghcr.io/huggingface/tgi-gaudi:2.3.1
chatqna-tgi-f5fc79849-zxtr9 ghcr.io/huggingface/tgi-gaudi:2.3.1
Description
Requests fail under heavy load.
While collecting performance data with the benchmark, the chatqna backend reports that it cannot connect to the retriever (see the raw log below) when concurrency exceeds 32.
No errors occur when concurrency is 16 or below (1, 2, 4, 8, 16).
This only occurs in v1.2; it did not occur in previous versions (v1.1, v1.0, v0.8, etc.).
Reproduce steps
- Deploy ChatQnA v1.2 using the Helm chart.
- Send requests using the benchmarking scripts.
- Check the log of the chatqna backend.
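For anyone without the benchmarking scripts at hand, the load step can be approximated with a small standalone script. This is only a rough sketch, not the benchmark used for this report: the `/v1/chatqna` path comes from the log below, but the host, port, request payload, the 32 and 64 concurrency levels, and the helper names `post_once` and `run_level` are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed values, not taken from this report: adjust to your deployment.
URL = "http://localhost:8888/v1/chatqna"
PAYLOAD = {"messages": "What is OPEA?"}


def post_once(url=URL, payload=PAYLOAD, timeout=120):
    """Send one POST; return the HTTP status code, or -1 if no response arrived."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # drain the body so the request fully completes
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses, e.g. the 500s seen in the backend log
        return exc.code
    except (urllib.error.URLError, OSError):
        # connection refused, DNS failure, timeout, etc.
        return -1


def run_level(concurrency, total_requests, send=post_once):
    """Issue total_requests calls using `concurrency` worker threads; count results by status."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(lambda _: send(), range(total_requests)))
    return Counter(statuses)


if __name__ == "__main__":
    # 1..16 are the levels reported as clean; 32 and 64 are assumed probes above them.
    for level in (1, 2, 4, 8, 16, 32, 64):
        print(level, dict(run_level(level, total_requests=level * 4)))
```

If the issue reproduces, the counts for status 500 should start appearing at the higher concurrency levels while the lower ones stay at 200.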
Raw log
chatqna-5d64d99997-w7928 chatqna INFO: 100.83.122.244:57853 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR: Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
chatqna-5d64d99997-w7928 chatqna INFO: 100.83.122.244:36813 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR: Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
chatqna-5d64d99997-w7928 chatqna INFO: 100.83.122.244:43839 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR: Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
chatqna-5d64d99997-w7928 chatqna INFO: 100.83.122.244:1674 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
chatqna-5d64d99997-w7928 chatqna ERROR: Exception in ASGI application
chatqna-5d64d99997-w7928 chatqna raise OSError(err, f'Connect call failed {address}')
chatqna-5d64d99997-w7928 chatqna ConnectionRefusedError: [Errno 111] Connect call failed ('172.21.103.154', 7000)
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
chatqna-5d64d99997-w7928 chatqna File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
chatqna-5d64d99997-w7928 chatqna raise client_error(req.connection_key, exc) from exc
chatqna-5d64d99997-w7928 chatqna aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna-retriever-usvc:7000 ssl:default [Connect call failed ('172.21.103.154', 7000)]
Attachments
No response