diff --git a/MultimodalQnA/README.md b/MultimodalQnA/README.md
index bda42ee285..df0cb91127 100644
--- a/MultimodalQnA/README.md
+++ b/MultimodalQnA/README.md
@@ -41,12 +41,14 @@ flowchart LR
UI([UI server]):::orchid
end
+ ASR{{Whisper service}}
TEI_EM{{Embedding service}}
VDB{{Vector DB}}
R_RET{{Retriever service}}
DP([Data Preparation]):::blue
LVM_gen{{LVM Service}}
GW([MultimodalQnA GateWay]):::orange
+ TTS{{SpeechT5 service}}
%% Data Preparation flow
%% Ingest data flow
@@ -74,25 +76,42 @@ flowchart LR
R_RET <-.->VDB
DP <-.->VDB
+ %% Automatic speech recognition (ASR) used for transcribing audio queries to text
+ GW <-.-> ASR
+ %% Generate spoken responses with text-to-speech using the SpeechT5 model
+ GW <-.-> TTS
```
This MultimodalQnA use case performs Multimodal-RAG using LangChain, Redis VectorDB and Text Generation Inference on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand the example.
+The [Whisper Service](https://github.com/opea-project/GenAIComps/blob/main/comps/asr/src/README.md)
+is used by MultimodalQnA for converting audio queries to text. If a spoken response is requested, the
+[SpeechT5 Service](https://github.com/opea-project/GenAIComps/blob/main/comps/tts/src/README.md) converts the text
+response from the LVM into a speech audio file.
+
The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.
In the below, we provide a table that describes for each microservice component in the MultimodalQnA architecture, the default configuration of the open source project, hardware, port, and endpoint.
-Gaudi default compose.yaml
+Gaudi and Xeon default compose.yaml settings
| MicroService | Open Source Project | HW | Port | Endpoint |
| ------------ | --------------------- | ----- | ---- | ----------------------------------------------------------- |
+| Dataprep | Redis, Langchain, TGI | Xeon | 6007 | /v1/generate_transcripts, /v1/generate_captions, /v1/ingest |
| Embedding | Langchain | Xeon | 6000 | /v1/embeddings |
-| Retriever | Langchain, Redis | Xeon | 7000 | /v1/multimodal_retrieval |
-| LVM | Langchain, TGI | Gaudi | 9399 | /v1/lvm |
+| LVM | Langchain, Transformers | Xeon | 9399 | /v1/lvm |
+| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval |
+| SpeechT5 | Transformers | Xeon | 7055 | /v1/tts |
+| Whisper | Transformers | Xeon | 7066 | /v1/asr |
| Dataprep | Redis, Langchain, TGI | Gaudi | 6007 | /v1/generate_transcripts, /v1/generate_captions, /v1/ingest |
+| Embedding | Langchain | Gaudi | 6000 | /v1/embeddings |
+| LVM | Langchain, TGI | Gaudi | 9399 | /v1/lvm |
+| Retriever | Langchain, Redis | Gaudi | 7000 | /v1/retrieval |
+| SpeechT5 | Transformers | Gaudi | 7055 | /v1/tts |
+| Whisper | Transformers | Gaudi | 7066 | /v1/asr |
@@ -104,8 +123,12 @@ By default, the embedding and LVM models are set to a default value as listed be
| --------- | ----- | ----------------------------------------- |
| embedding | Xeon | BridgeTower/bridgetower-large-itm-mlm-itc |
| LVM | Xeon | llava-hf/llava-1.5-7b-hf |
+| SpeechT5 | Xeon | microsoft/speecht5_tts |
+| Whisper | Xeon | openai/whisper-small |
| embedding | Gaudi | BridgeTower/bridgetower-large-itm-mlm-itc |
| LVM | Gaudi | llava-hf/llava-v1.6-vicuna-13b-hf |
+| SpeechT5 | Gaudi | microsoft/speecht5_tts |
+| Whisper | Gaudi | openai/whisper-small |
You can choose other LVM models, such as `llava-hf/llava-1.5-7b-hf ` and `llava-hf/llava-1.5-13b-hf`, as needed.
@@ -113,9 +136,28 @@ You can choose other LVM models, such as `llava-hf/llava-1.5-7b-hf ` and `llava-
The MultimodalQnA service can be effortlessly deployed on either Intel Gaudi2 or Intel XEON Scalable Processors.
-Currently we support deploying MultimodalQnA services with docker compose.
+Currently we support deploying MultimodalQnA services with docker compose. The [`docker_compose`](docker_compose)
+directory has folders which include `compose.yaml` files for different hardware types:
+
+```
+📂 docker_compose
+├── 📂 amd
+│   └── 📂 gpu
+│       └── 📂 rocm
+│           ├── 📄 compose.yaml
+│           └── ...
+└── 📂 intel
+    ├── 📂 cpu
+    │   └── 📂 xeon
+    │       ├── 📄 compose.yaml
+    │       └── ...
+    └── 📂 hpu
+        └── 📂 gaudi
+            ├── 📄 compose.yaml
+            └── ...
+```
-### Setup Environment Variable
+### Setup Environment Variables
To set up environment variables for deploying MultimodalQnA services, follow these steps:
@@ -124,8 +166,10 @@ To set up environment variables for deploying MultimodalQnA services, follow the
```bash
# Example: export host_ip=$(hostname -I | awk '{print $1}')
export host_ip="External_Public_IP"
+
+ # Append the host_ip to the no_proxy list to allow container communication
# Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
- export no_proxy="Your_No_Proxy"
+ export no_proxy="${no_proxy},${host_ip}"
```
2. If you are in a proxy environment, also set the proxy-related environment variables:
@@ -137,36 +181,41 @@ To set up environment variables for deploying MultimodalQnA services, follow the
3. Set up other environment variables:
- > Notice that you can only choose **one** command below to set up envs according to your hardware. Other that the port numbers may be set incorrectly.
+ > Choose **one** command below to set env vars according to your hardware. Otherwise, the port numbers may be set incorrectly.
```bash
# on Gaudi
- source ./docker_compose/intel/hpu/gaudi/set_env.sh
+ cd docker_compose/intel/hpu/gaudi
+ source ./set_env.sh
+
# on Xeon
- source ./docker_compose/intel/cpu/xeon/set_env.sh
+ cd docker_compose/intel/cpu/xeon
+ source ./set_env.sh
```
### Deploy MultimodalQnA on Gaudi
-Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
+Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) if you would like to build Docker images from
+source; otherwise, images will be pulled from Docker Hub.
Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).
```bash
-cd GenAIExamples/MultimodalQnA/docker_compose/intel/hpu/gaudi/
+# While still in the docker_compose/intel/hpu/gaudi directory, use docker compose to bring up the services
docker compose -f compose.yaml up -d
```
-> Notice: Currently only the **Habana Driver 1.17.x** is supported for Gaudi.
+> Notice: Currently only the **Habana Driver 1.18.x** is supported for Gaudi.
### Deploy MultimodalQnA on Xeon
-Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
+Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) if you would like to build Docker images from
+source; otherwise, images will be pulled from Docker Hub.
Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).
```bash
-cd GenAIExamples/MultimodalQnA/docker_compose/intel/cpu/xeon/
+# While still in the docker_compose/intel/cpu/xeon directory, use docker compose to bring up the services
docker compose -f compose.yaml up -d
```
diff --git a/MultimodalQnA/docker_compose/intel/cpu/xeon/README.md b/MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
index 7fdfaabad4..6c8293bb87 100644
--- a/MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
+++ b/MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
@@ -63,7 +63,7 @@ Since the `compose.yaml` will consume some environment variables, you need to se
**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**
-> Change the External_Public_IP below with the actual IPV4 value
+> Replace `External_Public_IP` below with the actual IPv4 address when setting `host_ip` (do not use localhost).
```
export host_ip="External_Public_IP"
@@ -72,13 +72,10 @@ export host_ip="External_Public_IP"
**Append the value of the public IP address to the no_proxy list**
```bash
-export your_no_proxy=${your_no_proxy},"External_Public_IP"
+export no_proxy=${no_proxy},${host_ip}
```
```bash
-export no_proxy=${your_no_proxy}
-export http_proxy=${your_http_proxy}
-export https_proxy=${your_http_proxy}
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
@@ -114,8 +111,6 @@ export UI_PORT=5173
export UI_TIMEOUT=200
```
-Note: Please replace with `host_ip` with you external IP address, do not use localhost.
-
> Note: The `MAX_IMAGES` environment variable is used to specify the maximum number of images that will be sent from the LVM service to the LLaVA server.
> If an image list longer than `MAX_IMAGES` is sent to the LVM server, a shortened image list will be sent to the LLaVA service. If the image list
> needs to be shortened, the most recent images (the ones at the end of the list) are prioritized to send to the LLaVA service. Some LLaVA models have not
diff --git a/MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh b/MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
index 115fd87e93..0c61c7dc91 100755
--- a/MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
+++ b/MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
@@ -8,10 +8,6 @@ popd > /dev/null
export host_ip=$(hostname -I | awk '{print $1}')
-export no_proxy=${your_no_proxy}
-export http_proxy=${your_http_proxy}
-export https_proxy=${your_http_proxy}
-
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
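
Both READMEs in this change ask for an external IPv4 address in `host_ip` and warn against localhost, while `set_env.sh` derives the value from `hostname -I`. A small guard could catch an empty or loopback value before the dependent variables are exported. The sketch below assumes a POSIX-compatible shell; `is_usable_host_ip` is a hypothetical name and does not exist in the repository.

```shell
# Hypothetical check (not in the repository): succeed only when the value
# is non-empty and is neither "localhost" nor a 127.x.x.x loopback address.
is_usable_host_ip() {
  case "$1" in
    ""|localhost|127.*)
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

One possible use is `is_usable_host_ip "${host_ip}" || echo "host_ip looks unusable: '${host_ip}'" >&2`, placed right after the `export host_ip=...` line. The check only rejects the obvious bad values; it does not verify that the address is reachable from the containers.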
diff --git a/MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md b/MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
index a47ff05fc0..56f9b0789d 100644
--- a/MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
+++ b/MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -8,7 +8,7 @@ Since the `compose.yaml` will consume some environment variables, you need to se
**Export the value of the public IP address of your Gaudi server to the `host_ip` environment variable**
-> Change the External_Public_IP below with the actual IPV4 value
+> Replace `External_Public_IP` below with the actual IPv4 address when setting `host_ip` (do not use localhost).
```
export host_ip="External_Public_IP"
@@ -17,13 +17,10 @@ export host_ip="External_Public_IP"
**Append the value of the public IP address to the no_proxy list**
```bash
-export your_no_proxy=${your_no_proxy},"External_Public_IP"
+export no_proxy=${no_proxy},${host_ip}
```
```bash
-export no_proxy=${your_no_proxy}
-export http_proxy=${your_http_proxy}
-export https_proxy=${your_http_proxy}
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
@@ -60,8 +57,6 @@ export UI_PORT=5173
export UI_TIMEOUT=200
```
-Note: Please replace with `host_ip` with you external IP address, do not use localhost.
-
> Note: The `MAX_IMAGES` environment variable is used to specify the maximum number of images that will be sent from the LVM service to the LLaVA server.
> If an image list longer than `MAX_IMAGES` is sent to the LVM server, a shortened image list will be sent to the LLaVA service. If the image list
> needs to be shortened, the most recent images (the ones at the end of the list) are prioritized to send to the LLaVA service. Some LLaVA models have not
diff --git a/MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh b/MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
index 002b7e1cfe..b9be945ac7 100755
--- a/MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
+++ b/MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -13,10 +13,6 @@ export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}
-export no_proxy=${your_no_proxy}
-export http_proxy=${your_http_proxy}
-export https_proxy=${your_http_proxy}
-
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"