77 changes: 63 additions & 14 deletions MultimodalQnA/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,12 +41,14 @@ flowchart LR
UI([UI server<br>]):::orchid
end

ASR{{Whisper service <br>}}
TEI_EM{{Embedding service <br>}}
VDB{{Vector DB<br><br>}}
R_RET{{Retriever service <br>}}
DP([Data Preparation<br>]):::blue
LVM_gen{{LVM Service <br>}}
GW([MultimodalQnA GateWay<br>]):::orange
TTS{{SpeechT5 service <br>}}

%% Data Preparation flow
%% Ingest data flow
Expand Down Expand Up @@ -74,25 +76,42 @@ flowchart LR
R_RET <-.->VDB
DP <-.->VDB

%% Audio speech recognition used for translating audio queries to text
GW <-.-> ASR

%% Generate spoken responses with text-to-speech using the SpeechT5 model
GW <-.-> TTS

```

This MultimodalQnA use case performs Multimodal-RAG using LangChain, Redis VectorDB and Text Generation Inference on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand the example.

The [Whisper Service](https://github.com/opea-project/GenAIComps/blob/main/comps/asr/src/README.md)
is used by MultimodalQnA for converting audio queries to text. If a spoken response is requested, the
[SpeechT5 Service](https://github.com/opea-project/GenAIComps/blob/main/comps/tts/src/README.md) translates the text
response from the LVM to a speech audio file.

The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.

The table below lists, for each microservice component in the MultimodalQnA architecture, the default open source project, hardware, port, and endpoint.

<details>
<summary><b>Gaudi default compose.yaml</b></summary>
<summary><b>Gaudi and Xeon default compose.yaml settings</b></summary>

| MicroService | Open Source Project | HW | Port | Endpoint |
| ------------ | --------------------- | ----- | ---- | ----------------------------------------------------------- |
| Dataprep | Redis, Langchain, TGI | Xeon | 6007 | /v1/generate_transcripts, /v1/generate_captions, /v1/ingest |
| Embedding | Langchain | Xeon | 6000 | /v1/embeddings |
| Retriever | Langchain, Redis | Xeon | 7000 | /v1/multimodal_retrieval |
| LVM | Langchain, TGI | Gaudi | 9399 | /v1/lvm |
| LVM | Langchain, Transformers | Xeon | 9399 | /v1/lvm |
| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval |
| SpeechT5 | Transformers | Xeon | 7055 | /v1/tts |
| Whisper | Transformers | Xeon | 7066 | /v1/asr |
| Dataprep | Redis, Langchain, TGI | Gaudi | 6007 | /v1/generate_transcripts, /v1/generate_captions, /v1/ingest |
| Embedding | Langchain | Gaudi | 6000 | /v1/embeddings |
| LVM | Langchain, TGI | Gaudi | 9399 | /v1/lvm |
| Retriever | Langchain, Redis | Gaudi | 7000 | /v1/retrieval |
| SpeechT5 | Transformers | Gaudi | 7055 | /v1/tts |
| Whisper | Transformers | Gaudi | 7066 | /v1/asr |

</details>
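As a rough illustration of how the ports and endpoints in the table combine into service URLs, the sketch below prints one URL per microservice. It assumes the services are reached over plain `http` at `host_ip`; the `service_url` helper and the fallback address are hypothetical, and only `/v1/ingest` is shown for Dataprep.

```bash
# Sketch only: build service URLs from the default ports and endpoints.
# Assumes host_ip is already exported; the fallback is a placeholder address.
host="${host_ip:-192.0.2.10}"

# Hypothetical helper: join host, port, and endpoint path into a URL.
service_url() {
  printf 'http://%s:%s%s\n' "$1" "$2" "$3"
}

# name:port:path triples copied from the table
while IFS=: read -r name port path; do
  printf '%-10s %s\n' "$name" "$(service_url "$host" "$port" "$path")"
done <<EOF
dataprep:6007:/v1/ingest
embedding:6000:/v1/embeddings
retriever:7000:/v1/retrieval
lvm:9399:/v1/lvm
speecht5:7055:/v1/tts
whisper:7066:/v1/asr
EOF
```

The request body each endpoint expects is not covered here; see the hardware-specific guides for that.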

Expand All @@ -104,18 +123,41 @@ By default, the embedding and LVM models are set to a default value as listed be
| --------- | ----- | ----------------------------------------- |
| embedding | Xeon | BridgeTower/bridgetower-large-itm-mlm-itc |
| LVM | Xeon | llava-hf/llava-1.5-7b-hf |
| SpeechT5 | Xeon | microsoft/speecht5_tts |
| Whisper | Xeon | openai/whisper-small |
| embedding | Gaudi | BridgeTower/bridgetower-large-itm-mlm-itc |
| LVM | Gaudi | llava-hf/llava-v1.6-vicuna-13b-hf |
| SpeechT5 | Gaudi | microsoft/speecht5_tts |
| Whisper | Gaudi | openai/whisper-small |

You can choose other LVM models, such as `llava-hf/llava-1.5-7b-hf` and `llava-hf/llava-1.5-13b-hf`, as needed.

## Deploy MultimodalQnA Service

The MultimodalQnA service can be deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors.

Currently we support deploying MultimodalQnA services with docker compose.
Currently we support deploying MultimodalQnA services with docker compose. The [`docker_compose`](docker_compose)
directory has folders which include `compose.yaml` files for different hardware types:

```
📂 docker_compose
├── 📂 amd
│   └── 📂 gpu
│       └── 📂 rocm
│           ├── 📄 compose.yaml
│           └── ...
└── 📂 intel
    ├── 📂 cpu
    │   └── 📂 xeon
    │       ├── 📄 compose.yaml
    │       └── ...
    └── 📂 hpu
        └── 📂 gaudi
            ├── 📄 compose.yaml
            └── ...
```
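The layout above maps each hardware type to exactly one directory. A minimal sketch of that mapping, assuming it is run from the `MultimodalQnA` directory; `compose_dir_for` is a hypothetical helper, not part of the repository:

```bash
# Sketch only: map a hardware name to its compose directory.
compose_dir_for() {
  case "$1" in
    gaudi) echo "docker_compose/intel/hpu/gaudi" ;;
    xeon)  echo "docker_compose/intel/cpu/xeon" ;;
    rocm)  echo "docker_compose/amd/gpu/rocm" ;;
    *)     echo "unknown hardware: $1" >&2; return 1 ;;
  esac
}

# Example use (not executed here):
#   cd "$(compose_dir_for xeon)" && docker compose -f compose.yaml up -d
```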

### Setup Environment Variable
### Setup Environment Variables

To set up environment variables for deploying MultimodalQnA services, follow these steps:

Expand All @@ -124,8 +166,10 @@ To set up environment variables for deploying MultimodalQnA services, follow the
```bash
# Example: export host_ip=$(hostname -I | awk '{print $1}')
export host_ip="External_Public_IP"

# Append the host_ip to the no_proxy list to allow container communication
# Example: no_proxy="localhost,127.0.0.1,192.168.1.1"
export no_proxy="Your_No_Proxy"
export no_proxy="${no_proxy},${host_ip}"
```

2. If you are in a proxy environment, also set the proxy-related environment variables:
Expand All @@ -137,36 +181,41 @@ To set up environment variables for deploying MultimodalQnA services, follow the

3. Set up other environment variables:

> Notice that you can only choose **one** command below to set up envs according to your hardware. Other that the port numbers may be set incorrectly.
> Choose **one** command below to set env vars according to your hardware. Otherwise, the port numbers may be set incorrectly.

```bash
# on Gaudi
source ./docker_compose/intel/hpu/gaudi/set_env.sh
cd docker_compose/intel/hpu/gaudi
source ./set_env.sh

# on Xeon
source ./docker_compose/intel/cpu/xeon/set_env.sh
cd docker_compose/intel/cpu/xeon
source ./set_env.sh
```
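When `no_proxy` is empty, a plain `"${no_proxy},${host_ip}"` yields a list with a leading comma, and re-running it adds duplicates. The sketch below is one way to avoid both; `append_no_proxy` is a hypothetical helper, not something the `set_env.sh` scripts provide.

```bash
# Sketch only: append a host to a comma-separated no_proxy list without
# creating an empty entry or a duplicate.
append_no_proxy() {
  # $1 = current list (may be empty), $2 = host to add
  if [ -z "$2" ]; then
    printf '%s\n' "$1"                # nothing to add
    return
  fi
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;   # already listed
    ",,")     printf '%s\n' "$2" ;;   # list was empty
    *)        printf '%s\n' "$1,$2" ;;
  esac
}

no_proxy="$(append_no_proxy "${no_proxy:-}" "${host_ip:-}")"
export no_proxy
```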

### Deploy MultimodalQnA on Gaudi

Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) to build docker images from source.
Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) if you would like to build docker images from
source; otherwise, images will be pulled from Docker Hub.

Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml).

```bash
cd GenAIExamples/MultimodalQnA/docker_compose/intel/hpu/gaudi/
# While still in the docker_compose/intel/hpu/gaudi directory, use docker compose to bring up the services
docker compose -f compose.yaml up -d
```

> Notice: Currently only the **Habana Driver 1.17.x** is supported for Gaudi.
> Notice: Currently only the **Habana Driver 1.18.x** is supported for Gaudi.

### Deploy MultimodalQnA on Xeon

Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for more instructions on building docker images from source.
Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) if you would like to build docker images from
source; otherwise, images will be pulled from Docker Hub.

Find the corresponding [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml).

```bash
cd GenAIExamples/MultimodalQnA/docker_compose/intel/cpu/xeon/
# While still in the docker_compose/intel/cpu/xeon directory, use docker compose to bring up the services
docker compose -f compose.yaml up -d
```

Expand Down
9 changes: 2 additions & 7 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ Since the `compose.yaml` will consume some environment variables, you need to se

**Export the value of the public IP address of your Xeon server to the `host_ip` environment variable**

> Change the External_Public_IP below with the actual IPV4 value
> Replace External_Public_IP below with the actual IPv4 address when setting `host_ip` (do not use localhost).

```
export host_ip="External_Public_IP"
Expand All @@ -72,13 +72,10 @@ export host_ip="External_Public_IP"
**Append the value of the public IP address to the no_proxy list**

```bash
export your_no_proxy=${your_no_proxy},"External_Public_IP"
export no_proxy=${no_proxy},${host_ip}
```

```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
Expand Down Expand Up @@ -114,8 +111,6 @@ export UI_PORT=5173
export UI_TIMEOUT=200
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.

> Note: The `MAX_IMAGES` environment variable is used to specify the maximum number of images that will be sent from the LVM service to the LLaVA server.
> If an image list longer than `MAX_IMAGES` is sent to the LVM server, a shortened image list will be sent to the LLaVA service. If the image list
> needs to be shortened, the most recent images (the ones at the end of the list) are prioritized to send to the LLaVA service. Some LLaVA models have not
Expand Down
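The truncation rule described in the `MAX_IMAGES` note (keep the most recent images, meaning those at the end of the list) can be sketched in shell. This is an illustration of the rule only, not the LVM service's actual code; `keep_most_recent` and the file names are hypothetical.

```bash
# Sketch only: keep the last N items of a list, one per output line.
keep_most_recent() {
  max="$1"
  shift                        # remaining arguments are the image list
  if [ "$#" -gt "$max" ]; then
    shift $(( $# - max ))      # drop the oldest entries from the front
  fi
  printf '%s\n' "$@"
}

# With a limit of 2, only the last two images are kept.
keep_most_recent 2 first.png second.png third.png
```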
4 changes: 0 additions & 4 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,6 @@ popd > /dev/null

export host_ip=$(hostname -I | awk '{print $1}')

export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}

export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
Expand Down
9 changes: 2 additions & 7 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Since the `compose.yaml` will consume some environment variables, you need to se

**Export the value of the public IP address of your Gaudi server to the `host_ip` environment variable**

> Change the External_Public_IP below with the actual IPV4 value
> Replace External_Public_IP below with the actual IPv4 address when setting `host_ip` (do not use localhost).

```
export host_ip="External_Public_IP"
Expand All @@ -17,13 +17,10 @@ export host_ip="External_Public_IP"
**Append the value of the public IP address to the no_proxy list**

```bash
export your_no_proxy=${your_no_proxy},"External_Public_IP"
export no_proxy=${no_proxy},${host_ip}
```

```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export MM_EMBEDDING_SERVICE_HOST_IP=${host_ip}
export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
Expand Down Expand Up @@ -60,8 +57,6 @@ export UI_PORT=5173
export UI_TIMEOUT=200
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.

> Note: The `MAX_IMAGES` environment variable is used to specify the maximum number of images that will be sent from the LVM service to the LLaVA server.
> If an image list longer than `MAX_IMAGES` is sent to the LVM server, a shortened image list will be sent to the LLaVA service. If the image list
> needs to be shortened, the most recent images (the ones at the end of the list) are prioritized to send to the LLaVA service. Some LLaVA models have not
Expand Down
4 changes: 0 additions & 4 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,6 @@ export MM_RETRIEVER_SERVICE_HOST_IP=${host_ip}
export LVM_SERVICE_HOST_IP=${host_ip}
export MEGA_SERVICE_HOST_IP=${host_ip}

export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}

export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
Expand Down