diff --git a/Translation/README.md b/Translation/README.md index 80d7977c8e..f79ecbfcb3 100644 --- a/Translation/README.md +++ b/Translation/README.md @@ -1,8 +1,15 @@ # Translation Application -Language Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. +The Translation example demonstrates the implementation of language translation using OPEA component-level microservices. -Translation architecture shows below: +## Table of contents + +1. [Architecture](#architecture) +2. [Deployment Options](#deployment-options) + +## Architecture + +The architecture of the Translation Application is illustrated below: ![architecture](./assets/img/translation_architecture.png) @@ -60,14 +67,12 @@ flowchart LR This Translation use case performs Language Translation Inference across multiple platforms. Currently, we provide the example for [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand OPEA ecosystem. -## Deploy Translation Service - -The Translation service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors. - -### Deploy Translation on Gaudi - -Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for instructions on deploying Translation on Gaudi. +## Deployment Options -### Deploy Translation on Xeon +The table below lists the available deployment options and their implementation details for different hardware platforms. -Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for instructions on deploying Translation on Xeon. 
+| Platform | Deployment Method | Link | +| ------------ | ----------------- | ----------------------------------------------------------------- | +| Intel Xeon | Docker compose | [Deployment on Xeon](./docker_compose/intel/cpu/xeon/README.md) | +| Intel Gaudi2 | Docker compose | [Deployment on Gaudi](./docker_compose/intel/hpu/gaudi/README.md) | +| AMD ROCm | Docker compose | [Deployment on AMD Rocm](./docker_compose/amd/gpu/rocm/README.md) | diff --git a/Translation/docker_compose/amd/gpu/rocm/README.md b/Translation/docker_compose/amd/gpu/rocm/README.md index b2a56bf1d0..827df63f29 100644 --- a/Translation/docker_compose/amd/gpu/rocm/README.md +++ b/Translation/docker_compose/amd/gpu/rocm/README.md @@ -1,364 +1,87 @@ -# Build and deploy Translation Application on AMD GPU (ROCm) +# Example Translation Deployment on AMD GPU (ROCm) -## Build Docker Images +This document outlines the deployment process for a Translation service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD GPU (ROCm). This example includes the following sections: -### 1. Build Docker Image +- [Translation Quick Start Deployment](#translation-quick-start-deployment): Demonstrates how to quickly deploy a Translation service/pipeline on AMD GPU (ROCm). +- [Translation Docker Compose Files](#translation-docker-compose-files): Describes some example deployments and their docker compose files. +- [Translation Service Configuration](#translation-service-configuration): Describes the service and possible configuration changes. -- #### Create application install directory and go to it: +## Translation Quick Start Deployment - ```bash - mkdir ~/translation-install && cd translation-install - ``` +This section describes how to quickly deploy and test the Translation service manually on AMD GPU (ROCm). The basic steps are: -- #### Clone the repository GenAIExamples (the default repository branch "main" is used here): +1. 
[Access the Code](#access-the-code) +2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token) +3. [Configure the Deployment Environment](#configure-the-deployment-environment) +4. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose) +5. [Check the Deployment Status](#check-the-deployment-status) +6. [Test the Pipeline](#test-the-pipeline) +7. [Cleanup the Deployment](#cleanup-the-deployment) - ```bash - git clone https://github.com/opea-project/GenAIExamples.git - ``` +### Access the Code - If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value): +Clone the GenAIExample repository and access the Translation AMD GPU (ROCm) Docker Compose files and supporting scripts: - ```bash - git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3 - ``` - - We remind you that when using a specific version of the code, you need to use the README from this version: - -- #### Go to build directory: - - ```bash - cd ~/translation-install/GenAIExamples/Translation/docker_image_build - ``` - -- Cleaning up the GenAIComps repository if it was previously cloned in this directory. - This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty: - - ```bash - echo Y | rm -R GenAIComps - ``` - -- #### Clone the repository GenAIComps (the default repository branch "main" is used here): - - ```bash - git clone https://github.com/opea-project/GenAIComps.git - ``` - - If you use a specific tag of the GenAIExamples repository, - then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value): - - ```bash - git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3 - ``` - - We remind you that when using a specific version of the code, you need to use the README from this version. 
- -- #### Setting the list of images for the build (from the build file.yaml) - - If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows: - - #### vLLM-based application - - ```bash - service_list="vllm-rocm translation translation-ui llm-textgen nginx" - ``` - - #### TGI-based application - - ```bash - service_list="translation translation-ui llm-textgen nginx" - ``` - -- #### Optional. Pull TGI Docker Image (Do this if you want to use TGI) - - ```bash - docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm - ``` - -- #### Build Docker Images - - ```bash - docker compose -f build.yaml build ${service_list} --no-cache - ``` - - After the build, we check the list of images with the command: - - ```bash - docker image ls - ``` - - The list of images should include: - - ##### vLLM-based application: - - - opea/vllm-rocm:latest - - opea/llm-textgen:latest - - opea/nginx:latest - - opea/translation:latest - - opea/translation-ui:latest - - ##### TGI-based application: - - - ghcr.io/huggingface/text-generation-inference:2.3.1-rocm - - opea/llm-textgen:latest - - opea/nginx:latest - - opea/translation:latest - - opea/translation-ui:latest - ---- - -### Docker Compose Configuration for AMD GPUs - -To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file: - -- compose_vllm.yaml - for vLLM-based application -- compose.yaml - for TGI-based - -```yaml -shm_size: 1g -devices: - - /dev/kfd:/dev/kfd - - /dev/dri/:/dev/dri/ -cap_add: - - SYS_PTRACE -group_add: - - video -security_opt: - - seccomp:unconfined ``` - -This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. 
For example: - -```yaml -shm_size: 1g -devices: - - /dev/kfd:/dev/kfd - - /dev/dri/card0:/dev/dri/card0 - - /dev/dri/renderD128:/dev/dri/renderD128 -cap_add: - - SYS_PTRACE -group_add: - - video -security_opt: - - seccomp:unconfined +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/Translation/docker_compose/amd/gpu/rocm/ ``` -**How to Identify GPU Device IDs:** -Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. +Checkout a released version, such as v1.2: -### Set deploy environment variables - -#### Setting variables in the operating system environment: - -##### Set variable HUGGINGFACEHUB_API_TOKEN: - -```bash -### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. -export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' ``` - -#### Set variables value in set_env\*\*\*\*.sh file: - -Go to Docker Compose directory: - -```bash -cd ~/translation-install/GenAIExamples/Translation/docker_compose/amd/gpu/rocm +git checkout v1.2 ``` -The example uses the Nano text editor. You can use any convenient text editor: +### Generate a HuggingFace Access Token -#### If you use vLLM +Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). 
-```bash -nano set_env_vllm.sh -``` +### Configure the Deployment Environment -#### If you use TGI +To set up environment variables for deploying Translation service, source the _set_env.sh_ or _set_env_vllm.sh_ script in this directory: -```bash -nano set_env.sh ``` - -If you are in a proxy environment, also set the proxy-related environment variables: - -```bash -export http_proxy="Your_HTTP_Proxy" -export https_proxy="Your_HTTPs_Proxy" +//with TGI: +source ./set_env.sh ``` -Set the values of the variables: - -- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. - - If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. - - If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. - - If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. - - We set these values in the file set_env\*\*\*\*.sh - -- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. - The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. 
These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. - -#### Set variables with script set_env\*\*\*\*.sh - -#### If you use vLLM - -```bash -. set_env_vllm.sh ``` - -#### If you use TGI - -```bash -. set_env.sh +# with vLLM: +source ./set_env_vllm.sh ``` -### Start the services: +The _set_env.sh_ script will prompt for required and optional environment variables used to configure the Translation service based on TGI. The _set_env_vllm.sh_ script will prompt for required and optional environment variables used to configure the Translation service based on vLLM. If a value is not entered, the script will use a default value. It will also generate a _.env_ file defining the desired configuration. Consult the section on [Translation Service configuration](#translation-service-configuration) for information on how service-specific configuration parameters affect deployments. -#### If you use vLLM +### Deploy the Service Using Docker Compose -```bash -docker compose -f compose_vllm.yaml up -d -``` - -#### If you use TGI +To deploy the Translation service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute: ```bash +# with TGI: docker compose -f compose.yaml up -d ``` -All containers should be running and should not restart: - -##### If you use vLLM: - -- translationn-vllm-service -- translation-tgi-service -- translation-llm -- translation-backend-server -- translation-ui-server -- translation-nginx-server - -##### If you use TGI: - -- translation-tgi-service -- translation-llm -- translation-backend-server -- translation-ui-server -- translation-nginx-server - ---- - -## Validate the Services - -### 1.
Validate the vLLM/TGI Service - -#### If you use vLLM: - ```bash -DATA='{"model": "haoranxu/ALMA-13B", "prompt": "What is Deep Learning?", "max_tokens": 100, "temperature": 0}' - -curl http://${HOST_IP}:${TRANSLATION_VLLM_SERVICE_PORT}/v1/chat/completions \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' -``` - -Checking the response from the service. The response should be similar to JSON: - -```json -{ - "id": "cmpl-059dd7fb311a46c2b807e0b3315e730c", - "object": "text_completion", - "created": 1743063706, - "model": "haoranxu/ALMA-13B", - "choices": [ - { - "index": 0, - "text": " Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning", - "logprobs": null, - "finish_reason": "length", - "stop_reason": null, - "prompt_logprobs": null - } - ], - - "usage": { - "prompt_tokens": 6, - "total_tokens": 106, - "completion_tokens": 100, - "prompt_tokens_details": null - } -} +//with VLLM: +docker compose -f compose_vllm.yaml up -d ``` -If the service response has a meaningful response in the value of the "choices.message.content" key, -then we consider the vLLM service to be successfully launched - -#### If you use TGI: +The Translation docker images should automatically be downloaded from the `OPEA registry` and deployed on the AMD GPU (ROCm) -```bash -DATA='{"inputs":"What is Deep Learning?",'\ -'"parameters":{"max_new_tokens":256,"do_sample": true}}' - -curl http://${HOST_IP}:${TRANSLATION_TGI_SERVICE_PORT}/generate \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' -``` +### Check the Deployment Status -Checking the response from the service. 
The response should be similar to JSON: +After running docker compose, check if all the containers launched via docker compose have started: -```json -{ - "generated_text": "\n\n What can it Do? What's the Hype? What Should You Do If" -} ``` - -If the service response has a meaningful response in the value of the "generated_text" key, -then we consider the TGI service to be successfully launched - -### 2. Validate the LLM Service - -```bash -DATA='{"query":"What is Deep Learning?",'\ -'"max_tokens":32,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,'\ -'"repetition_penalty":1.03,"stream":false}' - -curl http://${HOST_IP}:${TRANSLATION_LLM_SERVICE_PORT}/v1/chat/completions \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' +docker ps -a ``` -Checking the response from the service. The response should be similar to JSON: +For the default deployment, the following 5 containers should be running. -```json -{ - "id": "", - "choices": [ - { - "finish_reason": "length", - "index": 0, - "logprobs": null, - "text": " Deep Learning is a subset of machine learning. It attempts to mimic the way the human brain learns. Deep Learning is a subset of machine learning." - } - ], - "created": 1742978568, - "model": "haoranxu/ALMA-13B", - "object": "text_completion", - "system_fingerprint": "2.3.1-sha-a094729-rocm", - "usage": { - "completion_tokens": 32, - "prompt_tokens": 6, - "total_tokens": 38, - "completion_tokens_details": null, - "prompt_tokens_details": null - } -} -``` +### Test the Pipeline -### 3. Validate Nginx Service +Once the Translation service are running, test the pipeline using the following command: ```bash DATA='{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' @@ -386,61 +109,82 @@ data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null data: [DONE] ``` -### 4. 
Validate MegaService +**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file. -```bash -DATA='{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' +### Cleanup the Deployment -curl http://${HOST_IP}:${TRANSLATION_BACKEND_SERVICE_PORT}/v1/translation \ - -H "Content-Type: application/json" \ - -d "$DATA" +To stop the containers associated with the deployment, execute the following command: + +``` +# with TGI: +docker compose -f compose.yaml down ``` -Checking the response from the service. The response should be similar to JSON: +```bash +# with vLLM: +docker compose -f compose_vllm.yaml down +``` -```textmate -data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" I"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null} +All the Translation containers will be stopped and then removed on completion of the "down" command.
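The deployment status check described above can also be scripted. The following is a minimal sketch, not part of the GenAIExamples repository: `missing_containers` is a hypothetical helper that reads running container names from standard input and prints every expected name (passed as arguments) that is absent. The container names in the usage comment are taken from the TGI compose file for this example and would differ for other deployments.

```bash
# Hypothetical helper (illustration only, not shipped with GenAIExamples):
# print each expected container name that is missing from the list of
# running container names supplied on standard input.
missing_containers() {
  local running expected
  running="$(cat)"
  for expected in "$@"; do
    # -x matches whole lines only, -F treats the name as a fixed string
    printf '%s\n' "$running" | grep -qxF -- "$expected" || echo "$expected"
  done
  return 0
}

# Example usage (assumes the TGI deployment from compose.yaml):
#   docker ps --format '{{.Names}}' | missing_containers \
#     translation-tgi-service translation-llm translation-backend-server \
#     translation-ui-server translation-nginx-server
```

An empty result suggests all expected containers are up; any printed name is a container worth inspecting with `docker logs`.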
-data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" love"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null} +## Translation Docker Compose Files -data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" machine"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null} +The compose.yaml file is the default compose file, using TGI as the serving framework. -data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" translation"}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null} +| Service Name | Image Name | +| -------------------------- | -------------------------------------------------------- | +| translation-tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.1-rocm | +| translation-llm | opea/llm-textgen:latest | +| translation-backend-server | opea/translation:latest | +| translation-ui-server | opea/translation-ui:latest | +| translation-nginx-server | opea/nginx:latest | -data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"."}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":null} +## Translation Service Configuration for AMD GPUs -data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":""}],"created":1742978968,"model":"haoranxu/ALMA-13B","object":"text_completion","system_fingerprint":"2.3.1-sha-a094729-rocm","usage":{"completion_tokens":6,"prompt_tokens":3071,"total_tokens":3077,"completion_tokens_details":null,"prompt_tokens_details":null}} +To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file: -data: [DONE] +- compose_vllm.yaml - for vLLM-based
service +- compose.yaml - for TGI-based service +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined ``` -If the response text is similar to the one above, then we consider the service verification successful. - -### 5. Validate Frontend - -To access the UI, use the URL - http://${EXTERNAL_HOST_IP}:${TRANSLATION_FRONTEND_SERVICE_PORT} A page should open when you click through to this address: -![UI start page](../../../../assets/img/translation-ui-starting-page.png) - -If a page of this type has opened, then we believe that the service is running and responding, and we can proceed to functional UI testing. - -Let's enter the task for the service in the "Input" field. For example, "我爱机器翻译" with selected "German" as language source and press Enter. After that, a page with the result of the task should open: - -![UI start page](../../../../assets/img/translation-ui-response-example.png) -If the result shown on the page is correct, then we consider the verification of the UI service to be successful. - -### 6. Stop application - -#### If you use vLLM +This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: -```bash -cd ~/translation-install/GenAIExamples/Translation/docker_compose/amd/gpu/rocm -docker compose -f compose_vllm.yaml down +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card0:/dev/dri/card0 + - /dev/dri/renderD128:/dev/dri/renderD128 +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined ``` -#### If you use TGI +The table below provides an overview of the Translation services used across the deployments illustrated in the example Docker Compose files.
Each row in the table represents a distinct service, detailing the images that can be used to enable it and a concise description of its function within the deployment architecture. -```bash -cd ~/translation-install/GenAIExamples/Translation/docker_compose/amd/gpu/rocm -docker compose -f compose.yaml down -``` +| Service Name | Possible Image Names | Optional | Description | +| -------------------------- | -------------------------------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | +| translation-tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.1-rocm | No | Specific to the TGI deployment, focuses on text generation inference using AMD GPU (ROCm) hardware. | +| translation-vllm-service | opea/vllm-rocm:latest | No | Handles large language model (LLM) tasks, utilizing AMD GPU (ROCm) hardware. | +| translation-llm | opea/llm-textgen:latest | No | Handles large language model (LLM) tasks. | +| translation-backend-server | opea/translation:latest | No | Serves as the backend for the Translation service, with variations depending on the deployment. | +| translation-ui-server | opea/translation-ui:latest | No | Provides the user interface for the Translation service. | +| translation-nginx-server | opea/nginx:latest | No | Acts as a reverse proxy, managing traffic between the UI and backend services. | + +**How to Identify GPU Device IDs:** +Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.
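As a starting point for finding the `cardN` and `renderN` IDs mentioned above, the device nodes can simply be listed. The sketch below is an illustration under assumptions, not part of the repository: `list_dri_nodes` is a hypothetical helper, and it assumes the standard Linux DRM layout under `/dev/dri`. Mapping a node to a physical GPU still requires a driver utility; for example, the PCI bus IDs reported by `rocm-smi --showbus` can usually be compared with the symlink names under `/dev/dri/by-path/` on systems that provide that directory.

```bash
# Hypothetical helper (illustration only): print the DRM device node names
# (cardN and renderDN) found in a directory, /dev/dri by default.
list_dri_nodes() {
  local dri_dir="${1:-/dev/dri}"
  local node
  for node in "$dri_dir"/card* "$dri_dir"/renderD*; do
    # Skip unmatched glob patterns when no such node exists
    [ -e "$node" ] && basename "$node"
  done
  return 0
}

# List the nodes on this host; prints nothing if /dev/dri is absent.
list_dri_nodes
```

The names printed here are the ones to substitute into the `devices:` entries of the compose file when forwarding a single GPU.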
diff --git a/Translation/docker_compose/intel/cpu/xeon/README.md b/Translation/docker_compose/intel/cpu/xeon/README.md index 4a41cb5385..1af360be83 100644 --- a/Translation/docker_compose/intel/cpu/xeon/README.md +++ b/Translation/docker_compose/intel/cpu/xeon/README.md @@ -1,169 +1,144 @@ -# Build Mega Service of Translation on Xeon +# Example Translation Deployment on Intel® Xeon® Platform -This document outlines the deployment process for a Translation application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `llm`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service. +This document outlines the deployment process for a Translation service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. This example includes the following sections: -## 🚀 Apply Xeon Server on AWS +- [Translation Quick Start Deployment](#translation-quick-start-deployment): Demonstrates how to quickly deploy a Translation service/pipeline on Intel® Xeon® platform. +- [Translation Docker Compose Files](#translation-docker-compose-files): Describes some example deployments and their docker compose files. +- [Translation Service Configuration](#translation-service-configuration): Describes the service and possible configuration changes. -To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors. These instances are optimized for high-performance computing and demanding workloads. 
+## Translation Quick Start Deployment -For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options. +This section describes how to quickly deploy and test the Translation service manually on Intel® Xeon® platform. The basic steps are: -After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed. +1. [Access the Code](#access-the-code) +2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token) +3. [Configure the Deployment Environment](#configure-the-deployment-environment) +4. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose) +5. [Check the Deployment Status](#check-the-deployment-status) +6. [Test the Pipeline](#test-the-pipeline) +7. [Cleanup the Deployment](#cleanup-the-deployment) -## 🚀 Prepare Docker Images +### Access the Code -For Docker Images, you have two options to prepare them. +Clone the GenAIExample repository and access the Translation Intel® Xeon® platform Docker Compose files and supporting scripts: -1. Pull the docker images from docker hub. - - - More stable to use. - - Will be automatically downloaded when using docker compose command. - -2. Build the docker images from source. - - - Contain the latest new features. - - - Need to be manually build. - -If you choose to pull docker images form docker hub, skip this section and go to [Start Microservices](#start-microservices) part directly. - -Follow the instructions below to build the docker images from source. - -### 1. 
Build LLM Image - -```bash -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . ``` - -### 2. Build MegaService Docker Image - -To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `translation.py` Python script. Build MegaService Docker image via below command: - -```bash -git clone https://github.com/opea-project/GenAIExamples -cd GenAIExamples/Translation/ -docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/Translation/docker_compose/intel/cpu/xeon/ ``` -### 3. Build UI Docker Image - -Build frontend Docker image via below command: +Checkout a released version, such as v1.2: -```bash -cd GenAIExamples/Translation/ui -docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile . ``` - -### 4. Build Nginx Docker Image - -```bash -cd GenAIComps -docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . +git checkout v1.2 ``` -Then run the command `docker images`, you will have the following Docker Images: +### Generate a HuggingFace Access Token -1. `opea/llm-textgen:latest` -2. `opea/translation:latest` -3. `opea/translation-ui:latest` -4. `opea/nginx:latest` +Some HuggingFace resources, such as some models, are only accessible if you have an access token. 
If you do not already have a HuggingFace access token, you can create one by first creating an account at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). -## 🚀 Start Microservices +### Configure the Deployment Environment -### Required Models +To set up environment variables for deploying the Translation service, source the set_env.sh script, which is located three directories up in `Translation/docker_compose`: -By default, the LLM model is set to a default value as listed below: +``` +cd ../../../ +source set_env.sh +cd intel/cpu/xeon +``` -| Service | Model | -| ------- | ----------------- | -| LLM | haoranxu/ALMA-13B | +The set_env.sh script will prompt for required and optional environment variables used to configure the Translation service. If a value is not entered, the script will use a default value. It will also generate an env file defining the desired configuration. Consult the section on [Translation Service configuration](#translation-service-configuration) for information on how service-specific configuration parameters affect deployments. -Change the `LLM_MODEL_ID` below for your needs. +### Deploy the Service Using Docker Compose -### Setup Environment Variables +To deploy the Translation service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute: -1. Set the required environment variables: +```bash +docker compose up -d +``` - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy" - export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - # Example: NGINX_PORT=80 - export NGINX_PORT=${your_nginx_port} - ``` +The Translation docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Xeon® Platform: -2.
If you are in a proxy environment, also set the proxy-related environment variables: +``` +[+] Running 6/6 + ✔ Network xeon_default Created 0.1s + ✔ Container tgi-service Healthy 328.1s + ✔ Container llm-textgen-server Started 323.5s + ✔ Container translation-xeon-backend-server Started 323.7s + ✔ Container translation-xeon-ui-server Started 324.0s + ✔ Container translation-xeon-nginx-server Started 324.2s +``` - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - ``` +### Check the Deployment Status -3. Set up other environment variables: +After running docker compose, check if all the containers launched via docker compose have started: - ```bash - cd ../../../ - source set_env.sh - ``` +``` +docker ps -a +``` -### Start Microservice Docker Containers +For the default deployment, the following 5 containers should be running: -```bash -docker compose up -d ``` +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +89a39f7c917f opea/nginx:latest "/docker-entrypoint.…" 7 minutes ago Up About a minute 0.0.0.0:80->80/tcp, :::80->80/tcp translation-xeon-nginx-server +68b8b86a737e opea/translation-ui:latest "docker-entrypoint.s…" 7 minutes ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp translation-xeon-ui-server +8400903275b5 opea/translation:latest "python translation.…" 7 minutes ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp translation-xeon-backend-server +2da5545cb18c opea/llm-textgen:latest "bash entrypoint.sh" 7 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-textgen-server +dee02c1fb538 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp tgi-service +``` + +### Test the Pipeline -> Note: The docker images will be automatically downloaded from `docker hub`: +Once the Translation service are running, test the pipeline using the following command: ```bash 
-docker pull opea/llm-textgen:latest -docker pull opea/translation:latest -docker pull opea/translation-ui:latest -docker pull opea/nginx:latest +curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{ + "language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' ``` -### Validate Microservices +**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file. -1. TGI Service +### Cleanup the Deployment - ```bash - curl http://${host_ip}:8008/generate \ - -X POST \ - -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ - -H 'Content-Type: application/json' - ``` +To stop the containers associated with the deployment, execute the following command: -2. LLM Microservice +``` +docker compose -f compose.yaml down +``` - ```bash - curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"}' \ - -H 'Content-Type: application/json' - ``` +``` +[+] Running 6/6 + ✔ Container translation-xeon-nginx-server Removed 10.4s + ✔ Container translation-xeon-ui-server Removed 10.3s + ✔ Container translation-xeon-backend-server Removed 10.3s + ✔ Container llm-textgen-server Removed 10.3s + ✔ Container tgi-service Removed 2.8s + ✔ Network xeon_default Removed 0.4s +``` -3. MegaService +All the Translation containers will be stopped and then removed on completion of the "down" command. - ```bash - curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{ - "language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' - ``` +## Translation Docker Compose Files -4. 
Nginx Service +The `compose.yaml` file is the default compose file and uses TGI as the serving framework. - ```bash - curl http://${host_ip}:${NGINX_PORT}/v1/translation \ - -H "Content-Type: application/json" \ - -d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' - ``` +| Service Name | Image Name | +| ------------------------------- | ------------------------------------------------------------- | +| tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu | +| llm | opea/llm-textgen:latest | +| translation-xeon-backend-server | opea/translation:latest | +| translation-xeon-ui-server | opea/translation-ui:latest | +| translation-xeon-nginx-server | opea/nginx:latest | -Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service. +## Translation Service Configuration -## 🚀 Launch the UI +The table below describes the services that make up the Translation deployment, as illustrated in the example Docker Compose files. Each row represents a distinct service, listing the images that can provide it, whether it is optional, and a concise description of its function within the deployment architecture. -Open this URL `http://{host_ip}:5173` in your browser to access the frontend. -![project-screenshot](../../../../assets/img/trans_ui_init.png) -![project-screenshot](../../../../assets/img/trans_ui_select.png) +| Service Name | Possible Image Names | Optional | Description | +| ------------------------------- | ------------------------------------------------------------- | -------- | ----------------------------------------------------------------------------------------------- | +| tgi-service | ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu | No | Specific to the TGI deployment, focuses on text generation inference using Xeon hardware.
| +| llm | opea/llm-textgen:latest | No | Handles large language model (LLM) tasks | +| translation-xeon-backend-server | opea/translation:latest | No | Serves as the backend for the Translation service, with variations depending on the deployment. | +| translation-xeon-ui-server | opea/translation-ui:latest | No | Provides the user interface for the Translation service. | +| translation-xeon-nginx-server | opea/nginx:latest | No | Acts as a reverse proxy, managing traffic between the UI and backend services. | diff --git a/Translation/docker_compose/intel/hpu/gaudi/README.md b/Translation/docker_compose/intel/hpu/gaudi/README.md index 31ed7da040..005504a1a3 100644 --- a/Translation/docker_compose/intel/hpu/gaudi/README.md +++ b/Translation/docker_compose/intel/hpu/gaudi/README.md @@ -1,161 +1,143 @@ -# Build MegaService of Translation on Gaudi +# Example Translation Deployment on Intel® Gaudi® Platform -This document outlines the deployment process for a Translation application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service. +This document outlines the deployment process for a Translation service utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server. This example includes the following sections: -## 🚀 Prepare Docker Images +- [Translation Quick Start Deployment](#translation-quick-start-deployment): Demonstrates how to quickly deploy a Translation service/pipeline on Intel® Gaudi® platform. +- [Translation Docker Compose Files](#translation-docker-compose-files): Describes some example deployments and their docker compose files. 
+- [Translation Service Configuration](#translation-service-configuration): Describes the service and possible configuration changes. -For Docker Images, you have two options to prepare them. +## Translation Quick Start Deployment -1. Pull the docker images from docker hub. +This section describes how to quickly deploy and test the Translation service manually on the Intel® Gaudi® platform. The basic steps are: - - More stable to use. - - Will be automatically downloaded when using docker compose command. +1. [Access the Code](#access-the-code) +2. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token) +3. [Configure the Deployment Environment](#configure-the-deployment-environment) +4. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose) +5. [Check the Deployment Status](#check-the-deployment-status) +6. [Test the Pipeline](#test-the-pipeline) +7. [Cleanup the Deployment](#cleanup-the-deployment) -2. Build the docker images from source. +### Access the Code - - Contain the latest new features. +Clone the GenAIExamples repository and access the Translation Docker Compose files and supporting scripts for the Intel® Gaudi® platform: - - Need to be manually build. - -If you choose to pull docker images form docker hub, skip to [Start Microservices](#start-microservices) part directly. - -Follow the instructions below to build the docker images from source. - -### 1. Build LLM Image - -```bash -git clone https://github.com/opea-project/GenAIComps.git -cd GenAIComps -docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile . ``` - -### 2. Build MegaService Docker Image - -To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `translation.py` Python script.
Build the MegaService Docker image using the command below: - -```bash -git clone https://github.com/opea-project/GenAIExamples -cd GenAIExamples/Translation -docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile . +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/Translation/docker_compose/intel/hpu/gaudi/ ``` -### 3. Build UI Docker Image - -Construct the frontend Docker image using the command below: +Checkout a released version, such as v1.2: -```bash -cd GenAIExamples/Translation/ui/ -docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . ``` - -### 4. Build Nginx Docker Image - -```bash -cd GenAIComps -docker build -t opea/nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/third_parties/nginx/src/Dockerfile . +git checkout v1.2 ``` -Then run the command `docker images`, you will have the following four Docker Images: +### Generate a HuggingFace Access Token -1. `opea/llm-textgen:latest` -2. `opea/translation:latest` -3. `opea/translation-ui:latest` -4. `opea/nginx:latest` +Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). 
-## 🚀 Start Microservices +### Configure the Deployment Environment -### Required Models +To set up environment variables for deploying the Translation service, source the _set_env.sh_ script in this directory: -By default, the LLM model is set to a default value as listed below: +``` +cd ../../../ +source set_env.sh +cd intel/hpu/gaudi/ +``` -| Service | Model | -| ------- | ----------------- | -| LLM | haoranxu/ALMA-13B | +The set_env.sh script will prompt for required and optional environment variables used to configure the Translation service. If a value is not entered, the script will use the default value for that variable. It will also generate an env file defining the desired configuration. Consult the section on [Translation Service configuration](#translation-service-configuration) for information on how service-specific configuration parameters affect deployments. -Change the `LLM_MODEL_ID` below for your needs. +### Deploy the Service Using Docker Compose -### Setup Environment Variables +To deploy the Translation service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute: -1. Set the required environment variables: +```bash +docker compose up -d +``` - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy" - export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - # Example: NGINX_PORT=80 - export NGINX_PORT=${your_nginx_port} - ``` +The Translation docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Gaudi® Platform: -2.
If you are in a proxy environment, also set the proxy-related environment variables: +``` +[+] Running 5/5 + ✔ Container tgi-gaudi-server Healthy 222.4s + ✔ Container llm-textgen-gaudi-server Started 221.7s + ✔ Container translation-gaudi-backend-server Started 222.0s + ✔ Container translation-gaudi-ui-server Started 222.2s + ✔ Container translation-gaudi-nginx-server Started 222.6s +``` - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - ``` +### Check the Deployment Status -3. Set up other environment variables: +After running `docker compose up`, check that all of the containers it launched have started: - ```bash - cd ../../../ - source set_env.sh - ``` +``` +docker ps -a +``` -### Start Microservice Docker Containers +For the default deployment, the following 5 containers should be running: -```bash -docker compose up -d +``` +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +097f577b3a53 opea/nginx:latest "/docker-entrypoint.…" 5 minutes ago Up About a minute 0.0.0.0:80->80/tcp, :::80->80/tcp translation-gaudi-nginx-server +0578b7034af3 opea/translation-ui:latest "docker-entrypoint.s…" 5 minutes ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp translation-gaudi-ui-server +bc23dd5b9cb0 opea/translation:latest "python translation.…" 5 minutes ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp translation-gaudi-backend-server +2cf6fabaa7c7 opea/llm-textgen:latest "bash entrypoint.sh" 5 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-textgen-gaudi-server +f4764d0c1817 ghcr.io/huggingface/tgi-gaudi:2.3.1 "/tgi-entrypoint.sh …" 5 minutes ago Up 5 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp tgi-gaudi-server ``` -> Note: The docker images will be automatically downloaded from `docker hub`: +### Test the Pipeline + +Once the Translation services are running, test the pipeline using the following command: ```bash -docker pull opea/llm-textgen:latest
-docker pull opea/translation:latest -docker pull opea/translation-ui:latest -docker pull opea/nginx:latest +curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{ + "language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' ``` -### Validate Microservices +**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file. -1. TGI Service +### Cleanup the Deployment - ```bash - curl http://${host_ip}:8008/generate \ - -X POST \ - -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \ - -H 'Content-Type: application/json' - ``` +To stop the containers associated with the deployment, execute the following command: -2. LLM Microservice +``` +docker compose -f compose.yaml down +``` - ```bash - curl http://${host_ip}:9000/v1/chat/completions \ - -X POST \ - -d '{"query":"Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"}' \ - -H 'Content-Type: application/json' - ``` +``` +[+] Running 6/6 + ✔ Container translation-gaudi-nginx-server Removed 10.5s + ✔ Container translation-gaudi-ui-server Removed 10.3s + ✔ Container translation-gaudi-backend-server Removed 10.4s + ✔ Container llm-textgen-gaudi-server Removed 10.4s + ✔ Container tgi-gaudi-server Removed 12.0s + ✔ Network gaudi_default Removed 0.4s +``` -3. MegaService +All the Translation containers will be stopped and then removed on completion of the "down" command. - ```bash - curl http://${host_ip}:8888/v1/translation -H "Content-Type: application/json" -d '{ - "language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' - ``` +## Translation Docker Compose Files -4. 
Nginx Service +The `compose.yaml` file is the default compose file and uses TGI as the serving framework. - ```bash - curl http://${host_ip}:${NGINX_PORT}/v1/translation \ - -H "Content-Type: application/json" \ - -d '{"language_from": "Chinese","language_to": "English","source_language": "我爱机器翻译。"}' - ``` +| Service Name | Image Name | +| -------------------------------- | ----------------------------------- | +| tgi-service | ghcr.io/huggingface/tgi-gaudi:2.3.1 | +| llm | opea/llm-textgen:latest | +| translation-gaudi-backend-server | opea/translation:latest | +| translation-gaudi-ui-server | opea/translation-ui:latest | +| translation-gaudi-nginx-server | opea/nginx:latest | -Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service. +## Translation Service Configuration -## 🚀 Launch the UI +The table below describes the services that make up the Translation deployment, as illustrated in the example Docker Compose files. Each row represents a distinct service, listing the images that can provide it, whether it is optional, and a concise description of its function within the deployment architecture. -Open this URL `http://{host_ip}:5173` in your browser to access the frontend. -![project-screenshot](../../../../assets/img/trans_ui_init.png) -![project-screenshot](../../../../assets/img/trans_ui_select.png) +| Service Name | Possible Image Names | Optional | Description | +| -------------------------------- | ----------------------------------- | -------- | ----------------------------------------------------------------------------------------------- | +| tgi-service | ghcr.io/huggingface/tgi-gaudi:2.3.1 | No | Specific to the TGI deployment, focuses on text generation inference using Gaudi hardware.
| +| llm | opea/llm-textgen:latest | No | Handles large language model (LLM) tasks | +| translation-gaudi-backend-server | opea/translation:latest | No | Serves as the backend for the Translation service, with variations depending on the deployment. | +| translation-gaudi-ui-server | opea/translation-ui:latest | No | Provides the user interface for the Translation service. | +| translation-gaudi-nginx-server | opea/nginx:latest | No | Acts as a reverse proxy, managing traffic between the UI and backend services. |
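
The status check and pipeline test described in the Gaudi quick start can be scripted. The following is a minimal sketch, not part of the repository: the helper names `build_translation_payload` and `missing_containers` are hypothetical, and the container names are assumed to match the `docker ps` listing shown for the default Gaudi deployment.

```bash
# Hypothetical helper: build the JSON body sent to the /v1/translation endpoint.
# Arguments are inserted verbatim, so they must not contain double quotes
# or backslashes.
build_translation_payload() {
  printf '{"language_from": "%s","language_to": "%s","source_language": "%s"}' \
    "$1" "$2" "$3"
}

# Hypothetical helper: print every expected container name that is absent
# from a newline-separated list of running container names.
#   $1      running names, one per line
#   $2...   expected container names
missing_containers() {
  running_names="$1"
  shift
  for expected_name in "$@"; do
    printf '%s\n' "$running_names" | grep -qxF -- "$expected_name" \
      || printf '%s\n' "$expected_name"
  done
}

# Example usage against a live deployment (assumes host_ip is set, for
# example by set_env.sh, and that Docker is available):
#
#   missing_containers "$(docker ps --format '{{.Names}}')" \
#     tgi-gaudi-server llm-textgen-gaudi-server \
#     translation-gaudi-backend-server translation-gaudi-ui-server \
#     translation-gaudi-nginx-server
#
#   curl "http://${host_ip}:8888/v1/translation" \
#     -H "Content-Type: application/json" \
#     -d "$(build_translation_payload Chinese English '我爱机器翻译。')"
```

If `missing_containers` prints nothing, all five expected containers are running. Keeping the payload builder separate from the `curl` call makes it easy to reuse the same request body against the Nginx port.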