MultimodalQnA audio features completion#1697

Closed
mhbuehler wants to merge 65 commits into opea-project:main from mhbuehler:mmqna-phase3

Conversation

Collaborator

mhbuehler commented Mar 19, 2025

Description

This PR completes the third and final phase of the RFC for MultimodalQnA image and audio support. The changes in GenAIExamples are listed below. An accompanying PR in GenAIComps is here.

New Features:

  • Added playable audio/TTS query responses from megaservice API
  • Added ability to upload, record, and send audio captions to dataprep API
  • Combined text, image, and audio query types into a unified multimodal text box
  • Added ability to list and delete files in the vector store
  • Parameterized UI timeout
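
As an illustration of the "Parameterized UI timeout" item, the following is a minimal sketch of how a UI process might read a timeout from its environment and fall back to a default. The variable name `UI_TIMEOUT`, the 200-second default, and the helper `get_ui_timeout` are assumptions made for this example; they are not taken from the PR's code.

```python
import os

# Assumed default for illustration only; the PR's actual default is not shown here.
DEFAULT_UI_TIMEOUT_SECONDS = 200.0


def get_ui_timeout(env=None, var_name="UI_TIMEOUT"):
    """Return the request timeout in seconds.

    Falls back to the default when the variable is unset, is not a
    number, or is not strictly positive. `var_name` is a hypothetical
    variable name.
    """
    env = os.environ if env is None else env
    raw = env.get(var_name, "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_UI_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_UI_TIMEOUT_SECONDS
```

A caller could then pass the result to whatever HTTP client the UI uses, for example `timeout=get_ui_timeout()`, so that the value set in the compose file takes effect without a code change.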

Bug Fixes:

  • Fixed PDF ingestion status
  • Fixed PDF clearing behavior

Issues

Image and Audio Support in MultimodalQnA RFC

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Version upgrades:

  • gradio: 5.11.0 -> 5.17.1
  • gradio_pdf: 0.0.19 -> 0.0.20

Tests

Updated:

  • MultimodalQnA/tests/test_compose_on_gaudi.sh
  • MultimodalQnA/tests/test_compose_on_xeon.sh
  • MultimodalQnA/tests/test_compose_on_rocm.sh

Co-authored-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Omar Khleif <omar.khleif@intel.com>

okhleif-10 and others added 30 commits February 5, 2025 10:24
* Added tests + updated docs for asr mp3 change

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* addressed review comments

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

---------
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Added logic for showing/deleting files from vector store

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added message to show when vector store is empty

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

* Update MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py

Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>

---------

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
* Parameterize UI timeout and increase default

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Add new variable to compose.yaml

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Update READMEs

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
…ts (#58)

* MultimodalQnA README and diagram updates for phase 3 enhancements

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Wording

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Update to remove your_* vars

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Updates based on review comments

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
* added TTS linkage to backend

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added modalities as a toggle

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* doc updates and code refactor

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added tts test to megaservice tests

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* addressed recent review comments

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

---------

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Add test for image and audio data ingestion

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* README updates

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Add Gaudi tests

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Add note about matching base names in test

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
* fixed test and added tts validation

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added gaudi test, reverted -speech change

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

---------

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
…radio version

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
dmsuehir and others added 20 commits March 12, 2025 08:49
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
* Enable audio caption upload in the UI

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Improve handling of unsupported audio formats

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Improve label and exception

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Replace exception with error message so audio component still works

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
…se (#64)

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
* Add missing env vars for MMQnA UI data prep endpoints

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Remove dockerfile branch

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Revert Dockerfile
Adds audio response toggle button to UI

github-actions Bot commented Mar 19, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

  • MultimodalQnA/ui/gradio/requirements.txt

Comment thread MultimodalQnA/Dockerfile
  RUN apt-get update && apt-get install -y --no-install-recommends git
- RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git
+ # RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git
+ RUN git clone --single-branch --branch="mmqna-phase3" https://github.com/mhbuehler/GenAIComps.git
Collaborator Author

This change is for testing purposes and has to be reverted before merging.
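
A temporary clone like this is easy to forget, so a small pre-merge check can help. The sketch below scans Dockerfile text for active `git clone` lines that do not point at the upstream GenAIComps URL. The function name `find_temporary_clones` is hypothetical, and the check is only a plain substring match, not part of this PR or of any existing CI job.

```python
UPSTREAM_URL = "https://github.com/opea-project/GenAIComps.git"


def find_temporary_clones(dockerfile_text, upstream_url=UPSTREAM_URL):
    """Return active `git clone` lines that do not use the upstream URL.

    Commented-out lines are ignored. This is a substring check, so it
    will not notice an upstream clone pinned to a temporary branch.
    """
    offenders = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comments, including the disabled upstream clone
        if "git clone" in stripped and upstream_url not in stripped:
            offenders.append(stripped)
    return offenders


# The Dockerfile lines under review in this thread.
sample = "\n".join([
    "RUN apt-get update && apt-get install -y --no-install-recommends git",
    "# RUN git clone --depth 1 https://github.com/opea-project/GenAIComps.git",
    'RUN git clone --single-branch --branch="mmqna-phase3" https://github.com/mhbuehler/GenAIComps.git',
])
offenders = find_temporary_clones(sample)
```

Run against the lines above, the check flags only the fork clone; once the Dockerfile is reverted to the upstream clone, the returned list is empty.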

ashahba added this to the v1.3 milestone Mar 19, 2025
mhbuehler closed this Mar 19, 2025
mhbuehler deleted the mmqna-phase3 branch March 19, 2025 21:56
letonghan pushed a commit that referenced this pull request Sep 17, 2025
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
letonghan pushed a commit that referenced this pull request Sep 17, 2025
* add support for remote server

Signed-off-by: alexsin368 <alex.sin@intel.com>

* add steps to enable remote server

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove use_remote_service

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add OpenAI models instructions, fix format of commands

Signed-off-by: alexsin368 <alex.sin@intel.com>

* simplify ChatOpenAI instantiation

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "simplify ChatOpenAI instantiation"

This reverts commit b7c4acf7d397a284f1499254fa8832533c0c98e3.

* add back check and logic for llm_engine, set openai_key argument

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Provide ARCH option for lvm-video-llama image build (#1630)

Signed-off-by: ZePan110 <ze.pan@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* Add sglang microservice for supporting llama4 model (#1640)

Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Co-authored-by: Lv,Liang1 <liang1.lv@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* Remove invalid codeowner. (#1642)

Signed-off-by: ZePan110 <ze.pan@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* add support for remote server

Signed-off-by: alexsin368 <alex.sin@intel.com>

* add steps to enable remote server

Signed-off-by: alexsin368 <alex.sin@intel.com>

* remove use_remote_service

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: alexsin368 <alex.sin@intel.com>

* bug fix for chunk_size and overlap cause error in dataprep ingestion (#1643)

* bug fix for dataingest url

Signed-off-by: Mustafa <mustafa.cetin@intel.com>

* add validation function

Signed-off-by: Mustafa <mustafa.cetin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* validation update

Signed-off-by: Mustafa <mustafa.cetin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update validation function

Signed-off-by: Mustafa <mustafa.cetin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* MariaDB Vector integrations for retriever & dataprep services (#1645)

* Add MariaDB Vector third-party service

MariaDB Vector was introduced since MariaDB Server 11.7

Signed-off-by: Razvan-Liviu Varzaru <razvan@mariadb.org>

* Add retriever MariaDB Vector integration

Signed-off-by: Razvan-Liviu Varzaru <razvan@mariadb.org>

* Add dataprep MariaDB Vector integration

Signed-off-by: Razvan-Liviu Varzaru <razvan@mariadb.org>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix CI failures

- md5 is used for the primary key not as a security hash
- fixed mariadb readme headers

Signed-off-by: Razvan-Liviu Varzaru <razvan@mariadb.org>

---------

Signed-off-by: Razvan-Liviu Varzaru <razvan@mariadb.org>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* update PR reviewers (#1651)

Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* Expand test matrix, find all tests use 3rd party Dockerfiles (#1676)

* Expand test matrix, find all tests use 3rd party Dockerfiles

Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* fix the typo of README.md Comp (#1679)

Update README.md for first entry of OPEA

Signed-off-by: alexsin368 <alex.sin@intel.com>

* Fix request handle timeout issue (#1687)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* FEAT: Enable OPEA microservices to start as MCP servers (#1635)

Signed-off-by: alexsin368 <alex.sin@intel.com>

* Fix huggingface_hub API upgrade issue (#1691)

* Fix huggingfacehub API upgrade issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* add OpenAI models instructions, fix format of commands

Signed-off-by: alexsin368 <alex.sin@intel.com>

* Fix dataprep opensearch ingest issue (#1697)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* Fix embedding issue with ArangoDB due to deprecated HuggingFace API (#1694)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: alexsin368 <alex.sin@intel.com>

* simplify ChatOpenAI instantiation

Signed-off-by: alexsin368 <alex.sin@intel.com>

* Revert "simplify ChatOpenAI instantiation"

This reverts commit b7c4acf7d397a284f1499254fa8832533c0c98e3.

Signed-off-by: alexsin368 <alex.sin@intel.com>

* add back check and logic for llm_engine, set openai_key argument

Signed-off-by: alexsin368 <alex.sin@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: alexsin368 <alex.sin@intel.com>
Signed-off-by: ZePan110 <ze.pan@intel.com>
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
Signed-off-by: Mustafa <mustafa.cetin@intel.com>
Signed-off-by: Razvan-Liviu Varzaru <razvan@mariadb.org>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ying Hu <ying.hu@intel.com>
Co-authored-by: ZePan110 <ze.pan@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
Co-authored-by: Mustafa <109312699+MSCetin37@users.noreply.github.com>
Co-authored-by: Razvan Liviu Varzaru <45736827+RazvanLiviuVarzaru@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Spycsh <39623753+Spycsh@users.noreply.github.com>
5 participants