Adds audio response toggle button to UI by HarshaRamayanam · Pull Request #60 · mhbuehler/GenAIExamples

HarshaRamayanam · 2025-03-04T20:05:05Z

Description

This PR adds a checkbox to the existing UI to toggle audio responses on or off.

Proposed changes:

Modified the layout of the MultimodalQnA tab, combining the Text & Image Query tab and the Audio Query tab into a single Text, Image & Audio Query tab.
Removed the Submit button, which is replaced by the built-in submit_btn of the gr.MultimodalTextbox component.

In addition, this PR bumps the gradio version to 5.17.1 so that the gr.MultimodalTextbox component works properly with the proposed changes. The upgrade is needed to pick up a recent gradio bug fix.
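The layout change described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `split_submission`, `build_demo`, `respond`, the labels, and the `file_types` choice are hypothetical. Only `gr.MultimodalTextbox`, its built-in `submit_btn`, and the audio-response checkbox come from the description. gradio is imported lazily so that the helper can be exercised without it.

```python
from __future__ import annotations

from typing import Any


def split_submission(value: dict[str, Any] | None) -> tuple[str, list[str]]:
    """Split a gr.MultimodalTextbox value into stripped text and file paths.

    The component passes a dict with "text" and "files" keys to the handler.
    Text that consists only of whitespace is treated as empty.
    """
    value = value or {}
    text = (value.get("text") or "").strip()
    files = list(value.get("files") or [])
    return text, files


def build_demo():
    """Build one combined query tab with an audio-response checkbox (hypothetical layout)."""
    import gradio as gr  # lazy import; requires gradio to be installed

    def respond(value, audio_on, history):
        text, files = split_submission(value)
        history = list(history or [])
        if text or files:
            summary = text or f"{len(files)} file(s) uploaded"
            history.append({"role": "user", "content": summary})
            mode = "text and audio" if audio_on else "text only"
            # Placeholder reply; a real handler would call the backend here.
            history.append({"role": "assistant", "content": f"(reply, {mode})"})
        # Returning a component with value=None resets the textbox.
        return history, gr.MultimodalTextbox(value=None)

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot(type="messages", label="MultimodalQnA")
        audio_toggle = gr.Checkbox(label="Audio response", value=False)
        query = gr.MultimodalTextbox(
            file_types=["image", "audio"],  # assumed restriction, for illustration
            submit_btn=True,  # built-in button, so no separate Submit button
            placeholder="Enter text, or upload an image or audio file",
        )
        query.submit(respond, [query, audio_toggle, chatbot], [chatbot, query])
    return demo
```

Calling `build_demo().launch()` would start the sketch locally, assuming gradio is installed.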

Issues

Issue #1549

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

gradio version upgrade
5.11.0 -> 5.17.1
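
A small guard for the version requirement above could look like the sketch below. It is only an illustration: `parse_version`, `meets_minimum`, and `installed_gradio_version` are hypothetical helpers, and the parser assumes plain dotted release strings such as "5.17.1" (no pre-release suffixes).

```python
from __future__ import annotations

from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted release string such as "5.17.1" into (5, 17, 1)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, required: str = "5.17.1") -> bool:
    """Compare versions numerically, so "5.9.0" sorts below "5.17.1"."""
    return parse_version(installed) >= parse_version(required)


def installed_gradio_version() -> str | None:
    """Return the installed gradio version, or None if gradio is absent."""
    try:
        return metadata.version("gradio")
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings avoids the trap where "5.9.0" would sort above "5.17.1" lexicographically.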

Tests

Tested the UI for the following scenarios:

With an empty vector store, queries (text, image, audio, or any combination) fail gracefully with a message in the chatbot.
Text, image, and audio queries, and combinations of them, run without errors.


mhbuehler

This looks really good! Most of my comments are minor style issues. I will also do some testing as soon as I can.

mhbuehler · 2025-03-05T21:52:28Z

        "multimodalqna" \
        "multimodalqna-backend-server" \
-        '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": [{"type": "text", "text": "goodbye"}]}]}'
+        '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10, "modalities": ["text", "audio"]}'


I see that you're modifying this to get an audio response, but why are you changing the input query from audio to text?

Actually, I merged @okhleif-IL's branch into this one, and I think these are his updates.

@mhbuehler I made that change, for no particular reason. I think I just copy/pasted it from a text file of curl commands that I use for testing.

If these changes are already in mmqna-phase3, I wonder why they are showing up in this diff.

It's weird: I synced with mmqna-phase3 again, but that did not clear this diff. Anyway, I manually cleared the diff so that it matches mmqna-phase3.
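
For reference, the request body from the diff above can be built programmatically. The sketch below is an assumption-laden illustration: `build_request` is a hypothetical helper, and using `["text"]` for the text-only case is an assumption, since the diff only shows the `["text", "audio"]` form.

```python
import json


def build_request(messages, audio_response, max_tokens=10):
    """Assemble a request body whose modalities follow the audio toggle."""
    # Assumption: a text-only response is requested with ["text"].
    modalities = ["text", "audio"] if audio_response else ["text"]
    return {
        "messages": messages,
        "max_tokens": max_tokens,
        "modalities": modalities,
    }


# Messages copied from the test payload in the diff above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "hello, "},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
                },
            },
        ],
    },
    {"role": "assistant", "content": "opea project! "},
    {"role": "user", "content": "chao, "},
]

body = json.dumps(build_request(messages, audio_response=True))
```

The resulting `body` string can be passed as the JSON payload of the curl command used in the test script.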


mhbuehler · 2025-03-10T17:50:50Z

-                                base64_frame = ""
-                            # Include the original caption for the returned image/video
-                            if self.caption and content[0]["type"] == "text":
-                                content[0]["text"] = content[0]["text"] + " " + self._template_caption()


This call to self._template_caption() no longer happens, and it was an important fix that gives follow-up queries access to the original caption. To test it: (1) upload an image with a caption that names someone in the image, (2) query for the image using the scene description, without the person's name, (3) after the image and response are returned, ask for the person's name in a follow-up query. It should give you the correct name.

Thanks for catching that. Fixed it here
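
The behavior of the removed lines can be sketched as a standalone function. This is an illustration only: `append_caption` is a hypothetical name, the templated caption is passed in as a plain string because the body of `self._template_caption()` is not shown here, and unlike the original lines this version returns a copy instead of mutating `content` in place.

```python
from __future__ import annotations


def append_caption(content: list[dict], templated_caption: str | None) -> list[dict]:
    """Append the templated caption to the leading text part, if there is one.

    Mirrors the removed check: only act when a caption exists and the first
    content item has type "text". Other items are left untouched.
    """
    if not templated_caption or not content or content[0].get("type") != "text":
        return content
    first = dict(content[0])  # shallow copy so the caller's dict is unchanged
    first["text"] = first["text"] + " " + templated_caption
    return [first] + content[1:]


# Hypothetical example data for a follow-up query.
content = [
    {"type": "text", "text": "Who is in the picture?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]
updated = append_caption(content, "Caption: Alice at the beach.")
```

With the caption carried along in the text part, a follow-up question about the person's name has the information it needs.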


dmsuehir

@HarshaRamayanam Thanks, looks like it's working well now. There's still one outstanding issue from earlier with that speecht5-gaudi entry in the build.yaml.


dmsuehir

LGTM 🎉


mhbuehler

LGTM


okhleif-10 and others added 18 commits February 5, 2025 13:30

first commit for tts addition

11f797e

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

added TTS linkage to backend

05ddb11

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

removed unused import

ee62b73

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

added necessary env vars

0f4e77d

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

Merge remote-tracking branch 'origin/mmqna-phase3' into omar/tts-mmqna

fc99972

reworked temp tts toggle logic

e500c10

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

added modalities as a toggle

aafee33

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

removed print statement

e686ec3

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

removed gaudi from tts

e4ae51d

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

Merge remote-tracking branch 'origin/mmqna-phase3' into omar/tts-mmqna

0818fff

doc updates and code refactor

a1c7adb

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

Merge remote-tracking branch 'origin/mmqna-phase3' into omar/tts-mmqna

0c056a4

added tts test to megaservice tests

632a60b

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

remove log files

08ab760

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

addressed recent review comments

220096e

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

Merge branch 'mmqna-phase3' into hramayan/tts-mmqna-ui

186f7a8

Added Logic for audio responses & refactored code to align with new gradio version

2137998

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Merge branch 'mmqna-phase3' into hramayan/tts-mmqna-ui

a575dd3

HarshaRamayanam requested a review from mhbuehler as a code owner March 4, 2025 20:05

HarshaRamayanam added 2 commits March 4, 2025 16:47

Minor bug fixes and UI changes

59fb709

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

UI layout update & handling empty text with spaces

4013a0d

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

mhbuehler requested changes Mar 5, 2025


HarshaRamayanam added 3 commits March 5, 2025 15:02

Updates on review comments

cd4c645

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Update on review comments

a2cf4dd

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Merge branch 'mmqna-phase3' into hramayan/tts-mmqna-ui

b5a0e27

dmsuehir requested changes Mar 7, 2025


dmsuehir mentioned this pull request Mar 7, 2025

[Feature] Phase 3 image/audio enhancements to MultimodalQnA in v1.3 opea-project/GenAIExamples#1549

Closed

7 tasks

HarshaRamayanam and others added 2 commits March 6, 2025 16:33

Update MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py

58734e9

Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>

Some updates to review comments. More to come after testing

1e09283

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

dmsuehir reviewed Mar 7, 2025


Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated

dmsuehir reviewed Mar 7, 2025


Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated

HarshaRamayanam added 3 commits March 7, 2025 10:52

Restrict file media types to known/working formats

2c4ead5

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Remove extra whitespace

1ce67e2

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Fix test_compose_on_gaudi.sh script's diff not syncing with phase3

e9f0cd0

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

mhbuehler reviewed Mar 10, 2025


HarshaRamayanam added 7 commits March 10, 2025 16:53

Changes per review comments

5ad1c18

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Added single space to the pload

3a34ec2

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Added logic to flush chatbot assistant's voice response .wav

5b47407

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Merge branch 'mmqna-phase3' into hramayan/tts-mmqna-ui

9189732

Merge branch 'mmqna-phase3' into hramayan/tts-mmqna-ui

dea974b

Fixed issue where assistant's image is not sent

d2a2bc4

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Merge branch 'mmqna-phase3' into hramayan/tts-mmqna-ui

c1843f7

dmsuehir requested changes Mar 18, 2025


Comment thread MultimodalQnA/docker_image_build/build.yaml Outdated

HarshaRamayanam added 2 commits March 18, 2025 11:51

Revert build yaml

4ed2117

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

Clear diff

b4ba36c

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

dmsuehir approved these changes Mar 18, 2025


mhbuehler reviewed Mar 18, 2025


Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated

mhbuehler reviewed Mar 18, 2025


Comment thread MultimodalQnA/ui/gradio/utils.py Outdated

HarshaRamayanam added 2 commits March 18, 2025 14:57

changes per review

bc43cc1

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

small change

9aad174

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>

mhbuehler approved these changes Mar 18, 2025


Update Dockerfile

abf0200

Revert Dockerfile

HarshaRamayanam merged commit e24bddf into mmqna-phase3 Mar 18, 2025
