Adds audio response toggle button to UI #60

Merged
HarshaRamayanam merged 40 commits into mmqna-phase3 from hramayan/tts-mmqna-ui on Mar 18, 2025

Collaborator

@HarshaRamayanam HarshaRamayanam commented Mar 4, 2025

Description

This PR adds a checkbox to the existing UI to toggle audio responses on or off.
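
As a rough illustration of what the toggle could control, here is a minimal sketch. It assumes the checkbox state ends up as the `modalities` field that appears in the test payload in this PR. `build_payload` and `audio_response` are hypothetical names, and omitting `modalities` when the box is unchecked is an assumption, not something confirmed by the PR's code.

```python
def build_payload(messages: list, audio_response: bool, max_tokens: int = 10) -> dict:
    """Build a request body, asking for audio only when the toggle is on.

    Hypothetical helper: the field names follow the test payload in this PR,
    but how the UI actually wires the checkbox is an assumption.
    """
    payload = {"messages": messages, "max_tokens": max_tokens}
    if audio_response:
        # Same value as in the PR's test payload.
        payload["modalities"] = ["text", "audio"]
    return payload


# Example: checkbox ticked vs. unticked.
messages = [{"role": "user", "content": [{"type": "text", "text": "hello"}]}]
with_audio = build_payload(messages, audio_response=True)
text_only = build_payload(messages, audio_response=False)
```

With the box ticked, the body carries `"modalities": ["text", "audio"]`; otherwise the field is left out and the backend default applies.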

Proposed changes:

  • Modified the layout of the MultimodalQnA tab to combine the Text & Image Query tab and the Audio Query tab into a single Text, Image & Audio Query tab.
  • Removed the Submit button, which is replaced by the built-in submit_btn of the gr.MultimodalTextbox component.
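
To make the single-tab idea concrete, here is a minimal sketch of turning one combined input into a message content list. It assumes the input is a dict with "text" and "files" keys (the shape gr.MultimodalTextbox hands to its callbacks) and that the output should look like the content entries in this PR's test payloads. `to_message_content` is a hypothetical name, and routing files by MIME type is an assumption rather than the PR's actual logic.

```python
import base64
import mimetypes
from pathlib import Path


def to_message_content(value: dict) -> list[dict]:
    """Convert {"text": ..., "files": [...]} into a list of content parts.

    Hypothetical helper: the part shapes follow the test payloads in this PR,
    the routing rules below are assumptions.
    """
    content = []
    text = (value.get("text") or "").strip()
    if text:
        content.append({"type": "text", "text": text})
    for path in value.get("files") or []:
        mime, _ = mimetypes.guess_type(path)
        if mime and mime.startswith("audio/"):
            # Audio is sent inline as base64, as in the test payload.
            encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
            content.append({"type": "audio", "audio": encoded})
        elif mime and mime.startswith("image/"):
            # Assumption: images are passed by reference (URL or path).
            content.append({"type": "image_url", "image_url": {"url": path}})
        else:
            raise ValueError(f"Unsupported file type: {path}")
    return content
```

A text-only query yields a single text part, while a query with attachments yields one part per input, so one handler can serve what used to be two tabs.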

In addition, this PR bumps the gradio version to 5.17.1 so that the gr.MultimodalTextbox component works properly with the proposed changes. The upgrade is needed because of this recent bug fix pushed to gradio.

Issues

Issue #1549

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

gradio version upgrade: 5.11.0 -> 5.17.1

Tests

Tested the UI for the following scenarios:

  • With an empty vector store, queries (text, image, audio, or a combination) fail gracefully with a message in the chatbot.
  • Text, image, and audio queries, and possible combinations of them, run without any errors.

okhleif-10 and others added 18 commits February 5, 2025 13:30
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
…radio version

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Owner

@mhbuehler mhbuehler left a comment

This looks really good! Most of my comments are minor style issues. I will also do some testing as soon as I can.

"multimodalqna" \
"multimodalqna-backend-server" \
'{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": [{"type": "text", "text": "goodbye"}]}]}'
'{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10, "modalities": ["text", "audio"]}'
Owner

I see that you're modifying this to get an audio response, but why are you changing the input query from audio to text?

Collaborator Author

Actually, I pulled @okhleif-IL's branch into this one, so I think these are his updates.

Collaborator

@mhbuehler I made that change for no particular reason. I think I just copied and pasted it from a text file of curl commands I use for testing.

Owner

If these changes are already in mmqna-phase3, I wonder why they are showing up in this diff.

Collaborator Author

It's weird: I synced with mmqna-phase3 again, but that isn't clearing this diff. Anyway, I manually cleared the diff to match mmqna-phase3.

Comment thread MultimodalQnA/ui/gradio/conversation.py Outdated
Comment thread MultimodalQnA/ui/gradio/conversation.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py
Comment thread MultimodalQnA/ui/gradio/utils.py Outdated
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/docker_image_build/build.yaml Outdated
Comment thread MultimodalQnA/ui/gradio/conversation.py Outdated
Comment thread MultimodalQnA/ui/gradio/utils.py Outdated
Comment thread MultimodalQnA/ui/gradio/conversation.py Outdated
Comment thread MultimodalQnA/ui/gradio/conversation.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py
HarshaRamayanam and others added 2 commits March 6, 2025 16:33
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
base64_frame = ""
# Include the original caption for the returned image/video
if self.caption and content[0]["type"] == "text":
    content[0]["text"] = content[0]["text"] + " " + self._template_caption()
Owner

This call to self._template_caption() is no longer happening, and it was an important fix that gives follow-up queries access to the original caption. Test this with the following steps: (1) upload an image with a caption that specifies the name of someone in the image, (2) query for the image based on the scene description, don't use the person's name, (3) after the image and response are returned, ask for the person's name in a follow-up query. It should give you the correct name.
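
For readers following along, the behavior described above can be sketched roughly as follows. This is a standalone illustration, not the code in conversation.py: `ConversationSketch`, `with_caption`, and the wording inside `_template_caption` are hypothetical, and the extra emptiness check on `content` is an addition for safety.

```python
class ConversationSketch:
    """Minimal stand-in showing how a stored caption reaches follow-up queries."""

    def __init__(self, caption=None):
        # Caption supplied when the image or video was uploaded.
        self.caption = caption

    def _template_caption(self) -> str:
        # Hypothetical wording; the real template is not shown in this PR.
        return f"The caption associated with the image is: '{self.caption}'."

    def with_caption(self, content: list) -> list:
        # Mirrors the reviewed snippet: append the caption to the first text part.
        if self.caption and content and content[0]["type"] == "text":
            content[0]["text"] = content[0]["text"] + " " + self._template_caption()
        return content


# Example: the follow-up question never mentions the name, but the prompt now does.
conv = ConversationSketch(caption="Alice standing next to a stop sign")
follow_up = conv.with_caption([{"type": "text", "text": "What is the person's name?"}])
```

If the call is dropped, the follow-up prompt contains only the question, which is why the third step of the test above would fail.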

Collaborator Author

@HarshaRamayanam HarshaRamayanam Mar 10, 2025

Thanks for catching that. Fixed it here

Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Collaborator

@dmsuehir dmsuehir left a comment

@HarshaRamayanam Thanks, looks like it's working well now. There's still one outstanding issue from earlier with that speecht5-gaudi entry in the build.yaml.

Comment thread MultimodalQnA/docker_image_build/build.yaml Outdated
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Collaborator

@dmsuehir dmsuehir left a comment

LGTM 🎉

Comment thread MultimodalQnA/ui/gradio/multimodalqna_ui_gradio.py Outdated
Comment thread MultimodalQnA/ui/gradio/utils.py Outdated
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
Owner

@mhbuehler mhbuehler left a comment

LGTM

Revert Dockerfile
@HarshaRamayanam HarshaRamayanam merged commit e24bddf into mmqna-phase3 Mar 18, 2025

4 participants