Adds audio response toggle button to UI #60
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
…radio version Signed-off-by: Harsha Ramayanam <harsha.ramayanam@intel.com>
mhbuehler
left a comment
This looks really good! Most of my comments are minor style issues. I will also do some testing as soon as I can.
  "multimodalqna" \
  "multimodalqna-backend-server" \
- '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": [{"type": "text", "text": "goodbye"}]}]}'
+ '{"messages": [{"role": "user", "content": [{"type": "text", "text": "hello, "}, {"type": "image_url", "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"}}]}, {"role": "assistant", "content": "opea project! "}, {"role": "user", "content": "chao, "}], "max_tokens": 10, "modalities": ["text", "audio"]}'
I see that you're modifying this to get an audio response, but why are you changing the input query from audio to text?
Actually, I grabbed @okhleif-IL's branch into this, so I think these are his updates.
@mhbuehler I made that change. No particular reason; I think I just copy/pasted it from a text file I keep with various curl commands I use for testing.
If these changes are already in mmqna-phase3, I wonder why they are showing up in this diff.
It's weird: I synced with mmqna-phase3 again, but it's not clearing this diff. Anyway, I manually cleared the diff to sync with mmqna-phase3.
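For reference, the request body in the test command above asks for an audio response through the `modalities` field. A minimal Python sketch of building that body follows. The helper name `build_request` and its `audio_response` flag are hypothetical, and the text-only fallback of `["text"]` is an assumption, since the thread only shows the `["text", "audio"]` case.

```python
import json


def build_request(messages, audio_response=False, max_tokens=10):
    """Build a chat-style request body (hypothetical helper, not from the PR)."""
    # The ["text", "audio"] value comes from the test command in this thread.
    # Assumption: a text-only request lists just "text".
    modalities = ["text", "audio"] if audio_response else ["text"]
    return {
        "messages": messages,
        "max_tokens": max_tokens,
        "modalities": modalities,
    }


# Messages copied from the updated test command.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "hello, "},
            {
                "type": "image_url",
                "image_url": {"url": "https://www.ilankelman.org/stopsigns/australia.jpg"},
            },
        ],
    },
    {"role": "assistant", "content": "opea project! "},
    {"role": "user", "content": "chao, "},
]

payload = build_request(messages, audio_response=True)
body = json.dumps(payload)  # string suitable for a curl -d argument
```

Swapping the first user message for an `{"type": "audio", ...}` item, as in the original test command, would exercise audio input and audio output together.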
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
base64_frame = ""
# Include the original caption for the returned image/video
if self.caption and content[0]["type"] == "text":
    content[0]["text"] = content[0]["text"] + " " + self._template_caption()
This call to self._template_caption() is no longer happening, and it was an important fix that gives follow-up queries access to the original caption. Test this with the following steps: (1) upload an image with a caption that specifies the name of someone in the image, (2) query for the image based on the scene description, don't use the person's name, (3) after the image and response are returned, ask for the person's name in a follow-up query. It should give you the correct name.
Thanks for catching that. Fixed it here
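The caption behavior discussed in this thread can be illustrated in isolation. This is only a sketch under stated assumptions: `template_caption` stands in for the class's `self._template_caption()`, whose actual wording is not shown here, and `add_caption_to_content` is a hypothetical free function rather than the method in the PR.

```python
def template_caption(caption):
    # Hypothetical wording; the real _template_caption() output is not shown in this thread.
    return f"The caption associated with the image is: '{caption}'."


def add_caption_to_content(content, caption):
    """Append the templated caption to the first content item when it is text."""
    # Mirrors the condition in the snippet above: a caption exists and
    # the first content item is of type "text".
    if caption and content and content[0]["type"] == "text":
        content[0]["text"] = content[0]["text"] + " " + template_caption(caption)
    return content


content = [{"type": "text", "text": "A person standing next to a stop sign."}]
updated = add_caption_to_content(content, "This is Alex at the crossing.")
```

With the caption carried along in the message text, a follow-up query such as asking for the person's name has the original caption available in the conversation history.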
dmsuehir
left a comment
@HarshaRamayanam Thanks, looks like it's working well now. There's still one outstanding issue from earlier with that speecht5-gaudi entry in the build.yaml.
Revert Dockerfile
Description
This PR adds a checkbox to the existing UI to toggle audio responses on or off.
Proposed changes:
MultimodalQnA Tab that combines Text & Image Query tab and Audio Query tab into one single Text, Image & Audio Query tab.
Submit button which is replaced by built-in submit_btn of the gr.MultimodalTextbox component.
In addition, this PR also bumps gradio version to 5.17.1 in order for the gr.MultimodalTextbox component to work properly with the proposed changes. The gradio version is upgraded because of this recent bug fix push for gradio.
Issues
Issue #1549
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
gradio version upgrade
5.11.0 -> 5.17.1
Tests
Tested the UI for the following scenarios -