[feat(whisper)] Add recognize_whisper#625

Merged
ftnext merged 5 commits into Uberi:master from joy-void-joy:whisper_integration
Nov 6, 2022

@joy-void-joy joy-void-joy commented Sep 28, 2022

Contributor

Solves #624 by adding recognize_whisper to Recognizer.

It works by writing the audio to a temporary file, because of the input format Whisper expects.

Usage example:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print("Got it, now to recognize it...")

try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language='english'))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print("Whisper error; {0}".format(e))
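
The temporary-file step described above could look roughly like the following. This is a minimal sketch using only the standard library, not the code from this PR: the helper name write_temp_wav and its defaults (16 kHz, 16-bit mono) are assumptions for illustration, and the Whisper call itself is left as a comment.

```python
import os
import tempfile
import wave


def write_temp_wav(pcm_bytes, sample_rate=16000, sample_width=2):
    """Write raw mono PCM bytes to a temporary WAV file and return its path.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    # delete=False so the file can be reopened by name after this handle closes.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as handle:
        path = handle.name
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return path


# One second of 16-bit silence stands in for recorded audio.
path = write_temp_wav(b"\x00\x00" * 16000)
try:
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    # A speech model that expects a file path could be given `path` here.
finally:
    os.remove(path)  # clean up the temporary file
```

The caller removes the file itself, which keeps the sketch portable to platforms where an open temporary file cannot be reopened by name.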

ftnext commented Sep 28, 2022

Collaborator

Thanks!
I'll check this later.

@ftnext ftnext self-assigned this Sep 28, 2022
Comment thread tests/test_recognition.py
def test_whisper_chinese(self):
    r = sr.Recognizer()
    with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
    self.assertEqual(r.recognize_whisper(audio, model="small", language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")

Collaborator

model="small" is required here.

✍️ When I specified model="base" (the default value), this test failed because the audio was recognized incorrectly.

======================================================================
FAIL: test_whisper_chinese (test_recognition.TestRecognition)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../speech_recognition-pr/tests/test_recognition.py", line 98, in test_whisper_chinese
    self.assertEqual(r.recognize_whisper(audio, language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
AssertionError: "�<|translate|> I'm sorry." != '砸自己的腳'
- �<|translate|> I'm sorry.
+ 砸自己的腳


# recognize speech using whisper
try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))

Collaborator

It works!🎉 Thanks.

$ python examples/microphone_recognition.py
Say something!
/.../speech_recognition-pr/venv/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Whisper thinks you said  Hello whisper

@ftnext ftnext left a comment

Collaborator

Thanks a lot for your great PR.
Whisper works with SpeechRecognition 😃

I am very sorry for reviewing this so late.
I would like to merge this as soon as the MUST comment is addressed.

@joy-void-joy Can you respond to that comment?
If that is difficult for you, no problem.
I'll fix the MUST comment and merge this PR tonight (JST).

Let's discuss the comments other than MUST after the merge.

Comment thread README.rst Outdated
**transcribe_options
)

if show_dict:

Collaborator

nit: I think a conditional expression (x if C else y) would make this more concise, but that is just my preference.
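
To illustrate the suggestion, here is a small sketch of an if/else return collapsed into a conditional expression. The helper name format_result and the shape of the result dict (a "text" key) are assumptions for illustration; only the show_dict flag appears in the snippet under review.

```python
def format_result(result, show_dict=False):
    """Return the whole result dict, or only its text, in one expression.

    Hypothetical helper: assumes `result` is a dict with a "text" key.
    """
    # Equivalent to: if show_dict: return result / else: return result["text"]
    return result if show_dict else result["text"]


sample = {"text": " Hello whisper", "language": "en"}
text_only = format_result(sample)
full = format_result(sample, show_dict=True)
```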

Comment thread speech_recognition/__init__.py
assert isinstance(audio_data, AudioData), "Data must be audio data"
import whisper

if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:

@ftnext ftnext Nov 6, 2022

Collaborator

✍️ memo: the or operator short-circuits.

https://docs.python.org/3/reference/expressions.html#boolean-operations

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

  • When a non-empty dict is passed as load_options, the model is loaded
  • When load_options is None or {} and the instance does not have a whisper_model attribute, the model is loaded
  • When load_options is None or {} and the instance has a whisper_model attribute that does not contain the model name, the model is loaded
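
The three cases above can be checked with a small stand-in class. This is a sketch under stated assumptions, not the library code: FakeRecognizer, _get_model, and the tuple that stands in for a loaded model are all hypothetical, and only the condition itself is taken from the line under review.

```python
class FakeRecognizer:
    """Stand-in used to exercise the caching condition without Whisper."""

    def __init__(self):
        self.load_calls = []  # records every (re)load, for inspection

    def _get_model(self, model="base", load_options=None):
        # Same condition as in the reviewed line; `or` stops at the first
        # truthy operand, so .get() is never reached when the attribute
        # is missing.
        if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:
            self.whisper_model = getattr(self, "whisper_model", {})
            # Assumption: a tuple replaces the real model-loading call.
            self.whisper_model[model] = ("loaded", model, load_options or {})
            self.load_calls.append(model)
        return self.whisper_model[model]


r = FakeRecognizer()
r._get_model("base")                     # no attribute yet: loads
r._get_model("base")                     # cached: no load
r._get_model("small")                    # name not cached: loads
r._get_model("base", {"device": "cpu"})  # non-empty options: reloads
```

After these four calls, load_calls holds three entries, one per case in the list.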

@ftnext ftnext mentioned this pull request Nov 6, 2022
5 tasks
ftnext commented Nov 6, 2022

Collaborator

It seems the unit tests failed because whisper was not installed with pip.
I'll fix the unittest workflow file to install it.

ModuleNotFoundError: No module named 'whisper'

https://github.com/Uberi/speech_recognition/actions/runs/3405126020/jobs/5662866082

ftnext commented Nov 6, 2022

Collaborator

FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

https://github.com/Uberi/speech_recognition/actions/runs/3405145980/jobs/5662903599

It seems that ffmpeg needs to be installed on the ubuntu-latest runner.
FYI: https://github.com/actions/runner-images/tree/main/images/linux
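
A missing ffmpeg binary can also be detected before the tests run, which gives a clearer message than the FileNotFoundError above. The following is a minimal sketch using the standard library; the helper name ffmpeg_available is an assumption for illustration and is not part of this PR.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the named executable can be found on PATH.

    Hypothetical helper: the name is illustrative, not from the project.
    """
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; Whisper tests are likely to fail")
```

A test suite could use the same check with unittest.skipUnless to skip Whisper tests on machines without ffmpeg.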

@joy-void-joy

Contributor Author

Thanks for the review. Do you need help with anything? I think the fp16 bug should be fixed in the new version of whisper. I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

ftnext commented Nov 9, 2022

Collaborator

Thanks for your reply.

I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

I agree.
I already merged #630, which sets fp16=torch.cuda.is_available(), and I believe the current implementation is close to your idea.

If you have an idea to make it even slightly better, please send us a pull request.
Pull requests are always welcome!
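
The idea discussed above, enabling fp16 only when a GPU is present, can be sketched as follows. This is an illustration under assumptions, not the code merged in #630: the helper names default_fp16 and build_transcribe_options and the cuda_available parameter are hypothetical, added so the logic can be exercised without torch installed.

```python
def default_fp16(cuda_available=None):
    """Pick an fp16 default: True only when CUDA is available.

    Hypothetical helper. Passing cuda_available explicitly skips the
    torch lookup, which keeps the function testable without a GPU.
    """
    if cuda_available is None:
        try:
            import torch  # optional dependency in this sketch
        except ImportError:
            return False  # no torch, so fall back to FP32
        cuda_available = torch.cuda.is_available()
    return bool(cuda_available)


def build_transcribe_options(cuda_available=None, **overrides):
    """Merge the fp16 default with caller-supplied options (caller wins)."""
    options = {"fp16": default_fp16(cuda_available)}
    options.update(overrides)
    return options


cpu_options = build_transcribe_options(cuda_available=False)
gpu_options = build_transcribe_options(cuda_available=True)
forced = build_transcribe_options(cuda_available=True, fp16=False)
```

Letting explicit keyword arguments override the default keeps the caller in control when a specific precision is required.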

Labels

whisper Features related to Whisper
