[feat(whisper)] Add recognize_whisper by joy-void-joy · Pull Request #625 · Uberi/speech_recognition

joy-void-joy · 2022-09-28T20:24:53Z

Solve #624 by adding recognize_whisper to Recognizer.

This works by writing the audio to a tempfile, because of the input format whisper asks for.

Usage example:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print("Got it, now to recognize it...")

try:
    print("Whisper thinks you said " + r.recognize_whisper(audio, language='english'))
except sr.UnknownValueError:
    print("Whisper could not understand audio")
except sr.RequestError as e:
    print("Whisper error; {0}".format(e))

Add a recognizer for https://github.com/openai/whisper
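To illustrate the tempfile approach mentioned in the description, here is a minimal sketch that uses only the standard library. The helper name `write_wav_to_tempfile` and the 16 kHz, 16-bit mono parameters are assumptions for illustration; this is not the code from the PR.

```python
import os
import tempfile
import wave


def write_wav_to_tempfile(raw_pcm: bytes, sample_rate: int = 16000, sample_width: int = 2) -> str:
    """Write mono PCM frames to a temporary .wav file and return its path.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    # delete=False so the file can be reopened by path on every platform.
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    with wave.open(tmp.name, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(raw_pcm)
    return tmp.name


# One second of 16-bit silence at 16 kHz stands in for recorded audio.
path = write_wav_to_tempfile(b"\x00\x00" * 16000)
try:
    # A recognizer that needs a file path would be given `path` here.
    with wave.open(path, "rb") as check:
        n_frames = check.getnframes()
finally:
    # Always clean up the temporary file, even if recognition fails.
    os.remove(path)
```

The `try`/`finally` keeps the temporary file from being left behind when the consumer raises.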

ftnext · 2022-09-28T23:49:56Z

Thanks!
I'll check this later.

ftnext · 2022-11-06T02:47:53Z

+    def test_whisper_chinese(self):
+        r = sr.Recognizer()
+        with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source)
+        self.assertEqual(r.recognize_whisper(audio, model="small", language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")


model="small" is required.

✍️ When I specify model="base" (the default value), this test fails because the audio is recognized incorrectly.

======================================================================
FAIL: test_whisper_chinese (test_recognition.TestRecognition)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../speech_recognition-pr/tests/test_recognition.py", line 98, in test_whisper_chinese
    self.assertEqual(r.recognize_whisper(audio, language="chinese", **self.WHISPER_CONFIG), u"砸自己的腳")
AssertionError: "�<|translate|> I'm sorry." != '砸自己的腳'
- �<|translate|> I'm sorry.
+ 砸自己的腳

ftnext · 2022-11-06T02:55:20Z

+
+# recognize speech using whisper
+try:
+    print("Whisper thinks you said " + r.recognize_whisper(audio, language="english"))


It works!🎉 Thanks.

$ python examples/microphone_recognition.py
Say something!
/.../speech_recognition-pr/venv/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Whisper thinks you said Hello whisper

ftnext

Thanks a lot for your great PR.
Whisper works with SpeechRecognition😃

I am very sorry for the late review.
I would like to merge this once the MUST comment is addressed.

@joy-void-joy Can you respond to that comment?
If it's difficult for you, that is no problem.
I'll fix the MUST comment and merge this PR tonight (JST).

Let's discuss comments other than MUST after merge.

ftnext · 2022-11-06T03:46:37Z

+                **transcribe_options
+            )
+
+        if show_dict:


nit: A conditional expression (x if C else y) would make this more concise, but that is just my preference.
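As a rough sketch of that suggestion: the diff only shows the `if show_dict:` line, so the `else` branch returning `result["text"]` is an assumption here, and `select_output` is a hypothetical helper name rather than anything in the PR.

```python
def select_output(result: dict, show_dict: bool):
    """Return the full result dict, or only its text.

    Hypothetical helper; the "text" key is an assumption about the result shape.
    """
    # Conditional expression: one line instead of an if/else block.
    return result if show_dict else result["text"]


example = {"text": "Hello whisper", "language": "en"}
full = select_output(example, show_dict=True)
text_only = select_output(example, show_dict=False)
```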

ftnext · 2022-11-06T04:11:26Z

+        assert isinstance(audio_data, AudioData), "Data must be audio data"
+        import whisper
+
+        if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:


✍️memo: or short-circuits.

https://docs.python.org/3/reference/expressions.html#boolean-operations

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

When a non-empty dict is passed as load_options, the model is loaded.

When load_options is None or {} and the instance does not have a whisper_model attribute, the model is loaded.

When load_options is None or {} and the instance has a whisper_model attribute that does not contain the name given by model, the model is loaded.
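The three cases can be checked with a small stand-in. This is a sketch only: `ModelCache` and `fake_loader` are hypothetical names, and the caching body is an assumption. Only the `if` condition is taken from the diff above.

```python
class ModelCache:
    """Hypothetical stand-in for the recognizer's model caching."""

    def __init__(self, loader):
        self._loader = loader

    def get(self, model, load_options=None):
        # Same condition as in the diff; `or` stops at the first truthy operand,
        # so self.whisper_model is only touched once hasattr() has confirmed it exists.
        if load_options or not hasattr(self, "whisper_model") or self.whisper_model.get(model) is None:
            self.whisper_model = getattr(self, "whisper_model", {})
            self.whisper_model[model] = self._loader(model, **(load_options or {}))
        return self.whisper_model[model]


calls = []


def fake_loader(name, **kwargs):
    # Records each load so the number of loads can be inspected.
    calls.append((name, kwargs))
    return object()


cache = ModelCache(fake_loader)
first = cache.get("base")                 # no attribute yet: loads
second = cache.get("base")                # cached: does not load
cache.get("small")                        # name not in the cache: loads
cache.get("base", {"device": "cpu"})      # non-empty load_options: reloads
```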

ftnext · 2022-11-06T16:35:02Z

It seems the unit tests failed because whisper was not installed with pip.
I'll fix the unittest workflow file to install it.

ModuleNotFoundError: No module named 'whisper'

https://github.com/Uberi/speech_recognition/actions/runs/3405126020/jobs/5662866082

ftnext · 2022-11-06T16:46:42Z

FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

https://github.com/Uberi/speech_recognition/actions/runs/3405145980/jobs/5662903599

It seems that ffmpeg needs to be installed on the ubuntu-latest runner.
FYI: https://github.com/actions/runner-images/tree/main/images/linux

joy-void-joy · 2022-11-09T14:09:40Z

Thanks for the review. Do you need help with anything? I think the fp16 bug should be fixed in the new version of whisper. I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

ftnext · 2022-11-09T17:11:39Z

Thanks for your reply.

I'd rather we didn't specify fp16=False if a GPU is available as there is a significant slowdown on CPU

I agree.
I already merged #630 (fp16=torch.cuda.is_available()), and I believe the current implementation is similar to your idea.

If you have an idea to make it even slightly better, please send us a pull request.
Pull requests are always welcome!
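For reference, the idea behind fp16=torch.cuda.is_available() can be sketched as follows. The helper name `default_fp16` and the fallback when torch is not installed are assumptions made so the snippet runs anywhere; they are not taken from #630.

```python
def default_fp16() -> bool:
    """Use half precision only when a CUDA device is available.

    Hypothetical helper; falls back to False when torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# On CPU this yields fp16=False (avoiding the FP16 warning); on GPU it keeps fp16=True.
transcribe_options = {"fp16": default_fp16()}
```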

[feat(whisper)] Add recognize_whisper

282402b

Add a recognizer for https://github.com/openai/whisper

ftnext self-assigned this Sep 28, 2022

ftnext reviewed Nov 6, 2022

ftnext requested changes Nov 6, 2022

ftnext reviewed Nov 6, 2022

ftnext mentioned this pull request Nov 6, 2022

Support pocketsphinx 5.0.0 #626

Closed

5 tasks

Fix inline code markup

aa09576

ftnext added 2 commits November 7, 2022 01:37

Install whisper before running tests

65e20dd

Merge branch 'master' into whisper_integration

68b2438

Install ffmpeg to run whisper in unit tests

b3665f4

ftnext merged commit 7461563 into Uberi:master Nov 6, 2022

ftnext added the whisper Features related to Whisper label Nov 7, 2022

This was referenced Nov 7, 2022

Add audio_transcribe example for whisper #628

Open

Add a recognizer for whisper #624

Closed

whisper: address the warning "FP16 is not supported on CPU; using FP32 instead" #629

Closed

ftnext mentioned this pull request Nov 21, 2022

In whisper implementation, tempfile is not required; In-memory stream can be used instead #633

Closed

salehA13 mentioned this pull request Jan 30, 2026

feat: Add Whisper transcription examples to audio_transcribe.py #866

Open

ftnext merged 5 commits into Uberi:master from joy-void-joy:whisper_integration


2 participants
