Speech Recognition API


Google has a great Speech Recognition API. This API converts spoken audio (from a microphone) into written text (Python strings). In short, it does speech to text. You can simply speak into a microphone, and the Google API will transcribe what you said into written text. The API gives excellent results for the English language.

A speech recognition API offloads the recognition work. Your program sends the recorded audio in a web request, and the API returns the text it recognized. You can do this directly from Python code, but your script needs internet access to reach the API.


Installation

Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone.
Are you looking for text to speech instead?

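This is the installation guide for Ubuntu Linux, but it will probably work on other platforms as well. You will need to install a few packages: PortAudio, PyAudio and SpeechRecognition. PyAudio 0.2.9 or later is required, and you may need to compile it manually. Install the PortAudio development headers before compiling PyAudio. Otherwise the build fails with "portaudio.h: No such file or directory".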

sudo apt-get install python3-dev
sudo apt-get install libportaudio2 libportaudiocpp0 portaudio19-dev
git clone http://people.csail.mit.edu/hubert/git/pyaudio.git
cd pyaudio
sudo python3 setup.py install
sudo pip3 install SpeechRecognition
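Before running the program, you can check that Python actually finds both modules. The following is a minimal sketch that uses only the standard library. The helper name `missing_modules` is our own and is not part of SpeechRecognition. Note that the import names differ from the package names: SpeechRecognition is imported as `speech_recognition`, and PyAudio is imported as `pyaudio`.

```python
import importlib.util

# Import names (not the pip package names) of the modules the program needs.
REQUIRED_MODULES = ("speech_recognition", "pyaudio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that Python cannot find on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("SpeechRecognition and PyAudio are installed.")
```

If `pyaudio` is reported missing after the build, it was probably installed for a different Python version than the one running the check.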

Program

This program will record audio from your microphone, send it to the speech API and return a Python string.

The audio is recorded using the speech_recognition module, which is imported at the top of the program. Next, the recorded speech is sent to the Google Speech Recognition API, which returns the recognized text.
r.recognize_google(audio) returns a string.


#!/usr/bin/env python3
# Requires PyAudio and SpeechRecognition.

import speech_recognition as sr

# Record Audio
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Speech recognition using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

Posted in robotics

2015-07-05

Leave a Reply:

skyler • Thu, 11 Aug 2016

Does this work on Mac?
Changing sudo apt-get to brew?

Frank • Thu, 11 Aug 2016

The module works on Mac too, but I'm not sure if the Google Speech Recognition API is still publicly available. The module provides access to several other speech engines such as CMU Sphinx, Wit.ai, api.ai and IBM Speech to Text.

To install on Mac, I think you can use pip:


sudo easy_install pip
pip install SpeechRecognition

but I don't have a Mac, so I'm not sure.
The official module site is SpeechRecognition.

Mark Tomson • Sat, 25 Mar 2017

How do you set the language to Italian?

Frank • Sat, 25 Mar 2017

The language you can set depends on the recognition engine, and it only works if that engine supports the language. Out of the box, the Sphinx engine supports US English. French and Chinese are available as additional language packs.
To set a language, use the language parameter.

Recognition engines may change over time (free to commercial/API key) or not include every language. However, the principle of using the speech_recognition module remains the same.

The recognition engines are:

recognize_sphinx
recognize_google
recognize_wit
recognize_bing
recognize_api
recognize_houndify
recognize_ibm

Then specify the language as a parameter, depending on the speech engine.

For Sphinx,


recognizer_instance.recognize_sphinx(audio_data, language = "en-US", keyword_entries = None, show_all = False)

For Google,


r.recognize_google(audio, language="it-IT")

For IBM,


recognizer_instance.recognize_ibm(audio_data, username, password, language = "en-US", show_all = False)

soumya g • Mon, 27 Mar 2017

Running this code snippet gives me the below error:


ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave

Any suggestions on how I can proceed from here?

Frank • Tue, 28 Mar 2017

Try the same code on another computer or another recognition engine

Mark Tomson • Thu, 30 Mar 2017

Ok. Thank you very much

Aleem Firnas • Sun, 02 Apr 2017

After cloning the project from Git, I cd into pyaudio. When I then try sudo python setup.py install, it throws an error:
" src/_portaudiomodule.c:29:23: fatal error: portaudio.h: No such file or directory
compilation terminated. "

Frank • Sun, 02 Apr 2017

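A dependency called PortAudio is missing. PyAudio is a Python binding for this audio library, so the PortAudio headers must be installed before PyAudio can be compiled: http://www.portaudio.com/
On Mac:


brew install portaudio
brew link portaudio
sudo pip install pyaudio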

On Linux: http://askubuntu.com/questions/736238/how-do-i-install-and-setup-the-environment-for-using-portaudio/

Shubham Jain • Sun, 16 Apr 2017

Can I use this on Windows? If yes, please show me the steps.

Frank • Sun, 16 Apr 2017

Yes, with one of the other speech engines it should work. Install the required modules and run it using Python.

Akhilesh Kumar • Wed, 26 Apr 2017

How do I end listening at the line audio = r.listen(source)?
Execution seems to be stuck at this line...

Frank • Sat, 29 Apr 2017

It should end automatically, try changing the speech engine.
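If listen() never returns, one possible cause is background noise. The noise can prevent the recognizer from detecting the pause that ends a phrase. The following is a minimal sketch, assuming a SpeechRecognition version whose listen() accepts `timeout` and `phrase_time_limit`. It calibrates with adjust_for_ambient_noise() and then bounds how long listen() can wait and record. The helper names and the 5 s and 10 s values are illustrative choices, not library defaults.

```python
def listen_with_limits(recognizer, source, timeout=5.0, phrase_time_limit=10.0,
                       calibration_seconds=1.0):
    """Calibrate for background noise, then record one phrase with upper bounds.

    timeout: seconds to wait for speech to start (SpeechRecognition raises
             WaitTimeoutError if nothing is heard in time).
    phrase_time_limit: maximum seconds of speech to record before returning.
    """
    # Sample the room first so the energy threshold sits above the background noise.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


def record_phrase():
    """Example use with a real microphone (requires SpeechRecognition and PyAudio)."""
    import speech_recognition as sr  # imported here so the helper above has no dependencies

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        try:
            return listen_with_limits(r, source)
        except sr.WaitTimeoutError:
            print("No speech started within the timeout")
            return None
```

The returned audio can be passed to r.recognize_google() exactly as in the program above.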

Diego Bernal • Thu, 04 May 2017

Can you change the type of voice?

Frank • Fri, 05 May 2017

The speech API supports that, but I don't think it's in the speech_recognition module.

Sanwal Yousaf • Wed, 24 May 2017

Hey, when I run through the installation steps, I can't get past this line:

sudo apt-get installl libportaudio-dev

I get the error:

Package libportaudio-dev is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'libportaudio-dev' has no installation candidate

Any idea how i can resolve this?

Frank • Wed, 24 May 2017

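Try downloading it from here: http://www.portaudio.com/download.html. On recent Ubuntu releases, the development package is called portaudio19-dev (sudo apt-get install portaudio19-dev).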

Deepak Chawla • Thu, 25 May 2017

Hello Sir, I have been using the Google Speech API with the default API key for 15 days. Now it doesn't recognize my voice, even though my microphone works well when I test it with Google voice, where it works without any error. I can't understand what the problem is. Please help me.
Hoping for a positive response.

Frank • Fri, 26 May 2017

I'm not sure, does the site https://www.google.com/intl/en/chrome/demos/speech.html work for you?

Chintan Mungra • Mon, 10 Jul 2017

Sir, I have the same problem, but this site https://www.google.com/intl... works for me after changing the default to the USB mic in the Chrome settings.
So can you please tell me if there is any way to change the default to the USB mic? I am using a Raspberry Pi 3.

Frank • Tue, 11 Jul 2017

A USB mic is needed on the Raspberry Pi. I don't have a Raspberry Pi, but it looks like
you can select it by passing its index as MICROPHONE_INDEX:


Microphone(device_index=MICROPHONE_INDEX)

that's in the line


with sr.Microphone(device_index=MICROPHONE_INDEX) as source:

To list the microphones use this program:


import speech_recognition as sr
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
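To avoid hard-coding MICROPHONE_INDEX, you could pick the first device whose name contains a keyword such as "usb". The following is a minimal sketch. `find_microphone_index` is a hypothetical helper, and device names differ between systems, so run the listing program first to choose a keyword.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device name containing `keyword`
    (case-insensitive), or None if no name matches."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # list_microphone_names() reports None for devices whose name can't be read.
        if name is not None and keyword in name.lower():
            return index
    return None
```

With SpeechRecognition installed, you would pass the result on as sr.Microphone(device_index=find_microphone_index(sr.Microphone.list_microphone_names(), "usb")). A device_index of None selects the default microphone, so the program falls back to the default device when no name matches.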

Kavya Shree • Thu, 13 Jul 2017

I have to convert speech to text offline on a SAMSUNG ARTIK board. Please tell me which packages I need to install and the steps to follow.

Frank • Fri, 14 Jul 2017

Many speech APIs only work online. The SpeechRecognition module only works offline with the CMU Sphinx engine. All the other speech engines supported by the SpeechRecognition module need internet connectivity.
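Because only CMU Sphinx runs offline, one option is to try an online engine first and fall back to Sphinx when the request fails. The following is a minimal sketch. `recognize_with_fallback` is a hypothetical helper, and recognize_sphinx additionally requires the PocketSphinx package. You pass in the exceptions to treat as recoverable. With the real library, that would be (sr.UnknownValueError, sr.RequestError).

```python
def recognize_with_fallback(recognizer, audio, engines, recoverable, **kwargs):
    """Try each recognizer method named in `engines`, in order.

    Returns (engine_name, text) from the first engine that succeeds.
    Exceptions listed in `recoverable` move on to the next engine; any other
    exception propagates. Extra keyword arguments (e.g. language="en-US")
    are passed to every engine.
    """
    failures = {}
    for name in engines:
        method = getattr(recognizer, name, None)
        if method is None:
            failures[name] = "method not available"
            continue
        try:
            return name, method(audio, **kwargs)
        except recoverable as exc:
            failures[name] = repr(exc)
    raise RuntimeError(f"All engines failed: {failures}")
```

With a real recognizer, the call would be recognize_with_fallback(r, audio, ("recognize_google", "recognize_sphinx"), (sr.UnknownValueError, sr.RequestError)). Passing the exceptions in, rather than catching everything, keeps genuine bugs visible.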

Hector Aaron Castillo Elizalde • Wed, 19 Jul 2017

Hello, is there any solution for reducing the delay? I have tested the code, and it does not work in real time: it takes a few seconds to give back the string. Thanks

Frank • Sat, 22 Jul 2017

There is no real-time solution that I know of. Even on Android it takes a moment to listen.
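Part of the delay is the pause that ends a phrase. listen() keeps recording until it hears pause_threshold seconds of silence (0.8 s by default in SpeechRecognition). The network round trip to the online engine adds to that. The following is a minimal sketch of shortening that pause. The helper name and the 0.5 s value are illustrative assumptions, and a pause that is too short may cut phrases off mid-sentence.

```python
def shorten_pause(recognizer, pause_seconds=0.5):
    """Make listen() end a phrase after `pause_seconds` of silence.

    SpeechRecognition requires pause_threshold >= non_speaking_duration,
    so non_speaking_duration is lowered as well when necessary.
    """
    if pause_seconds <= 0:
        raise ValueError("pause_seconds must be positive")
    recognizer.pause_threshold = pause_seconds
    recognizer.non_speaking_duration = min(recognizer.non_speaking_duration,
                                           pause_seconds)
    return recognizer
```

Call shorten_pause(r) right after r = sr.Recognizer(). This only trims the waiting time after you stop speaking; it cannot remove the time the online service needs to respond.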

Akkas Singh • Tue, 25 Jul 2017

Not able to install any of the above packages on Windows 10.
I have Python version 2.7.13 and pip version 9.0.1.

Every time I get an error that says: Could not find a version that satisfies the requirement libportaudio-dev(from version:)
No matching distribution found for libportaudio-dev

Help me out

Frank • Wed, 26 Jul 2017

On Windows you need to compile PortAudio.
Also try: pip install pyaudio

Sincole Brans • Wed, 26 Jul 2017

Is there any documentation on publishing this as a web service? Or can it be consumed by Hangouts, Skype or something similar?
Any leads?

Frank • Fri, 28 Jul 2017

This records the microphone locally (attached to the computer). If the client would run a Python program recording the microphone, you could forward the text to a server.
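The following is a minimal sketch of the client side of that idea. After recognize_google() returns, it posts the text to your own server as JSON, using only the standard library. The URL and the {"text": ...} payload format are hypothetical. The server and its API are up to you.

```python
import json
import urllib.request

# Hypothetical endpoint; replace it with your own server.
SERVER_URL = "http://example.com/transcripts"


def build_transcript_request(text, url=SERVER_URL):
    """Build (but do not send) a POST request carrying the text as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_transcript(text, url=SERVER_URL, timeout=10):
    """Send the text to the server and return the HTTP status code."""
    request = build_transcript_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In the program above, you would call send_transcript(r.recognize_google(audio)) inside the try block. Separating request building from sending makes the payload easy to check without a running server.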