Speech Recognition using Google Speech API


Google has a great Speech Recognition API. It converts spoken words captured by a microphone into written text (Python strings), in short: speech to text. You simply speak into a microphone and the Google API transcribes what you said. The API gives excellent results for English.

Google has also created the JavaScript Web Speech API, so you can recognize speech in JavaScript as well. Here is the demo: https://www.google.com/intl/en/chrome/demos/speech.html. To use it on the web you will need Google Chrome version 25 or later.


Installation

Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone.
Are you looking for text to speech instead?
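
To stay under the limit of 50 queries per day, a program can keep its own count of requests. The sketch below is only an illustration: DailyQuota and try_acquire are hypothetical names that are not part of any Google or SpeechRecognition API, and the counter lives in memory, so it starts over whenever the program restarts.

```python
import datetime


class DailyQuota:
    """Counts requests per calendar day and refuses any beyond the limit."""

    def __init__(self, limit=50):
        self.limit = limit
        self.day = None   # the day the current count belongs to
        self.used = 0     # requests made on that day

    def try_acquire(self, today=None):
        """Return True and count one request if the daily budget allows it."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started, so the counter starts over.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


quota = DailyQuota(limit=50)
if quota.try_acquire():
    print("OK to send a request to the speech API")
else:
    print("Daily limit reached, try again tomorrow")
```

Call try_acquire() right before each recognition request and skip the request when it returns False. The real limit is enforced on Google's side, so this local count is only an approximation.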

This is the installation guide for Ubuntu Linux, but it will probably work on other platforms as well. You will need to install a few packages: PyAudio, PortAudio and SpeechRecognition. PyAudio 0.2.9 is required and you may need to compile it manually.

sudo apt-get install libportaudio-dev
sudo apt-get install python-dev
sudo apt-get install libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev
git clone http://people.csail.mit.edu/hubert/git/pyaudio.git
cd pyaudio
sudo python setup.py install
sudo pip3 install SpeechRecognition
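
After installing, it can help to confirm that Python actually finds the modules before you run the program. The following is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name introduced here. Note that the import names differ from the package names: SpeechRecognition is imported as speech_recognition and PyAudio as pyaudio.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not package names: SpeechRecognition -> speech_recognition,
# PyAudio -> pyaudio.
missing = missing_modules(["speech_recognition", "pyaudio"])
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are installed.")
```

Run it with the same interpreter you will use for the program (python3 here), since each Python version has its own set of installed modules.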

Program

This program will record audio from your microphone, send it to the speech API and return a Python string.

The audio is recorded using the speech_recognition module, which is imported at the top of the program. The recorded speech is then sent to the Google speech recognition API, which returns the output.
r.recognize_google(audio) returns a string.

#!/usr/bin/env python3
# Requires PyAudio and SpeechRecognition.
 
import speech_recognition as sr
 
# Record Audio
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
 
# Speech recognition using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
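
The recognizer object exposes other engines through methods named recognize_<engine> (for example recognize_sphinx), and recognize_google accepts a language keyword. The sketch below shows one possible way to choose the engine and language at run time. recognize_with and FakeRecognizer are hypothetical names introduced for illustration; the fake class only stands in for sr.Recognizer() so the example runs without a microphone or network access.

```python
def recognize_with(recognizer, audio, engine="google", **options):
    """Call recognizer.recognize_<engine>(audio, **options) and return its result."""
    method = getattr(recognizer, "recognize_" + engine, None)
    if not callable(method):
        raise ValueError("Unknown recognition engine: " + engine)
    return method(audio, **options)


class FakeRecognizer:
    """Hypothetical stand-in for sr.Recognizer(); it returns canned text."""

    def recognize_google(self, audio, language="en-US"):
        return "[{0}] {1}".format(language, audio)


# With the real library you would pass sr.Recognizer() and the audio object
# returned by r.listen(source) instead of these placeholders.
text = recognize_with(FakeRecognizer(), "ciao mondo", engine="google", language="it-IT")
print("You said: " + text)
```

With a real recognizer the call should still be wrapped in the try/except block from the program above, because the engine methods raise sr.UnknownValueError and sr.RequestError on failure.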



11 thoughts on “Speech Recognition using Google Speech API”

  1. Shubham Jain - April 16, 2017

    Can I use this on Windows? If so, please show me the steps.

    1. Frank - April 16, 2017

      Yes, with one of the other speech engines it should work. Install the required modules and run it using Python.

  2. Aleem Firnas - April 2, 2017

    After cloning the project from Git and running cd pyaudio, sudo python setup.py install throws an error:
    "src/_portaudiomodule.c:29:23: fatal error: portaudio.h: No such file or directory
    compilation terminated."

    1. Frank - April 2, 2017

      A dependency called PortAudio is missing. This is an audio API: http://www.portaudio.com/
      On Mac:

      brew install portaudio
      sudo brew link portaudio
      sudo pip install pyaudio

      On Linux: http://askubuntu.com/questions/736238/how-do-i-install-and-setup-the-environment-for-using-portaudio/

  3. soumya g - March 27, 2017

    Running this code snippet gives me the below error:

    ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave

    Any suggestions on how I can proceed from here?

    1. Frank - March 28, 2017

      Try the same code on another computer or with another recognition engine.

  4. Mark Tomson - March 25, 2017

    How do you set the Italian language?

    1. Frank - March 25, 2017

      The language you can set depends on the recognition engine. This only works if the language is supported. The Sphinx engine supports English, French and Chinese.
      To set a language, use the language parameter.

      Recognition engines may change over time (free to commercial/API key) or not include every language. However, the principle of using the speech_recognition module remains the same.

      The recognition engines are:

      • recognize_sphinx
      • recognize_google
      • recognize_wit
      • recognize_bing
      • recognize_api
      • recognize_houndify
      • recognize_ibm

      Then specify the language as a parameter, depending on the speech engine.

      For Sphinx,

      recognizer_instance.recognize_sphinx(audio_data, language = "en-US", keyword_entries = None, show_all = False)

      For Google,

      recognize_google(audio, language="it")

      For IBM,

      def recognize_ibm(self, audio_data, username, password, bandmodel, language = "en-US", show_all = False):

      1. Mark Tomson - March 30, 2017

        Ok. Thank you very much

  5. skyler - August 11, 2016

    Does this work on Mac?
    Changing sudo apt-get to brew?

    1. Frank - August 11, 2016

      The module works on Mac too, but I’m not sure if the Google Speech Recognition API is still publicly available. The module provides access to several other speech engines such as CMU Sphinx, Wit.ai, api.ai and IBM Speech to Text.

      To install on Mac I think you can use pip:

      sudo easy_install pip
      pip install SpeechRecognition

      but I don’t have a Mac so I’m not sure.
      The official module site is SpeechRecognition.