Speech Recognition using Google Speech API

Google has a great Speech Recognition API. It converts speech recorded from a microphone into written text (Python strings), a task known as speech to text. You speak into a microphone and the Google API returns what you said as written text. The API gives excellent results for English.

Google has also created the JavaScript Web Speech API, so you can recognize speech in JavaScript as well. Here’s the demo: https://www.google.com/intl/en/chrome/demos/speech.html. To use it on the web you will need Google Chrome version 25 or later.

Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone.
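
Because of the daily limit, a long-running script may want to count its own requests. The following is a minimal sketch of such a counter. The `DailyQuota` class and its method names are hypothetical helpers made up for illustration, not part of any Google or SpeechRecognition API, and the count is only kept in memory.

```python
import datetime


class DailyQuota:
    """Count requests per calendar day and refuse them once the limit is hit.

    Hypothetical helper: the 50-per-day figure comes from the article; the
    counter lives in memory only and is lost when the script restarts.
    """

    def __init__(self, limit=50, today=datetime.date.today):
        self.limit = limit
        self._today = today      # injectable clock, handy for testing
        self._day = today()
        self._used = 0

    def try_acquire(self):
        """Return True and count one request, or False if today's quota is spent."""
        day = self._today()
        if day != self._day:     # a new day started: reset the counter
            self._day = day
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

Call `try_acquire()` before each request to the speech API and skip the request when it returns False.
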
Are you looking for text to speech instead?

This is the installation guide for Ubuntu Linux, but it will probably work on other platforms as well. You will need to install a few packages: PyAudio, PortAudio and SpeechRecognition. PyAudio 0.2.9 is required and you may need to compile it manually.

git clone http://people.csail.mit.edu/hubert/git/pyaudio.git
cd pyaudio
sudo python setup.py install
sudo apt-get install libportaudio-dev
sudo apt-get install python-dev
sudo apt-get install libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev
sudo pip3 install SpeechRecognition


This program will record audio from your microphone, send it to the speech API and return a Python string.

The audio is recorded using the speech_recognition module, which is imported at the top of the program. The recorded speech is then sent to the Google speech recognition API, which returns the output.
r.recognize_google(audio) returns a string.

#!/usr/bin/env python3
# Requires PyAudio and SpeechRecognition.

import speech_recognition as sr

# Record audio from the default microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Speech recognition using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
    # The service answered, but the speech was unintelligible
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    # The service could not be reached or rejected the request
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
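
If the program should not wait indefinitely for speech, the recording step can be bounded. The sketch below is a variation on the program above, assuming the `timeout` and `phrase_time_limit` arguments of `Recognizer.listen` and the `adjust_for_ambient_noise` method of the SpeechRecognition package. The function name `listen_once` and the default values are made up for illustration.

```python
def listen_once(timeout=5, phrase_time_limit=10):
    """Record one phrase from the microphone and return it as text, or None.

    timeout: seconds to wait for speech to start before giving up.
    phrase_time_limit: maximum seconds of speech to record.
    """
    # Imported here so the function can be defined before the package is installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample the background noise briefly to calibrate the energy threshold.
        r.adjust_for_ambient_noise(source, duration=1)
        try:
            audio = r.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None  # nobody started speaking within `timeout` seconds

    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # speech was recorded but could not be understood
    # sr.RequestError (network or quota problems) is left to the caller.
```

Calling `listen_once(timeout=3, phrase_time_limit=5)` would wait up to three seconds for speech and stop recording after five seconds of talking.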


62 thoughts on “Speech Recognition using Google Speech API”

  1. ayush dhiman
    - March 10, 2018

    Can you help me by providing this speech to text program for Windows 10?

    1. Frank
      - March 11, 2018

      The program should work on all platforms, including Windows 10.

  2. chandra sutrisno
    - February 20, 2018


    Is there any way to make this run offline?

    1. Frank
      - February 24, 2018

      No, this speech recognition uses the speech API (remote processing).

  3. Alem Nigussie
    - January 23, 2018

    Hi Frank, I am working on finding a suitable speech recognition solution for the Amharic language. Is there any API out there that can be taught other languages?


    1. Frank
      - January 27, 2018

      Besides a few blog articles, I can’t find an API for the Amharic language.

      1. Sahil Kansal
        - February 20, 2018

        Hey Frank, I want to compare two videos, so I am thinking of converting their audio to text and then comparing the text. Is it possible to do that, and how?

        1. Frank
          - February 24, 2018

          Yes, it’s possible, but there may be some constraints (background noise in the video? several people talking?). Extract the audio files from the videos and then feed them to the speech engine. The simplest way is to use ffmpeg to extract the audio files, but you can do it programmatically too.
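
The ffmpeg step and the final comparison could be sketched as follows. This is only an illustration under stated assumptions: ffmpeg is installed and on the PATH, the function names are made up, and `difflib` from the Python standard library stands in for whatever comparison you actually need. The transcription of each extracted file would happen between the two steps.

```python
import difflib
import subprocess


def ffmpeg_extract_command(video_path, wav_path):
    """Build an ffmpeg command that writes the video's audio as 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, i.e. a plain WAV file
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_extract_command(video_path, wav_path), check=True)


def transcript_similarity(text_a, text_b):
    """Return a similarity ratio between 0 and 1, comparing word by word."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()
```

A ratio close to 1 suggests the two videos contain nearly the same spoken words. Recognition errors will lower the score even for identical recordings, so treat it as a rough indicator.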

  4. Tama rindo
    - January 19, 2018

    Let’s say I want to use audio fingerprinting to auto mute TV commercials using the infrared on my cell phone. Can I record the undesired commercials to be recognized automatically in order to have the phone mute them when they reappear?

    1. Frank
      - January 20, 2018

      Yes, technically that’s possible, but you’d want to use plain audio fingerprinting instead of speech recognition.

  5. ILYES Zine
    - January 16, 2018

    I have some wave audio files, and I would like to use this API to do “speech to text” without using the microphone:
    audio files as input and the text string as output.
    Is that possible?
    Thanks a lot!

    1. Frank
      - January 20, 2018

      Try loading the wave file into the variable audio.
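
One way to do that, sketched under the assumption that the SpeechRecognition package's `AudioFile` class and `Recognizer.record` method are available, is shown here. The function name `transcribe_wav` and the `language` default are illustrative choices, not something from the article.

```python
def transcribe_wav(path, language="en-US"):
    """Return the text recognized in a WAV file, or None if nothing was understood."""
    # Imported here so the function can be defined before the package is installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    # AudioFile takes the place of Microphone from the article's program.
    with sr.AudioFile(path) as source:
        audio = r.record(source)  # read the whole file into the variable audio

    try:
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the service could not understand the recording
    # sr.RequestError (network or quota problems) is left to the caller.
```

For example, `print(transcribe_wav("recording.wav"))` would print the recognized text for a file named recording.wav (a placeholder name).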

  6. Venkat Narendra
    - January 15, 2018

    Is it possible to limit the listening time of the Google Speech API? If so, can anyone please help me with how to do that?
