
Building a Quick Speech-to-Text Application with Node.js and Cloud Storage


Speech-to-Text applications have gained immense popularity due to their ability to convert spoken words into written text. In this blog post, we will explore how to leverage the power of Node.js, Python, and cloud storage services to create a quick and efficient speech-to-text application. By following this guide, you will be able to build a robust solution that can transcribe audio files with ease.

Why use speech-to-text applications?

Speech-to-text applications offer numerous benefits, including:
  1. Accessibility: By converting spoken words into text, these applications let individuals with hearing impairments access audio content.
  2. Automation: Speech-to-text technology automates the transcription of audio files, saving time and effort.
  3. Language Support: These applications can transcribe speech in multiple languages, making content accessible to a global audience.
  4. Versatility: Speech-to-text technology is used in many domains, such as transcription services, voice assistants, and voice search.

Building the Speech-to-Text Application:

To build our application, we will use Node.js for the backend, a speech-to-text service (Google Speech API, OpenAI, or AWS Transcribe) for the transcription itself, and cloud storage services to store and retrieve the files.

Step 1: Environment Setup
  1. Install Node.js from the official Node.js website. Download the latest stable version for your operating system and follow the installation instructions to set it up.
  2. Install Python by downloading the installer from the official Python website. Make sure to select the option to add Python to the system PATH during the installation.
Step 2: Installing Dependencies
  1. Open your terminal or command prompt and create a new directory for your project.
  2. Navigate to the project directory and initialize a new Node.js project by running the following command:
    npm init -y
  3. To install the dependencies for our speech-to-text application, run the following command in your project directory. The fs and path modules are built into Node.js and do not need to be installed:
    npm install @google-cloud/speech @google-cloud/storage express multer
Step 3: Authenticating to Google Cloud

To authenticate with Google Cloud, follow the steps below:

  1. Install the Google Cloud SDK: If you haven’t installed the Google Cloud SDK yet, download the version that is compatible with your operating system from the official website and install it.
  2. Initialize the SDK: Run the following command in your terminal or command prompt:
    gcloud init
  3. Set the Default Project: After the SDK has been initialized, set the default project that gcloud commands will use. Run the command below, replacing [PROJECT_ID] with your project ID:
    gcloud config set project [PROJECT_ID]
  4. Authenticate with your Google Account: Run the command below to log in:
    gcloud auth login
    This command opens a browser window prompting you to log in to your Google Account. After successful authentication, the command-line interface is authorized to access your Google Cloud resources.
  5. Verify Authentication: Run the following command to check which accounts are authenticated:
    gcloud auth list
    This command displays a list of the accounts you have authenticated with.

Once these steps are completed, you are authenticated to Google Cloud and ready to use Google Cloud services and run gcloud commands.

After authentication, download the service account key file from Google Cloud as follows:

In the Google Cloud console, go to APIs & Services -> Credentials -> select your service account email -> open the Keys tab, where you can create and download the service account key file (JSON).
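
Before wiring the key file into the application, it can help to confirm that the download is a usable service account key. The sketch below is one way to do that, assuming the standard JSON key layout with the fields type, project_id, client_email, and private_key. The function name findKeyFileProblems is hypothetical and is not part of any Google library.

```javascript
// Hypothetical helper: reports what looks wrong with a service account key.
// Assumes the standard JSON key layout downloaded from the Google Cloud console.
function findKeyFileProblems(jsonText) {
  let key;
  try {
    key = JSON.parse(jsonText);
  } catch (err) {
    return ["file is not valid JSON"];
  }
  if (key === null || typeof key !== "object") {
    return ["file is not a JSON object"];
  }

  const problems = [];
  if (key.type !== "service_account") {
    problems.push('"type" should be "service_account"');
  }
  // These fields are needed to identify the project and sign requests.
  for (const field of ["project_id", "client_email", "private_key"]) {
    if (typeof key[field] !== "string" || key[field].length === 0) {
      problems.push(`missing "${field}"`);
    }
  }
  return problems;
}
```

An empty array means the file looks like a service account key. A typical call would read the file first, for example findKeyFileProblems(fs.readFileSync(keyFilename, "utf8")).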


Building your Speech-to-Text Application:

1. Cloud Storage:

Google Cloud Storage acts as the data store for the application: the voice files are kept in a bucket, and the application reads them from there for processing. This uses the @google-cloud/storage package (already installed in Step 2; to install it on its own, run the command below):
npm install @google-cloud/storage

Declare your project ID, JSON key file, and bucket name:

const { Storage } = require("@google-cloud/storage");

// Replace the placeholders with your own values.
const projectId = "{project-id}";
const keyFilename = "{keyfile.json}"; // path to the service account key file
const Bucket_Name = "{bucket_name}";

// Create the storage client and get a handle to the bucket.
const gcs = new Storage({
  projectId: projectId,
  keyFilename: keyFilename,
});
const bucket = gcs.bucket(Bucket_Name);
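
With a bucket handle in place, the application needs to find the voice files that are waiting to be processed. The following is a minimal sketch, assuming the unprocessed files sit under an input/ prefix and that bucket is a Bucket object from @google-cloud/storage. The helper name listInputFiles and the default prefix are assumptions made for illustration.

```javascript
// Hypothetical helper: lists the audio files waiting under a folder prefix.
// `bucket` is expected to be a Bucket from @google-cloud/storage.
async function listInputFiles(bucket, prefix = "input/") {
  // getFiles resolves to an array whose first element is the list of files.
  const [files] = await bucket.getFiles({ prefix });
  // Drop the placeholder object that represents the folder itself, if present.
  return files.map((file) => file.name).filter((name) => name !== prefix);
}
```

Calling await listInputFiles(bucket) returns object names such as input/call-01.wav, which can be passed on to the transcription step.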

2. Speech API:

We can transcribe speech to text using three different services:

  • Google Speech API
  • OpenAI Transcriber
  • AWS Transcribe Service

Google Speech API:

The Google Speech API converts audio to text. Let’s look at the implementation:

  1. Go to your terminal and run the following command:
    npm install @google-cloud/speech
  2. Require the dependency in your code.
  3. Get the gs:// URI of your file in the bucket. The URI will look like gs://{bucket_name}/{filename.ext}.
  4. Now, create the request for your application. The request should include the URI, encoding, sample rate (in hertz), and language code.
    const audio = { uri: "gs://{bucket_name}/{filename.ext}" };
    const config = {
      encoding: "LINEAR16", // or "MULAW", depending on the audio file
      sampleRateHertz: 48000, // must match the sample rate of the recording
      languageCode: "en-US",
    };
    const request = { audio: audio, config: config };
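
The request described above still has to be sent to the API and the response turned into plain text. The sketch below shows one way this could look, assuming speechClient is a SpeechClient from @google-cloud/speech. The helper name transcribeFromBucket is hypothetical, and the encoding and sample rate are the example values from this post, not universal defaults.

```javascript
// Hypothetical helper: builds the request and returns the transcript text.
// `speechClient` is expected to be a SpeechClient from @google-cloud/speech.
async function transcribeFromBucket(speechClient, bucketName, fileName) {
  const request = {
    audio: { uri: `gs://${bucketName}/${fileName}` },
    config: {
      encoding: "LINEAR16",
      sampleRateHertz: 48000, // must match the recording
      languageCode: "en-US",
    },
  };
  // recognize resolves to an array whose first element is the response.
  const [response] = await speechClient.recognize(request);
  // Each result holds ranked alternatives; keep the top one from each result.
  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join("\n");
}
```

The client itself would be created with something like new (require("@google-cloud/speech").SpeechClient)({ projectId, keyFilename }). Note that recognize is the synchronous call intended for short recordings; longer audio generally goes through the client's longRunningRecognize method instead.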

OpenAI Transcribe Service:

  • OpenAI provides two endpoints: Transcription and Translation.
    Transcription: Transcribes the audio in its original language.
    Translation: Translates the audio and transcribes it into English.
  • The Python code below transcribes an audio file (here audio.wav, downloaded from the Google Cloud bucket) to text.
    pip install openai
    Include your OpenAI API key in the code (the key is generated from your OpenAI account).
    After installing, call the transcribe or translate service in your code:
    import openai
    openai.api_key = "your_openai_api_key"
    # openai.Audio is the interface of openai package versions before 1.0
    with open("audio.wav", "rb") as audio_file:
        transcript = openai.Audio.transcribe(model="whisper-1", file=audio_file)
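
Because the rest of the application is written in Node.js, the same transcription request can also be sent to OpenAI's REST endpoint directly from JavaScript. The following is a sketch rather than the method used in this post: it assumes Node.js 18 or later (for the built-in fetch, FormData, and Blob), and the helper name transcribeWithOpenAI and its fetchImpl parameter are hypothetical.

```javascript
// Hypothetical helper: sends audio bytes to OpenAI's transcription endpoint.
// Assumes Node.js 18+; `fetchImpl` exists only so the call can be stubbed.
async function transcribeWithOpenAI(audioBuffer, fileName, apiKey, fetchImpl = fetch) {
  // The endpoint expects a multipart form with the audio file and model name.
  const form = new FormData();
  form.append("file", new Blob([audioBuffer]), fileName);
  form.append("model", "whisper-1");

  const response = await fetchImpl("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`OpenAI request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.text; // the transcript as plain text
}
```

The audio bytes can come straight from the bucket, for example const [audioBuffer] = await bucket.file("input/audio.wav").download(); where the object name is a placeholder.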

AWS Transcribe Service:

  • AWS provides its own service, Amazon Transcribe, to convert audio to text.
  • Before using the AWS Transcribe service, make sure that you have access to the AWS Transcribe API and to the AWS bucket that holds the audio.
  • First, install the AWS SDK using npm install aws-sdk and add this to your Node.js code.
    const AWS = require('aws-sdk');
  • Gather the details the project needs, such as the bucket name, region, AWS_ACCESS_KEY_ID, AWS_SECRET_KEY, and SESSION_KEY.
    const transcribeService = new AWS.TranscribeService({
      region: 'your_region',
      accessKeyId: 'your_access_key_id',
      secretAccessKey: 'your_secret_access_key',
      sessionToken: 'your_session_token', // needed only for temporary credentials
    });
    Pass the required parameters for the API call:
    const params = {
      TranscriptionJobName: 'your_job_name',
      LanguageCode: 'en-US',
      MediaFormat: 'mp3',
      Media: { MediaFileUri: 'your_audio_file_url' },
    };
    Then make the API call with the startTranscriptionJob method:
    transcribeService.startTranscriptionJob(params).promise();
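
A transcription job runs asynchronously, so starting it is only half of the work: the application also has to wait for the job to finish. The sketch below starts the job and polls getTranscriptionJob until it completes, assuming transcribeService is an AWS.TranscribeService from version 2 of the aws-sdk package. The helper name runTranscriptionJob and the five-second polling interval are assumptions chosen for illustration.

```javascript
// Hypothetical helper: starts a transcription job and polls until it finishes.
// `transcribeService` is expected to be an AWS.TranscribeService (aws-sdk v2).
async function runTranscriptionJob(transcribeService, params, pollMs = 5000) {
  await transcribeService.startTranscriptionJob(params).promise();

  for (;;) {
    const { TranscriptionJob: job } = await transcribeService
      .getTranscriptionJob({ TranscriptionJobName: params.TranscriptionJobName })
      .promise();

    if (job.TranscriptionJobStatus === "COMPLETED") {
      // The transcript is a JSON file stored at this URI.
      return job.Transcript.TranscriptFileUri;
    }
    if (job.TranscriptionJobStatus === "FAILED") {
      throw new Error(`Transcription failed: ${job.FailureReason}`);
    }
    // Still QUEUED or IN_PROGRESS: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

A call such as await runTranscriptionJob(transcribeService, params) resolves to the location of the transcript file, which can then be fetched and parsed.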
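3. Write the transcribed content to CSV:
  • Using the fs module, we will write the transcribed content to a CSV file.
  • The fs module is built into Node.js, so there is nothing to install; just require it:
    const fs = require('fs');
    fs.writeFileSync('/tmp/data.csv', str); // str holds the CSV text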
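
The str value written to the file has to be valid CSV, and transcripts often contain commas and quotation marks that would break a naive join. The sketch below shows one way to build the string, quoting fields where needed. The helper name toCsv and the column names filename and transcript are assumptions made for illustration.

```javascript
// Hypothetical helper: turns rows of values into CSV text.
function toCsv(rows) {
  const escapeField = (value) => {
    const text = String(value);
    // Quote fields containing a comma, quote, or line break; double inner quotes.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  return rows.map((row) => row.map(escapeField).join(",")).join("\n") + "\n";
}

// Example: a header row followed by one transcribed file.
const str = toCsv([
  ["filename", "transcript"],
  ["call-01.wav", 'He said "hello", then hung up'],
]);
```

Here the second field of the data row is wrapped in quotes because it contains both a comma and quotation marks, so spreadsheet tools read it back as a single cell.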
4. Uploading CSV file to Bucket:
  • Here we use the cloud storage service to store the data.
  • With the help of the upload function, we will upload the CSV file to the bucket.
    // filename is the path of the CSV file to upload
    gcs.bucket(Bucket_Name).upload(filename, {
      destination: filename,
    });
5. Moving Files to Folders:
  • Voice files are moved between folders for easy access. We have three folders: input, processed, and error.
  • Input – Unprocessed voice files.
  • Processed – Voice files that have been successfully transcribed.
  • Error – Voice files that failed during processing.

Initially, the voice file is stored in the input folder. Once it has been processed successfully, it is moved to the processed folder. If an error occurs during processing, the file is moved to the error folder.

// destinationFolder is "processed" or "error"
await bucket.file(`input/${filename}`).move(`${destinationFolder}/${filename}`);
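
The folder routing described above can be tied to the outcome of the transcription step. The following is a minimal sketch, assuming bucket is a Bucket from @google-cloud/storage, the folders are named input, processed, and error, and transcribe is any async function that takes an object name and returns text. The helper name processAndMove is hypothetical.

```javascript
// Hypothetical helper: transcribes one file, then moves it to processed/
// on success or error/ on failure.
// `bucket` is expected to be a Bucket from @google-cloud/storage.
async function processAndMove(bucket, filename, transcribe) {
  const source = bucket.file(`input/${filename}`);

  let result;
  try {
    const text = await transcribe(source.name);
    result = { status: "processed", text };
  } catch (err) {
    result = { status: "error", message: err.message };
  }

  // The status doubles as the destination folder name.
  await source.move(`${result.status}/${filename}`);
  return result;
}
```

Keeping the move outside the try block means a failure while moving the file is reported as its own error instead of being mistaken for a transcription failure.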

6. Download the file from Bucket:

Finally, once the transcriptions have been written to the CSV file and uploaded, the file is available for download directly from cloud storage.
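
Besides downloading the file through the cloud console, the application can fetch it programmatically. The sketch below assumes bucket is a Bucket from @google-cloud/storage. The helper name downloadCsv and both path arguments are placeholders chosen for illustration.

```javascript
// Hypothetical helper: downloads the CSV from the bucket to a local path.
// `bucket` is expected to be a Bucket from @google-cloud/storage.
async function downloadCsv(bucket, objectName, localPath) {
  // With a destination option, download writes the object to that file.
  await bucket.file(objectName).download({ destination: localPath });
  return localPath;
}
```

For example, await downloadCsv(bucket, "data.csv", "/tmp/data-copy.csv") would save a local copy of the uploaded CSV file.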


By following this guide, you will be able to build a quick and efficient speech-to-text application using Node.js, Python, and cloud storage services. Leveraging the power of these technologies, you will be able to create robust applications that automate the transcription process, enhance accessibility, and provide multilingual support. Connect with our experts at Sensiple now and embrace the potential of speech-to-text technology and unleash its benefits in your next project.

About the Author

Harini Ravichandran is a Developer at Sensiple with 1 year of experience in Contact Center practice. She is skilled in Java and Node.js, and she has handled an array of tasks in Nerve Framework using Node.js.