Building a Quick Text-to-Voice Application with Node.js, Python, and Cloud Storage

Introduction

Text-to-Voice applications convert written text into spoken words, changing the way we interact with technology. In this blog post, we will explore how to use Node.js, Python, and cloud storage services to create a quick and efficient text-to-voice application. By following this guide, you will be able to build a solution that reads text from CSV files in a Google Cloud Storage bucket, converts it into high-quality speech with Google Cloud's Text-to-Speech API, and stores the resulting audio files back in the bucket.

Why use Text-to-Voice applications?

Text-to-Voice applications offer numerous benefits, including:
  1. Accessibility: These applications make content accessible to individuals with visual impairments and those who prefer listening rather than reading.
  2. Personalization: Users can customize the voice, tone, and speed of the generated speech to suit their preferences.
  3. Multilingual Support: Text-to-voice technology supports multiple languages, allowing users to convert text into speech in various linguistic contexts.
  4. Automation: Text-to-Voice applications automate the process of converting large volumes of text into speech, saving time and effort.

Building the Text-to-Voice Application:

To build our application, we will use Node.js for the backend, Python for the text-to-speech functionality, and cloud storage services to store and retrieve the files.

Step 1: Environment Setup:
  1. Install Node.js by visiting the official website at https://nodejs.org/en/download/. Download the latest stable version for your operating system and follow the installation instructions to set it up.
  2. Install Python by downloading the installer from the official Python website at https://www.python.org/downloads/. On Windows, make sure to select the option to add Python to the system PATH during the installation.
Step 2: Installing Dependencies:
  1. Open your terminal or command prompt and create a new directory for your project.
  2. Navigate to the project directory and initialize a new Node.js project by running the following command:
    npm init -y
  3. To install the dependencies for our text-to-speech application, run the following command in your project directory (fs is built into Node.js, so it does not need to be installed):
    npm install @google-cloud/text-to-speech @google-cloud/storage express csv-parser
Step 3: Authenticating to Google Cloud:

To authenticate with Google Cloud, follow the steps below:

  1. Install the Google Cloud SDK: If you haven’t installed it yet, go to the official website https://cloud.google.com/sdk/docs/install to download and install the Google Cloud SDK for your operating system.
  2. Initialize the SDK: Run the following command in a terminal or command prompt to initialize the Google Cloud SDK:
    gcloud init
  3. Set the Default Project: After the SDK is initialized, set the default project that gcloud commands will use. Run the command below, replacing [PROJECT_ID] with your project ID:
    gcloud config set project [PROJECT_ID]
  4. Authenticate with your Google Account: Run the command below to log in with gcloud:
    gcloud auth login
    This command will open a browser window prompting you to log in to your Google Account. After successful authentication, the command-line interface will be authorized to access your Google Cloud resources.
  5. Verify Authentication: Run the following command to check which accounts are authenticated:
    gcloud auth list
    This command will display a list of accounts you have authenticated with.

Once all these steps are completed, you should be authenticated to Google Cloud and ready to run gcloud commands and use Google Cloud services.

After authentication, download a service account key file from Google Cloud by following these steps:

Go to APIs & Services -> Credentials -> select your service account's email address -> on the Keys tab, choose Add Key -> Create new key, select JSON, and download the key file.
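Instead of relying only on the standard GOOGLE_APPLICATION_CREDENTIALS environment variable, the downloaded key can also be passed to the clients explicitly through the keyFilename option. The sketch below assumes the key path arrives in a hypothetical KEY_FILE_PATH variable (not something the Google libraries read themselves); when it is unset, the clients fall back to Application Default Credentials.

```javascript
// Hypothetical helper: KEY_FILE_PATH is an assumed variable name, not one the
// Google Cloud client libraries read on their own.
function clientOptions(env = process.env) {
  const keyFilename = env.KEY_FILE_PATH;
  // With no keyFilename, the clients use Application Default Credentials.
  return keyFilename ? { keyFilename } : {};
}

// Creates both clients with the same credentials.
function createClients(env = process.env) {
  // Loaded lazily so clientOptions() works even before the packages are installed.
  const { Storage } = require('@google-cloud/storage');
  const txtSpeech = require('@google-cloud/text-to-speech');
  const options = clientOptions(env);
  return {
    storage: new Storage(options),
    ttsClient: new txtSpeech.TextToSpeechClient(options),
  };
}
```

Usage: const { storage, ttsClient } = createClients(); gives one Storage client and one Text-to-Speech client that share the same key.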

Building your Text-to-Voice Application:

1. Get the CSV files from the Cloud Storage bucket:

First, we retrieve the files from Google Cloud Storage.
The getFiles() method in the Google Cloud Storage client library for Node.js retrieves a list of all the files in the bucket. Since we only need the CSV files, we can filter the list by file name with the endsWith() method:

const [files] = await storage.bucket(Bucket_Name).getFiles(options);
const csvFiles = files.filter((file) => file.name.endsWith('.csv'));
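The listing and filtering step can be wrapped in a small function. This is a minimal sketch: the function name listCsvFiles and the optional prefix argument (used to list only one "folder", such as a hypothetical 'input/') are assumptions, and the match is made case-insensitive so that files ending in .CSV are also picked up.

```javascript
// Returns only the .csv objects in a bucket. `storage` is a Storage instance
// from @google-cloud/storage; `prefix` (optional) limits the listing to one
// "folder", e.g. 'input/'.
async function listCsvFiles(storage, bucketName, prefix) {
  const options = prefix ? { prefix } : {};
  // getFiles() resolves to an array whose first element is the File[] list.
  const [files] = await storage.bucket(bucketName).getFiles(options);
  // Case-insensitive match so "DATA.CSV" is also included.
  return files.filter((file) => file.name.toLowerCase().endsWith('.csv'));
}
```

Passing the client in as a parameter keeps the function easy to test with a stand-in object instead of a live bucket.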

2. Read a CSV file:

For each CSV file, the createReadStream() method of the Cloud Storage File object streams the file contents, which are piped into the csv-parser package. csv-parser emits one JavaScript object per row, keyed by the CSV column headers, and each object is pushed into the csvData array. When the stream emits its 'end' event, all rows have been read and the csvfileread() function is called to process them.

const csv = require('csv-parser');
const csvData = [];
file.createReadStream()
  .pipe(csv())
  .on('data', (data) => csvData.push(data))
  .on('end', () => csvfileread());
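With the callback approach, the parsed rows are only available inside the 'end' handler. One alternative, sketched here as an assumption rather than the original implementation, wraps the stream in a Promise so the rows can be awaited and errors from either stream are reported. The helper name collectRows is hypothetical.

```javascript
// Collects every object emitted by `parserStream` (for example csv()) into an
// array. pipe() does not forward errors from the source stream, so both
// streams get an 'error' listener.
function collectRows(sourceStream, parserStream) {
  return new Promise((resolve, reject) => {
    const rows = [];
    sourceStream.on('error', reject);
    sourceStream
      .pipe(parserStream)
      .on('data', (row) => rows.push(row))
      .on('end', () => resolve(rows))
      .on('error', reject);
  });
}

// Usage with csv-parser (not executed here):
//   const csv = require('csv-parser');
//   const csvData = await collectRows(file.createReadStream(), csv());
```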

3. Send the request to the API:

Each row in the csvData array is an object whose keys are the CSV column names. The value of the text key is the content to be converted into speech. If you need a specific voice in the output audio, set it in the voice key. The format key selects the audio encoding: MP3 for an .mp3 file or LINEAR16 for a .wav file. These values are assembled into a request and sent to the Google Cloud Text-to-Speech API, which synthesises the speech.

Sample JSON data is given below.
{
  "filename": "sample",
  "text": "Hello world!",
  "voice": "en-US-Neural2-F",
  "format": "MP3"
}
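A row like the one above can be turned into a synthesizeSpeech() request. This is a minimal sketch under a few assumptions: the filename key is used as the base name of the output file, only MP3 and LINEAR16 are accepted, and the language code is taken from the first two parts of the voice name (for example, en-US from en-US-Neural2-F). The function and constant names are hypothetical.

```javascript
// Maps the CSV "format" column to a Text-to-Speech audio encoding and a file
// extension (LINEAR16 output includes a WAV header, hence .wav).
const FORMATS = {
  MP3: { audioEncoding: 'MP3', extension: 'mp3' },
  LINEAR16: { audioEncoding: 'LINEAR16', extension: 'wav' },
};

// Builds a synthesizeSpeech() request and an output file name from one CSV row.
function buildSynthesisRequest(row) {
  const format = FORMATS[String(row.format).trim().toUpperCase()];
  if (!format) {
    throw new Error(`Unsupported format "${row.format}"; expected MP3 or LINEAR16`);
  }
  // Assumption: voice names start with the language code, e.g. "en-US-Neural2-F".
  const languageCode = row.voice.split('-').slice(0, 2).join('-');
  return {
    request: {
      input: { text: row.text },
      voice: { languageCode, name: row.voice },
      audioConfig: { audioEncoding: format.audioEncoding },
    },
    audiofileName: `${row.filename}.${format.extension}`,
  };
}
```

For the sample row, this yields a request for the en-US-Neural2-F voice with MP3 encoding and the file name sample.mp3.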

4. Text-to-voice conversion process:

Synthesis is the conversion of text input into audio data. The Text-to-Speech API accepts raw text as input; to create a new audio file, call the API's synthesize endpoint (the synthesizeSpeech() method in the Node.js client library), which returns the synthesised speech.

In the REST API, the synthesised audio is returned as a base64-encoded string. The Node.js client library decodes it for you, so response.audioContent already holds the binary audio data, which the fs module's writeFile() method can save directly as a playable .mp3 or .wav file.

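const txtSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const os = require('os');
const path = require('path');

const client = new txtSpeech.TextToSpeechClient(auth); // auth: client options, e.g. { keyFilename }
// Without a callback, synthesizeSpeech() returns a promise that resolves to [response].
const [response] = await client.synthesizeSpeech(request);
const audioContent = response.audioContent; // binary audio bytes
await fs.promises.writeFile(path.join(os.tmpdir(), audiofileName), audioContent);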
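If the audio is fetched through the REST API instead of the Node.js client, audioContent arrives as a base64 string and has to be decoded first. The sketch below accepts either form and writes the result to the system temporary directory; the helper names toAudioBuffer and writeAudioFile are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// The REST API returns audioContent as a base64 string; the Node.js client
// library returns the decoded bytes. Normalize both to a Buffer.
function toAudioBuffer(audioContent) {
  if (typeof audioContent === 'string') {
    return Buffer.from(audioContent, 'base64');
  }
  return Buffer.from(audioContent); // Buffer or Uint8Array: copy the bytes
}

// Writes the audio to `dir` (the OS temp directory by default) and returns the path.
async function writeAudioFile(audioContent, audiofileName, dir = os.tmpdir()) {
  const filePath = path.join(dir, audiofileName);
  await fs.promises.writeFile(filePath, toAudioBuffer(audioContent));
  return filePath;
}
```

Using os.tmpdir() rather than a hard-coded folder keeps the code portable across Linux, macOS, and Windows.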

5. File upload to cloud storage:

Once the conversion is complete, upload the audio to the destination folder in your Cloud Storage bucket with the save() method of the Cloud Storage File object. You can then download the voice file from that folder in the Google Cloud console.

const destination = `${folderName}/${csvname}/${audiofileName}`;
await storage.bucket(Bucket_Name).file(destination).save(audioContent);
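The upload can also set the object's Content-Type so browsers and the Cloud console recognize it as audio. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the .csv extension is dropped from the CSV file name to form the folder name, and resumable: false requests a single-request upload, which generally suits small files.

```javascript
const path = require('path');

// Picks a Content-Type from the audio file extension, with a generic fallback.
function audioContentType(audiofileName) {
  const ext = path.extname(audiofileName).toLowerCase();
  if (ext === '.mp3') return 'audio/mpeg';
  if (ext === '.wav') return 'audio/wav';
  return 'application/octet-stream';
}

// Uploads the audio bytes to <folderName>/<CSV name without .csv>/<audiofileName>
// and returns the destination path. `storage` is a @google-cloud/storage client.
async function uploadAudio(storage, bucketName, folderName, csvFileName, audiofileName, audioContent) {
  const csvname = path.posix.basename(csvFileName, '.csv');
  const destination = `${folderName}/${csvname}/${audiofileName}`;
  await storage.bucket(bucketName).file(destination).save(audioContent, {
    contentType: audioContentType(audiofileName),
    resumable: false, // single-request upload, suited to small files
  });
  return destination;
}
```

For a file named input/data.csv, the audio sample.mp3 would land at output/data/sample.mp3 when folderName is 'output'.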

Conclusion

By following this guide, you will be able to build a quick and efficient text-to-voice application using Node.js, Python, and cloud storage services. With these technologies, you can create versatile applications that transform written text into high-quality speech, providing accessibility, personalization, and multilingual support to users. Connect with our experts at Sensiple to explore the potential of text-to-voice technology in your next project.

About the Author

Kannan Kumar is a Developer at Sensiple with 1 year of experience in the Contact Center practice. He is well-versed in Java and has handled an array of tasks in the Nerve Framework using Node.js.