
Technical guidelines

Preparation of training data for voice synthesis

Based on the audio recordings and scripts for training the voice, BotTalk's voice experts create a unique voice that matches the audio recording.

Speaker voice

The corporate customer selects the speaker voice and must obtain the speaker's consent. Please provide BotTalk with the speaker's full name and confirm that we may synthesize the speaker's voice.

Type of training data

In order for us to achieve the best possible quality of the desired custom voice, the training data must be delivered in the following format.

A data set contains audio recordings and a text file with the corresponding transcriptions. Each audio file should contain exactly one utterance (a single sentence or turn of a dialog system).

Technical delivery:

  • Collection of audio files (.zip)

  • Audio recordings as single utterances (.wav)

  • Associated formatted transcript (.txt)

To produce a good voice model, create the recordings in a quiet room with a high-quality microphone. Consistent volume, speaking rate, speaking pitch, and expressive mannerisms of speech are essential.

Transcript

BotTalk compiles relevant sentences for the transcript in advance and provides the transcript to the corporate customer. The transcript is based on real news articles.

BotTalk adjusts the length of the sentences to fit the maximum audio length. If a recording contains pronunciation errors, noise, or overly long pauses, please record it again.

If necessary, the speaker can listen to the audio recording and re-record it.

The recordings must match the corresponding transcript exactly, word for word. Errors in the transcripts reduce the quality of the trained voice.

Audio files

Each audio file should contain a single utterance (a single sentence or a single turn of a dialog system). All files must be in the same spoken language. Multi-language custom Text-to-Speech voices aren't supported. Each audio file must have a unique filename with the filename extension .wav.

At least 2 hours of audio recording are needed to synthesize a voice.
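Before delivery, it can help to confirm that the recordings add up to the required two hours. The following is a minimal sketch using only the Python standard library; the function names and the assumption that all .wav files sit directly in one folder are illustrative and not part of BotTalk's tooling.

```python
import wave
from pathlib import Path

MIN_TOTAL_SECONDS = 2 * 60 * 60  # "at least 2 hours" from the guideline above


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(folder):
    """Sum the durations of all .wav files directly inside `folder`."""
    # Assumption: recordings are stored flat in a single folder.
    return sum(wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def has_enough_audio(folder, minimum=MIN_TOTAL_SECONDS):
    """Return True if the folder holds at least `minimum` seconds of audio."""
    return total_duration_seconds(folder) >= minimum
```

Calling `has_enough_audio("recordings")` on a hypothetical `recordings` folder returns `False` until the two-hour minimum is reached.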

Follow these guidelines when preparing audio.

File format

RIFF (.wav), grouped into a .zip file

File name

File names are provided by BotTalk.

Duplicate file names are not allowed.
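Duplicate file names can slip in when recordings are collected from several folders. A small sketch of a check follows; the helper name is hypothetical, and comparing names without regard to letter case is an assumption made here so that the archive unpacks cleanly on case-insensitive file systems.

```python
from collections import Counter
from pathlib import Path


def find_duplicate_names(paths):
    """Return file names that occur more than once, ignoring folders and letter case."""
    # Assumption: "001.wav" and "001.WAV" are treated as the same name.
    counts = Counter(Path(p).name.lower() for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means every file name in the list is unique.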

Sampling rate

A sampling rate of 44,100 Hz (44.1 kHz) is required for creating a custom voice.

Recordings must not contain silence at the beginning or the end.

Peaks must not exceed -6 dB.
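The peak and silence rules above can be pre-checked with a short script. This is only a sketch under several assumptions that are not stated in the guidelines: the -6 dB limit is read as relative to digital full scale, samples quieter than -50 dB count as silence, up to 0.1 seconds of silence per end is tolerated, and only 16-bit PCM files are handled. All names are illustrative.

```python
import array
import math
import sys
import wave

PEAK_LIMIT_DB = -6.0           # from the guideline above, read as dB full scale
SILENCE_THRESHOLD_DB = -50.0   # assumption: quieter samples count as silence
MAX_EDGE_SILENCE_S = 0.1       # assumption: tolerated silence at each end


def read_samples_16bit(path):
    """Read a 16-bit PCM WAV file; return (samples, frame_rate, channels)."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        return samples, wav.getframerate(), wav.getnchannels()


def peak_db(samples):
    """Return the highest absolute sample value in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    return 20 * math.log10(peak / 32768) if peak > 0 else float("-inf")


def edge_silence_seconds(samples, frame_rate, channels):
    """Return (leading, trailing) silence in seconds."""
    limit = 32768 * 10 ** (SILENCE_THRESHOLD_DB / 20)
    loud = [i for i, s in enumerate(samples) if abs(s) > limit]
    if not loud:  # the whole file is silent
        total = len(samples) / channels / frame_rate
        return total, total
    lead = (loud[0] // channels) / frame_rate
    trail = ((len(samples) - 1 - loud[-1]) // channels) / frame_rate
    return lead, trail


def level_problems(path):
    """Return a list of human-readable problems; empty if none were found."""
    samples, rate, channels = read_samples_16bit(path)
    problems = []
    if peak_db(samples) > PEAK_LIMIT_DB:
        problems.append("peak above -6 dB")
    lead, trail = edge_silence_seconds(samples, rate, channels)
    if lead > MAX_EDGE_SILENCE_S:
        problems.append("leading silence")
    if trail > MAX_EDGE_SILENCE_S:
        problems.append("trailing silence")
    return problems
```

The thresholds are deliberately kept as constants at the top so they can be adjusted if BotTalk specifies different tolerances.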

Sample format

PCM, at least 16-bit
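The sampling rate and sample format requirements can be verified per file with Python's standard `wave` module, which reads PCM WAV files. The sketch below is one possible check; the function name and the wording of the messages are illustrative.

```python
import wave

REQUIRED_RATE_HZ = 44100        # sampling rate required by the guideline
MIN_SAMPLE_WIDTH_BYTES = 2      # 2 bytes per sample = 16-bit


def format_problems(path):
    """Return a list of format problems for one file; empty if it looks fine."""
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # The wave module rejects files that are not RIFF/WAVE PCM data.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample format is {width * 8}-bit, expected at least 16-bit")
    return problems
```

Running `format_problems` over every file before zipping catches recordings that were exported with the wrong settings.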

Archive format

.zip
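Packaging the delivery can be scripted as well. The sketch below collects the .wav recordings and the .txt transcript from one folder into a single .zip file. Placing the transcript in the same archive and using a flat layout without subfolders are assumptions; the function name is illustrative.

```python
import zipfile
from pathlib import Path


def build_delivery_zip(folder, zip_path):
    """Write all .wav and .txt files from `folder` into `zip_path`; return their names."""
    folder = Path(folder)
    # Assumption: recordings and transcript are stored flat in one folder.
    files = sorted(folder.glob("*.wav")) + sorted(folder.glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            # arcname drops the folder part so the archive has no subdirectories.
            archive.write(file, arcname=file.name)
    return [file.name for file in files]
```

The returned list of names can be compared against the file names provided by BotTalk before the archive is sent.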


Last updated 2 years ago