Audio Chunks Strategy

What is the Audio Chunks Strategy?

Rather than sending entire articles to TTS providers, BotTalk breaks down each article into smaller segments called “Audio Chunks.” These chunks are processed individually and then stitched together on the backend to produce the final audio.
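
In code terms, the idea is simply: synthesize each chunk on its own, then join the results in order. Here is a minimal sketch in Python, assuming a generic per-chunk synthesize() call (a hypothetical stand-in for whatever TTS API the provider exposes):

```python
from typing import Callable, List

def article_to_audio(
    chunks: List[str],
    synthesize: Callable[[str], bytes],  # hypothetical per-chunk TTS call
) -> bytes:
    """Synthesize each chunk separately, then stitch the audio in order."""
    segments = [synthesize(chunk) for chunk in chunks]
    # Raw byte concatenation only works for headerless audio such as raw PCM;
    # real stitching decodes and re-encodes container formats like MP3.
    return b"".join(segments)
```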

Why It Matters

Lower Costs

TTS services typically charge per character or per request. By chunking content and optimizing how it's sent, BotTalk avoids redundant synthesis and keeps audio generation costs low, especially for long-form content like news articles.
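
One illustrative way chunking can cut redundant spend (an assumption about the mechanism, not a documented BotTalk internal): cache synthesized audio per chunk, keyed by a hash of its text, so repeated or unchanged chunks are only ever billed once:

```python
import hashlib
from typing import Callable, Dict

audio_cache: Dict[str, bytes] = {}

def synthesize_cached(chunk: str, synthesize: Callable[[str], bytes]) -> bytes:
    # Identical text always hashes to the same key, so re-publishing an
    # article only pays for the chunks that actually changed.
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    if key not in audio_cache:
        audio_cache[key] = synthesize(chunk)  # the only billable call
    return audio_cache[key]
```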

Intellectual Property Protection

Chunking ensures that your full article is never exposed to the TTS provider in one go, reducing the risk of the content being used for unintended purposes such as training large language models (LLMs).

How It Works (Model-Specific Strategies)

✅ For Traditional TTS Models (e.g., Google Wavenet, Amazon Polly, Microsoft Azure):

BotTalk uses sentence-based chunking.

Each sentence is synthesized into audio separately.

The resulting audio segments are then stitched together in sequence.

This works well because these models are not context-sensitive and handle standalone sentences naturally.
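
As a sketch, sentence-based chunking can be as simple as the splitter below. The regex is a deliberate simplification; production systems use language-aware sentence tokenizers that handle abbreviations, decimals, and similar edge cases:

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

split_sentences("Breaking news today. Markets rallied! More at nine?")
# -> ['Breaking news today.', 'Markets rallied!', 'More at nine?']
```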

⚠️ For Modern TTS Models (e.g., ElevenLabs, Gemini):

These models are context-sensitive and rely on broader linguistic context.

Sentence-by-sentence audio generation leads to issues like:

Wrong language detection (e.g., interpreting German as English).

Inconsistent tone or voice between chunks.

To avoid this, BotTalk switches to paragraph-based or adaptive chunking to maintain natural prosody and voice consistency.
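
A paragraph-based splitter keeps each chunk large enough to carry context. An adaptive variant might additionally merge short paragraphs up to a per-request character limit; the max_chars value below is an illustrative assumption, not a documented BotTalk setting:

```python
def split_paragraphs(text: str, max_chars: int = 2500) -> list[str]:
    """Split on blank lines, then merge short paragraphs so each chunk
    keeps enough context without exceeding a provider's request limit."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    for p in paragraphs:
        if chunks and len(chunks[-1]) + len(p) + 2 <= max_chars:
            chunks[-1] += "\n\n" + p  # merge to preserve surrounding context
        else:
            chunks.append(p)
    return chunks
```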

🎛 Customizing Chunk Separation

BotTalk gives you full control via:

Automation Rules → Create Custom Rule → Set Chunk Separation Strategy

Here, you can choose:

Sentence-Based

Paragraph-Based

Or leave it empty (to use the default strategy per TTS model)

If no rule is defined, BotTalk’s intelligent default behavior is applied based on the selected TTS provider.
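
Conceptually, the resolution order is: custom rule first, provider default second. The sketch below illustrates that logic; the provider identifiers and defaults are hypothetical placeholders, not BotTalk's actual configuration keys:

```python
from typing import Optional

# Hypothetical per-provider defaults, mirroring the behavior described above.
DEFAULT_STRATEGY = {
    "google-wavenet": "sentence",
    "amazon-polly": "sentence",
    "microsoft-azure": "sentence",
    "elevenlabs": "paragraph",
    "gemini": "paragraph",
}

def resolve_strategy(provider: str, custom_rule: Optional[str] = None) -> str:
    # A custom Automation Rule wins; otherwise fall back to the model default.
    return custom_rule or DEFAULT_STRATEGY.get(provider, "sentence")
```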
