Creating Text-to-Speech (TTS)

1. Altered Studio contains hundreds of TTS voices, including our own 20 Professional voices, as well as Cloud voices from Amazon, Google, IBM and Microsoft. If you use a Cloud voice, your text will be sent to the provider’s servers for processing.

2. Like other effects, TTS can be applied to the text in the whole file, a block or another selected area. Use the Text-to-Speech effect on the relevant block controls, or apply TTS to part or all of the file with the blue action buttons at the top of the Transcription panel. You can also use the + button on the History panel or the Effects menu.

3. There are several ways to enter text for TTS generation: 

4. You can use the T shortcut to open the Text Entry screen. This screen allows you to enter single sections of text and assign a speaker, or to enter a multi-speaker dialogue with speaker names. Each speaker name will create a new block on the Transcription panel.

5. If you have transcribed an audio file, you can use the transcription to generate TTS.

6. Alternatively, you can add a Text-to-Speech effect at any point and use the Manual Text box in the Properties panel to enter your desired text.

7. Transcribed text can also be changed in the Manual Text box, making it easy to quickly adjust words or punctuation to improve the output.

8. Once your text is ready, use the (+) button on the Morph Properties panel to open the Voices selection screen. From here, choose one or more voices to use and click OK.

9. All 20 Altered Professional voices and some of the Cloud voices have Speaking Styles that give you further control over the vocal delivery. You can filter for these voices in the selection screen.

10. Altered’s Professional voices and the Azure Third Party voices have a dial that allows you to control the tempo of the TTS. The Azure voices also have a pitch control dial.

11. Once you have chosen your voice(s), select your Speaking Styles and click the Synthesize button to generate the TTS. Note that, unlike Morphing, TTS synthesis is deterministic, so you will get the same output each time you click Synthesize with the same settings. For this reason we recommend running synthesis only once per input, unless you change the text, the Speaking Style or the voice, each of which will change the output.

12. You can export an individual TTS sample to disk by right-clicking it and choosing Export as. Alternatively, you can export all the files in the sample bay at once by using the download icon in the top right corner of the Sample panel.

Tips for Working with Text-to-Speech

1. TTS is generally designed for short pieces of text, so it is best to create one sentence or short paragraph at a time.

2. The placement of gaps and pauses can help to improve the flow of the TTS output, and using shorter blocks of text can improve the intonation and create more natural-sounding pauses.

3. Adding pauses can create breath sounds and change the intonation, making the TTS more lifelike. To add a pause, insert punctuation such as a comma, full stop, ellipsis or dash.

4. Altered Studio treats any word or phrase written in ALL CAPS as an acronym and reads it letter by letter. For example, USA will be read as U-S-A. If you are getting pronunciation errors, check that you have not inadvertently typed words in ALL CAPS.

5. If Altered Studio mispronounces a word, you can use phonetic spelling to adjust the pronunciation. This is particularly useful with words that are spelled the same but pronounced differently. For example, “read” can be pronounced “reed” or “red” depending on its use in the sentence, so you can type one of these spellings to force Altered Studio to pronounce the word correctly.

6. You can use TTS together with STS (speech-to-speech morphing) to improve the quality of the output. For example, you might use TTS to read a sentence with Austin, and then add a Morph effect that uses this TTS audio as its input. In this Morph effect you would use Austin with a high Prosody setting to add more performance to the original TTS read. A high Prosody setting instructs the model to capture more of the Altered Professional voice model’s performance and less of the TTS voice model’s performance. It is good practice to test a variety of Prosody levels until you find the right balance for your input.

7. Altered Studio also includes Google’s ‘Studio’ TTS voices, which are designed for long-form text reading. By combining this TTS output with a Morph effect you can generate a variety of long-form voices and outputs. Note that the Google Studio voices consume more TTS quota than other voices because of Google's pricing for these voices.

8. TTS uses Text Tokens for quota. Your plan includes a number of Text Tokens, which are consumed for each character when you generate TTS. Some voices (such as the Google Studio voices) consume more tokens per character than others. The number of tokens consumed per character is displayed for each voice on the Library and Voice Selection screens, for example 1(T). Note that the character count ignores punctuation, and numerals are counted as 4 characters.
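
Tips 1, 4 and 8 can be checked before you paste text into Altered Studio. The following Python script is a rough sketch only and is not part of Altered Studio: the function names are hypothetical, and the counting rules are our reading of the description above. It assumes that spaces count as characters, that each digit counts as 4 characters, and that "punctuation" means any Unicode punctuation mark. The actual quota calculation may differ, so treat the result as an estimate.

```python
import re
import unicodedata


def split_sentences(text):
    """Split text into sentence-sized chunks after '.', '!' or '?' (tip 1)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def find_all_caps_words(text):
    """Return words of two or more letters typed entirely in capitals (tip 4)."""
    return re.findall(r"\b[A-Z]{2,}\b", text)


def estimate_text_tokens(text, tokens_per_char=1):
    """Estimate Text Tokens for a piece of text (tip 8).

    Assumptions (not confirmed by Altered Studio):
    - punctuation characters are ignored
    - each digit counts as 4 characters
    - every other character, including spaces, counts as 1
    tokens_per_char is the per-voice figure shown in the app, e.g. 1 for 1(T).
    """
    chars = 0
    for ch in text:
        if ch.isdigit():
            chars += 4
        elif unicodedata.category(ch).startswith("P"):
            continue  # punctuation is not counted
        else:
            chars += 1
    return chars * tokens_per_char


if __name__ == "__main__":
    script = "The USA launched it in 1969. Did you know? NASA did."
    for sentence in split_sentences(script):
        print(sentence)
        print("  ALL CAPS words:", find_all_caps_words(sentence))
        print("  estimated tokens:", estimate_text_tokens(sentence))
```

Each printed sentence can then be entered as its own block, with any flagged ALL CAPS words corrected first if they are not meant to be read as acronyms.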