TTS Speaking Styles, Cross-lingual Voice Morphing, API Access, On-Premise Servers, Pricing & Licenses
03 November 2022 | 5 min read
Altered Studio empowers Voice Over creators with Voice AI. We are constantly pushing the boundaries of Speech Synthesis aiming to commoditize technologies that otherwise would have benefited only a handful of companies. We believe that this is the best way to serve consumers and preserve the creative ecosystems around Voice Over while new technologies disrupt prior practices.
We've worked really hard to bring this amazing product to you, but it's still early days and there is a lot more we want to do! We listen to your feedback and we are happy to have you with us on our journey.
New Pricing Plans
As with every new technology, we started with exclusive packages, but with this latest release we’re introducing the Creator Plan, featuring new fast models that allow us to extend our offer to smaller companies and individuals. Our Indie Plan has been replaced by our new Professional Plan, aimed at SMEs. Finally, we expanded our Enterprise Plan to cater to the needs of our bigger clients.
All plans now benefit from:
- Increased synthesis time, up to unlimited when using your own resources.
- Text-To-Speech Synthesis with Speaking Styles: we added 14 Speaking Styles to each of our Portfolio Voices.
- New Timbre models suitable for cross-lingual voice morphing and emotes. These models preserve the accent of the source speaker.
- More voices, especially common voices of everyday people that can be used for background characters and crowds.
- Bigger/Better/Faster models everywhere.
- Integration with major third-party Text-To-Speech providers.
- Text-To-Text translation that can be used for cross-lingual Voice Dubbing.
- An audio effect stack that allows you to perform many of your post-production tasks within Altered Studio.
Text-To-Speech with Speaking Styles
Text-To-Speech systems replicate the performance in the original voice actor recordings that were used to train the AI model. As a result, the performance is rigid and bound to those training recordings: the system struggles to deliver the gravitas and tone modulation that fit the context and intent, and the output is often perceived as monotone and robotic.
The Text-To-Speech performance gap can easily be bridged with our Speech-To-Speech Voice Morphing, but it’s not always possible to have somebody driving the performance. For example, many game companies use Text-To-Speech as an audio placeholder, synthesizing tens of thousands of script lines that are replaced once the final recordings are made.
This is where Text-To-Speech with Speaking Styles comes in. The concept is simple: we pre-recorded a number of different speaking styles with each of our actors so that the TTS can perform them.
A Speaking Style is nothing more than a consistent way of speaking or performing. In the new version of Altered Studio, we added 14 Speaking Styles to all our Portfolio Voices. The first 4 are geared towards non-emotional content.
- Game Persona: a character that you would add to a game. It is characterized by a consistent performance with vibrant, high dynamics that seeks to capture the listener's attention.
- Declarative: Alexa-style speech. Suitable for declarative sentences that communicate information with confidence.
- News: suitable for reading news articles.
- Narration: suitable for narrations typically found in documentaries, passages, etc.
The remaining 10 Speaking Styles are emotional, covering various complex emotional states and projection levels such as whispering and shouting. They are all tailor-made for video games and may differ from voice to voice in our Portfolio Voices. For example:
- Neutral, whisper
- Aggressive, shout
Cross-Lingual Voice Morphing
Prior to this release, all our voice models performed some degree of Accent Conversion. This is not always desirable, because in some use cases the accent needs to be retained, for example when a foreign character speaks a language that is not their mother tongue.
For these use cases, we built the new Timbre Voice Model, which converts the timbre of the voice without changing the accent. Changing voice timbre is roughly equivalent to changing the color and identity of the voice, almost like performing a body swap.
Changing only the physical aspects of the voice, and not the accent, which typically involves higher-level cognitive functions, has a powerful side effect: it makes our Timbre Model universal, in the sense that it can seamlessly handle a wide range of languages and vocalizations (emotes), as in the example below.
API Access & On-Premise Servers
At the frequent request of our Enterprise customers, we now offer API access to Altered Studio functionality. Our API lets users integrate their workflow infrastructure directly and easily with powerful Altered technology.
Further, we upgraded our local desktop server to use multiple GPUs, so it can now handle queries at scale with resource and priority management. Our Enterprise clients can now choose to run on-premise compute servers with just a few clicks in Altered Studio. This ensures a high level of security, because content data traffic stays within their intranet.
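To illustrate what "resource and priority management" across several GPUs can mean in general, here is a minimal sketch of a priority-based dispatcher. It is not Altered Studio's implementation, which is not described here. The class and method names (`MultiGpuScheduler`, `submit`, `dispatch`, `release`) and the job names are hypothetical, and it assumes one job per GPU at a time, with a lower number meaning higher priority.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Ordering uses (priority, seq): lower priority value runs first,
    # and seq keeps submission (FIFO) order among equal priorities.
    priority: int
    seq: int
    name: str = field(compare=False)


class MultiGpuScheduler:
    """Hypothetical sketch: hand queued jobs to free GPUs, highest priority first."""

    def __init__(self, gpu_ids):
        self.free_gpus = list(gpu_ids)  # GPUs currently idle
        self.queue = []                 # min-heap of pending jobs
        self._counter = itertools.count()

    def submit(self, name, priority):
        """Queue a job; it waits until a GPU becomes free."""
        heapq.heappush(self.queue, Job(priority, next(self._counter), name))

    def dispatch(self):
        """Assign pending jobs to idle GPUs and return the (job, gpu) pairs."""
        assignments = []
        while self.queue and self.free_gpus:
            job = heapq.heappop(self.queue)
            gpu = self.free_gpus.pop(0)
            assignments.append((job.name, gpu))
        return assignments

    def release(self, gpu_id):
        """Mark a GPU as idle again once its job has finished."""
        self.free_gpus.append(gpu_id)


if __name__ == "__main__":
    sched = MultiGpuScheduler(["gpu0", "gpu1"])
    sched.submit("placeholder-batch", priority=5)
    sched.submit("client-preview", priority=1)
    sched.submit("archive-render", priority=9)
    print(sched.dispatch())  # [('client-preview', 'gpu0'), ('placeholder-batch', 'gpu1')]
    sched.release("gpu0")
    print(sched.dispatch())  # [('archive-render', 'gpu0')]
```

A heap keeps picking the next job cheap as the queue grows; a production server would also need to track per-GPU memory and preemption, which this sketch leaves out.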
In order to support a wider variety of users with different needs, we have added two new licenses on top of our standard Commercial License. These limited licenses allow us to offer access to Altered Studio at a lower cost.
Our Educational License is designed to make Altered Studio more accessible to students and non-profit organizations, so that they can publish educational projects. This license also allows students to publish non-educational projects on public social platforms such as YouTube, as long as the projects are not monetized.
Our Creator License is designed to make Altered Studio more accessible for individuals or companies with up to 5 employees. It allows commercial use of any outputs the customer creates with Altered Studio, as long as the overall revenue or funding from relevant activities is below $100k over the last 12 months.
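The Creator License eligibility rule combines two conditions that must both hold. The minimal sketch below encodes them, assuming "up to 5 employees" is inclusive, "below $100k" is strict, and the relevant revenue or funding is supplied as one trailing-12-month figure in USD. The names (`Applicant`, `qualifies_for_creator_license`) are hypothetical, and the sketch is an illustration, not the license terms themselves.

```python
from dataclasses import dataclass

# Thresholds taken from the Creator License description.
MAX_EMPLOYEES = 5           # "up to 5 employees" (assumed inclusive)
REVENUE_CAP_USD = 100_000   # "below $100k" over the last 12 months (assumed strict)


@dataclass
class Applicant:
    employees: int
    # Assumption: the applicable revenue or funding from the relevant
    # activities over the last 12 months, expressed as a single USD amount.
    trailing_12m_total_usd: float


def qualifies_for_creator_license(applicant: Applicant) -> bool:
    """Return True only if both the headcount and the revenue conditions hold."""
    within_headcount = applicant.employees <= MAX_EMPLOYEES
    within_revenue = applicant.trailing_12m_total_usd < REVENUE_CAP_USD
    return within_headcount and within_revenue


if __name__ == "__main__":
    print(qualifies_for_creator_license(Applicant(5, 99_999)))   # True
    print(qualifies_for_creator_license(Applicant(6, 10_000)))   # False: too many employees
    print(qualifies_for_creator_license(Applicant(1, 100_000)))  # False: not below $100k
```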