What's a provider?

A provider is an instance of a service that delivers specific models or functionality for speech processing tasks. A provider can offer Text-to-Speech (TTS), Speech-to-Text (STT), or both.

Providers are distinguished by two main aspects:

  1. Nature: Determines whether the provider operates via third-party APIs or within a local infrastructure.
  2. Types: Indicates which functionality the provider specializes in: TTS, STT, or both.

Nature of providers

Cloud providers

These providers connect to external APIs managed by third-party services such as Google Cloud or OpenAI. For instance:

  • The provider openai serves models like whisper using the OpenAI API.


Local providers

These providers run on infrastructure managed by SonarSpeak, either serverless or on-premise, and operate independently of third-party APIs. For example:

  • The provider sonarspeak_serverless serves models like whisperx, which are exclusive to the local SonarSpeak ecosystem.

Types of providers

Providers can specialize in one or both of the following functionalities:

  • Speech-to-Text (STT): converts spoken language into text.
  • Text-to-Speech (TTS): converts text into spoken language.
  • TTS and STT: combines the two functionalities above.
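
The two aspects above (nature and type) can be pictured as a small data model. The following is only an illustrative sketch, not SonarSpeak's actual API: the names Nature, Capability, Provider, and select are hypothetical, and the capabilities assigned to openai and sonarspeak_serverless are assumptions based on whisper and whisperx being speech recognition models.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto
from typing import Iterable, List, Optional


class Nature(Enum):
    """Where a provider runs (hypothetical names)."""
    CLOUD = "cloud"  # backed by a third-party API
    LOCAL = "local"  # hosted on SonarSpeak-managed infrastructure


class Capability(Flag):
    """What a provider can do; flags combine for 'TTS and STT'."""
    STT = auto()
    TTS = auto()


@dataclass(frozen=True)
class Provider:
    name: str
    nature: Nature
    capabilities: Capability
    models: tuple = ()

    def supports(self, capability: Capability) -> bool:
        # True when every requested flag is present on this provider.
        return capability in self.capabilities


# Example entries. The provider and model names come from the text above;
# the capability assignments are assumptions made for illustration.
PROVIDERS = [
    Provider("openai", Nature.CLOUD, Capability.STT, ("whisper",)),
    Provider("sonarspeak_serverless", Nature.LOCAL, Capability.STT, ("whisperx",)),
]


def select(
    providers: Iterable[Provider],
    nature: Optional[Nature] = None,
    capability: Optional[Capability] = None,
) -> List[Provider]:
    """Filter providers by nature and/or required capability."""
    result = []
    for provider in providers:
        if nature is not None and provider.nature is not nature:
            continue
        if capability is not None and not provider.supports(capability):
            continue
        result.append(provider)
    return result
```

With this model, select(PROVIDERS, nature=Nature.LOCAL) keeps only providers hosted on SonarSpeak infrastructure, and a provider offering both functionalities would be declared with Capability.STT | Capability.TTS.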