Audio transcription conditions

Note: Speech-to-text audio transcription is in limited early release and must be enabled by Vivox. For pricing information and to enable this service for your organization, contact your sales representative.

When a player is in a channel with another player who has speech-to-text transcription enabled, their speech is transcribed and sent under the following conditions:

If a player is within audible range when audio transcription completes, and they remain within audible range until the transcription is sent to the client, then they receive the audio transcription.
If a player re-enters audible range while they are speaking, then everything since they last entered audible range is transcribed.

A transcription is not sent under the following conditions:

If a player leaves audible range during the delay between the completion of speech and when the Vivox SDK receives the transcription, then no transcription is sent.
If a player leaves audible range during transcription, then no transcription is sent.

Note: There is a delay between the completion of speech and when the Vivox SDK receives the transcription.
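The range conditions above can be sketched as a small decision function. This is an illustrative model only, not Vivox SDK code: the Utterance type, the transcribed_window function, and the representation of audible range as a list of (enter, leave) timestamps are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Utterance:
    """Hypothetical record of one stretch of speech (times in seconds)."""
    speech_start: float   # when the speaker started talking
    speech_end: float     # when the speaker stopped talking
    delivered_at: float   # when the SDK would receive the transcription


def transcribed_window(
    utt: Utterance,
    in_range: List[Tuple[float, float]],
) -> Optional[Tuple[float, float]]:
    """Return the (start, end) of speech that gets transcribed and sent,
    or None if no transcription is sent.

    in_range lists the (enter, leave) times during which the player was
    within audible range. Assumption: a transcription is sent only if one
    continuous in-range period covers the end of speech and lasts until
    the transcription is delivered.
    """
    for enter, leave in in_range:
        if enter < utt.speech_end and leave >= utt.delivered_at:
            # Only speech since the player last entered range is included.
            return (max(enter, utt.speech_start), utt.speech_end)
    return None
```

For example, with speech from 0 to 5 seconds and delivery at 6 seconds, a player who leaves range at 5.5 seconds gets no transcription in this model, while a player who re-enters range at 3 seconds and stays gets only the speech from 3 seconds onward.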

Speech-to-text transcription follows the audio mute state for participants. If a participant's audio is muted, then the transcription of that audio is not delivered to the app. This applies to participants who are muted locally or for all users, and to device muting for oneself in an echo channel. A user who mutes themselves still receives transcribed text from other users in the channel.
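As a rough illustration of the mute rules, the following sketch filters which other participants would receive a speaker's transcription. It is a simplified model under stated assumptions, not the Vivox SDK API: the Participant type, its fields, and the recipients function are hypothetical, and the sketch only considers participants other than the speaker.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Participant:
    """Hypothetical participant state used only for this example."""
    name: str
    self_muted: bool = False        # participant muted their own audio
    muted_for_all: bool = False     # participant is muted for all users
    locally_muted: Set[str] = field(default_factory=set)  # names muted by this participant


def recipients(speaker: Participant, participants: List[Participant]) -> List[str]:
    """Return names of other participants who would receive the
    speaker's transcription in this model."""
    # Assumption: a speaker whose audio is muted produces no transcription.
    if speaker.self_muted or speaker.muted_for_all:
        return []
    return [
        p.name
        for p in participants
        # A listener who muted the speaker locally gets no transcription.
        # A listener's own self-mute does not stop them from receiving text.
        if p.name != speaker.name and speaker.name not in p.locally_muted
    ]
```

In this model, a participant who has muted themselves still appears in the recipient list for other speakers, which mirrors the last sentence of the paragraph above.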

Speech audio volume generally does not affect speech-to-text transcription; what matters is whether the words are spoken clearly. Hardware-level or software volume adjustments that change a user's audio volume do not impact the accuracy of transcribed speech. However, speech that is very quiet or mumbled can reduce transcription accuracy, and this is beyond developer control.
