Voice activity detection
Voice activity detection (VAD) determines whether speech is present in the audio an application captures. In most cases, customers do not need to adjust the default VAD settings of the Vivox SDK.
Tip: Before manually tuning the VAD settings, test your setup using Automatically Adjusted VAD. Automatically Adjusted VAD is better than the default VAD settings at detecting when a player is speaking.
Enable Automatically Adjusted VAD by calling VivoxService.Instance.EnableAutoVoiceActivityDetectionAsync(). The SDK then configures the VAD settings automatically, overriding any values set manually with VivoxService.Instance.SetVoiceActivityDetectionPropertiesAsync().
After Automatically Adjusted VAD has been disabled with VivoxService.Instance.DisableAutoVoiceActivityDetectionAsync(), call VivoxService.Instance.SetVoiceActivityDetectionPropertiesAsync(int hangover, int noiseFloor, int sensitivity) either to set specific values or to restore the defaults (2000, 576, and 43, respectively).
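The following is a minimal C# sketch of switching between the two modes, for example when a player toggles an option in a settings menu. VadModeSwitcher and ApplyAsync are hypothetical names, and the three delegates stand in for the Vivox methods named above so that the logic can be exercised without the SDK; the sketch assumes each of those methods returns a Task. The optional parameter defaults mirror the default values listed under Parameter specifics.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: applies either Automatically Adjusted VAD or manual VAD values.
// The delegates are assumed to wrap these Vivox calls:
//   enableAutoVad    -> VivoxService.Instance.EnableAutoVoiceActivityDetectionAsync
//   disableAutoVad   -> VivoxService.Instance.DisableAutoVoiceActivityDetectionAsync
//   setVadProperties -> VivoxService.Instance.SetVoiceActivityDetectionPropertiesAsync
public static class VadModeSwitcher
{
    public static async Task ApplyAsync(
        bool useAutoVad,
        Func<Task> enableAutoVad,
        Func<Task> disableAutoVad,
        Func<int, int, int, Task> setVadProperties,
        int hangover = 2000,   // documented default, in milliseconds
        int noiseFloor = 576,  // documented default, range 0-20000
        int sensitivity = 43)  // documented default, range 0-100
    {
        if (useAutoVad)
        {
            // Auto VAD overrides manual values, so no properties are sent.
            await enableAutoVad();
            return;
        }

        // Manual values are applied only after auto VAD has been turned off.
        await disableAutoVad();
        await setVadProperties(hangover, noiseFloor, sensitivity);
    }
}
```

Passing method groups such as VivoxService.Instance.EnableAutoVoiceActivityDetectionAsync as the delegates would connect the sketch to the SDK, provided the signatures match. The enable path deliberately sends no manual values, because Automatically Adjusted VAD would override them anyway.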
Parameter specifics
The hangover parameter sets how long, in milliseconds, the VAD waits after the last frame of speech is detected before switching from speech mode back to silence. The default value is 2000 (2 seconds).
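To make the hangover behavior concrete, here is an illustrative C# model of a hangover timer. It is not the Vivox implementation; the class name HangoverGate and the frame duration passed to it are assumptions for illustration. Each audio frame carries a raw speech or no-speech decision, and the gate stays in speech mode until the accumulated silence since the last speech frame reaches the hangover time.

```csharp
using System;

// Illustrative hangover timer (hypothetical; not the Vivox internals).
public sealed class HangoverGate
{
    private readonly int _hangoverMs;
    private int _silenceMs; // silence accumulated since the last speech frame

    public bool IsSpeaking { get; private set; }

    public HangoverGate(int hangoverMs)
    {
        if (hangoverMs < 0) throw new ArgumentOutOfRangeException(nameof(hangoverMs));
        _hangoverMs = hangoverMs;
    }

    // Feeds one frame's raw speech decision and returns the smoothed state.
    public bool Process(bool frameHasSpeech, int frameMs)
    {
        if (frameHasSpeech)
        {
            // Any speech frame enters (or stays in) speech mode and resets the timer.
            IsSpeaking = true;
            _silenceMs = 0;
        }
        else if (IsSpeaking)
        {
            // Stay in speech mode until the silence reaches the hangover time.
            _silenceMs += frameMs;
            if (_silenceMs >= _hangoverMs) IsSpeaking = false;
        }
        return IsSpeaking;
    }
}
```

With a hangover of 2000 ms and (assumed) 20 ms frames, the gate keeps reporting speech for 99 silent frames after the last speech frame and switches to silence on the 100th. A longer hangover avoids cutting off short pauses between words, at the cost of transmitting more trailing audio after the speaker stops.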
The noiseFloor parameter is a dimensionless value between 0 and 20000 that controls how the VAD separates speech from background noise. Lower values assume a quiet environment where the captured audio is essentially only speech; higher values assume a noisy background. The default value is 576.
Note: Changes to the noiseFloor setting do not affect channels the client has already joined. If end users can change VAD settings, either indicate that noise floor changes take effect only in the next voice session, or allow the noise floor to be changed only while the client is not in a channel.
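A settings screen that exposes the noise floor could combine the documented range with the rule in the note above. The following is a minimal C# sketch; NoiseFloorSetting and its members are hypothetical names, and isInChannel is assumed to be supplied by the caller from however the application tracks channel membership. The two predicates correspond to the two options in the note: showing a next-session notice, or locking the control.

```csharp
using System;

// Hypothetical helper for a settings screen that exposes the VAD noise floor.
public static class NoiseFloorSetting
{
    public const int Min = 0;       // documented lower bound
    public const int Max = 20000;   // documented upper bound
    public const int Default = 576; // documented default

    // Clamps a requested value into the documented 0-20000 range.
    public static int Clamp(int requested) => Math.Min(Max, Math.Max(Min, requested));

    // Option 1: keep the control enabled, but warn that a change made while
    // in a channel applies only to the next voice session.
    public static bool NeedsNextSessionNotice(bool isInChannel) => isInChannel;

    // Option 2: allow editing only while the client is not in a channel.
    public static bool CanEdit(bool isInChannel) => !isInChannel;
}
```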
The sensitivity parameter is a dimensionless value between 0 and 100 that controls how readily the VAD detects speech. The scale is inverted: 0 is the most sensitive and 100 the least sensitive, so higher values require louder audio to trigger the VAD. The default value is 43.
Applications that use the default VAD and expose vad_sensitivity as a slider should limit it to values between 0 (transmit all microphone activity) and 70 (very selective).
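The recommended slider range can be enforced with a small mapping function. The following is a minimal C# sketch, assuming a UI slider that reports a normalized position between 0 and 1; SensitivitySlider and its members are hypothetical names. Because the scale is inverted, moving the slider toward 1 makes the VAD more selective, not more sensitive.

```csharp
using System;

// Hypothetical helper: maps a normalized slider position to a VAD sensitivity value,
// capped at the recommended 0-70 range for applications that use the default VAD.
public static class SensitivitySlider
{
    public const int MinSensitivity = 0;  // transmit all microphone activity
    public const int MaxSensitivity = 70; // very selective
    public const int Default = 43;        // documented default

    // Slider position in [0, 1] -> sensitivity in [0, 70]; out-of-range input is clamped.
    public static int ToSensitivity(float sliderValue)
    {
        float t = Math.Max(0f, Math.Min(1f, sliderValue));
        return (int)Math.Round(MinSensitivity + t * (MaxSensitivity - MinSensitivity));
    }

    // Sensitivity -> slider position, for initializing the UI from a stored value.
    public static float ToSlider(int sensitivity)
    {
        int clamped = Math.Max(MinSensitivity, Math.Min(MaxSensitivity, sensitivity));
        return (clamped - MinSensitivity) / (float)(MaxSensitivity - MinSensitivity);
    }
}
```

Initializing the slider with ToSlider(SensitivitySlider.Default) places it at the documented default of 43, and converting that position back with ToSensitivity returns 43.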