- Operating System: Ubuntu 18.04+ or CentOS 7.0+
- CPU Architecture: x86-64
- Performance Requirements:
  - CPU: 8 cores at 1.8 GHz or higher
  - Memory: 2 GB (4 GB+ recommended)
- Network Requirements:
  - Public IP
  - Access to the .agora.io and .agoralab.co domains
- Apache Maven or other build tools
- JDK 8+
Refer to the official example documentation. For detailed examples, see examples/README.md.
For complete API documentation, please visit Agora Java Server SDK API Reference
AgoraAudioVadV2 is a Voice Activity Detection (VAD) module that processes audio frames. It detects voice activity in audio streams and handles each frame according to its configuration parameters.
public AgoraAudioVadV2(AgoraAudioVadConfigV2 config)
- Parameters
  - config: AgoraAudioVadConfigV2 type, the VAD configuration.
Property Name | Type | Description | Default Value | Range |
---|---|---|---|---|
preStartRecognizeCount | int | Number of audio frames saved before entering the speech state | 16 | [0, Integer.MAX_VALUE] |
startRecognizeCount | int | Number of audio frames evaluated to detect the start of speech | 30 | [1, Integer.MAX_VALUE] |
stopRecognizeCount | int | Number of audio frames evaluated to detect the end of speech | 20 | [1, Integer.MAX_VALUE] |
activePercent | float | Proportion of active frames among the startRecognizeCount frames | 0.7 | [0.0, 1.0] |
inactivePercent | float | Proportion of inactive frames among the stopRecognizeCount frames | 0.5 | [0.0, 1.0] |
startVoiceProb | int | Probability threshold for starting voice detection | 70 | [0, 100] |
stopVoiceProb | int | Probability threshold for stopping voice detection | 70 | [0, 100] |
startRmsThreshold | int | RMS threshold for starting voice detection | -50 | [-100, 0] |
stopRmsThreshold | int | RMS threshold for stopping voice detection | -50 | [-100, 0] |
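Because the table gives an explicit valid range for every property, a pre-flight check can catch configuration mistakes before the values reach the SDK. The following is a minimal sketch: VadSettings and validate() are hypothetical helpers that only mirror the property names, defaults, and ranges listed above. They are not part of the Agora API, and this document does not say how the SDK itself treats out-of-range values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (NOT part of the Agora SDK): mirrors the documented
// AgoraAudioVadConfigV2 properties and checks them against the table's ranges.
class VadSettings {
    int preStartRecognizeCount = 16;
    int startRecognizeCount = 30;
    int stopRecognizeCount = 20;
    float activePercent = 0.7f;
    float inactivePercent = 0.5f;
    int startVoiceProb = 70;
    int stopVoiceProb = 70;
    int startRmsThreshold = -50;
    int stopRmsThreshold = -50;

    // Returns human-readable problems; an empty list means every value is in range.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        checkInt(errors, "preStartRecognizeCount", preStartRecognizeCount, 0, Integer.MAX_VALUE);
        checkInt(errors, "startRecognizeCount", startRecognizeCount, 1, Integer.MAX_VALUE);
        checkInt(errors, "stopRecognizeCount", stopRecognizeCount, 1, Integer.MAX_VALUE);
        checkFloat(errors, "activePercent", activePercent, 0.0f, 1.0f);
        checkFloat(errors, "inactivePercent", inactivePercent, 0.0f, 1.0f);
        checkInt(errors, "startVoiceProb", startVoiceProb, 0, 100);
        checkInt(errors, "stopVoiceProb", stopVoiceProb, 0, 100);
        checkInt(errors, "startRmsThreshold", startRmsThreshold, -100, 0);
        checkInt(errors, "stopRmsThreshold", stopRmsThreshold, -100, 0);
        return errors;
    }

    private static void checkInt(List<String> errors, String name, int value, int min, int max) {
        if (value < min || value > max) {
            errors.add(name + "=" + value + " is outside [" + min + ", " + max + "]");
        }
    }

    private static void checkFloat(List<String> errors, String name, float value, float min, float max) {
        // NaN fails every comparison, so reject it explicitly
        if (Float.isNaN(value) || value < min || value > max) {
            errors.add(name + "=" + value + " is outside [" + min + ", " + max + "]");
        }
    }

    public static void main(String[] args) {
        VadSettings settings = new VadSettings();
        System.out.println("defaults valid: " + settings.validate().isEmpty());
        settings.startRmsThreshold = 10; // invalid: RMS thresholds must be <= 0
        System.out.println(settings.validate());
    }
}
```

Returning a list of messages rather than throwing on the first error lets a configuration loader report every bad value at once.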
- startVoiceProb: The lower the value, the more likely a frame is judged active, and the earlier the start phase begins. Lower it for more sensitive voice detection.
- stopVoiceProb: The higher the value, the more likely a frame is judged inactive, and the earlier the stop phase begins. Raise it to end voice detection sooner.
- startRmsThreshold and stopRmsThreshold:
  - The higher the value, the louder a frame must be to count as voice activity, so detection becomes stricter and less prone to triggering on background noise.
  - In quiet environments, the default value of -50 is recommended.
  - In noisy environments, it can be raised to between -40 and -30 to reduce false positives.
  - Fine-tune according to the actual usage scenario and audio characteristics for optimal results.
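The interplay of these parameters can be illustrated with a simplified sliding-window model. This is a hypothetical sketch for building intuition, not the SDK's actual algorithm. It assumes a frame counts as active when its voice probability is at least startVoiceProb and its RMS is at least startRmsThreshold, counts as inactive when its probability falls below stopVoiceProb or its RMS falls below stopRmsThreshold, and that the state flips once a full window of startRecognizeCount (or stopRecognizeCount) frames reaches the required proportion. preStartRecognizeCount is left out; VadWindowModel and push() are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of how the VAD parameters interact.
// This is NOT the Agora SDK's implementation; it only illustrates the documented semantics.
class VadWindowModel {
    enum Event { NONE, START_SPEAKING, STOP_SPEAKING }

    // Defaults taken from the configuration table
    int startRecognizeCount = 30;
    int stopRecognizeCount = 20;
    float activePercent = 0.7f;
    float inactivePercent = 0.5f;
    int startVoiceProb = 70;
    int stopVoiceProb = 70;
    int startRmsThreshold = -50;
    int stopRmsThreshold = -50;

    private final Deque<Boolean> window = new ArrayDeque<>();
    private boolean speaking = false;

    // voiceProb: assumed 0..100; rms: assumed to use the same scale as the thresholds
    Event push(int voiceProb, int rms) {
        boolean hit;
        int size;
        float required;
        if (!speaking) {
            // Waiting for speech: count frames judged active
            hit = voiceProb >= startVoiceProb && rms >= startRmsThreshold;
            size = startRecognizeCount;
            required = activePercent;
        } else {
            // In speech: count frames judged inactive
            hit = voiceProb < stopVoiceProb || rms < stopRmsThreshold;
            size = stopRecognizeCount;
            required = inactivePercent;
        }
        window.addLast(hit);
        while (window.size() > size) {
            window.removeFirst(); // keep only the most recent `size` frames
        }
        if (window.size() < size) {
            return Event.NONE; // window not full yet
        }
        long hits = window.stream().filter(b -> b).count();
        if (hits >= required * size) {
            speaking = !speaking;
            window.clear(); // start a fresh window for the opposite transition
            return speaking ? Event.START_SPEAKING : Event.STOP_SPEAKING;
        }
        return Event.NONE;
    }

    public static void main(String[] args) {
        VadWindowModel model = new VadWindowModel();
        model.startRecognizeCount = 5; // small windows keep the demo short
        model.stopRecognizeCount = 4;
        for (int i = 0; i < 5; i++) {
            System.out.println("loud frame " + i + ": " + model.push(90, -30));
        }
        for (int i = 0; i < 4; i++) {
            System.out.println("quiet frame " + i + ": " + model.push(10, -80));
        }
    }
}
```

With these settings the model reports START_SPEAKING on the fifth loud frame and STOP_SPEAKING on the fourth quiet frame, which shows why larger window counts make the detector react more slowly but more steadily.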
public synchronized VadProcessResult processFrame(AudioFrame frame)
- Parameters
  - frame: AudioFrame type, the audio frame.
- Returns
  - VadProcessResult type, the result of the VAD process.
public synchronized void destroy()
- Destroys the VAD module and releases resources.
VadProcessResult stores the VAD process result.
public VadProcessResult(byte[] result, Constants.VadState state)
- Parameters
  - result: byte[] type, the processed audio data.
  - state: Constants.VadState type, the current VAD state.
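A common way to consume these results is to gather the processed audio of one utterance, from START_SPEAKING to STOP_SPEAKING, and pass it to a downstream consumer such as a speech recognizer. The sketch below is hypothetical: UtteranceCollector is not an SDK class, and DemoVadState is a stand-in for Constants.VadState. Only START_SPEAKING and STOP_SPEAKING are named in this document; NONE and SPEAKING, and the assumption that frames between the two carry speech data, should be checked against your SDK version.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (NOT part of the SDK): groups processed audio into utterances.
class UtteranceCollector {
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    private final List<byte[]> utterances = new ArrayList<>();
    private boolean inSpeech = false;

    // Call once per VadProcessResult with its state and processed data.
    void onResult(DemoVadState state, byte[] data) {
        switch (state) {
            case START_SPEAKING:
                current.reset(); // begin a new utterance
                inSpeech = true;
                append(data);
                break;
            case SPEAKING:
                if (inSpeech) {
                    append(data);
                }
                break;
            case STOP_SPEAKING:
                if (inSpeech) {
                    append(data);
                    utterances.add(current.toByteArray()); // utterance complete
                    current.reset();
                    inSpeech = false;
                }
                break;
            default:
                break; // no speech in progress: nothing to collect
        }
    }

    private void append(byte[] data) {
        if (data != null) {
            current.write(data, 0, data.length);
        }
    }

    List<byte[]> utterances() {
        return utterances;
    }

    public static void main(String[] args) {
        UtteranceCollector collector = new UtteranceCollector();
        collector.onResult(DemoVadState.NONE, new byte[] {9});
        collector.onResult(DemoVadState.START_SPEAKING, new byte[] {1, 2});
        collector.onResult(DemoVadState.SPEAKING, new byte[] {3});
        collector.onResult(DemoVadState.STOP_SPEAKING, null);
        System.out.println("utterances: " + collector.utterances().size()
                + ", bytes in first: " + collector.utterances().get(0).length);
    }
}

// Hypothetical stand-in for Constants.VadState; START_SPEAKING and STOP_SPEAKING
// are named in this document, NONE and SPEAKING are assumptions.
enum DemoVadState { NONE, START_SPEAKING, SPEAKING, STOP_SPEAKING }
```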
Here is a simple example demonstrating how to use AgoraAudioVadV2 to process audio frames:
```java
import io.agora.rtc.AgoraAudioVadConfigV2;
import io.agora.rtc.AgoraAudioVadV2;
import io.agora.rtc.AudioFrame;
import io.agora.rtc.Constants;
import io.agora.rtc.VadProcessResult;

public class Main {
    public static void main(String[] args) {
        // Create the VAD configuration (these values are the documented defaults)
        AgoraAudioVadConfigV2 config = new AgoraAudioVadConfigV2();
        config.setPreStartRecognizeCount(16);
        config.setStartRecognizeCount(30);
        config.setStopRecognizeCount(20);
        config.setActivePercent(0.7f);
        config.setInactivePercent(0.5f);
        config.setStartVoiceProb(70);
        config.setStopVoiceProb(70);
        config.setStartRmsThreshold(-50);
        config.setStopRmsThreshold(-50);

        // Create the VAD instance
        AgoraAudioVadV2 vad = new AgoraAudioVadV2(config);
        try {
            // Simulate audio frame processing
            AudioFrame frame = new AudioFrame();
            // Set frame properties...
            VadProcessResult result = vad.processFrame(frame);
            if (result != null) {
                Constants.VadState state = result.getState();
                byte[] data = result.getResult();
                System.out.println("VAD State: " + state);
                // The processed data may be absent, so guard against null
                System.out.println("Processed Data Length: " + (data == null ? 0 : data.length));
            }
        } finally {
            // Always destroy the VAD instance to release its resources
            vad.destroy();
        }
    }
}
```
- Enhanced the processFrame handling in AgoraAudioVadV2 with new START_SPEAKING and STOP_SPEAKING state callbacks.
- Improved parameter types for encoded frame callbacks: onEncodedAudioFrameReceived, onEncodedVideoImageReceived, and onEncodedVideoFrame now use ByteBuffer instead of Byte arrays, improving performance and flexibility.
- Optimized VAD plugin startup: enableExtension is now handled within the SDK, so applications no longer need to call this method manually.
- Fixed issues with the handling of alphaBuffer and metadataBuffer in VideoFrame.
- Update code that uses the encoded frame callbacks to accept the new ByteBuffer parameter type.
- If you previously called the enableExtension method for the VAD plugin manually, you can now remove that call.
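When migrating byte-array based callback code to ByteBuffer, one conservative approach is to copy the buffer's readable bytes into a new array at the start of the callback and keep the existing logic unchanged. This is a minimal sketch assuming you need a byte[]: ByteBufferCompat and toByteArray() are hypothetical names, and because this document does not say whether the SDK reuses the buffer after the callback returns, copying is the safe default. The exact callback signatures are not shown here.

```java
import java.nio.ByteBuffer;

// Hypothetical migration helper (NOT part of the SDK).
class ByteBufferCompat {
    // Copies the readable bytes (position..limit) into a new array without
    // moving the original buffer's position. Works for heap and direct buffers.
    static byte[] toByteArray(ByteBuffer buffer) {
        if (buffer == null) {
            return new byte[0];
        }
        ByteBuffer view = buffer.duplicate(); // shares content, independent position/limit
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4});
        direct.flip(); // make the written bytes readable
        byte[] copy = toByteArray(direct);
        System.out.println(copy.length + " bytes copied, position still " + direct.position());
    }
}
```

Using duplicate() rather than reading the buffer directly leaves its position untouched, so any other code that reads the same buffer afterwards still sees all of the data.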
- Added Vad2 interfaces related to AgoraAudioVad2 and removed the Vad interfaces related to AgoraAudioVad.
- Added a new callback interface for receiving encoded audio frames: IAudioEncodedFrameObserver.
- Fixed crashes related to LocalAudioDetailedStats callbacks.
- Modified the parameter types for the onAudioVolumeIndication callback.
- For detailed release notes, please refer to the Release Notes.
If you encounter any issues, please refer to the Documentation Center or search for related issues on GitHub Issues.
- Technical Support: [email protected]
- Business Inquiries: [email protected]
- Other Architectural Support: [email protected]