Hi, great work on FireRedVAD! 🎉
The README provides comprehensive benchmark results for the VAD (non-streaming) model on FLEURS-VAD-102, showing impressive SOTA performance (F1: 97.57%, AUC-ROC: 99.60%).
However, I noticed that the benchmark results for the other two models are not included:
- Stream-VAD: Are there comparable benchmark results on FLEURS-VAD-102 or other datasets? Specifically, how does the streaming model compare to the non-streaming VAD in terms of F1, AUC-ROC, and latency?
- AED: Since it detects three event types (speech, singing, music), are there benchmark results on audio event detection datasets? What metrics were used to evaluate the multi-class detection performance?
It would be very helpful for users to understand the trade-offs between the three models when choosing which one to use.
Thanks!