Benchmark results for Stream-VAD and AED models?

Hi, great work on FireRedVAD! 🎉

The README provides comprehensive benchmark results for the **VAD (non-streaming)** model on FLEURS-VAD-102, showing impressive SOTA performance (F1: 97.57%, AUC-ROC: 99.60%).

However, I noticed that the benchmark results for the other two models are not included:

1. **Stream-VAD** — Are there comparable benchmark results on FLEURS-VAD-102 or other datasets? Specifically, how does the streaming model compare to the non-streaming VAD in terms of F1, AUC-ROC, and latency?

2. **AED** — Since it detects three event types (speech, singing, music), are there benchmark results on audio event detection datasets? What metrics were used to evaluate the multi-class detection performance?

These results would help users understand the trade-offs between the three models when choosing which one to use.

Thanks!
