What exactly is the difference in positioning between SenseVoice and Fun-ASR-Nano? #288


## ❓ Questions and Help

Hello, I previously opened an issue for funasr, https://github.com/FunAudioLLM/Fun-ASR/issues/100, to learn more about the Fun-ASR-Nano model.

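I need a real-time ASR model. Searching the issues at https://github.com/modelscope/FunASR/, I noticed that 2829 and [2836](https://github.com/modelscope/FunASR/issues/2836) both answer: "Fun-ASR-Nano is currently an offline model and does not support true streaming inference. For real-time recognition, please use the `paraformer-zh-streaming` model." However, when I tested `paraformer-zh-streaming` myself, the recognition accuracy was not very good. I also see that it was last updated on 2024-01-23, so I wonder whether the accuracy suffers because the training data is relatively old.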

Then I looked at the performance data for SenseVoiceSmall at https://github.com/modelscope/FunASR#benchmark

Model | GPU Speed | CPU Speed | vs Whisper-large-v3
-- | -- | -- | --
SenseVoice-Small | 170x realtime | 17x realtime | 🚀 13x faster
Paraformer-Large | 120x realtime | 15x realtime | 🚀 9x faster
Whisper-large-v3-turbo | 46x realtime | ❌ | 3.4x faster
Fun-ASR-Nano | 17x realtime | 3.6x realtime | 1.3x faster


By these numbers, SenseVoice-Small is far faster than Fun-ASR-Nano: 170x vs. 17x realtime on GPU, and 17x vs. 3.6x realtime on CPU.
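To make the comparison concrete, here is a small sketch that converts the speed factors in the table into approximate RTF values. It assumes that "Nx realtime" means RTF = 1/N (processing time divided by audio duration); the benchmark page does not state this explicitly, so treat the conversion as my assumption. The helper names `rtf` and `speedup` are only for illustration.

```python
# Speed factors copied from the benchmark table above ("x realtime").
# Whisper-large-v3-turbo is left out because it has no CPU figure.
SPEED = {
    "SenseVoice-Small": {"gpu": 170.0, "cpu": 17.0},
    "Paraformer-Large": {"gpu": 120.0, "cpu": 15.0},
    "Fun-ASR-Nano": {"gpu": 17.0, "cpu": 3.6},
}


def rtf(speed_factor: float) -> float:
    """Assumed conversion: RTF = processing time / audio duration = 1 / speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed factor must be positive")
    return 1.0 / speed_factor


def speedup(fast: str, slow: str, device: str) -> float:
    """Ratio of the two models' speed factors on the given device ('gpu' or 'cpu')."""
    return SPEED[fast][device] / SPEED[slow][device]


if __name__ == "__main__":
    for name, devices in SPEED.items():
        print(f"{name}: GPU RTF ~ {rtf(devices['gpu']):.4f}, "
              f"CPU RTF ~ {rtf(devices['cpu']):.4f}")
    for device in ("gpu", "cpu"):
        ratio = speedup("SenseVoice-Small", "Fun-ASR-Nano", device)
        print(f"SenseVoice-Small vs Fun-ASR-Nano on {device}: {ratio:.1f}x")
```

Under that assumption, SenseVoice-Small comes out about 10x faster than Fun-ASR-Nano on GPU and about 4.7x faster on CPU.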

So I would like to know: what are the respective strengths and weaknesses of SenseVoice-Small and Fun-ASR-Nano, and which scenarios is each one suited to?


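Also, I just saw your update in https://github.com/modelscope/FunASR/issues/2886 saying: "The latest version of FunASR now supports vLLM-based real-time streaming inference; the Fun-ASR-Nano model (31 languages) is recommended."

So is Fun-ASR-Nano now the recommended model for real-time recognition?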
