@DePasqualeOrg
Contributor

When transcribing short audio clips, the model may output a single_timestamp_ending pattern (text followed by a single timestamp and EOT). Previously, this pattern advanced seek by the full 30-second segment size, which could skip the remaining audio content.

This fix advances seek to the timestamp position instead, allowing transcription to continue from that point.
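For reviewers, here is a rough sketch of the seek arithmetic described above. It is not the code in this PR: the function name `next_seek`, the `advance_to_timestamp` flag, and the zero-advance fallback are hypothetical and exist only for illustration. It assumes Whisper's usual constants (3000 mel frames per 30-second segment, 2 frames per 0.02-second timestamp step) and that `tokens` has the EOT token already stripped.

```python
# Illustrative sketch only; names and structure are assumptions, not the PR's code.
FRAMES_PER_SEGMENT = 3000   # 30 s at 100 mel frames per second
FRAMES_PER_TIMESTAMP = 2    # one timestamp step = 0.02 s = 2 mel frames


def next_seek(seek, tokens, timestamp_begin, advance_to_timestamp=True):
    """Return the next seek position (in mel frames) after decoding one window.

    tokens: decoded token IDs for the window, with EOT already removed.
    timestamp_begin: ID of the first timestamp token (depends on the tokenizer).
    advance_to_timestamp: False mimics the old behavior, True the fixed one.
    """
    is_timestamp = [t >= timestamp_begin for t in tokens]
    # Text followed by exactly one trailing timestamp.
    single_timestamp_ending = is_timestamp[-2:] == [False, True]

    if single_timestamp_ending and advance_to_timestamp:
        last_timestamp_pos = tokens[-1] - timestamp_begin
        advance = last_timestamp_pos * FRAMES_PER_TIMESTAMP
        # Assumed guard: a zero advance would loop forever, so fall back
        # to skipping the full segment in that case.
        if advance > 0:
            return seek + advance

    return seek + FRAMES_PER_SEGMENT


# Example: three text tokens, then a timestamp 250 steps (5.00 s) into the window.
timestamp_begin = 50364  # illustrative value; the real ID comes from the tokenizer
tokens = [100, 200, 300, timestamp_begin + 250]
old_seek = next_seek(0, tokens, timestamp_begin, advance_to_timestamp=False)
new_seek = next_seek(0, tokens, timestamp_begin)
```

With the old behavior the next window starts 3000 frames later, so anything spoken after the 5-second mark in this example is never decoded. With the fix, decoding resumes at frame 500.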

You can test this by checking out commit 9820718 and running whisper/demo_seek_fix.py to see the behavior before the fix. In my testing, this behavior occurred with only one model variant.

The test script can be deleted before merging.

Before:

Text: The examination and testimony of the experts enabled the commission to conclude that five
Segments: 1
Accuracy: 71%

After:

Text: The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired.
Segments: 2
Accuracy: 100%
