Fix/ensure correct transcript finished #3871
base: main
Conversation
Summary of Changes

Hello @ChrisQlasty, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses an issue where transcription text from Live Gemini models was not being correctly assembled, leading to improperly stitched words. The changes introduce a refined algorithm for concatenating transcription fragments, ensuring that spaces are inserted appropriately while respecting punctuation and apostrophes. This significantly improves the readability and accuracy of the final transcribed text.
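To make the failure mode concrete, here is a small illustration (the fragments are invented examples, not actual Live API output):

```python
# Naive concatenation of transcription fragments glues words together:
fragments = ['Hello', 'world', ',', "it'", 's', 'me']
print(''.join(fragments))  # -> "Helloworld,it'sme"
# The stitching rule described above instead yields: "Hello world, it's me"
```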
Response from ADK Triaging Agent

Hello @ChrisQlasty, thank you for creating this PR! This PR is a bug fix. Could you please associate a GitHub issue with this PR? If there is no existing issue, could you please create one? This information will help reviewers review your PR more efficiently. Thanks!
Code Review
This pull request addresses an issue with stitching transcription fragments from Live Gemini models by introducing logic to correctly handle whitespace and punctuation. The solution is effective and is supported by a comprehensive set of new unit tests that cover various edge cases. My review focuses on improving code maintainability by reducing duplication. I've suggested refactoring the duplicated transcription stitching logic into a helper method in gemini_llm_connection.py and simplifying the test setup in test_gemini_llm_connection.py with a helper function. These changes will make the code cleaner and easier to manage in the future.
existing = self._input_transcription_text
# Insert a space when joining fragments except when the new
# chunk starts with a punctuation character that should attach
# to the previous token, or the existing text ends with an
# apostrophe.
conditional_space = (
    ' '
    if existing
    and not (
        new_input_transcription_chunk[0] in PUNCTUATION_CHARS
        or existing.endswith("'")
    )
    else ''
)
self._input_transcription_text = f'{existing}{conditional_space}{new_input_transcription_chunk.strip()}'.strip()
The logic for stitching transcription chunks is duplicated for both input_transcription (here) and output_transcription (lines 226-240). To improve maintainability and adhere to the Don't Repeat Yourself (DRY) principle, this logic should be extracted into a private helper method.
For example, you could create a method like this:
def _stitch_transcription_chunk(self, existing_text: str, new_chunk: str) -> str:
  if not new_chunk:
    return existing_text
  # Insert a space when joining fragments except when the new
  # chunk starts with a punctuation character that should attach
  # to the previous token, or the existing text ends with an
  # apostrophe.
  conditional_space = (
      ' '
      if existing_text
      and not (
          new_chunk[0] in PUNCTUATION_CHARS
          or existing_text.endswith("'")
      )
      else ''
  )
  return f'{existing_text}{conditional_space}{new_chunk.strip()}'.strip()

Then you could call it like so:
self._input_transcription_text = self._stitch_transcription_chunk(
    self._input_transcription_text, new_input_transcription_chunk
)
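For reference, here is a standalone rewrite of that rule with a few illustrative cases. PUNCTUATION_CHARS is assumed to contain common sentence punctuation; the set below is a stand-in, not the constant from the ADK source:

```python
PUNCTUATION_CHARS = {'.', ',', '!', '?', ';', ':'}  # stand-in for the real constant

def stitch(existing: str, new_chunk: str) -> str:
  if not new_chunk:
    return existing
  conditional_space = (
      ' '
      if existing
      and not (new_chunk[0] in PUNCTUATION_CHARS or existing.endswith("'"))
      else ''
  )
  return f'{existing}{conditional_space}{new_chunk.strip()}'.strip()

assert stitch('Hello', 'world') == 'Hello world'  # space inserted between words
assert stitch('Hello', ',') == 'Hello,'           # punctuation attaches directly
assert stitch("it'", 's') == "it's"               # no space after an apostrophe
assert stitch('', 'Hello') == 'Hello'             # no space before the first chunk
```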
message1 = mock.Mock()
message1.usage_metadata = None
message1.server_content = mock.Mock()
message1.server_content.model_turn = None
message1.server_content.interrupted = False
message1.server_content.turn_complete = False
message1.server_content.generation_complete = False
message1.tool_call = None
message1.session_resumption_update = None
message1.server_content.input_transcription = (
    types.Transcription(text=fragment1, finished=False)
    if tx_direction == 'input'
    else None
)
message1.server_content.output_transcription = (
    types.Transcription(text=fragment1, finished=False)
    if tx_direction == 'output'
    else None
)

message2 = mock.Mock()
message2.usage_metadata = None
message2.server_content = mock.Mock()
message2.server_content.model_turn = None
message2.server_content.interrupted = False
message2.server_content.turn_complete = False
message2.server_content.generation_complete = False
message2.tool_call = None
message2.session_resumption_update = None
message2.server_content.input_transcription = (
    types.Transcription(text=fragment2, finished=False)
    if tx_direction == 'input'
    else None
)
message2.server_content.output_transcription = (
    types.Transcription(text=fragment2, finished=False)
    if tx_direction == 'output'
    else None
)

message3 = mock.Mock()
message3.usage_metadata = None
message3.server_content = mock.Mock()
message3.server_content.model_turn = None
message3.server_content.interrupted = False
message3.server_content.turn_complete = False
message3.server_content.generation_complete = False
message3.tool_call = None
message3.session_resumption_update = None
message3.server_content.input_transcription = (
    types.Transcription(text=None, finished=True)
    if tx_direction == 'input'
    else None
)
message3.server_content.output_transcription = (
    types.Transcription(text=None, finished=True)
    if tx_direction == 'output'
    else None
)
The setup for message1, message2, and message3 is quite verbose and contains a lot of repeated code. To improve readability and maintainability, consider extracting the message creation into a helper function.
For example, you could define a helper within the test file:
def _create_mock_transcription_message(
    text: str | None, finished: bool, direction: str
) -> mock.Mock:
  msg = mock.Mock()
  msg.usage_metadata = None
  msg.server_content = mock.Mock()
  msg.server_content.model_turn = None
  msg.server_content.interrupted = False
  msg.server_content.turn_complete = False
  msg.server_content.generation_complete = False
  msg.tool_call = None
  msg.session_resumption_update = None
  transcription = types.Transcription(text=text, finished=finished)
  if direction == 'input':
    msg.server_content.input_transcription = transcription
    msg.server_content.output_transcription = None
  else:
    msg.server_content.input_transcription = None
    msg.server_content.output_transcription = transcription
  return msg

Then, you could simplify the test setup significantly:
message1 = _create_mock_transcription_message(fragment1, False, tx_direction)
message2 = _create_mock_transcription_message(fragment2, False, tx_direction)
message3 = _create_mock_transcription_message(None, True, tx_direction)
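A hedged pytest sketch of how the helper could back a parametrized version of this test (the fragment values are invented, and feeding the messages through the connection is left abstract):

```python
import pytest

@pytest.mark.parametrize('tx_direction', ['input', 'output'])
@pytest.mark.parametrize(
    'fragment1, fragment2',
    [('Hello', 'world'), ('Hello', ','), ("it'", 's')],
)
def test_transcription_stitching(tx_direction, fragment1, fragment2):
  message1 = _create_mock_transcription_message(fragment1, False, tx_direction)
  message2 = _create_mock_transcription_message(fragment2, False, tx_direction)
  message3 = _create_mock_transcription_message(None, True, tx_direction)
  # Feed message1..message3 through the connection under test, then assert
  # that the event produced for message3 carries the stitched transcript.
  ...
```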
Problem:
The input transcript text from Live Gemini models (tested on gemini-live-2.5-flash-preview-native-audio-09-2025) is not properly stitched. Words are often glued together because new chunks do not guarantee whitespace at the end or beginning.

Solution:
Added logic that properly handles word stitching for the final transcript, which arrives with the finished=True flag. I do not see it as very elegant, but it solves the most common issues with the final input transcript.
Testing Plan
Added unit tests covering several cases that were observed to fail with ADK 1.20.
Unit Tests:
Manual End-to-End (E2E) Tests:
Configure a live Gemini model to return input_transcription. Talk with the model and print event.input_transcription.text when event.input_transcription.finished=True. Talk a lot and observe badly stitched fragments. A sketch of this check follows.
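A hedged sketch of that manual check (the runner, queue, and id values are illustrative placeholders for an existing ADK live session; the event fields match those named above):

```python
# Hypothetical E2E loop; 'runner', 'queue', and the id values are placeholders.
async for event in runner.run_live(
    user_id='user', session_id='session', live_request_queue=queue
):
  tx = event.input_transcription
  if tx and tx.finished:
    # With the fix, tx.text should read naturally instead of gluing words.
    print(tx.text)
```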
Checklist