Commit 850ed2e
committed
perf(hybrid): prevent expensive array slicing when cache is disabled
Added a `max_checkpoints > 0` check to the `finally` block of the generation loop.
Previously, even though the underlying C++ state extraction was bypassed, the Python layer was still executing `self._input_ids[:self.n_tokens].tolist()`. For long contexts, slicing and converting this massive array to a Python list caused unnecessary CPU overhead and garbage collection (GC) pressure. This intercept acts as a double-layer isolation, ensuring absolute zero memory allocation and zero overhead for hybrid models running in single-turn mode.1 parent 191e334 commit 850ed2e
1 file changed
Lines changed: 5 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1396 | 1396 | | |
1397 | 1397 | | |
1398 | 1398 | | |
1399 | | - | |
| 1399 | + | |
| 1400 | + | |
| 1401 | + | |
| 1402 | + | |
| 1403 | + | |
1400 | 1404 | | |
1401 | 1405 | | |
1402 | 1406 | | |
| |||
0 commit comments