Problem description
Currently, the model returned by the `useLLM` hook does not manage the context window automatically, which results in errors once the window is exceeded. Ideally, the model should drop the oldest tokens/messages automatically so the app does not crash. The problem occurs in both the functional and the managed approach.
Once the window size is exceeded, the following error is thrown: `[Error: Failed to generate text, error code: 18]`.
Discord thread
Steps to reproduce
- run a few longer prompts (or many shorter ones) using the `.generate()` or `.sendMessage()` method, e.g. as in the sketch below
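
For reference, a minimal reproduction sketch using the managed approach. The asset paths and `useLLM` option names are placeholders and may differ between library versions; substitute whatever model/tokenizer setup your app already uses:

```tsx
import React from 'react';
import { Button } from 'react-native';
import { useLLM } from 'react-native-executorch';

export function ContextOverflowRepro() {
  // Placeholder sources: point these at the model/tokenizer assets your
  // app already loads; the exact option names may vary between versions.
  const llm = useLLM({
    modelSource: require('./assets/llama3_2_1b.pte'),
    tokenizerSource: require('./assets/tokenizer.bin'),
  });

  const exceedContext = async () => {
    // sendMessage() appends to the managed history, so after enough long
    // prompts the context window fills up and generation fails with
    // [Error: Failed to generate text, error code: 18].
    for (let i = 0; i < 30; i++) {
      await llm.sendMessage(
        'Write a detailed, multi-paragraph essay on the history of computing.'
      );
    }
  };

  return <Button title="Exceed context" onPress={exceedContext} />;
}
```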
What should be done
- in the managed approach, the oldest messages/tokens should be removed automatically whenever the context window would otherwise be exceeded (see the sketch after this list)
- in the functional approach, the message state should not be stored at all; the message history should be reset after each generation
- we should probably throw a more descriptive error as well
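
A sketch of what the automatic trimming could look like for the managed approach. This is not the library's implementation; the `Message` shape is assumed from the chat-style API, and `countTokens` is a hypothetical helper standing in for the model's real tokenizer:

```typescript
// Shape assumed from the chat-style API; adjust to the library's actual type.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

function trimHistory(
  messages: Message[],
  countTokens: (text: string) => number, // hypothetical tokenizer hook
  contextWindow: number
): Message[] {
  const trimmed = [...messages];
  let total = trimmed.reduce((sum, m) => sum + countTokens(m.content), 0);

  // Drop the oldest messages first until the history fits the window,
  // always keeping the most recent message so the current prompt survives.
  while (total > contextWindow && trimmed.length > 1) {
    const removed = trimmed.shift()!;
    total -= countTokens(removed.content);
  }
  return trimmed;
}
```

Running something like this before every generation would keep the prompt inside the window instead of surfacing error code 18; a more descriptive error would then only be needed when a single message alone exceeds the window.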
Benefits to React Native ExecuTorch
- users won't have to manage the message history on their own