What feature would you like to request?
Hi, I’m exploring how to get Qdrant/bm25 to properly tokenize and vectorize Thai text.
I noticed that the model cache directory contains multiple language-specific text files:
(see image below)
I’d like to add a custom Thai corpus based on this word list:
https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/words_th.txt
However, simply adding a thai.txt file to the model cache directory has no effect: the file is not picked up at runtime. Could you clarify the correct way to extend or register a new language corpus for BM25, and whether any additional configuration or rebuild step is required?
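To make the goal concrete, here is a minimal sketch of the kind of Thai word segmentation I have in mind, using a one-word-per-line list such as words_th.txt as the dictionary. It is pure Python and does not touch Qdrant/bm25 at all. The helper names `load_words` and `segment` are hypothetical, the greedy longest-match strategy is only one simple option, and it rests on my assumption that the BM25 tokenizer splits on whitespace, which Thai text does not provide.

```python
from typing import Iterable, List, Set


def load_words(lines: Iterable[str]) -> Set[str]:
    """Build a dictionary from one-word-per-line text (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}


def segment(text: str, words: Set[str]) -> List[str]:
    """Greedy longest-match segmentation (hypothetical helper).

    Characters that start no dictionary word are emitted as single tokens.
    """
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                break
        else:
            j = i + 1  # no match: fall back to a single character
        tokens.append(text[i:j])
        i = j
    return tokens


# Tiny in-memory stand-in for the real word list (assumption: same format).
words = load_words(["ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย"])
tokens = segment("ฉันรักภาษาไทย", words)
print(" ".join(tokens))  # prints "ฉัน รัก ภาษาไทย"
```

Joining the tokens with spaces yields text that a whitespace-based tokenizer could split. Whether Qdrant/bm25 then handles such pre-segmented input correctly is exactly what I am unsure about.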
Is there any additional information you would like to provide?
No response