Description
The DocumentEmbedder.embed_from_directory() method in src/rag/rag.py processes documents sequentially, one file at a time, which is slow when embedding large document collections. We should implement multi-threading to parallelize file processing, embedding generation, and database insertion so that total embedding time drops for large directories.
Proposed Solution
Add multi-threading support to the DocumentEmbedder class to process multiple files concurrently. This would involve:
- Using a thread pool to process files in parallel
- Batching embeddings and database insertions efficiently
- Maintaining thread safety for database operations
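The three points above could be sketched roughly as follows. This is a minimal illustration and not the existing DocumentEmbedder code: the function name embed_files_concurrently, the embed_file and insert_batch callables, and the max_workers and batch_size defaults are all hypothetical placeholders. Worker threads only compute embeddings. Results are collected and written in batches on the calling thread, so the database client is never touched from more than one thread.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple

# (file path, embedding vector); a hypothetical record shape for illustration.
Record = Tuple[str, List[float]]


def embed_files_concurrently(
    paths: Iterable[Path],
    embed_file: Callable[[Path], List[float]],
    insert_batch: Callable[[Sequence[Record]], None],
    max_workers: int = 4,
    batch_size: int = 32,
) -> int:
    """Embed files in a thread pool and insert the results in batches.

    embed_file and insert_batch are assumed stand-ins for whatever the
    real class uses to read/embed one file and to write rows to the DB.
    Returns the number of records inserted.
    """
    pending: List[Record] = []
    inserted = 0

    def flush() -> None:
        # Runs only on the calling thread, so no lock is needed here.
        nonlocal inserted
        if pending:
            insert_batch(list(pending))
            inserted += len(pending)
            pending.clear()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per file; map each future back to its path.
        futures = {pool.submit(embed_file, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            # result() re-raises any exception raised inside the worker.
            pending.append((str(path), future.result()))
            if len(pending) >= batch_size:
                flush()

    flush()  # write the final partial batch
    return inserted
```

Keeping all writes on one thread is only one way to get thread safety; a lock around the insert call or a connection per thread would be alternatives. Threads are also likely to help most where the work is I/O-bound (reading files, calling a remote embedding API), which would need to be checked against what embed_from_directory() actually does.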
Related Code
src/rag/rag.py - DocumentEmbedder class (lines 328-464)