The DIPY Code Assistant is a comprehensive tool designed to facilitate the extraction, processing, and retrieval of information from the DIPY documentation and codebase. Utilizing advanced language models and embedding techniques, this tool aims to provide an efficient and user-friendly interface for interacting with the extensive documentation of DIPY, a leading library for diffusion MRI and related processing.
- Document Conversion: Convert various source files and websites into text format, ensuring compatibility and ease of processing.
- Embeddings and Vector Stores: Employ LlamaCppEmbeddings and DeepLake vector stores for efficient information retrieval.
- Question Answering: Implement a QA system to answer user queries based on the extracted documents.
- Streamlit Interface: Provide an interactive web application for user interaction.
- Python 3.8+
- Streamlit
- LangChain Community
- BeautifulSoup
- tiktoken
- python-dotenv
- DeepLake
- llama-cpp-python
- Clone the Repository:

  ```bash
  git clone https://github.com/your-username/dipy-code-assistant.git
  cd dipy-code-assistant
  ```

- Create a Virtual Environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To install llama-cpp-python with GPU support on Linux, follow these steps:

- Install OpenBLAS:

  ```bash
  sudo apt-get install libopenblas-dev
  ```

- Set environment variables:

  ```bash
  export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
  export FORCE_CMAKE=1
  ```

- Install llama-cpp-python:

  ```bash
  pip install llama-cpp-python
  ```
- Set Up Environment Variables:

  Create a `.env` file in the project root directory and add your configuration details:

  ```
  ACTIVELOOP_TOKEN=your_deeplake_token
  ```
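The app loads this file at startup through its dotenv dependency. As an illustration of what that step does, here is a minimal stdlib-only sketch that reads `KEY=VALUE` lines into the process environment; the project itself presumably just calls `load_dotenv()` from python-dotenv:

```python
import os

def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks and comments; keep existing variables untouched.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```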
- Run the Streamlit App:

  a. For the simple RAG-based app:

  ```bash
  streamlit run app.py
  ```

  b. For the self-corrective RAG-based app:

  ```bash
  streamlit run self_corrective_app.py
  ```
- Interact with the Application: Open your browser and navigate to the displayed URL (default: http://localhost:8501). Enter your queries in the text input box and receive responses from the bot.
- Convert Codebase Files to Text:

  ```bash
  python preprocess.py --flag source --upload
  ```

- Convert Website Content to Text:

  ```bash
  python preprocess.py --flag website --upload
  ```
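A plausible sketch of the command-line interface behind these commands; the actual option handling in `preprocess.py` may differ:

```python
import argparse

def build_parser():
    # Two options, matching the commands above: --flag selects the input
    # kind, --upload pushes the resulting chunks to the vector store.
    parser = argparse.ArgumentParser(description="Convert sources to text chunks")
    parser.add_argument("--flag", choices=["source", "website"], required=True,
                        help="convert codebase files or website pages")
    parser.add_argument("--upload", action="store_true",
                        help="upload the resulting chunks to the DeepLake store")
    return parser

args = build_parser().parse_args(["--flag", "source", "--upload"])
```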
```
dipy-code-assistant
├── app.py                     # Streamlit application
├── self_corrective_app.py     # Streamlit application for self-corrective RAG
├── create_db.py               # Script to create the database and retriever
├── preprocess.py              # Script to preprocess files and websites
├── utils.py                   # Utility functions for processing
├── self_corrective_utils.py   # Utility functions for the self-corrective RAG app
├── requirements.txt           # List of dependencies
├── .env                       # Environment variables
├── README.md                  # Project README file
└── model/                     # Directory to store model files
```
- `load_model`: Caches and loads the language model.
- `load_retriever`: Caches and loads the document retriever.
- `handle_qa`: Handles user queries and provides answers using the loaded model and retriever.
- Streamlit UI Setup: Sets up the user interface for interaction.
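The retrieve-then-generate flow of `handle_qa` can be sketched as follows; the parameter names and prompt wording here are illustrative, not the project's actual API:

```python
def handle_qa(query, retriever, model):
    # Retrieve the chunks most relevant to the query, then ask the
    # model to answer using only that context.
    docs = retriever(query)
    context = "\n\n".join(docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return model(prompt)

# Usage with stand-in callables in place of the real retriever and LLM:
fake_retriever = lambda q: ["DIPY is a Python library for diffusion MRI analysis."]
fake_model = lambda p: p.splitlines()[-1]  # echoes the last prompt line
answer = handle_qa("What is DIPY?", fake_retriever, fake_model)
```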
- `run`: Main function to process input and create a retriever based on the provided flag (source or website).
- `convert_files_to_txt`: Converts source code files to text format and splits them into chunks.
- `convert_website_to_text`: Converts website content to text format and splits it into chunks.
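As an illustration of the first conversion step, a minimal sketch that collects file text from a source tree (the root path and suffix list are assumptions, and the real function also splits the text into chunks):

```python
from pathlib import Path

def convert_files_to_txt(root, suffixes=(".py", ".rst")):
    # Walk the source tree and collect the text of each matching file,
    # keyed by its path, ready to be split into chunks.
    texts = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            texts[str(path)] = path.read_text(encoding="utf-8", errors="ignore")
    return texts
```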
- `num_tokens_from_string`: Calculates the number of tokens in a text string.
- `RecursiveCharacterTextSplitter`: Splits documents into chunks for efficient processing.
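A simplified stand-in for what `RecursiveCharacterTextSplitter` does: try coarse separators first and fall back to finer ones when a piece is still too long. This sketch counts characters, whereas the real splitter can also count tokens:

```python
def split_text(text, max_chars=100, separators=("\n\n", "\n", " ")):
    # Base case: short enough, or no separators left to split on.
    if len(text) <= max_chars or not separators:
        return [text]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate          # keep packing into this chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) > max_chars:
            # Piece alone still too long: retry with finer separators.
            chunks.extend(split_text(piece, max_chars, finer))
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks
```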
- Enhanced Error Handling: Improve error handling mechanisms for robust processing.
- Additional Embedding Models: Integrate more embedding models to support a wider range of document types.
- User Authentication: Implement user authentication for secure access to the tool.
- Advanced Search Capabilities: Develop more sophisticated search algorithms to provide better and faster results.
- Integration with Other Data Sources: Enable the tool to retrieve and process information from additional data sources and APIs.
- User Interface Improvements: Enhance the Streamlit interface for better user experience and accessibility.
- Performance Optimization: Optimize the tool's performance for faster processing and response times.
- Detailed Logging and Monitoring: Implement logging and monitoring features to track the tool's usage and performance metrics.
This project is licensed under the MIT License; see the license file for details.
For any queries or support, please reach out to Aayush Jaiswal.