Implement intelligent chunking for large repository wiki generation and add XML structure prompt#361
adriandarian wants to merge 6 commits into AsyncFuncAI:main
Summary of Changes
Hello @adriandarian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed.
This pull request extends wiki generation to large repositories. It introduces a file chunking mechanism that lets the API process large codebases by splitting their files into smaller batches instead of handling the whole repository in one pass. It also adds an XML-formatted prompt so that the generated wiki structure follows a consistent, machine-parseable format. Together, these changes make repository analysis and wiki generation more robust and scalable for large projects.
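The PR does not show how files are grouped into chunks, so the following is only a minimal sketch of one common approach: greedily packing files into batches under a token budget. The names `FileChunk`, `estimate_tokens`, and `chunk_files`, and the 4-characters-per-token heuristic, are hypothetical and not taken from the PR.

```python
from dataclasses import dataclass, field


@dataclass
class FileChunk:
    """A batch of files whose combined estimated token count fits the budget."""
    paths: list[str] = field(default_factory=list)
    tokens: int = 0


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; a real tokenizer would be more accurate.
    return max(1, len(text) // 4)


def chunk_files(files: dict[str, str], max_tokens: int) -> list[FileChunk]:
    """Greedily pack files (path -> content) into chunks of at most max_tokens.

    A single file larger than the budget still gets a chunk of its own.
    """
    chunks: list[FileChunk] = []
    current = FileChunk()
    for path in sorted(files):  # sorting keeps files from the same directory adjacent
        cost = estimate_tokens(files[path])
        # Start a new chunk when adding this file would exceed the budget.
        if current.paths and current.tokens + cost > max_tokens:
            chunks.append(current)
            current = FileChunk()
        current.paths.append(path)
        current.tokens += cost
    if current.paths:
        chunks.append(current)
    return chunks
```

Greedy packing is simple and order-preserving; it does not produce optimally balanced chunks, but sorting by path tends to keep related files in the same request.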
Code Review
This pull request introduces an intelligent chunking system for processing large repositories to generate wikis. It adds new logic for file collection, filtering, and chunking, and updates the /local_repo/structure endpoint to use this system. A new XML-based prompt for wiki structure generation is also included, with corresponding updates to the WebSocket handler to process these requests. My review focuses on improving the robustness of the chunking logic, increasing efficiency by removing redundant operations, and improving code quality by addressing debug artifacts, local imports, and duplicated code. The chunking infrastructure is a good start, but the functions that process the chunks are currently placeholders and still need to be implemented.
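The review mentions a filtering step between file collection and chunking. As an illustration only, a filter of this kind can be sketched as a single predicate over relative paths. The exclusion lists, the size limit, and the name `should_include` below are hypothetical defaults, not the PR's actual configuration.

```python
from pathlib import PurePosixPath

# Hypothetical defaults; the PR's real exclusion lists are not shown.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", "dist", "build", ".venv"}
EXCLUDED_SUFFIXES = {".png", ".jpg", ".gif", ".lock", ".min.js", ".pyc"}


def should_include(rel_path: str, size_bytes: int, max_bytes: int = 1_000_000) -> bool:
    """Return True if a file (POSIX-style relative path) should be sent for wiki generation."""
    parts = PurePosixPath(rel_path).parts
    # Skip anything inside an excluded directory at any depth.
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return False
    # endswith() on the lowercased name also catches multi-part suffixes like ".min.js".
    name = parts[-1].lower()
    if any(name.endswith(suffix) for suffix in EXCLUDED_SUFFIXES):
        return False
    # Very large files are usually generated or vendored and waste the token budget.
    return size_bytes <= max_bytes
```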
- Refactor `collect_all_files` to return README content alongside file paths.
- Introduce `handle_response_stream` to streamline response processing for different providers.
- Update WebSocket handling to use the new response handling function, reducing code duplication.
- Improve logging for better traceability during file collection and response streaming.
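The commit message says `collect_all_files` now returns README content alongside the file paths, which avoids reading the repository twice. The PR's actual signature is not shown, so this is a hedged sketch with a hypothetical `exclude_dirs` parameter and a `(paths, readme)` return tuple.

```python
import os


def collect_all_files(root: str, exclude_dirs=frozenset({".git", "node_modules"})) -> tuple[list[str], str]:
    """Return (sorted relative file paths, top-level README content or "")."""
    paths: list[str] = []
    readme = ""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            paths.append(rel)
            # Capture the first README found at the repository root, whatever its extension.
            if not readme and dirpath == root and name.lower().startswith("readme"):
                with open(full, encoding="utf-8", errors="replace") as fh:
                    readme = fh.read()
    return sorted(paths), readme
```

Returning the README in the same pass lets the caller put it at the top of the structure prompt without an extra file read.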
Good job!
Have you considered introducing AST-based chunking, like the LlamaIndex code node parser: https://developers.llamaindex.ai/python/framework-api-reference/node_parsers/code/
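AST-based chunking splits code along syntactic boundaries (functions, classes) instead of at fixed character counts, so a chunk never cuts a function in half. The sketch below illustrates the idea for Python files only, using the standard-library `ast` module; it is not the PR's `ASTChunker`, which targets several languages. The function name `ast_chunks` is hypothetical. Note that comments and blank lines between top-level statements are not carried over, because `ast` does not record them.

```python
import ast


def ast_chunks(source: str) -> list[tuple[str, str]]:
    """Split Python source into (name, code) chunks, one per top-level def/class.

    Other top-level statements are grouped into a single "<module>" chunk.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    chunks: list[tuple[str, str]] = []
    module_parts: list[str] = []
    for node in tree.body:
        # Decorators start before node.lineno, so include them in the chunk.
        start = min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])
        code = "".join(lines[start - 1 : node.end_lineno])  # end_lineno needs Python 3.8+
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, code))
        else:
            module_parts.append(code)
    if module_parts:
        chunks.insert(0, ("<module>", "".join(module_parts)))
    return chunks
```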
- Added ASTChunker class for semantic chunking of code files.
- Integrated AST chunking with the existing adalflow pipeline via ASTTextSplitter.
- Created configuration for AST chunking in embedder.ast.json.
- Updated the data pipeline to support AST chunking based on configuration.
- Developed enable_ast.py script to toggle AST chunking on and off.
- Enhanced logging for chunking statistics and errors.
- Added support for various programming languages in AST chunking.
- Updated docker-compose to allow enabling AST chunking during build.
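The commit adds an `enable_ast.py` script that toggles AST chunking, but its contents and the keys in `embedder.ast.json` are not shown. As a rough sketch of how such a toggle could work, the function below flips a flag in a JSON config file and writes it back; the `text_splitter.use_ast_chunking` key and the function name are hypothetical.

```python
import json
from pathlib import Path


def set_ast_chunking(config_path: str, enabled: bool) -> dict:
    """Set a hypothetical "text_splitter.use_ast_chunking" flag and save the config.

    Other keys in the file are preserved; a missing file starts from an empty config.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("text_splitter", {})["use_ast_chunking"] = enabled
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Keeping the toggle in a config file (rather than only in code) is what makes it possible to flip from a docker-compose build option, as the later commit describes.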
… docker-compose for config mounting
I hadn't considered it before, but I like the idea, so here is an update with a docker-compose flag to toggle AST chunking on and off.
- Kept the get_local_repo_structure import for chunked wiki generation.
- Added AWS credentials imports for Bedrock support.
This PR introduces two major improvements to the deepwiki-open project:
- Intelligent Chunking for Large Repository Wiki Generation
- Add XML Structure Prompt
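Because the reviewer notes ask for attention to XML schema compliance, it helps to validate the model's XML before building the wiki. The exact schema the prompt requests is not shown here, so this sketch assumes a hypothetical `<wiki_structure>` root with a `<title>` and `<pages>/<page id="...">` entries, each holding a `<title>` and optional `<relevant_files>/<file_path>` elements. It uses only `xml.etree.ElementTree` from the standard library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class WikiPage:
    page_id: str
    title: str
    relevant_files: list[str]


def parse_wiki_structure(xml_text: str) -> tuple[str, list[WikiPage]]:
    """Parse a (hypothetical) <wiki_structure> document into (title, pages).

    Raises ValueError if required elements are missing, and ET.ParseError if the
    XML itself is malformed, so the caller can retry instead of emitting an empty wiki.
    """
    # Models often surround XML with prose; keep only the root element.
    start = xml_text.find("<wiki_structure")
    end = xml_text.rfind("</wiki_structure>")
    if start == -1 or end == -1:
        raise ValueError("no <wiki_structure> element found")
    root = ET.fromstring(xml_text[start : end + len("</wiki_structure>")])
    title = (root.findtext("title") or "").strip()
    pages = []
    for page in root.iterfind("pages/page"):
        page_id = page.get("id")
        page_title = (page.findtext("title") or "").strip()
        if not page_id or not page_title:
            raise ValueError("every <page> needs an id attribute and a <title>")
        files = [f.text.strip() for f in page.iterfind("relevant_files/file_path") if f.text]
        pages.append(WikiPage(page_id, page_title, files))
    if not title or not pages:
        raise ValueError("wiki structure needs a <title> and at least one <page>")
    return title, pages
```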
Details
Motivation
Impact
How to Test
Closes Issues:
Reviewer Notes:
Please pay particular attention to chunking edge cases and XML schema compliance.