Before You Report a Bug, Please Confirm You Have Done The Following...
neo4j-graphrag-python's version
1.10.1
Python version
3.12
Operating System
Debian 13
Dependencies
"datasets==3.6.0",
"flask>=3.1.2",
"neo4j>=5.28.2",
"neo4j-graphrag[nlp,ollama,sentence-transformers]>=1.10.1",
"streamlit>=1.52.1",
Reproducible example
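import asyncio

from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# llm (Ollama), neo4j_driver, embedder and text_splitter are created beforehand (setup omitted)
PDF_FILE = './some-file.pdf'  # a large PDF, bigger than the model's context window

kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    embedder=embedder,
    from_pdf=True,
    text_splitter=text_splitter,
)
asyncio.run(kg_builder.run_async(file_path=PDF_FILE))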
Relevant Log Output
A JSONDecodeError stating that the model's output is not valid JSON.
Expected Result
I expect the pipeline to run to completion.
What happened instead?
During the schema extraction phase, the pipeline does not split the document(s) into chunks; instead, it sends the whole document, regardless of its size, to the Ollama endpoint in a single prompt.
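As a rough illustration of why a single prompt fails, the sketch below estimates whether a document's text fits in the model's context window. It assumes the common rule of thumb of about 4 characters per token (real tokenizers differ), `context_tokens` stands for the context length configured for the model (for Ollama, the `num_ctx` option), and `prompt_overhead` is a hypothetical allowance for the instruction template and the JSON answer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Rough heuristic only: actual token counts depend on the model's tokenizer.
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_tokens: int, prompt_overhead: int = 500) -> bool:
    # Hypothetical allowance: reserve room for the instructions and the model's answer.
    return estimate_tokens(text) + prompt_overhead <= context_tokens


# A ~40,000-character document (~10,000 estimated tokens) does not fit in an 8,192-token window.
print(fits_in_context("x" * 40_000, context_tokens=8192))  # False
```

If the check fails, the document has to be split before it is sent to the model.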
Additional Info
The pipeline tries to extract the schema with a single request to Ollama that contains the whole document. Because the entire document is in the prompt, the model loses track of the instructions and does not return JSON. Instead, schema extraction should be performed chunk by chunk.
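The chunk-by-chunk behaviour requested above could look roughly like the following sketch. It is not the library's implementation and does not use neo4j-graphrag's API: `call_llm` is a hypothetical callable that sends one chunk together with the schema-extraction prompt to the model and returns the raw answer, and the flat JSON shape (`node_types`, `relationship_types`, `patterns` as lists of strings) is an assumption made for illustration. Each chunk yields a partial schema, and the partial schemas are merged by set union.

```python
import json
from typing import Callable


def split_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def extract_schema_chunkwise(
    text: str,
    call_llm: Callable[[str], str],  # hypothetical: sends one chunk with the schema prompt
    chunk_size: int = 4000,  # placeholder; size it to the model's context window
    overlap: int = 200,
) -> dict:
    """Extract a partial schema per chunk and merge the results by set union."""
    node_types: set[str] = set()
    relationship_types: set[str] = set()
    patterns: set[tuple[str, ...]] = set()
    for chunk in split_text(text, chunk_size, overlap):
        try:
            partial = json.loads(call_llm(chunk))
        except json.JSONDecodeError:
            # One malformed answer no longer aborts the whole extraction.
            continue
        if not isinstance(partial, dict):
            continue  # valid JSON but not the assumed object shape
        node_types.update(partial.get("node_types", []))
        relationship_types.update(partial.get("relationship_types", []))
        patterns.update(tuple(p) for p in partial.get("patterns", []))
    return {
        "node_types": sorted(node_types),
        "relationship_types": sorted(relationship_types),
        "patterns": [list(p) for p in sorted(patterns)],
    }
```

Skipping chunks whose answer is not valid JSON is only one design choice; retrying the chunk would be an alternative. A plain union merge can also accumulate near-duplicate type names coming from different chunks, which would need a separate consolidation step.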