Hi, thanks for the code, it works perfectly! I have a quick question: why use PyPDF2 to split the PDF first? I think pdfminer can handle multi-page PDFs and extract their content directly. Won't the additional dependency, PDF splitting, PDF writing, and text merging make the pipeline more complicated? I'm curious to understand the design. Thank you.