Pdf-text-extraction-using-python

A simple console-based Python script that extracts text from scanned PDFs using the Tesseract OCR engine from Python.
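A scanned PDF stores each page as an image rather than as a text layer. OCR therefore usually works in two steps: render every page to an image, then run Tesseract on each image. The sketch below shows that pipeline. It is not the repository's own code. It assumes the third-party packages `pdf2image` (which also needs Poppler installed) and `pytesseract`, and the function names `ocr_scanned_pdf` and `join_page_texts` are hypothetical.

```python
import os


def join_page_texts(page_texts):
    """Combine per-page OCR output into one string, labelling each page."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"--- Page {number} ---\n{text.strip()}")
    return "\n\n".join(parts)


def ocr_scanned_pdf(pdf_path, image_dir, dpi=300):
    """Render each PDF page to an image, then OCR it with Tesseract.

    Assumes pdf2image (plus Poppler) and pytesseract are installed.
    They are imported inside the function so that join_page_texts
    can be used without them.
    """
    from pdf2image import convert_from_path
    import pytesseract

    os.makedirs(image_dir, exist_ok=True)
    # Rasterise the pages. output_folder writes the images to disk
    # instead of holding every page in memory at once.
    pages = convert_from_path(pdf_path, dpi=dpi, output_folder=image_dir)
    # Run Tesseract on each page image and collect the recognised text.
    page_texts = [pytesseract.image_to_string(page) for page in pages]
    return join_page_texts(page_texts)
```

A hypothetical call might be `text = ocr_scanned_pdf("D:/files/scan.pdf", "temp")`. A higher `dpi` generally helps recognition on small print, but it produces larger temporary images and slows processing.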

Install Tesseract OCR, then copy the path to tesseract.exe into the Python file. Leave the temporary folder as it is: it stores the intermediate images generated during extraction, and those images are deleted afterwards. For Word output files, give the full path including the file name, for example: D:/files/Texts.docx
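The setup steps above can be sketched in Python as follows. This is a hedged illustration, not the script's actual code. The Tesseract location and the `temp` folder name are hypothetical placeholders, and the sketch assumes `pytesseract` for the path setting and the `python-docx` package for writing Word files. It shows four steps: pointing `pytesseract` at `tesseract.exe`, checking that the output path ends in a `.docx` file name, deleting the generated images while keeping the folder, and saving the text.

```python
import os

# Hypothetical values: replace them with your own installation path and folder.
TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
TEMP_DIR = "temp"


def configure_tesseract(exe_path=TESSERACT_EXE):
    """Tell pytesseract where tesseract.exe lives (needed if it is not on PATH)."""
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = exe_path


def is_docx_path(path):
    """True if the path ends in a file name such as Texts.docx, not just a folder."""
    name = os.path.basename(path)
    return name.lower().endswith(".docx") and len(name) > len(".docx")


def clear_temp_images(temp_dir=TEMP_DIR):
    """Delete the generated image files but keep the folder itself."""
    for entry in os.listdir(temp_dir):
        full_path = os.path.join(temp_dir, entry)
        if os.path.isfile(full_path):
            os.remove(full_path)


def save_text_to_docx(text, path):
    """Write the extracted text to a Word file, one paragraph per text block."""
    if not is_docx_path(path):
        raise ValueError(f"expected a full path ending in a .docx file name, got {path!r}")
    from docx import Document  # provided by the python-docx package
    document = Document()
    for block in text.split("\n\n"):
        document.add_paragraph(block)
    document.save(path)
```

The sketch deletes only the files inside the temporary folder and leaves the folder in place, so the next run can reuse it.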