Pdf-text-extraction-using-python

A simple console-based Python program that extracts text from scanned PDFs using Tesseract OCR.

Install Tesseract, then copy the path to tesseract.exe into the Python file. Keep the temporary folder as it is: the temporary images generated during extraction are stored there and are deleted afterwards. For Word files, give the full path including the file name, for example: D:/files/Texts.docx
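
The setup above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the pytesseract, pdf2image (which itself needs Poppler installed) and python-docx packages, assumes the Word file is where the extracted text is written, and the names extract_text, page_image_path and the "temp" folder are hypothetical.

```python
from pathlib import Path


def page_image_path(temp_dir, page_number):
    """Return the temporary image path for a 1-based page number (hypothetical naming)."""
    return Path(temp_dir) / f"page_{page_number:04d}.png"


def extract_text(pdf_path, docx_path, tesseract_exe, temp_dir="temp"):
    """OCR every page of a scanned PDF and save the text to a Word file."""
    # Third-party imports are local so the helper above works without them.
    import pytesseract
    from pdf2image import convert_from_path  # assumption: requires Poppler
    from docx import Document  # assumption: python-docx

    # Point pytesseract at the installed tesseract.exe.
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe

    temp = Path(temp_dir)
    temp.mkdir(exist_ok=True)
    document = Document()

    for number, page in enumerate(convert_from_path(pdf_path), start=1):
        image_file = page_image_path(temp, number)
        page.save(image_file, "PNG")  # temporary image for this page
        try:
            text = pytesseract.image_to_string(str(image_file))
            document.add_paragraph(text)
        finally:
            # Delete the temporary image but keep the folder itself.
            image_file.unlink(missing_ok=True)

    document.save(docx_path)
```

A call might look like extract_text("scan.pdf", "D:/files/Texts.docx", tesseract_exe="<path to tesseract.exe>"), where the PDF name is only an example and the Tesseract path depends on where it was installed.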