Error reading a PDF file using tika in Python
I’m trying to read a PDF file using Python. I’ve tried PyPDF2, but its output is not very accurate. I then read here (How to extract text from a PDF file?) that tika would give better results.
This is my code:
from tika import parser

raw = parser.from_file('inputsSergio Martin.pdf')
print(raw['content'])
And this is the error:
AttributeError: module 'os' has no attribute 'setsid'
I’ve also read this answer (AttributeError: module 'os' has no attribute 'setsid'), but I don’t know whether it is relevant here. I’m running this on Windows.
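A minimal check, using only the standard library, to see whether the error is platform-related: the Python documentation lists os.setsid as available on Unix only, so on Windows the os module simply has no such attribute and any code that references it raises this AttributeError. The helper names setsid_available and describe_platform are hypothetical, chosen for illustration; this sketch does not show where tika itself references os.setsid.

```python
import os
import sys


def setsid_available() -> bool:
    """Return True if this Python build exposes os.setsid (documented as Unix-only)."""
    return hasattr(os, "setsid")


def describe_platform() -> str:
    """Summarize whether os.setsid exists on the current platform."""
    # sys.platform is "win32" on Windows, "linux" on Linux, "darwin" on macOS.
    if setsid_available():
        return f"{sys.platform}: os.setsid is available"
    return f"{sys.platform}: os.setsid is missing, so code that references it raises AttributeError"


if __name__ == "__main__":
    print(describe_platform())
    try:
        # Attribute lookup only; this never actually calls setsid().
        getattr(os, "setsid")
    except AttributeError as exc:
        # On Windows this prints the same message as in the traceback above.
        print(f"Reproduced: {exc}")
```

Running this on the same Windows machine should report that os.setsid is missing, which would suggest the failure comes from a Unix-only call rather than from the PDF file itself.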