Error reading a pdf file using tika Python

  apache-tika, operating-system, python, windows


I’m trying to read a PDF file using Python. I tried PyPDF2, but the text it extracts is not very accurate. I read here (How to extract text from a PDF file?) that tika should give better results.

This is my code:

from tika import parser
raw = parser.from_file('inputsSergio Martin.pdf')
print(raw['content'])

And this is the error:

AttributeError: module 'os' has no attribute 'setsid'

I’ve also read this answer (AttributeError: module 'os' has no attribute 'setsid'), but I don’t know whether it applies to my case.

I’m running this on Windows.


Source: StackOverflow

One Reply to “Error reading a pdf file using tika Python”

  • In the file Tika.py I made the following change as a workaround (os.setsid exists only on Unix-like systems, which is why the call fails on Windows).
    Replace

    TikaServerProcess = Popen(cmd_string, stdout=logFile, stderr=STDOUT, shell=True, preexec_fn=os.setsid)

    with

    TikaServerProcess = Popen(cmd_string, stdout=logFile, stderr=STDOUT, shell=True)
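If you would rather not drop the argument on every platform, the same idea can be written so that preexec_fn is only passed where os.setsid exists. This is a minimal standalone sketch, not tika's actual code: the helper name new_session_kwargs is hypothetical, and the child process here is a stand-in for the Tika server command.

```python
import os
import subprocess
import sys


def new_session_kwargs():
    """Return extra Popen keyword arguments (hypothetical helper).

    os.setsid is only available on Unix-like systems, so on Windows
    this returns an empty dict and Popen is called without preexec_fn.
    """
    if hasattr(os, "setsid"):
        return {"preexec_fn": os.setsid}
    return {}


# Stand-in child process; in tika this would be the server command.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('ok')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    **new_session_kwargs(),
)
out, _ = proc.communicate()
print(out.decode().strip())
```

On Windows the child simply starts in the parent's session, which is also what happens when the argument is removed by hand as in the edit above.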
