UnicodeDecodeError while parsing PDF files #17

@adityardesai

Hi,

I am using the NLTKRest server to parse some of the PDF files from the Polar TREC data and extract the required NER quantities. For most of the PDF files, however, the REST server returns the following error:

"UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128) // Werkzeug Debugger "

The command used is:
curl -X POST -d "PDF TEXT in STRING" http://localhost:8888/nltk

The error file is attached as well:
nltkrest.txt
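
For reference, the kind of failure in the message above can be reproduced outside the server. The snippet below is only a minimal sketch, assuming the posted PDF text is UTF-8 and that something on the server side decodes it with the ASCII codec; I have not confirmed where that happens in NLTKRest. The helper name `decode_body` and the sample string are hypothetical and not part of the server code.

```python
def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode request bytes with an explicit encoding (hypothetical helper).

    errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    """
    return raw.decode(encoding, errors="replace")


# Sample text standing in for extracted PDF content; the UTF-8 encoding of
# "é" begins with byte 0xc3, the byte named in the error message.
raw = "Résumé of the Polar data".encode("utf-8")

# Decoding as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    error_message = ""
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding explicitly as UTF-8 succeeds.
text = decode_body(raw)
print(error_message)
print(text)
```

If the assumption holds, decoding the request body explicitly as UTF-8 before handing it to NLTK might avoid the error.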
