
> Ok, ok, ok. You can’t extract text from any document at the moment, but textract integrates support for many common formats and we designed it to be as easy as possible to add other document formats.

There go my hopes of seeing a painless OCR library for Python…
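The quoted design goal (make it easy to add other document formats) usually comes down to a registry that maps file extensions to parser functions. Below is a minimal sketch of that pattern, assuming one parser per extension. It is not textract's actual internals or API; `PARSERS`, `register`, `parse_txt` and `extract` are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: maps a lowercase file extension (e.g. ".txt")
# to a function that turns a file path into extracted text.
PARSERS: Dict[str, Callable[[str], str]] = {}


def register(extension: str):
    """Decorator that registers a parser for one file extension."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register(".txt")
def parse_txt(path: str) -> str:
    # Plain text needs no conversion; just read the file.
    return Path(path).read_text(encoding="utf-8")


def extract(path: str) -> str:
    """Dispatch to the parser registered for the file's extension."""
    ext = Path(path).suffix.lower()
    try:
        parser = PARSERS[ext]
    except KeyError:
        raise ValueError(f"no parser registered for {ext!r}") from None
    return parser(path)
```

With this layout, supporting a new format (an OCR-backed ".png" parser, say) means writing one decorated function; the dispatch code does not change.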



Hopefully it will be? There's a great suggestion to use tesseract-ocr to make this happen. https://github.com/deanmalmgren/textract/issues/16

If you have any other (better?) ways of doing this, feel free to add some comments on the issue tracker.
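As a rough illustration of the tesseract-ocr suggestion, an image parser could simply shell out to the `tesseract` command-line tool. This is a minimal sketch, assuming the `tesseract` binary (3.03 or later, which accepts `stdout` as the output base) is installed and on `PATH`. It is not code from textract or from the linked issue; `tesseract_command` and `ocr_image` are hypothetical names.

```python
import subprocess
from typing import Callable, List


def tesseract_command(image_path: str) -> List[str]:
    # Assumption: passing "stdout" as the output base makes the tesseract
    # CLI print recognized text instead of writing <outputbase>.txt.
    return ["tesseract", image_path, "stdout"]


def ocr_image(image_path: str,
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Return the text tesseract recognizes in image_path.

    `run` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without tesseract installed.
    """
    result = run(tesseract_command(image_path),
                 capture_output=True, text=True, check=True)
    # check=True raises CalledProcessError if tesseract exits non-zero.
    return result.stdout.strip()
```

Shelling out keeps Python-side dependencies at zero. The trade-off is that OCR quality, language packs, and image preprocessing are all left to the tesseract installation.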


The list of common formats is still pretty robust.

http://textract.readthedocs.org/en/latest/#currently-support...



