goblin89 on Aug 3, 2014 | on: Textract, a Python package for extracting text fro...

> Ok, ok, ok. You can’t extract text from any document at the moment, but textract integrates support for many common formats and we designed it to be as easy as possible to add other document formats.

There go my hopes of seeing a painless OCR library for Python…

deanmalmgren on Aug 4, 2014

Hopefully it will be? There's a great suggestion to use tesseract-ocr to make this happen. https://github.com/deanmalmgren/textract/issues/16

If you have any other (better?) ways of doing this, feel free to add some comments on the issue tracker.
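For anyone wondering what the tesseract-ocr route could look like, here is a rough sketch that shells out to the `tesseract` command-line tool from Python. It assumes the `tesseract` executable is installed and on your PATH and relies on its standard `tesseract IMAGE OUTPUTBASE -l LANG` invocation, which writes `OUTPUTBASE.txt`. The helper names `build_tesseract_command` and `ocr_image` are made up for illustration and are not part of textract.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argv list for `tesseract IMAGE OUTPUTBASE -l LANG`."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run the tesseract CLI on one image and return the recognized text.

    Hypothetical helper: assumes the tesseract executable is on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        # tesseract appends ".txt" to the output base name itself.
        output_base = Path(tmp) / "ocr_output"
        subprocess.run(
            build_tesseract_command(image_path, output_base, lang),
            check=True,
            capture_output=True,
        )
        return (Path(tmp) / "ocr_output.txt").read_text(encoding="utf-8")
```

Calling `ocr_image("scan.png")` on a machine with tesseract installed should hand back the recognized text as a string; without it you get a `RuntimeError` instead of a confusing subprocess failure.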

minimaxir on Aug 4, 2014

The list of common formats is still pretty robust.

http://textract.readthedocs.org/en/latest/#currently-support...
