
From the link: "Camelot only works with text-based PDFs and not scanned documents." If you have character data, using it is almost always going to be more accurate than OCR.

I don't know how OP uses it with images converted to PDFs though, as that would be just like a scan, and ImageMagick doesn't do OCR as far as I can tell.
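For what it's worth, here is a rough sketch of the "use character data when you have it" idea. The helper names and the 20-character threshold are my own assumptions, not anything from Camelot; the only real Camelot call is `camelot.read_pdf`, and the extracted text is assumed to come from whatever text extractor you already use.

```python
def has_text_layer(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True if the extracted text looks like real character data.

    min_chars is an arbitrary threshold chosen for illustration.
    """
    # Count only non-whitespace characters, since scanned pages often
    # yield nothing but blank lines or form feeds.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible >= min_chars


def read_tables_if_text_based(pdf_path: str, extracted_text: str):
    """Parse tables with Camelot only when the page has a text layer.

    Returns None for scan-like input, where the caller would need OCR.
    """
    if not has_text_layer(extracted_text):
        return None
    import camelot  # imported lazily so the heuristic runs without it
    return camelot.read_pdf(pdf_path, pages="1")
```

A scan wrapped in a PDF would give empty or near-empty extracted text, so it falls through to the `None` branch rather than being handed to Camelot.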



It uses pytesseract and OpenCV, so there is image processing.


Looks like it's a bit in-progress: https://github.com/camelot-dev/camelot/pull/209

"Update docs" isn't checked, and that's what I was going on.


Yes, I need to work on that PR; I haven't had much free time lately. It adds OCR support using EasyOCR, which I found on HN some time ago!



