
I (*nix user) use a script that basically does:

    pdftotext -layout -eol unix -nopgbrk "$PDF" - | egrep ...

Many PDFs have compressed content streams, so plain-text utilities only see metadata in them. The extracted text, cached and compressed, is usually tiny and can be searched with zgrep.
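
A rough sketch of the caching idea, assuming bash, gzip, and a cache directory of your choosing. The function name cache_pdf_text and the one-.txt.gz-per-PDF layout are made up for illustration and are not taken from the script above; error handling is left out.

```shell
# cache_pdf_text PDF CACHE_DIR
# Extracts the text of PDF once, stores it gzip-compressed in CACHE_DIR,
# and prints the path of the cached file. (Hypothetical helper.)
cache_pdf_text() {
    local pdf=$1 cache_dir=$2
    local out="$cache_dir/$(basename "$pdf" .pdf).txt.gz"

    # Re-extract only if the cache is missing or older than the PDF.
    if [ ! -f "$out" ] || [ "$pdf" -nt "$out" ]; then
        mkdir -p "$cache_dir"
        # The trailing "-" tells pdftotext to write to stdout.
        pdftotext -layout -eol unix -nopgbrk "$pdf" - | gzip -c > "$out"
    fi
    printf '%s\n' "$out"
}
```

With that in place, something like zgrep -i 'pattern' "$(cache_pdf_text paper.pdf ~/.cache/pdftext)" searches the cached copy, and only the first call per PDF pays for the extraction.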

pdfinfo shows document metadata (title, subject, keywords and more), but these fields are rarely filled in usefully; PDFs produced by Adobe tools or LᴬTᴇX are the ones that tend to have this data.
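
For a quick look at just those fields, a small filter over pdfinfo's "Label: value" output is enough. This is only a sketch: pdf_meta is a hypothetical name, and the choice of fields is an assumption about what is worth showing.

```shell
# pdf_meta PDF
# Prints only the descriptive metadata lines from pdfinfo's output.
# (Hypothetical helper; adjust the field list to taste.)
pdf_meta() {
    pdfinfo "$1" | grep -E '^(Title|Subject|Keywords|Author):'
}
```

If it prints nothing, the PDF carries none of these fields and the text extraction above is the only way to search it.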

Both come with xpdf.



This is great; thanks for sharing!



