
Robots.txt is different. Without it, bots have no way of knowing whether they're allowed to fetch any other data from the site. You would need "bots allowed" information in the HTTP handshake itself to prevent bots from accidentally hitting pages they shouldn't. This can already be Very Bad.
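For context, this is how a polite crawler typically consults robots.txt before fetching anything else. A minimal sketch using Python's stdlib `urllib.robotparser`; the rules and user-agent string here are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Normally you'd call rp.set_url(".../robots.txt") and rp.read();
# here we parse a hypothetical file inline for illustration.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# The crawler checks each URL against the rules before requesting it.
print(rp.can_fetch("MyBot/1.0", "https://example.com/admin/delete-everything"))  # False
print(rp.can_fetch("MyBot/1.0", "https://example.com/index.html"))               # True
```

The point upthread: absent this file, a crawler has no signal at all and may blindly GET every link it finds.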


> You would need "bots allowed" information in the HTTP handshake itself to prevent bots from accidentally hitting pages they shouldn't.

Humans could also hit such pages. If your GET requests change state, there's no helping you.


The whole point of robots.txt is that there are pages people may hit but bots shouldn't. What are you on?



