They also have their own web scraper called ByteSpider that very aggressively scrapes text-heavy websites and ignores robots.txt. I've had to block it by user agent on one of my sites.
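For anyone wanting to do the same, here is a rough sketch of blocking by user agent at the application level, assuming a Python WSGI app. The names `BLOCKED_AGENT_TOKENS`, `is_blocked`, and `BlockByUserAgent` are made up for illustration, and the substring match on "bytespider" is an assumption about how the crawler identifies itself, so check your own logs first.

```python
# Hypothetical list of lowercase substrings to reject; adjust to what your logs show.
BLOCKED_AGENT_TOKENS = ("bytespider",)


def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


class BlockByUserAgent:
    """WSGI middleware that answers 403 for blocked user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

You would wrap an existing app with `app = BlockByUserAgent(app)`. This only stops crawlers that announce themselves honestly; doing the same match in the web server or CDN is usually cheaper.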
I don't think it ignores robots.txt; I think it just doesn't have a very good parser, and you need to give it its own user-agent block. I had a similar level of frustration.
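To illustrate what a dedicated user-agent block looks like, here is a small sketch that feeds an example robots.txt to Python's standard `urllib.robotparser`. The rules and paths are invented for the example, and this only shows how a spec-following parser reads the file; it says nothing about how the crawler's own parser actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: a dedicated group for the crawler, plus a catch-all group.
# The paths here are made up for illustration.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The dedicated group applies to the named crawler and disallows everything.
bytespider_allowed = parser.can_fetch("Bytespider", "/articles/1")

# Other crawlers fall through to the catch-all group.
other_allowed = parser.can_fetch("SomeOtherBot", "/articles/1")
other_private_allowed = parser.can_fetch("SomeOtherBot", "/private/notes")
```

The point of the separate group is that a crawler with a weak parser may skip the `*` rules but still match on its own name.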
After all, if they wanted to completely ignore the wishes of the website owners they probably would not announce their spider as such in the user agent. They’d just pretend to be a web browser.
Some of them, yes, but not all. Try, for example, to browse a Cloudflare-protected site from Tor and you will be hit with a constant barrage of captchas even though you are only doing GET requests.
Yes, heuristically, a Tor browser user is more likely to be nefarious than a regular browser user. Note that the heuristics used, such as IP address, are not related to the user agent.