
I wonder if crooks will try to exploit this crawl. As someone who maintains a similar index of the web, I've found it interesting to see what they search for. SSNs and credit card numbers are common targets, as are sites running older versions of PHP software or exploitable shopping carts.


It makes it very easy for people to steal vast amounts of your content and republish it on their own sites, with ads all around it.

Many content sites have protections in place that recognize bots by their behavior, or use "honeypots" to tell bots apart from human visitors, and thus avoid large-scale content theft.
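For anyone curious how those two protections fit together, here is a minimal sketch, assuming a hypothetical trap URL (`/hidden-trap`) that is linked invisibly on every page (so humans never click it) and a hypothetical rate limit of 30 requests per 10 seconds. Any client that requests the trap, or that requests pages faster than the limit, gets flagged. Real sites use more signals than this; the names and thresholds here are illustrative only.

```python
from collections import defaultdict, deque

HONEYPOT_PATH = "/hidden-trap"  # hypothetical trap URL, linked invisibly so humans never request it
MAX_REQUESTS = 30               # hypothetical limit: more than this many requests...
WINDOW_SECONDS = 10.0           # ...within this many seconds looks automated


class BotDetector:
    def __init__(self) -> None:
        self.flagged: set[str] = set()
        # Per-client timestamps of recent requests (sliding window).
        self.history: defaultdict[str, deque] = defaultdict(deque)

    def record(self, client_ip: str, path: str, timestamp: float) -> bool:
        """Record one request; return True if the client now looks like a bot."""
        # Honeypot check: only an automated link-follower should ever hit this path.
        if path == HONEYPOT_PATH:
            self.flagged.add(client_ip)

        # Behavioral check: drop timestamps older than the window, then count.
        times = self.history[client_ip]
        times.append(timestamp)
        while times and timestamp - times[0] > WINDOW_SECONDS:
            times.popleft()
        if len(times) > MAX_REQUESTS:
            self.flagged.add(client_ip)

        return client_ip in self.flagged
```

A client that reads one page and then follows the hidden link is flagged on the second request; a client that fetches 31 pages within three seconds is flagged on the 31st. A crawler that honors robots.txt and throttles itself would avoid both, provided the trap is disallowed there.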


Presumably those protections would prevent this bot from collecting data as well?



