What is content scraping?
Content scraping is the unauthorized copying of original content from a legitimate website and republishing of that content on another website without the knowledge or permission of the content owner. Content scrapers often pass off the stolen content as their own and fail to credit the original owners.
Content scraping can be done manually by copy and paste, or with more sophisticated techniques such as specialized scraping software, HTTP programming, or HTML and DOM parsers.
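To illustrate the parser-based approach mentioned above, the following is a minimal sketch using only Python's standard-library `html.parser` module. The page markup and the choice to extract `<p>` tags are illustrative assumptions; real scrapers typically fetch live pages and use heavier tooling.

```python
# Minimal sketch of content scraping via an HTML/DOM parser, using only
# Python's standard library. The HTML snippet and the focus on <p> tags
# are illustrative assumptions, not a specific scraper's behavior.
from html.parser import HTMLParser

class ArticleTextExtractor(HTMLParser):
    """Collects the text found inside <p> tags of a page."""
    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Only keep non-empty text that appears inside a paragraph.
        if self.in_paragraph and data.strip():
            self.paragraphs.append(data.strip())

extractor = ArticleTextExtractor()
extractor.feed("<html><body><p>First paragraph.</p><p>Second.</p></body></html>")
print(extractor.paragraphs)  # → ['First paragraph.', 'Second.']
```

A scraper of this kind needs no browser at all, which is one reason purely client-side defenses are easy to bypass.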
Much of the content that is scraped is copyrighted material, and publishing it without the consent of the copyright holder is copyright infringement. However, scraper sites are hosted all over the world, and scrapers who are asked to remove copyrighted content can simply switch domains or disappear.
Content scrapers drive traffic to their websites by scraping high-quality, keyword-rich content from other sites. Bloggers are particularly vulnerable to this, probably because individual bloggers are unlikely to mount a legal challenge against scrapers. Scrapers are encouraged to continue the practice because search engines have not yet found an effective way to distinguish original content from scraped copies, so scrapers continue to benefit.
Website administrators can protect themselves against scraping with simple measures, such as adding links to their own website within the content; this at least lets them gain some traffic from the scraped copies. More comprehensive methods of handling scraping bots include:
- Commercial anti-bot applications
- Catching bots with a honeypot and blocking their IP addresses
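The honeypot idea can be sketched as follows: a hidden URL that no human visitor should ever reach (for example, excluded via robots.txt and invisible in the page layout) is served alongside normal pages; any client that requests it is assumed to be a bot and its IP address is blocked from then on. The path name and handler shape below are illustrative assumptions, not a specific framework's API.

```python
# Hedged sketch of a honeypot trap with IP blocking. The hidden path is a
# hypothetical example; a real deployment would wire this logic into the
# web server or application framework actually in use.
HONEYPOT_PATH = "/wp-secret-trap"   # hypothetical hidden URL, never linked visibly
blocked_ips: set[str] = set()

def handle_request(client_ip: str, path: str) -> int:
    """Return an HTTP status code for the incoming request."""
    if client_ip in blocked_ips:
        return 403                  # previously trapped bot: deny
    if path == HONEYPOT_PATH:
        blocked_ips.add(client_ip)  # fell into the trap: block from now on
        return 403
    return 200                      # normal traffic passes through

print(handle_request("203.0.113.7", "/blog/post-1"))   # → 200
print(handle_request("203.0.113.7", HONEYPOT_PATH))    # → 403 (IP now blocked)
print(handle_request("203.0.113.7", "/blog/post-1"))   # → 403
```

Because legitimate crawlers that honor robots.txt never request the hidden path, this approach mostly catches bots that ignore the exclusion rules, which is exactly the population scrapers tend to belong to.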