XMCP writes one of the better black hat SEO blogs. In a post last November, he laid out a ton of advice about automating black hat SEO. Personally, I don’t approve of doing black hat SEO. Still, it’s an intellectually interesting subject. What’s more, black hat SEOs create a large fraction of all websites, and certainly of all blog comments, links, and so on. So it’s interesting to track them.
Most interesting to me, and probably to most readers here, is the part that shows where black hat SEOs get their content:
- Know your approach. You really have only 4-5 options:
  - Direct Scraping, full data.
  - RSS Feeds
  - Content Generation/Markov Scripts
  - Manual, offshore labor
    - Make sure to have an easy way for those who do your writing to retrieve their assignments. Get a reliable crew that will check the buffer every day, and start pumping out the desired articles. Include an easy way for them to submit their work on a webpage.
    - If possible, have an automated payout system. Keep an automatic tally of their submitted articles, and have your script log in to PayPal and send them their payment. Be careful, though, to avoid no payment, or god forbid duplicate payment.
  - Gibberish (Scrape/Cloaking sites)
- No matter which way you choose to get your data, make sure it’s stored in a swiftly accessible database, and backed up consistently. Have it so all sites that are out there reference this database by domain, not IP. This way, if that server goes down, or is too distant from your most active web host, you can easily reroute the traffic to the backup database.
- Have your content creation feature tie directly in to your keyword/topic database.
The idea behind those “Markov scripts” is that you:
- Obtain a large amount of genuine web content.
- Derive frequencies with which any given phrase is followed by another.
- Plug those frequencies into a Markov process that produces meaningless text.
Since the text is randomized and hence unique, it isn’t caught by the most obvious spam test, a check for duplicated content. Further, because in some ways it resembles normal text, the black hat hopes it won’t be caught by any spam test at all.
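To make the three steps concrete, here is a minimal sketch of a word-level Markov text generator in Python. It is not taken from XMCP's post or from any real spam script. The function names `build_model` and `generate`, the `order` parameter (how many words make up a "phrase"), and the whitespace tokenization are all my own assumptions for illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each `order`-word phrase to the list of words seen right after it.

    Followers are stored with repetition, so picking uniformly from the list
    reproduces the observed frequencies.
    """
    words = text.split()  # assumption: naive whitespace tokenization
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return dict(model)


def generate(model, length=30, seed=None):
    """Walk the chain to produce `length` words of plausible-looking nonsense."""
    if not model:
        raise ValueError("model is empty; the source text was too short")
    rng = random.Random(seed)
    states = sorted(model)  # sorted so a fixed seed gives a repeatable result
    state = rng.choice(states)
    output = list(state)
    while len(output) < length:
        followers = model.get(state)
        if not followers:
            # Dead end (phrase only seen at the very end of the text):
            # jump to a random known phrase and carry on.
            state = rng.choice(states)
            continue
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output[:length])


if __name__ == "__main__":
    corpus = (
        "the cat sat on the mat and the cat ran to the mat "
        "and the dog sat on the rug"
    )
    model = build_model(corpus, order=2)
    print(generate(model, length=12, seed=1))
```

With a corpus this small the output mostly replays the input. The effect described above only shows up when the model is built from a large amount of text, where most phrases have many possible followers.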
I basically believe that post, despite a couple of minor red flags (e.g., if he’s such an SEO expert, why is he using dynamic, numeric URLs in his own blog?). For one thing, the Slightly Shady SEO blog comes well-recommended in the SEO community. Besides, I’ve done a modest amount of reading on black hat subjects, and this indeed sounds like a legitimate first approximation to what’s really going on.