January 16, 2008

Automation secrets of black hat SEO

XMCP writes Slightly Shady SEO, one of the better black hat SEO blogs. In a post last November, he laid out a ton of advice about automating black hat SEO. Personally, I don’t approve of doing black hat SEO. Still, it’s an intellectually interesting subject. What’s more, black hat SEOs create a large fraction of all websites, and certainly of all blog comments, links, and so on. So it’s interesting to track them.

Most interesting to me and probably to most readers here is the part that shows where black hat SEOs get their content:

Content Creation

  1. Know your approach. You really have only 4-5 options
    1. Direct Scraping, full data.
    2. RSS Feeds
    3. Content Generation/Markov Scripts
    4. Manual, offshore labor
      1. Make sure to have an easy way for those who do your writing to retrieve their assignments. Get a reliable crew that will check the buffer every day, and start pumping out the desired articles. Include an easy way for them to submit their work on a webpage.
      2. If possible, have an automated payout system. Keep an automatic tally of their submitted articles, and have your script log in to PayPal and send them their payment. Be careful, though, to avoid missed payments or, God forbid, duplicate payments.
    5. Gibberish (Scrape/Cloaking sites)
  2. No matter which way you choose to get your data, make sure it’s stored in a swiftly accessible database, and backed up consistently. Have it so all sites that are out there reference this database by domain, not IP. This way, if that server goes down, or is too distant from your most active web host, you can easily reroute the traffic to the backup database.
  3. Have your content creation feature tie directly in to your keyword/topic database.

The idea behind those “Markov scripts” is that you

  1. Obtain a large amount of genuine web content.
  2. Derive frequencies with which any given phrase is followed by another.
  3. Plug those frequencies into a Markov process that produces meaningless text.

Since the text is randomized and hence unique, it isn’t caught by the most obvious spam test, namely a check for duplicated content. Further, because in some ways it resembles normal text, the black hat hopes it won’t be caught by any spam tests at all.
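
For readers who haven’t seen a Markov text generator, the three steps above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the script XMCP or anyone else actually uses. The function names (build_chain, generate), the word-level states, and the tiny sample corpus are all my own assumptions.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it.

    Followers are stored with repetition, so a word that follows a phrase
    more often is proportionally more likely to be picked later.
    """
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=30, seed=None):
    """Random-walk the chain and return at most `length` words of text."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random known phrase
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this phrase only ended the corpus
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output[:length])


if __name__ == "__main__":
    # Hypothetical stand-in for "a large amount of genuine web content".
    sample = "the cat sat on the mat and the cat ate the fish on the mat"
    print(generate(build_chain(sample, order=1), length=12))
```

Every short run of words in the output occurs somewhere in the source text, which is why the result looks locally plausible, yet the output as a whole is new and usually meaningless.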

I basically believe that post, despite a couple of minor red flags (e.g., if he’s such an SEO expert, why is he using dynamic, numeric URLs in his own blog?). For one thing, the Slightly Shady SEO blog comes well-recommended in the SEO community. Besides, I’ve done a modest amount of reading on black hat subjects, and this indeed sounds like a legitimate first approximation to what’s really going on.


2 Responses to “Automation secrets of black hat SEO”

  1. SlightlyShadySEO on January 16th, 2008 12:59 am

    Thanks for the citation 🙂
    And I use the numeric URLs on my own blog because it is actually the first blog I ever cared to run. When I started, I didn’t know many of the WordPress settings and such. Beyond that, I NEVER expected it to take off how it has. I was expecting maybe 20 subscribers of people I already talked to. So it seemed silly to mess with the URLs too much.
    Also, I’ve been in a bit of a traffic surge now for a couple months, with steady growth. If I were to switch now, the time it takes Google to adjust to the change (and the difficulties that sometimes surface with that change) could kill the momentum that I have worked so hard for.
    The initial rush is slowing down (from 20-40 new subscribers per day to about 4-12), so I may be switching it over soon.

    And also thank you very much for the kind words. They’re appreciated 🙂

  2. Curt Monash on January 16th, 2008 2:57 pm

    Got it. My category pages all have “category” in the URL. And it’s too late to change that. So I feel your pain. (For me, category pages are good things to SEO for, not bad ones.)

