Are spam-blogs skewing blogosphere stats?
This is a bit from a March post on Sifry's Alerts charting the growth of the blogosphere as tracked by Technorati. What got my attention, though, was the bit about spam blogs: blogs created by automated robots on independently hosted sites that are really just "link farms," idiotic attempts to "game" search engines, particularly Google's algorithms, to boost rankings and drive traffic to ads.
The existence of such link farm sites is as absurd as spam, of course, and we always need to remember how absurd the entire equation of spam is.
(Question: did anyone else's email spam drop off suddenly with the conviction of the spammer dude in Virginia? Could it be possible that a mere handful of people are responsible for 90% of the spam? I'm making this up, of course, speculating, but what if it were so? That would mean that the Internet had been hijacked by a relative few; technically, something of an act of penis enlargement terrorism, eh? Enlarge your penis or I'll fly this plane to Cuba! Er, ah, maybe something like that... absurd, I tell you.)
I've hit link farms in idle search sessions, hunting stuff down online. They are so annoying: whole batches of lookalike sites with the most gratuitously themed content and links. I'm betting all this stuff will be a faint memory in the future, and we'll have a few samples of these kinds of things in museums, to look at and laugh.
Here's the bit from Sifry:
Link: Sifry's Alerts: State of The Blogosphere, March 2005, Part 1: Growth of Blogs.
Technorati is now tracking over 7.8 million weblogs, and 937 million links. That's just about double the number of weblogs tracked in October 2004. In fact, the blogosphere is doubling in size about once every 5 months. It has already done so at this pace four times, which means that in the last 20 months, the blogosphere has increased in size by over 16 times.[...]
We are currently seeing about 30,000 - 40,000 new weblogs being created each day, depending on the day. Compared to the past, this is well over double the rate of change in October, when there were about 15,000 new weblogs created each day. The remarkable growth over the past 3 months can be attributed to the increase in new, mainstream services such as MSN Spaces, and in increases of use of services like Blogger, AOL Journals, and LiveJournal. In addition, services outside the United States have been taking off, including a number of media sites promoting blogging, such as Le Monde in France.
There is a dark underbelly to these numbers, however: Part of the growth of new weblogs created each day is due to an increase in spam blogs - fake blogs that are created by robots in order to foster link farms, attempted search engine optimization, or drive traffic through to advertising or affiliate sites. We have been battling the spam situation in a significant way for about 2 months - prior to January, spam wasn't much of an issue. All of these charts reflect Technorati's databases after spam blogs have been removed, and we feel that we've been able to capture and identify most of the spam out there, but one should note that there is definitely blog spam that we don't catch (tell us if you see spam in the index!). I'd estimate that we currently catch about 90% of spam and remove it from the index, and notify the blog hosting operators. Most of this fake blog spam comes from hosted services or from specific IP addresses. One of the results of the extremely productive Spam Squashing Summit of a few weeks ago is the increased collaboration between services in order to report and combat this spam. Right now, about 20% of the aggregate pings Technorati receives are from spam blogs, so you won't see that in these numbers - these statistics show only "cleaned" data.
[...]
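For anyone who wants to check the doubling arithmetic in Sifry's numbers, here is a minimal Python sketch. The function names are my own invention, not anything from Technorati, and the whole thing assumes clean exponential growth, which real blog counts only roughly follow.

```python
import math


def growth_factor(months: float, doubling_months: float) -> float:
    """Growth multiple over `months` if the size doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def implied_doubling_months(ratio: float, months: float) -> float:
    """Doubling time implied by growing `ratio`-fold over `months`."""
    return months * math.log(2) / math.log(ratio)


# Figures quoted from the Sifry post: doubling every 5 months, over 20 months.
factor = growth_factor(20, 5)
print(factor)  # 16.0

# Back-of-envelope only: the size 20 months earlier if the doubling were exact.
# Assumption: uses the quoted 7.8 million as a round figure.
print(7_800_000 / factor)  # 487500.0
```

Four doublings in 20 months gives 2 to the 4th power, which is where the "16 times" comes from. Keep in mind these are the "cleaned" figures, so the spam blogs Technorati catches are already taken out before any of this arithmetic applies.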