Next generation of aggregation

April 6th, 2003 | by aobaoill

I’ve been concerned for a while about the limited utility of tools such as Blogdex and Daypop. They give a coarse overview of the most-linked items, but this approach suffers from a number of problems:

  • Stories appearing in several places (e.g. CNN, Reuters, BBC) are duplicated, or don’t get enough aggregated hits to make the grade (a crude deduplication sketch follows this list)
  • There’s no differentiation between the types of story covered (war, US politics, flash site of the day)
  • There’s no differentiation between the sources of the links
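
Here is a minimal sketch of how an aggregator might handle the first problem, folding together the same story as reported by several outlets. The word-overlap measure, the 0.6 threshold, and the sample headlines are all assumptions chosen for illustration, not how Blogdex or Daypop actually work:

```python
import re

def words(title):
    """Lowercase a title and reduce it to a set of words."""
    return set(re.findall(r"[a-z]+", title.lower()))

def overlap(a, b):
    """Overlap coefficient: shared words relative to the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def merge_duplicates(stories, threshold=0.6):
    """Fold together stories whose titles overlap heavily, summing link counts.

    `stories` is a list of (title, link_count) pairs; the threshold is an
    arbitrary cutoff picked for this example.
    """
    groups = []  # each group: [representative title, total links, word set]
    for title, links in stories:
        w = words(title)
        for group in groups:
            if overlap(w, group[2]) >= threshold:
                group[1] += links  # credit the duplicate's links to the group
                group[2] |= w      # widen the group's vocabulary
                break
        else:
            groups.append([title, links, w])
    return [(title, links) for title, links, _ in groups]

if __name__ == "__main__":
    stories = [
        ("US troops enter Baghdad - CNN", 40),
        ("US troops enter Baghdad, BBC reports", 35),
        ("Reuters: American troops enter Baghdad", 25),
        ("Flash site of the day", 30),
    ]
    for title, links in merge_duplicates(stories):
        print(links, title)
```

Anything falling below the threshold stays separate, so near-duplicates pool their links while unrelated items keep their own counts.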

The last is most interesting to me because, as some people have noted, the ‘distributed conversation’ that is the internet is in practice many small conversations – bigger is not necessarily better. The problem with generic counts is that areas of mild general interest – such as blogging as a phenomenon – prevent people from using aggregation sites to track special interests (ham radio, environmental protection, whatever), which may rise above the noise only infrequently.
Thankfully we can see aggregators evolving to address some of these problems. First Daypop started looking at Word Bursts – increases in the incidence of certain words – and now we have Memeufacture, which follows new stories, and the most influential sources, in specific topic areas. (Found thanks to this wonderful piece).
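Daypop hasn’t spelled out how Word Bursts works (the classic formal treatment of burst detection is Jon Kleinberg’s), but the core idea can be sketched with a simple frequency ratio: compare a word’s rate in recent posts against its rate over a baseline period. The add-one smoothing, minimum count, and ratio threshold below are all arbitrary choices for illustration:

```python
import re
from collections import Counter

def word_counts(posts):
    """Count word occurrences across a collection of post texts."""
    counts = Counter()
    for text in posts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def word_bursts(recent_posts, baseline_posts, min_count=3, ratio=3.0):
    """Return words whose rate in recent posts far exceeds their baseline rate."""
    recent = word_counts(recent_posts)
    baseline = word_counts(baseline_posts)
    recent_total = sum(recent.values()) or 1
    baseline_total = sum(baseline.values()) or 1
    bursts = []
    for word, count in recent.items():
        if count < min_count:
            continue  # too rare to call a burst
        recent_rate = count / recent_total
        # add-one smoothing so words unseen in the baseline don't divide by zero
        baseline_rate = (baseline[word] + 1) / baseline_total
        score = recent_rate / baseline_rate
        if score >= ratio:
            bursts.append((word, score))
    return sorted(bursts, key=lambda pair: -pair[1])
```

Sorting by the ratio puts the sharpest spikes first, which is roughly what a Word Bursts page surfaces.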
The next iteration, of course, is to generate those topic areas automatically. This requires two steps – deciding what the areas are, and allocating sources to them – and it is needed if the result is to be more than a special-interest meta-blog. Blog Network, for example, condenses content relating to blogs, but requires that people add their blog to the list – that they self-identify as creating content in this topic area. Even where the organisers add sites themselves, they will miss a random post on a weblog that is relevant to the topic but atypical for that blog.
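To make the two steps concrete, here is one way they might be automated, assuming scikit-learn is available: cluster individual posts (not whole blogs) with TF-IDF and k-means, then read each cluster’s top terms as a rough topic label. The number of topics and the term counts are placeholders, not a recommendation:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def discover_topics(posts, n_topics=3, top_terms=5):
    """Cluster individual posts into topic areas and label each cluster.

    Clustering at the post level, rather than the blog level, is what lets
    one atypical post on an otherwise unrelated weblog land in the right
    topic area.
    """
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(posts)
    model = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(matrix)
    terms = vectorizer.get_feature_names_out()
    labels = []
    for centre in model.cluster_centers_:
        top = centre.argsort()[::-1][:top_terms]  # highest-weighted terms
        labels.append(", ".join(terms[i] for i in top))
    return model.labels_, labels
```

Because the clustering works post by post, the clusters are the topic areas and the assignment allocates sources to them, with no one having to self-identify.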
In addition I would differentiate between ‘blogosphere’ sources and external sources (such as professional media, company sites, etc.). And a final thought: positive and negative links. How can software notice whether a link is positive or negative? This is important in the days of PageRank, when Google adds points to a page if you reference it, even if the reference sits in the middle of a review where you’re trashing the linked-to document. This is why the recent ‘second superpower’ controversy may actually have helped coalesce the position of the ‘new’ meaning, as those complaining about the ‘Googlewashing’ often linked to the article they were criticising, thereby pushing up its reputation in Google’s eyes…
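As a sketch of how software might begin to notice, consider scoring the words around each link against small positive and negative lexicons. The lexicons, the fifteen-word window, and the regex-based anchor extraction are all toy assumptions here; a real classifier would need to be far richer:

```python
import re

POSITIVE = {"great", "excellent", "insightful", "important", "recommend"}
NEGATIVE = {"wrong", "misleading", "nonsense", "trashing", "awful"}

ANCHOR = re.compile(r'<a\s+[^>]*href="([^"]+)"[^>]*>.*?</a>', re.I | re.S)

def link_sentiment(html, window=15):
    """Guess whether each link in an HTML snippet is an endorsement.

    Scores the words on either side of the anchor against tiny hand-made
    lexicons. The point is that the context around a link, not the link
    itself, carries the approval or disapproval.
    """
    results = []
    for match in ANCHOR.finditer(html):
        before = re.findall(r"[a-z]+", html[:match.start()].lower())[-window:]
        after = re.findall(r"[a-z]+", html[match.end():].lower())[:window]
        context = set(before) | set(after)
        score = len(context & POSITIVE) - len(context & NEGATIVE)
        verdict = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.append((match.group(1), verdict))
    return results
```

On this scheme, a scathing review that links to the piece it trashes would register as a negative link, which a PageRank-style count could then discount rather than reward.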
