Digg’s Recommendation Engine

We’ve been developing filtering technologies based on behaviors and expressed likes/dislikes. It’s hard stuff, and one thing is evident: relying on a single mechanism or ideology for recommendations is a strategy fraught with risk.

If you rely on active participants (people training the recommendation engine), you simply won’t get the data inputs necessary to deliver good recommendations. It’s equally true that if you rely solely on historical behaviors, you will end up with a recommendation engine that breaks easily when an outlier condition is observed. Anyone who has purchased a random gift item for someone on Amazon knows exactly what I am referring to: the suggested items list gets polluted.
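The outlier problem above can be sketched in a few lines. This is a minimal, hypothetical illustration (the function name, weights, and cutoff are my own, not any real recommender's): explicit likes are weighted above behavioral signals, and topics seen only once are treated as noise so a one-off event, like that random gift purchase, never enters the profile.

```python
# Hypothetical sketch: blend explicit likes with repeated behavioral
# signals, dropping one-off topics so outliers can't pollute the profile.
from collections import Counter

def build_profile(explicit_likes, viewed_topics, min_views=2):
    """Combine explicit and implicit signals into a topic profile.

    Topics viewed fewer than min_views times are ignored, so a single
    outlier event (e.g. a gift purchase) can't skew recommendations.
    """
    profile = Counter()
    for topic in explicit_likes:
        profile[topic] += 2.0              # explicit signals weigh more
    for topic, n in Counter(viewed_topics).items():
        if n >= min_views:                 # filter out one-off noise
            profile[topic] += 1.0 * n
    return profile

profile = build_profile(
    explicit_likes=["linux"],
    viewed_topics=["linux", "linux", "gadgets"],  # "gadgets" is a one-off
)
# "gadgets" never enters the profile; "linux" dominates
```

Neither signal alone is enough: drop the explicit weighting and you starve the engine of training data; drop the cutoff and the outlier leaks back in.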

The user experience is also a sticky subject because the recommendation results have to ride alongside the main content or be easily navigated. The simple fact is that users want recommendations as something extra rather than as the main experience. The challenges Digg’s recommendation engine is experiencing are representative of what I am talking about: a less than stellar user experience that reflects the UI and, more significantly, the recommendation results themselves.

After using it for quite some time, like most such ideas, I find it utterly useless. I use Digg in the following way: I check out the front page and the upcoming Technology section for interesting stories. The recommendation engine merely gets in my way, making me go through a couple of extra clicks to get what I want (whenever Digg doesn’t automatically log me in, which is often). The stories that the recommendation engine feeds me seem completely random; standard categorization by topics works way better, and checking only what’s recommended feels like I’m missing out on good stories.

[From So, How’s That Digg Recommendation Engine Been Working For You?]

Personally, I’m a big believer in the value of recommendation engines as a feature that augments a primary user experience, and I am impressed by the progress we have made on this front. In many ways this runs in parallel to efforts to surface related content, because both require building metadata about content that includes key entities, categories, sentiment, and additional taxonomy data that helps narrow the content focus.
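The metadata described above could take a shape like the following. This is purely illustrative; the field names and value ranges are my assumptions, not anyone's actual schema.

```python
# Hypothetical per-story metadata record for recommendation and
# related-content matching; field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class StoryMetadata:
    url: str
    entities: list[str] = field(default_factory=list)    # people, companies, places
    categories: list[str] = field(default_factory=list)  # broad topical buckets
    sentiment: float = 0.0                               # assumed range: -1.0 .. 1.0
    taxonomy: list[str] = field(default_factory=list)    # narrower subtopic paths

story = StoryMetadata(
    url="http://example.com/story",
    entities=["Digg"],
    categories=["technology"],
    sentiment=0.3,
    taxonomy=["technology/social-news"],
)
```

The point is that the same record serves both jobs: recommendations match it against a user profile, while related-content features match it against other stories.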

It’s also true that there can be too much of a good thing: users have little patience for a system that returns volumes of links and excerpts that are essentially identical. It’s therefore essential to have a filtering mechanism that attempts to surface only the best content according to quality and popularity filters.
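One way to sketch that filtering step: rank stories by a combined quality-and-popularity score, then collapse near-duplicates so only the best version of each story survives. The scoring and the crude word-overlap similarity test below are stand-ins of my own (a real system would use something like shingling or MinHash), not a description of Digg's actual filter.

```python
# Hypothetical sketch: keep the top-scoring story from each cluster of
# near-identical stories. Scores and the similarity test are stand-ins.
def dedupe_and_rank(stories, top_n=10, max_overlap=0.6):
    """stories: list of dicts with 'title', 'quality', 'diggs' keys."""
    kept = []
    ranked = sorted(stories, key=lambda s: s["quality"] * s["diggs"],
                    reverse=True)
    for story in ranked:
        words = set(story["title"].lower().split())
        # crude near-duplicate test: fraction of shared title words
        is_fresh = all(
            len(words & set(k["title"].lower().split())) / max(len(words), 1)
            < max_overlap
            for k in kept
        )
        if is_fresh:
            kept.append(story)
        if len(kept) == top_n:
            break
    return kept

stories = [
    {"title": "Apple releases new iPhone", "quality": 0.9, "diggs": 500},
    {"title": "Apple releases new iPhone today", "quality": 0.7, "diggs": 300},
    {"title": "Linux kernel update", "quality": 0.8, "diggs": 200},
]
best = dedupe_and_rank(stories)
# the two iPhone stories collapse to one; the Linux story survives
```

Sorting before deduplicating is the design choice that matters here: it guarantees the surviving copy of each cluster is the one the quality and popularity filters rank highest.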

I have long contended that nobody ever says “I need more content” or “I need more sources”; this is often asserted as a way of saying “I need better content”, meaning content that is discovered, filtered, and then presented in a manner that helps people find the things they did not know they did not know. We’re getting there.