Fred Wilson on the Next Wave of the Web

Search, filtering, semantics, etc, etc. That’s the next wave of innovation in the real time web and that’s why FB opening up status is a big deal

Originally posted as a comment by fredwilson on A VC using Disqus.

Filtering FriendFeed – How Crowdsourcing Can Solve This

It would be nice to have filters on FriendFeed. For instance, it would be nice to be able to hide any post containing the word “Obama” without having to hide someone’s other stuff. Or the ability to hide any entry containing the word “ubuntu”, etc.

Thomas Hawk, FriendFeed direct post, May 1, 2008

The need for filters on FriendFeed is a recurring topic. Click here to see the numerous entries that contain the words ‘friendfeed’ and ‘filters’. Louis Gray notes the need for this in a recent post.

I want to present an idea for filters that has two pieces:

  • Category filters
  • Keyword filters

The two pieces are interrelated, and crowdsourcing will be used to build out the category filters.

Let’s get to it, shall we?

Category Filters

FriendFeed already has a “Feed Preferences” page for each member. Here is where you can manage your category and keyword filters. The graphic below is a mockup of this:

A. Category Filters

Various categories will be displayed, along with a link to the full list of categories. In the example, above, I say that I’d like to filter out all FriendFeed entries that relate to politics.

The value of category filtering is that it prevents you from having to manage every keyword that might relate to a category. In a recent post, I noted Dave Winer’s 38 different politics-related terms. For instance, he used the terms: Hillary, HRC, Clinton, Edwards, Obama, Rove, etc. Having the ability to automatically filter those out without having to set up keyword hides over and over would be a great benefit to many members, particularly as FriendFeed gains traction with a flood of new members.

Now how would FriendFeed know that Hillary, Obama, HRC, etc. are part of the politics category? Keep reading.

B. Keyword Filters

Members will need the ability to see what words they have hidden. They can un-hide keywords, or add new keywords to hide directly on the Feed Preferences UI.

Keyword-Based Hides

FriendFeed currently supports hiding specific entries, plus entries from specific members and services. For instance, you can hide all Twitter updates. What is lacking is the ability to filter out entries with specific terms in them.

For instance, shown below are three tweets from Dave Winer regarding politics:

What I’d like to do is apply the Hide function to anything with ‘Harold Ickes’ or ‘Henry Waxman’. This is a mock up of that screen below:

A. Full Text of Entry Displays

The full text of the entry appears. Each word of the entry includes a link. The links are easy ways for members to populate the ‘hide terms’ input box.

B. Hide Terms Input Box

Commas separate each term.
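To make the keyword hide concrete, here is a minimal Python sketch of how the comma-separated input might be applied to incoming entries. The function names and the sample entries are hypothetical, and whole-word, case-insensitive matching is my assumption rather than anything FriendFeed has described.

```python
import re

def parse_hide_terms(raw):
    """Split the comma-separated input box into clean, lower-cased terms."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]

def is_hidden(entry_text, hide_terms):
    """True if any hidden term appears in the entry as a whole word or phrase."""
    text = entry_text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text)
        for term in hide_terms
    )

# Terms as a member might type them into the 'hide terms' box.
terms = parse_hide_terms("Harold Ickes, Henry Waxman")

# Made-up sample entries, for illustration only.
entries = [
    "Harold Ickes is on TV again",
    "New version of the OPML Editor is out",
]
visible = [entry for entry in entries if not is_hidden(entry, terms)]
```

Matching on word boundaries means hiding “obama” would not also hide an entry that only mentions “obamacare”. Whether that is the right behavior is a design choice.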

C. Categorize the Terms to Be Hidden

As the member hides the terms, they will be asked to apply a category. The most popular categories previously applied to the keywords will be displayed. Or the member can type a category into the input box, and FriendFeed will auto-suggest different categories with each character entered.

Why do this? This is the basis of the crowdsourced solution.

Let the People Decide

People will have a much better handle on the categories that apply to a keyword than will a heavy-duty algorithm. Such human filtering is the basis of tagging.

Two elements are relevant here:

  • The need to prevent bad categories being assigned to keywords
  • The motivation to do this categorizing

Use Bayesian Stats to Prevent Bad Categories

Here’s the issue you want to avoid. Some prankster assigns the football category to the term “Paris Hilton” while hiding all entries containing her name. Suddenly, members who are filtering out football entries stop seeing their Paris Hilton updates (yes I know, horrors…).

Enter Bayesian statistics. Carl Bialik, a columnist for the Wall Street Journal, has a great column on the use of Bayesian stats for online ratings. The gist of this approach is that all items in a rating system are born with identical ratings. Their ratings only change as people vote, and it takes a sufficient number of votes to really move the rating of an item. Here’s an example of this from the WSJ column:

For instance, as noted in the column, IMDB.com doesn’t use straight averages to list the top 250 movies of all time, as voted on by its users. Instead, each movie starts out with 1,300 votes and a ranking of 6.7, which is the site’s average. That helps smooth the effects of a few intense votes; it takes a lot of votes to budge the IMDB meter up or down from 6.7

That same approach would be applied inside FriendFeed. It would take a large number of people putting a keyword into the same category before the keyword actually became “part” of that category.

Once a keyword graduates to a category, any users filtering that category won’t see entries with those keywords.
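Here is a rough Python sketch of that idea. The first function is the standard Bayesian average, checked against the IMDB numbers from the column (1,300 starting votes at 6.7). The category part is my own illustration: the prior of 50 phantom votes, the 0.5 threshold and the function names are all assumptions, not anything FriendFeed uses.

```python
def bayesian_average(vote_sum, vote_count, prior_mean, prior_votes):
    """Blend real votes with `prior_votes` phantom votes cast at `prior_mean`."""
    return (prior_votes * prior_mean + vote_sum) / (prior_votes + vote_count)

# IMDB-style example: ten perfect 10s barely move a movie off 6.7,
# while fifty thousand of them pull it close to 10.
few = bayesian_average(vote_sum=10 * 10, vote_count=10,
                       prior_mean=6.7, prior_votes=1300)
many = bayesian_average(vote_sum=10 * 50_000, vote_count=50_000,
                        prior_mean=6.7, prior_votes=1300)

def category_share(votes_for_category, total_votes, prior_votes=50):
    """Smoothed share of members who put a keyword in a given category.

    Every keyword/category pair starts with `prior_votes` phantom votes
    saying "not in this category" (a prior share of 0.0). Hypothetical value.
    """
    return bayesian_average(votes_for_category, total_votes,
                            prior_mean=0.0, prior_votes=prior_votes)

def has_graduated(votes_for_category, total_votes, threshold=0.5):
    """A keyword joins a category only once the smoothed share clears the bar."""
    return category_share(votes_for_category, total_votes) >= threshold

# A lone prankster: 3 of 3 votes say "Paris Hilton" is football.
prank = has_graduated(votes_for_category=3, total_votes=3)
# A real consensus: 400 of 420 votes say "Obama" is politics.
consensus = has_graduated(votes_for_category=400, total_votes=420)
```

With these made-up numbers the prankster's three votes leave the keyword far below the threshold, while the broad consensus clears it easily.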

Motivation

Why would anyone bother to categorize the keywords they hide? One answer – not everyone will. But there are two drivers of members doing some keyword categorization.

First, members need to recognize that they are contributing to a system from which they are likely benefiting. If you filter any category, you will be benefiting from the work of others who have categorized keywords.

Second, the categorization experience has to be simple and fast. You’ve got the member right there, motivated to hide a term. Make it easy for them to channel that motivation into a simple categorization. The most popular previous categories are displayed, making it easy to check them. And the auto-suggest feature can be done fairly quickly. I like how Faviki is doing it:

Faviki draws from thousands of different Wikipedia entries for this list.
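For a sense of how light the auto-suggest could be, here is a small Python sketch that ranks existing categories by popularity as the member types. The category names and counts are invented, and this is a simple prefix match of my own, not Faviki's method.

```python
from collections import Counter

def suggest_categories(prefix, category_counts, limit=5):
    """Return the most-used categories that start with what was typed so far."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [
        (name, count)
        for name, count in category_counts.items()
        if name.lower().startswith(prefix)
    ]
    # Most popular first; alphabetical order breaks ties.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in matches[:limit]]

# Hypothetical counts of how often members have applied each category.
counts = Counter({"politics": 120, "poker": 8, "football": 40, "photography": 15})

after_one_char = suggest_categories("p", counts)
after_two_chars = suggest_categories("po", counts)
```

Each extra character narrows the list, so a member usually only needs to type two or three letters before the right category shows up.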

Final Thoughts

One thing to consider here is that every entry coming into FriendFeed would need to be filtered for keywords. Serious processing power will be required. Fortunately, the FriendFeed guys have firsthand experience with high volumes of real-time queries for keywords at Google.

With regard to this proposal, I haven’t (yet) seen anything on the market that will provide the category tags that would help filter FriendFeed. Since it’s the members who are most in tune with what they want to filter, their common sense and motivation should be leveraged.

As FriendFeed grows, imagine new members easily managing the flow of information by simply filtering the politics category rather than having to set up an extensive list of new keywords. It would make the experience that much better for everyone.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Filtering+FriendFeed+-+How+Crowdsourcing+Can+Solve+This%22&public=1

Tag Recommendations for Content: Ready to Filter Noise?

In a recent post, I suggested that the semantic web might hold a solution for managing noise in social media. The semantic web can auto-generate tags for content, and these tags can be used to filter out subjects you don’t want to see.

As a follow-up, I wanted to see how four different services perform in terms of recommending tags for different content.

I’ve looked at four services, each of which provides tag recommendations. Here they are, along with some information about how they approach their tag recommendations:

  • del.icio.us: Popular tags are what other people have tagged this page as, and recommended tags are a combination of tags you have already used and tags that other people have used.
  • Twine: Applies natural language processing and semantic indexing to just that data (via TechCrunch)
  • Diigo: We’ll automatically analyze the page content and recommend suitable tags for you
  • Faviki: Allows you to tag webpages you want to remember with Wikipedia terms.

Twine and Diigo take the initiative, and apply tags based on analyzing the content. del.icio.us and Faviki follow a crowdsourced approach, leveraging the previous tag work of members to provide recommendations.

Note that Faviki just opened its public beta. So it suffers from a lack of activity around content thus far. That will be noticed in the following analysis.

I ran six articles through the four tagging services:

  1. The Guessing Game Has Begun on the Next iPhone – New York Times
  2. TiVo: The Gossip Girl of DVRs – Robert Seidman’s ‘TV by the Numbers’ blog
  3. Twitter! – TechCrunch
  4. Injury ‘bombshell’ hits Radcliffe – BBC Sport
  5. Why FriendFeed Is Disruptive: There’s Only 24 Hours in a Day – this blog
  6. Antioxidant Users Don’t Live Longer, Analysis Of Studies Concludes – Science Daily

The tag recommendations are below. Headline on the results? Recommendations appear to be a work in progress.

First, the New York Times iPhone article. Twine wins. Handily. At least Diigo gave it a shot, but the nytimes tags really miss the mark. del.icio.us and Faviki weren’t even in the game.

Next, Robert Seidman’s post about Tivo. Twine comes up with several good tags. Diigo has something relevant. And again, del.icio.us and Faviki weren’t even in the game.

Now we get to the trick article, Michael Arrington’s no-text blog entry Twitter! The tables turn here. Twine comes up empty for the post. Based on the post’s presence on Techmeme and the 400+ comments on the blog post, a lot of people apparently bookmarked this post. This gives del.icio.us and Faviki something to work with, as seen below. And Diigo offers the single tag of…twitter!

Switching gears, this is a running-related article covering one of the top athletes in the world, Paula Radcliffe. Twine comes up the best here. Diigo manages “bombshell”…nice. del.icio.us and Faviki come up empty, presumably because no users bookmarked this article. And none of them could come up with tags of “running” or “marathon”.

I figured I’d run one of my own blog posts through this test. The post has been saved to del.icio.us a few times, so I figured there’d be something to work with there. Strangely, Twine comes up empty. Faviki…nuthin’.


Finally, I threw some science at the services. This article says that antioxidants don’t actually deliver what is promised. Twine comes up with a lot of tags, but misses the word “antioxidants”. Diigo only gets antioxidant. And someone must have bookmarked the article on del.icio.us, because it has a tag. Faviki…nada.

Conclusions

Twine clearly has the most advanced tag recommendation engine. It generates a bevy of tags. One thing I noticed between Twine and Diigo:

  • Twine most often draws tags from the content
  • Diigo more often draws tags from the title

Obviously my sample size isn’t statistically relevant, but I see that pattern in the above results.

The other thing to note is that these services do a really great job with auto-generating tags. For instance, the antioxidant article has 685 words. Both Twine and Diigo were able to come up with only what’s relevant out of all those words.

With del.icio.us and Faviki, if someone else hasn’t previously tagged the content, they don’t generate tags. Crowdsourced tagging – free form on del.icio.us, structured per Wikipedia on Faviki – still has a lot of value though. Nothing like human eyes assessing what an article is about. Faviki will get better with time and activity.

Note that both Twine and Diigo allow manually entered tags as well, getting the best of both auto-generated and human-generated.

When it comes to using tags as a way to filter noise in social media, both system- and human-generated tags will be needed.

  • System-generated tags ensure some level of tagging for most new content. This is important in an app like FriendFeed, where new content is constantly streaming in.
  • Human-generated tags pick up where the system leaves off. In the Paula Radcliffe example above, I’d expect people to use common sense tags like “running” and “marathon”.

The results of this simple test show the promise of tagging, and where the work lies ahead to create a robust semantic tagging system that could be used for noise control.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Tag+Recommendations+for+Content%3A+Ready+to+Filter+Noise%3F%22&public=1

FriendFeed Noise Control, Semantic Web and Dave Winer

On a FriendFeed discussion about the noise on the Web in general, Lindsay Donaghe posted this comment:

Actually I think it’s the same problem we have in general with the firehose of information we’re exposed (or expose ourselves) to on a daily basis. The struggle of where to apply our attention will only be resolved once someone develops intelligent agents to filter the bad stuff and alert us to the good stuff. Wish someone would hurry up and make those. That will be the ultimate killer app.

Louis Gray wrote this recently in his post Content Filters Proving Evasive for RSS, Social Media Sites:

So far, despite many users calling for content-based filters, solutions to block keywords or topics are missing from the vast majority of information spigots.

The recent meme about FriendFeed noise points to the frustration of some people with an inability to manage what content hits their screens. The two comments above underscore this feeling.

Here’s my own example. Dave Winer has two passions: technology and politics. For me personally, technology = signal. Politics = noise. I went through his FriendFeed stream for the month of May, and here are the 38 different political terms that show up:

So what to do? I’d like to suggest that the semantic web might be a solution down the road.

What Is the Semantic Web?

Semantic web is still a confusing term. Two quotes from Wikipedia help describe it. This quote tells you generally what it’s about and importantly notes that there’s much development for the future:

The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. At its core, the semantic web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that are yet to be implemented or realized.

This quote describes the problem that the semantic web will solve:

With HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as “this document’s title is ‘Widget Superstore'”. But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product.

It’s that last sentence there that addresses the noise issue. How does a server know that part X586172 can be categorized as a “consumer product”? That’s where the semantic web comes into play.

And how the noise can be controlled on FriendFeed.

Noise Control: Simplify Users’ Lives

One way to think of the semantic web is as tagging on steroids. In the example above, part X586172 is tagged as “consumer product”. And the tagging occurs without human intervention.

This is what’s needed on FriendFeed: the ability to take a wide range of terms that humans understand to be related, and to express the relationship among those terms as a tag.

Here’s what such an algorithm would do for Dave Winer’s political terms:

Now, imagine this in FriendFeed. Semantically-derived tags are appended to every item that flows through. Meanwhile, users have a new ‘Hide’ feature. Hide by topic. They could elect to hide streams with terms on a one-by-one basis. For instance, I’ll hide “robert reich”. I’ll hide “republicans”. I’ll hide “congress”. I’ll hide “obama”. I’ll hide “mitt romney”. I’ll hide…well, you get the picture.

In addition, users could just hide all items with the tag “politics”, and be done with it. Simple.

This could apply for all manner of topics: football, banking, Iraq, etc.
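As a toy illustration in Python, hide-by-topic could look something like the sketch below. The keyword-to-tag table stands in for whatever semantic engine would actually generate the tags; its contents, the function names and the sample items are all made up, and the plain substring match is a deliberate simplification.

```python
# Hypothetical stand-in for a semantic tagging engine: keyword -> topic tag.
KEYWORD_TAGS = {
    "obama": "politics",
    "robert reich": "politics",
    "mitt romney": "politics",
    "congress": "politics",
    "nfl": "football",
}

def derive_tags(text):
    """Collect the topic tags for every known keyword found in the item."""
    lowered = text.lower()
    return {tag for keyword, tag in KEYWORD_TAGS.items() if keyword in lowered}

def visible_items(items, hidden_tags):
    """Keep only the items that carry none of the user's hidden tags."""
    return [item for item in items if not (derive_tags(item) & hidden_tags)]

# Made-up stream items.
items = [
    "Obama speaks to Congress tonight",
    "Testing a new RSS reader",
    "NFL draft picks announced",
]
shown = visible_items(items, hidden_tags={"politics"})
```

One hidden tag does the work of many one-by-one keyword hides, which is the whole point of filtering by topic.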

Just How Would These Semantic Tags Be Generated?

I’m not sure anything quite with this purpose exists yet. Reuters has been a leading player in the semantic web with its Open Calais initiative. However, Open Calais focuses its tagging on people, places, and companies. So if Open Calais were applied to Dave Winer’s FriendFeed stream, the stream would have a lot of tags related to those topics. But not metadata tags.

A company called GroupSwim described their semantic tagging approach:

We use natural language processing to analyze the data our customers put into their sites. Our datasets tend to be much smaller but are high quality since someone doesn’t add something to GroupSwim unless they want to share it. Then, we compare the language used in the content to other semantic sources including WordNet, Wikipedia, etc. to do our automatic tagging and analysis.

Interesting, though I’m not sure what tags they produce. But it does give insight into a requirement: a core foundation of data against which all other data can be compared to derive tags. Something that would correctly map Obama and Clinton to a politics tag.

I’m sure there are other interesting approaches. It’d be great if someone was working on something in this area.

If anyone reading this knows of any semantic approaches that can apply metadata type of tags, feel free to leave a comment.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=who%3Aeveryone+%22friendfeed+noise+control+semantic+web+dave+winer%22

Yes, FriendFeed Will Be Mainstream (by 2018) and Here’s Why

We recently went through a Twitter meme about whether it was mainstream yet. There is no debate as to whether FriendFeed is mainstream today – it’s not. The question really is, will FriendFeed ever see mainstream adoption? Robert Scoble played both sides of the coin (here, here).

FriendFeed will go mainstream. My definition of mainstream: 33% of Internet users are on it. It’s just going to take time, and it’ll look different from the way it does now.

Four points to cover in this mainstreaming question:

  1. What will FriendFeed replace?
  2. What is a reasonable timeline?
  3. What content will drive the activity on FriendFeed?
  4. What topics will drive engagement?

What Will FriendFeed Replace?

Harvard professor John Gourville has a great framework for analyzing whether a new technology will succeed. His “9x problem” says a new technology has to be nine times better than what it replaces. This is because of two reasons:

  • We overvalue what we already have by three times
  • We undervalue the benefits of a new technology by three times

What does this mean in everyday terms? There’s comfort in the status quo, and fear of the unknown.

There’s the argument that FriendFeed is a complement, not a replacement to existing services. There’s some truth there, but the bottom line is that we only have 24 hours in a day. Where will we end up spending our time?

Here’s what FriendFeed will replace:

  • Time spent on the individual social media that stream into FriendFeed (blogs, Flickr, etc.)
  • Visits to static, top-down media properties (e.g. CNN, ESPN, Drudge Report, etc.)
  • Visits to other user-driven aggregator sites (Digg, StumbleUpon, Yahoo! Buzz)
  • Usage of Google search (search human-filtered content on FriendFeed)

In terms of the “9x problem”, the nice thing is that people do not have to replace what they already do. Visit CNN? You can keep doing that. Like to see what’s on Digg? You can keep doing that.

Searching on FriendFeed will advance. You can do a search on a keyword or a semantically-derived tag, and specify the number of shares, likes or comments.

FriendFeed doesn’t require you to leave your favorite service. It’s the FriendFeed experience that will slowly steal more of your time. That mitigates the issue of people overvaluing what they already have. They won’t lose it, they’ll just spend less time on it. Thomas Hawk continues to be an active participant on Flickr, but more of his time is migrating to FriendFeed. As he says:

One of the best things about FriendFeed is that it gives you much of what you get from your favorite sites on the internet but in better ways.

I think FriendFeed will have the 9x problem beat, but it will take time.

What Is a Reasonable Timeline for FriendFeed to Go Mainstream?

The chart below, courtesy of Visualizing Economics, shows how long several popular technologies took to be adopted in the U.S.

Using my mainstream definition of 33% household penetration, here’s roughly when several technologies went mainstream:

  • Color TV = 11 years
  • Computer = 15 years
  • Internet = 8 years

In addition, here are some rough estimates of current levels of adoption for other technologies. Estimates are based on the number of U.S. Internet users, the recent Universal McCann survey of social media usage (warning, PDF opens with this link) and search engine rankings.

  • Google search = 68% of searches after 10 years
  • RSS = 19% of active Internet users after 4.5 years of RSS readers
  • Facebook = 9% of Internet users after 4.5 years (20mm U.S. members / 211mm U.S. Internet users)
  • Twitter = 0.6% of Internet users after 2.2 years (1.3mm members / 211mm U.S. Internet users)
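The Facebook and Twitter percentages are simple division, and the same arithmetic shows how far away my 33% bar is. A quick Python check, using only the figures quoted above and assuming (unrealistically) that the U.S. Internet population stays at 211mm:

```python
US_INTERNET_USERS_MM = 211   # U.S. Internet users, in millions (figure used above)
MAINSTREAM_SHARE = 0.33      # my definition of mainstream

def share_of_internet_users(members_mm):
    """Fraction of U.S. Internet users represented by a member count in millions."""
    return members_mm / US_INTERNET_USERS_MM

facebook_share = share_of_internet_users(20)    # about 0.095
twitter_share = share_of_internet_users(1.3)    # about 0.006

# Members a service would need to reach the 33% bar at today's population.
members_needed_mm = MAINSTREAM_SHARE * US_INTERNET_USERS_MM

print(f"Facebook: {facebook_share:.1%}, Twitter: {twitter_share:.1%}")
print(f"Members needed for mainstream: {members_needed_mm:.1f}mm")
```

So “mainstream” means roughly 70mm U.S. members, several times where Facebook sits today.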

Yes, the date of FriendFeed mainstream adoption is pure speculation. But looking at the adoption rates of several other technologies, ten years from now is within reason (i.e. 2018). The RSS adoption is a decent benchmark.

What Content Will Drive FriendFeed Activity?

Alexander van Elsas had a recent post where he listed the percentage for different content sources inside FriendFeed. The results were compiled by Benjamin Golub.

Not surprisingly, Twitter dominates the content sources. Original blog posts are a distant #2 content source, and Google Reader shares are #3. That speaks volumes about the world of early technology adopters.

When FriendFeed becomes mainstream, the sources of content will change pretty dramatically as shown in this table:

The biggest change is in the FriendFeed Direct Post. Relative to blogging or Twittering, putting someone else’s content into the FriendFeed stream is the easiest thing for people to do. FriendFeed Direct Posts are similar to Diggs or Stumbles. Since all the content we create, submit, like or comment on is part of our personal TV broadcast on FriendFeed, Direct Posts can be just as much fun for users as newly created content by someone you know.

Direct Posts will draw from both traditional media sites as well as from other people’s blogs. Expect media sites and blogs to have a “Post to FriendFeed” link on every article.

Twitter drops as a percentage of content here. Why? FriendFeed’s commenting system replaces a lot of what people like about Twitter. Blogs drop a bit as well. More people will blog in 2018, but many of those will be sporadic bloggers. Still, 10% of the content consisting of original author submissions is pretty good.

Google Reader shares hold as a percentage as more people recognize the value of RSS versus regular-old bookmarks inside their browsers. ‘Other’ goes up, because who knows what cool other stuff will be introduced over the next ten years.

What Topics Will Drive Engagement?

Human nature won’t change. The same stuff that animates people today will continue to do so in the future. Politics, sex, technology and sports will be leaders in terms of what the content will be. There will be plenty of other topics as well. I can see the Iowa Chicks Knitting Club sharing and commenting on new patterns via FriendFeed.

One issue that will arise is that people will have multiple interests. They’ll essentially have various types of programming on their FriendFeed “TV channels”. For a good example of that today, see Dave Winer’s FriendFeed stream. Dave has two passions: technology and politics. I like the technology stuff, but I tend to ignore the political streams.

Well, this will become a bigger issue as FriendFeed expands. I personally like the noise of the people I follow, but my subscriptions seem to generally stick with recurring topics. But as more mainstream users come on board, the divergence of topics for any single person will likely increase.

FriendFeed will employ semantic web technologies to identify the topic of submitted items. These semantically-derived tags will be used to categorize content. Users can then subscribe only to content matching specific categories. How might this work?

A Dave Winer post with “Obama” in it is categorized as Politics. I could choose to hide all Dave Winer updates that are categorized in Politics.
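Here is a small Python sketch of what such a per-person category hide might look like. The class and method names are hypothetical, and it assumes each item arrives with its categories already assigned.

```python
from dataclasses import dataclass, field

@dataclass
class HidePreferences:
    """A member's hide rules, stored as (author, category) pairs.

    An author of None means the category is hidden for everyone.
    """
    rules: set = field(default_factory=set)

    def hide(self, category, author=None):
        self.rules.add((author, category))

    def is_hidden(self, author, categories):
        """True if any of the item's categories is hidden for this author."""
        return any(
            (author, category) in self.rules or (None, category) in self.rules
            for category in categories
        )

prefs = HidePreferences()
prefs.hide("Politics", author="Dave Winer")

winer_politics = prefs.is_hidden("Dave Winer", {"Politics"})    # hidden
winer_tech = prefs.is_hidden("Dave Winer", {"Technology"})      # still shown
gray_politics = prefs.is_hidden("Louis Gray", {"Politics"})     # still shown
```

Keying the rule on both author and category keeps Dave's technology posts in my stream while dropping only his politics.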

Final Thoughts

The constant flow of new content, the rich comments and easy ‘Likes’, and the social aspect of FriendFeed will drive its mainstream adoption. It’s a terrific platform for self-expression and for engaging others who share your interests. It’s also got real potential to be a dominant platform for research. In the future, look for stories in magazines and newspapers asking, “Are we losing productivity because of FriendFeed?”

So what do you think? Will FriendFeed ever be mainstream? In ten years?

*****

See this item on FriendFeed: http://friendfeed.com/search?q=who%3Aeveryone+%22yes.+friendfeed+will+be+mainstream+%28by+2018%29%22