FriendFeed Noise Control, Semantic Web and Dave Winer


On a FriendFeed discussion about the noise on the Web in general, Lindsay Donaghe posted this comment:

Actually I think it’s the same problem we have in general with the firehose of information we’re exposed (or expose ourselves) to on a daily basis. The struggle of where to apply our attention will only be resolved once someone develops intelligent agents to filter the bad stuff and alert us to the good stuff. Wish someone would hurry up and make those. That will be the ultimate killer app.

Louis Gray wrote this recently in his post Content Filters Proving Evasive for RSS, Social Media Sites:

So far, despite many users calling for content-based filters, solutions to block keywords or topics are missing from the vast majority of information spigots.

The recent meme about FriendFeed noise points to the frustration of some people with an inability to manage what content hits their screens. The two comments above underscore this feeling.

Here’s my own example. Dave Winer has two passions: technology and politics. For me personally, technology = signal. Politics = noise. I went through his FriendFeed stream for the month of May, and here are the 38 different political terms that show up:

So what to do? I’d like to suggest that the semantic web might be a solution for down the road.

What Is the Semantic Web?

Semantic web is still a confusing term. Two quotes from Wikipedia help describe it. This quote tells you generally what it’s about and importantly notes that there’s much development for the future:

The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. At its core, the semantic web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that are yet to be implemented or realized.

This quote describes the problem that the semantic web will solve:

With HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as “this document’s title is ‘Widget Superstore'”. But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product.

It’s that last sentence there that addresses the noise issue. How does a server know that part X586172 can be categorized as a “consumer product”? That’s where the semantic web comes into play.

And how the noise can be controlled on FriendFeed.

Noise Control: Simplify Users’ Lives

One way to think of the semantic web is as tagging on steroids. In the example above, part X586172 is tagged as “consumer product”. And the tagging occurs without human intervention.

This is what’s needed on FriendFeed: the ability to take a wide range of terms that humans understand to be related, and to group them automatically. The relationship among the terms is the tag.

Here’s what such an algorithm would do for Dave Winer’s political terms:

Now, imagine this in FriendFeed. Semantically-derived tags are appended to every item that flows through. Meanwhile, users have a new ‘Hide’ feature. Hide by topic. They could elect to hide streams with terms on a one-by-one basis. For instance, I’ll hide “robert reich”. I’ll hide “republicans”. I’ll hide “congress”. I’ll hide “obama”. I’ll hide “mitt romney”. I’ll hide…well, you get the picture.

In addition, users could just hide all items with the tag “politics”, and be done with it. Simple.

This could apply for all manner of topics: football, banking, Iraq, etc.
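To make the ‘Hide by topic’ idea concrete, here is a minimal Python sketch. It assumes each item already carries semantically derived tags; the FeedItem class, the visible_items function and the sample stream are all hypothetical names for illustration, not anything FriendFeed actually exposes.

```python
from dataclasses import dataclass, field


@dataclass
class FeedItem:
    """Hypothetical feed entry with tags already attached by some tagger."""
    author: str
    text: str
    tags: set = field(default_factory=set)


def visible_items(items, hidden_tags):
    """Return only the items that carry none of the user's hidden tags."""
    hidden = {tag.lower() for tag in hidden_tags}
    return [
        item for item in items
        if not ({tag.lower() for tag in item.tags} & hidden)
    ]


# Made-up sample stream, for illustration only.
stream = [
    FeedItem("Dave Winer", "Obama and Clinton debate tonight",
             {"obama", "clinton", "politics"}),
    FeedItem("Dave Winer", "New thoughts on RSS and feed readers",
             {"rss", "technology"}),
]

# Hiding the single topic tag removes every political item at once.
for item in visible_items(stream, {"politics"}):
    print(item.text)
```

Hiding one term at a time works the same way: pass {"obama", "congress"} instead of {"politics"}. The topic-level tag just saves the user from listing every term.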

Just How Would These Semantic Tags Be Generated?

I’m not sure anything with quite this purpose exists yet. Reuters has been a leading player in the semantic web with its Open Calais initiative. However, Open Calais focuses its tagging on people, places, and companies. So if Open Calais were applied to Dave Winer’s FriendFeed stream, the stream would have a lot of tags related to those topics, but not metadata tags.

A company called GroupSwim described their semantic tagging approach:

We use natural language processing to analyze the data our customers put into their sites. Our datasets tend to be much smaller but are high quality since someone doesn’t add something to GroupSwim unless they want to share it. Then, we compare the language used in the content to other semantic sources including WordNet, Wikipedia, etc. to do our automatic tagging and analysis.

Interesting, though I’m not sure what tags they produce. But it does give insight into a requirement: a core foundation of data against which all other data can be compared to derive tags. Something that would correctly map Obama and Clinton to a politics tag.
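As a rough illustration of that requirement, the Python sketch below compares item text against a small reference vocabulary and returns the matching topic tags. The TOPIC_VOCABULARY table and the derive_tags function are hypothetical. The politics terms are the ones mentioned in this post, the technology terms are made up, and a real system would draw on a much larger source such as WordNet or Wikipedia rather than a hand-built dictionary. This is plain keyword lookup, not GroupSwim’s or Open Calais’s method.

```python
import re

# Hypothetical reference data: topic tag -> terms that imply it.
TOPIC_VOCABULARY = {
    "politics": {"obama", "clinton", "mitt romney", "republicans",
                 "congress", "robert reich"},
    "technology": {"rss", "friendfeed", "twitter"},
}


def derive_tags(text, vocabulary=TOPIC_VOCABULARY):
    """Return every topic whose vocabulary has a whole-word match in text."""
    lowered = text.lower()
    tags = set()
    for topic, terms in vocabulary.items():
        for term in terms:
            # \b keeps "rss" from matching inside an unrelated longer word.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                tags.add(topic)
                break
    return tags


print(derive_tags("Obama and Clinton spar over health care"))
```

The weakness is obvious: the lookup only knows the terms someone has listed, which is exactly why a broad, shared foundation of data matters.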

I’m sure there are other interesting approaches. It’d be great if someone was working on something in this area.

If anyone reading this knows of any semantic approaches that can apply metadata-type tags, feel free to leave a comment.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=who%3Aeveryone+%22friendfeed+noise+control+semantic+web+dave+winer%22


About Hutch Carpenter
Chief Scientist Revolution Credit

7 Responses to FriendFeed Noise Control, Semantic Web and Dave Winer

  1. Jason says:

    Hutch,

    Thanks for checking out GroupSwim. The tags we produce from the natural language processing get applied to all content added to a group/site. For example, we tag emails, discussions, files and wiki pages. The tags are suggested automatically and the user adding the content has the option of using the tags we suggest, adding their own, or a combination of both.

    The benefits from our tagging are significant.
    1. The tags help our semantic search engine find information quickly and efficiently. For example, if you search on the word integration, the search engine will suggest other words associated with integration, allowing a user to narrow their search. This would not be possible without consistent tagging.
    2. You can train the software to key on tags that are most relevant to your group. For example, you could add your products or competitors to the tagging engine to ensure the content always gets tagged when it sees those tags. Furthermore, you can create relationships with the tags. For example, you could train the engine to know “Apple” is a type of “Computer”. Then, if you search on “Computer”, it will also bring up content tagged “Apple” even if it doesn’t have the word “Computer” anywhere in the content.
    3. You can identify topics of expertise for specific people based on the tags they use (or we add automatically)
    4. We can suggest related content for people based on the semantic relationship of the tags. This type of feature makes it easier for people to find what they want.
    5. We flat out get better tags. People tend to be lazy with tagging and don’t do it unless they are really committed. We take this barrier out by doing it automatically.
    Hope this helps. Let me know if you would like any more information or if you are interested in a demo. Thanks.

    Jason

  2. Pingback: High Quality Tagging Yields Significant Benefits - How and Why We Do It « The GroupSwim Diving Board

  3. Pingback: Colin Walker » Who is our audience and what do we owe them?

  4. Pingback: Steroids As The Social Media Holy Grail :: Disruptology

  5. Hey Hutch!

    Wanted to let you know that Calais 2.1 is live.

    In addition to our ongoing addition of new entities and vocabularies, the updated release features relevance ranking and integration with Yahoo Pipes.

    BTW – we have also updated our browser plug-in Gnosis for Firefox 3 and created a version for IE. Go to ‘Tools’ on OpenCalais.com.

    Thanks and hope to see you soon for a PBT reunion,
    -Krista

  6. Pingback: Who are our audience and what do we owe them? » Walker Media

  7. Pingback: Who are our audience and what do we owe them?- SquashBox Media
