Filtering FriendFeed – How Crowdsourcing Can Solve This

It would be nice to have filters on FriendFeed. For instance, it would be nice to be able to hide any post containing the word “Obama” without having to hide someone’s other stuff. Or the ability to hide any entry containing the word “ubuntu”, etc.

Thomas Hawk, FriendFeed direct post, May 1, 2008

The need for filters on FriendFeed is a recurring topic. Click here to see the numerous entries that contain the words ‘friendfeed’ and ‘filters’. Louis Gray notes the need for this in a recent post.

I want to present an idea for filters that has two pieces:

  • Category filters
  • Keyword filters

The two pieces are interrelated, and crowdsourcing will be used to build out the category filters.

Let’s get to it, shall we?

Category Filters

FriendFeed already has a “Feed Preferences” page for each member. Here is where you can manage your category and keyword filters. The graphic below is a mockup of this:

A. Category Filters

Various categories will be displayed, along with a link to the full list of categories. In the example above, I say that I’d like to filter out all FriendFeed entries that relate to politics.

The value of category filtering is that it prevents you from having to manage every keyword that might relate to a category. In a recent post, I noted Dave Winer’s 38 different politics-related terms. For instance, he used the terms: Hillary, HRC, Clinton, Edwards, Obama, Rove, etc. Having the ability to automatically filter those out without having to set up keyword hides over and over would be a great benefit to many members, particularly as FriendFeed gains traction with a flood of new members.

Now how would FriendFeed know that Hillary, Obama, HRC, etc. are part of the politics category? Keep reading.

B. Keyword Filters

Members will need the ability to see what words they have hidden. They can un-hide keywords, or add new keywords to hide directly on the Feed Preferences UI.

Keyword-Based Hides

FriendFeed currently supports hiding specific entries, plus entries from specific members and services. For instance, you can hide all Twitter updates. What is lacking is the ability to filter out entries with specific terms in them.

For instance, shown below are three tweets from Dave Winer regarding politics:

What I’d like to do is apply the Hide function to anything with ‘Harold Ickes’ or ‘Henry Waxman’. This is a mock up of that screen below:

A. Full Text of Entry Displays

The full text of the entry appears. Each word of the entry includes a link. The links are easy ways for members to populate the ‘hide terms’ input box.

B. Hide Terms Input Box

Commas separate each term.
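
A minimal sketch of how the comma-separated hide terms might be applied to incoming entries. The function names and the sample entries are made up for illustration, and whole-word matching is an assumption; nothing here reflects how FriendFeed itself is built.

```python
import re

def parse_hide_terms(raw: str) -> list[str]:
    """Split the comma-separated input box into normalized, non-empty terms."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]

def is_hidden(entry_text: str, hide_terms: list[str]) -> bool:
    """Return True if any hide term appears in the entry as a whole word or phrase."""
    text = entry_text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text) is not None
        for term in hide_terms
    )

# Hypothetical entries, not actual tweets.
terms = parse_hide_terms("Harold Ickes, Henry Waxman")
entries = [
    "Harold Ickes is on TV again",
    "New podcast about RSS is up",
]
visible = [entry for entry in entries if not is_hidden(entry, terms)]
```

Matching on word boundaries keeps a hidden term like "ubuntu" from also hiding words that merely contain it.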

C. Categorize the Terms to Be Hidden

As the member hides the terms, they will be asked to apply a category. The most popular categories previously applied to the keywords will be displayed. Or the member can type a category into the input box, and FriendFeed will auto-suggest different categories with each character entered.
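
The auto-suggest step could look something like the sketch below. It assumes FriendFeed keeps a count of how often each category has been applied; the `suggest_categories` name and the sample counts are hypothetical.

```python
from collections import Counter

def suggest_categories(prefix: str, category_counts: Counter, limit: int = 5) -> list[str]:
    """Return categories starting with the typed prefix, most popular first."""
    prefix = prefix.strip().lower()
    matches = [
        (name, count)
        for name, count in category_counts.items()
        if name.startswith(prefix)
    ]
    # Sort by popularity (descending), then alphabetically so ties are stable.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in matches[:limit]]

# Hypothetical usage counts for previously applied categories.
counts = Counter({"politics": 120, "police": 4, "football": 30, "poetry": 9})
suggestions = suggest_categories("po", counts)
```

An empty prefix returns the most popular categories overall, which covers the "most popular categories previously applied" display as well.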

Why do this? This is the basis of the crowdsourced solution.

Let the People Decide

People will have a much better handle on the categories that apply to a keyword than will a heavy-duty algorithm. Such human filtering is the basis of tagging.

Two elements are relevant here:

  • The need to prevent bad categories being assigned to keywords
  • The motivation to do this categorizing

Use Bayesian Stats to Prevent Bad Categories

Here’s the issue you want to avoid. Some prankster assigns the football category to the term “Paris Hilton” while hiding all entries containing her name. Suddenly, members who are filtering out football entries stop seeing their Paris Hilton updates (yes I know, horrors…).

Enter Bayesian statistics. Carl Bialik, a columnist for the Wall Street Journal, has a great column on the use of Bayesian stats for online ratings. The gist of this approach is that all items in a rating system are born with identical ratings. Their ratings only change as people vote, and it takes a sufficient number of votes to really move the rating of an item. Here’s an example of this from the WSJ column:

For instance, as noted in the column, IMDB.com doesn’t use straight averages to list the top 250 movies of all time, as voted on by its users. Instead, each movie starts out with 1,300 votes and a ranking of 6.7, which is the site’s average. That helps smooth the effects of a few intense votes; it takes a lot of votes to budge the IMDB meter up or down from 6.7
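
To make the arithmetic concrete, here is a small sketch of a Bayesian-style average using the 1,300 starting votes and the 6.7 site average quoted above. The weighted-average formula is the common textbook form and is an assumption here; the column excerpt does not spell out IMDB's exact calculation.

```python
def bayesian_average(
    votes: int,
    mean_rating: float,
    prior_votes: int = 1300,   # starting votes quoted from the WSJ column
    prior_mean: float = 6.7,   # site-wide average quoted from the WSJ column
) -> float:
    """Blend an item's own average with the prior, weighted by vote counts."""
    return (votes * mean_rating + prior_votes * prior_mean) / (votes + prior_votes)

# Ten perfect scores barely move the needle off 6.7 ...
few_votes = bayesian_average(votes=10, mean_rating=10.0)
# ... while thousands of votes pull the rating close to the item's own average.
many_votes = bayesian_average(votes=13000, mean_rating=9.0)
```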

That same approach would be applied inside FriendFeed. It would take a large number of people putting a keyword into the same category before the keyword actually became “part” of that category.

Once a keyword graduates to a category, any users filtering that category won’t see entries with those keywords.
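
One way to sketch the "graduation" idea: every keyword/category pair starts at zero, a block of prior votes is counted against it, and the keyword only joins the category once its smoothed share of votes clears a threshold. The class name, the prior of 1,000 votes, and the 0.5 threshold are all hypothetical values picked for illustration.

```python
PRIOR_VOTES = 1000   # hypothetical: weight of the "not in this category" prior
THRESHOLD = 0.5      # hypothetical: smoothed share needed to graduate

class CategoryVotes:
    """Tracks how often members assign each category to a hidden keyword."""

    def __init__(self) -> None:
        # votes[keyword][category] -> number of members who chose that category
        self.votes: dict[str, dict[str, int]] = {}

    def add(self, keyword: str, category: str, count: int = 1) -> None:
        per_keyword = self.votes.setdefault(keyword.lower(), {})
        per_keyword[category.lower()] = per_keyword.get(category.lower(), 0) + count

    def score(self, keyword: str, category: str) -> float:
        """Smoothed share of this keyword's votes that went to the category."""
        per_keyword = self.votes.get(keyword.lower(), {})
        total = sum(per_keyword.values())
        return per_keyword.get(category.lower(), 0) / (total + PRIOR_VOTES)

    def graduated(self, category: str) -> set[str]:
        """Keywords whose smoothed share for the category meets the threshold."""
        return {kw for kw in self.votes if self.score(kw, category) >= THRESHOLD}

votes = CategoryVotes()
votes.add("obama", "politics", 5000)
votes.add("obama", "chicago", 200)
votes.add("paris hilton", "celebrities", 4000)
votes.add("paris hilton", "football", 3)   # the prankster's votes
```

With these made-up numbers, the prankster's handful of football votes never comes close to the threshold, so members filtering football keep seeing their Paris Hilton updates.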

Motivation

Why would anyone bother to categorize the keywords they hide? One answer – not everyone will. But there are two drivers of members doing some keyword categorization.

First, members need to recognize that they are contributing to a system from which they are likely benefiting. If you filter any category, you will be benefiting from the work of others who have categorized keywords.

Second, the categorization experience has to be simple and fast. You’ve got the member right there, motivated to hide a term. Make it easy for them to channel that motivation into a simple categorization. The most popular previous categories are displayed, making it easy to check them. And the auto-suggest feature can be done fairly quickly. I like how Faviki is doing it:

Faviki draws from thousands of different Wikipedia entries for this list.

Final Thoughts

One thing to consider here is that every entry coming into FriendFeed would need to be filtered for keywords. Serious processing power will be required. Fortunately, the FriendFeed guys have firsthand experience with high volumes of real-time queries for keywords at Google.

With regard to this proposal, I haven’t (yet) seen anything on the market that will provide the category tags that would help filter FriendFeed. Since it’s the members who are most in tune with what they want to filter, their common sense and motivation should be leveraged.

As FriendFeed grows, imagine new members easily managing the flow of information by simply filtering the politics category rather than having to set up an extensive list of new keywords. It would make the experience that much better for everyone.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Filtering+FriendFeed+-+How+Crowdsourcing+Can+Solve+This%22&public=1

Weekly Recap 053008: ‘No Comment’

The week that was…

*****

Good discussion this week about comments…first, there was the latest installment of this issue: comment dispersion away from the originating blog…Fred Wilson at A VC weighed in: Jackson instigated the conversation with that post. His reward is the comments it generates…interestingly, bloggers with big established audiences agreed with him…Chris Brogan wrote this on Fred’s blog: One part of the currency I crave from doing a blog is that conversation, especially on my blog, where I spend lots of effort building the posts to be conversation starters, not just fully formed ideas…Mathew Ingram wrote a concurring blog post Bloggers get “paid” with comments

Which made me wonder, do you think there’s a divide between larger established bloggers and smaller, newer bloggers on this issue of distributed conversations?

*****

Next up on the comment discussions…who actually owns the comments?…there was a controversy early in the week where Rob La Gesse was irritated at the comments that were occurring on FriendFeed about his blog post…so he pulled his blog RSS from FriendFeed, which eradicated that post and all its comments from the FriendFeed UI…this raised the question of who owns the comments, and whether FriendFeed should do a better job of keeping records…Mathew Ingram reached out to FriendFeed co-founder Paul Buchheit, who noted the bias is toward blogger control of their feeds and that they will look at ways to better retain comments…

Later, Daniel Ha of Disqus wrote a post called A Commenter’s Rights…kind of a Bill of Rights for those who leave comments on blogs…one Right that I liked: ‘The ability to edit and remove their comments’…too many blogs don’t allow that, including wordpress.com…

We’ll close this out with a quote from my favorite cranky blogger Steven Hodson: This whole discussion about comments is becoming borderline stupid

*****

FriendFeed is growing, and not surprisingly, it’s getting its share of…um…interesting personalities…click this link which takes you to a search for “tweets totally f%(#ed twitter”…you’ll understand what I mean…

*****

Hats off to a couple of developers this week…I wrote a post titled FriendFeed ‘Likes’ Compatibility Index…I manually pulled together some stats to see which other FriendFeeders had the same Likes as me…well Yuvi wrote a script that he could run from his computer for any FriendFeed handle he entered…a bunch of us wanted our stats manually calculated, and he obliged…he blogged about it, and hit Techmeme…very nice Yuvi…

Then another developer, felix, created a UI where anyone could enter their FriendFeed handle to see the people who shared Likes the most…and then felix thought, “I’m going to turn this one up to 11″…he made pie charts out of the results, which have become a big hit on FriendFeed…FriendFeed co-founder Bret Taylor gave his thumbs up on felix’s blog, “Very cool!“…very nice work felix…

BTW, we’re all playing for second place to Shey in the Likes department…

*****

Jeremiah Owyang apparently has an interesting post on FriendFeed that he’s writing for Saturday 5/31/08…Robert Scoble talked with Jeremiah, and gave this update:

I just talked with Jeremiah. He says FriendFeed will turn on a new functionality that Jeremiah is calling “MiniMeme.” He wouldn’t give me more details, but I am intrigued.

So check your RSS reader for Jeremiah’s post, and maybe we’ll be talking about that here next week….

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Weekly+Recap+053008%3A+%E2%80%98No+Comment%E2%80%99%22&public=1

FriendFeed ‘Likes’ Compatibility Index

A favorite feature of FriendFeed is the Like. You get to indicate your interest in an item with a simple click of the Like button.

The act of applying a Like does two things:

  • Provides feedback to the content creator
  • Reveals what your interests are

It’s that second point that is interesting. Amazon.com matches you to other shoppers based on what you buy in order to provide recommendations. Toluu matches you with others based on common RSS feeds. Diigo matches you based on common bookmarks and tags.

How about matching people based on common FriendFeed Likes? Call it the FriendFeed Likes Compatibility Index.

Curious about this, I went to my Likes tab on FriendFeed. I went back to my 50 most recent Likes, and tallied the number of Likes by others. By doing this, I figured I’d see with whom I had the most in common.

The top 29 people are shown below – I put the cutoff at having 4 Likes in common. Some of these folks I know, others I really haven’t interacted with yet.

Here are my top matches in FriendFeed:

  1. Atul Arora (13 likes in common)
  2. Louis Gray (13)
  3. Mitchell Tsai (11)
  4. Shey (11)
  5. Robert Scoble (10)
  6. Thomas Hawk (9)
  7. Julian Baldwin (8)
  8. Jason Kaneshiro (8)
  9. Mark Trapp (7)
  10. Charlie Anzman (6)
  11. Mark Dykeman (6)
  12. Bearded Dave (5)
  13. Bwana McCall (5)
  14. Mack D. Male (5)
  15. Mike Fruchter (5)
  16. Phil Glockner (5)
  17. Alejandro S. (4)
  18. Andrew Badera (4)
  19. Anthony Farrior (4)
  20. Dobromir Hadzhiev (4)
  21. edythe (4)
  22. Kenichi Matsumoto (4)
  23. Marco (4)
  24. Nikpay (4)
  25. Rob Diana (4)
  26. Ruth Ferguson (4)
  27. Shawn L Morrissey (4)
  28. Susan Beebe (4)
  29. Timothy Neilen (4)

One small observation – I’m not in sync with a lot of women, am I? What’s up there? FriendFeed Is from Mars, Twitter Is from Venus?

Now what I need to do is to subscribe to those on this list that I haven’t yet. Also of note – there were 241 different people with whom I shared a Like in this analysis. Really great how FriendFeed lets you come into contact with a wide range of people.

Would be cool if a script could automate the FriendFeed Likes Compatibility Index…
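
A rough sketch of what such a script could do once the Likes data is in hand. It assumes the input is simply a list of "who liked this item" sets for my most recent Likes; how that data gets pulled from FriendFeed is left out, and the function name and sample handles are hypothetical.

```python
from collections import Counter

def likes_compatibility(my_liked_items, me, min_common=4):
    """Tally how many of my liked items each other person also liked.

    my_liked_items: iterable of collections of names, one per item I liked.
    Returns (name, likes_in_common) pairs at or above min_common, highest first.
    """
    tally = Counter()
    for likers in my_liked_items:
        # Count each person once per item, and leave myself out.
        tally.update(set(likers) - {me})
    ranked = [(name, n) for name, n in tally.items() if n >= min_common]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked

# Hypothetical data: four items I liked and who else liked each one.
items = [
    {"me", "alice", "bob"},
    {"me", "alice"},
    {"me", "bob", "carol"},
    {"me", "alice"},
]
matches = likes_compatibility(items, me="me", min_common=2)
```

The default cutoff of 4 mirrors the cutoff used for the manual list above; the example lowers it only because the sample data is tiny.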

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22FriendFeed+%27Likes%27+Compatibility+Index%22&public=1

Tag Recommendations for Content: Ready to Filter Noise?

In a recent post, I suggested that the semantic web might hold a solution for managing noise in social media. The semantic web can auto-generate tags for content, and these tags can be used to filter out subjects you don’t want to see.

As a follow-up, I wanted to see how four different services perform in terms of recommending tags for different content.

I’ve looked at the four services, each of which provides tag recommendations. Here they are, along with some information about how they approach their tag recommendations:

  • del.icio.us: Popular tags are what other people have tagged this page as, and recommended tags are a combination of tags you have already used and tags that other people have used.
  • Twine: Applies natural language processing and semantic indexing to just that data (via TechCrunch)
  • Diigo: We’ll automatically analyze the page content and recommend suitable tags for you
  • Faviki: Allows you to tag webpages you want to remember with Wikipedia terms.

Twine and Diigo take the initiative, and apply tags based on analyzing the content. del.icio.us and Faviki follow a crowdsourced approach, leveraging the previous tag work of members to provide recommendations.

Note that Faviki just opened its public beta. So it suffers from a lack of activity around content thus far. That will be noticed in the following analysis.

I ran six articles through the four tagging services:

  1. The Guessing Game Has Begun on the Next iPhone – New York Times
  2. TiVo: The Gossip Girl of DVRs – Robert Seidman’s ‘TV by the Numbers’ blog
  3. Twitter! – TechCrunch
  4. Injury ‘bombshell’ hits Radcliffe – BBC Sport
  5. Why FriendFeed Is Disruptive: There’s Only 24 Hours in a Day – this blog
  6. Antioxidant Users Don’t Live Longer, Analysis Of Studies Concludes – Science Daily

The tag recommendations are below. Headline on the results? Recommendations appear to be a work in progress.

First, the New York Times iPhone article. Twine wins. Handily. At least Diigo gave it a shot, but the nytimes tags really miss the mark. del.icio.us and Faviki weren’t even in the game.

Next, Robert Seidman’s post about Tivo. Twine comes up with several good tags. Diigo has something relevant. And again, del.icio.us and Faviki weren’t even in the game.

Now we get to the trick article, Michael Arrington’s no-text blog entry Twitter! The tables turn here. Twine comes up empty for the post. Based on the post’s presence on Techmeme and the 400+ comments on the blog post, a lot of people apparently bookmarked this post. This gives del.icio.us and Faviki something to work with, as seen below. And Diigo offers the single tag of…twitter!

Switching gears, this is a running-related article covering one of the top athletes in the world, Paula Radcliffe. Twine comes up the best here. Diigo manages “bombshell”…nice. del.icio.us and Faviki come up empty, presumably because no users bookmarked this article. And none of them could come up with tags of “running” or “marathon”.

I figured I’d run one of my own blog posts through this test. The post has been saved to del.icio.us a few times, so I figured there’d be something to work with there. Strangely, Twine comes up empty. Faviki…nuthin’.


Finally, I threw some science at the services. This article says that antioxidants don’t actually deliver what is promised. Twine comes up with a lot of tags, but misses the word “antioxidants”. Diigo only gets antioxidant. And someone must have bookmarked the article on del.icio.us, because it has a tag. Faviki…nada.

Conclusions

Twine clearly has the most advanced tag recommendation engine. It generates a bevy of tags. One thing I noticed between Twine and Diigo:

  • Twine most often draws tags from the content
  • Diigo more often draws tags from the title

Obviously my sample size isn’t statistically relevant, but I see that pattern in the above results.

The other thing to note is that these services do a really great job with auto-generating tags. For instance, the antioxidant article has 685 words. Both Twine and Diigo were able to come up with only what’s relevant out of all those words.

With del.icio.us and Faviki, if someone else hasn’t previously tagged the content, they don’t generate tags. Crowdsourced tagging – free form on del.icio.us, structured per Wikipedia on Faviki – still has a lot of value though. Nothing like human eyes assessing what an article is about. Faviki will get better with time and activity.

Note that both Twine and Diigo allow manually entered tags as well, getting the best of both auto-generated and human-generated.

When it comes to using tags as a way to filter noise in social media, both system- and human-generated tags will be needed.

  • System-generated tags ensure some level of tagging for most new content. This is important in an app like FriendFeed, where new content is constantly streaming in.
  • Human-generated tags pick up where the system leaves off. In the Paula Radcliffe example above, I’d expect people to use common sense tags like “running” and “marathon”.
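
As a toy illustration of combining the two sources, the sketch below generates "system" tags with a naive word-frequency count and then layers human tags on top. This is not how Twine, Diigo, or any of the tested services work; the stopword list, function names, and the stand-in article text are all made up.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "has", "for", "with", "that", "this"}  # tiny illustrative list

def system_tags(title: str, body: str, limit: int = 3) -> list[str]:
    """Naive auto-tagging: the most frequent non-trivial words in title plus body."""
    words = re.findall(r"[a-z]+", (title + " " + body).lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    # Sort by frequency (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:limit]]

def merge_tags(system: list[str], human: list[str]) -> list[str]:
    """System tags first, then any human tags not already present."""
    merged = []
    for tag in list(system) + list(human):
        tag = tag.strip().lower()
        if tag and tag not in merged:
            merged.append(tag)
    return merged

# Stand-in text, not the actual BBC article.
auto = system_tags(
    "Injury 'bombshell' hits Radcliffe",
    "Radcliffe has an injury. The injury is a blow to Radcliffe.",
    limit=2,
)
tags = merge_tags(auto, ["running", "marathon", "Radcliffe"])
```

The frequency count never surfaces "running" or "marathon" because those words are not in the text, which is exactly the gap the human tags fill.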

The results of this simple test show the promise of tagging, and where the work lies ahead to create a robust semantic tagging system that could be used for noise control.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Tag+Recommendations+for+Content%3A+Ready+to+Filter+Noise%3F%22&public=1

Weekly Recap 052308: If You Love Your Blog, Set It Free

The week that was…

*****

Things kicked off with a pair of posts about the next stage of blogging. Yes, fractured comments and all…Duncan Riley wrote Blogging 2.0: It’s All About The User. He writes: If blogging 1.0 was about enabling the conversation on each blog, blogging 2.0 is about enabling the conversation across many blogs and supporting sites and services…Louis Gray followed up with Blogging 2.0 Causing Friction With 1.0 Bloggers…Louis nicely defines the old blogging paradigm: Blogging 1.0 centered around who could: (i)Amass the most page views; (ii) Display the most ads; (iii) Get the most comments; and (iv) Attract the most RSS subscribers

As a relatively novice blogger, I pretty easily fall into the Blogging 2.0 camp…why on earth would I want to keep the conversations limited to my little blog?…that’d be a recipe for having a stale blog…

But Blogging 1.0 is still a strong instinct out there…one example: see Allen Stern’s post on CenterNetworks, Let’s Get Serious About FriendFeed; the 1995 Message Board, the Smart Consolidator and the Stolen Conversation…read not just the post, but check out some of the comments…Blogging 1.0 will die hard…

*****

Help! I’ve fallen, and I can’t get up!…bad week for Twitter, everyone’s favorite social chat room: outages, outages, outages…this seems to be getting progressively worse, as Twitter’s success is killing it…

To show disapproval for Twitter’s handling of these outages, several folks staged a Twit-Out on Wednesday May 21…a number of regular Twitterers went the whole day without going over to Twitter…they also hid tweets from their FriendFeed streams…even the biggest Twitterer of all, Robert Scoble, joined in…

It wasn’t met with universal love, but they made their point…oh, and Twitter did go down that day…

But one bright spot: Twitter apparently scored a new $15 million round of VC funding…

*****

One outcome of the Twitter issues this week…some bigger names in the social media world started to embrace FriendFeed much more…Jeremiah Owyang, who previously marked the date when new Twitter subscribers could not be considered as early adopters, got into it again with FriendFeed…first he posted on FriendFeed that he now had a new place (FriendFeed) to look for conversations, which elicited a bunch of hearty “welcome aboard” type of messages…

Well that got Jeremiah fired up, and he went into throw-down mode: Dudes, I’ve been on FriendFeed for a while, not a late adopter…he challenged Robert Scoble to list his date of FriendFeed registration…geek cred…

Of course, if you looked at his activity stats at that time, he had no comments, no likes…but he’s much more engaged now, which is cool…he even wrote a post about FriendFeed…

*****

One thing I’ve noticed in some favorited Flickr photos…models wearing little to nothing…not that I’m complaining, I love art…Thomas Hawk has some strong opinions about making this even easier here

*****

FriendFeed now has Rooms!…Rooms are separate spaces on FriendFeed where people can direct post items, and re-share items into a Room…they accomplish two things: (i) allow a focus around specific topics to follow; (ii) remove some of the items that were considered noise by many users…

Bwana McCall (second reference in this post, nice!) has a good initial set of use cases for rooms here…my favorite is the use of Rooms for live blogging like from one of those Apple events…

One bit of hilarity was the land grab that occurred for Room topics…Michael Nielsen asked Any plans to prevent squatting? I can see people snapping up thousands of “rooms” on the off chance that one day they’ll be worth something…um, well, uh…I managed to score Web 2.0, Enterprise 2.0, Running, Obama 2008 and Coca Cola among others…no idea what I’ll do with them, but anyone’s free to join…I wonder if the Obama campaign will want their Room?

Something that Rooms will foster: an increase in FriendFeed direct posts…regular feeds from your social media sites won’t stream automatically into Rooms…

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22weekly+recap+052308%22&public=1

Analyzing My FriendFeed Stats: I Should Be Direct Posting More

I’m curious about the level of interaction that occurs around the different content that streams through FriendFeed. Distributed conversations are fine by me, and I wonder what sparks them most often for content. So I did a little analysis of the ‘likes’ and comments that have happened for me.

Below are some pie charts. The first set analyze the ‘likes’. To the left is the percentage of my FriendFeed stream that comes from different content sources. To the right, I counted the number of ‘likes’ for the various content sources. For the ‘likes’ I only counted for the month of May, but I think it’s a decent approximation of my overall activity.

A couple observations:

  • Blog posts and FriendFeed Direct Posts are the biggest sources of ‘likes’
  • Google Reader shares and Twitter are a big part of my stream, but don’t generate a comparable percent of ‘likes’

Now let’s see how the comments look:

Would you look at that? FriendFeed direct posts dominate the comments. My blog posts are #2.

What’s It Mean?

I imagine everyone’s experience will vary. For me, I draw four conclusions.

My FriendFeed use is similar to people who Twitter: With FriendFeed direct posts, I’ll sometimes just make an observation. Other times, I direct post a website, generally with a graphic. This strikes me as similar to Twitter in that I’m posting something that can be consumed by anyone who subscribes to me. Also, these posts mean someone can stay within FriendFeed. Seems to make a difference in interaction when people can stay on the site. Like Twitter.

‘Likes’ dominate my blog posts: The Likes:Comments ratio for my blog posts is running at 4:1. For all the concern about fractured comments, I’d say people are overlooking basic recommendations of your content via ‘likes’. It’s not about the comments, it’s about the ‘likes’!

Comments on my posts frequently occur on someone else’s stream: There are several of my blog posts that have generated good comments. They just haven’t occurred on the RSS feed from my blog. These bigger comment fests have happened when someone with a much larger following and FriendFeed ‘presence’ shared the post (and I’m not going to write his name, because I use it too often…). But you know what? I’ll take those comments! They obviously weren’t happening just off my own post. In the long run that kind of exposure is vital for us smaller bloggers.

Google Reader shares suffer from repetition: Good blog posts will often be shared by several FriendFeed members, including those with larger followings. So when I share, I may be following others. So the repetition diminishes the interaction. I still share – there is some interaction. And Google Reader shares end up in several other places, like RSSmeme and ReadBurner. These services will show the most popular shares, so I want to vote for these blog posts.

Final Thoughts

Colin Walker has some interesting thoughts about using FriendFeed as a blogging platform. Looking at how FriendFeed Direct Posts and my blog generate the biggest activity, maybe he’s on to something.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22analyzing+my+friendfeed+stats%22&public=1

FriendFeed Noise Control, Semantic Web and Dave Winer

On a FriendFeed discussion about the noise on the Web in general, Lindsay Donaghe posted this comment:

Actually I think it’s the same problem we have in general with the firehose of information we’re exposed (or expose ourselves) to on a daily basis. The struggle of where to apply our attention will only be resolved once someone develops intelligent agents to filter the bad stuff and alert us to the good stuff. Wish someone would hurry up and make those. That will be the ultimate killer app.

Louis Gray wrote this recently in his post Content Filters Proving Evasive for RSS, Social Media Sites:

So far, despite many users calling for content-based filters, solutions to block keywords or topics are missing from the vast majority of information spigots.

The recent meme about FriendFeed noise points to the frustration of some people with an inability to manage what content hits their screens. The two comments above underscore this feeling.

Here’s my own example. Dave Winer has two passions: technology and politics. For me personally, technology = signal. Politics = noise. I went through his FriendFeed stream for the month of May, and here are the 38 different political terms that show up:

So what to do? I’d like to suggest that the semantic web might be a solution for down the road.

What Is the Semantic Web?

Semantic web is still a confusing term. Two quotes from Wikipedia help describe it. This quote tells you generally what it’s about and importantly notes that there’s much development for the future:

The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. At its core, the semantic web comprises a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that are yet to be implemented or realized.

This quote describes the problem that the semantic web will solve:

With HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as “this document’s title is ‘Widget Superstore'”. But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product.

It’s that last sentence there that addresses the noise issue. How does a server know that part X586172 can be categorized as a “consumer product”? That’s where the semantic web comes into play.

And how the noise can be controlled on FriendFeed.

Noise Control: Simplify Users’ Lives

One way to think of the semantic web is as tagging on steroids. In the example above, part X586172 is tagged as “consumer product”. And the tagging occurs without human intervention.

This is what’s needed on FriendFeed: the ability to take a wide range of terms that humans understand to be related. The relationship among the terms is the tag.

Here’s what such an algorithm would do for Dave Winer’s political terms:

Now, imagine this in FriendFeed. Semantically-derived tags are appended to every item that flows through. Meanwhile, users have a new ‘Hide’ feature. Hide by topic. They could elect to hide streams with terms on a one-by-one basis. For instance, I’ll hide “robert reich”. I’ll hide “republicans”. I’ll hide “congress”. I’ll hide “obama”. I’ll hide “mitt romney”. I’ll hide…well, you get the picture.

In addition, users could just hide all items with the tag “politics”, and be done with it. Simple.

This could apply for all manner of topics: football, banking, Iraq, etc.
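
A bare-bones sketch of hide-by-topic. A real semantic engine would derive the tags; here a small hand-built lookup table stands in for it, and plain substring matching stands in for any real language processing. The table contents, function names, and sample items are hypothetical.

```python
# Hypothetical stand-in for a semantic tagging engine: term -> topic tag.
TERM_TO_TAG = {
    "obama": "politics",
    "mitt romney": "politics",
    "congress": "politics",
    "republicans": "politics",
    "robert reich": "politics",
    "quarterback": "football",
}

def derive_tags(text: str, term_to_tag: dict[str, str] = TERM_TO_TAG) -> set[str]:
    """Attach a topic tag for every known term found in the item text."""
    lowered = text.lower()
    return {tag for term, tag in term_to_tag.items() if term in lowered}

def visible_items(items: list[str], hidden_tags: set[str]) -> list[str]:
    """Drop any item carrying at least one tag the user has chosen to hide."""
    return [item for item in items if not (derive_tags(item) & hidden_tags)]

# Made-up stream items.
stream = [
    "Obama and Congress spar over the budget",
    "New RSS reader released today",
    "Rookie quarterback named starter",
]
shown = visible_items(stream, hidden_tags={"politics"})
```

Hiding the single "politics" tag does the work of hiding every term mapped to it, which is the whole appeal over one-by-one keyword hides.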

Just How Would These Semantic Tags Be Generated?

I’m not sure anything quite with this purpose exists yet. Reuters has been a leading player in the semantic web with its Open Calais initiative. However, Open Calais focuses its tagging on people, places, and companies. So if Open Calais were applied to Dave Winer’s FriendFeed stream, the stream would have a lot of tags related to those topics. But not metadata tags.

A company called GroupSwim described their semantic tagging approach:

We use natural language processing to analyze the data our customers put into their sites. Our datasets tend to be much smaller but are high quality since someone doesn’t add something to GroupSwim unless they want to share it. Then, we compare the language used in the content to other semantic sources including WordNet, Wikipedia, etc. to do our automatic tagging and analysis.

Interesting, not sure what the tags they produce are. But it does give insight into a requirement: a core foundation of data against which all other data can be compared to derive tags. Something that would correctly map Obama and Clinton to a politics tag.

I’m sure there are other interesting approaches. It’d be great if someone was working on something in this area.

If anyone reading this knows of any semantic approaches that can apply metadata type of tags, feel free to leave a comment.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=who%3Aeveryone+%22friendfeed+noise+control+semantic+web+dave+winer%22

Hey Yahoo! Forget MSFT, GOOG. Change the Search Rules.

I wish I knew the moment I was turned off on Yahoo and what the root cause may be, but I no longer use anything Yahoo (except my Flickr account if you want to count that).

Vince DeGeorge, on FriendFeed

I was doing the same thing until I started using delicious as a search tool. Finally realized how powerful it was, and have been using it since.

Shaun McLane, on FriendFeed

I have previously written that Delicious search is one of the best ways of searching for things when a standard search doesn’t pull up what you are looking for. After Google, it is my favorite “search engine.”

Michael Arrington, TechCrunch, Delicious Integrated Into Yahoo Search Results

The latest news is that Microsoft is reaching out to Yahoo again. In fact, a couple reports (here, here) say that Microsoft wants to buy Yahoo’s search business.

Before any such transaction occurs, it seems worthwhile to think about what Yahoo could do with its existing assets. The three comments above are insightful. Yahoo is slowly losing share of mind, although its existing base of users will be around for a while. At the same time, there are nuggets in the Yahoo empire.

Search via del.icio.us ranks as one of those nuggets. Another nugget? Yahoo! Buzz. According to ReadWriteWeb, Yahoo! Buzz has surpassed Digg in terms of traffic, and its demographics better reflect web users.

Yet, Yahoo struggles against Google in the highly lucrative search market. Google increased to 67.9% of searches in April 2008, compared to Yahoo’s decline to 20.3% of searches.

What should Yahoo do? Stop playing Google’s game. Rewrite the search rules by embracing the social web fully, leveraging the social media assets it has.

And in doing so, demonstrate an aggressive path to make Yahoo a social media titan.

A Proposal for “Socializing” Yahoo Search

In January 2008, TechCrunch ran a post with a preview of del.icio.us integrated with regular Yahoo search results. Included in the search result links would be stats that tell a user:

  • Number of del.icio.us users who bookmarked the page
  • The top tags they used on the page

Both of those stats appear to be clickable. By clicking on the number of users stat, I assume a user would be taken to the del.icio.us page showing the users who bookmarked the page. If one clicked a tag, you’d land on the del.icio.us page for all web pages with that tag.

That’s a good start. But Yahoo can do better. Below is a diagram that shows how Yahoo can use its existing assets, combined with a good dose of the new social media experience, to radically change search:

Here’s a breakdown of what’s going on with the proposal.

Search Rankings

From what I’ve read, Yahoo has pretty much caught up to Google in terms of search performance. That means the use of links and clicks to rank websites is pretty common across the two search engines. However, Google does have the advantage of three times the traffic, which makes its insight into what’s relevant better than Yahoo’s.

But Yahoo has its own in-house advantages: del.icio.us and Yahoo! Buzz. Both address shortcomings in the links and clicks rankings for search engines:

  • Links require a media site or blogger to take the time to link. These links are insightful, but lack the broader reach of what Web users find relevant.
  • Clicks occur before a searcher knows whether the landing site is valuable. They don’t describe its usefulness after someone has clicked onto the site.

With del.icio.us and Yahoo! Buzz, Yahoo can tap into users’ sentiments about websites in a way that Google cannot. These insights can be used to influence the ranking of search results.
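As a rough illustration of what "influence the ranking" could mean, here is a minimal sketch that adds a social boost to a baseline links-and-clicks score. Everything in it is an assumption on my part: the `Page` fields, the weights, and the log damping are hypothetical, and nothing here reflects how Yahoo actually ranks results.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    link_click_score: float  # baseline relevance from links and clicks (assumed 0..1)
    bookmarks: int           # del.icio.us users who bookmarked the page
    buzzes: int              # Yahoo! Buzz votes for the page


def social_score(page, bookmark_weight=0.2, buzz_weight=0.1):
    """Baseline score plus a damped boost from bookmarks and buzzes.

    The weights are made-up illustration values. log1p keeps a page with
    thousands of bookmarks from drowning out the baseline entirely.
    """
    boost = (bookmark_weight * math.log1p(page.bookmarks)
             + buzz_weight * math.log1p(page.buzzes))
    return page.link_click_score + boost


def rank(pages):
    """Order pages from highest to lowest combined score."""
    return sorted(pages, key=social_score, reverse=True)
```

With these weights, a page that users bookmark and buzz heavily can outrank a page with a slightly better links-and-clicks score, which is the point of the proposal.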

Search Results – Your Friends or Everyone

Here’s where it can get really interesting. Notice I keep the general search results outside the influence of what your friends think. I think that’s important. A person should see results outside their own social circle. Otherwise, it will be hard to find new content.

But there is real power in seeing what your friends find valuable (e.g. see FriendFeed). So Yahoo should let you easily subscribe to other people for content discovery. Yahoo already has a head start on letting you set up your subscriptions:

  • Yahoo Mail
  • Yahoo Instant Messenger

In addition to that, you should be able to easily subscribe to anyone who publicly shares content they find interesting. Both del.icio.us and Yahoo! Buzz have public-facing lists for every user of what they bookmark or ‘buzz’. After viewing those lists, I should be able to easily subscribe to these users.

Once your network is developed, it becomes a powerful basis for improving information discovery.

Search Results – Associated Tags

Whenever tags are available from del.icio.us, they should be visible for each web site shown in the search results. This is what TechCrunch previewed. What do tags tell a user?

  • A way to discover other sites that might be relevant
  • Context for the web site
  • That someone thought enough of the web page to actually tag it

Tags should come in two flavors: everyone and your network. Clicking on a tag should display the top 10 associated sites right on the search results page. For more sites associated to the tag, the user is taken to del.icio.us.

Keeping the top sites on the search results page is important to make people use the functionality. Leaving the search results page just to see the sites associated to a tag will cause adoption to drop significantly.
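The two flavors of tags can be sketched as one lookup with an optional network filter. This is only an illustration under my own assumptions: bookmarks are modeled as simple `(user, url, tags)` tuples, and `top_sites_for_tag` is a hypothetical helper, not a del.icio.us API.

```python
from collections import Counter


def top_sites_for_tag(bookmarks, tag, network=None, limit=10):
    """Return up to `limit` URLs most often bookmarked with `tag`.

    bookmarks: iterable of (user, url, tags) tuples (an assumed shape).
    network:   optional set of users to restrict to ("your network");
               None means everyone.
    """
    counts = Counter()
    for user, url, tags in bookmarks:
        if tag not in tags:
            continue
        if network is not None and user not in network:
            continue
        counts[url] += 1
    return [url for url, _ in counts.most_common(limit)]
```

Calling it without `network` gives the "everyone" list, and passing the set of people you subscribe to gives the "your network" list. Either list is short enough to show inline on the results page.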

Search Results – Associated People

Each web page in the search results will show the number of people who have (i) bookmarked the site; or (ii) Yahoo! Buzzed the site. These numbers give a direct indication of how many people, not websites, found the web page valuable.

Clicking these numbers displays a list of the people, along with their most recent activity. This gives users a sense of whether they want to subscribe to a given user or not.

Search Agent

Once users perform a search, they will be able to subscribe to new content matching their search results. These subscriptions can be based on different criteria:

  • Any new content matching the search term (Google does this via Google Alerts) or a tag
  • Any new content matching the search term/tag and bookmarked by someone to whom the user subscribes
  • Any new content matching the search term/tag and Yahoo! Buzzed by someone to whom the user subscribes
  • Any new bookmarks or Yahoo! Buzzes by someone to whom the user subscribes

New content notifications occur via email or RSS. RSS can be anywhere, including on the user’s My Yahoo page. Again, FriendFeed has shown the power of these content streams.
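The four subscription criteria above boil down to a small matching rule. Here is a minimal sketch, with the caveat that the `Item` and `Subscription` shapes and the mode names are hypothetical ones I made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    text: str
    tags: set = field(default_factory=set)
    bookmarked_by: set = field(default_factory=set)  # users who bookmarked it
    buzzed_by: set = field(default_factory=set)      # users who buzzed it


@dataclass
class Subscription:
    term: Optional[str] = None  # search term or tag; ignored in "network" mode
    mode: str = "any"           # "any", "bookmarked", "buzzed", or "network"


def matches(sub, item, following):
    """Decide whether a new item should trigger a notification.

    following: set of users the subscriber follows.
    """
    term_ok = sub.term is None or (
        sub.term.lower() in item.text.lower() or sub.term in item.tags
    )
    by_bookmark = bool(item.bookmarked_by & following)
    by_buzz = bool(item.buzzed_by & following)
    if sub.mode == "any":         # any new content matching the term or tag
        return term_ok
    if sub.mode == "bookmarked":  # matching, and bookmarked by someone followed
        return term_ok and by_bookmark
    if sub.mode == "buzzed":      # matching, and buzzed by someone followed
        return term_ok and by_buzz
    if sub.mode == "network":     # anything bookmarked or buzzed by someone followed
        return by_bookmark or by_buzz
    raise ValueError(f"unknown mode: {sub.mode}")
```

Items that match would then be pushed out over email or RSS.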

Final Thoughts

My little post here isn’t the only idea someone could float. But it does at least address taking Yahoo much more deeply into the social media world, where users drive the value.

Yahoo revealed details of a proposed del.icio.us integration back in mid-January. And then nothing. Yahoo previewed Yahoo Mash, a new social network, in September 2007. And then…nothing. The last post on the Yahoo Mash blog was January 11, 2008.

Yahoo has so many amazing assets. Search, email, portal home page. Several beloved social media apps (Flickr, del.icio.us, Upcoming). Yet they have not put them together into a cohesive strategy and experience.

And now, talk of selling the search business? C’mon Yahoo. You’ve got too much going on to give up yet. Stop playing by others’ rules. Make your own rules with the amazing assets you have.

*****

See this item on FriendFeed: http://friendfeed.com/e/1b07226a-b51b-f386-fbb8-bdaece83e9fe

The Noise About FriendFeed Noise

I’m actually enjoying the “noise” of FriendFeed. Anyone else?

Corvida, one of my favorite bloggers, has a post up on ReadWriteWeb titled Don’t Be So Naive: Friendfeed Adds to the Noise. In the post, she argues that FriendFeed is contributing to the noise with a lot of streams that hold no interest for her. Her examples include Flickr and Seesmic streams, as well as Twitters without a comment.

Now there is some truth to the noise issue, but I don’t think it rises to a “we’ve GOT to correct this ASAP” level.

In fact, I find the whole thing somewhat confusing. I love seeing the variety of topics and services that cross my FriendFeed page. Heck, I even added the Greasemonkey script to expand the list of items per page to 100 from the current 30. I hated missing stuff by relying only on the 30 items that appear on the first page.

So what am I doing differently from Corvida? Not sure really. Here’s what I know.

Number of subscriptions. I checked her subscriptions, and I’m subscribed to 55 more people than she is. So seemingly my risk of noise is higher. But it doesn’t bother me.

Blogger bias. I choose my subscriptions carefully. When I’m deciding whether to subscribe to someone, I tend to prefer someone who blogs. That requirement right there is a good one for managing noise. Bloggers seem to have a good level of signal in their FriendFeed streams. If someone only Twitters or shares items on Google Reader, I tend to hold off on subscribing. These rules aren’t ironclad, but they guide me.

Hiding. As I said, I’m not hiding much. I subscribe to one person whose friends tend to blog in Chinese. I can’t read those, so I’ve been hiding these friends-of-a-friend on a one-by-one basis. I may need to hide all of his friends. I’m also close to hiding Jason Calacanis’s tweets. His tweets have a low signal-to-noise ratio for me. But it’s only a fraction of what I’m seeing.

See Louis Gray’s post about the various Hide features FriendFeed has – they’ll help clean up any noise issues you have.

Let’s Keep It Simple

Over-engineering a solution to noise is exactly the wrong thing to do. Beware the unintended consequences. The FriendFeed guys have put a lot of power in users’ hands to manage what is seen.

I have suggested a couple possibilities for cleaning up the duplicate links that can show up in FriendFeed. My guess is the FriendFeed guys are working on something related to that. That would be a help.

But really, let the streams flow. Your noise is my signal. I’m enjoying the content and conversations a lot. I even like the multiple times the same link shows up, because I’m piecing together an implicit social network based on that.

Bring the noise!

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22the+noise+about+friendfeed+noise%22&public=1

On FriendFeed, We’re All TV Channels

Husband: Hey honey, what’s on TV tonight?

Wife: Just seeing what’s new on Do you KNOW Clarence?

Husband: Cool. Any tech updates on Scobleizer?

Wife: Always! But I just want to chill tonight. Let’s see what’s up with Hawaii over at Roxanne.

Husband: Nice. Let me get a quick NBA update over on Odenized.

Wife: Give me that remote. No sports tonight!

Husband: I know what we need. Glasses of wine and some Thomas Hawk photos.

Wife: That’s it! Perfect!

When you watch TV, you have channels and shows that fit your interests. When you surf the Web, you have sites that you enjoy. All are forms of media, of programming, of content. That pretty well describes FriendFeed.

We’re all TV channels on FriendFeed.

You choose to follow people on FriendFeed because they stream content, comments and likes that fit your interests. Isn’t this like TV? ESPN gives you sports. Comedy Central gives you humor. MSNBC gives you prison lockdown stories…

Imagine if you tuned into ESPN and saw shows recounting the battles of World War II? Or if the Oxygen network was showing a hockey game? You’d be confused. And annoyed!

Which is an interesting take on the signal vs noise meme. One person’s signal is another person’s noise.

Select Your Channels Wisely

This is a theme which I’ve stressed before. If you subscribe to people who are not giving you programming you like, you’re going to run into the ‘noise’ issue.

Personally, I wouldn’t watch the Oxygen network. It just doesn’t interest me. It would be noise to me. But there are millions of women who do enjoy it. It’s signal to them.

Which is why I don’t follow any sort of auto-subscribe philosophy, in FriendFeed or Twitter. If someone subscribes to me, I may not subscribe back. Their programming just doesn’t fit my interests. It’s a very egalitarian thing to automatically subscribe back, but you’re bringing noise into your information stream.

Programming Changes

My FriendFeed mostly consists of social media stuff. I also enjoy the world of track and competitive running. If I suddenly switched programming, and fed a lot of running things through the stream, my existing network would look at that as noise. Just like if ESPN started running sci-fi movies. Not what people were expecting.

Louis Gray had another example of this in a recent blog post. Tony Chung switched his programming from Apple and next gen technologies, to covering the arts.

Final Thoughts

Mia Dand and Steven Hodson have nice blog posts on how content forms the relationship between a blog and its readers. They are good examinations of social media as programming.

FriendFeed is even larger than blogs. We get someone’s interests beyond just their blog. Heck, you don’t even need to blog in order to become a FriendFeed channel.

If you value having subscribers and developing a network of like-minded individuals, think about what your FriendFeed streams mean in terms of your programming. Even the simple ‘Like’ function brings content into others’ streams. I’d hate to be too careful about what I ‘like’ or comment on! Just recognize what it’s doing to your subscribers.

And with that…back to our regularly scheduled programming.

*****

See this item on FriendFeed: http://friendfeed.com/e/64843b5f-f950-c815-72ad-bb7931540ff9

Weekly Recap 050908: LouisGrayCrunched, BitchFeed

The week that was…

“Awesomesauce” “Apple sauce?” “Awesomesauce”…Corvida of SheGeeks.net coined this term, applying it to things she really likes. It’s gaining traction. I saw Alex Williams write it on FriendFeed. Robert Seidman’s thoughts on this? “would not say ‘awsome@^%$!’”…speaking of Corvida, congrats on that ReadWriteWeb gig

Louis Gray runs something of a debutante ball for emerging bloggers. Three separate times, he’s run a post that calls out five bloggers to watch (here, here, here). When you get called out, you experience this rush of hits and an increase in blog subscribers. It’s really wild…Colin Walker’s blog was called out. His reaction? “We had the Digg Effect, then the Scobleizer effect but now it’s the Louis Gray effect :)”…Alexander van Elsas’s blog was called out as well. His reaction? “Thx Louis, its great to hear that there is actually someone reading the stuff ;-)”…I used the term LouisGrayCrunched when this blog got its Louis Gray spotlight. You really can’t believe how his mention changes a blog. I can’t wait to see his June list…if Louis Gray raises a blog like that, I can only imagine what a Scoble callout does…

This exchange is a little dated, but I thought it was funny. Emily Chang is something of a luminary in the web world. I guess she was growing tired of Techmeme. She tweeted this: “techmeme sort of reminds me of a gay bath house”…Did Techmeme’s Gabe Rivera get pissed? Nah. He tweeted back: “Thanks @emilychang …was getting terribly bored with the ‘echo chamber’ cliché”…

Have you seen people “retweet” on Twitter? It’s a little odd. There are a couple of variations. One is a person reposting something they tweeted earlier. The second is when you pick up someone’s tweet, and broadcast it to your followers…People do wonder about this practice. Shel Israel asks: “Would someone please explain retweeting? Do people retweet when no one responds? I seriously don’t get it.”…

Give credit to Guy Kawasaki, Alltop becomes a badge of honor…the aforementioned Corvida made it on there, as she noted in her blog…Sarah Perez was inducted, also noted on her blog…and Mark Dykeman was added…congrats to all, the recognition is well-deserved…please pass along updates on traffic referrals from Alltop as time goes along, will ya?…

Yahoo…boy oh boy…One thing I admire is that Jerry Yang stuck to his guns through a flurry of criticism. There are good arguments on both sides of the MSFT-YHOO acquisition saga…At business school, they drilled into me the importance of equity holders above all. On that score, Jerry and team really need to come up with a strong plan to get Yahoo moving forward again. Is Yahoo only a couple moves away from getting to $37 per share?…

One thing I did this week was add a FriendFeed link to a blog post…Hey, there was a good discussion going on at FriendFeed! I didn’t want my blog readers to miss it…

Speaking of which, there was a revisit of the dispersed comments issue this week. Quick recap: some bloggers don’t like that comments related to their blog posts are not actually being added to the blog itself. The comments end up on places like FriendFeed. For a recap of the previous flare-up of this issue, see here…The difference this time? The debate didn’t erupt over on Techmeme. It stayed on FriendFeed here…Instead of a Bitchmeme, perhaps we should start talking about a BitchFeed

Alert. Alert. Alert. Robert Scoble is building out his subscriber base in FriendFeed. He issued an open call for new people to whom he should subscribe. Get in now before you hit his limit….Can’t wait to see how this experiment unfolds…

*****

See this item on FriendFeed: http://friendfeed.com/e/b814ceb3-4359-1cfe-78c7-878e9b72618b

Spammers on Twitter and FriendFeed: Really a Problem?

Spam is a well-known issue in the email world. Personally, I’ve set up an email account used specifically for some online applications requiring an email address, just to manage the inevitable spam that will result. Spammy comments on blogs are also an issue, which Akismet handles nicely on WordPress.com.

But is spam an issue on Twitter and FriendFeed?

Two things I’ve recently read discuss the issue of spam hitting those services. One is on TechCrunch today, “Twitter Starts Blacklisting Spammers”. From the post,

You know you’ve made it as a communications medium when you start attracting spammers. On Twitter, the problem is getting bad enough that the service is starting to blacklist people who spam other members.

The other was more a question in one of the comments on another post on this blog:

I would just add that, although I love FriendFeed, I would not be surprised to see, as FF gets more popular, it too is overrun by silly people and spammers, to where its traffic sent is huge but equally as useless. These social sites go through a lifecycle of usefulness to pointlessness on their own.

Am I missing something here? If someone is spammy on these services, you simply unsubscribe. This isn’t email. Someone can’t start sending you spam on Twitter or FriendFeed just because they have your member URL. I do see spammers subscribe to me on Twitter, but I never subscribe back. I don’t see their spam.

I can understand the service providers wanting to manage this. But for members, the beauty of these tools is their permission-based nature.

You can’t spam me unless I let you.

UPDATE: Good discussion of this on FriendFeed (here). Mitchell Tsai notes the possibility of comment spam on FriendFeed.