How Should Tweets Be Ranked in Search Engine Results?
November 2, 2009
Anyone remember when Loic Le Meur had the temerity to suggest Twitter rank its search results by the number of followers people have? His post, with 109 comments and reactions from Michael Arrington, Robert Scoble and many others, clearly struck a nerve.
Fast forward to the past couple weeks. Both Microsoft Bing and Google announced deals to provide tweets in search results. Let me say that again: Google and Bing will be providing tweet search results!
Bing’s version is the first out of the gate. In light of the earlier brouhaha, this may come across as insensitive…but I have to ask:
How should tweets be ranked in Bing and Google search results?
I hope your answer isn’t, “I wouldn’t.” Because that’s contrary to what made Google such a global powerhouse used by billions every year. And it’s contrary to why Microsoft is working hard to increase Bing’s market share. Google and Bing built their businesses by presenting search results based on the authority of websites. This system of authority (e.g. PageRank) makes the results relevant to users.
So what about running searches for tweets? Should their presentation be utterly devoid of any authority ranking? Does it make sense to just show the latest tweet containing a given term? After all, that would simply be imitating what Summize (aka Twitter Search) does.
First, a good question to ask is, why do people want to search tweets? How does this differ from web search?
Why Are You Searching Tweets?
To my mind, there are three use cases where people will search for tweets rather than search for websites:
- Find people
- Find latest on a subject that won’t show up in search engines yet (lack of indexing, lack of authority)
- Jump into conversations on something
Find people: You’re interested in a topic, and want to find others who can either improve your knowledge of it or with whom you want to connect. This is using Twitter as people search. The model for all of us here is, you are what you tweet. It’s what makes you findable to others.
In this case, my sense is that people will have a desire to find those who have the most authority on a given topic.
Find latest on a subject: The appearance of an article or blog post in the search engines can take a while. That contributes to the challenge of finding the latest. But the more pressing issue is the display of new articles in the search results. A good article or post on a subject, such as Enterprise 2.0, is likely not going to be ranked very high in the Google or Bing search results. No one links to the article yet, and it competes against a bunch of other incumbent articles in the search indexes.
If something shows up on the third page of Google’s search results, does it really exist?
This issue is even more pernicious for current events. The San Francisco Bay Bridge has been closed for several days now. It seems every estimate about when it will reopen has been wrong, meaning we all have to scramble to figure out our commute for the next day. To get the latest on the Bay Bridge, I searched Google, including the aggregate news results. Everything was too old when I did that, reflecting previous pronouncements. I needed what people knew right now. I went to Twitter, and found tweets that told me the latest status. Very helpful.
To find the latest on topics, I think there is a role for leveraging some sort of authority. People who have established credibility can be good first filters on what’s relevant and useful. For Enterprise 2.0, what is Dion Hinchliffe tweeting? For the Bay Bridge, I most trusted the KTVU tweet I saw.
Jump into conversations: This is Twitter as water cooler. You know something is going on. But how do you connect with people? Searches are good for this. Hash tags for conferences or big stories. Take the recent fraudulent #balloonboy story. It definitely captivated everyone. But even now, you’ll see tweets like this:
What is that? That’s someone taking a popular hash tag and polluting the search stream with spam. Again, a case where adding some authority to the tweet search rankings will help.
Tweet Authority Criteria
Keep in mind that “authority” is used in the context of Google and Bing searches. Of course web searches miss many authorities on subjects, but they work pretty well for giving relevant information.
I categorize the bases of authority in three buckets:
- Relevancy of tweet stream to a subject
- Crowdsourced signals of authority
- Effectiveness in providing relevant content
As a point of reference, Bing’s initial measure of relevance was reported to be the number of followers a person has. Let’s look at the three categories of authority.
Relevancy of Tweet Stream to a Subject
The first basis for authority should be…does someone tend to post about a given topic? Frequency of posts is a good marker that a person has something of interest to share. If someone is going to be deemed an authority on a subject, I’d expect a fair number of tweets related to it.
One twist would make this better: a semantic basis for linking terms. For example, if someone searches on Foo Fighters, consider people whose tweet streams frequently include posts about “music” as having higher authority.
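To make that concrete, here is a rough sketch of a topic relevance score. Everything in it is my own assumption for illustration: the function name, the plain substring matching, and the half credit given to a related term like “music.” A real search engine would use something far more sophisticated.

```python
def topic_relevance(tweets, query_terms, related_terms=()):
    """Share of a user's tweets that mention the query or a related term."""
    if not tweets:
        return 0.0
    query = [term.lower() for term in query_terms]
    related = [term.lower() for term in related_terms]
    score = 0.0
    for tweet in tweets:
        text = tweet.lower()
        if any(term in text for term in query):
            score += 1.0  # direct mention of the searched term
        elif any(term in text for term in related):
            score += 0.5  # assumed weight for a semantically related term
    return score / len(tweets)


# Made-up tweet stream, for illustration only.
stream = [
    "Foo Fighters tickets go on sale Friday",
    "New music from my favorite band this week",
    "Stuck in traffic again",
    "Great live music downtown tonight",
]
relevance = topic_relevance(stream, ["foo fighters"], ["music"])
print(relevance)
```

The higher the score, the more of someone’s stream is devoted to the topic being searched.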
Crowdsourced Signals of Authority
What does the crowd think of a given person or tweet? Let’s start with a single tweet. If someone posts something on a given topic, and it gets retweeted a lot, that should count hugely in terms of its authority for a given topic.
OK, now for the general stats. How many followers does someone have? Yes, it’s getting gamed. So the presence of a high number of followers isn’t an automatic definition for authority. But it does have relevance in constructing authority.
The benefit of computing this for users is that the authority of those who follow a person can be an input into his or her own authority.
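Here is a minimal sketch of that idea, borrowing the general shape of PageRank and applying it to a follower graph. I’m not claiming this is what Google or Bing does; the damping factor, the iteration count, and the tiny graph of made-up users are all assumptions.

```python
def follower_authority(follows, damping=0.85, iterations=50):
    """PageRank-style scores; follows maps each user to the users they follow."""
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)
    rank = {user: 1.0 / n for user in users}
    for _ in range(iterations):
        new_rank = {user: (1.0 - damping) / n for user in users}
        for user in users:
            followed = follows.get(user, [])
            if followed:
                # Following someone passes along a share of your own authority.
                share = damping * rank[user] / len(followed)
                for target in followed:
                    new_rank[target] += share
            else:
                # A user who follows no one spreads authority evenly.
                for target in users:
                    new_rank[target] += damping * rank[user] / n
        rank = new_rank
    return rank


# Hypothetical graph: carol and dave each have two followers,
# but one of dave's followers (carol) is herself well followed.
graph = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
    "erin": ["dave"],
}
ranks = follower_authority(graph)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 3))
```

Carol and dave have the same follower count, yet dave comes out ahead because one of his followers carries more authority. That’s the difference between counting followers and weighing them.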
Next… Twitter Lists. Number of followers is not the end of the story. Lists have two characteristics that can be used to compute authority. First is the number of Lists one is on. Tim O’Reilly is on over 2,500 Lists. No surprise – he really made ‘web 2.0’ ubiquitous in our culture.
But an even better indicator of authority is embedded in Lists. How does the crowd characterize a person? Those Lists are valuable for granting higher authority for a given topic.
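A simple way to picture both List signals: count the Lists a person is on, then tally the words in those List names as topic votes from the crowd. The List names below are invented, and splitting names into words is just my assumption about how one might read topics out of them.

```python
from collections import Counter


def list_topic_counts(list_names):
    """Count how many Lists mention each word in their name."""
    counts = Counter()
    for name in list_names:
        # Treat hyphens and underscores as word separators; count a word once per List.
        words = set(name.lower().replace("-", " ").replace("_", " ").split())
        counts.update(words)
    return counts


# Hypothetical Lists that one person appears on.
memberships = ["web-2.0", "tech-news", "Web 2.0 thinkers", "publishers", "tech"]
topics = list_topic_counts(memberships)
print(len(memberships), "Lists in total")
print(topics.most_common(3))
```

A search on “tech” would give this person credit for two List votes, while a search on “cooking” would give none.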
Effectiveness in Providing Relevant Content
When someone tweets, how do people react? Robert Scoble has a good take from his blog post:
- Number of retweets of that tweet
- Number of favorites of that tweet
- Number of inbound links to that tweet
- Number of clicks on an item in Twitter search
I particularly like the fourth item, number of clicks. Once these tweets are in the Google and Bing search results, the clicks can be measured. These are powerful bases for measuring someone’s authority.
I’d add a measure for how often a shared link is clicked; say bit.ly’s click information. While the actual number of clicks tracked by bit.ly is wrong, let’s assume it’s wrong in a similar fashion for everyone. So the bit.ly click counts can give a measure of relative effectiveness in providing content.
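Pulling those signals together might look something like the sketch below. The weights are pure guesses on my part, and the numbers are made up. The one idea taken from above is that the scores are relative: each author is divided by the top scorer, so click counts that are wrong in a similar way for everyone matter less.

```python
from dataclasses import dataclass


@dataclass
class TweetStats:
    retweets: int = 0
    favorites: int = 0
    inbound_links: int = 0
    search_clicks: int = 0
    link_clicks: int = 0  # e.g. clicks reported by a URL shortener


# Assumed weights, for illustration only.
WEIGHTS = {
    "retweets": 3.0,
    "favorites": 1.0,
    "inbound_links": 2.0,
    "search_clicks": 1.0,
    "link_clicks": 0.5,
}


def engagement_score(stats, weights=WEIGHTS):
    """Weighted sum of the reaction signals for one author or tweet."""
    return sum(weight * getattr(stats, name) for name, weight in weights.items())


def relative_scores(stats_by_author):
    """Scale each author's score against the highest scorer."""
    raw = {author: engagement_score(s) for author, s in stats_by_author.items()}
    top = max(raw.values(), default=0)
    if top == 0:
        return {author: 0.0 for author in raw}
    return {author: value / top for author, value in raw.items()}


# Made-up numbers for two hypothetical authors.
scores = relative_scores({
    "a": TweetStats(retweets=10, favorites=4, link_clicks=200),
    "b": TweetStats(retweets=2, favorites=1, inbound_links=1,
                    search_clicks=5, link_clicks=40),
})
print(scores)
```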
What Do You Think?
That’s my somewhat exhaustive description of inputs for ranking tweets in Google and Bing search results. There’s more that would be needed. I can think of incorporating some element of time decay in how tweets are presented as well. But this post is long enough.
What do you think? How would you rank tweets in the big search engines?