August 22, 2008
by Hutch Carpenter

Tagging is a great way to put context on user-generated content. The tag cloud to the right shows what hundreds of thousands of blogs were talking about on the evening of August 21.
Pretty much any web 2.0 service that has user-generated content supports tags. Flickr. YouTube. Del.icio.us. Google Reader. Last.fm. Tagging is entrenched in the web 2.0 world, and it’s one of those ideas that spread without any standards.
But having no single standard is a problem…
Beta, VHS.
Blu-ray, HD DVD.
Space or comma delimited?
What’s happened is that tagging formats are all over the map. Each web 2.0 service came up with what worked best for its product and developers:
This post at 37signals describes the same tag formats, and it drew a lot of comments. There’s good energy around the subject. Brian Daniel Eisenberg thinks the lack of a consistent tagging method may undermine its adoption by the masses.
To me, there really is one best format.
Multiple Words, Comma Separated
I posted this on Twitter and FriendFeed:
Can there be a universal standard for tags? Multi-word tags, comma separated. Odd combos (underscore, dot, combined) are messy, inconsistent.
You can see the comments at the link. The gist of them? Multiple words, comma separated, is the best format. Here’s why I think so:
- Forced separation of words changes their meaning (“product management” versus “product” and “management”)
- Forced separation of words creates tag clouds that misrepresent subjects (is it “product” content? or “management” content?)
- With single-term tags, users have too many ways to combine the same words:
- productmanagement
- product.management
- product_management
- product-management
- Writing multiple words with spaces between them is the way we learn to write
- Putting commas between separate ideas, context, meanings and descriptions is the way we write
Let people (1) use more than one word for a tag, (2) write it naturally without odd connectors like under_scores, and (3) use commas to separate tags. These rules are the best fit for Germanic and Romance languages, and I assume for most other languages as well.
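To make the three rules concrete, here is a minimal Python sketch of how a service might read a tag field under them. The function name `parse_tags` and the choice to merge tags that differ only in capitalization are assumptions for illustration, not any particular site’s behavior.

```python
def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag field into multi-word tags.

    Hypothetical helper: commas separate tags, spaces inside a tag are kept.
    """
    tags: list[str] = []
    seen: set[str] = set()
    for piece in raw.split(","):
        # Trim the edges and collapse runs of whitespace to a single space.
        tag = " ".join(piece.split())
        if not tag:
            continue  # skip empty entries such as "a,,b" or a trailing comma
        # Assumption: tags differing only in case count as the same tag.
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            tags.append(tag)  # keep the spelling the user actually typed
    return tags


print(parse_tags("product management, web 2.0,  tag   clouds ,, Product Management"))
# ['product management', 'web 2.0', 'tag clouds']
```

The user never has to think about connectors: “product management” stays two words, and the comma alone marks where one tag ends and the next begins.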
To Brian’s point about the masses, let’s make tagging consistent with writing.
For Developers, It’s Pretty Much a Non-Issue
In “The Need for Creating Tag Standards,” the blog Neosmart Files writes:
Basically, it’s too late for a tagging standard that will be used unanimously throughout the web.
A lot of developer types left comments. For the most part, they’re sanguine about the different formats: rip out any extraneous characters like spaces, periods, and underscores, and what’s left is a single string that is the tag.
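The developers’ approach can be sketched in a few lines of Python. `tag_key` is a hypothetical helper, and the set of characters it strips (whitespace, periods, underscores, hyphens) is an assumption; the point is that all the spellings from the list above collapse to one matching string.

```python
import re

# Characters treated as "extraneous"; this exact set is an assumption.
_SEPARATORS = re.compile(r"[\s._-]+")


def tag_key(tag: str) -> str:
    """Reduce a tag to a single comparison string (hypothetical helper)."""
    return _SEPARATORS.sub("", tag.lower())


variants = [
    "productmanagement",
    "product.management",
    "product_management",
    "product-management",
    "Product Management",
]
print({tag_key(v) for v in variants})
# {'productmanagement'}
```

This key is only useful for matching. “web 2.0” becomes “web20”, so a service would still want to display the tag the way the user wrote it.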
It’s About the Users
The issue fundamentally is how boxed in people are if they want to tag. In the Neosmart Files post, commenter Jason wrote this:
As this topic suggests, there are issues in resolving various tags that whilst literally different they are contextually equivalent. I believe this to be the critical juncture. Perhaps the solution lies not in heaping upon more standards, but improving the manner in which tags are processed by consumers.
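Jason’s suggestion, handling equivalence on the consumer side, fits the comma-separated format well. The sketch below is one possible reading, assuming a hypothetical `tag_cloud` function: each user writes tags in any style, commas split them, equivalent spellings are counted together, and the most common spelling becomes the label in the tag cloud.

```python
import re
from collections import Counter, defaultdict

# Assumed set of characters ignored when deciding whether two tags match.
_SEPARATORS = re.compile(r"[\s._-]+")


def tag_key(tag: str) -> str:
    # Matching key only: lowercase and drop spaces, periods, underscores, hyphens.
    return _SEPARATORS.sub("", tag.lower())


def tag_cloud(entries: list[str]) -> dict[str, int]:
    """Count tags across users' comma-separated entries (hypothetical helper).

    Equivalent spellings are merged under one key, and the most common
    spelling of each key is used as its display label.
    """
    counts: Counter[str] = Counter()
    spellings: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for entry in entries:
        for piece in entry.split(","):
            tag = " ".join(piece.split())
            key = tag_key(tag)
            if not key:
                continue  # skip empty entries and tags made only of separators
            counts[key] += 1
            spellings[key][tag] += 1
    # Ties go to the spelling seen first (Counter.most_common ordering).
    return {spellings[k].most_common(1)[0][0]: n for k, n in counts.items()}


entries = [
    "product management, agile",
    "product_management, Agile",
    "productmanagement",
    "product management, roadmaps",
]
print(tag_cloud(entries))
# {'product management': 4, 'agile': 2, 'roadmaps': 1}
```

Nobody in this example was forced into a format, yet the cloud still shows one “product management” entry instead of three weaker ones.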
From my perspective, the multi-word, comma-separated format is the most wide-open, flexible way to handle tags. If a user likes running words together, he can do it. If a user wants to put underscores between words, she can do it. If a user likes spaces between words, not a problem.
But making users cram together words in odd combinations takes them out of their normal writing and thinking style. Tags should be formatted with humans in mind, not computers.
That’s my argument. What say you?
*****
See this post on FriendFeed: http://friendfeed.com/search?q=%22Why+Isn%E2%80%99t+This+the+Tag+Standard%3F+Multi+Word%2C+Comma+Separated%22&public=1