Semantic Web = Tagging on Steroids
February 17, 2008
I read a nice list of “11 Things to Know About Semantic Web”, over at ReadWriteWeb. “Semantic Web” is an intimidating term. “Semantic”…hmmm, that’s something to do with words, right? Don’t people say, “he’s just using semantics” in a pejorative way?
I’ve researched it a bit, and here’s my initial attempt to put it in layman’s terms. The semantic web takes a document of unstructured data (say, this blog post), and renders it into a set of tags that are readable by both humans and software programs. Not just any tags, but really powerful tags beyond what you or I would use.
Now, at this point, that sounds sort of redundant. Don’t a lot of pages already have user-created tags? Can’t search engines find the words that are in the web page? If you know a little HTML, aren’t there metadata tags?
Well, it turns out those various methodologies are great for humans, but do little for machines. That kind of makes sense, right? I mean, we all get that machines’ ability to interpret our written words is limited. Google doesn’t really make a connection between your search term and the pages it serves up. It just looks for instances of your search terms and then applies all that special magic it does (number of links to a page, number of times the page was clicked previously, etc.).
The key to making these unstructured web pages readable by a machine is something called…RDF. RDF stands for Resource Description Framework. Here’s RDF from Wikipedia: “The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology.”
That “triples” thing (I’ve also seen it called “triplets”) is the heart of it. The idea of casting unstructured web data into subject-predicate-object form is apparently quite powerful. Again, from Wikipedia: “In the English language statement ‘New York has the postal abbreviation NY’, ‘New York’ would be the subject, ‘has the postal abbreviation’ the predicate and ‘NY’ the object.”
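To make the subject-predicate-object idea concrete, here is a minimal sketch in Python. It is only an illustration: the `Triple` type, the `TRIPLES` list and the `match` helper are hypothetical names made up for this example, and the second and third triples are invented sample data. Real RDF identifies resources with URIs and is usually handled by a dedicated library, not plain strings like these.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Sample data. The first triple is the Wikipedia example quoted above;
# the other two are invented for illustration.
TRIPLES = [
    Triple("New York", "has the postal abbreviation", "NY"),
    Triple("New York", "is a", "US state"),
    Triple("California", "has the postal abbreviation", "CA"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field that was supplied.

    A field left as None acts as a wildcard.
    """
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What is New York's postal abbreviation?" becomes a pattern with one blank.
answer = match(TRIPLES, subject="New York", predicate="has the postal abbreviation")
print(answer[0].obj)
```

The point of the sketch is that a program can answer the question by filling in the blank field of a pattern, without having to parse an English sentence.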
At this point, people much more versed in these technologies can explain how computers will use these triplets to better serve up content for a given search. For instance, Reuters has come out with its Open Calais initiative. They aim to “make all the worlds content more accessible, interoperable and valuable.” I will do some more research and write a follow-up post on this subject.
But I do want to note a few of the points provided in Bernard Lunn’s post over at ReadWriteWeb:
- Semantic Web will start the long, slow decline of relational database technology. Web 3.0 enables the transition from “structure upfront” to “structure on the fly”. The world is clearly too complex to structure upfront, despite the tremendous skills brought by data modelers. Structure on the fly is done by people adding structure as they use the service and by engines that automatically create structure from unstructured content.
- Don’t look for a killer app. That implies a client/consumer win. This is much more likely to be a server/platform/enterprise win.
- Semantic Web could slow the Google steamroller. This could be like the PC for IBM or the Web for Microsoft. The steamroller’s momentum carries it forward for a very long time and it can build all kinds of wrapper systems around it, but something new always does come along. Google mastered how to give some structure to countless unstructured HTML pages. Semantic Web will gradually make that less critical as the underlying content will be more structured.
- Tagging is the quietly disruptive technology. Everybody tags. It is the most basic human urge to mark what we find.
- Semantic Web will leverage the “community” to add structure and this will use some techniques from first generation Social Networking. But it is very unlikely that Semantic Web will emerge from the walled gardens of current social networking sites.
Final note. I ran this post through a free website that employs Reuters’ Open Calais protocols, “Calais Text Tagger”. It returns a lot of text chock-full of semantic tags. I won’t repeat that here. But I did like this little output:
IndustryTerm: unstructured web, search terms, relational database technology, software programs, wrapper systems, given search, unstructured web data, search term, semantic web
Company: IBM, Reuters, Microsoft, Google
Person: Bernard Lunn
Gotta say, that was pretty slick. And it’s more tags than I’m applying to this post.
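As a rough way to tie that output back to triples, the sketch below recasts the three lines of tagger output as subject-predicate-object statements. Everything here is an assumption for illustration: the `output_to_triples` function, the “this post” subject and the “mentions …” predicates are made up, and this is not how Calais itself represents its results.

```python
from typing import List, Tuple

# The tagger output quoted above, copied as plain text.
TAGGER_OUTPUT = """\
IndustryTerm: unstructured web, search terms, relational database technology, software programs, wrapper systems, given search, unstructured web data, search term, semantic web
Company: IBM, Reuters, Microsoft, Google
Person: Bernard Lunn
"""


def output_to_triples(text: str, subject: str = "this post") -> List[Tuple[str, str, str]]:
    """Turn 'Type: a, b, c' lines into (subject, predicate, object) tuples.

    The predicate wording ("mentions <Type>") is a hypothetical choice.
    """
    triples = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a 'Type: values' line
        entity_type, _, names = line.partition(":")
        for name in names.split(","):
            name = name.strip()
            if name:
                triples.append((subject, "mentions " + entity_type.strip(), name))
    return triples


for triple in output_to_triples(TAGGER_OUTPUT):
    print(triple)
```

Each tag becomes a small statement such as “this post, mentions Company, Google”, which is the kind of structure a machine can query directly.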