The Semantic Web
Today we all use the web, but Tim Berners-Lee drives it. It runs on his vision and his protocols, and he presides over it at the World Wide Web Consortium. What was a childhood obsession with connections has become a movement that has transformed the planet.
Surely, Berners-Lee ranks with Gutenberg, Marconi and Alexander Graham Bell as one of the most influential people in the history of communication. However, his predecessors stopped at one major innovation. Berners-Lee is going for an encore.
It’s called the Semantic Web and he intends it to be no less revolutionary than the World Wide Web. If successful, it will unlock the power of information stored in the world’s computers.
To understand how, we first need to understand how and why the web was created.
A Brief History of the Internet
The internet started as a US military project called ARPANET, part of the broader scientific funding that followed the Soviet launch of Sputnik. It was a revolutionary communication network mostly because of its open architecture, which allowed separate parts of the network to develop independently.
It also incorporated packet switching technology that was much more efficient than earlier systems. Messages could be broken up into small “packets,” sent along different routes, then put back together again. Like having multiple registers at the grocery store, this enabled much more traffic to be sent through existing wires.
The design and the efficiency of the network made it ideal for connecting academic institutions and allowing scientists to collaborate. From the early 1970s until the late 1980s, the internet was used primarily by academics attached to large research institutions. Later, commercial dial-up services became available and consumers could connect to the internet as well.
“Walled Gardens” Before the Web
While it was theoretically possible for any two people connected to the internet to communicate, in practice it was difficult. One person’s system would have to talk to the other’s, and both would have to speak the same language.
Furthermore, you would have to know what you were looking for and where it was. Finally, you would have to have access to the information. The Internet was far from universal.
While the internet of the late ‘80s and early ‘90s was exciting, it was also somewhat constricting. Applications like e-mail were becoming popular and millions subscribed to online informational services, yet it wasn’t anything like the internet we know today. It was an internet without the World Wide Web.
Tim Berners-Lee’s Vision
As a child in an academic family, Tim Berners-Lee was obsessed with the way knowledge was connected. He believed that information out of context loses its meaning.
Just as words describe other words, documents describe other documents; discoveries reference other discoveries and so on. We all stand on the shoulders of giants, so access to information demonstrably increases the efficiency of thought.
For him it was maddening that computers could hold so much information, yet much of it was useless. People who needed it couldn’t get to it. They usually didn’t know where it was and even if they did, it was cumbersome getting computers to talk to each other. He felt that information should not only be available, but easily accessible.
Vision Meets Opportunity
The problem was especially acute at CERN, one of the world’s premier physics laboratories, where Berners-Lee worked. Thousands of scientists would come each year to use its enormous particle accelerator and then go back to their home institutions.
There was an enormous need for the scientists to collaborate and share information. Many documentation systems had been proposed and implemented, but none had been effectively adopted. In effect, there was lots of knowledge with little connection. It was a perfect opportunity for Berners-Lee to work out his childhood dream of connecting intelligence.
Berners-Lee saw that the problem with the previous systems was that they were based on hierarchies. Nobody could agree on the proper way to classify and organize information, and nobody wanted to use a documentation system based on what someone else thought was important. What was central to one person was peripheral to another.
In 1989, a revolutionary year around the world for many reasons, Berners-Lee proposed a “web” of information that had no hierarchy, only links. His proposal proved to be as transformative as any event that year.
Universality of Meaning
Imagine you are in a foreign country where you don’t speak the language. You will have a hard time communicating with others in the way that you’re used to. However, within a very short time you will learn to recognize universal forms of communicating.
You’ll notice that traffic lights use the same colors to mean the same things. Red means stop, green means go. Signs for bathrooms also tend to be international, or at least super-regional. With a few very basic standards and some finger pointing, you’ll find that you’ll be able to get by.
Tim Berners-Lee wanted to do the same with the internet. He realized that it was foolish to try to get everybody to use the same languages and protocols to run their computer networks. Like language and culture, different local networks have to address different local needs and preferences.
He therefore sought to have as few rules as possible, so that everybody would be more likely to follow them. One of the ways he did this was by creating what programmers call a markup language. He named the language he invented HTML (HyperText Markup Language), and it has become the basic language of the web.
To understand how a markup language works, think of a screenplay. The writer can add unspoken directions if he wants an actor to deliver a line (wryly), (angrily) or (cheerfully). In Berners-Lee’s web language, there were similar directions to <make this bold> or <go to www.digitaltonto.com>.
By using universal markups, information could be universally displayed even if different programs were used to create the document.
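As a rough sketch of what that looks like (using today’s everyday HTML tags rather than quoting Berners-Lee’s original tag set), a marked-up document reads something like this:

```html
<!-- A minimal HTML page: the tags describe what each piece of text is,
     and any browser that understands HTML decides how to display it. -->
<html>
  <body>
    <p>This word is <b>bold</b>.</p>
    <p>This is a link to <a href="http://www.digitaltonto.com">Digital Tonto</a>.</p>
  </body>
</html>
```

Nothing in the markup depends on which program or computer produced the page, which is exactly what makes the display universal.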
The World Wide Web
Although he met with an enormous amount of resistance, it all worked splendidly. All you had to do was identify your documents with his addressing standard (the URL), announce that you were using his protocol (HTTP), and use HTML to “mark up” your documents. Without changing your internal systems, you could broadcast to the world!
Previously, if you wanted to broadcast or publish, there were enormous obstacles and costs. For TV and radio you would need a license, and even a small printed newsletter carried production and distribution costs.
If you wanted to announce something to the world you either had to be a business or a large institution or you would have to convince someone in power that what you had to say was important.
Now, with the web, all you needed was a personal computer and a phone line. Anybody could share anything they wanted without getting permission. The benefits have been enormous and we all have access to information that previous generations couldn’t dream of.
That was Tim Berners-Lee’s vision: that everybody could share documents with everybody else. However, he believes it doesn’t go far enough. Since the late ’90s he has been working on a second stage that will unlock even more information.
The Semantic Web
Today’s web is centered on documents. They are different from traditional documents because they are dynamic – we push buttons and they change – yet they are documents nonetheless. The Semantic Web seeks to free the data that underlies web documents.
Imagine you want to sell a car. You can upload the specifications to different web sites, and the users of those web sites can see what kind of car you have, what price you are asking, and so on.
However, what if the person who wants to buy your car doesn’t go to the sites that you’ve posted on? You could sell your car much more easily if you could just upload it once and all car web sites could access it.
Ontologies
The key is to get computers that don’t speak the same language to understand that they are talking about the same thing.
Different systems use different terms. In our car-selling example, the company that made the car might be called a “manufacturer,” a “producer” or a “make” – and that’s assuming the site is in English.
A person would recognize that these terms are equivalent, but a database wouldn’t. We have come a long way in teaching machines to talk to people; now we have to get machines to understand other machines.
This will be done with a set of rules called RDF (Resource Description Framework), which allows computers to state that two things are the same in some way. Additional “metadata” can be attached to definitions and function like a cross-language dictionary. Just as tourists do in a foreign country, one set of terms can be translated so that it is understood in terms of another.
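To make that concrete, here is a minimal sketch in Turtle, a common notation for writing RDF. The siteA and siteB vocabularies are invented for illustration; the linking term comes from OWL, a standard vocabulary built on top of RDF:

```turtle
# Two hypothetical car sites describe the same fact with different terms.
@prefix siteA: <http://example.org/siteA/> .
@prefix siteB: <http://example.org/siteB/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .

# Site A's listing of the car for sale.
siteA:myCar  siteA:manufacturer  "Toyota" ;
             siteA:askingPrice   "7500" .

# The "cross-language dictionary" entry: site A's "manufacturer"
# means the same thing as site B's "make".
siteA:manufacturer  owl:equivalentProperty  siteB:make .
```

With that one extra statement, software that knows site B’s vocabulary can make sense of site A’s listing without either site changing its internal database.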
Ontologies can be local or global. For instance, if an industry wants its computers to speak only a specific dialect, it can exclude global ontologies. Data can be freed from proprietary system structures just like documents were freed in the early days of the web.
The possibilities are exciting and applications are already being rolled out. Advertisers’ data about brands can be matched with media data about consumers. Data about poverty and hunger stored in computers around the world can be combined and analyzed. By combining databases, we will be more likely to identify problems and find solutions.
The Future of the Semantic Web
Just like the early Web, the Semantic Web has its critics. Much of the criticism comes from the technical community, who fear that the extra data will prove cumbersome. They fear that applications will have to spend too much time describing what they are doing and not enough time doing it. Others fear that the project just isn’t feasible.
There are also privacy concerns. We are often uploading data to databases without even knowing it. Every time we go to a web site, make a purchase or link to a friend on a social network we are creating data. The prospect of all our activity being connected together is a bit scary.
However, it’s not the technology we should fear, but how people use it. Use can be regulated. Moreover, useful data is collected every day, and connecting all of that data together could help us solve big problems such as disease, global warming and poverty.
The journey seems worth the effort.
– Greg
Well written and explained. Personally, I think the semantic web is inevitable. The hierarchical, document-based system which was so wonderful is collapsing under its own weight. It simply can’t scale to the levels needed. Plus, our system is so English-centric that a symbolic approach is inevitable. It is pure hubris that makes us think that the web will stay a Western/English-centric, document-based system. So to the critics of the semantic web, I only ask – okay, if not that, then what? I’m not hearing any other answers.
Thanks for your input, Jim.
– Greg
Maybe there is no answer yet (to the question “If not the semantic web, then what?”). The technology is not ready, and there is no killer application that would bring its benefits to the masses, as Netscape was for the World Wide Web in taking ARPANET out of academic and military use and making the internet available to anybody with a personal computer and a modem.
Personally, I see the semantic web as a great idea, still to be fully appreciated, but with potential utilization only in limited, specialized sectors where pre-defined or definable ontologies exist. This is not the direction the web is taking: blogs, tweets and social communities are all about informal, unstructured content, mostly published by amateurs with little or no interest in classifying their material.
A context where a rigid ontology exists would be different: that is the ideal ground for the semantic web, where appropriate metadata can be assigned, where information can be classified precisely and where search engines would return results based on the relevance of the content and not on hits from other web sites!
But I believe that Google owes its success exactly to the level of “entropy” of information available on the web and the way it’s connected, and not to the actual relevance of the content. Or not?
Thanks
Stefano,
Thanks. I think you bring up a big issue: to what extent should ontologies be global, and to what extent should they be local?
Just as the second law of thermodynamics (entropy is always increasing) applies universally but not locally, there can be semantic spheres of influence: very few laws globally, more super-regionally, and even more that are specific to a community or industry.
For example, eating good steak and drinking good wine applies equally to Italians, Ukrainians and Americans, but some Americans specialize in drinking whiskey and foul language:-)
– Greg
Ontologies must, I think, be global, local and bridge multi-tribal entities. They need to be adapted and constructed as needed.
The semantic web, on the other hand, will I think be emergent rather than constructed. The world wide web (still a wonderful name) will continue to define itself with its curious mixture of standard definition and creative expression.
I don’t mean to sound too “new age” about this. There are plenty of places where we must have the practical, architected approach of systems, protocols and standards. But we also have to appreciate that, from time to time, we need to take a step back and see that the emergent properties of this network are also worthy of observation. I think that those who are talking about the semantic web, like Berners-Lee, are the type of visionary pragmatists who can see both levels simultaneously.
Hey Greg, great topic. Thanks for the post!
I see some kind of hybrid in our future. “The Internet” is transforming from a destination where users go to find… whatever, to a data source where that data is aggregated and embedded in other stream-of-life appliances and applications (think augmented reality navigation systems).
One of the best examples of this is Google search where, by their own numbers, less than 20% of searches conducted are initiated from their site. The capability is being embedded in toolbars and apps instead. Another great example is Twitter. The list gets longer every day.
The destination sites will persist for a time, but even the best destination sites (Amazon, eBay, Wikipedia, Hulu, etc.) already understand this transformation and are making their content available through alternative channels. Maybe something like the Semantic Web will be the answer.
Thanks again for a great post!
jtrigsby
Interoperability does seem to be the order of the day. APIs and open architectures are leading the way, but the idea is that eventually we won’t need much of either.
Thanks for your input. Much appreciated!
– Greg
I think the question of the Semantic Web is akin to the excitement and potential held by artificial intelligence (AI) and expert systems (ES), but never fully realized (at least with AI). The problem with AI is that too much is expected of it, with popular fantasy expecting computers to equal or surpass human capability. When benchmarked against such a broad expectation, it is bound to come up short. Expert systems, on the other hand, defined a narrow scope for AI and proclaimed that technology could solve certain problems requiring human-like intelligence provided the domain was tightly constrained. The best examples of an ES are chess programs, especially the one that beat Garry Kasparov. That is truly amazing from a machine if you really think about it.
The semantic web may turn out to be similar. We’re going to have to define domain boundaries within which the semantic web may in fact turn out to be extremely intelligent, but we should keep our expectations tempered. We may end up building various agents of the semantic web, all specializing in a particular function and doing it better than anyone or anything else on the planet. But to say that the web as a whole will become one big semantic intelligence engine may be asking for too much.
Hari,
Good point. Semantic Web advocates are desperately trying to avoid the stigma of AI in order to succeed where it failed.
I also agree that a global ontology will be limited, but more local ontologies have greater promise.
– Greg
Thanks, I enjoyed reading this post, excellent information.
Rafael,
Thanks. I’m glad you liked it.
– Greg
Thanks Greg, I’ve only recently come across the term Semantic web and I’m still trying to get my head around it. Yours is the clearest explanation so far although I have a suspicion that I’ll still fail in trying to describe it to others. Your explanation of how the web came about is also very clear. I appreciate it.
Anne,
Thanks. I’m glad you found it helpful.
– Greg
I think this technology is vital and already exists within different paradigms; many of the fundamentals of software integration, e.g. web services, are already addressing this problem, but in a more domain-specific manner.
Just as coming up with a universal protocol for wireless communication was a challenge for a while due to differing interests, the semantic web is also facing conflicting interests.
But eventually the ground-up (or bottom-up) pressure to have a more formalized universal language will prevail.
Obviously, the evolution of this technology has not been like that of the internet itself, but rather slow and incremental, so some of the analogies are misleading.
Peyman,
Thanks for your input. I agree that semantic technologies are moving more slowly than the web did, but as they are being integrated into existing technology, they are more prevalent than you might think.
– Greg