Wednesday, January 11, 2012

Web 3.0 and the Semantic Web, a slight return

After talking with a friend and a friend of his about the Semantic Web and healthcare yesterday, I re-watched a great video on Web 3.0 by Kate Ray that I bookmarked and blogged about a couple of years ago. I like this video because it frames the problems that the Semantic Web is trying to solve. My last published book (for Apress) had Web 3.0 in the title, a term that did not really catch on :-)

At least a little bit of my enthusiasm for Semantic Web technologies has diminished over the last ten years because of problems that I have had on customer projects trying to collect linked data from disparate sources and merge it into something useful. There are (apparently) no silver bullets, and any data collection and exploitation activity involves a lot of difficult work.
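To make the merging problem concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two sources, the `a:` and `b:` prefixes, the identifiers, and the hand-written mapping tables are made up for illustration and are not taken from any customer project or real vocabulary.

```python
# Hypothetical triples (subject, predicate, object) from two made-up sources.
source_a = [
    ("a:patient/17", "a:hasDiagnosis", "Type 2 diabetes"),
    ("a:patient/17", "a:name", "J. Smith"),
]
source_b = [
    ("b:person/9931", "b:diagnosedWith", "diabetes mellitus, type II"),
    ("b:person/9931", "b:fullName", "John Smith"),
]

# Hand-written mappings (assumed): this is the manual curation work.
same_as = {"b:person/9931": "a:patient/17"}
predicate_map = {"b:diagnosedWith": "a:hasDiagnosis", "b:fullName": "a:name"}


def normalize(triple):
    """Rewrite a triple's subject and predicate using the mapping tables."""
    s, p, o = triple
    return (same_as.get(s, s), predicate_map.get(p, p), o)


def merge(*sources):
    """Union all sources after normalizing identifiers and predicates."""
    merged = set()
    for source in sources:
        merged.update(normalize(t) for t in source)
    return merged


merged = merge(source_a, source_b)
subjects = {s for s, _, _ in merged}
diagnoses = sorted(o for _, p, o in merged if p == "a:hasDiagnosis")
```

After the merge there is a single subject, but `diagnoses` still holds two differently worded values for what is presumably the same condition. Aligning identifiers and predicates is only the first step; reconciling the values themselves is where much of the difficult work remains.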

I would not be surprised if this problem of merging different data sources ends up being solved not by ontologies and webs of linked data sites, but rather by vendors curating data in narrow domains and selling interfaces to this curated data.

In a world of too much information, curation can have very high value. That value, and the market price for curation services, will determine how many resources are invested in combinations of automated and manual curation of information.

4 comments:

SeeSharpWriter said...

I must say that I don't believe in Semantic Web anymore for these reasons:

- Unless the application can conclude implicit facts by reasoning and that provides value to the end user, there is no need to make an application semantic in the first place. Simply put, without reasoning, semantics is just an expensive overhead.

- Reasoners need to be fast, responsive, reliable, correct and easy to use for common people. So far I have not seen such a reasoner.

- It is difficult to explain to people the benefit they would get from introducing semantics. Honestly, can you convince a client to pay you extra in exchange for a semantically charged app?

- People are concerned about privacy. They DON'T want their data to be merged from different sources in order to get "personalization" (and targeted marketing :S). Instead, they would like to ask questions or input commands as free text and get meaningful answers returned. (Something like what Siri does on the iPhone, but even that is based on statistics and NLP, not the Semantic Web.)

- People are malicious sometimes. What if someone starts putting crap on the web in order to confuse your data engine?

I hope this whole Semantic thing works some day, but I really doubt it.

Mark Watson, author and consultant said...

Hello SeeSharpWriter, thanks for your comments - the last issue you raised (provenance) is likely the toughest problem to solve: how to avoid using maliciously published incorrect information.

SeeSharpWriter said...

I guess there would be an authority service for semantic search, similar to how Google is an authority for text search and bans maliciously published data.

But can such a powerful semantic search really be built? Wouldn't Google have built it by now?

Mark, what is the most useful semantic application you have ever built?

Mark Watson, author and consultant said...

Hello SeeSharpWriter, having set authorities for semantic data is difficult because there won't be a single trusted authority. Can a powerful semantic search be built? I would say yes, driven by improvements in artificial intelligence and Internet infrastructure (as the energy and financial costs of computation are reduced, more things are possible and affordable). The most useful semantic application I have built: I don't comment on details of customer projects, but I can answer this question because I have never built what I would call a complete semantic application. Rather, I use bits of the technology for organizing data in graph databases, write tools to automatically collect useful data (from the Facebook graph, Freebase, DBpedia), and I do a lot of work in text analytics, which I consider a requirement for advances in the semantic web (unless we want to mark up information manually, which does not scale).