Monday, September 15, 2008

Distributed robust system for provenance and trust in Semantic Web Applications and Tim Berners-Lee's new World Wide Web Foundation

With some reluctance, I am going to toss out what I think is a great business idea that is too large and resource intensive for me to pursue myself: develop the infrastructure and business models for a network graph (not a hierarchy) of "trust providers", similar to SSL certificate issuers like Thawte, but for semantic web data.

First, I want to describe the problem to be solved: assuming the existence of RDF/RDFS/OWL data on the web, how do you know what is correct and what is faked for whatever nefarious reasons? What is the provenance of the data? Even human readers have a difficult time separating out real information from rumors, errors, and outright lies on the web.

Proposed solution: organizations "sign" data with a certificate, either for a fee or for some other motivation. Using current technology, RDF triples would be reified and annotated with one or more "trust tokens" (also implemented as RDF) from known signers who vouch for the provenance and accuracy of the data. For now, this rating would have to be performed by human analysts, but it could hopefully be done quickly and not too expensively with something like Amazon's Mechanical Turk system. I don't see this trust measurement as a Boolean trust or no-trust value; rather, it would be a value in a numeric range. Further, known signers can rate other signers, so signers would have a trust score as well. The accuracy and provenance of data could thus be assigned a trust score based on the trust ratings given by one or more signers and the trust scores of the signers themselves. The problem is to make this process of assignment a small fraction of the cost of producing RDF/RDFS/OWL knowledge sources while adding significant extra value.
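To make the scoring idea concrete, here is a minimal sketch in Python. It is only one possible reading of the proposal, not a worked-out design: the names (Rating, statement_trust), the 0.0 to 1.0 score range, and the choice of a weighted average are all my assumptions for illustration. Each signer's rating of a statement is weighted by that signer's own trust score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A statement is a plain (subject, predicate, object) tuple standing in
# for an RDF triple; a real system would use reified RDF instead.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Rating:
    """One signer's "trust token" for one statement (hypothetical shape)."""
    signer: str
    statement: Triple
    score: float  # assumed range: 0.0 (no trust) to 1.0 (full trust)


def statement_trust(ratings: List[Rating],
                    signer_trust: Dict[str, float]) -> Optional[float]:
    """Average the ratings, weighting each by its signer's trust score.

    Unknown signers get weight 0.0. Returns None when no rating carries
    any weight, since there is then nothing to base a score on.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rating in ratings:
        weight = signer_trust.get(rating.signer, 0.0)
        weighted_sum += weight * rating.score
        total_weight += weight
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


# Example: two signers disagree about the same statement.
fact: Triple = ("ex:Paris", "ex:capitalOf", "ex:France")
ratings = [
    Rating("signer_a", fact, 1.0),
    Rating("signer_b", fact, 0.0),
]
signer_scores = {"signer_a": 0.75, "signer_b": 0.25}
combined = statement_trust(ratings, signer_scores)
```

In this sketch the better-trusted signer dominates the result. The signer scores themselves could be derived the same way from the ratings that signers give each other, though how to do that robustly is exactly the hard part of the problem.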

There is a lot of literature; try searching for "web of trust semantic web" and "provenance semantic web". When I read about Tim Berners-Lee's new World Wide Web Foundation this morning I started to hope that they might develop some open and free infrastructure software to support trust annotation of data. The high economic cost of quality trust-rated RDF/RDFS/OWL knowledge sources is definitely a problem, but it is difficult to even imagine the possible range of financial and social benefits. Having standard open source software to manage trust would help reduce costs for providing trust and provenance data through a network of cooperating trust providers.
