Monday, February 07, 2011

Curated data

It is difficult to predict which data will have long-term value, so it is often safest to archive everything. With data storage costs approaching zero, I think we can expect high-value data to last forever, barring a nuclear war or the collapse of society.

Curated data has a higher value than an archive of "everything." I think that the search engine Blekko is interesting and useful because of what it does not have: human-powered curation yields fewer results but very little spam. The Guardian's curated, structured data stores are much more valuable than the original raw data (from government sources, etc.). I can imagine The Guardian's curated data becoming a permanent part of our history, much like the ancient stone tablets we see in museums.

I have long planned to provide curated news and technology data with semantic markup, either on my ancient domain or on a new placeholder, but my consulting business seldom leaves me free time. Hint: I would like to find a few partners who are into statistical natural language processing, or who are general data geeks, to help me with this. I don't know whether it would end up being a viable business or just a public service portal.