To understand and process data, we must understand the context in which it was created and used. We looked at document-oriented data storage in the last chapter. To a large degree, documents are an easier source of knowledge to use because they carry some of their own context, especially if they follow a specific data schema. In general, data items are smaller than documents, and some external context is required to make sense of different data sources. Semantic web and linked data technologies provide a way to specify that context: they describe what is in data sources and how data in different sources are associated with each other.
What I like best about semantic web technologies is that they make it possible to exploit data originally developed by small teams for specific projects. I view this as a bottom-up process, with no need to design complex schemas or plan for future uses of the data in advance. Projects can start by solving a specific problem, and the usefulness of their data can then grow through later reuse. Good software developers learn early in their careers that design and implementation need to be done incrementally. As we develop systems, we gain a better understanding of how they will be used and how to build them. I believe that this agile software development philosophy extends naturally to data science. Semantic web and linked data technologies support agile development by allowing us to learn and revise our plans as we build data sets and the systems that use them.