TDWG 2016 has ended
Monday, December 5 • 14:45 - 15:00
Semantic Annotation for Tabular Data

Tabular data, expressed as spreadsheets and tab- or comma-delimited files, are a convenient and common means of storing and transmitting biodiversity data. However, tabular data are all too often “dark” data, lacking context and consistency, with little clarity about exactly what the data refer to: for example, whether the set of fields in a “row” describes a curated specimen, a living individual that is tracked on an ongoing basis, or an observation. Common difficulties in working with dark data include values with no units, identifiers that are missing or only local in scope, and especially a lack of context for the relationships between data values in different columns. These issues are a real impediment to sharing and integrating data from distributed sources. While this topic has received a lot of attention in recent years, implementations that offer usable solutions for helping users improve semantic clarity and create instance identifiers have lagged. This talk will explore a method for validating and classifying instance data based on project management rules expressed in an XML (Extensible Markup Language) configuration file, a method intended to be useful for biologists and data managers. We will begin with the necessary steps of project configuration and data validation, then follow a sample input file from the National Phenology Network (NPN) as it is loaded into the Biocode Field Information Management System (http://biscicol.org/), and finish with a look at the resulting triples and a discussion of implications and future directions.
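
The general idea of rule-driven validation followed by triple generation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the XML element and attribute names, the `urn:example:` identifiers, the column names, and the sample rows are hypothetical and do not reproduce the actual Biocode FIMS configuration schema or any NPN data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical project configuration. The element and attribute names are
# invented for this sketch; they are NOT the Biocode FIMS schema.
CONFIG_XML = """
<project entity="observation" idColumn="record_id" base="urn:example:obs/">
  <column name="record_id" required="true"/>
  <column name="species" required="true" predicate="urn:example:term/species"/>
  <column name="day_of_year" required="true" type="integer"
          predicate="urn:example:term/dayOfYear"/>
</project>
"""

# Made-up sample input: the second and third data rows each break one rule.
SAMPLE_CSV = """record_id,species,day_of_year
1,Acer rubrum,120
2,,131
3,Quercus alba,spring
"""


def load_rules(xml_text):
    """Return (project attributes, list of per-column rule dicts)."""
    root = ET.fromstring(xml_text.strip())
    return dict(root.attrib), [dict(col.attrib) for col in root.findall("column")]


def validate(rows, rules):
    """Check every row against every column rule; collect (line, column, message)."""
    errors = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for rule in rules:
            value = (row.get(rule["name"]) or "").strip()
            if rule.get("required") == "true" and not value:
                errors.append((line_no, rule["name"], "missing required value"))
            elif value and rule.get("type") == "integer" and not value.isdigit():
                errors.append((line_no, rule["name"], "not an integer"))
    return errors


def to_triples(rows, project, rules):
    """Turn each row into subject-predicate-object tuples with a global subject id."""
    triples = []
    for row in rows:
        subject = project["base"] + row[project["idColumn"]]
        # Classify the row as an instance of the configured entity type.
        triples.append((subject, "rdf:type", project["entity"]))
        for rule in rules:
            if "predicate" in rule and row[rule["name"]]:
                triples.append((subject, rule["predicate"], row[rule["name"]]))
    return triples


project, rules = load_rules(CONFIG_XML)
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
errors = validate(rows, rules)

# Only rows that passed validation are converted to triples.
bad_lines = {line_no for line_no, _, _ in errors}
valid_rows = [r for n, r in enumerate(rows, start=2) if n not in bad_lines]
triples = to_triples(valid_rows, project, rules)

for err in errors:
    print("invalid:", err)
for triple in triples:
    print(triple)
```

In this sketch the configuration does two jobs: it declares what each row is (the `entity` attribute) and it maps columns to predicates, which is what lets a locally scoped row identifier become a globally scoped subject in the output.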

Monday December 5, 2016 14:45 - 15:00 CST
Auditorium CTEC
