TDWG 2016 has ended
Symposium 00
Monday, December 5
 

11:00 CST

A Standards Architecture for Integrating Information in Biodiversity Science
In this presentation, we will identify what we believe are the essential elements in a standards architecture for how we represent, share, and use biodiversity data.  Our shared vision should include enabling human users and machines to find all of the information and to traverse all of the data connections that a knowledgeable researcher can see in the biodiversity literature, collections and other resources. We should be able to start from any point in the biodiversity data graph and find the meaningful links to associated data objects. From specimen to taxon concept to taxon name to publication; from sequence to associated sequences to taxon concepts to species occurrences; etc.
This means that our data architecture needs to pay attention to the following matters (quite independently of the challenges of delivering the infrastructures that underpin their successful implementation):
  • Agreement on the set of core data classes within the biodiversity domain which we consider important enough to standardise (specimen, collection, taxon name, taxon concept, sequence, gene, publication, taxon trait, or whatever we all agree).
  • Agreement on the set of core relationships between instances of these classes that we consider important enough to standardise (specimen identifiedAs taxon concept, taxon name publishedIn publication, etc.).
  • Making sure that our data publishing mechanisms (cores, extensions, etc.) align accurately with these classes and support these relationships – this mainly means reworking the current confused interplay between cores, DwC classes, use of dcterms:type and use of basisOfRecord – every record should be clearly identified as an instance of a class (or a view of several linked class instances) and (for the core data classes) this should form the basis for inference and interpretation.
  • An ongoing process of defining for each core class what properties are mandatory (maybe only: id, class), highly desirable (depending on the class, things like: decimal coordinates, scientific name, identifiedAs, publishedIn), generally agreed (many other properties for which we have working vocabularies and do not want unnecessary multiplication, e.g.: waterbody, maximumDepthInMeters) or optional/bespoke (anything else that any data publisher wishes to include). In other words, allow any properties to be shared but ensure that the contours of the data are clear to standard tools.
  • A set of good examples of datasets mapped into this model, using various serialisations.
While accommodating plain text and URIs in the same fields enables data publishing from the widest possible range of sources, it leaves problems for data aggregators and users.
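The property tiers described above could be checked by a standard tool roughly as follows. This is a minimal sketch, not a proposed TDWG mechanism: the tier contents are taken loosely from the examples in the abstract, and the names `profile_record`, `is_http_uri`, and the sample record (including its `collectorMood` property) are hypothetical. It also flags which fields carry an HTTP URI rather than plain text, since mixing the two in one field is the problem noted for aggregators.

```python
from urllib.parse import urlparse

# Hypothetical tiers, loosely based on the examples given in the abstract.
MANDATORY = {"id", "class"}
HIGHLY_DESIRABLE = {
    "specimen": {"decimalLatitude", "decimalLongitude", "identifiedAs"},
    "taxon name": {"scientificName", "publishedIn"},
}
GENERALLY_AGREED = {"waterBody", "maximumDepthInMeters"}


def is_http_uri(value):
    """Return True if the value looks like an http(s) URI rather than plain text."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def profile_record(record):
    """Summarise how a record's properties fall into the assumed tiers."""
    present = set(record)
    desirable = HIGHLY_DESIRABLE.get(record.get("class"), set())
    known = MANDATORY | desirable | GENERALLY_AGREED
    return {
        "missing_mandatory": sorted(MANDATORY - present),
        "missing_desirable": sorted(desirable - present),
        "bespoke": sorted(present - known),
        "uri_valued": sorted(k for k, v in record.items() if is_http_uri(v)),
    }


# Illustrative record; every value here is made up.
example = {
    "id": "urn:example:specimen:1",
    "class": "specimen",
    "decimalLatitude": -12.5,
    "identifiedAs": "https://example.org/taxon-concept/42",
    "waterBody": "Example Lake",
    "collectorMood": "cheerful",
}
report = profile_record(example)
print(report)
```

Any property is accepted, including bespoke ones; the report only makes the "contours" of the record visible, which is the behaviour the abstract asks for.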


Monday December 5, 2016 11:00 - 11:15 CST
Auditorium CTEC

11:15 CST

Biodiversity Data Integration from an Aggregator’s Perspective
GBIF’s fundamental charge is to make all of the world’s biodiversity data (as much as people are willing to share) behave as though it were managed in a single consistent database, with linkages to any other similar resources in biological and earth sciences.  [Replace that with your preferred grand expression, but I hope one that highlights the contrast between consistent and inconsistent data.]  The ability to query and summarize data, with answers that are as complete and accurate as possible, is made much more difficult by the fact that people record and publish data so differently.
We will summarize GBIF data ingestion and integration operations, and highlight how standards, particularly vocabulary standards, could simplify the integration effort and vastly improve the quantity and quality of data that are represented consistently.
GBIF harvests more than 32,000 data resources from over 800 providers.  At the first level, Darwin Core, ABCD, and various extensions standardize the larger concepts, but at the value level, contents are still very heterogeneous.
The key concepts that GBIF standardizes include: Decimal-Latitude, Decimal-Longitude, Country, Taxon-Name (ranks of the taxonomic hierarchy?), Collecting-Date (and Time?).  In addition to Specimen, Observation, and Taxon-Name, what are the key classes that we need to standardize?
The process of standardizing content has been expensive, and fields that remain unstandardized impede the production of complete and accurate results.
What are the concepts that are most important to address with content vocabulary?
How else can vocabulary standards improve the quantity and quality of biodiversity data?
Will internationalization of vocabularies be required?
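To illustrate what value-level heterogeneity means in practice, here is a minimal sketch of mapping free-text values onto a controlled vocabulary. It is not GBIF's actual ingestion code: the lookup table, the function names, and the sample values are all assumptions made for illustration. The small multilingual table also hints at why internationalization of vocabularies comes up.

```python
# Hypothetical lookup from free-text country values to ISO 3166-1 alpha-2 codes.
COUNTRY_VOCAB = {
    "costa rica": "CR",
    "república de costa rica": "CR",
    "cr": "CR",
    "united states": "US",
    "usa": "US",
    "estados unidos": "US",
}


def normalize_country(raw):
    """Return a country code for a known spelling, or None if unrecognised."""
    if raw is None:
        return None
    return COUNTRY_VOCAB.get(raw.strip().lower())


def normalize_coordinate(raw, limit):
    """Parse a decimal coordinate; return None if unparseable or out of range.

    limit is 90 for latitude and 180 for longitude. A decimal comma is
    accepted because some publishers use it.
    """
    try:
        value = float(str(raw).strip().replace(",", "."))
    except ValueError:
        return None
    return value if -limit <= value <= limit else None


def normalize_record(record):
    """Return only the standardised view of the fields handled above."""
    return {
        "countryCode": normalize_country(record.get("country")),
        "decimalLatitude": normalize_coordinate(record.get("decimalLatitude"), 90),
        "decimalLongitude": normalize_coordinate(record.get("decimalLongitude"), 180),
    }


cleaned = normalize_record(
    {"country": " Costa Rica ", "decimalLatitude": "9,93", "decimalLongitude": "-84.08"}
)
print(cleaned)
```

Values that cannot be mapped come back as `None` rather than being guessed, which keeps the unstandardized remainder visible instead of hiding it.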




Monday December 5, 2016 11:15 - 11:30 CST
Auditorium CTEC

11:30 CST

A High-altitude View of TDWG Standards: Machine Processing, Graphs, and the Vocabulary Development Process
 
For the past ten years, TDWG has envisioned a system that would facilitate automated machine processing to enable aggregation of data about similar types of resources, linking of differing types of resources, and reasoning of entailed data that is not explicitly stated by providers.  Despite the attractiveness of this vision, progress towards achieving it has been very slow.  This presentation will take a very broad view of what we can expect to achieve through machine processing, the challenges TDWG has faced and will face in moving toward a system that enables machine processing, and how the goal of enabling machine processing must influence the vocabulary development process.  The presentation will lay out the issues in terms of a graph model, which is central to understanding the issues surrounding machine processing, and on which standards such as Resource Description Framework (RDF) are based.  However, the presentation will not dwell on the details of RDF.
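The graph model mentioned above can be sketched with plain subject-predicate-object tuples, without any RDF library. This is an illustrative assumption rather than anything shown in the presentation: the identifiers, the `hasName` predicate, and the class names are hypothetical. The sketch shows two of the goals named in the abstract: linking differing types of resources by following edges, and deriving a statement that no provider stated explicitly.

```python
# Each statement is a (subject, predicate, object) triple. All identifiers are made up.
TRIPLES = {
    ("specimen:1", "identifiedAs", "concept:A"),
    ("concept:A", "hasName", "name:X"),
    ("name:X", "publishedIn", "pub:1"),
    ("specimen:1", "type", "PreservedSpecimen"),
    ("PreservedSpecimen", "subClassOf", "Specimen"),
}


def infer_types(triples):
    """Apply (x type C) and (C subClassOf D) => (x type D) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == "subClassOf"}
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for child, parent in subclass_pairs:
                new_triple = (s, "type", parent)
                if child == o and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


def reachable(triples, start):
    """Return every node reachable from start by following subject-to-object links."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for s, _p, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


entailed = infer_types(TRIPLES)
linked = reachable(TRIPLES, "specimen:1")
print(("specimen:1", "type", "Specimen") in entailed)
print("pub:1" in linked)
```

The inference only works because the vocabulary declares the subclass relationship, which is one way the goal of machine processing feeds back into how vocabularies are developed.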


Monday December 5, 2016 11:30 - 11:55 CST
Auditorium CTEC

11:55 CST

GitHub for TDWG standards and Interest Groups
GitHub is an online platform (https://github.com) to manage source code. It offers the distributed version control and source code management of git, as well as a number of features that greatly facilitate source code collaboration, especially for open source projects. It has become the largest host of source code in the world and supports projects ranging from traditional software management to scientific research and open data. In 2014 TDWG adopted GitHub to host, version and collaborate around its biodiversity information standards (https://github.com/tdwg) and is increasingly using it for executive and interest group activities. In this talk I will explain 1) how to contribute to TDWG standards and activities using GitHub, covering features such as version control, editing documents, and submitting pull requests and issues, as well as 2) how to manage a GitHub repository, including features such as the wiki, issue management, inviting collaborators and creating releases. It should provide you with enough knowledge to feel comfortable diving into GitHub, be it for TDWG or otherwise.


Monday December 5, 2016 11:55 - 12:10 CST
Auditorium CTEC
 

