Thursday, December 8 • 10:00 - 10:15
Names and identifiers in the CyVerse cyberinfrastructure

CyVerse, formerly known as the iPlant Collaborative, is a U.S. National Science Foundation-funded initiative “to design, deploy, and expand a national cyberinfrastructure for life sciences research, and to train scientists in its use” (http://www.cyverse.org/about). As part of this mission, CyVerse currently houses over 2 petabytes of data, most of which (we assume) concerns organisms. CyVerse recently launched the Data Commons, with the goal of providing support for data management throughout the data lifecycle. As part of the Data Commons, users can now publish data to the Data Commons Repository (DCR) with permanent identifiers such as Digital Object Identifiers (DOIs) or Archival Resource Keys (ARKs), or publish to external repositories such as the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA). Our goal in providing these data publication services is not just data preservation, but primarily data discovery and reuse. Therefore, being able to find out what organism or taxon a dataset is about, and being able to discover data for an organism or taxon, are crucial use cases for the DCR. For this, we need a good way to identify organisms and taxa.
CyVerse is not a standards organization, so for the Data Commons we rely on community-supported standards and collaborate with the communities that maintain them. For SRA, this means that users should supply an NCBI taxon identifier as part of their BioSample submission. For the DCR, which uses the DataCite metadata profile, users can supply a taxon name in the “Subject” field, or, if they are motivated, supply a taxon name or identifier as additional metadata. When users supply names, we currently have no way to link those names to stable identifiers or concepts. We offer the Taxonomic Name Resolution Service (TNRS; http://tnrs.iplantcollaborative.org/) as a means of standardizing names for plants, and we are open to other collaborations for name resolution. CyVerse is actively seeking input from its user communities (including TDWG) on standards and practices (in names, identifiers, or other areas) for use within the Data Commons and related efforts.
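To make the name-to-identifier gap concrete, the following is a minimal sketch of how a user-supplied taxon name in the “Subject” field might be paired with a stable identifier. It is an illustration only, not a description of how the DCR works: the lookup table and the function name datacite_subject are hypothetical, and the keys subjectScheme and valueURI are borrowed from the attribute names of the DataCite subject element. A real workflow would presumably query a name resolution service rather than a hard-coded table.

```python
# Hypothetical lookup table standing in for a name resolution service.
# The values are NCBI Taxonomy identifiers.
NCBI_TAXON_IDS = {
    "Arabidopsis thaliana": 3702,
    "Zea mays": 4577,
}

NCBI_TAXON_URL = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={}"


def datacite_subject(taxon_name):
    """Build a DataCite-style subject entry for a taxon name.

    If the name is in the lookup table, the entry also carries the
    scheme name and a URI for the matching NCBI taxon identifier.
    Otherwise only the free-text name is recorded.
    """
    subject = {"subject": taxon_name}
    taxon_id = NCBI_TAXON_IDS.get(taxon_name)
    if taxon_id is not None:
        subject["subjectScheme"] = "NCBI Taxonomy"
        subject["valueURI"] = NCBI_TAXON_URL.format(taxon_id)
    return subject


if __name__ == "__main__":
    print(datacite_subject("Zea mays"))
    print(datacite_subject("Unresolved name"))
```

A name that cannot be resolved is kept as plain text, which mirrors the current situation described above: the name is recorded, but nothing links it to a stable identifier or concept.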


Thursday December 8, 2016 10:00 - 10:15 CST
Auditorium CTEC