TDWG 2016 has ended
Wednesday, December 7 • 16:45 - 17:00
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Access Biodiversity Data

Managing research data has always been challenging, but the recent availability of multi-gigabyte and larger datasets from major aggregators has created new problems, especially for individual researchers and those at small institutions. Global Unified Open Data Access (GUODA), a recent collaboration between Integrated Digitized Biocollections (iDigBio) and the Encyclopedia of Life (EOL), aims to bring new techniques and resources for working with large biodiversity datasets to the widest possible community of researchers.
GUODA is both a computing infrastructure built and hosted by iDigBio and a community for collaboration in using that infrastructure. Our collaboration focuses on developing three things: tools and workflows that use Apache Spark for highly parallelized data analysis, a repository of pre-formatted, ready-to-use biodiversity datasets, and a resource management system capable of exposing these resources to software developers and data analysts across the full range of skill levels.
This presentation will outline the software and hardware used in GUODA, describe the process and formats for transforming common biodiversity data from sources such as the Global Biodiversity Information Facility (GBIF), iDigBio, and the Biodiversity Heritage Library (BHL) into computable data structures, and demonstrate the Jupyter Notebook interface to GUODA, which is designed for researchers to interact with directly.
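
As a rough illustration of what "transforming biodiversity data into computable data structures" can mean in practice, the sketch below parses a few tab-delimited occurrence records into dictionaries and counts them by a field. It is not GUODA's actual pipeline, which the abstract says is built on Apache Spark; it uses only the Python standard library so that it can be pasted into a Jupyter Notebook cell as-is. The sample rows, the function names `load_records` and `count_by`, and the choice of Darwin Core style column names are all assumptions made for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: a few tab-delimited occurrence records.
# Column names follow Darwin Core terms; the values are invented for illustration.
SAMPLE_TSV = (
    "scientificName\tcountry\tyear\n"
    "Quercus alba\tUnited States\t1998\n"
    "Quercus alba\tCanada\t2004\n"
    "Puma concolor\tCosta Rica\t2011\n"
)


def load_records(text):
    """Parse tab-delimited text into a list of dicts, one per record."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def count_by(records, field):
    """Count records per distinct value of the given field."""
    return Counter(record[field] for record in records)


records = load_records(SAMPLE_TSV)
name_counts = count_by(records, "scientificName")
print(name_counts.most_common())
```

A group-and-count like `count_by` is the kind of operation a parallel framework such as Spark distributes across many machines once the input grows beyond what a single process can hold.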


Wednesday December 7, 2016 16:45 - 17:00 CST
Auditorium CTEC