Expansion of Open Data Policy Encourages Reproducibility and Data Reuse
Data availability and sharing are critical to ASM’s mission to advance the microbial sciences. On June 12-13, 2019, ASM attended the Interagency Working Group on Biological Data Sharing (IWGBDS) Workshop held at the University of Maryland’s Institute for Bioscience and Biotechnology Research to learn how other organizations are approaching data sharing. IWGBDS is developing a road map for the robust sharing and reuse of biological data across federal agencies, which includes everything from wet lab assays, to behavioral assessments, to natural history collections. The working group convened the workshop to identify challenges and potential solutions to sharing these varied data. Workshop presenters offered perspectives from a variety of contexts and fields, but continually returned to the concept of “FAIR,” or data that is findable, accessible, interoperable and reusable:
- Findable: data and metadata have unique and persistent identifiers and are indexed in a searchable resource.
- Accessible: data and metadata can be retrieved by a standardized communications protocol that is free and can be universally implemented.
- Interoperable: data and metadata use standard, well-defined vocabulary and measurements.
- Reusable: data has all relevant metadata, especially pertaining to the provenance of samples, and data and metadata have a clear and accessible usage license.
These attributes for open data were put forth in a 2016 article in Scientific Data and since then have been taken up as guiding principles for a variety of groups working on data sharing.
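As a rough illustration of how the four attributes above might translate into a concrete check, the Python sketch below tests a metadata record for the fields each principle calls for. The `DatasetRecord` class, its field names, and the `fair_gaps` function are hypothetical and invented for this example; they do not come from the FAIR article, from ASM policy, or from any repository's schema, and a non-empty field is only a crude stand-in for actually satisfying a principle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for a deposited dataset (illustrative only)."""
    identifier: str = ""       # persistent ID, such as a DOI or accession number
    indexed_in: str = ""       # searchable resource that indexes the record
    access_protocol: str = ""  # standardized retrieval protocol, such as "https"
    vocabulary: str = ""       # named standard vocabulary or ontology used
    provenance: str = ""       # description of where the samples came from
    license: str = ""          # usage license for the data and metadata


# Assumed mapping from each FAIR attribute to the fields that support it.
FAIR_FIELDS = {
    "findable": ["identifier", "indexed_in"],
    "accessible": ["access_protocol"],
    "interoperable": ["vocabulary"],
    "reusable": ["provenance", "license"],
}


def fair_gaps(record: DatasetRecord) -> dict[str, list[str]]:
    """Return, for each FAIR attribute, the supporting fields left empty."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        missing = [name for name in names if not getattr(record, name).strip()]
        if missing:
            gaps[principle] = missing
    return gaps


# Example record with placeholder values (not a real DOI or repository).
record = DatasetRecord(
    identifier="doi:10.0000/example",
    indexed_in="example repository",
    access_protocol="https",
    license="CC0",
)
print(fair_gaps(record))
# prints {'interoperable': ['vocabulary'], 'reusable': ['provenance']}
```

A check like this only flags missing fields; judging whether a vocabulary is truly standard or a license truly clear still takes a human reviewer.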
On October 3, 2019, ASM is expanding its current open data policy from open access journals to all journals published by the society. To publish in any ASM journal, authors will need to make their data publicly available (except in rare circumstances), preferably by depositing it in publicly accessible, curated and sustainable data repositories. Many repositories are available for specific data types:
- Code Ocean for code.
- Dryad for all general, underlying data.
- figshare for figures.
- GenBank for nucleotide sequences.
- Gene Expression Omnibus (GEO) for array- and sequence-based expression data.
- GitHub for software.
- MassIVE for proteomics.
- protocols.io for protocols.
The expansion of ASM’s open data policy is not without challenges. Authors may have to do more work up front to ensure that the data underlying their papers are available with properly annotated metadata. However, the open data policy benefits both authors and readers in the long run. Data will receive persistent, unique identifiers [such as Digital Object Identifiers (DOIs) or accession numbers] when they are deposited in these repositories, making them findable and (importantly) citable. Readers will have access to the original underlying data described in a paper, enabling the reuse of those data either for reproducibility purposes or for entirely new analyses. In return, the original data generators (i.e., the authors) will receive credit for their work in the form of data citations. Formal data citations promote reproducibility and help identify how data are reused. In addition, as government agencies start requiring grantees to comply with FAIR data policies over the next few years, as was hinted at in the workshop, ASM authors will be better prepared to handle these new requirements.
Scientific advances are predicated upon the principle that experiments and conclusions drawn from published information can be repeated and further advanced by others. Taking this step toward open data ensures that ASM does not perpetuate historic barriers to progress in the microbial sciences.