Dataset Citation Metadata
LinkML schema for dataset citation metadata, as used by KBase.
About the Dataset Citation Metadata Schema
The Dataset Citation Metadata (DCM) schema is designed to capture accurate credit information for scientific datasets whilst maintaining compatibility with DOI minting authorities such as DataCite, Crossref, and the US Department of Energy's Office of Scientific and Technical Information.
The schema is based heavily on the DataCite Metadata Schema and Commonmeta, with some alterations and additions to allow more accurate capture of dataset-relevant information.
It aims to satisfy the data citation recommendations published in "Ten simple rules for getting and giving credit for data" [1] and "Data Citation Guidelines for Earth Science Data, Version 2" [2].
Required Elements
The core elements required for a data citation are as follows:
- contributor(s): people and/or organisations involved in generating the dataset.
- title: the formal name of the dataset.
- resolvable permanent identifier: a unique, resolvable identifier that can be used to access the data. We recommend using CURIEs (Compact URIs) with prefixes registered at Bioregistry, which allows each CURIE to be resolved into the corresponding URI.
- version: the version of the dataset. Not all projects version their datasets; where no version is available, the publication date of the dataset may be used instead.
- access date: when the dataset was accessed; this is important as datasets may change over time, and not all resources version their datasets.
- repository: the organisation that hosts, archives, publishes, distributes, produces, or otherwise makes the dataset available.
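The required elements above map naturally onto a simple record type. The Python sketch below is illustrative only: the class `DatasetCitation`, its field names, and the local `PREFIX_MAP` are hypothetical and are not the DCM schema's own classes or slots (the schema itself is defined in LinkML). It shows two behaviours described above: expanding a CURIE into a URI, and falling back to the publication date when a dataset has no version. In practice the prefix-to-URI patterns would be taken from Bioregistry rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical, hard-coded prefix map; real URI patterns would come from Bioregistry.
PREFIX_MAP = {
    "doi": "https://doi.org/",
}


@dataclass
class DatasetCitation:
    """Hypothetical container for the core citation elements (not the DCM schema's classes)."""

    contributors: list[str]
    title: str
    identifier: str  # a CURIE such as "doi:10.5060/D4MW2F23"
    repository: str
    access_date: date
    version: Optional[str] = None
    publication_date: Optional[date] = None

    def resolved_uri(self) -> str:
        """Expand the CURIE into a URI using PREFIX_MAP."""
        prefix, sep, local_id = self.identifier.partition(":")
        if not sep or prefix.lower() not in PREFIX_MAP:
            raise ValueError(f"unresolvable CURIE: {self.identifier!r}")
        return PREFIX_MAP[prefix.lower()] + local_id

    def effective_version(self) -> str:
        """Return the version if given, otherwise fall back to the publication date."""
        if self.version:
            return self.version
        if self.publication_date:
            return self.publication_date.isoformat()
        raise ValueError("dataset has neither a version nor a publication date")

    def missing_elements(self) -> list[str]:
        """List the required elements that are empty or absent."""
        missing = []
        if not self.contributors:
            missing.append("contributors")
        if not self.title:
            missing.append("title")
        if not self.identifier:
            missing.append("identifier")
        if not self.repository:
            missing.append("repository")
        if not (self.version or self.publication_date):
            missing.append("version")
        return missing


record = DatasetCitation(
    contributors=["Cline, D.", "R. Armstrong"],
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    identifier="doi:10.5060/D4MW2F23",
    repository="NASA National Snow and Ice Data Center Distributed Active Archive Center",
    access_date=date(2008, 5, 14),
    version="2.0",
)
print(record.resolved_uri(), record.effective_version(), record.missing_elements())
```

Keeping the fallback in `effective_version` rather than overwriting `version` preserves the distinction between a dataset that was genuinely versioned and one identified only by its publication date.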
Example Dataset Citation
Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. A. Parsons and M. J. Brodzik. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5060/D4MW2F23. Accessed 2008-05-14.
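The example above follows a fixed pattern: creators, year, title and version, editors, repository, resolvable identifier, and access date. As a sketch, assuming that this ordering and punctuation (inferred from this single example) are the target layout, a formatter could look like the following; the helper names `join_names`, `sentence`, and `format_citation` are hypothetical.

```python
from datetime import date
from typing import Optional


def join_names(names: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C' (with a serial comma)."""
    if not names:
        raise ValueError("at least one name is required")
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} and {names[1]}"
    return ", ".join(names[:-1]) + ", and " + names[-1]


def sentence(text: str) -> str:
    """Terminate a citation component with a full stop, avoiding doubled stops."""
    return text if text.endswith(".") else text + "."


def format_citation(
    creators: list[str],
    year: int,
    title: str,
    repository: str,
    uri: str,
    accessed: date,
    version: Optional[str] = None,
    editors: Optional[list[str]] = None,
) -> str:
    """Assemble a citation string in the layout of the example above."""
    title_part = f"{title} ver. {version}" if version else title
    parts = [sentence(join_names(creators)), sentence(str(year)), sentence(title_part)]
    if editors:
        parts.append(sentence("Edited by " + join_names(editors)))
    parts.append(sentence(repository))
    parts.append(sentence(uri))
    parts.append(sentence("Accessed " + accessed.isoformat()))
    return " ".join(parts)


citation = format_citation(
    creators=["Cline, D.", "R. Armstrong", "R. Davis", "K. Elder", "G. Liston"],
    year=2003,
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    version="2.0",
    editors=["M. A. Parsons", "M. J. Brodzik"],
    repository="NASA National Snow and Ice Data Center Distributed Active Archive Center",
    uri="https://doi.org/10.5060/D4MW2F23",
    accessed=date(2008, 5, 14),
)
print(citation)
```

Called with the values from the example, this reproduces the citation string exactly. A fuller implementation would also need to handle other name formats, contributor types, and missing versions (for example, substituting the publication date).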
Other Elements
The DCM schema can also capture other dataset-related information, such as funding sources and the roles of individual contributors. These elements are not required for citation but give a more complete picture of the context in which the dataset was produced.
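Funding and contributor roles could be modelled as optional structures kept alongside the required citation fields. The sketch below is hypothetical: the class and field names are not taken from the DCM schema, and `Role` lists only a few role names from the Contributor Role Taxonomy (CRediT) for illustration. Storing ORCID and Research Organization Registry identifiers as CURIEs is an assumption that follows the identifier recommendation above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(Enum):
    """Hypothetical subset of Contributor Role Taxonomy (CRediT) role names."""

    CONCEPTUALIZATION = "Conceptualization"
    DATA_CURATION = "Data curation"
    FUNDING_ACQUISITION = "Funding acquisition"
    INVESTIGATION = "Investigation"
    SOFTWARE = "Software"


@dataclass
class Contributor:
    name: str
    orcid: Optional[str] = None  # a CURIE with the "orcid" prefix, if known
    roles: list[Role] = field(default_factory=list)


@dataclass
class Funding:
    funder_name: str
    funder_ror: Optional[str] = None  # a CURIE with the "ror" prefix, if known
    award_id: Optional[str] = None


@dataclass
class DatasetContext:
    """Optional, non-citation context attached to a dataset record."""

    contributors: list[Contributor] = field(default_factory=list)
    funding: list[Funding] = field(default_factory=list)

    def names_with_role(self, role: Role) -> list[str]:
        """Return the names of contributors credited with the given role."""
        return [c.name for c in self.contributors if role in c.roles]
```

Keeping this context in a separate structure makes it explicit that none of it is needed to produce a citation, while still letting a record credit, for example, who curated the data or secured its funding.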
Sources Used in Schema Design
Data producers, consumers, providers, and managers, as well as metadata standards and vocabularies, whose data citation practices informed the development of this schema include (but are not limited to):
- Commonmeta
- Contributor Role Taxonomy
- Crossref
- DataCite Metadata Schema
- Environmental Molecular Sciences Laboratory
- Environmental System Science Data Infrastructure for a Virtual Ecosystem
- Joint Genome Institute
- KBase
- National Center for Biotechnology Information
- National Microbiome Data Collaborative
- Office of Scientific and Technical Information
- ORCID
- Research Organization Registry
- Schema.org
- SPDX
References
[1] Wood-Charlson EM, Crockett Z, Erdmann C, Arkin AP, Robinson CB (2022) Ten simple rules for getting and giving credit for data. PLoS Comput Biol 18(9): e1010476. https://doi.org/10.1371/journal.pcbi.1010476
[2] ESIP Data Preservation and Stewardship Committee (2019) Data Citation Guidelines for Earth Science Data, Version 2. ESIP. Online resource. https://doi.org/10.6084/m9.figshare.8441816.v1