
cdm_schema

Schema for KBase CDM

URI: http://kbase.github.io/cdm-schema/linkml/cdm_schema

Name: cdm_schema

Classes

Class Description
Any Used as a range for slots that have more than one possible type.
AttributeMixin The attribute in an attribute-value pair. One of attribute_cv_id, attribute_cv_label, and attribute_string is required.
AttributeValue The value of an attribute for an entity. This object can hold both the un-normalized atomic value and the structured value.
        ControlledTermValue A quality, described using a text string from a controlled vocabulary or enum.
        ControlledVocabularyTermValue A quality, described using a term from an ontology or schema with a stable persistent identifier.
        DateTimeValue A date or date and time value.
        Geolocation A normalized value for a location on the earth's surface. Should be expressed in decimal degrees.
        QuantityRangeValue A numerical range, e.g. 5-7 cm.
        QuantityValue A simple quantity, e.g. 2 cm.
        TextValue A basic string value.
EntityAttributeValue Class comprising all possible entity-attribute-value slots.
EntityMixin A generic class for capturing attribute-value information about an entity in a structured form.
Schema The root class for the CDM schema.
Table Abstract class representing a table in the CDM schema.
        Association An association between an object--typically an entity such as a protein or a feature--and a classification system or ontology, such as the Gene Ontology, the Enzyme Classification, or TIGRFAMS domains.
        Cluster Represents an individual execution of a clustering protocol. See the ClusterMember class for clustering results.
        ClusterMember Relationship representing membership of a cluster. An optional score can be assigned to each cluster member.
        Contig A contig (derived from the word "contiguous") is a set of DNA segments or sequences that overlap in a way that provides a contiguous representation of a genomic region. A contig should not contain any gaps.
        ContigCollection A set of individual, overlapping contigs that represent the complete sequenced genome of an organism.
        Contributor Represents a contributor to a resource.

Contributors must have a 'contributor_type', either 'Person' or 'Organization', and one of the 'name' fields: either 'given_name' and 'family_name' (for a person), or 'name' (for an organization or a person).

The 'contributor_role' field takes values from the DataCite and CRediT contributor roles vocabularies. For more information on these resources and choosing appropriate roles, please see the following links:

DataCite contributor roles: https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties#7a-contributortype

CRediT contributor role taxonomy: https://credit.niso.org
        ContributorXRoleXProject Describes the participation of a contributor in a project; ideally the contributor_role field will capture how the contributor was involved.
        DataSource The source for a resource, dataset, association, etc.
        DataSourceNew The source dataset from which data within the CDM was extracted. This might be an API query; a set of files downloaded from a website or uploaded by a user; a database dump; etc. A given data source should have either version information (e.g. a release number, like those used by UniProt or RefSeq) or an access date to allow the original raw data dump to be recapitulated.
        EncodedFeature An entity generated from a feature, such as a transcript.
        EntailedEdge A relation graph edge that is inferred. This table contains links between the nodes that appear as subjects in the Statement table. The graph contains all possible links between nodes and is created using a reasoner such as relation-graph to materialise inferred links.
        Entity An entity in the CDM.
        Event Something that happened.
        Experiment A discrete scientific procedure undertaken to make a discovery, test a hypothesis, or demonstrate a known fact. The protocol_id links to the workflow followed to perform the experiment.
        ExperimentCondition A measurement, reagent, or description of one aspect of the environment used in an experiment; examples include temperature; aerobic or anaerobic conditions; presence of a chemical in the environment. Used to describe the context, conditions, or set-up of an experiment.
        ExperimentConditionSet A unique combination of experimental conditions and entities that are used in a specific experiment. One experiment condition set is expected to comprise multiple ExperimentConditions.
        Feature A feature localized to an interval along a contig.
        FundingReference Represents a funding source for a resource, including the funding body and the grant awarded.

One (or more) of the fields grant_id, grant_url, or funder.organization_name is required; others are optional.

Recommended resources for organization identifiers include:
- Research Organization Registry, http://ror.org
- International Standard Name Identifier, https://isni.org
- Crossref Funder Registry, https://www.crossref.org/services/funder-registry/ (to be subsumed into ROR)

Some organizations may have a digital object identifier (DOI).
        GoldEnvironmentalContext Environmental context, described using JGI's five level system.
        Identifier A string used as a resolvable (external) identifier for an entity. This should be a CURIE in the form <database_prefix>:<local_identifier>. Bioregistry is used as the canonical reference for CURIE database prefixes; please use the prefix exactly as written in the Bioregistry entry.

If the string cannot be resolved to a URL, it should be added to the Name table instead.

This table is used for capturing external IDs. The internal CDM identifier should be used in the *_id field (e.g. feature_id, protein_id, contig_collection_id).
        License License information for the resource.
        LinkerTable Tables for linking between tables.
                AssociationXSupportingObject Links associations to entities to capture supporting objects in an association. May be a biological entity, such as a protein or feature, or a URL to a resource (e.g. a publication) that supports the association. Where possible, CDM identifiers should be used.
                ContigXContigCollection Captures the relationship between a contig and a contig collection; equivalent to contig part-of contig collection.
                ContigXEncodedFeature Captures the relationship between a contig and an encoded feature.
                ContigXFeature Captures the relationship between a contig and a feature; equivalent to feature part-of contig.
                ContigXProtein Captures the relationship between a contig and a protein; equivalent to protein is ribosomal translation of (http://purl.obolibrary.org/obo/RO_0002512) contig.
                ContigCollectionXEncodedFeature Captures the relationship between a contig collection and an encoded feature.
                ContigCollectionXFeature Captures the relationship between a contig collection and a feature; equivalent to feature part-of contig collection.
                ContigCollectionXProtein Captures the relationship between a contig collection and a protein; equivalent to protein is ribosomal translation of (http://purl.obolibrary.org/obo/RO_0002512) contig collection.
                ContributorXDataSource Captures the people and/or organizations involved in producing a dataset; ideally the contributor_role field will capture how the contributor was involved.
                ContributorAffiliation Captures relationships between contributors where one contributor is part of another contributor, e.g. a member of a group or a group that is part of a larger organization.
                DataSourceXDescription Links a data source to a description (e.g. the abstract or a free text description).
                DataSourceXFundingReference Links a data source to a funding reference.
                DataSourceXLicense Links a data source to a license.
                DataSourceXTitle Links a data source to a title.
                EncodedFeatureXFeature Captures the relationship between a feature and its transcription product.
                EncodedFeatureXProtein Captures the relationship between an encoded feature (RNA of some sort) and a protein.
                EntityIdentifiers Represents the link between an entity and its identifiers.
                EntityNames Represents the link between an entity and its names.
                FeatureXProtein Captures the relationship between a feature and a protein; equivalent to feature encodes protein.
                ProtocolVariable A variable that may or may not be set as part of an experiment.
        Measurement The value of a specified variable_id under the specified conditions.
        MeasurementSet Grouping table to collate a set of protocol outputs by variable, quality, and timestamp.
        MixsEnvironmentalContext Environmental context, described using the MiXS convention of broad and local environment, plus the medium.
        Name A string used as the name or label for an entity. This may be a primary name, alternative name, synonym, acronym, or any other label used to refer to an entity.

Identifiers that look like CURIEs or database references, but which cannot be resolved using Bioregistry or identifiers.org should be added to the Name table.
        OrderedProtocolStep A list of the steps in a protocol; the step_index indicates the order in which they should be executed.
        Parameter A parameter in a protocol. Currently specific to computational protocols.
        Prefix Maps CURIE prefixes to base URIs.
        Project Administrative unit for collecting data related to a certain topic, location, data type, grant funding, and so on.
        Protein Proteins are large, complex molecules made up of one or more long, folded chains of amino acids, whose sequences are determined by the DNA sequence of the protein-encoding gene.
        Protocol A defined method or set of methods.
        ProtocolExecution An instance of executing a protocol. Used for
        ProtocolInput An input parameter for a protocol.
        ProtocolInputSet A set of input parameters for a protocol.
        ProtocolOutput The output of a protocol.
        ProtocolStep A step in a protocol.
        Publication A publication (e.g. journal article).
        ResourceDescription Textual information about the resource being represented.
        ResourceTitle Represents the title or name of a resource, the type of that title, and the language used (if appropriate).

The title field is required; title_type is only necessary if the text is not the primary title.
        Sample A material entity that can be characterised by an experiment.
        Sequence A sequence of nucleotides or amino acids.
        Statement Represents an RDF triple, a statement in the form "subject predicate object" or "subject predicate value".

See Semantic SQL for more information on the contents of this table and how it is populated.
        Variable A variable (input, output, environmental, etc.) that can be set or measured as part of a protocol.
        VariableValue The possible types for the value of a variable. Should be a LinkML data type or one of the defined CDM data types.
UnitMixin The unit used in expressing a quantity value.
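The Identifier and Prefix classes fit together: an identifier is stored as a CURIE (`<database_prefix>:<local_identifier>`), and a Prefix row supplies the base URI that the prefix expands to. The sketch below illustrates that expansion under stated assumptions: `PREFIX_MAP`, `split_curie`, and `expand_curie` are hypothetical names, the in-memory dictionary stands in for rows of the Prefix table, and its two entries are illustrative only (Bioregistry remains the canonical source for prefixes).

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for rows of the Prefix table (prefix -> base).
# Illustrative entries only; real mappings should follow Bioregistry.
PREFIX_MAP: Dict[str, str] = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
}


def split_curie(curie: str) -> Tuple[str, str]:
    """Split '<database_prefix>:<local_identifier>' at the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def expand_curie(curie: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> Optional[str]:
    """Return the full URI for a CURIE, or None if the prefix is unknown.

    Prefixes are matched exactly, with no case folding.
    """
    prefix, local_id = split_curie(curie)
    base = prefix_map.get(prefix)
    return None if base is None else base + local_id


print(expand_curie("RO:0002512"))  # http://purl.obolibrary.org/obo/RO_0002512
print(expand_curie("FOO:123"))     # None
```

Returning `None` for an unknown prefix mirrors the rule above: a string that cannot be resolved to a URL belongs in the Name table rather than the Identifier table.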

Slots

Slot Description
affiliation_id The ID of the organization to which a contributor belongs. Should be the ID of another contributor.
aggregator_knowledge_source The knowledge source that aggregated the association. Should be a CDM ID from the Contributor or DataSource table.
annotation_date The date when the annotation was made.
asm_score A composite score for comparing contig collection quality.
association_id Internal (CDM) unique identifier for an association.
association_x_supporting_objects All association x supporting object records in the schema.
associations All associations in the schema.
attribute_cv_id The attribute being represented. For attributes that are in a controlled vocabulary, ontology, or enumeration, this attribute should capture the term ID from the controlled vocabulary.
attribute_cv_label The attribute being represented. For attributes that are in a controlled vocabulary, ontology, or enumeration, this attribute should capture the term from the controlled vocabulary.
attribute_string The attribute being represented, as a text string. This field should only be used if the attribute is not represented in a controlled vocabulary, ontology, or enumeration.
base The base URI a prefix will expand to.
cardinality The cardinality of the parameter.
cds_phase For features of type CDS, the phase indicates where the next codon begins relative to the 5' end (where the 5' end of the CDS is relative to the strand of the CDS feature) of the current CDS feature. cds_phase is required if the feature type is CDS.
checkm_completeness Estimate of the completeness of a contig collection (MAG or genome), as calculated by the CheckM tool. Ensure that percentage values are converted to floats.
checkm_contamination Estimate of the contamination of a contig collection (MAG or genome), as calculated by the CheckM tool. Ensure that percentage values are converted to floats.
checkm_version Version of the CheckM tool used.
checksum The checksum of the sequence, used to verify its integrity.
cluster_id Internal (CDM) unique identifier for a cluster.
From the Entity table: entity_id where entity_type == 'Cluster'.
cluster_members All cluster members in the schema.
clusters All clusters in the schema.
comments Any comments about the association.
contig_bp Total size in bp of all contigs
contig_collection_id Internal (CDM) unique identifier for a contig collection.
From the Entity table: entity_id where entity_type == 'ContigCollection'.
contig_collection_type The type of contig collection.
contig_collection_x_encoded_features All contig collection x encoded feature records in the schema.
contig_collection_x_features All contig collection x feature records in the schema.
contig_collection_x_proteins All contig collection x protein records in the schema.
contig_collections All contig collections in the schema.
contig_id Internal (CDM) unique identifier for a contig.
From the Entity table: entity_id where entity_type == 'Contig'.
contig_l50 Given a set of contigs, the L50 is defined as the length of the shortest contig such that contigs of that length or longer make up at least 50% of the total contig collection length.
contig_l90 The L90 statistic is less than or equal to the L50 statistic; it is the length for which the collection of all contigs of that length or longer contains at least 90% of the sum of the lengths of all contigs.
contig_logsum The sum of the (length*log(length)) of all contigs, times some constant.
contig_max Maximum contig length.
contig_n50 Given a set of contigs, each with its own length, the N50 count is defined as the smallest number of contigs (taken longest first) whose combined length makes up at least half of the contig collection size.
contig_n90 Given a set of contigs, each with its own length, the N90 count is defined as the smallest number of contigs (taken longest first) whose combined length makes up at least 90% of the contig collection size.
contig_powersum Powersum of all contigs is the same as logsum except that it uses the sum of (length*(length^P)) for some power P (default P=0.25).
contig_x_contig_collections All contig x contig collection records in the schema.
contig_x_encoded_features All contig x encoded feature records in the schema.
contig_x_features All contig x feature records in the schema.
contig_x_proteins All contig x protein records in the schema.
contigs All contigs in the schema.
contributor_affiliations All contributor affiliations in the schema.
contributor_id Internal (CDM) unique identifier for a contributor.
From the Entity table: entity_id where entity_type == 'Contributor'.
contributor_role Role(s) played by the contributor when working on the experiment. If more than one role was played, additional rows should be added to represent each role.
contributor_type Must be either 'Person' or 'Organization'.
contributor_x_role_x_project All contributor x role x project records in the schema.
contributors All contributors in the schema.
created Date/timestamp for when the entity was created or added to the CDM.
created_at The time at which the event started or was created.
data_source_created Date/timestamp for when the entity was created or added to the data source.
data_source_entity_id The primary, ideally unique, ID of the entity at the data source.
data_source_id Internal (CDM) unique identifier for a data source.
From the Entity table: entity_id where entity_type == 'DataSource'.
data_source_updated Date/timestamp for when the entity was updated in the data source.
data_source_x_descriptions All data source descriptions in the schema.
data_source_x_funding_references All data source x funding reference records in the schema.
data_source_x_licenses All data source x license records in the schema.
data_source_x_titles All data source x title records in the schema.
data_sources All data sources in the schema.
datatype The RDF datatype of the value, for example, xsd:string or xsd:float.
date_accessed The date when the data was downloaded from the data source.
date_published The date when the data source was originally made public.
date_time A date or date and time, expressed in ISO 8601 format with timezone indicators where appropriate. Dates should be expressed as YYYY-MM-DD; times should be expressed as HH:MM:SS with optional milliseconds and an indication of the timezone.
date_updated The date when the data source was last updated.
default The default value for the parameter if a value is not supplied.
description Brief textual definition or description.
description_text The text content of the informational element.
description_type The type of text being represented.
doi The DOI for a protocol.
e_value The 'score' of the feature. The semantics of this field are ill-defined. E-values should be used for sequence similarity features.
ecosystem JGI GOLD descriptor representing the top level ecosystem categorization.
ecosystem_category JGI GOLD descriptor representing the ecosystem category.
ecosystem_subtype JGI GOLD descriptor representing the subtype of ecosystem. May be "Unclassified".
ecosystem_type JGI GOLD descriptor representing the ecosystem type. May be "Unclassified".
encoded_feature_id Internal (CDM) unique identifier for an encoded feature.
From the Entity table: entity_id where entity_type == 'EncodedFeature'.
encoded_feature_x_features All encoded feature x feature records in the schema.
encoded_feature_x_proteins All encoded feature x protein records in the schema.
encoded_features All encoded features in the schema.
end The start and end coordinates of the feature are given in positive 1-based integer coordinates, relative to the landmark (the contig on which the feature is located; in GFF3, the sequence ID given in column one). Start is always less than or equal to end. For features that cross the origin of a circular feature (e.g. most bacterial genomes, plasmids, and some viral genomes), the requirement for start to be less than or equal to end is satisfied by making end = the position of the end + the length of the landmark feature. For zero-length features, such as insertion sites, start equals end and the implied site is to the right of the indicated base in the direction of the landmark.
entailed_edges All entailed edges in the schema.
entities All entities in the schema.
entity_attribute_values All entity attribute values in the schema.
entity_id Internal (CDM) unique identifier for an entity.
entity_identifiers All identifier x entity records in the schema.
entity_names All name x entity records in the schema.
entity_type Type of entity being clustered.
env_broad_scale Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO's biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS
env_local_scale Report the entity or entities which are in the sample or specimen's local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS.
env_medium Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS. Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top).
event_id Internal (CDM) unique identifier for an event.
events All events in the schema.
evidence_for_existence The evidence that this protein exists. For example, the protein may have been isolated from a cell, or it may be predicted based on sequence features.
evidence_type The type of evidence supporting the association. Should be a term from the Evidence and Conclusion Ontology (ECO). Specific pieces of evidence that support the association should be added as supporting objects, in the AssociationXSupportingObject table.
experiment_condition_id Internal (CDM) unique identifier for an experiment condition.
experiment_condition_set_id Internal (CDM) unique identifier for a set of experimental conditions.
experiment_condition_sets All experiment condition sets in the schema.
experiment_conditions All experiment conditions in the schema.
experiment_id Internal (CDM) unique identifier for an experiment.
experiments All experiments in the schema.
family_name The family name(s) of the contributor.
feature_id Internal (CDM) unique identifier for a feature.
From the Entity table: entity_id where entity_type == 'Feature'.
feature_x_proteins All feature x protein records in the schema.
features All features in the schema.
funder The funder for the grant or award.
funding_reference_id Internal (CDM) unique identifier for a specific source of funding -- e.g. a grant or award.
From the Entity table: entity_id where entity_type == 'FundingReference'.
funding_references All funding references in the schema.
gap_percent The gap size percentage of all scaffolds
gc_average The average GC content of the contig collection, expressed as a percentage
gc_content GC content of the contig, expressed as a percentage.
gc_std The standard deviation of GC content across the contig collection
given_name The given name(s) of the contributor.
gold_environmental_context_id Internal (CDM) unique identifier for a GOLD environmental context.
gold_environmental_contexts All GOLD environmental contexts in the schema.
grant_id Code for the grant, assigned by the funder.
grant_title Title for the grant.
grant_url URL for the grant.
gtdb_taxon_id The GTDB taxon ID for this contig collection.
has_stop_codon Captures whether or not the sequence includes a stop codon.
hash A hash value generated from one or more object attributes that serves to ensure the entity is unique.
id An identifier for an element. Note blank node ids are not unique across databases
identifier Fully-qualified URL or CURIE used as an identifier for an entity.
identifiers All identifiers in the schema.
is_representative Whether or not this member is the representative for the cluster. If 'is_representative' is false, it is assumed that this is a cluster member.
is_seed Whether or not this is the seed for this cluster.
language The human language in which the value is encoded, e.g. 'en'.
latitude The latitude portion of a geolocation.
length Length of the contig in bp.
license Usage license for the resource. Use one of the SPDX license identifiers or provide a link to the license text if no SPDX ID is available.
license_id Unique identifier for a license.
licenses All licenses in the schema.
location The location for this event. May be described in terms of coordinates.
longitude The longitude portion of a geolocation.
maximum_numeric_value The maximum value part, expressed as number, of the quantity value when the value covers a range.
measurement_id Internal (CDM) unique identifier for a measurement.
measurement_set_id Internal (CDM) unique identifier for a specified variable in a specified experiment.
measurement_sets All measurement sets in the schema.
measurements All measurements in the schema.
minimum_numeric_value The minimum value part, expressed as number, of the quantity value when the value covers a range.
mixs_environmental_context_id Internal (CDM) unique identifier for a mixs environmental context.
mixs_environmental_contexts All MIxS environmental contexts in the schema.
n_chromosomes Total number of chromosomes
n_contigs Total number of contigs
n_scaffolds Total number of scaffolds
name A string used as a name or title.
name_cv_id If the name is from a controlled vocabulary (CV), the CURIE of the controlled vocabulary term.
names All names in the schema.
ncbi_taxon_id The NCBI taxon ID for this contig collection.
negated If true, the relationship between the subject and object is negated. For example, consider an association where the subject is a protein ID, the object is the GO term for "glucose biosynthesis", and the predicate is "involved in". With the "negated" field set to false, the association is interpreted as "<protein> is involved in glucose biosynthesis". With the "negated" field set to true, the association is interpreted as "<protein> is not involved in glucose biosynthesis".
numeric_value The numerical part of a quantity value.
object Note the range of this slot is always a node. If the triple represents a literal, the "value" field will be populated instead.
ordered_protocol_steps All ordered protocol steps in the schema.
p_value The 'score' of the feature. The semantics of this field are ill-defined. P-values should be used for ab initio gene prediction features.
parameter_id Internal (CDM) unique identifier for a parameter of a protocol.
parameter_type Whether the parameter applies to the protocol input or output.
parameters All parameters in the schema.
predicate The predicate of the statement
prefix A standardized prefix such as 'GO' or 'rdf' or 'FlyBase'.
prefixes The prefix mappings for the schema.
primary_knowledge_source The knowledge source or contributor that created the association. Should be a CDM ID from the Contributor or DataSource table.
project_id Internal (CDM) unique identifier for a project.
From the Entity table: entity_id where entity_type == 'Project'.
projects All projects in the schema.
protein_id Internal (CDM) unique identifier for a protein.
From the Entity table: entity_id where entity_type == 'Protein'.
proteins All proteins in the schema.
protocol_execution_id Internal CDM unique identifier for an execution of a protocol.
protocol_executions All protocol executions in the schema.
protocol_id Internal (CDM) unique identifier for a protocol.
From the Entity table: entity_id where entity_type == 'Protocol'.
protocol_input_id Internal CDM unique identifier for the value of an input parameter for a protocol.
protocol_input_set_id Internal CDM unique identifier for a set of input parameter values for a protocol.
protocol_input_sets All protocol input sets in the schema.
protocol_inputs All protocol inputs in the schema.
protocol_output_id Internal CDM unique identifier for the value of an output of a protocol.
protocol_outputs All protocol outputs in the schema.
protocol_step_id Internal CDM unique identifier for a step in a protocol.
protocol_steps All protocol steps in the schema.
protocol_variable_id Internal CDM unique identifier for a variable in a protocol.
protocol_variables All protocol variables in the schema.
protocols All protocols in the schema.
publication_id Unique identifier for a publication - e.g. PMID, DOI, URL, etc.
publications All publications in the schema.
publisher The publisher of the resource. For a dataset, this is the repository where it is stored.
quality A quality score for measurement.
raw_value The value that was specified for an annotation in its raw form; e.g. "2 cm" or "2-4 cm"
relationship Relationship between this identifier and the entity in the entity_id field. If absent, it is assumed that the identifier represents the same entity in another data source.
required Whether or not this parameter must be supplied.
resource_description_id Unique identifier for a description for a resource.
resource_descriptions All resource descriptions in the schema.
resource_title_id Unique identifier for a title for a resource.
resource_titles All resource titles in the schema.
resource_type The broad type of the source data for this object. 'dataset' is currently the only valid value supported by this schema.
sample_id Internal (CDM) unique identifier for a sample.
From the Entity table: entity_id where entity_type == 'Sample'.
samples All samples in the schema.
scaffold_bp Total size in bp of all scaffolds
scaffold_l50 Given a set of scaffolds, the L50 is defined as the length of the shortest scaffold such that scaffolds of that length or longer make up at least 50% of the total contig collection length.
scaffold_l90 The L90 statistic is less than or equal to the L50 statistic; it is the length for which the collection of all scaffolds of that length or longer contains at least 90% of the sum of the lengths of all scaffolds.
scaffold_logsum The sum of the (length*log(length)) of all scaffolds, times some constant. The score increases as contiguity increases.
scaffold_maximum_length Maximum scaffold length
scaffold_n50 Given a set of scaffolds, each with its own length, the N50 count is defined as the smallest number of scaffolds whose length sum makes up half of contig collection size
scaffold_n90 Given a set of scaffolds, each with its own length, the N90 count is defined as the smallest number of scaffolds whose length sum makes up 90% of contig collection size
scaffold_powersum Powersum of all scaffolds is the same as logsum except that it uses the sum of (length*(length^P)) for some power P (default P=0.25).
scaffolds_n_over_50K The number of scaffolds longer than 50,000 base pairs.
scaffolds_percent_over_50K The percentage of the total assembly length represented by scaffolds longer than 50,000 base pairs
scaffolds_total_length_over_50k The total length of scaffolds longer than 50,000 base pairs
score Output from the clustering protocol indicating how closely a member matches the representative.
sequence The protein amino acid sequence.
sequence_id Internal (CDM) unique identifier for a sequence.
From the Entity table: entity_id where entity_type == 'Sequence'.
sequences All sequences in the schema.
source The source for a specific piece of information; should be a CDM internal ID of a source in the DataSource table.
source_database ID of the data source from which this entity came.
specific_ecosystem JGI GOLD descriptor representing the most specific level of ecosystem categorization. May be "Unclassified".
start The start and end coordinates of the feature are given in positive 1-based integer coordinates, relative to the landmark (the contig on which the feature is located; in GFF3, the sequence ID given in column one). Start is always less than or equal to end. For features that cross the origin of a circular feature (e.g. most bacterial genomes, plasmids, and some viral genomes), the requirement for start to be less than or equal to end is satisfied by making end = the position of the end + the length of the landmark feature. For zero-length features, such as insertion sites, start equals end and the implied site is to the right of the indicated base in the direction of the landmark.
statements All statements in the schema.
step Text description of a step in a protocol.
step_index The number of the step in an ordered progression.
strand The strand of the feature.
subject The subject of the statement.
text_value The value, as a text string. This field should only be used if the value is not represented in a controlled vocabulary, ontology, or enumeration.
title A string used as a title for a resource.
title_type A descriptor for the title, for cases where the contents of the title field are not the primary name or title.
type The type of value being represented - e.g. QuantityValue, TextValue, DateTimeValue, ControlledVocabularyTermValue, etc.
unit The units used to measure the value of the variable, if applicable. Units should be expressed using the Unit Ontology or a term from UCUM.
unit_cv_id The unit of the quantity, expressed as a CURIE from the Unit Ontology.
unit_cv_label The unit of a quantity, expressed as the term name of a term from the Unit Ontology or UCUM.
unit_string Links a QuantityValue to a unit. Units should be taken from the UCUM unit collection or the Unit Ontology. This field should only be used if the unit is not present in one of those sources.
updated Date/timestamp for when the entity was updated in the CDM.
url The URL from which the data was loaded.
value Note that the range of this slot is always a string. Only used if the triple represents a literal assertion.
value_cv_id For values that are in a controlled vocabulary (CV), this attribute should capture the controlled vocabulary ID for the value.
value_cv_label For values that are in a controlled vocabulary, ontology, or enumeration, this attribute should capture the term from the controlled vocabulary.
value_type The type(s) of the value.
variable_id Internal CDM unique identifier for a variable.
variable_value_id Internal CDM unique identifier for a variable value.
variable_values All variable values in the schema.
variables All variables in the schema.
version The version of the resource. This must be an absolute version, not a relative version like 'latest'.
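
As a rough illustration of the scaffold statistics above, the following Python sketch computes the N50/N90 counts, the powersum, and the over-50,000 bp metrics from a list of scaffold lengths. It is not taken from the CDM or its loaders. It assumes the input is a plain list of lengths in base pairs, that "contig collection size" and "total assembly length" both mean the sum of those lengths, and that scaffolds are taken longest first; logsum is not defined in this section, so it is omitted. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def n_count(lengths: Sequence[int], percent: int) -> int:
    """Smallest number of scaffolds, taken longest first, whose lengths sum to
    at least `percent`% of the total length (50 -> scaffold_n50, 90 -> scaffold_n90)."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point error at the threshold.
        if running * 100 >= percent * total:
            return count
    return 0  # reached only for an empty list or percent > 100


def powersum(lengths: Sequence[int], p: float = 0.25) -> float:
    """Sum of length * (length ** p) over all scaffolds (default P = 0.25)."""
    return sum(length * (length ** p) for length in lengths)


def over_threshold(lengths: Sequence[int], threshold: int = 50_000) -> Tuple[int, int, float]:
    """Count, total length, and percentage of the total length for scaffolds
    strictly longer than `threshold` base pairs."""
    long_lengths = [length for length in lengths if length > threshold]
    total = sum(lengths)
    long_total = sum(long_lengths)
    percent = 100.0 * long_total / total if total else 0.0
    return len(long_lengths), long_total, percent
```

For example, for scaffold lengths [100, 80, 60, 40, 20] (total 300), `n_count(lengths, 50)` returns 2 and `n_count(lengths, 90)` returns 4. Percentages are passed as integers so that the threshold comparison stays exact.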

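The circular-landmark convention described for start can be sketched as follows. This is an illustration under stated assumptions, not code from the CDM: positions are 1-based, a feature whose last base lies before its first base is assumed to cross the origin, and the helper names are hypothetical.

```python
from typing import Iterator, Tuple


def encode_interval(start: int, end_position: int, landmark_length: int) -> Tuple[int, int]:
    """Return the (start, end) pair to store for a feature on a circular landmark.

    If the feature's last base lies before its first base (i.e. it crosses the
    origin), end is stored as end_position + landmark_length so that
    start <= end still holds.
    """
    for pos in (start, end_position):
        if not 1 <= pos <= landmark_length:
            raise ValueError(f"position {pos} is outside 1..{landmark_length}")
    if end_position >= start:
        return start, end_position
    return start, end_position + landmark_length


def covered_positions(start: int, end: int, landmark_length: int) -> Iterator[int]:
    """Yield the 1-based landmark positions covered by a stored (start, end) pair,
    wrapping any position past landmark_length back to the origin."""
    for pos in range(start, end + 1):
        yield (pos - 1) % landmark_length + 1
```

For a 1,000 bp circular contig, a feature running from base 990 through the origin to base 10 is stored as (990, 1010), and `covered_positions` walks 990..1000 and then 1..10. Zero-length features (start equals end) cannot be distinguished from one-base features by the interval alone, so this sketch does not handle them.
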
Enumerations

Enumeration Description
AttributeValueType
CdsPhaseType For features of type CDS (coding sequence), the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon.
ClusterType The type of the entities in a cluster. Must be represented by a table in the CDM schema.
ContigCollectionType The type of the contig set; the type of the 'omics data set. Terms are taken from the Genomics Standards Consortium where possible. See the GSC checklists at https://genomicsstandardsconsortium.github.io/mixs/ for the controlled vocabularies used.
ContributorRole The type of contribution made by a contributor.
ContributorType The type of contributor being represented.
DescriptionType The type of text being represented.
EntityType The type of an entity. Must be represented by a table in the CDM schema.
EventType The type of date being represented.
ProteinEvidenceForExistence The evidence for the existence of a biological entity. See https://www.uniprot.org/help/protein_existence and https://www.ncbi.nlm.nih.gov/genbank/evidence/.
ProtocolParameterType An input, an operation parameter or switch, or an output for a protocol.
RefSeqStatusType RefSeq status codes, taken from https://www.ncbi.nlm.nih.gov/genbank/evidence/.
RelationshipType The relationship between two entities. For example, when a PermanentID class is used to represent objects in the CreditMetadata field related_identifiers, the relationship_type field captures the relationship between the resource being registered (A) and this ID (B).
ResourceType The type of resource being represented.
SequenceType The type of sequence being represented.
StrandType The strand that a feature appears on relative to a landmark. Also encompasses unknown or irrelevant strandedness.
TitleType The type of title being represented.
VariableType The type of the value of a variable. Should be a LinkML data type or one of the defined CDM data types.
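
The CdsPhaseType rule can be made concrete with a small Python sketch that derives the phase of each CDS segment of a multi-segment coding sequence. It assumes the segments are listed in transcription order (5' to 3', which means decreasing coordinate order for minus-strand features) and that the first segment begins on a codon boundary; the function name is hypothetical.

```python
from typing import List


def cds_phases(segment_lengths: List[int]) -> List[int]:
    """Phase (0, 1, or 2) of each CDS segment of one coding sequence.

    The phase is the number of bases to skip at the start of a segment to
    reach the first base of the next complete codon. Assumes the segments are
    in transcription order and the first one starts on a codon boundary.
    """
    phases = []
    coding_bases_so_far = 0
    for length in segment_lengths:
        # Bases of an incomplete codon carried over from earlier segments.
        leftover = coding_bases_so_far % 3
        phases.append((3 - leftover) % 3)
        coding_bases_so_far += length
    return phases
```

For segments of 100, 50, and 60 bp the phases are [0, 2, 0]: after 100 coding bases, one base of an incomplete codon remains, so the remaining two bases of that codon must be skipped at the start of the second segment.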

Types

Type Description
Boolean A binary (true or false) value
CdmAssociationId A CDM ID (cdm_id) that identifies an association in the CDM.
CdmClusterId A CDM ID (cdm_id) that identifies a cluster in the CDM.
CdmContigCollectionId A CDM ID (cdm_id) that identifies a contig collection in the CDM.
CdmContigId A CDM ID (cdm_id) that identifies a contig in the CDM.
CdmContributorId A CDM ID (cdm_id) that identifies a contributor in the CDM.
CdmDataSourceId A CDM ID (cdm_id) that identifies a data source in the CDM.
CdmEncodedFeatureId A CDM ID (cdm_id) that identifies an encoded feature in the CDM.
CdmFeatureId A CDM ID (cdm_id) that identifies a feature in the CDM.
CdmId A universally unique ID, generated using uuid4, with the prefix "CDM:". Used internally in the CDM.
CdmLicenseId A CDM ID (cdm_id) that identifies a license in the CDM.
CdmProteinId A CDM ID (cdm_id) that identifies a protein in the CDM.
CdmProtocolId A CDM ID (cdm_id) that identifies a protocol in the CDM.
CdmSampleId A CDM ID (cdm_id) that identifies a sample in the CDM.
CdmSequenceId A CDM ID (cdm_id) that identifies a sequence in the CDM.
Curie a compact URI
Date a date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal specification
DecimalDegree A decimal degree expresses latitude or longitude as decimal fractions.
Double A real number that conforms to the xsd:double specification
Float A real number that conforms to the xsd:float specification
Integer An integer
Iso8601 A date in ISO 8601 format, e.g. 2024-04-05T12:34:56Z. "Z" indicates UTC time.
Jsonpath A string encoding a JSON Path. The value of the string MUST conform to JSON Path syntax and SHOULD dereference to zero or more valid objects within the current instance document when encoded in tree form.
Jsonpointer A string encoding a JSON Pointer. The value of the string MUST conform to JSON Pointer syntax and SHOULD dereference to a valid object within the current instance document when encoded in tree form.
LiteralAsStringType A literal represented as a string.
LocalCurie A CURIE that exists as a subject in the statements table (i.e. statements.subject). Should not be used for external identifiers.
LocalCurieName The term name for an ontology term; should appear as an object in the statements table with a relationship indicating that it is the name of an ontology term.
Ncname Prefix part of CURIE
NodeIdType IDs are either CURIEs, IRIs, or blank nodes. IRIs are wrapped in angle brackets (<>) to distinguish them from CURIEs, but in general it is good practice to populate the prefixes table so that they are shortened to CURIEs. Blank nodes are IDs starting with _:.
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model.
Objectidentifier A URI or CURIE that represents an object in the model.
Sparqlpath A string encoding a SPARQL Property Path. The value of the string MUST conform to SPARQL syntax and SHOULD dereference to zero or more valid objects within the current instance document when encoded as RDF.
String A character string
Time A time object represents a (local) time of day, independent of any particular day
Uri a complete URI
Uriorcurie a URI or a CURIE
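
Several of the identifier types above (CdmId, Curie, NodeIdType) follow simple lexical rules that can be checked in code. The Python sketch below mints and validates CdmId values, classifies node IDs, and expands CURIEs; it is an illustration, not the CDM's own tooling. It assumes the UUID is written in the lowercase, hyphenated form produced by Python's `str(uuid.uuid4())`, which the type description does not specify. The function names and the prefix map used in the example are hypothetical.

```python
import re
import uuid
from typing import Dict

# Assumed CdmId form: "CDM:" + canonical lowercase UUID, version 4, RFC 4122 variant.
CDM_ID_PATTERN = re.compile(
    r"CDM:[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}"
)


def new_cdm_id() -> str:
    """Mint a CdmId: the prefix "CDM:" followed by a random (version 4) UUID."""
    return f"CDM:{uuid.uuid4()}"


def is_cdm_id(value: str) -> bool:
    """True if the whole string matches the assumed CdmId form."""
    return CDM_ID_PATTERN.fullmatch(value) is not None


def node_id_kind(node_id: str) -> str:
    """Classify a NodeIdType value: IRIs are wrapped in <>, blank nodes start
    with "_:", and anything else is treated as a CURIE."""
    if node_id.startswith("<") and node_id.endswith(">"):
        return "iri"
    if node_id.startswith("_:"):
        return "bnode"
    return "curie"


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a CURIE such as "GO:0008150" using a prefix -> base-IRI map."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local
```

For example, with the illustrative map `{"GO": "http://purl.obolibrary.org/obo/GO_"}`, `expand_curie("GO:0008150", ...)` yields `http://purl.obolibrary.org/obo/GO_0008150`.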

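Likewise, DecimalDegree and Iso8601 values can be produced or checked with a few lines of Python. This is a sketch under assumptions: the degrees/minutes/seconds conversion uses the usual sign convention (south and west are negative), and the parser accepts only the seconds-precision form shown in the Iso8601 example (with "Z" or a numeric UTC offset), not every ISO 8601 variant. The function names are hypothetical.

```python
from datetime import datetime, timezone


def dms_to_decimal_degrees(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.
    South latitudes and west longitudes are negative."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def parse_iso8601_utc(value: str) -> datetime:
    """Parse a timestamp such as "2024-04-05T12:34:56Z" into an aware UTC datetime."""
    # On Python 3.7+, %z accepts a literal "Z" as well as offsets like +02:00.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z").astimezone(timezone.utc)
```

For example, 37° 52' 30" N converts to 37.875, and "2024-04-05T14:34:56+02:00" parses to the same instant as "2024-04-05T12:34:56Z".
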
Subsets

Subset Description