cdm_schema
Schema for KBase CDM
URI: http://kbase.github.io/cdm-schema/linkml/cdm_schema
Name: cdm_schema
Classes
Class | Description |
---|---|
Any | Used as a range for slots that have more than one possible type. |
AttributeMixin | The attribute in an attribute-value pair. One of attribute_cv_id, attribute_cv_label, or attribute_string is required. |
AttributeValue | The value of an attribute of an entity. This object can hold both the un-normalized atomic value and the structured value. |
ControlledTermValue | A quality, described using a text string from a controlled vocabulary or enum. |
ControlledVocabularyTermValue | A quality, described using a term from an ontology or schema with a stable persistent identifier. |
DateTimeValue | A date or date and time value. |
Geolocation | A normalized value for a location on the earth's surface. Should be expressed in decimal degrees. |
QuantityRangeValue | A numerical range, e.g. 5-7 cm. |
QuantityValue | A simple quantity, e.g. 2 cm. |
TextValue | A basic string value. |
EntityAttributeValue | Class comprising all possible entity-attribute-value slots. |
EntityMixin | A generic class for capturing attribute-value information about an entity in a structured form. |
Schema | The root class for the CDM schema. |
Table | Abstract class representing a table in the CDM schema. |
Association | An association between an object--typically an entity such as a protein or a feature--and a classification system or ontology, such as the Gene Ontology, the Enzyme Classification, or TIGRFAMS domains. |
Cluster | Represents an individual execution of a clustering protocol. See the ClusterMember class for clustering results. |
ClusterMember | Relationship representing membership of a cluster. An optional score can be assigned to each cluster member. |
Contig | A contig (derived from the word "contiguous") is a set of DNA segments or sequences that overlap in a way that provides a contiguous representation of a genomic region. A contig should not contain any gaps. |
ContigCollection | A set of individual, overlapping contigs that represent the complete sequenced genome of an organism. |
Contributor | Represents a contributor to a resource. Contributors must have a 'contributor_type', either 'Person' or 'Organization', and one of the 'name' fields: either 'given_name' and 'family_name' (for a person), or 'name' (for an organization or a person). The 'contributor_role' field takes values from the DataCite and CRediT contributor role vocabularies. For more information on these resources and on choosing appropriate roles, see DataCite contributor roles (https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties#7a-contributortype) and the CRediT contributor role taxonomy (https://credit.niso.org). |
ContributorXRoleXProject | Describes the participation of a contributor in a project; ideally the contributor_role field will capture how the contributor was involved. |
DataSource | The source for a resource, dataset, association, etc. |
DataSourceNew | The source dataset from which data within the CDM was extracted. This might be an API query; a set of files downloaded from a website or uploaded by a user; a database dump; etc. A given data source should have either version information (e.g. a release number, like those used by UniProt or RefSeq) or an access date to allow the original raw data dump to be recapitulated. |
EncodedFeature | An entity generated from a feature, such as a transcript. |
EntailedEdge | A relation graph edge that is inferred. This table contains links between the nodes that appear as subjects in the Statement table. The graph contains all possible links between nodes and is created using a reasoner such as relation-graph to materialise inferred links. |
Entity | An entity in the CDM. |
Event | Something that happened. |
Experiment | A discrete scientific procedure undertaken to make a discovery, test a hypothesis, or demonstrate a known fact. The protocol_id links to the workflow followed to perform the experiment. |
ExperimentCondition | A measurement, reagent, or description of one aspect of the environment used in an experiment; examples include temperature; aerobic or anaerobic conditions; presence of a chemical in the environment. Used to describe the context, conditions, or set-up of an experiment. |
ExperimentConditionSet | A unique combination of experimental conditions and entities that are used in a specific experiment. One experiment condition set is expected to comprise multiple ExperimentConditions. |
Feature | A feature localized to an interval along a contig. |
FundingReference | Represents a funding source for a resource, including the funding body and the grant awarded. One (or more) of the fields grant_id, grant_url, or funder.organization_name is required; others are optional. Recommended resources for organization identifiers include the Research Organization Registry (http://ror.org); the International Standard Name Identifier (https://isni.org); and the Crossref Funder Registry (https://www.crossref.org/services/funder-registry/), which is to be subsumed into ROR. Some organizations may have a digital object identifier (DOI). |
GoldEnvironmentalContext | Environmental context, described using JGI's five level system. |
Identifier | A string used as a resolvable (external) identifier for an entity. This should be a CURIE in the form <database_prefix>:<local_identifier>. Bioregistry is used as the canonical reference for CURIE database prefixes; please use the prefix exactly as written in the Bioregistry entry. If the string cannot be resolved to a URL, it should be added to the Name table instead. This table is used for capturing external IDs. The internal CDM identifier should be used in the *_id field (e.g. feature_id, protein_id, contig_collection_id). |
License | License information for the resource. |
LinkerTable | Tables that link records in other tables. |
AssociationXSupportingObject | Links associations to entities to capture supporting objects in an association. May be a biological entity, such as a protein or feature, or a URL to a resource (e.g. a publication) that supports the association. Where possible, CDM identifiers should be used. |
ContigXContigCollection | Captures the relationship between a contig and a contig collection; equivalent to contig part-of contig collection. |
ContigXEncodedFeature | Captures the relationship between a contig and an encoded feature. |
ContigXFeature | Captures the relationship between a contig and a feature; equivalent to feature part-of contig. |
ContigXProtein | Captures the relationship between a contig and a protein; equivalent to protein is ribosomal translation of (http://purl.obolibrary.org/obo/RO_0002512) contig. |
ContigCollectionXEncodedFeature | Captures the relationship between a contig collection and an encoded feature. |
ContigCollectionXFeature | Captures the relationship between a contig collection and a feature; equivalent to feature part-of contig collection. |
ContigCollectionXProtein | Captures the relationship between a contig collection and a protein; equivalent to protein is ribosomal translation of (http://purl.obolibrary.org/obo/RO_0002512) contig collection. |
ContributorXDataSource | Captures the people and/or organizations involved in producing a dataset; ideally the contributor_role field will capture how the contributor was involved. |
ContributorAffiliation | Captures relationships between contributors where one contributor is part of another contributor, e.g. a member of a group or a group that is part of a larger organization. |
DataSourceXDescription | Links a data source to a description (e.g. the abstract or a free text description). |
DataSourceXFundingReference | Links a data source to a funding reference. |
DataSourceXLicense | Links a data source to a license. |
DataSourceXTitle | Links a data source to a title. |
EncodedFeatureXFeature | Captures the relationship between a feature and its transcription product. |
EncodedFeatureXProtein | Captures the relationship between an encoded feature (RNA of some sort) and a protein. |
EntityIdentifiers | Represents the link between an entity and its identifiers. |
EntityNames | Represents the link between an entity and its names. |
FeatureXProtein | Captures the relationship between a feature and a protein; equivalent to feature encodes protein. |
ProtocolVariable | A variable that may or may not be set as part of an experiment. |
Measurement | The value of a specified variable_id under the specified conditions. |
MeasurementSet | Grouping table to collate a set of protocol outputs by variable, quality, and timestamp. |
MixsEnvironmentalContext | Environmental context, described using the MIxS convention of broad and local environment, plus the medium. |
Name | A string used as the name or label for an entity. This may be a primary name, alternative name, synonym, acronym, or any other label used to refer to an entity. Identifiers that look like CURIEs or database references, but which cannot be resolved using Bioregistry or identifiers.org should be added to the Name table. |
OrderedProtocolStep | A list of the steps in a protocol; the step_index indicates the order in which they should be executed. |
Parameter | A parameter in a protocol. Currently specific to computational protocols. |
Prefix | Maps CURIE prefixes to the base URIs they expand to. |
Project | Administrative unit for collecting data related to a certain topic, location, data type, grant funding, and so on. |
Protein | Proteins are large, complex molecules made up of one or more long, folded chains of amino acids, whose sequences are determined by the DNA sequence of the protein-encoding gene. |
Protocol | A defined method or set of methods. |
ProtocolExecution | An instance of executing a protocol. Used for |
ProtocolInput | An input parameter for a protocol. |
ProtocolInputSet | A set of input parameters for a protocol. |
ProtocolOutput | The output of a protocol. |
ProtocolStep | A step in a protocol. |
Publication | A publication (e.g. journal article). |
ResourceDescription | Textual information about the resource being represented. |
ResourceTitle | Represents the title or name of a resource, the type of that title, and the language used (if appropriate). The title field is required; title_type is only necessary if the text is not the primary title. |
Sample | A material entity that can be characterised by an experiment. |
Sequence | A sequence of nucleotides or amino acids. |
Statement | Represents an RDF triple, a statement in the form "subject predicate object" or "subject predicate value". See Semantic SQL for more information on the contents of this table and how it is populated. |
Variable | A variable (input, output, environmental, etc.) that can be set or measured as part of a protocol. |
VariableValue | The possible types for the value of a variable. Should be a LinkML data type or one of the defined CDM data types. |
UnitMixin | The unit used in expressing a quantity value. |
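The Identifier, Name, and Prefix descriptions above imply a routing rule for external strings: a full URL, or a CURIE whose prefix expands to a known base URI, belongs in the Identifier table; anything that cannot be resolved belongs in the Name table. A minimal Python sketch of that rule follows. It is an illustration, not CDM tooling: `PREFIXES` is a hypothetical hand-written stand-in for rows of the Prefix table (prefix to base), and `expand_curie` and `route_external_string` are illustrative helper names. A real loader would take its prefixes from Bioregistry rather than a local dict.

```python
import re
from typing import Dict, Optional

# Hypothetical stand-in for rows of the Prefix table (prefix -> base).
# Real entries should use prefixes spelled exactly as in Bioregistry.
PREFIXES: Dict[str, str] = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "UniProtKB": "https://purl.uniprot.org/uniprot/",
}

# <database_prefix>:<local_identifier>, with no whitespace in either part.
CURIE_RE = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9._-]*):(?P<local>\S+)$")


def expand_curie(curie: str, prefixes: Dict[str, str]) -> Optional[str]:
    """Return the URI a CURIE expands to, or None if it cannot be expanded."""
    match = CURIE_RE.match(curie)
    if match is None:
        return None
    base = prefixes.get(match.group("prefix"))
    if base is None:
        return None  # unknown prefix: not resolvable with this prefix map
    return base + match.group("local")


def route_external_string(value: str, prefixes: Dict[str, str]) -> str:
    """Return "Identifier" for URLs and expandable CURIEs, otherwise "Name"."""
    if value.startswith(("http://", "https://")):
        return "Identifier"
    return "Identifier" if expand_curie(value, prefixes) else "Name"


print(route_external_string("GO:0008150", PREFIXES))  # Identifier
print(route_external_string("MYDB:12345", PREFIXES))  # Name (unknown prefix)
```

Here "resolvable" only means "has a known prefix"; the sketch does not check that the expanded URL actually exists.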
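Several class descriptions above state required-field rules: Contributor (a valid contributor_type plus matching name fields), FundingReference (at least one of grant_id, grant_url, or funder.organization_name), and AttributeMixin (one of the three attribute slots). The sketch below expresses those three rules as plain Python checks. It assumes each record is a dict keyed by slot name and that `funder` is a nested dict with an `organization_name` key; the function names are hypothetical, and this is not the schema's own validation logic.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # one row, keyed by slot name (an assumption of this sketch)


def _has(record: Record, slot: str) -> bool:
    """True if the slot is present and not empty."""
    value = record.get(slot)
    return value is not None and value != ""


def check_contributor(rec: Record) -> List[str]:
    """Contributor: valid contributor_type plus the matching name fields."""
    errors = []
    ctype = rec.get("contributor_type")
    if ctype not in ("Person", "Organization"):
        errors.append("contributor_type must be 'Person' or 'Organization'")
    full_name = _has(rec, "given_name") and _has(rec, "family_name")
    if ctype == "Organization" and not _has(rec, "name"):
        errors.append("an Organization needs 'name'")
    elif ctype == "Person" and not (_has(rec, "name") or full_name):
        errors.append("a Person needs 'name' or both 'given_name' and 'family_name'")
    return errors


def check_funding_reference(rec: Record) -> List[str]:
    """FundingReference: grant_id, grant_url, or funder.organization_name."""
    funder = rec.get("funder") or {}  # assumed to be a nested dict here
    if _has(rec, "grant_id") or _has(rec, "grant_url") or _has(funder, "organization_name"):
        return []
    return ["one of grant_id, grant_url, or funder.organization_name is required"]


def check_attribute(rec: Record) -> List[str]:
    """AttributeMixin: at least one way of naming the attribute."""
    slots = ("attribute_cv_id", "attribute_cv_label", "attribute_string")
    if any(_has(rec, slot) for slot in slots):
        return []
    return ["one of attribute_cv_id, attribute_cv_label, or attribute_string is required"]


print(check_contributor({"contributor_type": "Person", "given_name": "Ada", "family_name": "Lovelace"}))  # []
print(check_contributor({"contributor_type": "Organization", "given_name": "Ada"}))
```

Each check returns a list of messages rather than raising, so all problems in a record can be reported at once.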
Slots
Slot | Description |
---|---|
affiliation_id | The ID of the organization to which a contributor belongs. Should be the ID of another contributor. |
aggregator_knowledge_source | The knowledge source that aggregated the association. Should be a CDM ID from the Contributor or DataSource table. |
annotation_date | The date when the annotation was made. |
asm_score | A composite score for comparing contig collection quality. |
association_id | Internal (CDM) unique identifier for an association. |
association_x_supporting_objects | All association x supporting object records in the schema. |
associations | All associations in the schema. |
attribute_cv_id | The attribute being represented. For attributes that are in a controlled vocabulary, ontology, or enumeration, this attribute should capture the term ID from the controlled vocabulary. |
attribute_cv_label | The attribute being represented. For attributes that are in a controlled vocabulary, ontology, or enumeration, this attribute should capture the term from the controlled vocabulary. |
attribute_string | The attribute being represented, as a text string. This field should only be used if the attribute is not represented in a controlled vocabulary, ontology, or enumeration. |
base | The base URI a prefix will expand to. |
cardinality | The cardinality of the parameter. |
cds_phase | For features of type CDS, the phase indicates where the next codon begins relative to the 5' end (where the 5' end of the CDS is relative to the strand of the CDS feature) of the current CDS feature. cds_phase is required if the feature type is CDS. |
checkm_completeness | Completeness of a contig collection (MAG or genome), as estimated by the CheckM tool. Ensure that percentage values are converted to floats. |
checkm_contamination | Contamination of a contig collection (MAG or genome), as estimated by the CheckM tool. Ensure that percentage values are converted to floats. |
checkm_version | Version of the CheckM tool used. |
checksum | The checksum of the sequence, used to verify its integrity. |
cluster_id | Internal (CDM) unique identifier for a cluster. From the Entity table: entity_id where entity_type == 'Cluster'. |
cluster_members | All cluster members in the schema. |
clusters | All clusters in the schema. |
comments | Any comments about the association. |
contig_bp | Total size in bp of all contigs. |
contig_collection_id | Internal (CDM) unique identifier for a contig collection. From the Entity table: entity_id where entity_type == 'ContigCollection'. |
contig_collection_type | The type of contig collection. |
contig_collection_x_encoded_features | All contig collection x encoded feature records in the schema. |
contig_collection_x_features | All contig collection x feature records in the schema. |
contig_collection_x_proteins | All contig collection x protein records in the schema. |
contig_collections | All contig collections in the schema. |
contig_id | Internal (CDM) unique identifier for a contig. From the Entity table: entity_id where entity_type == 'Contig'. |
contig_l50 | Given a set of contigs, the L50 is defined as the sequence length of the shortest contig at 50% of the total contig collection length. |
contig_l90 | The L90 statistic is less than or equal to the L50 statistic; it is the length for which the collection of all contigs of that length or longer contains at least 90% of the sum of the lengths of all contigs. |
contig_logsum | The sum of the (length*log(length)) of all contigs, times some constant. |
contig_max | Maximum contig length. |
contig_n50 | Given a set of contigs, each with its own length, the N50 count is defined as the smallest number of contigs whose length sum makes up half of contig collection size. |
contig_n90 | Given a set of contigs, each with its own length, the N90 count is defined as the smallest number of contigs whose length sum makes up 90% of contig collection size. |
contig_powersum | Powersum of all contigs is the same as logsum except that it uses the sum of (length*(length^P)) for some power P (default P=0.25). |
contig_x_contig_collections | All contig x contig collection records in the schema. |
contig_x_encoded_features | All contig x encoded feature records in the schema. |
contig_x_features | All contig x feature records in the schema. |
contig_x_proteins | All contig x protein records in the schema. |
contigs | All contigs in the schema. |
contributor_affiliations | All contributor affiliations in the schema. |
contributor_id | Internal (CDM) unique identifier for a contributor. From the Entity table: entity_id where entity_type == 'Contributor'. |
contributor_role | Role(s) played by the contributor when working on the experiment. If more than one role was played, additional rows should be added to represent each role. |
contributor_type | Must be either 'Person' or 'Organization'. |
contributor_x_role_x_project | All contributor x role x project records in the schema. |
contributors | All contributors in the schema. |
created | Date/timestamp for when the entity was created or added to the CDM. |
created_at | The time at which the event started or was created. |
data_source_created | Date/timestamp for when the entity was created or added to the data source. |
data_source_entity_id | The primary, ideally unique, ID of the entity at the data source. |
data_source_id | Internal (CDM) unique identifier for a data source. From the Entity table: entity_id where entity_type == 'DataSource'. |
data_source_updated | Date/timestamp for when the entity was updated in the data source. |
data_source_x_descriptions | All data source descriptions in the schema. |
data_source_x_funding_references | All data source x funding reference records in the schema. |
data_source_x_licenses | All data source x license records in the schema. |
data_source_x_titles | All data source x title records in the schema. |
data_sources | All data sources in the schema. |
datatype | The RDF datatype of the value, for example, xsd:string or xsd:float. |
date_accessed | The date when the data was downloaded from the data source. |
date_published | The date when the data source was originally made public. |
date_time | A date or date and time, expressed in ISO 8601-compatible form with timezone indicators where appropriate. Dates should be expressed as YYYY-MM-DD; times should be expressed as HH:MM:SS with optional milliseconds and an indication of the timezone. |
date_updated | The date when the data source was last updated. |
default | The default value for the parameter if a value is not supplied. |
description | Brief textual definition or description. |
description_text | The text content of the informational element. |
description_type | The type of text being represented. |
doi | The DOI for a protocol. |
e_value | The 'score' of the feature. The semantics of this field are ill-defined. E-values should be used for sequence similarity features. |
ecosystem | JGI GOLD descriptor representing the top level ecosystem categorization. |
ecosystem_category | JGI GOLD descriptor representing the ecosystem category. |
ecosystem_subtype | JGI GOLD descriptor representing the subtype of ecosystem. May be "Unclassified". |
ecosystem_type | JGI GOLD descriptor representing the ecosystem type. May be "Unclassified". |
encoded_feature_id | Internal (CDM) unique identifier for an encoded feature. From the Entity table: entity_id where entity_type == 'EncodedFeature'. |
encoded_feature_x_features | All encoded feature x feature records in the schema. |
encoded_feature_x_proteins | All encoded feature x protein records in the schema. |
encoded_features | All encoded features in the schema. |
end | The start and end coordinates of the feature are given as positive, 1-based integer coordinates relative to the landmark (the contig on which the feature lies; in GFF3, the seqid given in column one). Start is always less than or equal to end. For features that cross the origin of a circular landmark (e.g. most bacterial genomes, plasmids, and some viral genomes), the requirement for start to be less than or equal to end is satisfied by setting end to the position where the feature ends plus the length of the landmark. For zero-length features, such as insertion sites, start equals end and the implied site is to the right of the indicated base in the direction of the landmark. |
entailed_edges | All entailed edges in the schema. |
entities | All entities in the schema. |
entity_attribute_values | All entity attribute values in the schema. |
entity_id | Internal (CDM) unique identifier for an entity. |
entity_identifiers | All identifier x entity records in the schema. |
entity_names | All name x entity records in the schema. |
entity_type | Type of entity being clustered. |
env_broad_scale | Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO's biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS |
env_local_scale | Report the entity or entities which are in the sample or specimen's local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS. |
env_medium | Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS . Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top). |
event_id | Internal (CDM) unique identifier for an event. |
events | All events in the schema. |
evidence_for_existence | The evidence that this protein exists. For example, the protein may have been isolated from a cell, or it may be predicted based on sequence features. |
evidence_type | The type of evidence supporting the association. Should be a term from the Evidence and Conclusion Ontology (ECO). Specific pieces of evidence that support the association should be added as supporting objects, in the AssociationXSupportingObject table. |
experiment_condition_id | Internal (CDM) unique identifier for an experiment condition. |
experiment_condition_set_id | Internal (CDM) unique identifier for a set of experimental conditions. |
experiment_condition_sets | All experiment condition sets in the schema. |
experiment_conditions | All experiment conditions in the schema. |
experiment_id | Internal (CDM) unique identifier for an experiment. |
experiments | All experiments in the schema. |
family_name | The family name(s) of the contributor. |
feature_id | Internal (CDM) unique identifier for a feature. From the Entity table: entity_id where entity_type == 'Feature'. |
feature_x_proteins | All feature x protein records in the schema. |
features | All features in the schema. |
funder | The funder for the grant or award. |
funding_reference_id | Internal (CDM) unique identifier for a specific source of funding -- e.g. a grant or award. From the Entity table: entity_id where entity_type == 'FundingReference'. |
funding_references | All funding references in the schema. |
gap_percent | The percentage of total scaffold length made up of gaps. |
gc_average | The average GC content of the contig collection, expressed as a percentage. |
gc_content | GC content of the contig, expressed as a percentage. |
gc_std | The standard deviation of GC content across the contig collection. |
given_name | The given name(s) of the contributor. |
gold_environmental_context_id | Internal (CDM) unique identifier for a GOLD environmental context. |
gold_environmental_contexts | All GOLD environmental contexts in the schema. |
grant_id | Code for the grant, assigned by the funder. |
grant_title | Title for the grant. |
grant_url | URL for the grant. |
gtdb_taxon_id | The GTDB taxon ID for this contig collection. |
has_stop_codon | Captures whether or not the sequence includes a stop codon. |
hash | A hash value generated from one or more object attributes that serves to ensure the entity is unique. |
id | An identifier for an element. Note that blank node IDs are not unique across databases. |
identifier | Fully-qualified URL or CURIE used as an identifier for an entity. |
identifiers | All identifiers in the schema. |
is_representative | Whether or not this member is the representative for the cluster. If 'is_representative' is false, it is assumed that this is a cluster member. |
is_seed | Whether or not this is the seed for this cluster. |
language | The human language in which the value is encoded, e.g. 'en'. |
latitude | The latitude portion of a geolocation. |
length | Length of the contig in bp. |
license | Usage license for the resource. Use one of the SPDX license identifiers or provide a link to the license text if no SPDX ID is available. |
license_id | Unique identifier for a license. |
licenses | All licenses in the schema. |
location | The location for this event. May be described in terms of coordinates. |
longitude | The longitude portion of a geolocation. |
maximum_numeric_value | The maximum value part, expressed as a number, of the quantity value when the value covers a range. |
measurement_id | Internal (CDM) unique identifier for a measurement. |
measurement_set_id | Internal (CDM) unique identifier for a specified variable in a specified experiment. |
measurement_sets | All measurement sets in the schema. |
measurements | All measurements in the schema. |
minimum_numeric_value | The minimum value part, expressed as a number, of the quantity value when the value covers a range. |
mixs_environmental_context_id | Internal (CDM) unique identifier for a MIxS environmental context. |
mixs_environmental_contexts | All MIxS environmental contexts in the schema. |
n_chromosomes | Total number of chromosomes. |
n_contigs | Total number of contigs. |
n_scaffolds | Total number of scaffolds. |
name | A string used as a name or title. |
name_cv_id | If the name is from a controlled vocabulary (CV), the CURIE of the controlled vocabulary term. |
names | All names in the schema. |
ncbi_taxon_id | The NCBI taxon ID for this contig collection. |
negated | If true, the relationship between the subject and object is negated. For example, consider an association where the subject is a protein ID, the object is the GO term for "glucose biosynthesis", and the predicate is "involved in". With the "negated" field set to false, the association is interpreted as " |
numeric_value | The numerical part of a quantity value. |
object | Note that the range of this slot is always a node. If the triple represents a literal, the "value" field will be populated instead. |
ordered_protocol_steps | All ordered protocol steps in the schema. |
p_value | The 'score' of the feature. The semantics of this field are ill-defined. P-values should be used for ab initio gene prediction features. |
parameter_id | Internal (CDM) unique identifier for a parameter of a protocol. |
parameter_type | Whether the parameter applies to the protocol input or output. |
parameters | All parameters in the schema. |
predicate | The predicate of the statement. |
prefix | A standardized prefix such as 'GO' or 'rdf' or 'FlyBase'. |
prefixes | The prefix mappings for the schema. |
primary_knowledge_source | The knowledge source or contributor that created the association. Should be a CDM ID from the Contributor or DataSource table. |
project_id | Internal (CDM) unique identifier for a project. From the Entity table: entity_id where entity_type == 'Project'. |
projects | All projects in the schema. |
protein_id | Internal (CDM) unique identifier for a protein. From the Entity table: entity_id where entity_type == 'Protein'. |
proteins | All proteins in the schema. |
protocol_execution_id | Internal CDM unique identifier for an execution of a protocol. |
protocol_executions | All protocol executions in the schema. |
protocol_id | Internal (CDM) unique identifier for a protocol. From the Entity table: entity_id where entity_type == 'Protocol'. |
protocol_input_id | Internal CDM unique identifier for the value of an input parameter for a protocol. |
protocol_input_set_id | Internal CDM unique identifier for a set of input parameter values for a protocol. |
protocol_input_sets | All protocol input sets in the schema. |
protocol_inputs | All protocol inputs in the schema. |
protocol_output_id | Internal CDM unique identifier for the value of an output of a protocol. |
protocol_outputs | All protocol outputs in the schema. |
protocol_step_id | Internal CDM unique identifier for a step in a protocol. |
protocol_steps | All protocol steps in the schema. |
protocol_variable_id | Internal CDM unique identifier for a variable in a protocol. |
protocol_variables | All protocol variables in the schema. |
protocols | All protocols in the schema. |
publication_id | Unique identifier for a publication - e.g. PMID, DOI, URL, etc. |
publications | All publications in the schema. |
publisher | The publisher of the resource. For a dataset, this is the repository where it is stored. |
quality | A quality score for measurement. |
raw_value | The value that was specified for an annotation in its raw form; e.g. "2 cm" or "2-4 cm". |
relationship | Relationship between this identifier and the entity in the entity_id field. If absent, it is assumed that the identifier represents the same entity in another data source. |
required | Whether or not this parameter must be supplied. |
resource_description_id | Unique identifier for a description for a resource. |
resource_descriptions | All resource descriptions in the schema. |
resource_title_id | Unique identifier for a title for a resource. |
resource_titles | All resource titles in the schema. |
resource_type | The broad type of the source data for this object. 'dataset' is currently the only valid value supported by this schema. |
sample_id | Internal (CDM) unique identifier for a sample. From the Entity table: entity_id where entity_type == 'Sample'. |
samples | All samples in the schema. |
scaffold_bp | Total size in bp of all scaffolds. |
scaffold_l50 | Given a set of scaffolds, the L50 is defined as the sequence length of the shortest scaffold at 50% of the total contig collection length. |
scaffold_l90 | The L90 statistic is less than or equal to the L50 statistic; it is the length for which the collection of all scaffolds of that length or longer contains at least 90% of the sum of the lengths of all scaffolds. |
scaffold_logsum | The sum of the (length*log(length)) of all scaffolds, times some constant. The score increases as contiguity increases. |
scaffold_maximum_length | Maximum scaffold length. |
scaffold_n50 | Given a set of scaffolds, each with its own length, the N50 count is defined as the smallest number of scaffolds whose length sum makes up half of contig collection size. |
scaffold_n90 | Given a set of scaffolds, each with its own length, the N90 count is defined as the smallest number of scaffolds whose length sum makes up 90% of contig collection size. |
scaffold_powersum | Powersum of all scaffolds is the same as logsum except that it uses the sum of (length*(length^P)) for some power P (default P=0.25). |
scaffolds_n_over_50K | The number of scaffolds longer than 50,000 base pairs. |
scaffolds_percent_over_50K | The percentage of the total assembly length represented by scaffolds longer than 50,000 base pairs. |
scaffolds_total_length_over_50k | The total length of scaffolds longer than 50,000 base pairs. |
score | Output from the clustering protocol indicating how closely a member matches the representative. |
sequence | The protein amino acid sequence. |
sequence_id | Internal (CDM) unique identifier for a sequence. From the Entity table: entity_id where entity_type == 'Sequence'. |
sequences | All sequences in the schema. |
source | The source for a specific piece of information; should be a CDM internal ID of a source in the DataSource table. |
source_database | ID of the data source from which this entity came. |
specific_ecosystem | JGI GOLD descriptor representing the most specific level of ecosystem categorization. May be "Unclassified". |
start | The start and end coordinates of the feature are given in positive 1-based integer coordinates, relative to the landmark (column one in GFF3). Start is always less than or equal to end. For features that cross the origin of a circular feature (e.g. most bacterial genomes, plasmids, and some viral genomes), the requirement for start to be less than or equal to end is satisfied by making end = the position of the end + the length of the landmark feature. For zero-length features, such as insertion sites, start equals end and the implied site is to the right of the indicated base in the direction of the landmark. |
statements | All statements in the schema. |
step | Text description of a step in a protocol. |
step_index | The number of the step in an ordered progression. |
strand | The strand of the feature. |
subject | The subject of the statement |
text_value | The value, as a text string. This field should only be used if the value is not represented in a controlled vocabulary, ontology, or enumeration. |
title | A string used as a title for a resource. |
title_type | A descriptor for the title for cases where the contents of the title field are not the primary name or title. |
type | The type of value being represented - e.g. QuantityValue, TextValue, DateTimeValue, ControlledVocabularyTermValue, etc. |
unit | The units used to measure the value of the variable, if applicable. Units should be expressed using the Unit Ontology or a term from UCUM. |
unit_cv_id | The unit of the quantity, expressed as a CURIE from the Unit Ontology. |
unit_cv_label | The unit of a quantity, expressed as the name of a term from the Unit Ontology or UCUM. |
unit_string | Links a QuantityValue to a unit. Units should be taken from the UCUM unit collection or the Unit Ontology. This field should only be used if the unit is not present in one of those sources. |
updated | Date/timestamp for when the entity was updated in the CDM. |
url | The URL from which the data was loaded. |
value | The range of this slot is always a string. Only used if the triple represents a literal assertion. |
value_cv_id | For values that are in a controlled vocabulary (CV), this attribute should capture the controlled vocabulary ID for the value. |
value_cv_label | For values that are in a controlled vocabulary, ontology, or enumeration, this attribute should capture the term from the controlled vocabulary. |
value_type | The type(s) of the value. |
variable_id | Internal CDM unique identifier for a variable. |
variable_value_id | Internal CDM unique identifier for a variable value. |
variable_values | All variable values in the schema. |
variables | All variables in the schema. |
version | The version of the resource. This must be an absolute version, not a relative version like 'latest'. |
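The assembly statistics and the `start` coordinate rule described in the slot table can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CDM's own loader code: all function names are hypothetical, the logsum uses the natural logarithm with a constant of 1 (the schema specifies neither), and it follows this schema's naming, where an N statistic is a scaffold count and an L statistic is a length.

```python
import math
from typing import Sequence, Tuple


def n_and_l(lengths: Sequence[int], fraction: float) -> Tuple[int, int]:
    """Return (N, L) for a fraction such as 0.5 (N50/L50) or 0.9 (N90/L90).

    Schema naming is assumed: N is a count of scaffolds, L is a length.
    """
    if not lengths or not 0 < fraction <= 1:
        raise ValueError("need at least one length and 0 < fraction <= 1")
    target = fraction * sum(lengths)
    running = 0
    # Add scaffolds from longest to shortest until the target is reached.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return count, length
    raise AssertionError("unreachable for 0 < fraction <= 1")


def logsum(lengths: Sequence[int], scale: float = 1.0) -> float:
    # Assumption: natural log and scale=1.0; the schema only says "a constant".
    return scale * sum(n * math.log(n) for n in lengths)


def powersum(lengths: Sequence[int], p: float = 0.25) -> float:
    # Sum of length * length^P, with the documented default P = 0.25.
    return sum(n * n ** p for n in lengths)


def over_50k(lengths: Sequence[int]) -> Tuple[int, int, float]:
    """Return (count, total length, percent of assembly) for scaffolds > 50,000 bp."""
    long_ones = [n for n in lengths if n > 50_000]
    total_long = sum(long_ones)
    return len(long_ones), total_long, 100.0 * total_long / sum(lengths)


def cdm_end_coordinate(start: int, end: int, landmark_length: int) -> int:
    """Return the end value to store for a feature on a circular landmark.

    start and end are 1-based, inclusive positions on the landmark. If the
    feature wraps past the origin (end < start), the landmark length is added
    to end so that start <= end holds. Zero-length features (start == end)
    are returned unchanged.
    """
    if not (1 <= start <= landmark_length and 1 <= end <= landmark_length):
        raise ValueError("coordinates must lie within the landmark")
    return end + landmark_length if end < start else end
```

For scaffold lengths `[100, 80, 50, 30, 20]`, `n_and_l(..., 0.5)` gives N50 = 2 and L50 = 80, and `n_and_l(..., 0.9)` gives N90 = 4 and L90 = 30, consistent with the note that L90 is less than or equal to L50. A feature running from position 950 through the origin to position 20 on a 1,000 bp circular contig would be stored with end = 1020.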
Enumerations
Enumeration | Description |
---|---|
AttributeValueType | |
CdsPhaseType | For features of type CDS (coding sequence), the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. |
ClusterType | The type of the entities in a cluster. Must be represented by a table in the CDM schema. |
ContigCollectionType | The type of the contig set; the type of the 'omics data set. Terms are taken from the Genomics Standards Consortium where possible. See the GSC checklists at https://genomicsstandardsconsortium.github.io/mixs/ for the controlled vocabularies used. |
ContributorRole | The type of contribution made by a contributor. |
ContributorType | The type of contributor being represented. |
DescriptionType | The type of text being represented. |
EntityType | The type of an entity. Must be represented by a table in the CDM schema. |
EventType | The type of date being represented. |
ProteinEvidenceForExistence | The evidence for the existence of a biological entity. See https://www.uniprot.org/help/protein_existence and https://www.ncbi.nlm.nih.gov/genbank/evidence/. |
ProtocolParameterType | An input, an operation parameter or switch, or an output for a protocol. |
RefSeqStatusType | RefSeq status codes, taken from https://www.ncbi.nlm.nih.gov/genbank/evidence/. |
RelationshipType | The relationship between two entities. For example, when a PermanentID class is used to represent objects in the CreditMetadata field `related_identifiers`, the `relationship_type` field captures the relationship between the resource being registered (A) and this ID (B). |
ResourceType | The type of resource being represented. |
SequenceType | The type of sequence being represented. |
StrandType | The strand that a feature appears on relative to a landmark. Also encompasses unknown or irrelevant strandedness. |
TitleType | The type of title being represented. |
VariableType | The type of the value of a variable. Should be a LinkML data type or one of the defined CDM data types. |
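The phase rule in `CdsPhaseType` can be sketched as follows: given the lengths of the CDS segments that make up one coding sequence, the phase of each segment follows from the length and phase of the segment before it. `cds_phases` is a hypothetical helper, not part of the CDM; it assumes the segment lengths are already listed in transcription order (descending coordinates for minus-strand features) and that the phase of the first segment is known (0 for a complete CDS).

```python
from typing import List, Sequence


def cds_phases(segment_lengths: Sequence[int], first_phase: int = 0) -> List[int]:
    """Return the phase (0, 1 or 2) of each CDS segment in transcription order.

    Phase is the number of bases to skip at the start of a segment to reach
    the first base of the next codon.
    """
    if first_phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if not segment_lengths:
        return []
    phases = [first_phase]
    for length in segment_lengths[:-1]:
        prev = phases[-1]
        # (length - prev) % 3 bases of a partial codon remain at the end of the
        # previous segment; the next segment must supply the rest of that codon.
        phases.append((3 - ((length - prev) % 3)) % 3)
    return phases
```

For segments of 10, 8 and 12 bases starting in phase 0, the phases are `[0, 2, 0]`: the first segment ends one base into a codon, so the second must skip two bases to reach the next codon boundary.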
Types
Type | Description |
---|---|
Boolean | A binary (true or false) value |
CdmAssociationId | A CDM ID (cdm_id) that identifies an association in the CDM. |
CdmClusterId | A CDM ID (cdm_id) that identifies a cluster in the CDM. |
CdmContigCollectionId | A CDM ID (cdm_id) that identifies a contig collection in the CDM. |
CdmContigId | A CDM ID (cdm_id) that identifies a contig in the CDM. |
CdmContributorId | A CDM ID (cdm_id) that identifies a contributor in the CDM. |
CdmDataSourceId | A CDM ID (cdm_id) that identifies a data source in the CDM. |
CdmEncodedFeatureId | A CDM ID (cdm_id) that identifies an encoded feature in the CDM. |
CdmFeatureId | A CDM ID (cdm_id) that identifies a feature in the CDM. |
CdmId | A universally unique ID, generated using uuid4, with the prefix "CDM:". Used internally in the CDM. |
CdmLicenseId | A CDM ID (cdm_id) that identifies a license in the CDM. |
CdmProteinId | A CDM ID (cdm_id) that identifies a protein in the CDM. |
CdmProtocolId | A CDM ID (cdm_id) that identifies a protocol in the CDM. |
CdmSampleId | A CDM ID (cdm_id) that identifies a sample in the CDM. |
CdmSequenceId | A CDM ID (cdm_id) that identifies a sequence in the CDM. |
Curie | a compact URI |
Date | a date (year, month and day) in an idealized calendar |
DateOrDatetime | Either a date or a datetime |
Datetime | The combination of a date and time |
Decimal | A real number with arbitrary precision that conforms to the xsd:decimal specification |
DecimalDegree | A decimal degree expresses latitude or longitude as decimal fractions. |
Double | A real number that conforms to the xsd:double specification |
Float | A real number that conforms to the xsd:float specification |
Integer | An integer |
Iso8601 | A date in ISO 8601 format, e.g. 2024-04-05T12:34:56Z. "Z" indicates UTC time. |
Jsonpath | A string encoding a JSON Path. The value of the string MUST conform to JSONPath syntax and SHOULD dereference to zero or more valid objects within the current instance document when encoded in tree form. |
Jsonpointer | A string encoding a JSON Pointer. The value of the string MUST conform to JSON Pointer syntax and SHOULD dereference to a valid object within the current instance document when encoded in tree form. |
LiteralAsStringType | A literal represented as a string. |
LocalCurie | A CURIE that exists as a subject in the statements table (i.e. `statements.subject`). Should not be used for external identifiers. |
LocalCurieName | The term name for an ontology term; should appear as an object in the statements table with a relationship indicating that it is the name of an ontology term. |
Ncname | The prefix part of a CURIE. |
NodeIdType | IDs are either CURIEs, IRIs, or blank nodes. IRIs are wrapped in angle brackets (`<>`) to distinguish them from CURIEs, but in general it is good practice to populate the prefixes table so that they are shortened to CURIEs. Blank nodes are IDs starting with `_:`. |
Nodeidentifier | A URI, CURIE or BNODE that represents a node in a model. |
Objectidentifier | A URI or CURIE that represents an object in the model. |
Sparqlpath | A string encoding a SPARQL Property Path. The value of the string MUST conform to SPARQL syntax and SHOULD dereference to zero or more valid objects within the current instance document when encoded as RDF. |
String | A character string |
Time | A time object represents a (local) time of day, independent of any particular day |
Uri | a complete URI |
Uriorcurie | a URI or a CURIE |
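A few of the types above can be illustrated with small Python helpers: generating and checking a `CdmId` (a uuid4 with the prefix "CDM:"), taking the prefix (`Ncname`) of a CURIE, and parsing an `Iso8601` timestamp with a trailing "Z". This is a sketch under assumptions, not the CDM's validation code: the helper names are hypothetical, and `is_cdm_id` assumes the UUID part is written in the canonical lowercase, hyphenated form that Python's `uuid` module produces.

```python
import uuid
from datetime import datetime

CDM_PREFIX = "CDM:"


def new_cdm_id() -> str:
    """Mint a CdmId: the "CDM:" prefix followed by a random (version 4) UUID."""
    return CDM_PREFIX + str(uuid.uuid4())


def is_cdm_id(value: str) -> bool:
    """Check the prefix, that the rest parses as a version 4 UUID, and canonical form."""
    if not value.startswith(CDM_PREFIX):
        return False
    text = value[len(CDM_PREFIX):]
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == text


def curie_prefix(curie: str) -> str:
    """Return the prefix part of a CURIE, e.g. "UO" for "UO:0000015"."""
    prefix, sep, local = curie.partition(":")
    if not (prefix and sep and local):
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing "Z" is read as UTC.

    "Z" is rewritten as "+00:00" because datetime.fromisoformat only accepts
    "Z" directly from Python 3.11 onwards.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

Because every `CdmId` starts with "CDM:", `curie_prefix(new_cdm_id())` returns `"CDM"`, so CDM identifiers can be handled by the same CURIE machinery as external IDs.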
Subsets
Subset | Description |
---|---|