
cdm_schema

Schema for KBase CDM

URI: http://kbase.github.io/cdm-schema/cdm_schema

Name: cdm_schema

Classes

Class Description
Any Used as a range for slots that have more than one possible type
Table root class for all schema entities
        Association An association between an object--typically an entity such as a protein or a ...
        AssociationXPublication Links associations to supporting literature
        AssociationXSupportingObject Links associations to entities to capture supporting objects in an associatio...
        AttributeValue A generic class for capturing tag-value information in a structured form
                Geolocation A normalized value for a location on the earth's surface
                QuantityValue A simple quantity, e
                        Measurement A qualitative or quantitative observation of an attribute of an object or eve...
                                ProcessedMeasurement A measurement that requires additional processing to generate a result
                TextValue A basic string value
        AttributeValueEntity Represents the link between an entity and its attribute values
        Cluster Represents an individual execution of a clustering protocol
        ClusterMember Relationship representing membership of a cluster
        Contig A contig (derived from the word "contiguous") is a set of DNA segments or seq...
        ContigXContigCollection Captures the relationship between a contig and a contig collection; equivalen...
        ContigXEncodedFeature Captures the relationship between a contig and an encoded feature
        ContigXFeature Captures the relationship between a contig and a feature; equivalent to featu...
        ContigXProtein Captures the relationship between a contig and a protein; equivalent to prote...
        ContigCollection A set of individual, overlapping contigs that represent the complete sequence...
        ContigCollectionXEncodedFeature Captures the relationship between a contig collection and an encoded feature
        ContigCollectionXFeature Captures the relationship between a contig collection and a feature; equivale...
        ContigCollectionXProtein Captures the relationship between a contig collection and a protein; equivale...
        Contributor Represents a contributor to the resource
        ContributorXRoleXExperiment
        ContributorXRoleXProject
        DataSource The source dataset from which data within the CDM was extracted
        EncodedFeature An entity generated from a feature, such as a transcript
        EncodedFeatureXFeature Captures the relationship between a feature and its transcription product
        EntailedEdge A relation graph edge that is inferred
        Entity A database entity
        EntityXMeasurement Captures a measurement made on an entity
        Event Something that happened
        Experiment A discrete scientific procedure undertaken to make a discovery, test a hypoth...
        ExperimentXProject Captures the relationship between an experiment and the project that it is a ...
        ExperimentXSample Represents the participation of a sample in an experiment
        Feature A feature localized to an interval along a contig
        FeatureXProtein Captures the relationship between a feature and a protein; equivalent to feat...
        GoldEnvironmentalContext Environmental context, described using JGI's five level system
        IdentifiedEntity Represents the link between an entity and its identifiers
        Identifier A string used as a resolvable (external) identifier for an entity
        MixsEnvironmentalContext Environmental context, described using the MIxS convention of broad and local...
        Name A string used as the name or label for an entity
        NamedEntity Represents the link between an entity and its names
        Prefix Maps CURIEs to URIs
        Project Administrative unit for collecting data related to a certain topic, location,...
        Protein Proteins are large, complex molecules made up of one or more long, folded cha...
        Protocol Defined method or set of methods
        ProtocolXProtocolParticipant
        ProtocolParticipant Either an input or an output of a protocol
        Publication A publication (e
        Sample A material entity that can be characterised by an experiment
        Sequence
        Statements Represents an RDF triple
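
The Prefix class maps CURIE prefixes to base URIs (the `prefix` and `base` slots listed under Slots). As a minimal sketch of how such a mapping might be applied, the Python below expands a CURIE by looking up its prefix and appending the local part to the base. The function name `expand_curie`, the in-memory dictionary standing in for Prefix rows, and the example GO base URI are illustrative assumptions, not part of the schema.

```python
# Hypothetical in-memory stand-in for rows of the Prefix table: prefix -> base.
# The GO entry is only an illustration of what a row might contain.
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE such as 'GO:0008150' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no ':' separator): {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    # The base URI is concatenated with the local identifier as-is.
    return prefix_map[prefix] + local_id


print(expand_curie("GO:0008150", PREFIX_MAP))
# prints http://purl.obolibrary.org/obo/GO_0008150
```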

Slots

Slot Description
aggregator_knowledge_source The knowledge source that aggregated the association
annotation_date The date when the annotation was made
asm_score A composite score for comparing contig collection quality
association_id Internal (CDM) unique identifier for an association
attribute_cv_term_id If the attribute is a term from a controlled vocabulary, the ID of the term
attribute_name The attribute being captured in this annotation
base The base URI a prefix will expand to
cds_phase For features of type CDS, the phase indicates where the next codon begins rel...
checkm2_completeness Estimate of the completeness of a contig collection (MAG or genome), estimate...
checkm2_contamination Estimate of the contamination of a contig collection (MAG or genome), estimat...
checksum The checksum of the sequence, used to verify its integrity
cluster_id Internal (CDM) unique identifier for a cluster
comments Any comments about the association
contig_bp Total size in bp of all contigs
contig_collection_id Internal (CDM) unique identifier for a contig collection
contig_collection_type The type of contig collection
contig_id Internal (CDM) unique identifier for a contig
contributor_id Internal (CDM) unique identifier for a contributor
contributor_role Role(s) played by the contributor when working on the experiment
contributor_type Must be either 'Person' or 'Organization'
created Date/timestamp for when the entity was created or added to the CDM
created_at The time at which the event started or was created
ctg_L50 Given a set of contigs, the L50 is defined as the sequence length of the shor...
ctg_L90 The L90 statistic is less than or equal to the L50 statistic; it is the lengt...
ctg_logsum The sum of the (length*log(length)) of all contigs, times some constant
ctg_max Maximum contig length
ctg_N50 Given a set of contigs, each with its own length, the N50 count is defined as...
ctg_N90 Given a set of contigs, each with its own length, the N90 count is defined as...
ctg_powsum Powersum of all contigs is the same as logsum except that it uses the sum of ...
data_source_created Date/timestamp for when the entity was created or added to the data source
data_source_entity_id The primary ID of the entity at the data source
data_source_id Internal (CDM) unique identifier for a data source
data_source_updated Date/timestamp for when the entity was updated in the data source
datatype The RDF datatype of the value, for example, xsd:string
date_accessed The date when the data was downloaded from the data source
description Brief textual definition or description
doi The DOI for a protocol
e_value The 'score' of the feature
ecosystem JGI GOLD descriptor representing the top level ecosystem categorization
ecosystem_category JGI GOLD descriptor representing the ecosystem category
ecosystem_subtype JGI GOLD descriptor representing the subtype of ecosystem
ecosystem_type JGI GOLD descriptor representing the ecosystem type
encoded_feature_id Internal (CDM) unique identifier for an encoded feature
end The start and end coordinates of the feature are given in positive 1-based in...
entity_id Internal (CDM) unique identifier for an entity
entity_type Type of entity being clustered
env_broad_scale Report the major environmental system the sample or specimen came from
env_local_scale Report the entity or entities which are in the sample or specimen's local vic...
env_medium Report the environmental material(s) immediately surrounding the sample or sp...
event_id Internal (CDM) unique identifier for an event
evidence_for_existence The evidence that this protein exists
evidence_type The type of evidence supporting the association
experiment_id Internal (CDM) unique identifier for an experiment
family_name The family name(s) of the contributor
feature_id Internal (CDM) unique identifier for a feature
gap_pct The gap size percentage of all scaffolds
gc_avg The average GC content of the contig collection, expressed as a percentage
gc_content GC content of the contig, expressed as a percentage
gc_std The standard deviation of GC content across the contig collection
given_name The given name(s) of the contributor
gold_environmental_context_id Internal (CDM) unique identifier for a GOLD environmental context
has_stop_codon Captures whether or not the sequence includes stop coordinates
hash A hash value generated from one or more object attributes that serves to ensu...
id An identifier for an element
identifier Fully-qualified URL or CURIE used as an identifier for an entity
is_representative Whether or not this member is the representative for the cluster
is_seed Whether or not this is the seed for this cluster
language the human language in which the value is encoded, e
latitude
length Length of the contig in bp
location The location for this event
longitude
maximum_value If the quantity describes a range, represents the upper bound of the range
measurement_id Internal (CDM) unique identifier for a measurement
minimum_value If the quantity describes a range, represents the lower bound of the range
mixs_environmental_context_id Internal (CDM) unique identifier for a MIxS environmental context
n_contigs Total number of contigs
n_scaffolds Total number of scaffolds
name A string used as a name or title
negated If true, the relationship between the subject and object is negated
object Note the range of this slot is always a node
p_value The 'score' of the feature
participant_type The type of participant in the protocol
predicate The predicate of the statement
prefix A standardized prefix such as 'GO' or 'rdf' or 'FlyBase'
primary_knowledge_source The knowledge source that created the association
project_id Internal (CDM) unique identifier for a project
protein_id Internal (CDM) unique identifier for a protein
protocol_id Internal (CDM) unique identifier for a protocol
protocol_participant_id The unique identifier for the protocol participant
publication_id Unique identifier for a publication - e
quality The quality of the measurement, indicating the confidence that one can have i...
raw_value Raw value from the source data
relationship Relationship between this identifier and the entity in the entity_id field
sample_id Internal (CDM) unique identifier for a sample
scaf_bp Total size in bp of all scaffolds
scaf_L50 Given a set of scaffolds, the L50 is defined as the sequence length of the sh...
scaf_L90 The L90 statistic is less than or equal to the L50 statistic; it is the lengt...
scaf_l_gt50k The total length of scaffolds longer than 50,000 base pairs
scaf_logsum The sum of the (length*log(length)) of all scaffolds, times some constant
scaf_max Maximum scaffold length
scaf_N50 Given a set of scaffolds, each with its own length, the N50 count is defined ...
scaf_N90 Given a set of scaffolds, each with its own length, the N90 count is defined ...
scaf_n_gt50K The number of scaffolds longer than 50,000 base pairs
scaf_pct_gt50K The percentage of the total assembly length represented by scaffolds longer t...
scaf_powsum Powersum of all scaffolds is the same as logsum except that it uses the sum o...
score Output from the clustering protocol indicating how closely a member matches t...
sequence The protein amino acid sequence
sequence_id Internal (CDM) unique identifier for a sequence
source The source for a specific piece of information; should be a CDM internal ID o...
source_database ID of the data source from which this entity came
specific_ecosystem JGI GOLD descriptor representing the most specific level of ecosystem categor...
start The start and end coordinates of the feature are given in positive 1-based in...
strand The strand of the feature
subject The subject of the statement
type The type of the entity
unit The unit of the quantity
updated Date/timestamp for when the entity was updated in the CDM
url The URL from which the data was loaded
value Note the range of this slot is always a string
value_cv_term_id If the term comes from the controlled vocabulary, the CURIE for the term
version For versioned data sources, the version of the dataset
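
The assembly statistics slots above (`ctg_L50`, `ctg_N50`, `ctg_L90`, `ctg_N90` and their `scaf_` counterparts) are described here as a sequence length (the L statistics) and a count (the N statistics). The descriptions are truncated on this page, so the following Python is only a sketch under that reading: sort lengths from longest to shortest, accumulate until the running total reaches the chosen fraction of the total length, then report the length of the last sequence added and the number of sequences used. The function name `length_and_count_at` and the sample lengths are hypothetical.

```python
def length_and_count_at(lengths: list[int], fraction: float) -> tuple[int, int]:
    """Return (L statistic, N statistic) for the given fraction of total length.

    Assumes the convention used on this page: the L value is a sequence
    length and the N value is a count of sequences.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable if fraction > 1; fall back to the shortest sequence.
    return ordered[-1], len(ordered)


contig_lengths = [80, 70, 50, 40, 30, 20]  # made-up example data
l50, n50 = length_and_count_at(contig_lengths, 0.5)
l90, n90 = length_and_count_at(contig_lengths, 0.9)
print(l50, n50)  # prints 70 2
print(l90, n90)  # prints 30 5
```

With these sample lengths the L90 value (30) is no larger than the L50 value (70), which agrees with the `ctg_L90` description.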

Enumerations

Enumeration Description
CdsPhaseType For features of type CDS (coding sequence), the phase indicates where the fea...
ClusterType The type of the entities in a cluster
ContigCollectionType The type of the contig set; the type of the 'omics data set
ContributorRole The role of a contributor to a resource
ContributorType The type of contributor being represented
EntityType The type of an entity
ProteinEvidenceForExistence The evidence for the existence of a biological entity
RefSeqStatusType RefSeq status codes, taken from https://www
SequenceType The type of sequence being represented
StrandType The strand that a feature appears on relative to a landmark

Types

Type Description
Boolean A binary (true or false) value
Curie A compact URI
DataSourceUuid A UUID that identifies a data source in the CDM
Date A date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double A real number that conforms to the xsd:double specification
Float A real number that conforms to the xsd:float specification
Integer An integer
Iso8601 A date in ISO 8601 format, e
Jsonpath A string encoding a JSON Path
Jsonpointer A string encoding a JSON Pointer
LiteralAsStringType
LocalCurie A CURIE that exists as a subject in the statements table (i
Ncname Prefix part of CURIE
NodeIdType IDs are either CURIEs, IRI, or blank nodes
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model
Objectidentifier A URI or CURIE that represents an object in the model
Sparqlpath A string encoding a SPARQL Property Path
String A character string
Time A time object represents a (local) time of day, independent of any particular...
Uri A complete URI
Uriorcurie A URI or a CURIE
UUID A universally unique ID, generated using uuid4, with the prefix "CDM:"
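
The UUID type above is described as a uuid4 value carrying the prefix "CDM:". A minimal sketch of generating and checking such an identifier with Python's standard `uuid` module follows. The helper names `new_cdm_uuid` and `is_cdm_uuid` are hypothetical, and the exact string form (lowercase, hyphenated) is an assumption rather than something this page specifies.

```python
import uuid

CDM_PREFIX = "CDM:"  # prefix taken from the UUID type description


def new_cdm_uuid() -> str:
    """Generate a uuid4 and prepend the CDM prefix."""
    return f"{CDM_PREFIX}{uuid.uuid4()}"


def is_cdm_uuid(value: str) -> bool:
    """Check that a string is the CDM prefix followed by a version 4 UUID."""
    if not value.startswith(CDM_PREFIX):
        return False
    try:
        parsed = uuid.UUID(value[len(CDM_PREFIX):])
    except ValueError:
        return False
    return parsed.version == 4


identifier = new_cdm_uuid()
print(identifier, is_cdm_uuid(identifier))
```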

Subsets

Subset Description