cdm_schema

Schema for KBase CDM

URI: http://kbase.github.io/cdm-schema/cdm_schema

Name: cdm_schema

Classes

Class	Description
Any	Used as a range for slots that have more than one possible type
Table	Root class for all schema entities
Association	An association between an object--typically an entity such as a protein or a ...
AssociationXPublication	Links associations to supporting literature
AssociationXSupportingObject	Links associations to entities to capture supporting objects in an associatio...
AttributeValue	A generic class for capturing tag-value information in a structured form
Geolocation	A normalized value for a location on the Earth's surface
QuantityValue	A simple quantity, e
Measurement	A qualitative or quantitative observation of an attribute of an object or eve...
ProcessedMeasurement	A measurement that requires additional processing to generate a result
TextValue	A basic string value
AttributeValueEntity	Represents the link between an entity and its attribute values
Cluster	Represents an individual execution of a clustering protocol
ClusterMember	Relationship representing membership of a cluster
Contig	A contig (derived from the word "contiguous") is a set of DNA segments or seq...
ContigXContigCollection	Captures the relationship between a contig and a contig collection; equivalen...
ContigXEncodedFeature	Captures the relationship between a contig and an encoded feature
ContigXFeature	Captures the relationship between a contig and a feature; equivalent to featu...
ContigXProtein	Captures the relationship between a contig and a protein; equivalent to prote...
ContigCollection	A set of individual, overlapping contigs that represent the complete sequence...
ContigCollectionXEncodedFeature	Captures the relationship between a contig collection and an encoded feature
ContigCollectionXFeature	Captures the relationship between a contig collection and a feature; equivale...
ContigCollectionXProtein	Captures the relationship between a contig collection and a protein; equivale...
Contributor	Represents a contributor to the resource
ContributorXRoleXExperiment
ContributorXRoleXProject
DataSource	The source dataset from which data within the CDM was extracted
EncodedFeature	An entity generated from a feature, such as a transcript
EncodedFeatureXFeature	Captures the relationship between a feature and its transcription product
EntailedEdge	A relation graph edge that is inferred
Entity	A database entity
EntityXMeasurement	Captures a measurement made on an entity
Event	Something that happened
Experiment	A discrete scientific procedure undertaken to make a discovery, test a hypoth...
ExperimentXProject	Captures the relationship between an experiment and the project that it is a ...
ExperimentXSample	Represents the participation of a sample in an experiment
Feature	A feature localized to an interval along a contig
FeatureXProtein	Captures the relationship between a feature and a protein; equivalent to feat...
GoldEnvironmentalContext	Environmental context, described using JGI's five level system
IdentifiedEntity	Represents the link between an entity and its identifiers
Identifier	A string used as a resolvable (external) identifier for an entity
MixsEnvironmentalContext	Environmental context, described using the MIxS convention of broad and local...
Name	A string used as the name or label for an entity
NamedEntity	Represents the link between an entity and its names
Prefix	Maps CURIEs to URIs
Project	Administrative unit for collecting data related to a certain topic, location,...
Protein	Proteins are large, complex molecules made up of one or more long, folded cha...
Protocol	Defined method or set of methods
ProtocolXProtocolParticipant
ProtocolParticipant	Either an input or an output of a protocol
Publication	A publication (e
Sample	A material entity that can be characterised by an experiment
Sequence
Statements	Represents an RDF triple
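The Prefix class is described as mapping CURIEs to URIs, with the `prefix` and `base` slots holding the short prefix and the base URI it expands to. A minimal sketch of that expansion follows. The function name `expand_curie` and the in-memory `prefix_rows` list are hypothetical and stand in for however the Prefix table is actually stored and queried.

```python
def expand_curie(curie: str, prefix_map: dict[str, str]) -> str:
    """Expand a CURIE such as 'GO:0008150' into a full URI.

    prefix_map maps a prefix (e.g. 'GO') to its base URI.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


# Hypothetical (prefix, base) rows, as they might appear in a Prefix table.
prefix_rows = [
    ("GO", "http://purl.obolibrary.org/obo/GO_"),
    ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
]
prefix_map = dict(prefix_rows)

print(expand_curie("GO:0008150", prefix_map))
```

Only the first colon is treated as the separator, so a local part that itself contains a colon is left intact.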

Slots

Slot	Description
aggregator_knowledge_source	The knowledge source that aggregated the association
annotation_date	The date when the annotation was made
asm_score	A composite score for comparing contig collection quality
association_id	Internal (CDM) unique identifier for an association
attribute_cv_term_id	If the attribute is a term from a controlled vocabulary, the ID of the term
attribute_name	The attribute being captured in this annotation
base	The base URI a prefix will expand to
cds_phase	For features of type CDS, the phase indicates where the next codon begins rel...
checkm2_completeness	Estimate of the completeness of a contig collection (MAG or genome), estimate...
checkm2_contamination	Estimate of the contamination of a contig collection (MAG or genome), estimat...
checksum	The checksum of the sequence, used to verify its integrity
cluster_id	Internal (CDM) unique identifier for a cluster
comments	Any comments about the association
contig_bp	Total size in bp of all contigs
contig_collection_id	Internal (CDM) unique identifier for a contig collection
contig_collection_type	The type of contig collection
contig_id	Internal (CDM) unique identifier for a contig
contributor_id	Internal (CDM) unique identifier for a contributor
contributor_role	Role(s) played by the contributor when working on the experiment
contributor_type	Must be either 'Person' or 'Organization'
created	Date/timestamp for when the entity was created or added to the CDM
created_at	The time at which the event started or was created
ctg_L50	Given a set of contigs, the L50 is defined as the sequence length of the shor...
ctg_L90	The L90 statistic is less than or equal to the L50 statistic; it is the lengt...
ctg_logsum	The sum of the (length*log(length)) of all contigs, times some constant
ctg_max	Maximum contig length
ctg_N50	Given a set of contigs, each with its own length, the N50 count is defined as...
ctg_N90	Given a set of contigs, each with its own length, the N90 count is defined as...
ctg_powsum	Powersum of all contigs is the same as logsum except that it uses the sum of ...
data_source_created	Date/timestamp for when the entity was created or added to the data source
data_source_entity_id	The primary ID of the entity at the data source
data_source_id	Internal (CDM) unique identifier for a data source
data_source_updated	Date/timestamp for when the entity was updated in the data source
datatype	The RDF datatype of the value, for example, xsd:string
date_accessed	The date when the data was downloaded from the data source
description	Brief textual definition or description
doi	The DOI for a protocol
e_value	The 'score' of the feature
ecosystem	JGI GOLD descriptor representing the top level ecosystem categorization
ecosystem_category	JGI GOLD descriptor representing the ecosystem category
ecosystem_subtype	JGI GOLD descriptor representing the subtype of ecosystem
ecosystem_type	JGI GOLD descriptor representing the ecosystem type
encoded_feature_id	Internal (CDM) unique identifier for an encoded feature
end	The start and end coordinates of the feature are given in positive 1-based in...
entity_id	Internal (CDM) unique identifier for an entity
entity_type	Type of entity being clustered
env_broad_scale	Report the major environmental system the sample or specimen came from
env_local_scale	Report the entity or entities which are in the sample or specimen's local vic...
env_medium	Report the environmental material(s) immediately surrounding the sample or sp...
event_id	Internal (CDM) unique identifier for an event
evidence_for_existence	The evidence that this protein exists
evidence_type	The type of evidence supporting the association
experiment_id	Internal (CDM) unique identifier for an experiment
family_name	The family name(s) of the contributor
feature_id	Internal (CDM) unique identifier for a feature
gap_pct	The gap size percentage of all scaffolds
gc_avg	The average GC content of the contig collection, expressed as a percentage
gc_content	GC content of the contig, expressed as a percentage
gc_std	The standard deviation of GC content across the contig collection
given_name	The given name(s) of the contributor
gold_environmental_context_id	Internal (CDM) unique identifier for a GOLD environmental context
has_stop_codon	Captures whether or not the sequence includes stop coordinates
hash	A hash value generated from one or more object attributes that serves to ensu...
id	An identifier for an element
identifier	Fully-qualified URL or CURIE used as an identifier for an entity
is_representative	Whether or not this member is the representative for the cluster
is_seed	Whether or not this is the seed for this cluster
language	The human language in which the value is encoded, e
latitude
length	Length of the contig in bp
location	The location for this event
longitude
maximum_value	If the quantity describes a range, represents the upper bound of the range
measurement_id	Internal (CDM) unique identifier for a measurement
minimum_value	If the quantity describes a range, represents the lower bound of the range
mixs_environmental_context_id	Internal (CDM) unique identifier for a MIxS environmental context
n_contigs	Total number of contigs
n_scaffolds	Total number of scaffolds
name	A string used as a name or title
negated	If true, the relationship between the subject and object is negated
object	Note the range of this slot is always a node
p_value	The 'score' of the feature
participant_type	The type of participant in the protocol
predicate	The predicate of the statement
prefix	A standardized prefix such as 'GO' or 'rdf' or 'FlyBase'
primary_knowledge_source	The knowledge source that created the association
project_id	Internal (CDM) unique identifier for a project
protein_id	Internal (CDM) unique identifier for a protein
protocol_id	Internal (CDM) unique identifier for a protocol
protocol_participant_id	The unique identifier for the protocol participant
publication_id	Unique identifier for a publication - e
quality	The quality of the measurement, indicating the confidence that one can have i...
raw_value	Raw value from the source data
relationship	Relationship between this identifier and the entity in the `entity_id` field
sample_id	Internal (CDM) unique identifier for a sample
scaf_bp	Total size in bp of all scaffolds
scaf_L50	Given a set of scaffolds, the L50 is defined as the sequence length of the sh...
scaf_L90	The L90 statistic is less than or equal to the L50 statistic; it is the lengt...
scaf_l_gt50k	The total length of scaffolds longer than 50,000 base pairs
scaf_logsum	The sum of the (length*log(length)) of all scaffolds, times some constant
scaf_max	Maximum scaffold length
scaf_N50	Given a set of scaffolds, each with its own length, the N50 count is defined ...
scaf_N90	Given a set of scaffolds, each with its own length, the N90 count is defined ...
scaf_n_gt50K	The number of scaffolds longer than 50,000 base pairs
scaf_pct_gt50K	The percentage of the total assembly length represented by scaffolds longer t...
scaf_powsum	Powersum of all scaffolds is the same as logsum except that it uses the sum o...
score	Output from the clustering protocol indicating how closely a member matches t...
sequence	The protein amino acid sequence
sequence_id	Internal (CDM) unique identifier for a sequence
source	The source for a specific piece of information; should be a CDM internal ID o...
source_database	ID of the data source from which this entity came
specific_ecosystem	JGI GOLD descriptor representing the most specific level of ecosystem categor...
start	The start and end coordinates of the feature are given in positive 1-based in...
strand	The strand of the feature
subject	The subject of the statement
type	The type of the entity
unit	The unit of the quantity
updated	Date/timestamp for when the entity was updated in the CDM
url	The URL from which the data was loaded
value	Note the range of this slot is always a string
value_cv_term_id	If the term comes from the controlled vocabulary, the CURIE for the term
version	For versioned data sources, the version of the dataset
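Several of the contig statistics slots (`contig_bp`, `n_contigs`, `ctg_max`, `ctg_L50`, `ctg_N50`, `ctg_L90`, `ctg_N90`, `gc_content`) can be illustrated with a short sketch. The slot descriptions above are truncated, so the exact definitions used here are an assumption. Following the wording of those descriptions, the L statistics are treated as lengths and the N statistics as counts, which is the reverse of the more common usage. The helper names are hypothetical and not part of the schema.

```python
def length_and_count_at(lengths, fraction):
    """Walk contigs from longest to shortest until `fraction` of the
    total length is covered.

    Returns (length of the last contig added, number of contigs added).
    Assumption: the first value corresponds to ctg_L50 / ctg_L90 and the
    second to ctg_N50 / ctg_N90, per the slot descriptions.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


def gc_percent(sequence: str) -> float:
    """GC content of a sequence, expressed as a percentage."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


contig_lengths = [100, 80, 60, 40, 20]  # made-up example data
stats = {
    "contig_bp": sum(contig_lengths),
    "n_contigs": len(contig_lengths),
    "ctg_max": max(contig_lengths),
}
stats["ctg_L50"], stats["ctg_N50"] = length_and_count_at(contig_lengths, 0.5)
stats["ctg_L90"], stats["ctg_N90"] = length_and_count_at(contig_lengths, 0.9)
print(stats)
```

With this example data the total length is 300 bp, so the 50% threshold is reached after the two longest contigs and the 90% threshold after four.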

Enumerations

Enumeration	Description
CdsPhaseType	For features of type CDS (coding sequence), the phase indicates where the fea...
ClusterType	The type of the entities in a cluster
ContigCollectionType	The type of the contig set; the type of the 'omics data set
ContributorRole	The role of a contributor to a resource
ContributorType	The type of contributor being represented
EntityType	The type of an entity
ProteinEvidenceForExistence	The evidence for the existence of a biological entity
RefSeqStatusType	RefSeq status codes, taken from https://www
SequenceType	The type of sequence being represented
StrandType	The strand that a feature appears on relative to a landmark

Types

Type	Description
Boolean	A binary (true or false) value
Curie	A compact URI
DataSourceUuid	A UUID that identifies a data source in the CDM
Date	A date (year, month and day) in an idealized calendar
DateOrDatetime	Either a date or a datetime
Datetime	The combination of a date and time
Decimal	A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double	A real number that conforms to the xsd:double specification
Float	A real number that conforms to the xsd:float specification
Integer	An integer
Iso8601	A date in ISO 8601 format, e
Jsonpath	A string encoding a JSON Path
Jsonpointer	A string encoding a JSON Pointer
LiteralAsStringType
LocalCurie	A CURIE that exists as a subject in the `statements` table (i
Ncname	Prefix part of CURIE
NodeIdType	IDs are either CURIEs, IRI, or blank nodes
Nodeidentifier	A URI, CURIE or BNODE that represents a node in a model
Objectidentifier	A URI or CURIE that represents an object in the model
Sparqlpath	A string encoding a SPARQL Property Path
String	A character string
Time	A time object represents a (local) time of day, independent of any particular...
Uri	A complete URI
Uriorcurie	A URI or a CURIE
UUID	A universally unique ID, generated using uuid4, with the prefix "CDM:"
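The UUID type is described as a uuid4 value carrying the prefix "CDM:". A minimal sketch of generating and checking such an identifier with the Python standard library follows. The function names are hypothetical, and the assumption that the prefix is joined directly to the hyphenated UUID string (with no extra separator) is not confirmed by the schema description.

```python
import uuid

CDM_PREFIX = "CDM:"


def new_cdm_id() -> str:
    """Return a new identifier of the assumed form 'CDM:<uuid4>'."""
    return f"{CDM_PREFIX}{uuid.uuid4()}"


def is_cdm_id(value: str) -> bool:
    """Check that a value has the CDM prefix followed by a version 4 UUID."""
    if not value.startswith(CDM_PREFIX):
        return False
    try:
        parsed = uuid.UUID(value[len(CDM_PREFIX):])
    except ValueError:
        return False
    return parsed.version == 4


example_id = new_cdm_id()
print(example_id, is_cdm_id(example_id))
```

Note that `uuid.UUID` is lenient about input formatting (for example, it accepts strings without hyphens), so a stricter check would be needed if one canonical form has to be enforced.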

Subsets

Subset	Description