kbase_cdm

Schema for KBase CDM

URI: https://github.com/kbase/cdm-schema

Name: kbase_cdm

Classes

Class Description
FeatureAttributes Additional attributes of a feature, parsed from column 9 of a GFF file.
FeatureXProtein Captures the relationship between a feature and a protein; equivalent to feature encodes protein. An additional protocol ID may be specified to annotate the protocol used to elucidate or predict the relationship.
Identifier An external identifier for an entity. This should be a URI or CURIE.
MeasurementSet A series of qualitative or quantitative measurements.
Name The name or label for an entity. This may be a primary name, alternative name, synonym, acronym, or any other label used to refer to an entity.
NamedThing Abstract class to represent things with names.
        Event Something that happened.
        Experiment A discrete scientific procedure undertaken to make a discovery, test a hypothesis, or demonstrate a known fact.
        Measurement A qualitative or quantitative observation of an attribute of an object or event against a standardized scale, to enable it to be compared with other objects or events.
                ProcessedMeasurement A measurement that requires additional processing to generate a result.
        NamedThingWithId Abstract class to represent things with names and identifiers.
                Project Administrative unit for collecting data related to a certain topic, location, data type, grant funding, and so on.
                Protocol Defined method or set of methods.
                Sample A material entity that can be characterised by an experiment.
                UniqueNamedThing Represents an entity with a hash value generated from combining its unique attributes.
                        Contig A contig (derived from the word "contiguous") is a set of DNA segments or sequences that overlap in a way that provides a contiguous representation of a genomic region. A contig should not contain any gaps.
                        Contigset A set of individual, overlapping contigs that represent the complete sequenced genome of an organism.
                                Genome A contigset with a completeness score of greater than 90% and a contamination score of less than 5%.
                        EncodedFeature An entity generated from a feature, such as a transcript.
                        EnvironmentalContext The environmental context for the event.
                                GoldEnvironmentalContext Environmental context, described using JGI's five level system.
                                MixsEnvironmentalContext Environmental context, described using the MiXS convention of broad and local environment, plus the medium.
                        Feature A feature localized to an interval along a contig.
                        Protein Proteins are large, complex molecules made up of one or more long, folded chains of amino acids, whose sequences are determined by the DNA sequence of the protein-encoding gene.
        ProtocolParticipant Either an input or an output of a protocol.
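
The Genome definition above reduces to two threshold checks on a contigset. A minimal sketch, assuming both scores are stored as percentages in the `checkm2_completeness` and `checkm2_contamination` slots; the `ContigsetScores` class and the `is_genome` function are hypothetical names, not part of the schema:

```python
from dataclasses import dataclass


@dataclass
class ContigsetScores:
    """Hypothetical container for the two CheckM2 slots, as percentages."""
    checkm2_completeness: float
    checkm2_contamination: float


def is_genome(scores: ContigsetScores) -> bool:
    # Genome: completeness greater than 90% and contamination less than 5%.
    # Both comparisons are strict, matching the wording of the class description.
    return (
        scores.checkm2_completeness > 90.0
        and scores.checkm2_contamination < 5.0
    )


print(is_genome(ContigsetScores(97.2, 1.3)))  # True
print(is_genome(ContigsetScores(90.0, 1.3)))  # False: completeness is not above 90
```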

Slots

Slot Description
asm_score A composite score for comparing contigset quality
attribute_name The name of the attribute
attribute_value The value of the attribute
cds_phase For features of type CDS, the phase indicates where the next codon begins rel...
checkm2_completeness Estimate of the completeness of a contigset (MAG or genome), estimated by Che...
checkm2_contamination Estimate of the contamination of a contigset (MAG or genome), estimated by Ch...
conditions TBD
contig_bp Total size in bp of all contigs
contig_id Internal (CDM) unique identifier
contigset_id Internal (CDM) unique identifier
created_at The time at which the event started or was created
ctg_L50 Given a set of contigs, the L50 is defined as the sequence length of the shor...
ctg_L90 The L90 statistic is less than or equal to the L50 statistic; it is the lengt...
ctg_logsum The sum of the (length*log(length)) of all contigs, times some constant
ctg_max Maximum contig length
ctg_N50 Given a set of contigs, each with its own length, the N50 count is defined as...
ctg_N90 Given a set of contigs, each with its own length, the N90 count is defined as...
ctg_powsum Powersum of all contigs is the same as logsum except that it uses the sum of ...
description Definition or description of the entity
doi The DOI for a protocol
e_value The 'score' of the feature
ecosystem JGI GOLD descriptor representing the top level ecosystem categorization
ecosystem_category JGI GOLD descriptor representing the ecosystem category
ecosystem_subtype JGI GOLD descriptor representing the subtype of ecosystem
ecosystem_type JGI GOLD descriptor representing the ecosystem type
encoded_by The feature(s) that encode this protein
encoded_feature_id Internal (CDM) unique identifier
encodes Known or predicted transcription products from this feature
end The start and end coordinates of the feature are given in positive 1-based in...
entity_id Internal (CDM) unique identifier for the entity that has the identifiers
env_broad_scale Report the major environmental system the sample or specimen came from
env_local_scale Report the entity or entities which are in the sample or specimen's local vic...
env_medium Report the environmental material(s) immediately surrounding the sample or sp...
environmental_context_id The environmental context for this event
event_id Internal (CDM) unique identifier
evidence_for_existence The evidence that this protein exists
experiment_id Internal (CDM) unique identifier
feature_id Internal (CDM) unique identifier
gap_pct The gap size percentage of all scaffolds
gc_avg The average GC content of the contigset, expressed as a percentage
gc_content GC content of the contig, expressed as a percentage
gc_std The standard deviation of GC content across the contigset
generated_by The algorithm or procedure that generated the feature
gold_environmental_context_id Internal (CDM) unique identifier
has_participant Participants in an experiment
has_stop_codon Captures whether or not the sequence includes a stop codon
hash Unique string generated by combining the defining attributes of the class
identifier Fully qualified URL or CURIE used as an identifier for an entity
identifiers URIs or CURIEs used to refer to this entity
inputs The inputs for a protocol; may be software parameters, experimental reagents,...
length Length of the contig in bp
location The location for this event
measurement_id Internal (CDM) unique identifier
mixs_environmental_context_id Internal (CDM) unique identifier
n_contigs Total number of contigs
n_scaffolds Total number of scaffolds
name The string used as a name
names Names, alternative names, and synonyms for an entity
object The object in a subject-predicate-object statement
outputs The outputs of a protocol; may be physical entities, files, etc
p_value The 'score' of the feature
part_of The project to which this experiment belongs
predicate The predicate in a subject-predicate-object statement
processed_measurement_id Internal (CDM) unique identifier
project_id Internal (CDM) unique identifier
protein_id Internal (CDM) unique identifier for a protein
protocol_id Internal (CDM) unique identifier for a protocol
quality The quality of the measurement, indicating the confidence that one can have i...
sample_id Internal (CDM) unique identifier
scaf_bp Total size in bp of all scaffolds
scaf_L50 Given a set of scaffolds, the L50 is defined as the sequence length of the sh...
scaf_L90 The L90 statistic is less than or equal to the L50 statistic; it is the lengt...
scaf_l_gt50k The total length of scaffolds longer than 50,000 base pairs
scaf_logsum The sum of the (length*log(length)) of all scaffolds, times some constant
scaf_max Maximum scaffold length
scaf_N50 Given a set of scaffolds, each with its own length, the N50 count is defined ...
scaf_N90 Given a set of scaffolds, each with its own length, the N90 count is defined ...
scaf_n_gt50K The number of scaffolds longer than 50,000 base pairs
scaf_pct_gt50K The percentage of the total assembly length represented by scaffolds longer t...
scaf_powsum Powersum of all scaffolds is the same as logsum except that it uses the sum o...
sequence The protein amino acid sequence
source The data source for the identifier
source_database ID of the data source from which this entity came
source_protocol ID of the protocol used to generate the feature
specific_ecosystem JGI GOLD descriptor representing the most specific level of ecosystem categor...
start The start and end coordinates of the feature are given in positive 1-based in...
strand The strand of the feature
subject The subject in a subject-predicate-object statement
type The type of the entity
unit Units used in the measurement
url The URL for a protocol
value Value of the measurement
version The version of the protocol that has been used
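
The contig and scaffold statistics above follow the convention in which L50 is a sequence length and N50 is a count. The slot descriptions are truncated on this page, so the following is only a sketch. It assumes the threshold is reached when the running total of lengths, taken longest first, first meets or exceeds the given fraction of the total length. `nx_lx` is a hypothetical helper, not part of the schema:

```python
def nx_lx(lengths, fraction=0.5):
    """Return (count, length) at the given fraction of the total length.

    count  -> number of sequences needed, longest first (N50-style count)
    length -> length of the shortest sequence in that group (L50-style length)
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    # Only reachable if rounding leaves the running total just short of the target.
    return len(ordered), ordered[-1]


contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # total length: 300
print(nx_lx(contig_lengths, 0.5))  # (2, 70)
print(nx_lx(contig_lengths, 0.9))  # (5, 30)
```

In this example the 90% length (30) is no larger than the 50% length (70), which agrees with the statement that L90 is less than or equal to L50.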

Enumerations

Enumeration Description
CdsPhaseType Descr
EvidenceForExistence The evidence for the existence of a biological entity
StrandType The strand that a feature appears on relative to a landmark

Types

Type Description
Boolean A binary (true or false) value
Curie a compact URI
Date a date (year, month and day) in an idealized calendar
DateOrDatetime Either a date or a datetime
Datetime The combination of a date and time
Decimal A real number with arbitrary precision that conforms to the xsd:decimal speci...
Double A real number that conforms to the xsd:double specification
Float A real number that conforms to the xsd:float specification
Integer An integer
Jsonpath A string encoding a JSON Path
Jsonpointer A string encoding a JSON Pointer
Ncname Prefix part of CURIE
Nodeidentifier A URI, CURIE or BNODE that represents a node in a model
Objectidentifier A URI or CURIE that represents an object in the model
Sparqlpath A string encoding a SPARQL Property Path
String A character string
Time A time object represents a (local) time of day, independent of any particular...
Uri a complete URI
Uriorcurie a URI or a CURIE
UUID A universally unique ID, generated using uuid4
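
The Curie, Ncname, and Uriorcurie types relate as follows: a CURIE is a prefix (an NCName) and a local reference joined by a colon, and it expands to a full URI through a prefix map. A minimal sketch; the contents of `PREFIX_MAP` and the `expand_curie` helper are illustrative assumptions, not taken from the schema:

```python
PREFIX_MAP = {
    # Illustrative prefix only; the schema's real prefix map is not shown here.
    "example": "https://example.org/",
}


def expand_curie(value: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand a CURIE to a URI; return the value unchanged if no prefix matches."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + reference
    # Full URIs (whose scheme is not a known prefix) pass through untouched.
    return value


print(expand_curie("example:12345"))
print(expand_curie("https://github.com/kbase/cdm-schema"))
```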

Subsets

Subset Description