Class: Contigset
A set of individual, overlapping contigs that represent the complete sequenced genome of an organism.
URI: kb_cdm:Contigset
classDiagram
class Contigset
click Contigset href "../Contigset"
UniqueNamedThing <|-- Contigset
click UniqueNamedThing href "../UniqueNamedThing"
Contigset <|-- Genome
click Genome href "../Genome"
Contigset : asm_score
Contigset : checkm2_completeness
Contigset : checkm2_contamination
Contigset : contig_bp
Contigset : contigset_id
Contigset : ctg_L50
Contigset : ctg_L90
Contigset : ctg_logsum
Contigset : ctg_max
Contigset : ctg_N50
Contigset : ctg_N90
Contigset : ctg_powsum
Contigset : description
Contigset : gap_pct
Contigset : gc_avg
Contigset : gc_std
Contigset : hash
Contigset : identifiers
Contigset --> "*" Identifier : identifiers
click Identifier href "../Identifier"
Contigset : n_contigs
Contigset : n_scaffolds
Contigset : names
Contigset --> "*" Name : names
click Name href "../Name"
Contigset : scaf_bp
Contigset : scaf_L50
Contigset : scaf_L90
Contigset : scaf_l_gt50k
Contigset : scaf_logsum
Contigset : scaf_max
Contigset : scaf_N50
Contigset : scaf_N90
Contigset : scaf_n_gt50K
Contigset : scaf_pct_gt50K
Contigset : scaf_powsum
Inheritance
- NamedThing
    - NamedThingWithId
        - UniqueNamedThing
            - Contigset
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
asm_score | 0..1 Float | A composite score for comparing contigset quality | direct |
checkm2_completeness | 0..1 Float | Estimate of the completeness of a contigset (MAG or genome), estimated by Che... | direct |
checkm2_contamination | 0..1 Float | Estimate of the contamination of a contigset (MAG or genome), estimated by Ch... | direct |
contigset_id | 1 UUID | Internal (CDM) unique identifier | direct |
contig_bp | 0..1 Integer | Total size in bp of all contigs | direct |
ctg_L50 | 0..1 Integer | Given a set of contigs, the L50 is defined as the sequence length of the shor... | direct |
ctg_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
ctg_N50 | 0..1 Integer | Given a set of contigs, each with its own length, the N50 count is defined as... | direct |
ctg_N90 | 0..1 Integer | Given a set of contigs, each with its own length, the N90 count is defined as... | direct |
ctg_logsum | 0..1 Float | The sum of the (length*log(length)) of all contigs, times some constant | direct |
ctg_max | 0..1 Integer | Maximum contig length | direct |
ctg_powsum | 0..1 Float | Powersum of all contigs is the same as logsum except that it uses the sum of ... | direct |
gap_pct | 0..1 Float | The gap size percentage of all scaffolds | direct |
gc_avg | 0..1 Float | The average GC content of the contigset, expressed as a percentage | direct |
gc_std | 0..1 Float | The standard deviation of GC content across the contigset | direct |
n_contigs | 0..1 Integer | Total number of contigs | direct |
n_scaffolds | 0..1 Integer | Total number of scaffolds | direct |
scaf_L50 | 0..1 Integer | Given a set of scaffolds, the L50 is defined as the sequence length of the sh... | direct |
scaf_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
scaf_N50 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N50 count is defined ... | direct |
scaf_N90 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N90 count is defined ... | direct |
scaf_bp | 0..1 Integer | Total size in bp of all scaffolds | direct |
scaf_l_gt50k | 0..1 Integer | The total length of scaffolds longer than 50,000 base pairs | direct |
scaf_logsum | 0..1 Float | The sum of the (length*log(length)) of all scaffolds, times some constant | direct |
scaf_max | 0..1 Integer | Maximum scaffold length | direct |
scaf_n_gt50K | 0..1 Integer | The number of scaffolds longer than 50,000 base pairs | direct |
scaf_pct_gt50K | 0..1 Float | The percentage of the total assembly length represented by scaffolds longer t... | direct |
scaf_powsum | 0..1 Float | Powersum of all scaffolds is the same as logsum except that it uses the sum o... | direct |
hash | 0..1 String | A hash value generated from one or more object attributes that serves to ensu... | UniqueNamedThing |
identifiers | * Identifier | URIs or CURIEs used to refer to this entity | NamedThingWithId |
description | 0..1 String | Definition or description of the entity | NamedThing |
names | * Name | Names, alternative names, and synonyms for an entity | NamedThing |
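The L50/L90 and N50/N90 slots can be illustrated with a short Python sketch. It follows the slot descriptions on this page, where the L statistics are sequence lengths and the N statistics are sequence counts; other tools and papers often use the opposite naming, so it is worth checking which convention a data source follows. The function name `length_and_count_at` and the sample lengths are hypothetical and are not part of the schema.

```python
from typing import Sequence, Tuple


def length_and_count_at(lengths: Sequence[int], percent: int) -> Tuple[int, int]:
    """Return (L, N) at the given percentage of total length.

    Following the slot descriptions on this page:
      L = length of the shortest sequence in the smallest set of longest
          sequences whose combined length reaches `percent` of the total
      N = number of sequences in that set
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")

    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: the full set always reaches 100%")


contig_lengths = [100, 80, 60, 40, 20]  # hypothetical contig lengths in bp
ctg_L50, ctg_N50 = length_and_count_at(contig_lengths, 50)
ctg_L90, ctg_N90 = length_and_count_at(contig_lengths, 90)
print(ctg_L50, ctg_N50)  # 80 2
print(ctg_L90, ctg_N90)  # 40 4
```

With these lengths the total is 300 bp. The two longest contigs (100 + 80) are the first to reach 150 bp, giving L50 = 80 and N50 = 2. Four contigs are needed to reach 270 bp, giving L90 = 40 and N90 = 4, consistent with the statement that L90 is less than or equal to L50.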
Aliases
- genome
- biological subject
- assembly
Identifier and Mapping Information
Schema Source
- from schema: https://github.com/kbase/cdm-schema
Mappings
Mapping Type | Mapped Value |
---|---|
self | kb_cdm:Contigset |
native | kb_cdm:Contigset |
LinkML Source
Direct
name: Contigset
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: https://github.com/kbase/cdm-schema
aliases:
- genome
- biological subject
- assembly
is_a: UniqueNamedThing
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contigset quality
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  contigset_id:
    name: contigset_id
    description: Internal (CDM) unique identifier.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    identifier: true
    domain_of:
    - Contigset
    range: UUID
    required: true
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contigset
      size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contigset, expressed as a percentage
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contigset
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
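The logsum, powsum, and 50,000 bp threshold slots defined above can be sketched in Python as follows. The schema leaves the constant and the logarithm base unspecified, so the `scale` parameter (defaulting to 1.0) and the use of the natural logarithm are assumptions made here for illustration only. The function names and sample lengths are likewise hypothetical.

```python
import math
from typing import Dict, Sequence, Union


def logsum(lengths: Sequence[int], scale: float = 1.0) -> float:
    """scale * sum(length * log(length)).

    Natural log and scale=1.0 are assumptions; lengths must be positive.
    """
    return scale * sum(n * math.log(n) for n in lengths)


def powsum(lengths: Sequence[int], power: float = 0.25, scale: float = 1.0) -> float:
    """scale * sum(length * length**power); power=0.25 is the documented default."""
    return scale * sum(n * n ** power for n in lengths)


def gt50k_stats(
    scaffold_lengths: Sequence[int], threshold: int = 50_000
) -> Dict[str, Union[int, float]]:
    """Count, total length, and percentage for scaffolds strictly longer than threshold."""
    long_scaffolds = [n for n in scaffold_lengths if n > threshold]
    total = sum(scaffold_lengths)
    long_total = sum(long_scaffolds)
    return {
        "scaf_n_gt50K": len(long_scaffolds),
        "scaf_l_gt50k": long_total,
        "scaf_pct_gt50K": 100.0 * long_total / total if total else 0.0,
    }


# Two hypothetical assemblies with the same total size (200,000 bp).
fragmented = [50_000, 50_000, 50_000, 50_000]
merged = [200_000]

# Both scores are higher for the more contiguous assembly.
print(logsum(merged) > logsum(fragmented))  # True
print(powsum(merged) > powsum(fragmented))  # True

print(gt50k_stats([120_000, 60_000, 20_000]))
```

Because each length is weighted by a growing function of itself, joining sequences raises both scores even when the total number of base pairs is unchanged. In the last call, two of the three scaffolds exceed 50,000 bp and together hold 180,000 of 200,000 bp, so the percentage is 90.0.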
Induced
name: Contigset
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: https://github.com/kbase/cdm-schema
aliases:
- genome
- biological subject
- assembly
is_a: UniqueNamedThing
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contigset quality
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: asm_score
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: checkm2_completeness
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: checkm2_contamination
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  contigset_id:
    name: contigset_id
    description: Internal (CDM) unique identifier.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    identifier: true
    alias: contigset_id
    owner: Contigset
    domain_of:
    - Contigset
    range: UUID
    required: true
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: contig_bp
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_L50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_L90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_N50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contigset
      size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_N90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_logsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_max
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_powsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gap_pct
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contigset, expressed as a percentage
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gc_avg
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contigset
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gc_std
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: n_contigs
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: n_scaffolds
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_L50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_L90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_N50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_N90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_bp
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_l_gt50k
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_logsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_max
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_n_gt50K
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_pct_gt50K
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_powsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  hash:
    name: hash
    description: A hash value generated from one or more object attributes that serves
      to ensure the entity is unique.
    from_schema: https://github.com/kbase/cdm-schema
    alias: hash
    owner: Contigset
    domain_of:
    - UniqueNamedThing
    range: string
  identifiers:
    name: identifiers
    description: URIs or CURIEs used to refer to this entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: identifiers
    owner: Contigset
    domain_of:
    - NamedThingWithId
    range: Identifier
    multivalued: true
  description:
    name: description
    description: Definition or description of the entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: description
    owner: Contigset
    domain_of:
    - NamedThing
    - Event
    - Experiment
    - Identifier
    - Name
    - Project
    - Protein
    - Sample
    range: string
  names:
    name: names
    description: Names, alternative names, and synonyms for an entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: names
    owner: Contigset
    domain_of:
    - NamedThing
    range: Name
    multivalued: true
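As a rough illustration of how the induced slots fit together, the Python sketch below builds a Contigset-like record, derives a `hash` value from selected attributes, and applies a few checks taken from the ranges above. It is not the CDM implementation: the schema does not say which attributes feed the hash or which algorithm is used, so SHA-256 over a JSON serialization is an assumption, and `content_hash`, `check_contigset`, and the slot subsets are hypothetical names chosen for this example.

```python
import hashlib
import json
import uuid
from typing import Any, Dict, Iterable, List

# Subsets of the slots listed on this page, grouped by declared range.
INTEGER_SLOTS = {"contig_bp", "scaf_bp", "n_contigs", "n_scaffolds", "ctg_max", "scaf_max"}
FLOAT_SLOTS = {"asm_score", "gap_pct", "gc_avg", "gc_std",
               "checkm2_completeness", "checkm2_contamination"}
MULTIVALUED_SLOTS = {"identifiers", "names"}


def content_hash(record: Dict[str, Any], keys: Iterable[str]) -> str:
    """SHA-256 over the selected slots, serialized deterministically (assumed scheme)."""
    payload = json.dumps({k: record.get(k) for k in keys}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_contigset(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # contigset_id is the only required slot and must parse as a UUID.
    try:
        uuid.UUID(str(record.get("contigset_id")))
    except ValueError:
        problems.append("contigset_id: required, must be a UUID")
    for slot, value in record.items():
        is_bool = isinstance(value, bool)  # bool is an int subclass; reject it
        if slot in INTEGER_SLOTS and (is_bool or not isinstance(value, int)):
            problems.append(f"{slot}: expected integer")
        elif slot in FLOAT_SLOTS and (is_bool or not isinstance(value, (int, float))):
            problems.append(f"{slot}: expected float")
        elif slot in MULTIVALUED_SLOTS and not isinstance(value, list):
            problems.append(f"{slot}: expected a list")
    return problems


record = {
    "contigset_id": str(uuid.uuid4()),
    "description": "Hypothetical example assembly",
    "n_contigs": 5,
    "contig_bp": 300,
    "gc_avg": 41.5,
}
record["hash"] = content_hash(record, ["contig_bp", "n_contigs", "gc_avg"])

print(check_contigset(record))              # []
print(check_contigset({"n_contigs": "5"}))  # two problems reported
```

The second call reports a missing `contigset_id` and a non-integer `n_contigs`. For real data, validating against the LinkML schema itself is the more reliable route; this sketch only mirrors the cardinalities and ranges shown on this page.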