Class: Contigset

A set of individual, overlapping contigs that represent the complete sequenced genome of an organism.

URI: kb_cdm:Contigset

classDiagram
    class Contigset
    click Contigset href "../Contigset"
      UniqueNamedThing <|-- Contigset
        click UniqueNamedThing href "../UniqueNamedThing"
      Contigset <|-- Genome
        click Genome href "../Genome"
      Contigset : asm_score
      Contigset : checkm2_completeness
      Contigset : checkm2_contamination
      Contigset : contig_bp
      Contigset : contigset_id
      Contigset : ctg_L50
      Contigset : ctg_L90
      Contigset : ctg_logsum
      Contigset : ctg_max
      Contigset : ctg_N50
      Contigset : ctg_N90
      Contigset : ctg_powsum
      Contigset : description
      Contigset : gap_pct
      Contigset : gc_avg
      Contigset : gc_std
      Contigset : hash
      Contigset : identifiers
      Contigset --> "*" Identifier : identifiers
        click Identifier href "../Identifier"
      Contigset : n_contigs
      Contigset : n_scaffolds
      Contigset : names
      Contigset --> "*" Name : names
        click Name href "../Name"
      Contigset : scaf_bp
      Contigset : scaf_L50
      Contigset : scaf_L90
      Contigset : scaf_l_gt50k
      Contigset : scaf_logsum
      Contigset : scaf_max
      Contigset : scaf_N50
      Contigset : scaf_N90
      Contigset : scaf_n_gt50K
      Contigset : scaf_pct_gt50K
      Contigset : scaf_powsum

Inheritance

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| asm_score | 0..1 Float | A composite score for comparing contigset quality | direct |
| checkm2_completeness | 0..1 Float | Estimate of the completeness of a contigset (MAG or genome), estimated by Che... | direct |
| checkm2_contamination | 0..1 Float | Estimate of the contamination of a contigset (MAG or genome), estimated by Ch... | direct |
| contigset_id | 1 UUID | Internal (CDM) unique identifier | direct |
| contig_bp | 0..1 Integer | Total size in bp of all contigs | direct |
| ctg_L50 | 0..1 Integer | Given a set of contigs, the L50 is defined as the sequence length of the shor... | direct |
| ctg_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
| ctg_N50 | 0..1 Integer | Given a set of contigs, each with its own length, the N50 count is defined as... | direct |
| ctg_N90 | 0..1 Integer | Given a set of contigs, each with its own length, the N90 count is defined as... | direct |
| ctg_logsum | 0..1 Float | The sum of the (length*log(length)) of all contigs, times some constant | direct |
| ctg_max | 0..1 Integer | Maximum contig length | direct |
| ctg_powsum | 0..1 Float | Powersum of all contigs is the same as logsum except that it uses the sum of ... | direct |
| gap_pct | 0..1 Float | The gap size percentage of all scaffolds | direct |
| gc_avg | 0..1 Float | The average GC content of the contigset, expressed as a percentage | direct |
| gc_std | 0..1 Float | The standard deviation of GC content across the contigset | direct |
| n_contigs | 0..1 Integer | Total number of contigs | direct |
| n_scaffolds | 0..1 Integer | Total number of scaffolds | direct |
| scaf_L50 | 0..1 Integer | Given a set of scaffolds, the L50 is defined as the sequence length of the sh... | direct |
| scaf_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
| scaf_N50 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N50 count is defined ... | direct |
| scaf_N90 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N90 count is defined ... | direct |
| scaf_bp | 0..1 Integer | Total size in bp of all scaffolds | direct |
| scaf_l_gt50k | 0..1 Integer | The total length of scaffolds longer than 50,000 base pairs | direct |
| scaf_logsum | 0..1 Float | The sum of the (length*log(length)) of all scaffolds, times some constant | direct |
| scaf_max | 0..1 Integer | Maximum scaffold length | direct |
| scaf_n_gt50K | 0..1 Integer | The number of scaffolds longer than 50,000 base pairs | direct |
| scaf_pct_gt50K | 0..1 Float | The percentage of the total assembly length represented by scaffolds longer t... | direct |
| scaf_powsum | 0..1 Float | Powersum of all scaffolds is the same as logsum except that it uses the sum o... | direct |
| hash | 0..1 String | A hash value generated from one or more object attributes that serves to ensu... | UniqueNamedThing |
| identifiers | * Identifier | URIs or CURIEs used to refer to this entity | NamedThingWithId |
| description | 0..1 String | Definition or description of the entity | NamedThing |
| names | * Name | Names, alternative names, and synonyms for an entity | NamedThing |
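
The cardinalities and ranges listed for the slots above can be checked mechanically before a record is loaded. The following is a minimal sketch, not part of the schema tooling: `validate_contigset` and the two slot tuples are hypothetical names introduced here, the record is assumed to be a plain Python dict, and the `UUID` range is assumed to accept a standard UUID string. Inherited slots (`hash`, `identifiers`, `description`, `names`) are left out for brevity.

```python
import uuid
from typing import Any, Dict, List

# Slot names and ranges copied from the Slots listing (direct slots only).
INTEGER_SLOTS = (
    "contig_bp", "ctg_L50", "ctg_L90", "ctg_N50", "ctg_N90", "ctg_max",
    "n_contigs", "n_scaffolds", "scaf_L50", "scaf_L90", "scaf_N50",
    "scaf_N90", "scaf_bp", "scaf_l_gt50k", "scaf_max", "scaf_n_gt50K",
)
FLOAT_SLOTS = (
    "asm_score", "checkm2_completeness", "checkm2_contamination",
    "ctg_logsum", "ctg_powsum", "gap_pct", "gc_avg", "gc_std",
    "scaf_logsum", "scaf_pct_gt50K", "scaf_powsum",
)


def validate_contigset(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a Contigset-like dict (empty if none)."""
    problems: List[str] = []

    # contigset_id has cardinality 1: it is the only required slot.
    raw_id = record.get("contigset_id")
    if raw_id is None:
        problems.append("contigset_id is required")
    else:
        try:
            uuid.UUID(str(raw_id))  # assumption: the UUID range is a standard UUID
        except ValueError:
            problems.append("contigset_id is not a valid UUID")

    # All other direct slots are 0..1, so a missing value is acceptable.
    for slot in INTEGER_SLOTS:
        value = record.get(slot)
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{slot} must be an integer")
    for slot in FLOAT_SLOTS:
        value = record.get(slot)
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            problems.append(f"{slot} must be a number")
    return problems
```

For example, `validate_contigset({"n_contigs": "12"})` reports both the missing identifier and the string-valued count, while a record with a well-formed `contigset_id` and numeric statistics returns an empty list.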

Aliases

  • genome
  • biological subject
  • assembly

Identifier and Mapping Information

Schema Source

  • from schema: https://github.com/kbase/cdm-schema

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | kb_cdm:Contigset |
| native | kb_cdm:Contigset |

LinkML Source

Direct

name: Contigset
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: https://github.com/kbase/cdm-schema
aliases:
- genome
- biological subject
- assembly
is_a: UniqueNamedThing
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contigset quality
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  contigset_id:
    name: contigset_id
    description: Internal (CDM) unique identifier.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    identifier: true
    domain_of:
    - Contigset
    range: UUID
    required: true
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contigset
      size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contigset, expressed as a percentage
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contigset
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - Contigset
    range: float
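
The L50/N50 and L90/N90 descriptions above can be read as one procedure: sort the sequence lengths from longest to shortest, accumulate them until the running sum reaches the target share of the total, then report the length of the last sequence added (the L value) and how many sequences were needed (the N value). The sketch below follows that reading. `length_and_count_at` is a hypothetical helper name, and the input is assumed to be a plain list of contig or scaffold lengths in bp. Note that many assembly tools use the opposite naming (N50 as a length, L50 as a count); the code follows the descriptions in this schema.

```python
from typing import Iterable, Tuple


def length_and_count_at(lengths: Iterable[int], percent: int) -> Tuple[int, int]:
    """Return (L, N) for the given percentage of the total length.

    L is the length of the shortest sequence in the smallest set of longest
    sequences that covers at least `percent` of the total length.
    N is the number of sequences in that set.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one sequence with positive length")

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: the full set always covers 100%")


contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # total = 300
ctg_L50, ctg_N50 = length_and_count_at(contig_lengths, 50)  # (70, 2)
ctg_L90, ctg_N90 = length_and_count_at(contig_lengths, 90)  # (30, 5)
```

In this toy input the two longest contigs (80 + 70 = 150) reach half of the total, so the L50 is 70 and the N50 is 2. Reaching 90% (270) takes five contigs, the shortest of which is 30, which matches the statement that L90 is less than or equal to L50.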

Induced

name: Contigset
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: https://github.com/kbase/cdm-schema
aliases:
- genome
- biological subject
- assembly
is_a: UniqueNamedThing
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contigset quality
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: asm_score
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: checkm2_completeness
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contigset (MAG or genome), estimated
      by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: checkm2_contamination
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  contigset_id:
    name: contigset_id
    description: Internal (CDM) unique identifier.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    identifier: true
    alias: contigset_id
    owner: Contigset
    domain_of:
    - Contigset
    range: UUID
    required: true
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: contig_bp
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_L50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_L90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_N50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contigset
      size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_N90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_logsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_max
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_powsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gap_pct
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contigset, expressed as a percentage
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gc_avg
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contigset
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gc_std
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: n_contigs
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: n_scaffolds
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contigset length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_L50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_L90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_N50
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contigset size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_N90
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_bp
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_l_gt50k
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_logsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_max
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_n_gt50K
    owner: Contigset
    domain_of:
    - Contigset
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_pct_gt50K
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_powsum
    owner: Contigset
    domain_of:
    - Contigset
    range: float
  hash:
    name: hash
    description: A hash value generated from one or more object attributes that serves
      to ensure the entity is unique.
    from_schema: https://github.com/kbase/cdm-schema
    alias: hash
    owner: Contigset
    domain_of:
    - UniqueNamedThing
    range: string
  identifiers:
    name: identifiers
    description: URIs or CURIEs used to refer to this entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: identifiers
    owner: Contigset
    domain_of:
    - NamedThingWithId
    range: Identifier
    multivalued: true
  description:
    name: description
    description: Definition or description of the entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: description
    owner: Contigset
    domain_of:
    - NamedThing
    - Event
    - Experiment
    - Identifier
    - Name
    - Project
    - Protein
    - Sample
    range: string
  names:
    name: names
    description: Names, alternative names, and synonyms for an entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: names
    owner: Contigset
    domain_of:
    - NamedThing
    range: Name
    multivalued: true
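
The logsum, powsum, and 50 kb scaffold statistics described in the slots above are simple aggregates over sequence lengths. The sketch below is one possible reading of those descriptions, not the reference implementation. The schema does not state the multiplying constant or the logarithm base for logsum, so the code assumes a constant of 1.0 and the natural logarithm; the default power of 0.25 comes from the powsum description. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def logsum(lengths: Sequence[int], constant: float = 1.0) -> float:
    """Sum of length * log(length), times a constant.

    Assumptions: natural log and constant = 1.0 (the schema specifies neither).
    """
    return constant * sum(n * math.log(n) for n in lengths if n > 0)


def powsum(lengths: Sequence[int], power: float = 0.25) -> float:
    """Sum of length * length**power; power defaults to 0.25 as in the schema."""
    return sum(n * (n ** power) for n in lengths if n > 0)


def gt50k_stats(scaffold_lengths: Sequence[int], cutoff: int = 50_000) -> Dict[str, float]:
    """Count, total length, and percentage of scaffolds longer than the cutoff."""
    long_lengths = [n for n in scaffold_lengths if n > cutoff]
    total = sum(scaffold_lengths)
    long_total = sum(long_lengths)
    return {
        "scaf_n_gt50K": len(long_lengths),
        "scaf_l_gt50k": long_total,
        "scaf_pct_gt50K": 100.0 * long_total / total if total else 0.0,
    }


# One 100 bp sequence scores higher than the same bases split into two pieces,
# which illustrates why these sums reward contiguity.
assert logsum([100]) > logsum([50, 50])
assert powsum([100]) > powsum([50, 50])

stats = gt50k_stats([100_000, 60_000, 40_000])
# Two scaffolds exceed 50 kb; together they hold 160,000 of 200,000 bp.
```

Both sums grow faster than linearly in sequence length, so merging two sequences into one always raises the score even though the total number of bases is unchanged.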