Class: ContigCollection

A set of individual, overlapping contigs that represent the complete sequenced genome of an organism.

URI: kb_cdm:ContigCollection

classDiagram
    class ContigCollection
    click ContigCollection href "../ContigCollection/"
    Table <|-- ContigCollection
    click Table href "../Table/"
    ContigCollection : asm_score
    ContigCollection : checkm2_completeness
    ContigCollection : checkm2_contamination
    ContigCollection : contig_bp
    ContigCollection : contig_collection_id
    ContigCollection : contig_collection_type
    ContigCollection --> "0..1" ContigCollectionType : contig_collection_type
    click ContigCollectionType href "../ContigCollectionType/"
    ContigCollection : ctg_L50
    ContigCollection : ctg_L90
    ContigCollection : ctg_logsum
    ContigCollection : ctg_max
    ContigCollection : ctg_N50
    ContigCollection : ctg_N90
    ContigCollection : ctg_powsum
    ContigCollection : gap_pct
    ContigCollection : gc_avg
    ContigCollection : gc_std
    ContigCollection : hash
    ContigCollection : n_contigs
    ContigCollection : n_scaffolds
    ContigCollection : scaf_bp
    ContigCollection : scaf_L50
    ContigCollection : scaf_L90
    ContigCollection : scaf_l_gt50k
    ContigCollection : scaf_logsum
    ContigCollection : scaf_max
    ContigCollection : scaf_N50
    ContigCollection : scaf_N90
    ContigCollection : scaf_n_gt50K
    ContigCollection : scaf_pct_gt50K
    ContigCollection : scaf_powsum

Inheritance

  • Table
      • ContigCollection

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| contig_collection_id | 1 UUID | Internal (CDM) unique identifier for a contig collection. | direct |
| hash | 0..1 String | A hash value generated from one or more object attributes that serves to ensure the entity is unique. | direct |
| asm_score | 0..1 Float | A composite score for comparing contig collection quality. | direct |
| checkm2_completeness | 0..1 Float | Estimate of the completeness of a contig collection (MAG or genome), as computed by the CheckM2 tool. | direct |
| checkm2_contamination | 0..1 Float | Estimate of the contamination of a contig collection (MAG or genome), as computed by the CheckM2 tool. | direct |
| contig_bp | 0..1 Integer | Total size in bp of all contigs. | direct |
| contig_collection_type | 0..1 ContigCollectionType | The type of contig collection. | direct |
| ctg_L50 | 0..1 Integer | Given a set of contigs, the L50 is defined as the sequence length of the shortest contig at 50% of the total contig collection length. | direct |
| ctg_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the length for which the collection of all contigs of that length or longer contains at least 90% of the sum of the lengths of all contigs. | direct |
| ctg_N50 | 0..1 Integer | Given a set of contigs, each with its own length, the N50 count is defined as the smallest number of contigs whose length sum makes up half of the contig collection size. | direct |
| ctg_N90 | 0..1 Integer | Given a set of contigs, each with its own length, the N90 count is defined as the smallest number of contigs whose length sum makes up 90% of the contig collection size. | direct |
| ctg_logsum | 0..1 Float | The sum of (length*log(length)) over all contigs, times some constant. | direct |
| ctg_max | 0..1 Integer | Maximum contig length. | direct |
| ctg_powsum | 0..1 Float | The powersum of all contigs is the same as the logsum, except that it uses the sum of (length*(length^P)) for some power P (default P=0.25). | direct |
| gap_pct | 0..1 Float | The gap size percentage of all scaffolds. | direct |
| gc_avg | 0..1 Float | The average GC content of the contig collection, expressed as a percentage. | direct |
| gc_std | 0..1 Float | The standard deviation of GC content across the contig collection. | direct |
| n_contigs | 0..1 Integer | Total number of contigs. | direct |
| n_scaffolds | 0..1 Integer | Total number of scaffolds. | direct |
| scaf_L50 | 0..1 Integer | Given a set of scaffolds, the L50 is defined as the sequence length of the shortest scaffold at 50% of the total contig collection length. | direct |
| scaf_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the length for which the collection of all scaffolds of that length or longer contains at least 90% of the sum of the lengths of all scaffolds. | direct |
| scaf_N50 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N50 count is defined as the smallest number of scaffolds whose length sum makes up half of the contig collection size. | direct |
| scaf_N90 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N90 count is defined as the smallest number of scaffolds whose length sum makes up 90% of the contig collection size. | direct |
| scaf_bp | 0..1 Integer | Total size in bp of all scaffolds. | direct |
| scaf_l_gt50k | 0..1 Integer | The total length of scaffolds longer than 50,000 base pairs. | direct |
| scaf_logsum | 0..1 Float | The sum of (length*log(length)) over all scaffolds, times some constant. The score increases as contiguity increases. | direct |
| scaf_max | 0..1 Integer | Maximum scaffold length. | direct |
| scaf_n_gt50K | 0..1 Integer | The number of scaffolds longer than 50,000 base pairs. | direct |
| scaf_pct_gt50K | 0..1 Float | The percentage of the total assembly length represented by scaffolds longer than 50,000 base pairs. | direct |
| scaf_powsum | 0..1 Float | The powersum of all scaffolds is the same as the logsum, except that it uses the sum of (length*(length^P)) for some power P (default P=0.25). | direct |
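
The length-based statistics in the table can be illustrated with a short Python sketch. It follows the definitions as worded on this page, where L50/L90 are lengths and N50/N90 are counts (many other tools use the two letters the other way round). The function names, the natural logarithm, and the default constant of 1.0 for the logsum are assumptions made for illustration; the schema does not specify them, and this is not the code that produces the stored values.

```python
import math


def length_at_percent(lengths, percent):
    """Walk sequences from longest to shortest until `percent` of the total
    length is covered; return (length of the last sequence added, count)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must contain at least one sequence")


def logsum(lengths, constant=1.0):
    """Sum of length*log(length), times a constant (assumed 1.0, natural log)."""
    return constant * sum(n * math.log(n) for n in lengths)


def powsum(lengths, power=0.25):
    """Sum of length*(length^P), with P defaulting to 0.25 as in the table."""
    return sum(n * n ** power for n in lengths)


def over_threshold(lengths, threshold=50_000):
    """Return (count, total length, percent of all bases) for sequences
    strictly longer than `threshold`, mirroring the *_gt50K slots."""
    long_ones = [n for n in lengths if n > threshold]
    long_bp = sum(long_ones)
    return len(long_ones), long_bp, 100.0 * long_bp / sum(lengths)


# Toy contig lengths in bp; the total is 300.
contigs = [100, 80, 60, 40, 20]
ctg_L50, ctg_N50 = length_at_percent(contigs, 50)  # (80, 2)
ctg_L90, ctg_N90 = length_at_percent(contigs, 90)  # (40, 4)
```

In the toy example, the two longest contigs (100 + 80 = 180 bp) are the first to cover half of the 300 bp total, so the L50 length is 80 and the N50 count is 2. The L90 length (40) is no larger than the L50 length, as the ctg_L90 description states.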

Usages

| used by | used in | type | used |
| --- | --- | --- | --- |
| Association | subject | any_of[range] | ContigCollection |

Aliases

  • genome
  • biological subject
  • assembly
  • contig collection
  • contig set

Identifier and Mapping Information

Schema Source

  • from schema: http://kbase.github.io/cdm-schema/cdm_schema

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | kb_cdm:ContigCollection |
| native | kb_cdm:ContigCollection |

LinkML Source

Direct

name: ContigCollection
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: http://kbase.github.io/cdm-schema/cdm_schema
aliases:
- genome
- biological subject
- assembly
- contig collection
- contig set
is_a: Table
slots:
- contig_collection_id
- hash
slot_usage:
  contig_collection_id:
    name: contig_collection_id
    identifier: true
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contig collection quality
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  contig_collection_type:
    name: contig_collection_type
    description: The type of contig collection.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: ContigCollectionType
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contig collection length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contig collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contig
      collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contig collection, expressed as a percentage
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contig collection
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contig collection length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contig collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contig collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    domain_of:
    - ContigCollection
    range: float

Induced

name: ContigCollection
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: http://kbase.github.io/cdm-schema/cdm_schema
aliases:
- genome
- biological subject
- assembly
- contig collection
- contig set
is_a: Table
slot_usage:
  contig_collection_id:
    name: contig_collection_id
    identifier: true
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contig collection quality
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: asm_score
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: checkm2_completeness
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: checkm2_contamination
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: contig_bp
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  contig_collection_type:
    name: contig_collection_type
    description: The type of contig collection.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: contig_collection_type
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: ContigCollectionType
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contig collection length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_L50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_L90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contig collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_N50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contig
      collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_N90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_logsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_max
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: ctg_powsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: gap_pct
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contig collection, expressed as a percentage
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: gc_avg
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contig collection
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: gc_std
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: n_contigs
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: n_scaffolds
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contig collection length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_L50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_L90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contig collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_N50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contig collection size
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_N90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_bp
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_l_gt50k
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_logsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_max
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_n_gt50K
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_pct_gt50K
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: http://kbase.github.io/cdm-schema/cdm_components
    rank: 1000
    alias: scaf_powsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  contig_collection_id:
    name: contig_collection_id
    description: Internal (CDM) unique identifier for a contig collection.
    from_schema: http://kbase.github.io/cdm-schema/cdm_schema
    rank: 1000
    identifier: true
    alias: contig_collection_id
    owner: ContigCollection
    domain_of:
    - Contig_X_ContigCollection
    - ContigCollection_X_EncodedFeature
    - ContigCollection_X_Feature
    - ContigCollection_X_Protein
    - ContigCollection
    range: UUID
    required: true
  hash:
    name: hash
    description: A hash value generated from one or more object attributes that serves
      to ensure the entity is unique.
    from_schema: http://kbase.github.io/cdm-schema/cdm_schema
    rank: 1000
    alias: hash
    owner: ContigCollection
    domain_of:
    - Contig
    - ContigCollection
    - EncodedFeature
    - Feature
    - Protein
    range: string
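
As a rough illustration of how the induced definition constrains a record, the sketch below mirrors a handful of the slots in a Python dataclass. The class name `ContigCollectionRecord`, the choice of slots, and the mapping of the schema's `UUID` range onto Python's `uuid.UUID` are assumptions made for this example; this is not code generated by LinkML, and the schema's `UUID` type may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass
class ContigCollectionRecord:
    """Hypothetical in-memory mirror of a few ContigCollection slots."""

    # identifier: true, required: true in the induced definition.
    contig_collection_id: UUID
    # Every other slot has cardinality 0..1, so it defaults to None.
    hash: Optional[str] = None
    n_contigs: Optional[int] = None
    contig_bp: Optional[int] = None
    gc_avg: Optional[float] = None
    checkm2_completeness: Optional[float] = None
    checkm2_contamination: Optional[float] = None

    def __post_init__(self):
        # Accept a string form of the identifier; UUID() raises ValueError
        # if the string is not a well-formed UUID.
        if isinstance(self.contig_collection_id, str):
            self.contig_collection_id = UUID(self.contig_collection_id)
        elif not isinstance(self.contig_collection_id, UUID):
            raise TypeError("contig_collection_id must be a UUID")
        # Slots with range integer must hold ints when present.
        for name in ("n_contigs", "contig_bp"):
            value = getattr(self, name)
            if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
                raise TypeError(f"{name} must be an integer")


record = ContigCollectionRecord(
    "12345678-1234-5678-1234-567812345678",  # made-up identifier
    n_contigs=5,
    contig_bp=300,
)
```

Only `contig_collection_id` is mandatory here, which matches the single slot marked `required: true` above; the optional slots can be filled in as the corresponding statistics become available.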