Class: ContigCollection

A set of individual, overlapping contigs that represent the complete sequenced genome of an organism.

URI: cdm:ContigCollection

classDiagram
    class ContigCollection
    click ContigCollection href "../ContigCollection"
    HasNames <|-- ContigCollection
    click HasNames href "../HasNames"
    HasIdentifiers <|-- ContigCollection
    click HasIdentifiers href "../HasIdentifiers"
    HasHash <|-- ContigCollection
    click HasHash href "../HasHash"
    ContigCollection <|-- Genome
    click Genome href "../Genome"
    ContigCollection : asm_score
    ContigCollection : checkm2_completeness
    ContigCollection : checkm2_contamination
    ContigCollection : contig_bp
    ContigCollection : contig_collection_id
    ContigCollection : contig_collection_type
    ContigCollection --> "0..1" ContigCollectionType : contig_collection_type
    click ContigCollectionType href "../ContigCollectionType"
    ContigCollection : ctg_L50
    ContigCollection : ctg_L90
    ContigCollection : ctg_logsum
    ContigCollection : ctg_max
    ContigCollection : ctg_N50
    ContigCollection : ctg_N90
    ContigCollection : ctg_powsum
    ContigCollection : gap_pct
    ContigCollection : gc_avg
    ContigCollection : gc_std
    ContigCollection : hash
    ContigCollection : identifiers
    ContigCollection --> "*" Identifier : identifiers
    click Identifier href "../Identifier"
    ContigCollection : n_contigs
    ContigCollection : n_scaffolds
    ContigCollection : names
    ContigCollection --> "*" Name : names
    click Name href "../Name"
    ContigCollection : scaf_bp
    ContigCollection : scaf_L50
    ContigCollection : scaf_L90
    ContigCollection : scaf_l_gt50k
    ContigCollection : scaf_logsum
    ContigCollection : scaf_max
    ContigCollection : scaf_N50
    ContigCollection : scaf_N90
    ContigCollection : scaf_n_gt50K
    ContigCollection : scaf_pct_gt50K
    ContigCollection : scaf_powsum

Inheritance

  • ContigCollection [HasNames, HasIdentifiers, HasHash]
      • Genome

Slots

Name | Cardinality and Range | Description | Inheritance
asm_score | 0..1 Float | A composite score for comparing contig collection quality | direct
checkm2_completeness | 0..1 Float | Estimate of the completeness of a contig collection (MAG or genome), estimate... | direct
checkm2_contamination | 0..1 Float | Estimate of the contamination of a contig collection (MAG or genome), estimat... | direct
contig_collection_id | 1 UUID | Internal (CDM) unique identifier | direct
contig_bp | 0..1 Integer | Total size in bp of all contigs | direct
contig_collection_type | 0..1 ContigCollectionType | The type of contig collection | direct
ctg_L50 | 0..1 Integer | Given a set of contigs, the L50 is defined as the sequence length of the shor... | direct
ctg_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct
ctg_N50 | 0..1 Integer | Given a set of contigs, each with its own length, the N50 count is defined as... | direct
ctg_N90 | 0..1 Integer | Given a set of contigs, each with its own length, the N90 count is defined as... | direct
ctg_logsum | 0..1 Float | The sum of the (length*log(length)) of all contigs, times some constant | direct
ctg_max | 0..1 Integer | Maximum contig length | direct
ctg_powsum | 0..1 Float | Powersum of all contigs is the same as logsum except that it uses the sum of ... | direct
gap_pct | 0..1 Float | The gap size percentage of all scaffolds | direct
gc_avg | 0..1 Float | The average GC content of the contig collection, expressed as a percentage | direct
gc_std | 0..1 Float | The standard deviation of GC content across the contig collection | direct
n_contigs | 0..1 Integer | Total number of contigs | direct
n_scaffolds | 0..1 Integer | Total number of scaffolds | direct
scaf_L50 | 0..1 Integer | Given a set of scaffolds, the L50 is defined as the sequence length of the sh... | direct
scaf_L90 | 0..1 Integer | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct
scaf_N50 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N50 count is defined ... | direct
scaf_N90 | 0..1 Integer | Given a set of scaffolds, each with its own length, the N90 count is defined ... | direct
scaf_bp | 0..1 Integer | Total size in bp of all scaffolds | direct
scaf_l_gt50k | 0..1 Integer | The total length of scaffolds longer than 50,000 base pairs | direct
scaf_logsum | 0..1 Float | The sum of the (length*log(length)) of all scaffolds, times some constant | direct
scaf_max | 0..1 Integer | Maximum scaffold length | direct
scaf_n_gt50K | 0..1 Integer | The number of scaffolds longer than 50,000 base pairs | direct
scaf_pct_gt50K | 0..1 Float | The percentage of the total assembly length represented by scaffolds longer t... | direct
scaf_powsum | 0..1 Float | Powersum of all scaffolds is the same as logsum except that it uses the sum o... | direct
names | * Name | Names, alternative names, and synonyms for an entity | HasNames
identifiers | * Identifier | URIs or CURIEs used to refer to this entity | HasIdentifiers
hash | 0..1 String | A hash value generated from one or more object attributes that serves to ensu... | HasHash
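
The length and count statistics listed above can be illustrated with a short Python sketch. It follows the convention used by the slot descriptions in this schema, where the L50 and L90 slots hold a sequence length and the N50 and N90 slots hold a sequence count. The helper name `length_and_count_at` and the example lengths are hypothetical, and this is only one plausible reading of the definitions, not necessarily how the CDM values are produced.

```python
from typing import List, Tuple


def length_and_count_at(lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (length, count) at the given fraction of the total length.

    Sequences are visited from longest to shortest. `count` is the smallest
    number of sequences whose summed length reaches `fraction` of the total;
    `length` is the length of the shortest sequence in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reached when fraction is greater than 1.
    raise ValueError("fraction must be in (0, 1]")


# Hypothetical contig lengths in bp (total 300).
contigs = [100, 80, 60, 40, 20]
ctg_L50, ctg_N50 = length_and_count_at(contigs, 0.5)
ctg_L90, ctg_N90 = length_and_count_at(contigs, 0.9)
print(ctg_L50, ctg_N50, ctg_L90, ctg_N90)
```

With these lengths the two longest contigs already cover half of the total, so the 50% length is 80 and the 50% count is 2; the 90% length (40) is no larger than the 50% length, as the L90 description states. The same helper applies to scaffold lengths for the `scaf_*` slots.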

Usages

used by | used in | type | used
ContigXContigCollection | contig_collection_id | range | ContigCollection
ContigCollectionXFeature | contig_collection_id | range | ContigCollection
ContigCollectionXProtein | contig_collection_id | range | ContigCollection
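
As a rough illustration of how these usages fit together, the Python sketch below models one of the linking classes as a record that refers to a ContigCollection through `contig_collection_id`. The `contig_id` field, the `contigs_in` helper, and the use of Python's `uuid.UUID` for the UUID range are assumptions made for the example; they are not taken from this page.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContigCollection:
    contig_collection_id: uuid.UUID  # required identifier slot


@dataclass(frozen=True)
class ContigXContigCollection:
    contig_id: uuid.UUID  # hypothetical slot name, not shown on this page
    contig_collection_id: uuid.UUID  # refers to a ContigCollection


def contigs_in(
    collection: ContigCollection, links: List[ContigXContigCollection]
) -> List[uuid.UUID]:
    """Return the contig IDs linked to the given collection."""
    return [
        link.contig_id
        for link in links
        if link.contig_collection_id == collection.contig_collection_id
    ]


collection = ContigCollection(contig_collection_id=uuid.uuid4())
other = ContigCollection(contig_collection_id=uuid.uuid4())
contig_a, contig_b = uuid.uuid4(), uuid.uuid4()
links = [
    ContigXContigCollection(contig_a, collection.contig_collection_id),
    ContigXContigCollection(contig_b, other.contig_collection_id),
]
members = contigs_in(collection, links)
```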

Aliases

  • genome
  • biological subject
  • assembly
  • contig collection
  • contig set

Identifier and Mapping Information

Schema Source

  • from schema: https://github.com/kbase/cdm-schema

Mappings

Mapping Type | Mapped Value
self | cdm:ContigCollection
native | cdm:ContigCollection

LinkML Source

Direct

name: ContigCollection
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: https://github.com/kbase/cdm-schema
aliases:
- genome
- biological subject
- assembly
- contig collection
- contig set
mixins:
- HasNames
- HasIdentifiers
- HasHash
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contig collection quality
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  contig_collection_id:
    name: contig_collection_id
    description: Internal (CDM) unique identifier.
    from_schema: https://github.com/kbase/cdm-schema
    identifier: true
    domain_of:
    - Contig_X_ContigCollection
    - ContigCollection_X_Feature
    - ContigCollection_X_Protein
    - ContigCollection
    range: UUID
    required: true
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  contig_collection_type:
    name: contig_collection_type
    description: The type of contig collection.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: ContigCollectionType
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contig collection length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contig collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contig
      collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contig collection, expressed as a percentage
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contig collection
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contig collection length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contig collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contig collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    domain_of:
    - ContigCollection
    range: float
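
The logsum and powsum descriptions above translate into a few lines of Python. This is a minimal sketch under stated assumptions: the schema only says "some constant", so `constant` is a hypothetical parameter defaulting to 1.0, and the natural logarithm is assumed because the log base is not specified. The default power of 0.25 comes from the powsum description.

```python
import math
from typing import Iterable


def logsum(lengths: Iterable[int], constant: float = 1.0) -> float:
    """Sum of length * log(length) over all sequences, times `constant`.

    Assumes natural log and strictly positive lengths.
    """
    return constant * sum(n * math.log(n) for n in lengths)


def powsum(
    lengths: Iterable[int], power: float = 0.25, constant: float = 1.0
) -> float:
    """Sum of length * (length ** power) over all sequences, times `constant`."""
    return constant * sum(n * n ** power for n in lengths)


# Same total length (1000 bp), different contiguity.
one_piece = [1000]
two_pieces = [500, 500]
print(logsum(one_piece) > logsum(two_pieces))
print(powsum(one_piece) > powsum(two_pieces))
```

Both comparisons favour the single 1000 bp sequence, which matches the note in the scaf_logsum description that the score rises with contiguity.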

Induced

name: ContigCollection
description: A set of individual, overlapping contigs that represent the complete
  sequenced genome of an organism.
from_schema: https://github.com/kbase/cdm-schema
aliases:
- genome
- biological subject
- assembly
- contig collection
- contig set
mixins:
- HasNames
- HasIdentifiers
- HasHash
attributes:
  asm_score:
    name: asm_score
    description: A composite score for comparing contig collection quality
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: asm_score
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  checkm2_completeness:
    name: checkm2_completeness
    description: Estimate of the completeness of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: checkm2_completeness
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  checkm2_contamination:
    name: checkm2_contamination
    description: Estimate of the contamination of a contig collection (MAG or genome),
      estimated by CheckM2 tool
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: checkm2_contamination
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  contig_collection_id:
    name: contig_collection_id
    description: Internal (CDM) unique identifier.
    from_schema: https://github.com/kbase/cdm-schema
    identifier: true
    alias: contig_collection_id
    owner: ContigCollection
    domain_of:
    - Contig_X_ContigCollection
    - ContigCollection_X_Feature
    - ContigCollection_X_Protein
    - ContigCollection
    range: UUID
    required: true
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: contig_bp
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  contig_collection_type:
    name: contig_collection_type
    description: The type of contig collection.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: contig_collection_type
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: ContigCollectionType
  ctg_L50:
    name: ctg_L50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total contig collection length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_L50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_L90:
    name: ctg_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_L90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_N50:
    name: ctg_N50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      contig collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_N50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_N90:
    name: ctg_N90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of contig
      collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_N90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_logsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_max
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25)
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: ctg_powsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gap_pct
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  gc_avg:
    name: gc_avg
    description: The average GC content of the contig collection, expressed as a percentage
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gc_avg
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  gc_std:
    name: gc_std
    description: The standard deviation of GC content across the contig collection
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: gc_std
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  n_contigs:
    name: n_contigs
    description: Total number of contigs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: n_contigs
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  n_scaffolds:
    name: n_scaffolds
    description: Total number of scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: n_scaffolds
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_L50:
    name: scaf_L50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total contig collection length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_L50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_L90:
    name: scaf_L90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_L90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_N50:
    name: scaf_N50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of contig collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_N50
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_N90:
    name: scaf_N90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of contig collection size
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_N90
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_bp
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: The total length of scaffolds longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_l_gt50k
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_logsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_max
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_n_gt50K:
    name: scaf_n_gt50K
    description: The number of scaffolds longer than 50,000 base pairs.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_n_gt50K
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: integer
  scaf_pct_gt50K:
    name: scaf_pct_gt50K
    description: The percentage of the total assembly length represented by scaffolds
      longer than 50,000 base pairs
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_pct_gt50K
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: scaf_powsum
    owner: ContigCollection
    domain_of:
    - ContigCollection
    range: float
  names:
    name: names
    description: Names, alternative names, and synonyms for an entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: names
    owner: ContigCollection
    domain_of:
    - HasNames
    range: Name
    multivalued: true
  identifiers:
    name: identifiers
    description: URIs or CURIEs used to refer to this entity.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: identifiers
    owner: ContigCollection
    domain_of:
    - HasIdentifiers
    range: Identifier
    multivalued: true
  hash:
    name: hash
    description: A hash value generated from one or more object attributes that serves
      to ensure the entity is unique.
    from_schema: https://github.com/kbase/cdm-schema
    rank: 1000
    alias: hash
    owner: ContigCollection
    domain_of:
    - HasHash
    range: string
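
The three "longer than 50,000 base pairs" slots can be derived together from a list of scaffold lengths. The sketch below is one possible way to compute them in Python; the function name `scaffolds_over_threshold` and the example lengths are hypothetical, and "longer than" is read as a strict comparison.

```python
from typing import Dict, List, Union


def scaffolds_over_threshold(
    lengths: List[int], threshold: int = 50_000
) -> Dict[str, Union[int, float]]:
    """Compute count, total length, and percentage for scaffolds > threshold."""
    long_scaffolds = [n for n in lengths if n > threshold]  # strictly longer
    total = sum(lengths)
    long_total = sum(long_scaffolds)
    return {
        "scaf_n_gt50K": len(long_scaffolds),
        "scaf_l_gt50k": long_total,
        # Percentage of total assembly length; 0.0 for an empty assembly.
        "scaf_pct_gt50K": 100.0 * long_total / total if total else 0.0,
    }


# Hypothetical scaffold lengths in bp (total 250,000).
stats = scaffolds_over_threshold([120_000, 60_000, 50_000, 20_000])
print(stats)
```

A scaffold of exactly 50,000 bp is excluded under this reading, so two of the four example scaffolds count toward the totals.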