KBase Narrative Import Process
KBase offers useful tools for analyzing genomics data within Narratives. By adding some metadata to your DTS transfer request, you can easily import your data into a KBase narrative.
Here we describe a schema for the instructions field of a DTS transfer POST request. This field is embedded in the manifest.json file written to the root of the destination folder for a payload, and holds a JSON object that tells KBase's staging service how to import the contents of the payload into a KBase narrative. This schema was originally proposed in a GitHub issue.
What we describe here is relevant only to KBase Narrative imports. The structure of the instructions field is specific to the destination of a transfer, which allows the DTS to interact reasonably with specific systems and organizations.
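As a rough illustration of how a consumer might pick this field up from a delivered payload, the sketch below reads manifest.json from a payload root and returns the instructions object only when it is addressed to KBase. The function name load_import_instructions is hypothetical, and the sketch assumes the object is stored under a top-level "instructions" key of the manifest; neither detail is confirmed by the DTS or the KBase staging service.

```python
import json
import tempfile
from pathlib import Path


def load_import_instructions(payload_root):
    """Return the KBase import instructions from a payload's manifest, or None.

    Assumption: the manifest keeps the object under a top-level
    "instructions" key.
    """
    manifest_path = Path(payload_root) / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    instructions = manifest.get("instructions")
    if not isinstance(instructions, dict):
        return None
    if instructions.get("protocol") != "KBase narrative import":
        return None  # instructions intended for some other destination
    return instructions


# Demonstration against a throwaway payload directory.
with tempfile.TemporaryDirectory() as root:
    demo = {"instructions": {"protocol": "KBase narrative import", "objects": {}}}
    (Path(root) / "manifest.json").write_text(json.dumps(demo), encoding="utf-8")
    found = load_import_instructions(root)
```

Checking the protocol string first lets a reader of the manifest ignore instructions written for other destinations.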
JSON Object Structure
The instructions field included in the transfer POST request has the following fields:
protocol: contains the string "KBase narrative import"
objects: contains an object interpreted as a dictionary whose keys are supported KBase data types and whose values are lists of JSON objects, each with fields specific to that data type. Each data type is described in the following section.
Here's a brief example of such a JSON object that annotates two Genbank genome files:
{
  "instructions": {
    "protocol": "KBase narrative import",
    "objects": {
      "genbank_genome": [
        {
          "staging_file_subdir_path": "path/to/some_genome.gbk",
          "genome_name": "some_genome",
          "genome_type": null,
          "source": null,
          "release": null,
          "genetic_code": null,
          "scientific_name": null,
          "generate_ids_if_needed": null,
          "generate_missing_genes": null
        },
        {
          "staging_file_subdir_path": "path/to/some_other_genome.gbk",
          "genome_name": "some_other_genome",
          "genome_type": null,
          "source": null,
          "release": null,
          "genetic_code": null,
          "scientific_name": null,
          "generate_ids_if_needed": null,
          "generate_missing_genes": null
        }
      ]
    }
  }
}
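The same object can be assembled programmatically rather than written by hand. The following is a minimal Python sketch, not part of the DTS or KBase tooling: the helper names genbank_genome_entry and build_instructions are hypothetical, and the field names are copied from the genbank_genome list in this document.

```python
import json

# Field names for a genbank_genome entry, as listed in this document.
GENBANK_GENOME_FIELDS = (
    "staging_file_subdir_path",
    "genome_name",
    "genome_type",
    "source",
    "release",
    "genetic_code",
    "scientific_name",
    "generate_ids_if_needed",
    "generate_missing_genes",
)


def genbank_genome_entry(path, name):
    """Return an entry with every field present; unset fields are None (JSON null)."""
    entry = dict.fromkeys(GENBANK_GENOME_FIELDS)
    entry["staging_file_subdir_path"] = path
    entry["genome_name"] = name
    return entry


def build_instructions(objects):
    """Wrap a {data_type: [entries]} dictionary in the instructions structure."""
    return {"protocol": "KBase narrative import", "objects": objects}


instructions = build_instructions({
    "genbank_genome": [
        genbank_genome_entry("path/to/some_genome.gbk", "some_genome"),
        genbank_genome_entry("path/to/some_other_genome.gbk", "some_other_genome"),
    ]
})

# Serialize the fragment that carries the instructions field.
payload_fragment = json.dumps({"instructions": instructions}, indent=2)
```

Because Python's None serializes to JSON null, starting each entry from dict.fromkeys keeps every field present in the output.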
Supported KBase Data Types
The data types supported by the KBase staging service are listed below with their fields. These data types are based on the KBase staging service's import specification templates.
All fields required!
Every object of a given data type must have all of its fields specified, with null values indicating empty fields. While inconvenient, this complete specification is required by the current implementation of the KBase staging service.
Fields that must be non-null are marked bold below.
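Since a single omitted field makes an object incomplete, it can help to check an objects dictionary before submitting a transfer. The sketch below is one hypothetical way to do so: the names FIELDS and find_incomplete are illustrative, only two data types are filled in (with field names copied from the lists in this document), and it checks only that keys are present, not which fields must be non-null.

```python
# Field names per data type, copied from this document (two types shown).
FIELDS = {
    "assembly": (
        "staging_file_subdir_path",
        "assembly_name",
        "type",
        "min_contig_length",
    ),
    "sra_reads": (
        "sra_staging_file_name",
        "name",
        "sequencing_tech",
        "single_genome",
        "read_orientation_outward",
        "insert_size_std_dev",
        "insert_size_mean",
    ),
}


def find_incomplete(objects):
    """Return (data_type, index, missing_fields) for each incomplete entry."""
    problems = []
    for data_type, entries in objects.items():
        expected = FIELDS.get(data_type)
        if expected is None:
            continue  # data type not covered by this sketch
        for index, entry in enumerate(entries):
            missing = [key for key in expected if key not in entry]
            if missing:
                problems.append((data_type, index, missing))
    return problems


objects = {
    "assembly": [
        {
            "staging_file_subdir_path": "path/to/contigs.fa",
            "assembly_name": "my_assembly",
            "type": None,
            "min_contig_length": None,
        },
        # This entry omits two fields instead of setting them to null.
        {"staging_file_subdir_path": "path/to/other.fa", "assembly_name": "other"},
    ]
}
problems = find_incomplete(objects)
```

Here the first entry passes because its empty fields are explicitly null, while the second is reported because two keys are absent.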
Assembly (assembly)
staging_file_subdir_path: a string containing the path to the file, rooted in the directory containing the manifest
assembly_name: the name of the assembly object
type: one of the following strings: "draft isolate", "finished isolate", "mag", "sag", "virus", "plasmid", "construct", "metagenome"
min_contig_length: an integer containing the minimum length of a contig within the assembly
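The constraints on the assembly fields can be checked when an entry is built. The following sketch uses a hypothetical helper, assembly_entry, with the allowed type strings copied from the list above. It assumes that type and min_contig_length may be left as null; this document does not confirm which fields must be non-null, so treat that as an assumption.

```python
# Allowed values for the "type" field, copied from the list above.
ASSEMBLY_TYPES = {
    "draft isolate",
    "finished isolate",
    "mag",
    "sag",
    "virus",
    "plasmid",
    "construct",
    "metagenome",
}


def assembly_entry(path, name, assembly_type=None, min_contig_length=None):
    """Build an assembly entry, rejecting values outside the documented choices."""
    if assembly_type is not None and assembly_type not in ASSEMBLY_TYPES:
        raise ValueError(f"unsupported assembly type: {assembly_type!r}")
    if min_contig_length is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(min_contig_length, bool) or not isinstance(min_contig_length, int):
            raise ValueError("min_contig_length must be an integer or None")
    return {
        "staging_file_subdir_path": path,
        "assembly_name": name,
        "type": assembly_type,
        "min_contig_length": min_contig_length,
    }


entry = assembly_entry("path/to/contigs.fa", "my_assembly", "mag", 500)
```

The same pattern extends to the other data types by swapping in their field names and allowed strings.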
Genbank genome (genbank_genome)
staging_file_subdir_path: a string containing the path to the file, rooted in the directory containing the manifest
genome_name: the name of the genome object
genome_type: one of the following strings: "draft isolate", "finished isolate", "mag", "sag", "virus", "plasmid", "construct"
source: one of the following strings: "RefSeq user", "Ensembl user", "Other"
release: a string
genetic_code: an integer
scientific_name: the scientific name of the genome
generate_ids_if_needed: a string
generate_missing_genes: a string
GFF+FASTA genome (gff_genome)
fasta_file: a string containing the path to a FASTA file, rooted in the directory containing the manifest
gff_file: a string containing the path to a GFF file, rooted in the directory containing the manifest
genome_name: the name of the genome object
genome_type: one of the following strings: "draft isolate", "finished isolate", "fungi", "mag", "other Eukaryote", "plant", "sag", "virus", "plasmid", "construct"
scientific_name: the scientific name of the genome
source: one of the following strings: "RefSeq user", "Ensembl user", "JGI", "Other"
taxon_wsname: a string
release: a string
genetic_code: an integer
generate_missing_genes: a string
GFF+FASTA metagenome (gff_metagenome)
fasta_file: a string containing the path to a FASTA file, rooted in the directory containing the manifest
gff_file: a string containing the path to a GFF file, rooted in the directory containing the manifest
genome_name: the name of the genome object
source: one of the following strings: "EBI user", "IMG user", "JGI user", "BGI user", "Other"
release: a string
genetic_code: an integer
generate_missing_genes: a string
Interleaved FASTQ reads (fastq_reads_interleaved)
fastq_fwd_staging_file_name: a string containing the path to the file, rooted in the directory containing the manifest
name: the name of the interleaved FASTQ reads object
sequencing_tech: one of the following strings: "Illumina", "PacBio CLR", "PacBio CCS", "IonTorrent", "NanoPore", "Unknown"
single_genome: a string
read_orientation_outward: a string
insert_size_std_dev: a float indicating the standard deviation of the insert size distribution
insert_size_mean: a float indicating the mean of the insert size distribution
Noninterleaved FASTQ reads (fastq_reads_noninterleaved)
fastq_fwd_staging_file_name: a string containing the path to the forward reads file, rooted in the directory containing the manifest
fastq_rev_staging_file_name: a string containing the path to the reverse reads file, rooted in the directory containing the manifest
name: the name of the noninterleaved FASTQ reads object
sequencing_tech: one of the following strings: "Illumina", "PacBio CLR", "PacBio CCS", "IonTorrent", "NanoPore", "Unknown"
single_genome: a string
read_orientation_outward: a string
insert_size_std_dev: a float indicating the standard deviation of the insert size distribution
insert_size_mean: a float indicating the mean of the insert size distribution
SRA reads (sra_reads)
sra_staging_file_name: a string containing the path to the SRA staging file, rooted in the directory containing the manifest
name: the name of the SRA reads object
sequencing_tech: one of the following strings: "Illumina", "PacBio CLR", "PacBio CCS", "IonTorrent", "NanoPore", "Unknown"
single_genome: a string
read_orientation_outward: a string
insert_size_std_dev: a float indicating the standard deviation of the insert size distribution
insert_size_mean: a float indicating the mean of the insert size distribution