KBase Narrative Import Process
KBase offers useful tools for analyzing genomics data within Narratives. By adding some metadata to your DTS transfer request, you can easily import your data into a KBase narrative.
Here we describe a structure for the instructions
field of a DTS transfer POST request
suitable for importing content to a KBase narrative. The instructions
field is embedded in the
manifest.json
file written to the root of the destination folder for a payload. In general, this
field holds a JSON object that conveys information to a destination database that can be used for
processing data after a transfer. In the case of KBase, the instructions
field is interpreted by
a file staging service to import the contents of the payload into a KBase narrative. This particular
structure is described in detail here.
What we describe here is relevant only to KBase Narrative imports--the structure of the instructions
field is specific to the destination of a transfer to allow the DTS to interact reasonably with
specific systems and organizations.
JSON Object Structure
The instructions
field included in the transfer POST request has the following fields:
protocol
: contains the string"KBase narrative import"
objects
: contains an object interpreted as a dictionary whose keys are supported KBase data types and whose values are lists of JSON objects, each with fields specific to that data type. Each data types is described in the following section.
Here's a brief example of such a JSON object that annotates two Genbank genome files:
{
"instructions": {
"protocol": "KBase narrative import",
"objects": [
{
"data_type": "genbank_genome",
"parameters": {
"staging_file_subdir_path": "path/to/some_genome.gbk",
"genome_name": "some_genome"
"generate_missing_genes": 0,
"genome_type": null,
"source": null,
"release": null,
"genetic_code": null,
"scientific_name": null,
"generate_ids_if_needed": null
}
},
{
"data_type": "genbank_genome",
"parameters": {
"staging_file_subdir_path": "path/to/some_other_genome.gbk",
"genome_name": "some_other_genome"
"generate_missing_genes": 0,
"genome_type": null,
"source": null,
"release": null,
"genetic_code": null,
"scientific_name": null,
"generate_ids_if_needed": null
}
}
]
}
}
An object does not necessarily correspond to a single file!
Some data types are associated with multiple files. For example, the gff_genome
type has both
a FASTA file and a GFF file. These files appear in their respective fields in a single object
within the objects
array.
Supported KBase Data Types
The data types supported by the KBase staging service are listed below with their fields. These data types are based on the KBase staging service's import specification templates.
All fields are required!
Every object of a given data type must have all of its fields specified, with null
values
indicating empty fields. While inconvenient, this complete specification is required by the
current implementation of the KBase staging service.
Fields that must be non-null
are marked bold below.
Assembly (assembly
)
assembly_name
: the name of the assembly objectmin_contig_length
: an integer containing the minimum length of a contig within the assemblystaging_file_subdir_path
: a string containing the path to the file, rooted in the directory containing the manifesttype
: one of the following strings:"draft isolate"
,"finished isolate"
,"mag"
,"sag"
,"virus"
,"plasmid"
,"construct"
,"metagenome"
Genbank genome (genbank_genome
)
genome_name
: the name of the genome objectgenerate_ids_if_needed
: a stringgenetic_code
: an integergenerate_missing_genes
: 0 or 1genome_type
: one of the following strings:"draft isolate"
,"finished isolate"
,"mag"
,"sag"
,"virus"
,"plasmid"
,"construct"
release
: a stringscientific_name
: the scientific name of the genomesource
: one of the following strings:"RefSeq user"
,"Ensembl user"
,"Other"
staging_file_subdir_path
: a string containing the path to the file, rooted in the directory containing the manifest
GFF+FASTA genome (gff_genome
)
fasta_file
: a string containing the path to a FASTA file, rooted in the directory containing the manifestgenerate_missing_genes
: 0 or 1gff_file
: a string containing the path to a GFF file, rooted in the directory containing the manifestgenetic_code
: an integergenome_name
: the name of the genome objectgenome_type
: one of the following strings:"draft isolate"
,"finished isolate"
,"fungi"
,"mag"
,"other Eukaryote"
,"plant"
,"sag"
,"virus"
,"plasmid"
,"construct"
release
: a stringscientific_name
: the scientic name of the genomesource
: one of the following strings:"RefSeq user"
,"Ensembl user"
,"JGI"
,"Other"
taxon_wsname
: a string
GFF+FASTA metagenome (gff_metagenome
)
fasta_file
: a string containing the path to a FASTA file, rooted in the directory containing the manifestgenerate_missing_genes
: 0 or 1genetic_code
: an integergenome_name
: the name of the genome objectgff_file
: a string containing the path to a GFF file, rooted in the directory containing the manifestrelease
: a stringsource
: one of the following strings:"EBI user"
,"IMG user"
,"JGI user"
,"BGI user"
,"Other"
Interleaved FASTQ reads (fastq_reads_interleaved
)
fastq_fwd_staging_file_name
: a string containing the path to the file, rooted in the directory containing the manifestinsert_size_std_dev
: a float indicating the standard deviation of the insert size distributioninsert_size_mean
: a float indicating the mean of the insert size distributionname
: the name of the interleaved FASTQ reads objectread_orientation_outward
: a stringsequencing_tech
: one of the following strings:"Illumina"
,"PacBio CLR"
,"PacBio CCS"
,"IonTorrent"
,"NanoPore"
,"Unknown"
single_genome
: a string
Noninterleaved FASTQ reads (fastq_reads_noninterleaved
)
fastq_fwd_staging_file_name
: a string containing the path to the forward reads file, rooted in the directory containing the manifestfastq_rev_staging_file_name
: a string containing the path to the reverse reads file, rooted in the directory containing the manifestinsert_size_mean
: a float indicating the mean of the insert size distributioninsert_size_std_dev
: a float indicating the standard deviation of the insert size distributionname
: the name of the noninterleaved FASTQ reads objectread_orientation_outward
: a stringsequencing_tech
: one of the following strings:"Illumina"
,"PacBio CLR"
,"PacBio CCS"
,"IonTorrent"
,"NanoPore"
,"Unknown"
single_genome
: a string
SRA reads (sra_reads
)
insert_size_mean
: a float indicating the mean of the insert size distributioninsert_size_std_dev
: a float indicating the standard deviation of the insert size distributionname
: the name of the SRA reads objectread_orientation_outward
: a stringsra_staging_file_name
: a string containing the path to the SRA staging file, rooted in the directory containing the manifestsequencing_tech
: one of the following strings:"Illumina"
,"PacBio CLR"
,"PacBio CCS"
,"IonTorrent"
,"NanoPore"
,"Unknown"
single_genome
: a string