Working with the file utils

Need to work with a particular type of file? There’s a built-in utility for that. Remember that utils run in separate containers that have access to your scratch directory, so keep your files in scratch and upload them from there.
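As a minimal sketch of the "keep your files in scratch" advice, the helper below copies a local file into a scratch directory and returns the path that would then be handed to a util. The name `stage_in_scratch` and both of its parameters are hypothetical. Where the scratch path comes from (typically the module's configuration) is an assumption, not something this page specifies.

```python
import os
import shutil


def stage_in_scratch(src_path, scratch_dir):
    """Copy src_path into scratch_dir and return the new path.

    Hypothetical helper: scratch_dir is assumed to be the scratch
    directory that the module shares with the util containers.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    dest_path = os.path.join(scratch_dir, os.path.basename(src_path))
    # Skip the copy when the file already lives in scratch.
    if os.path.abspath(src_path) != os.path.abspath(dest_path):
        shutil.copy(src_path, dest_path)
    return dest_path
```

The returned path is the one to pass as a `file_path` or `path` parameter in the upload calls shown later on this page.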

Where are the utils?

Search for “Util” in the catalog.

DataFileUtil, GenomeFileUtil, and AssemblyUtil are the most commonly used. Read the examples below for a quick idea of how to use them.

Note

Upload local genome files for all of your tests. If you use an existing workspace reference in your tests, it won’t work in the AppDev/CI environments.

DataFileUtil

DataFileUtil is the preferred general utility to fetch and save objects. Use GenomeFileUtil or AssemblyUtil if you are specifically working with FASTA or GFF files.

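Initialize the DataFileUtil client and get an object by reference.

import os
from installed_clients.DataFileUtilClient import DataFileUtil  # import path used by recent kb-sdk modules

self.callback_url = os.environ['SDK_CALLBACK_URL']
self.dfu = DataFileUtil(self.callback_url)
genome_ref = "your/object_reference"
genome_data = self.dfu.get_objects({'object_refs': [genome_ref]})['data'][0]
genome_obj = genome_data['data']  # the object's contents
genome_meta = genome_data['info'][10]  # index 10 of the info tuple holds the metadata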
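The numeric indexes into `info` are easy to get wrong. One way to make them readable is to wrap the list in a named tuple. This is only a sketch: the field order below is assumed from the Workspace object info tuple, `ObjectInfo` and `parse_info` are hypothetical names, and the sample values are made up for illustration.

```python
from collections import namedtuple

# Field order assumed from the Workspace object info tuple (11 elements).
ObjectInfo = namedtuple("ObjectInfo", [
    "objid", "name", "type", "save_date", "version",
    "saved_by", "wsid", "workspace", "chsum", "size", "meta",
])


def parse_info(info):
    """Wrap an 11-element info list so its fields can be read by name."""
    return ObjectInfo(*info)


# Illustrative values only; a real list comes from genome_data['info'].
sample_info = [3, "my_genome", "KBaseGenomes.Genome", "2020-01-01T00:00:00+0000",
               1, "someuser", 42, "my_workspace", "abc123", 100, {"Source": "test"}]
info = parse_info(sample_info)
genome_meta = info.meta  # same value as sample_info[10]
```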

Upload a file or directory to Shock

# Upload a directory and zip it
file = self.dfu.file_to_shock({"file_path": scratch_path, "pack": "zip"})
file['shock_id']  # has the shock id

# Upload a single file to Shock
file = self.dfu.file_to_shock({"file_path": scratch_path})
file['shock_id']  # has the shock id

Save an object to the workspace and get an object reference

save_object_params = {
    'id': workspace_id,
    'objects': [{
        'type': 'KBaseRNASeq.RNASeqSampleSet',
        'data': sample_set_data,
        'name': sample_set_object_name
    }]
}

dfu_oi = self.dfu.save_objects(save_object_params)[0]
# Construct the workspace reference: 'workspace_id/object_id/version'
object_reference = str(dfu_oi[6]) + '/' + str(dfu_oi[0]) + '/' + str(dfu_oi[4])
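The reference construction above can be wrapped in a small helper so the index order lives in one place. This is a sketch, not part of DataFileUtil: `info_to_ref` is a hypothetical name, and it assumes the same info layout used above (object id at index 0, version at index 4, workspace id at index 6).

```python
def info_to_ref(info):
    """Build a 'workspace_id/object_id/version' reference from an info list."""
    wsid, objid, version = info[6], info[0], info[4]
    return f"{wsid}/{objid}/{version}"


# Illustrative info list: object 3, version 1, in workspace 42.
example_ref = info_to_ref([3, "name", "type", "date", 1, "user", 42, "ws", "chsum", 100, {}])
# example_ref == "42/3/1"
```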

Download a file from Shock to a file path

self.dfu.shock_to_file({
    'shock_id': shock_id,
    'file_path': scratch_directory,
    'unpack': 'unpack'
})

GenomeFileUtil

Download:

gfu = GenomeFileUtil(os.environ['SDK_CALLBACK_URL'])
file = gfu.genome_to_gff({'genome_ref': genome_ref})
file['path']  # -> '/path/to/your/gff_file'

Upload:

gfu = GenomeFileUtil(os.environ['SDK_CALLBACK_URL'], token=self.getContext()['token'])
gfu.genbank_to_genome({
    'file': {'path': scratch_path},
    'workspace_name': workspace_name,
    'genome_name': genome_obj
})

AssemblyUtil

Download:

assembly_util = AssemblyUtil(self.callback_url)
file = assembly_util.get_assembly_as_fasta({
    'ref': assembly_workspace_reference
})
file['path']  # -> 'path/to/your/fasta/file.fna'

Upload:

assembly_util = AssemblyUtil(self.callback_url)
return assembly_util.save_assembly_from_fasta({
    'file': {'path': scratch_file_path},
    'workspace_name': workspace_name,
    'assembly_name': 'my_uploaded_assembly'
})

GenomeSearchUtil