7. Implement Code
Important
If you are new to Python, here are a few tips:
Tabs are not used. Anything that is indented is filled with spaces.
Indentation is not optional. Python uses indentation to designate blocks of code that go together.
Indentation must be consistent. If you start with 4, 8 and 12 spaces being your indentation levels, stick with those throughout.
The pound sign (#) indicates comments to the end of the line.
The actual code for your app will live in the python package under lib/{username}ContigFilter
. The entry point, where your code is initially called, lives in the file: lib/{username}ContigFilter/{username}ContigFilterImpl.py
. It is sometimes called the “Implementation” file or simply the “Impl” file. This is the file where you edit your own Python code.
This “Implementation” file defines the python methods available in the module. Two of the methods correspond to our two apps and are named run_{username}ContigFilter
and run_{username}ContigFilter_max
and they are part of the class inside method_nameImpl.py
.
Note
In a real-world app, you may want to split up your code into several python modules and packages. You can place extra modules and folders inside lib/MyOtherModule
and import them in lib/{username}ContigFilter/{username}ContigFilterImpl.py
.
Much of the Implementation file is auto-generated based on the KIDL .spec file. The make
command updates the Implementation file. To separate auto-generated code from developer code, developer code belongs between #BEGIN
and #END
comments. For example:
#BEGIN_HEADER
#END_HEADER
#BEGIN_CLASS_HEADER
#END_CLASS_HEADER
#BEGIN_CONSTRUCTOR
#END_CONSTRUCTOR
#BEGIN run_{username}ContigFilter
#END run_{username}ContigFilter
The make
command preserves everything between the #BEGIN
and #END
comments and replaces everything else.
Warning
Don’t put any spaces between the ‘#’ and ‘BEGIN’ or ‘END’. It has bad consequences.
The method for run_{username}ContigFilter
is a working app and has a lot of code between the #BEGIN
and #END
.
The new app run_{username}ContigFilter_max
has nothing between the #BEGIN
and #END
.
Note
At this point, you could
take a short-cut and copy all the code from the
run_{username}ContigFilter
method and paste it into therun_{username}ContigFilter_max
method.make a few minor edits to add the filter for contigs exceeding the maximum length.
run
kb-sdk test
and see if everything works.
The rest of this page is for those who want to understand how the code works and how to create tests for the code. It goes through the process of building up the code section, step-by-step.
7.1. Receive parameters
Our first goal is to receive and print the method’s parameters. Open method_nameImpl.py
and find the run_{username}ContigFilter_max
method, which should have some auto-generated boilerplate code and docstrings.
You want to edit code between the comments #BEGIN run_{username}ContigFilter_max
and #END run_{username}ContigFilter_max
. These are special SDK-generated annotations that we have to keep in the code to get everything to compile correctly. If you run make
again in the future, it will update the code outside these comments, but will not change the code you put between the #BEGIN
and #END
comments.
Between the comments, add a simple print statement, such as: print(params['min_length'], params['max_length'], params['assembly_ref'])
. This let us see what is getting passed into our method.
def run_{username}ContigFilter_max(self, ctx, params):
"""
:param workspace_name: instance of String
:param params: instance of type "ContigFilterParams" (Input
parameters) -> structure: parameter "min_length" of Long,
parameter "assembly_ref" of String
:returns: instance of type "ContigFilterResults" (Output results) ->
structure:
"""
# ctx is the context object
# return variables are: returnVal
#BEGIN run_{username}ContigFilter_max
print(params['min_length'], params['max_length'], params['assembly_ref'])
output = {}
#END run_{username}ContigFilter_max
return [output]
Don’t try to change the docstring, or anything else outside the #BEGIN run_{username}ContigFilter_max
and #END run_{username}ContigFilter_max
comments, as your change will get overwritten by the make
command.
7.2. Initialize a test
Your {username}ContigFilterImpl.py
file is tested using test/{username}ContigFilterImpl_server_test.py
. This file also has a variety of auto-generated boilerplate code and tests for the first app. Python will automatically run all methods that start with the name test
. There are three tests for the old app. As a temporary measure, we will rename them so they don’t run until we are done working on the new app.
Change
def test_run_{username}ContigFilter_ok(self)`
todef my_test_run_{username}ContigFilter_ok(self)
Change
def test_run_{username}ContigFilter_err1(self)
todef my_test_run_{username}ContigFilter_err1(self)
Change
def test_run_{username}ContigFilter_err2(self)
todef my_test_run_{username}ContigFilter_err2(self)
Now add your own test for the new app method at the bottom of the test class and call it the test_run_{username}ContigFilter_max(self)
.
def test_run_{username}ContigFilter_max(self):
ref = "79/16/1"
result = self.serviceImpl.run_{username}ContigFilter_max(self.ctx, {
'workspace_name': self.wsName,
'assembly_ref': ref,
'min_length': 100,
'max_length': 1000000
})
print(result)
# TODO -- assert some things (later)
We need to provide three parameters to our function: a workspace name, an assembly reference string, and a min length integer. For the reference string, we can use this sample reference to a Shewanella oneidensis assembly on AppDev: 79/16/1
. You can always get a workspace name from the test class by using self.wsName
.
Note
- Make sure that you have put your developer token in the
test_local/test.cfg
as mentioned in the
Run kb-sdk test
and, if everything works, you’ll see the docker container boot up, the run_{username}ContigFilter_max
method will get called, and you will see some printed output.
7.3. Set the callback URL and scratch path
Note
In this “ContigFilter” module, the steps in this section have already been done. They are included here so you can see why they were added to the basic module template.
The callback URL points to a server that is used to spin up other SDK apps that we will need to use in our own app. In our case, we want to use AssemblyUtil to validate and download genome data. When we use that app, our app makes a request to the callback server, which spins up a separate docker container that runs AssemblyUtil.
The other parameter we need is the path to the scratch directory. Scratch is a special directory that we can use to store files used to run the app. It is a shared directory that is also accessible by other apps, such as AssemblyUtil. You cannot use directories like /tmp
when working with AssemblyUtil, because other apps won’t have access to it.
Note
The {username}ContigFilterImpl.py code always uses the scratch directory to store files in your app.
Important
Scratch is a temporary directory and only lasts as long as your app runs. When your app stops running, scratch files are gone. To generate persistent data, we can use Reports, which are described in more detail later on.
To enable callbacks and the scratch directory, this code was added into your __init__
method in your {username}ContigFilterImpl.py
, between the #BEGIN_CONSTRUCTOR
and #END_CONSTRUCTOR
comments:
# Inside your __init__ function:
#BEGIN_CONSTRUCTOR
self.callback_url = os.environ['SDK_CALLBACK_URL']
self.shared_folder = config['scratch']
#END_CONSTRUCTOR
Also added was an import os
in the header of your {username}ContigFilterImpl.py
file, between the #BEGIN_HEADER
and #END_HEADER
comments.
We need to convert the reference to bacterial genome data, passed as an input parameter, into an actual FASTA file that our app can access. For that, we can use the AssemblyUtil app.
The app was installed from your repository’s root directory with:
$ kb-sdk install AssemblyUtil
That added an entry for AssemblyUtil
to your dependencies.json
file. It also added a python package under lib/installed_clients
. Other dependencies can be added the same way.
Important
Don’t forget to git add
these new dependencies to your source control when you run kb-sdk install.
At the top of your {username}ContigFilterImpl.py
file, the module is imported with:
from installed_clients.AssemblyUtilClient import AssemblyUtil
If you made any changes to this code, run the kb-sdk test
command again to make sure you have no errors.
7.4. Add some basic validations
It’s good practice to make some run-time checks of the parameters passed into your {username}ContigFilterImpl#run_{username}ContigFilter_max
method. While params will get checked in the Narrative UI, if your app ever gets called from another codebase, it will bypass any UI typechecks.
Make sure your user passes in a workspace, an assembly reference, a minimum length greater than zero, and a maximum length greater than zero:
# Inside run_{username}ContigFilter_max(), after #BEGIN run_{username}ContigFilter_max, before any other code
# Check that the parameters are valid
for name in ['min_length', 'max_length', 'assembly_ref', 'workspace_name']:
if name not in params:
raise ValueError('Parameter "' + name + '" is required but missing')
if not isinstance(params['min_length'], int) or (params['min_length'] < 0):
raise ValueError('Min length must be a non-negative integer')
if not isinstance(params['max_length'], int) or (params['max_length'] < 0):
raise ValueError('Max length must be a non-negative integer')
if not isinstance(params['assembly_ref'], str) or not len(params['assembly_ref']):
raise ValueError('Pass in a valid assembly reference string')
Feel free to add another test for the max_length
being greater than the min_length
.
Re-run kb-sdk test
to make sure everything still works.
Back to defining tests (test/{username}ContigFilterImpl_server_test.py
).
We can add some additional tests to make sure we raise ValueErrors for invalid parameters:
# Inside test/{username}ContigFilterImpl_server_test.py
# At the end of the test class
def test_invalid_params(self):
impl = self.serviceImpl
ctx = self.ctx
ws = self.wsName
# Missing assembly ref
with self.assertRaises(ValueError):
impl.run_{username}ContigFilter_max(ctx, {'workspace_name': ws,
'min_length': 100, 'max_length': 1000000})
# Missing min length
with self.assertRaises(ValueError):
impl.run_{username}ContigFilter_max(ctx, {'workspace_name': ws, 'assembly_ref': 'x',
'max_length': 1000000})
# Min length is negative
with self.assertRaises(ValueError):
impl.run_{username}ContigFilter_max(ctx, {'workspace_name': ws, 'assembly_ref': 'x',
'min_length': -1, 'max_length': 1000000})
# Min length is wrong type
with self.assertRaises(ValueError):
impl.run_{username}ContigFilter_max(ctx, {'workspace_name': ws, 'assembly_ref': 'x',
'min_length': 'x', 'max_length': 1000000})
# Assembly ref is wrong type
with self.assertRaises(ValueError):
impl.run_{username}ContigFilter_max(ctx, {'workspace_name': ws, 'assembly_ref': 1,
'min_length': 1, 'max_length': 1000000})
Testing for invalid max_length is left as an exercise for the student.
7.5. Download the FASTA file
Back to the method_nameImpl.py
file.
Inside your run_{username}ContigFilter_max
method, initialize the utility and use it to download the assembly_ref
:
# Inside run_{username}ContigFilter_max()
assembly_util = AssemblyUtil(self.callback_url)
fasta_file = assembly_util.get_assembly_as_fasta({'ref': params['assembly_ref']})
print(fasta_file)
We have to initialize AssemblyUtil by passing
self.callback_url
The
get_assembly_as_fasta
method downloads a file from a workspace ref
Run kb-sdk test
again and you should see the file download along with its path in the container.
7.6. Filter out contigs based on length
Now we can finally start to implement the real functionality of the app!
The biopython package (http://biopython.org/ ), included in the SDK build, has a module called SeqIO ( http://biopython.org/wiki/SeqIO ) that can help us read and filter genome sequence data.
This module should already be included in the module’s {username}ContigFilterImpl.py
between the header comments like so:
# other imports
from Bio import SeqIO
Now, inside run_{username}ContigFilter_max
, enter code to filter out contigs less than the given min_length: or greater than the max_length.
# Inside {username}ContigFilterImpl#run_{username}ContigFilter_max, after you have fetched the fasta file:
# Parse the downloaded file in FASTA format
parsed_assembly = SeqIO.parse(fasta_file['path'], 'fasta')
min_length = params['min_length']
max_length = params['max_length']
# Keep a list of contigs greater than min_length
good_contigs = []
# total contigs regardless of length
n_total = 0
# total contigs over the min_length
n_remaining = 0
for record in parsed_assembly:
n_total += 1
if len(record.seq) >= min_length and len(record.seq) <= max_length:
good_contigs.append(record)
n_remaining += 1
output = {
'n_total': n_total,
'n_remaining': n_remaining
}
Run kb-sdk test
again and check the output.
7.7. Add real tests
Return to test/{username}ContigFilterImpl_server_test.py
and add tests for the functionality we just added above.
Set min_length
to a value that filters out some contigs but not others. In our case, our FASTA only has 2 sequences of lengths 4,969,811 and 161,613. An in-between minimum could be 200,000. To test the upper end, a minimum could be 100,000 and a maximum could be 400,000
We would expect to keep 1 contig and filter out the other.
# Inside {username}ContigFilterImpl_server_test:
def test_run_{username}ContigFilter_test_min(self):
ref = "79/16/1"
params = {
'workspace_name': self.wsName,
'assembly_ref': ref,
'min_length': 200000,
'max_length': 6000000
}
result = self.serviceImpl.run_{username}ContigFilter_max(self.ctx, params)
self.assertEqual(result[0]['n_total'], 2)
self.assertEqual(result[0]['n_remaining'], 1)
def test_run_{username}ContigFilter_test_max(self):
ref = "79/16/1"
params = {
'workspace_name': self.wsName,
'assembly_ref': ref,
'min_length': 100000,
'max_length': 4000000
}
result = self.serviceImpl.run_{username}ContigFilter_max(self.ctx, params)
self.assertEqual(result[0]['n_total'], 2)
self.assertEqual(result[0]['n_remaining'], 1)
Run kb-sdk test
again to make sure it all passes.
7.8. Output the filtered assembly
Next, we want to save and upload a new version of our genome assembly data with the contigs filtered out.
Beneath the code that we wrote to filter the assembly, add this file saving and uploading code.
# Underneath your loop that filters contigs:
# Create a file to hold the filtered data
workspace_name = params['workspace_name']
filtered_path = os.path.join(self.shared_folder, 'filtered.fasta')
SeqIO.write(good_contigs, filtered_path, 'fasta')
# Upload the filtered data to the workspace
new_ref = assembly_util.save_assembly_from_fasta({
'file': {'path': filtered_path},
'workspace_name': workspace_name,
'assembly_name': fasta_file['assembly_name']
})
output = {
'n_total': n_total,
'n_remaining': n_remaining,
'filtered_assembly_ref': new_ref
}
#END run_{username}ContigFilter_max
Add a simple assertion into your test_run_{username}ContigFilter_max
method to check for the filtered_assembly_ref
. Something like:
self.assertTrue(len(result[0]['filtered_assembly_ref']))
Run kb-sdk test
again to make sure you have no errors
7.9. Build a report object
In order to output data into the UI inside a narrative, your app needs to build and return a KBaseReport ( https://github.com/kbaseapps/KBaseReport ).
The following KBaseReport app should be installed already:
$ kb-sdk install KBaseReport
Import the report module should be between the #BEGIN_HEADER
and #END_HEADER
section of your {username}ContigFilterImpl.py
file:
from KBaseReport.KBaseReportClient import KBaseReport
The KBaseReport takes a series of dictionary objects that can have text messages, object references, and more. Add the report initialization code inside your run_{username}ContigFilter_max
method:
# Inside the run_{username}ContigFilter_max method, below where we uploaded the new file:
# Create an output summary message for the report
text_message = "".join([
'Filtered assembly to ',
str(n_remaining),
' contigs out of ',
str(n_total)
])
# Data for creating the report, referencing the assembly we uploaded
report_data = {
'objects_created': [
{'ref': new_ref, 'description': 'Filtered contigs'}
],
'text_message': text_message
}
# Initialize the report
kbase_report = KBaseReport(self.callback_url)
report = kbase_report.create({
'report': report_data,
'workspace_name': workspace_name
})
# Return the report reference and name in our results
output = {
'report_ref': report['ref'],
'report_name': report['name'],
'n_total': n_total,
'n_remaining': n_remaining,
'filtered_assembly_ref': new_ref
}
#END run_{username}ContigFilter_max
Add a couple assertions in our test_run_{username}ContigFilter_max
method inside test/{username}ContigFilterImpl_server_test.py
to check for the report name and ref:
self.assertTrue(len(result[0]['report_name']))
self.assertTrue(len(result[0]['report_ref']))
Run kb-sdk test
again to make sure it all works.
7.10. Configure your app’s output data
We nearly have a complete app. The last step has already been added to our “ContigFilter” module. If starting from a blank template, you would take all the result data we defined in {username}ContigFilterImpl#run_{username}ContigFilter_max
and add entries for them in our {username}ContigFilter.spec
KIDL type file as well as our spec.json
UI config file.
If not there already, add a type entry for our result data in our KIDL file:
Run make
and kb-sdk test
again to make sure everything works.
In your ui/narrative/methods/run_{username}ContigFilter_max/spec.json
file, if not there already, add entries for this output data:
...
"output_mapping": [
{
"service_method_output_path": [0,"report_name"],
"target_property": "report_name"
},
{
"service_method_output_path": [0,"report_ref"],
"target_property": "report_ref"
},
{
"narrative_system_variable": "workspace",
"target_property": "workspace_name"
}
]
...
Now we have some output entries that point to our report and workspace, which will show up when the job finishes in the narrative.
Finally, under widgets/output
in the spec.json (near the top around line 10), set output
to no-display
. This prevents our app from creating a separate output cell. It may already be set to no-display
because that is the default.
...
"widgets": {
"input": null,
"output": "no-display"
},
...
We’ve added an entry for everything we put in the output
dictionary field that gets returned from {username}ContigFilterImpl#run_{username}ContigFilter_max
.
Run kb-sdk test
a final time to make sure everything runs smoothly. If so, we have a working app!
Now that you are done with the new app, remember the three tests for the old app that we commented out? Time to uncomment them. In the test file test/{username}ContigFilterImpl_server_test.py
:
Change
def my_test_run_{username}ContigFilter_ok(self)`
todef test_run_{username}ContigFilter_ok(self)
Change
def my_test_run_{username}ContigFilter_err1(self)
todef test_run_{username}ContigFilter_err1(self)
Change
def my_test_run_{username}ContigFilter_err2(self)
todef test_run_{username}ContigFilter_err2(self)