Running shell commands

Download and compile using your Dockerfile

Many KBase apps wrap third-party tools with commandline interfaces. You can compile these programs for your app by entering the build commands in your Dockerfile. For example, FastANI compiles and installs with this series of commands:

# Fetch and compile FastANI
RUN git clone https://github.com/ParBLiSS/FastANI.git /opt/FastANI \
    && cd /opt/FastANI \
    && git checkout tags/v1.0 -b v1.0 \
    && ./bootstrap.sh \
    && ./configure \
    && make
# Place FastANI in the PATH for the docker user
RUN ln -s $(readlink -f ./fastANI) /usr/local/bin/fastANI

Some things to note:

  • Be sure to check out a specific version tag or commit so that your app doesn’t receive unexpected, possibly broken updates

  • Put this code high up in your Dockerfile, above where your app gets initialized, so that it gets cached and doesn’t have to re-run every time you change your app

  • You can place commands in the directory /kb/deployment/bin/, and it will automatically get added to your container’s $PATH

Invoking the command in Python

You can use the subprocess package to invoke commands from within python.

The following example from the kb_quast repository demonstrates how to do (and document) this well with Python.

def run_quast_exec(self, outdir, filepaths, labels, skip_glimmer=False):
    threads = psutil.cpu_count() * self.THREADS_PER_CORE
    # DO NOT use genemark instead of glimmer, not open source
    # DO NOT use metaQUAST, uses SILVA DB which is not open source
    cmd = ['quast.py', '--threads', str(threads), '-o', outdir, '--labels', ','.join(labels),
           '--glimmer', '--contig-thresholds', '0,1000,10000,100000,1000000'] + filepaths

    if skip_glimmer:
        self.log('skipping glimmer due to large input file(s)')
        cmd.remove('--glimmer')

    self.log('running QUAST with command line ' + str(cmd))
    retcode = _subprocess.call(cmd)
    self.log('QUAST return code: ' + str(retcode))
    if retcode:
        # can't actually figure out how to test this. Give quast garbage it skips the file.
        # Give quast a file with a missing sequence it acts completely normally.
        raise ValueError('QUAST reported an error, return code was ' + str(retcode))
    # quast will ignore bad files and keep going, which is a disaster for
    # reproducibility and accuracy if you're not watching the logs like a hawk.
    # for now, use this hack to check that all files were processed. Maybe there's a better way.
    files_proc = len(open(_os.path.join(outdir, 'report.tsv'), 'r').readline().split('\t')) - 1
    files_exp = len(filepaths)
    if files_proc != files_exp:
        err = ('QUAST skipped some files - {} expected, {} processed.'
               .format(files_exp, files_proc))
        self.log(err)