Editing your app’s Dockerfile

To install custom programs for your app, such as third-party tools, edit your Dockerfile, which contains the setup instructions used to build your container.

A helpful resource is the Dockerfile Best Practices guide.

The first two lines of the Dockerfile look something like:

FROM kbase/sdkbase2:latest
MAINTAINER KBase Developer

Change KBase Developer to be the email address of the person who will be responsible for the upkeep of the module.

The base KBase Docker image contains a KBase Ubuntu image with the dependencies necessary for a JSON-RPC server in the supported language, as well as a core set of KBase API Clients. You will need to add any other dependencies, including the installation of whatever tool you are wrapping, to build a custom Docker image that can run your Module.

For example:

RUN \
  git clone https://github.com/voutcn/megahit.git && \
  cd megahit && \
  git checkout tags/v1.0.3 && \
  make

Note: you do not have to add any lines to the Dockerfile for installing your SDK Module code, as this will happen automatically. The contents of your SDK Module git repo will be added at /kb/module.

It is also important to note that each command in the Dockerfile generates a layer in the Docker image. To make a streamlined Docker image that deploys to a worker node more quickly and makes Docker build cycles faster, remove any large, extraneous files, such as a tarball from which you installed a tool, in the same command that created them.

To accomplish this, commands in your Dockerfile should look like this:

RUN \
  wget blah.com/tarball.tgz && \
  tar xzf tarball.tgz && \
  cd tarball && \
  make && \
  cd .. && \
  rm -rf tarball.tgz

Here, each && chains shell commands together so that the next command runs only if the previous one succeeded, and each trailing backslash continues the same single-line command onto the next line.

Avoid this:

RUN wget blah.com/tarball.tgz
RUN tar xzf tarball.tgz
RUN cd tarball
RUN make
RUN cd ..
RUN rm -rf tarball.tgz

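Each call to RUN creates a separate Docker image layer that sits on top of the previous one. Previous layers are read-only, so deleting a file in a later RUN does not remove it from the earlier layer or reduce the size of the image. Each RUN also starts in a fresh shell, so the cd commands above have no effect on the lines that follow them, and make would not run inside the tarball directory. In general, you will want one RUN command for each discrete tool or service that you set up in your container.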
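As a rough sketch of the "one RUN per tool" guideline, the fragment below installs two tools, each in its own layer, and cleans up the downloaded archive inside the layer that fetched it. The URLs, archive names, and directory names are hypothetical placeholders, not real KBase dependencies, and the sketch assumes wget, git, and make are available in the base image.

```dockerfile
FROM kbase/sdkbase2:latest

# Tool one (hypothetical URL and names): download, build, and delete the
# archive in a single RUN so the tarball never persists in any layer.
RUN \
  wget https://example.org/tool-one.tgz && \
  tar xzf tool-one.tgz && \
  cd tool-one && \
  make && \
  cd .. && \
  rm -f tool-one.tgz

# Tool two (hypothetical URL and names): kept in a separate RUN so that a
# change to this tool's setup does not force tool one to be rebuilt.
RUN \
  git clone https://example.org/tool-two.git && \
  cd tool-two && \
  make
```

Keeping each tool in its own RUN trades a slightly larger layer count for better cache reuse: editing the second block leaves the cached layer for the first tool untouched.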

Final note: Docker rebuilds everything from the first detected change in a Dockerfile onward, but pulls everything before that point from its cache. If you pull in external data using RUN with a command like git clone or wget, changes in those sources will not automatically be reflected in a rebuilt Docker image unless the Dockerfile changes at or before that import.
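One possible way to make such an import refresh deliberately is to pin the external source to a version held in a build argument, so that changing the version changes the instruction Docker uses as its cache key. The sketch below reuses the megahit example from earlier; the argument name MEGAHIT_VERSION is an illustrative choice, not a KBase convention.

```dockerfile
FROM kbase/sdkbase2:latest

# MEGAHIT_VERSION is a hypothetical argument name. Editing this default, or
# overriding it at build time, invalidates the cache from the RUN below onward.
ARG MEGAHIT_VERSION=v1.0.3

RUN \
  git clone https://github.com/voutcn/megahit.git && \
  cd megahit && \
  git checkout tags/${MEGAHIT_VERSION} && \
  make
```

When building directly with Docker, the value can be overridden with docker build --build-arg MEGAHIT_VERSION=<tag>, and docker build --no-cache forces every layer to be rebuilt regardless of the cache. Whether your usual build tooling passes these options through is something to verify separately.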