Working with reference data

Adding Reference Data to a KBase App

Usage

To use reference data in an app, the developer needs to do three things:

  1. Add a data-version field to the kbase.yaml file and set it to a semantic version number (for example, 0.0.1).

  2. Add any download and data-preparation steps to the init section of the entrypoint.sh script in the scripts directory. The init section should place the data in /data, and any applications that use the data should be configured to look for it in that location.

  3. Create a __READY__ file in /data to indicate that the reference data volume was created successfully. Ideally, first run some sanity tests, such as checking that the expected files are present, to confirm that the preparation steps succeeded. If the __READY__ file is not present, registration will fail and the reference data area will be removed.
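The init steps above could be sketched in Python as a helper that the init section of entrypoint.sh calls. This is a minimal sketch, not the RAST app's code or a KBase API: the function names, the placeholder file names in EXPECTED_FILES, and the stub body of prepare_data are all hypothetical stand-ins for an app's real download and preparation commands. What it illustrates is the ordering: prepare the data, check for the expected files, and create __READY__ only if every check passes.

```python
import sys
from pathlib import Path

# Hypothetical placeholders: list the files your app expects to find in /data.
EXPECTED_FILES = ("reference.fasta", "reference.idx")


def prepare_data(data_dir: Path) -> None:
    """Stand-in for the app's real download and preparation steps."""
    # Replace these stub writes with the actual download, unpack, and index commands.
    (data_dir / "reference.fasta").write_text(">seq1\nACGT\n")
    (data_dir / "reference.idx").write_text("seq1\t4\n")


def find_problems(data_dir: Path, expected) -> list:
    """Return the expected files that are missing or empty."""
    return [
        name for name in expected
        if not (data_dir / name).is_file() or (data_dir / name).stat().st_size == 0
    ]


def init_reference_data(data_dir: Path, expected=EXPECTED_FILES) -> bool:
    """Prepare the data, sanity-check it, and write __READY__ only on success."""
    data_dir.mkdir(parents=True, exist_ok=True)
    prepare_data(data_dir)
    problems = find_problems(data_dir, expected)
    if problems:
        print(f"reference data check failed, missing or empty: {problems}", file=sys.stderr)
        return False  # no __READY__ is written, so registration fails (step 3)
    # The marker file signals that the reference data volume was built successfully.
    (data_dir / "__READY__").touch()
    return True
```

The init section might then run a short Python command that calls init_reference_data(Path("/data")) and exits with a non-zero status when it returns False. Either way, the missing __READY__ file is what causes registration to fail.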

You can see an example in the RAST application.

Updating Reference Data

If a new version of the reference data is required, the developer can increment the data-version in kbase.yaml, make any needed updates to the init section, and re-register the app. Registration then initializes the reference data for the new data version. Older versions of the app continue to use the reference data specified by the data-version in their own kbase.yaml file, which helps ensure reproducibility.
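Bumping the version can be scripted if desired. The Python sketch below is one hypothetical way to do it, not part of the KBase tooling. It assumes kbase.yaml contains an unquoted top-level line of the form data-version: X.Y.Z, and it edits that line as text rather than through a YAML library so that comments and formatting elsewhere in the file are left untouched. The function name and the part argument are illustrative.

```python
import re

# Assumption: kbase.yaml has an unquoted top-level line such as "data-version: 0.0.1".
DATA_VERSION_LINE = re.compile(
    r"^(data-version:[ \t]*)(\d+)\.(\d+)\.(\d+)[ \t]*$", re.MULTILINE
)


def bump_data_version(kbase_yaml: str, part: str = "minor") -> str:
    """Return the kbase.yaml text with its data-version incremented."""
    match = DATA_VERSION_LINE.search(kbase_yaml)
    if match is None:
        raise ValueError("no 'data-version: X.Y.Z' line found")
    major, minor, patch = (int(g) for g in match.groups()[1:])
    if part == "major":
        new = f"{major + 1}.0.0"
    elif part == "minor":
        new = f"{major}.{minor + 1}.0"
    elif part == "patch":
        new = f"{major}.{minor}.{patch + 1}"
    else:
        raise ValueError(f"unknown part: {part!r}")
    # Replace only the matched line; the rest of the file is kept byte for byte.
    start, end = match.span()
    return kbase_yaml[:start] + match.group(1) + new + kbase_yaml[end:]
```

After writing the result back to kbase.yaml, update the init section as needed and re-register the app.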

Gotchas

There are a few things to watch out for with reference data:

  • The reference data area (/data) is mounted during initialization (module registration) and during app execution, and it replaces the /data directory from the Docker image. Any changes the Dockerfile makes to this location will not be visible, so all data preparation must be done in the init section of entrypoint.sh.

  • The reference data is writable only at registration time. This ensures that the data is not accidentally changed during execution, which could break reproducibility. If the app needs to modify the reference data when it runs, add code to the app that copies the data into the writable work area before running the underlying application.
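The copy-before-use approach in the last point might look like the following Python sketch. The function name and the work_dir argument are hypothetical; work_dir is assumed to be a writable directory the app already uses, such as its scratch area. Files are copied with shutil.copyfile rather than copytree's default shutil.copy2 so that read-only permission bits are not carried over, and the owner write bit is then set on everything in the copy, because copytree still copies directory permissions.

```python
import shutil
import stat
from pathlib import Path


def make_writable_copy(ref_dir: Path, work_dir: Path) -> Path:
    """Copy read-only reference data (e.g. /data) into a writable work area.

    Hypothetical helper: work_dir is assumed to be a directory the app can write to.
    Returns the path of the copy, which the app should use instead of ref_dir.
    """
    target = work_dir / "refdata"
    # copyfile copies file contents only, so read-only mode bits are not cloned;
    # dirs_exist_ok (Python 3.8+) allows re-running into an existing copy.
    shutil.copytree(ref_dir, target, copy_function=shutil.copyfile, dirs_exist_ok=True)
    # copytree still copies directory permissions, so grant the owner write access
    # to the copy's root and to every file and directory under it.
    for path in [target, *target.rglob("*")]:
        path.chmod(path.stat().st_mode | stat.S_IWUSR)
    return target
```

The underlying application would then be pointed at the returned path instead of /data. Copying costs time and disk space on every run, so it is only worth doing when the tool genuinely needs to write to its reference files.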