Configuring the DTS
You can configure a DTS instance by creating a YAML text
file similar to dts.yaml.example
in the repository. Typically this file is named dts.yaml
, and is passed as an
argument to the dts
executable. Here we describe the different sections in
this file and how they affect your DTS instance.
Configuration File Sections
Click on any of the links below to see the relevant details for a section.
- service: configureѕ settings for the DTS web service such as the port on which it listens, the maximum number of connections, intervals for polling and scrubbing completed tasks, data directories, and diagnostics
- endpoints: configures the endpoints used to transfer files from one place to another
- databases: configures databases for organizations that integrate with the DTS
Each of these sections is described below, with a motivating example.
service
service:
port: 8080
max_connections: 100
max_payload_size: 50
poll_interval: 60000
endpoint: globus-local
data_dir: /path/to/dir
manifest_dir: /path/to/dir
delete_after: 604800
debug: true
double_check_staging: false
The service
section contains parameters that control nuts-and-bolts behavior
of the web service portion of the Data Transfer Service. The fields in this
section are:
port
: the port on which the service listensmax_connections
: the maximum number of connections that are simultaneously available for DTS clients. If a client sends a request to the DTS when all connections are occupied, the request is denied.max_payload_size
: the maximum payload size (in GB) allowed by the service. If a client requests the transfer of a payload larger than this size, the request is denied.poll_interval
: the interval (in milliseconds) at which the DTS checks for progress in any ongoing transfers. Because the file transfers orchestrated by the DTS typically take a long time, it's reasonable to set this parameter to a minute (60000 ms) or even longer. However, sometimes it's useful to have a smaller polling interval, like when you're testing a feature. This parameter is optional and defaults to 60000 ms.endpoint
: the name of an endpoint (defined in the endpoints section) used by the DTS to generate and transfer manifests to destination endpoints. This endpoint must have access to the file system to which the DTS writes its manifests.data_dir
: a path to a directory on the local file system that the DTS uses for its own storage. The DTS should have read/write access to this directory.manifest_dir
: a path to a directory on the local file system in which the DTS writes transfer manifests. The endpoint named in theendpoint
parameter must have read access to this directory in order to send the manifest to its destination.delete_after
: the interval (in seconds) after which the DTS deletes the record for a completed transfer, whether the transfer completed successfully or unsuccessfully. This makes it possible for users to query the status of completed transfers for the given interval. This parameter is optional and defaults to 7 days (604800 seconds).debug
: an optional parameter that, if set totrue
, enables more detailed logging and other features that are helpful for troubleshooting and development work. The default value isfalse
.double_check_staging
: an optional parameter that, if set totrue
, performs additional checks for staged files. This parameter can be useful for figuring out the appropriateroot
for an endpoint.
endpoints
endpoints:
globus-local:
name: name-of-local-endpoint
id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
provider: globus
auth:
client_id: <ID of client with authentication secret>
client_secret: <secret>
globus-jdp:
name: name-of-jdp-endpoint
id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
provider: globus
auth:
client_id: <ID of client with authentication secret>
client_secret: <secret>
globus-kbase:
name: name-of-kbase-endpoint
id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
provider: globus
auth:
client_id: <ID of client with authentication secret>
client_secret: <secret>
This section is a mapping (set of key-value pairs) that associates the names of endpoints (keys) with sets of parameters that define their behaviors (values). The endpoints defined here can be referred to in the other sections. The fields that define the behavior of each endpoint are:
name
: a human-readable name for the endpoint, which can be helpful in diagnostic and error-related messagesid
: a UUID that uniquely identifies the endpoint in a (provider-specific) way that allows the DTS to access itprovider
: the name of the service providing the endpoint capability. Valid values for this parameter are:globus
: identifies the endpoint as a Globus Collection (in which case theid
parameter is the corresponding UUID)local
: identifies the endpoint as a local endpoint with access only to the DTS's local file system. This type of endpoint is only useful for testing.
auth
: this optional parameter provides authentication information to the endpoint's provider if necessary. Its fields are:client_id
: an ID that identifies the DTS to the endpoint's provider as a clientclient_secret
: a string containing a secret corresponding to the ID provided by theclient_id
parameter
root
: this optional parameter specifies the root directory used by DTS to refer to files on the underlying filesystem of the endpoint. If left blank, the root directory is set to/
.
databases
databases:
jdp:
name: JGI Data Portal
organization: Joint Genome Institute
endpoint: globus-jdp
kbase:
name: KBase Workspace Service (KSS)
organization: KBase
endpoint: globus-kbase
This section is a mapping (set of key-value pairs) that associates the names of databases (keys) with sets of parameters that define the databases themselves (at least, as far as the DTS is concerned). These databases are the sources and destinations for all file transfers performed by the DTS. The keys in this section identify the databases that are configured for the DTS, and are referred to in transfer requests specified by DTS clients. Supported databases are:
jdp
: the Joint Genome Institute Data Portalkbase
: the Department of Energy Systems Biology Knowledgebase (KBase)
Valid fields for each database are:
name
: a human-readable name for the database, useful in diagnostic and error-related messagesorganization
: a human-readable name for the organization that provides the database (again, purely informational)endpoint
: the name of the endpoint defined in the endpoints section that provides the DTS with access to the file staging area for the database