Storage systems

A storage system can be thought of as an individual data repository that you want to access through Tapis. The following JSON object shows how a basic storage system is described.

{
   "id":"sftp.storage.example.com",
   "name":"Example SFTP Storage System",
   "type":"STORAGE",
   "description":"My example storage system using SFTP to store data for testing",
   "storage":{
      "host":"storage.example.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/",
      "homeDir":"/home/systest",
      "auth":{
         "username":"systest",
         "password":"changeit",
         "type":"PASSWORD"
      }
   }
}

The first four attributes are common to both storage and execution systems. The storage attribute describes the connectivity and authentication information needed to connect to the remote system. Here we describe an SFTP server accessible on port 22 at host storage.example.com. We specify that we want the rootDir, or virtual system root exposed through Tapis, to be the system’s physical root directory, and we want the authenticated user’s home directory to be the homeDir, or virtual home directory and base of all relative paths given to Tapis. Finally, we tell Tapis to use password-based authentication and provide the necessary credentials.

This example is given as a simple illustration of how to describe a system for use by Tapis. In most situations you should NOT provide your username and password. In fact, if you are using a compute or storage system from your university or a government-funded lab, it is, at best, against the user agreement and, at worst, illegal to give your password to a third-party service such as Tapis. In these situations, use one of the many other authentication options such as SSH keys, X509 authentication, or a third-party authentication service like the MyProxy Gateway.

The full list of storage system attributes is described in the following table.

| Attribute | Type | Description |
| --- | --- | --- |
| available | boolean | Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the `status` attribute in that a system may be UP but not available for use in Tapis. Defaults to true. |
| description | string | Verbose description of this system. |
| id | string | Required: A unique identifier you assign to the system. A system id must be unique across a tenant and cannot be reused once deleted. |
| name | string | Required: Common display name for this system. |
| site | string | The site associated with this system. Primarily for logical grouping. |
| status | UP, DOWN, MAINTENANCE, UNKNOWN | The functional status of the system. Systems must be in UP status to be used. |
| storage | JSON object | Required: Storage configuration defining how to connect to this system for data staging. |
| type | STORAGE, EXECUTION | Required: Must be STORAGE. |
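
The required attributes and allowed values in the table above can be checked locally before a definition is submitted. The following is a minimal sketch, not part of Tapis: the function name `check_storage_system` and the wording of its messages are assumptions made for illustration, and the service may apply further rules that are not listed here.

```python
# Illustrative pre-submission check based only on the attribute table above.
# check_storage_system is a hypothetical helper, not a Tapis API.
REQUIRED_FIELDS = ("id", "name", "type", "storage")
VALID_STATUSES = {"UP", "DOWN", "MAINTENANCE", "UNKNOWN"}


def check_storage_system(definition: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in definition or definition[field] in (None, ""):
            problems.append(f"missing required attribute: {field}")
    # A storage system definition must declare type STORAGE.
    if definition.get("type") not in (None, "STORAGE"):
        problems.append("type must be STORAGE")
    # status is optional, but when present it must be one of the listed values.
    if "status" in definition and definition["status"] not in VALID_STATUSES:
        problems.append(f"unknown status: {definition['status']}")
    if "storage" in definition and not isinstance(definition["storage"], dict):
        problems.append("storage must be a JSON object")
    return problems
```

Calling `check_storage_system` on the example definition at the top of this page returns an empty list.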

Supported data and authentication protocols

The example above described a system accessible by SFTP. Tapis supports many different data and authentication protocols for interacting with your data. The sample below shows a complete storage system definition for one such combination: SFTP with password authentication.

{
   "id":"sftp.storage.example.com",
   "name":"Example SFTP Storage System",
   "status":"UP",
   "type":"STORAGE",
   "description":"My example storage system using SFTP to store data for testing",
   "site":"example.com",
   "storage":{
      "host":"storage.example.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/",
      "homeDir":"/home/systest",
      "auth":{
         "username":"systest",
         "password":"changeit",
         "type":"PASSWORD"
      }
   }
}

The storage object differs slightly from one protocol to another, with some attributes unique to the protocol used. Descriptions of every attribute in the storage object and its children are given in the following tables.

The storage attributes give basic connectivity information, such as how to connect to the system and on what port.

| Attribute | Type | Description |
| --- | --- | --- |
| auth | JSON object | Required: A JSON object describing the default authentication credential for this system. |
| container | string | The container to use when interacting with an object store. Specifying a container provides isolation when exposing your cloud storage accounts so users do not have access to your entire storage account. This should be used in combination with delegated cloud credentials such as an AWS IAM user credential. |
| homeDir | string | The path on the remote system, relative to rootDir, to use as the virtual home directory for all API requests. This will be the base of any requested paths that do not begin with a '/'. Defaults to '/', thus being equivalent to rootDir. |
| host | string | Required: The hostname or IP address of the storage server. |
| port | int | Required: The port number of the storage server. |
| mirror | boolean | Whether the permissions set on the server should be pushed to the storage system itself. Currently, this only applies to IRODS systems. |
| protocol | FTP, GRIDFTP, IRODS, IRODS4, LOCAL, S3, SFTP | Required: The data protocol used to connect to the storage server. |
| publicAppsDir | string | The path on the remote system where apps will be stored if this system is used as the default public storage system. |
| proxy | JSON object | The proxy server through which Tapis will tunnel when submitting jobs. Currently proxy servers will use the same authentication mechanism as the target server. |
| resource | string | The name of the default resource to use when defining an IRODS system. |
| rootDir | string | The path on the remote system to use as the virtual root directory for all API requests. Defaults to '/'. |
| zone | string | The name of the default zone to use when defining an IRODS system. |
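
The rootDir and homeDir rules can be easier to follow with a worked example. The sketch below shows one way a path given to the API could map onto a physical path on the remote system, assuming the rules stated in the table: paths that begin with '/' are resolved against rootDir, and all other paths against homeDir, which itself is relative to rootDir. The function name `resolve_remote_path` is hypothetical, and this is an illustration of the described behavior rather than the code Tapis actually runs.

```python
import posixpath


def resolve_remote_path(root_dir: str, home_dir: str, requested: str) -> str:
    """Map a path given to the API onto a physical path on the remote system."""
    root = posixpath.normpath(root_dir or "/")
    home = home_dir or "/"
    if requested.startswith("/"):
        # Absolute paths are interpreted relative to the virtual root.
        virtual = posixpath.normpath(requested)
    else:
        # Relative paths are interpreted relative to the virtual home directory.
        virtual = posixpath.normpath(
            posixpath.join("/", home.lstrip("/"), requested)
        )
    # normpath collapses any leading ".." at "/", so the virtual path cannot
    # climb above the virtual root before it is joined to the physical root.
    return posixpath.normpath(posixpath.join(root, virtual.lstrip("/")))


# With the example system (rootDir "/", homeDir "/home/systest"):
print(resolve_remote_path("/", "/home/systest", "data/in.txt"))  # /home/systest/data/in.txt
print(resolve_remote_path("/", "/home/systest", "/tmp/x"))       # /tmp/x
# With a restricted root, everything stays under rootDir:
print(resolve_remote_path("/data", "/users/bob", "f.txt"))       # /data/users/bob/f.txt
```

Setting rootDir to something narrower than '/' is what limits how much of the remote file system is exposed through the API.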

storage.auth attributes give authentication information describing how to authenticate to the system specified in the storage config above.

| Attribute | Type | Description |
| --- | --- | --- |
| credential | string | The credential used to authenticate to the remote system. Depending on the authentication protocol of the remote system, this could be an OAuth token or an X.509 certificate. |
| internalUsername | string | The username of the internal user associated with this credential. |
| password | string | The password on the remote system used to authenticate. |
| privateKey | string | The private SSH key used to authenticate to the remote system. |
| publicKey | string | The public SSH key used to authenticate to the remote system. |
| server | JSON object | A JSON object describing the authentication server from which a valid credential may be obtained. Currently only auth type X509 supports this attribute. |
| type | APIKEYS, LOCAL, PAM, PASSWORD, SSHKEYS, or X509 | Required: The type of authentication used to connect to the remote system. |
| username | string | The remote username used to authenticate. |
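
As an alternative to the password example, the attributes above suggest how an SSH key credential might be described. The following sketch builds an auth object from key files on disk. The attribute names come from the table, but the exact combination expected for the SSHKEYS type is an assumption here, and `ssh_key_auth` is a hypothetical helper rather than part of any Tapis client.

```python
from pathlib import Path


def ssh_key_auth(username: str, private_key_path: str, public_key_path: str) -> dict:
    """Build a storage.auth object that uses an SSH key pair instead of a password."""
    return {
        "type": "SSHKEYS",
        "username": username,
        # Public keys are a single line; trailing whitespace is dropped.
        "publicKey": Path(public_key_path).read_text().strip(),
        # The private key is passed through unchanged, newlines included.
        "privateKey": Path(private_key_path).read_text(),
    }
```

The returned dictionary would take the place of the auth block in the storage object before the definition is serialized to JSON.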

storage.auth.server attributes give information about how to obtain a credential that can be used in the authentication process. Currently only systems using X509 authentication can leverage this feature to communicate with MyProxy and MyProxy Gateway servers.

| Attribute | Type | Description |
| --- | --- | --- |
| name | string | A descriptive name given to the credential server. |
| endpoint | string | Required: The endpoint of the authentication server. |
| port | integer | Required: The port on which to connect to the server. |
| protocol | MPG, MYPROXY | Required: The protocol with which to obtain an authentication credential. |
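
A credential server block can be assembled and checked against the table above before it is embedded in an X509 auth object. This is a small sketch under stated assumptions: `credential_server` is a hypothetical helper, the hostname in the usage line is a placeholder, and the other fields an X509 auth object may need (such as user credentials) are deliberately left out.

```python
CREDENTIAL_SERVER_PROTOCOLS = {"MPG", "MYPROXY"}


def credential_server(endpoint: str, port: int, protocol: str, name: str = None) -> dict:
    """Build a storage.auth.server object, enforcing the required attributes."""
    if not endpoint:
        raise ValueError("endpoint is required")
    if protocol not in CREDENTIAL_SERVER_PROTOCOLS:
        raise ValueError(f"protocol must be one of {sorted(CREDENTIAL_SERVER_PROTOCOLS)}")
    server = {"endpoint": endpoint, "port": int(port), "protocol": protocol}
    if name:
        # name is optional and purely descriptive.
        server["name"] = name
    return server


# Placeholder host; 7512 is the conventional MyProxy port.
auth = {"type": "X509", "server": credential_server("myproxy.example.com", 7512, "MYPROXY")}
```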

storage.proxy configuration attributes give information about how to connect to a remote system through a proxy server. This is often needed when the target system is behind a firewall or NAT. Currently proxy servers can only reuse the authentication configuration provided by the target system.

| Attribute | Type | Description |
| --- | --- | --- |
| name | string | Required: A descriptive name given to the proxy server. |
| host | string | Required: The hostname of the proxy server. |
| port | integer | Required: The port on which to connect to the proxy server. If null, the port in the parent storage config is used. |
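
The port fallback described for the proxy can be expressed in a few lines. The sketch below assumes a storage object shaped like the ones on this page. `effective_proxy_port` is a hypothetical name used only for illustration.

```python
def effective_proxy_port(storage: dict):
    """Return the port used to reach the proxy, or None when no proxy is configured."""
    proxy = storage.get("proxy")
    if proxy is None:
        return None
    port = proxy.get("port")
    # A null proxy port falls back to the port of the parent storage config.
    return storage["port"] if port is None else port


storage = {
    "host": "storage.example.com",
    "port": 22,
    "proxy": {"name": "Example proxy", "host": "proxy.example.com", "port": None},
}
print(effective_proxy_port(storage))  # 22
```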

If you have not yet set up a system of your own, now is a good time to grab a sandbox system to use while you follow along with the rest of this tutorial.

Creating a new storage system

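To register the system, save the JSON definition to a file (here sftp-password.json) and submit it with the Tapis CLI:

tapis systems create -v -F sftp-password.json

The equivalent curl command is:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@sftp-password.json" https://api.tacc.utexas.edu/systems/v2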
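
For scripts that cannot shell out to the CLI or curl, the same upload can be sketched with the Python standard library. This is an illustration of the multipart POST that the curl command performs, not an official client. The helper names, the `system.json` filename, and the `application/json` part type are choices made for this example. Unlike `curl -k`, the sketch leaves TLS certificate verification enabled.

```python
import json
import urllib.request
import uuid

SYSTEMS_URL = "https://api.tacc.utexas.edu/systems/v2"  # endpoint from the curl example


def build_create_request(token: str, definition: dict) -> urllib.request.Request:
    """Build (but do not send) a multipart POST that uploads a system definition."""
    boundary = uuid.uuid4().hex
    payload = json.dumps(definition).encode("utf-8")
    # A single form part named fileToUpload, matching the -F option used with curl.
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="fileToUpload"; filename="system.json"\r\n',
        b"Content-Type: application/json\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(SYSTEMS_URL, data=body, headers=headers, method="POST")


def create_system(token: str, definition: dict) -> dict:
    """Send the request and decode the JSON document returned by the service."""
    with urllib.request.urlopen(build_create_request(token, definition)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the request without contacting the service.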

The response from the service will be similar to the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": true,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

Congratulations, you just added your first system. This storage system can now be used by the Files service to manage data, the Transfer service as a source or destination of data movement, the Apps service as an application repository, and the Jobs service as both a staging and archiving destination.

Notice that the JSON returned from the Systems service differs from what was submitted. Several fields have been added, and several others have been removed. The uuid of the system has been added. This is the same UUID that is used in notifications and metadata references. The status value was added and assigned a default value, since we did not specify it. The same is true of the site attribute, which defaulted to null.

Three more fields were added: revision, public, and lastModified. revision is the number of times this system has been updated. This being our first time registering the system, it is set to 1. public tells whether this system is published as a shared resource for all users. We will cover this more in the section on System scope. lastModified is a timestamp of the last time the system was updated.

In the storage object, the publicAppsDir and mirror fields were both added and set to their default values. In this example we are not using a proxy server, so proxy defaulted to null. Last, and most important, all credential information has been omitted from the response object; only the auth type remains. Regardless of the authentication type, no user credential information will ever be returned once it is stored.
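
The differences between the submitted and returned documents can also be listed programmatically. The sketch below compares the key paths of two JSON objects. Both function names are hypothetical, and the sample data is a trimmed version of the documents shown on this page.

```python
def key_paths(obj: dict, prefix: str = "") -> set:
    """Collect dotted paths for every key in a nested JSON object."""
    paths = set()
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.add(path)
        if isinstance(value, dict):
            paths |= key_paths(value, path)
    return paths


def compare_definitions(submitted: dict, returned: dict) -> tuple:
    """Return (added, removed) key paths, each as a sorted list."""
    sent, got = key_paths(submitted), key_paths(returned)
    return sorted(got - sent), sorted(sent - got)


submitted = {
    "id": "sftp.storage.example.com",
    "storage": {"auth": {"username": "systest", "password": "changeit", "type": "PASSWORD"}},
}
returned = {
    "id": "sftp.storage.example.com",
    "uuid": "4036169328045649434-242ac117-0001-006",
    "storage": {"mirror": False, "auth": {"type": "PASSWORD"}},
}
added, removed = compare_definitions(submitted, returned)
print(added)    # ['storage.mirror', 'uuid']
print(removed)  # ['storage.auth.password', 'storage.auth.username']
```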