Tapis

Tapis is an open source, science-as-a-service API platform for powering your digital lab. Documentation is presented below:

Introduction

This is the documentation for Tapis V2. The documentation for Tapis V3 is listed in the section below.

The Tapis V2 Platform is an open source, science-as-a-service API platform for powering your digital lab. Tapis allows you to bring together your public, private, and shared high performance computing (HPC), high throughput computing (HTC), Cloud, and Big Data resources under a single, web-friendly REST API.

  • Run code
  • Manage data
  • Collaborate meaningfully
  • Integrate anywhere

The Tapis documentation site contains documentation, guides, tutorials, and lots of examples to help you build your own digital lab.

V3 Documentation

If you are looking for documentation for Tapis V3 please go here

Source Code

If you are looking for source code, you can find it here:

Conventions

Throughout the documentation you will regularly encounter the following variables. These represent user-specific values that should be replaced when attempting any of the calls using your account.

Variable Description Example
${API_HOST} Base hostname of the API api.tacc.utexas.edu
${API_VERSION} Version of the API endpoint v2.2.8
${API_USERNAME} Username of the current user nryan
${API_KEY} Client key used to request an access token from the Tapis Auth service hZ_z3f4Hf3CcgvGoMix0aksN4BOD6
${API_SECRET} Client secret used to request an access token from the Tapis Auth service gTgpCecqtOc6Ao3GmZ_FecVSSV8a
${API_TOKEN} Access token used to authenticate requests to the Tapis APIs de32225c235cf47b9965997270a1496c

JSON Notation

{
    "active": true,
    "created": "2014-09-04T16:59:33.000-05:00",
    "frequency": 60,
    "id": "0001409867973952-5056a550b8-0001-014",
    "internalUsername": null,
    "lastCheck": [
      {
        "created": "2014-10-02T13:03:25.000-05:00",
        "id": "0001412273000497-5056a550b8-0001-015",
        "message": null,
        "result": "PASSED",
        "type": "STORAGE"
      },
      {
        "created": "2014-10-02T13:03:25.000-05:00",
        "id": "0001411825368981-5056a550b8-0001-015",
        "message": null,
        "result": "FAILED",
        "type": "LOGIN"
      }
    ],
    "lastSuccess": "2014-10-02T11:03:13.000-05:00",
    "lastUpdated": "2014-10-02T13:03:25.000-05:00",
    "nextUpdate": "2014-10-02T14:03:15.000-05:00",
    "owner": "systest",
    "target": "demo.storage.example.com",
    "updateSystemStatus": false,
    "_links": {
        "checks": {
            "href": "https://api.tacc.utexas.edu/monitor/v2/0001409867973952-5056a550b8-0001-014/checks"
        },
        "notifications": {
            "href": "https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=0001409867973952-5056a550b8-0001-014"
        },
        "owner": {
            "href": "https://api.tacc.utexas.edu/profiles/v2/systest"
        },
        "self": {
            "href": "https://api.tacc.utexas.edu/monitor/v2/0001409867973952-5056a550b8-0001-014"
        },
        "system": {
            "href": "https://api.tacc.utexas.edu/systems/v2/demo.storage.example.com"
        }
    }
}

Javascript dot notation will be used to refer to individual properties of JSON objects. For example, consider the JSON object shown above.

  • active refers to the top level active attribute in the response object.
  • lastCheck.[].result generically refers to the result attribute contained within any of the objects contained in the lastCheck array.
  • lastCheck.[0].result specifically refers to the result attribute contained within the first object in the lastCheck array.
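The convention above can be mechanized with a small helper. The following sketch (`get_path` is our own name, not part of any Tapis SDK) resolves the indexed form of the notation against a parsed response:

```python
def get_path(obj, path):
    """Resolve a dotted path such as 'lastCheck.[0].result' against a JSON object."""
    for part in path.split("."):
        if part.startswith("[") and part.endswith("]"):
            obj = obj[int(part[1:-1])]  # indexed array element, e.g. [0]
        else:
            obj = obj[part]             # plain object attribute
    return obj

# Trimmed-down version of the monitor object shown above
monitor = {
    "active": True,
    "lastCheck": [
        {"result": "PASSED", "type": "STORAGE"},
        {"result": "FAILED", "type": "LOGIN"},
    ],
}

print(get_path(monitor, "active"))                # True
print(get_path(monitor, "lastCheck.[0].result"))  # PASSED
```

The generic `[]` form refers to any element of the array and so is not resolvable against a concrete object; the helper handles only the indexed form.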

Versioning

The current major version of Tapis is given in the URI immediately following the API resource name. For example, if the endpoint is https://api.tacc.utexas.edu/jobs/v2/, the API version would be v2. The current major version of Tapis is v2. (Full version: 2.2.23)

Special Character Handling

In certain situations, usually where file system paths and names are involved in some way, Tapis will generate sanitized object names (“slugs”) to make them safe to use. Slugs will be created on the fly by applying the following rules:

  1. Lowercase the string
  2. Replace spaces with a dash
  3. Remove any special characters and punctuation that might require encoding in the URL. Allowed characters are alphanumeric characters, dashes, underscores, and periods.
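As an illustration, the three rules can be sketched in a few lines of Python (`slugify` here is our own illustration; the exact character set Tapis strips may differ):

```python
import re

def slugify(name):
    """Sketch of the slug rules above: lowercase, spaces to dashes, then drop
    anything that is not alphanumeric, a dash, an underscore, or a period."""
    s = name.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9._-]", "", s)

print(slugify("My Demo App (v2)!"))  # my-demo-app-v2
```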

Secure communication

Tapis uses SSL to secure communication with the clients. If HTTPS is not specified in the request, the request will be redirected to a secure channel.

Requests

The Tapis API is based on REST principles: data resources are accessed via standard HTTPS requests in UTF-8 format to an API endpoint. The API uses appropriate HTTP verbs for each action whenever possible.

Verb Description
GET Used for retrieving resources
POST Used for creating resources
PUT Used for manipulating resources or collections
DELETE Used for deleting resources

Common API query parameters

Several URL query parameters are common across all services. The following table lists them for reference.

Name Values Purpose
offset integer (zero based) Skips the first offset results in the response
limit integer Limits the number of responses to, at most, this number
pretty boolean If true, pretty prints the response. Default false
naked boolean If true, returns only the value of the result attribute in the standard response wrapper
filter string A comma-delimited list of fields to return for each object in the response. Each field may be referenced using JSON notation
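For example, the common parameters can be combined in a single query string. The following sketch builds such a URL with Python's standard library (the parameter values are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical paging/formatting values combined into one request URL
params = {
    "offset": 50,      # skip the first 50 results
    "limit": 25,       # return at most 25 results
    "pretty": "true",  # pretty-print the JSON response
    "filter": "name,status,_links.self.href",  # only these fields per object
}
url = "https://api.tacc.utexas.edu/jobs/v2/?" + urlencode(params)
print(url)
```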

Experimental query parameters

Starting with the 2.1.10 release, two new query parameters were introduced into the Jobs API as an experimental feature. The following table lists them for reference.

Name Values Purpose
sort asc,desc The sort order of the response. asc by default
sortBy string The field by which to sort the response. Any field present in the full representation of the resource that you are querying is supported. Multiple values are not currently supported

Responses

All data is received and returned as a JSON object.

Response Details

{
    "status": "error",
    "message": "Permission denied. You do not have permission to view this system",
    "version": "2.1.16-r8228",
    "result": {}
}

Apart from the response code, all responses from Tapis are in the form of a JSON object. The object takes the following form.

Key Value Type Value Description
status string Either "success" or "error"
message string A short description of the cause of the error; null on success
result object, array The JSON response object or array
version string The current full release version of Tapis, e.g. "2.1.16-r8228"

The response shown above, for example, is what occurs when trying to fetch information for a system to which you do not have access:

Naked Responses

In situations where you do not care to parse the wrapper for the raw response data, you may request a naked response from the API by adding naked=true to the request URL. This will return just the value of the result attribute in the response wrapper.

naked=true

{
  "id" : "data.iplantcollaborative.org",
  "name" : "CyVerse Data Store",
  "type" : "STORAGE",
  "description" : "CyVerse's petabyte-scale, cloud-based data management service.",
  "status" : "UP",
  "public" : true,
  "lastUpdated" : "2017-10-10T00:00:00.000-05:00",
  "default" : true,
  "_links" : {
    "self" : {
      "href" : "https://agave.iplantc.org/systems/v2/data.iplantcollaborative.org"
    }
  }
}

naked=false

{
  "status" : "success",
  "message" : null,
  "version" : "2.2.8-rff32e62",
  "result" : [ {
    "id" : "data.iplantcollaborative.org",
    "name" : "CyVerse Data Store",
    "type" : "STORAGE",
    "description" : "CyVerse's petabyte-scale, cloud-based data management service.",
    "status" : "UP",
    "public" : true,
    "lastUpdated" : "2017-10-10T00:00:00.000-05:00",
    "default" : true,
    "_links" : {
      "self" : {
        "href" : "https://agave.iplantc.org/systems/v2/data.iplantcollaborative.org"
      }
    }
  } ]
}
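Client code that handles the standard (non-naked) wrapper typically checks status before touching result. A minimal sketch, using the wrapper fields described above (`unwrap` is our own helper name):

```python
import json

def unwrap(payload):
    """Return the result data from a Tapis response wrapper,
    raising when the service reported an error status."""
    if payload.get("status") == "error":
        raise RuntimeError(payload.get("message"))
    return payload["result"]

# Parsed form of the naked=false example above (abridged)
wrapped = json.loads("""{
  "status": "success", "message": null, "version": "2.2.8-rff32e62",
  "result": [{"id": "data.iplantcollaborative.org", "type": "STORAGE"}]
}""")

systems = unwrap(wrapped)
print(systems[0]["id"])  # data.iplantcollaborative.org
```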

Formatting

By default, all responses are serialized JSON. To receive pretty-printed JSON, add pretty=true to any query string.

Note

The tapis-cli also produces table-formatted output.

Timestamps

Timestamps are returned in ISO 8601 format, offset for Central Time (-05:00): YYYY-MM-DDTHH:MM:SS.sss-05:00.
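In Python, for instance, these timestamps parse directly with the standard library (assuming Python 3.7+, where fromisoformat accepts fractional seconds and a UTC offset):

```python
from datetime import datetime, timedelta

# Example timestamp in the format described above
ts = datetime.fromisoformat("2014-10-02T13:03:25.000-05:00")

print(ts.year, ts.month, ts.day)              # 2014 10 2
print(ts.utcoffset() == timedelta(hours=-5))  # True
```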

Cross Origin Resource Sharing (CORS)

Many modern applications choose to implement client-server communication exclusively in Javascript. For this reason, Tapis provides cross-origin resource sharing (CORS) support so AJAX requests from a web browser are not constrained by cross-origin requests and can safely make GET, PUT, POST, and DELETE requests to the API.

Hypermedia

{
    "associationIds": [],
    "created": "2013-11-16T11:25:38.900-06:00",
    "internalUsername": null,
    "lastUpdated": "2013-11-16T11:25:38.900-06:00",
    "name": "color",
    "owner": "nryan",
    "uuid": "0001384622738900-5056a550b8-0001-012",
    "value": "red",
    "_links": {
        "self": {
            "href": "https://api.tacc.utexas.edu/meta/v2/data/0001384622738900-5056a550b8-0001-012"
        },
        "owner": {
            "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
        }
    }
}

Tapis is a fully descriptive hypermedia API. From any resource, you can navigate the API through the links provided in the _links object in each resource representation. The user metadata object above contains two referenced objects. The first, self, is common to all objects and contains the URL of that object. The second, owner, contains the URL to the profile of the user who created the object.

Customizing Responses

Returns the name, status, app id, and the url to the archived job output for every user job
curl -sk -H \
    "Authorization: Bearer ${API_TOKEN}" \
    "https://api.tacc.utexas.edu/jobs/v2/?limit=2&filter=name,status,appId,_links.archiveData.href"
tapis jobs list -v -l 2 -c name -c id -c status -c _links.archiveData
The response would look something like the following:
[
  {
    "name" : "demo-pyplot-demo-advanced test-1414139896",
    "status": "FINISHED",
    "appId" : "demo-pyplot-demo-advanced-0.1.0",
    "_links": {
      "archiveData": {
        "href": "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
      }
    }
  },
  {
    "name": "demo-pyplot-demo-advanced test-1414270831",
    "status": "FINISHED",
    "appId" : "demo-pyplot-demo-advanced-0.1.0",
    "_links": {
      "archiveData": {
        "href": "https://api.tacc.utexas.edu/jobs/v2/3259859908028273126-242ac115-0001-007/outputs/listings"
      }
    }
  }
]
Returns the system id, type, whether it is your default system, and the hostname from the system’s storage config
curl -sk -H \
    "Authorization: Bearer ${API_TOKEN}" \
    "https://api.tacc.utexas.edu/systems/v2/?filter=id,type,default,storage.host"
tapis systems list -v -l 2 -c id -c name -c type -c default -c storage.host
The response would look something like the following:
[
  {
    "id": "user.storage",
    "type": "STORAGE",
    "default": false,
    "storage": {
      "host": "data.tacc.utexas.edu"
    }
  },
  {
    "id": "docker.tacc.utexas.edu",
    "type": "EXECUTION",
    "default": false,
    "storage": {
      "host": "129.114.6.50"
    }
  }
]

In many situations, Tapis may return too much or too little information in the response to a query. For example, when searching jobs, the inputs and parameters fields are not included in the default summary response objects. You can customize the responses you receive from all the Science APIs using the filter query parameter.

The filter query parameter takes a comma-delimited list of fields to return for each object in the response. Each field may be referenced using JSON notation similar to the search syntax (minus the .[operation] suffix).
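To make the effect of filter concrete, the following sketch reproduces the projection client-side on a job object like the one in the earlier example (`project` is our own illustration, not a Tapis API):

```python
def project(obj, fields):
    """Client-side illustration of what filter=... keeps for one response object.
    Handles top-level and dotted fields; the generic [] form is omitted."""
    out = {}
    for field in fields.split(","):
        parts = field.split(".")
        src, dst = obj, out
        for p in parts[:-1]:
            src = src[p]                  # descend into the source object
            dst = dst.setdefault(p, {})   # mirror the nesting in the output
        dst[parts[-1]] = src[parts[-1]]
    return out

job = {
    "name": "demo-pyplot-demo-advanced test-1414139896",
    "status": "FINISHED",
    "appId": "demo-pyplot-demo-advanced-0.1.0",
    "_links": {"archiveData": {"href": "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"}},
}

print(project(job, "name,status,_links.archiveData.href"))
```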

Status Codes

The API uses the following response status codes, as defined in RFC 2616, on successful and unsuccessful requests.

Success Codes

Response Code Meaning Description
200 Success The request succeeded
201 Created The request succeeded and a new resource was created. Only applicable on PUT and POST actions
202 Accepted The request has been accepted for processing, but the processing has not been completed. Common for all async actions such as job submissions, file transfers, etc
206 Partial Content The server has fulfilled the partial GET request for the resource. This will always be the return status of a request using a Range header
301 Moved Permanently The requested resource has been assigned a new permanent URI. You should follow the Location header, repeating the request
304 Not Modified You requested an action that succeeded, but did not modify the resource

Error Codes

Response Code Meaning Description
400 Bad request Your request was invalid
401 Unauthorized Authentication required, but not provided
403 Forbidden You do not have permission to access the given resource
404 Not found No resource was found at the given URL
405 Method Not Allowed You tried to access a resource with an invalid method
406 Not Acceptable You requested a response format that isn’t supported
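A client might map these codes onto coarse outcomes before deciding whether to retry, poll, or surface an error. A sketch, under the assumption that 202 responses (accepted async actions) should be polled again:

```python
# Status-code buckets drawn from the tables above
RETRYABLE = {202}                              # async acceptance: poll again later
CLIENT_ERRORS = {400, 401, 403, 404, 405, 406}

def classify(status_code):
    """Coarse outcome for a Tapis HTTP response code."""
    if status_code in RETRYABLE:
        return "pending"
    if 200 <= status_code < 300:
        return "success"
    if status_code in CLIENT_ERRORS:
        return "client-error"
    return "unexpected"

print(classify(201), classify(202), classify(403))  # success pending client-error
```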

Best Practices

General

  • Always use SSL. Tapis services will force SSL if you don't specify it, but it's best to protect your application with SSL as a best practice.

Systems Service

  • Use restricted SSH keys whenever possible.
  • Open SSH keys are not supported.
  • Use SSH keys rather than passwords whenever possible.
  • Use a MyProxy Gateway service whenever available rather than a stock MyProxy service to avoid password exposure.
  • Always configure a default storage system for your organization. This provides tremendous benefit to users who don't want to think about the makeup of your infrastructure.
  • Use contextual naming for systems. nryan-vm-sftp-prod is preferable to my-vm. DNS is also a good approach to naming, but you will still need to contextualize it with something like a username, since multiple users may want to register the same system.
  • Grant the minimum sufficient role for a user that enables them to do what you want them to do. Don't grant a PUBLISHER role when a GUEST role will suffice. Don't grant an ADMIN role when a USER role will get the job done.
  • Always explicitly specify a scratchDir for your execution systems. This will allow you to easily see where your job data will go and avoids systems where your home directory has a smaller quota than other areas of the system.

Files Service

  • Always favor the full canonical URL over assuming default systems. Default systems may change on a user-to-user basis, but canonical URLs will always be the same.
  • Err on the side of privacy by granting permissions to single users and groups rather than making data public.
  • Avoid over-sharing by granting permissions on specific files or minimum subtrees rather than sharing entire home folders.

PostIts

  • Always limit the lifetime of a postit by specifying either the maximum number of uses or an expiration date. This will prevent people from accessing resources long after you intended for them to do so.

Tutorials

This tutorial is designed to let you practice and get familiar with the Tapis environment.

Prerequisites

To navigate this tutorial, you will need the following:

Possess a TACC user account

To obtain a TACC user account, first go to https://portal.tacc.utexas.edu/account-request?p_p_id=createaccount_WAR_createaccountportlet&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=1&_createaccount_WAR_createaccountportlet_action=continue

  • At the bottom of the page there is a button to click and accept. Click that button.
  • At the next page you must fill out your contact information.

Quick Start Tutorial

This quick start guide is designed to show you how to do the following:

  1. Create an OAuth client.
  2. Submit a job using a public image classifier app.
  3. Retrieve job output information.

Create an OAuth client

Most requests to the Tapis REST APIs require authorization; that is, the user must have granted permission for an application to access the requested data.

Step 1: Create an OAuth client by entering the following curl command:

curl -sku "$API_USERNAME" -X POST \
-d "clientName=my_cli_app&description=Client app used for scripting up cool stuff" \
https://api.tacc.utexas.edu/clients/v2

Create a variable for the client key and secret by entering:

export key=<client key>
export secret=<secret>

Step 2: Generate an access token by entering the following curl command:

curl -v -u $key:$secret -X POST \
-d 'grant_type=password&username=<username>&password=<password>&scope=PRODUCTION' \
https://api.tacc.utexas.edu/token

Once you have obtained the token, save it as a variable by entering the following command:

export tok=<TOKEN>

For more information please see:

OAuth tutorial: https://tacc.github.io/CSC2017Institute/docs/day2/Intro_Agave_OAuth.html

Running a job

Now you are ready to run a Tapis job. Tapis Jobs is the service that allows you to run applications registered with the Tapis Apps service across multiple, distributed, heterogeneous systems through a common REST interface.

For this tutorial we have registered an image classifier app using the Tapis Apps service. tapis.app.imageclassify-1.0u3 is a public app that uses public storage and execution systems. Follow the steps below to submit a Tapis job and view the output.

Step 1: Crafting the job definition:

Create the following file jobs.json

{
    "name": "tapis.demo.imageclassify.job",
    "appId": "tapis.app.imageclassify-1.0u3",
    "archive": false,
    "memoryPerNode": "1"
}

The job parameters used in the definition above are:

  • name - The user-selected name for the job.
  • appId - The unique ID (name + version) of the application run by this job. This must be a valid application that the user has permission to run.
  • archive - Whether the job output should be archived. When true, all new files created during job execution will be moved to the Archive Path on the Archive system.
  • memoryPerNode - The memory requested for each node on which the job runs. Values are expressed as [num][units], where num can be a decimal number and units can be KB, MB, GB, or TB (default GB). Examples include 200MB, 1.5GB, and 5.

Step 2: Submit the job by using the curl-command below:

curl -sk -H "Authorization: Bearer $tok" -X POST -d @jobs.json \
-H "Content-Type: application/json" https://api.tacc.utexas.edu/jobs/v2/

Note: Make sure to run this from the same folder where you created jobs.json. You should see the message “Successfully submitted job <job-id>”. Every time you submit a job, a unique job id is created.

Job output

You can check the status of the job and receive the output of the job at the same time.

Type in the curl command below:

curl -sk -H "Authorization: Bearer $tok" https://api.tacc.utexas.edu/jobs/v2/$job_id/outputs/listings/?pretty=true

NOTE

You can download the files if you want by entering in the command:

curl -sk -H "Authorization: Bearer $tok" https://api.tacc.utexas.edu/jobs/v2/$job_id/outputs/media/$PATH

Guides

The Tapis REST APIs enable applications to create and manage digital laboratories that span campuses, the cloud, and multiple data centers using a cohesive set of web-friendly interfaces.

Authorization

Most requests to the Tapis REST APIs require authorization; that is, the user must have granted permission for an application to access the requested data. To prove that the user has granted permission, the request header sent by the application must include a valid access token.

Before you can begin the authorization process, you will need to register your client application. That will give you a unique client key and secret key to use in the authorization flows.

Supported Authorization Flows

The Tapis REST APIs currently support four authorization flows:

  1. The Authorization Code flow first gets a code then exchanges it for an access token and a refresh token. Since the exchange uses your client secret key, you should make that request server-side to keep the integrity of the key. An advantage of this flow is that you can use refresh tokens to extend the validity of the access token.
  2. The Implicit Grant flow is carried out client-side and does not involve secret keys. The access tokens that are issued are short-lived and there are no refresh tokens to extend them when they expire.
  3. The Resource Owner Password Credentials flow is suitable for native and mobile applications as well as web services. This flow allows client applications to obtain an access token for a user by directly providing the user credentials in an authentication request. It exposes the user’s credentials to the client application and is primarily used in situations where the client application is highly trusted, such as the command line.
  4. The Client Credentials flow enables users to interact with their own protected resources directly without requiring browser interaction. This is a critical addition for use at the command line, in scripts, and in offline programs. This flow assumes that the person registering the client application and the user on whose behalf requests are made are the same person.
Flow Can fetch a user’s data by requesting access? Uses secret key? (key exchange must happen server-side!) Access token can be refreshed?
Authorization Code Yes Yes Yes
Implicit Grant Yes No No
Resource Owner Password Credentials Yes Yes Yes
Client Credentials No Yes No
Unauthorized No No No

Token lifetimes

There are two kinds of tokens you will obtain: access and refresh. Access token lifetimes are configured by the organization operating each tenant and vary based on the flow used to obtain them. By default, access tokens are valid for 4 hours.

Authorization Flow Access Token Lifetime Refresh Token Lifetime
Authorization 4 hours infinite
Implicit 1 hour n/a
User Credential Password 4 hours infinite
Client Credentials 4 hours n/a

Authorization Code

This method is suitable for long-running applications in which the user logs in once and the access token can be refreshed. Since the token exchange involves sending your secret key, it should happen in a secure location, like a backend service, not from a client like a browser or mobile app. This flow is described in RFC-6749. It is also the authorization flow used in our REST API Tutorial.

Authorization Code Flow Diagram

1. Your application requests authorization

A typical request will look something like this
https://api.tacc.utexas.edu/authorize/?client_id=gTgp...SV8a&response_type=code&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=PRODUCTION&state=866

The authorization process starts with your application sending a request to the Tapis authorization service. (The reason your application sends this request can vary: it may be a step in the initialization of your application or in response to some user action, like a button click.) The request is sent to the /authorize endpoint of the Authorization service:

The request will include parameters in the query string:

Request body parameter Value
response_type Required. As defined in the OAuth 2.0 specification, this field must contain the value "code".
client_id Required. The application's client ID, obtained when the client application was registered with Tapis (see Client Registration).
redirect_uri Required. The URI to redirect to after the user grants/denies permission. This URI needs to have been entered in the Redirect URI whitelist that you specified when you registered your application. The value of redirect_uri here must exactly match one of the values you entered when you registered your application, including upper/lowercase, terminating slashes, etc.
scope Optional. A space-separated list of scopes. Currently only PRODUCTION is supported.
state Optional, but strongly recommended. The state can be useful for correlating requests and responses. Because your redirect_uri can be guessed, using a state value can increase your assurance that an incoming connection is the result of an authentication request. If you generate a random string or encode the hash of some client state (e.g., a cookie) in this state variable, you can validate the response to additionally ensure that the request and response originated in the same browser. This provides protection against attacks such as cross-site request forgery. See RFC-6749.
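Putting the table together, the authorize URL can be assembled like so (a sketch; the client_id and redirect_uri values are placeholders):

```python
from urllib.parse import urlencode
import secrets

# Placeholder client values; in practice use the key from client registration
state = secrets.token_hex(8)  # random state to correlate request and response
query = urlencode({
    "client_id": "<client key>",
    "response_type": "code",
    "redirect_uri": "https://example.com/callback",
    "scope": "PRODUCTION",
    "state": state,
})
print("https://api.tacc.utexas.edu/authorize/?" + query)
```

Note that urlencode percent-encodes the redirect_uri, matching the encoded form shown in the example request above.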

2. The user is asked to authorize access within the scopes

The Tapis Authorization service presents details of the scopes for which access is being sought. If the user is not logged in, they are prompted to do so using their API username and password.

When the user is logged in, they are asked to authorize access to the actions and services defined in the scopes.

3. The user is redirected back to your specified URI

Let’s assume you provided the following callback URL.
https://example.com/callback

After the user accepts (or denies) your request, the Tapis Authorization service redirects back to the redirect_uri. If the user has accepted your request, the response query string contains a code parameter with the access code you will use in the next step to retrieve an access token.

Sample success redirect back from the server
https://example.com/callback?code=Pq3S..M4sY&state=866
Query parameter Value
code An authorization code that can be exchanged for an access token in the next step.
state The value of the state parameter supplied in the request.

If the user has denied access, there will be no access token and the final URL will have a query string containing the following parameters:

# Sample denial redirect back from the server
https://example.com/callback?error=access_denied&state=867
Query parameter Value
error The reason authorization failed, for example: “access_denied”
state The value of the state parameter supplied in the request.

4. Your application requests refresh and access tokens

POST https://api.tacc.utexas.edu/token

When the authorization code has been received, you will need to exchange it for an access token by making a POST request to the Tapis Authorization service, this time to its /token endpoint. The body of this POST request must contain the following parameters:

Request body parameter Value
grant_type Required. As defined in the OAuth 2.0 specification, this field must contain the value "authorization_code".
code Required. The authorization code returned from step 3.
redirect_uri Required. Must exactly match the redirect_uri supplied when requesting the authorization code.
client_id Required. The application's client ID.
client_secret Required. The application's client secret.

5. The tokens are returned to your application

# An example cURL request
curl -X POST -d "grant_type=authorization_code" \
    -d "code=Pq3S..M4sY" \
    -d "client_id=gTgp...SV8a" \
    -d "client_secret=hZ_z3f...BOD6" \
    -d "redirect_uri=https%3A%2F%2Fwww.foo.com%2Fauth" \
    https://api.tacc.utexas.edu/token
The response would look something like this:
{
    "access_token": "a742...12d2",
    "expires_in": 14400,
    "refresh_token": "d77c...Sacf",
    "token_type": "bearer"
}

On success, the response from the Tapis Authorization service has the status code 200 OK in the response header, and a JSON object with the fields in the following table in the response body:

Key Value type Value description
access_token string An access token that can be provided in subsequent calls, for example to Tapis REST APIs.
token_type string How the access token may be used: always "Bearer".
expires_in int The time period (in seconds) for which the access token is valid. (Maximum 14400 seconds, or 4 hours.)
refresh_token string A token that can be sent to the Tapis Authorization service in place of an authorization code. (When the access token expires, send a POST request to the /token endpoint, but use this code in place of an authorization code. A new access token will be returned. A new refresh token might be returned too.)

6. Use the access token to access the Tapis REST APIs

Make a call to the API
curl -H "Authorization: Bearer a742...12d2" \
    "https://api.tacc.utexas.edu/profiles/v2/me?pretty=true&naked=true"
The response would look something like this:
{
    "create_time": "20140905072223Z",
    "email": "rjohnson@mlb.com",
    "first_name": "Randy",
    "full_name": "Randy Johnson",
    "last_name": "Johnson",
    "mobile_phone": "(123) 456-7890",
    "phone": "(123) 456-7890",
    "status": "Active",
    "uid": 0,
    "username": "rjohnson"
}

Once you have a valid access token, you can include it in the Authorization header for all subsequent requests to APIs in the Platform.

7. Requesting access token from refresh token

curl -u $key:$secret \
-d grant_type=refresh_token \
-d refresh_token=$refresh \
https://api.tacc.utexas.edu/token
The response would look something like this.
{
    "access_token": "61e6...Mc96",
    "expires_in": 14400,
    "token_type": "bearer"
}

Access tokens are deliberately set to expire after a short time, usually 4 hours, after which new tokens may be granted by supplying the refresh token originally obtained during the authorization code exchange.

The request is sent to the token endpoint of the Tapis Authorization service:

The body of this POST request must contain the following parameters:

Request body parameter Value
grant_type Required. Set it to "refresh_token".
refresh_token Required. The refresh token returned from the authorization code exchange.

The header of this POST request must contain the following parameter:

Header parameter Value
Authorization Required. Basic authentication using your client key and secret (this is what curl's -u $key:$secret flag supplies).

Implicit Grant

The Implicit Grant flow is for clients that are implemented entirely in JavaScript and run in the resource owner’s browser. You do not need any server-side code to use it. This flow is described in RFC-6749.

Implicit Flow

1. Your application requests authorization

https://api.tacc.utexas.edu/authorize?client_id=gTgp...SV8a&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=PRODUCTION&response_type=token&state=867

The flow starts off with your application redirecting the user to the /authorize endpoint of the Authorization service. The request will include parameters in the query string:

Request body parameter Value
response_type Required. As defined in the OAuth 2.0 specification, this field must contain the value "token".
client_id Required. The application's client ID, obtained when the client application was registered with Tapis (see Client Registration).
redirect_uri Required. This parameter is used for validation only (there is no actual redirection). The value of this parameter must exactly match the value of redirect_uri supplied when requesting the authorization code.
scope Required. A space-separated list of scopes. Currently only PRODUCTION is supported.
state Optional, but strongly recommended. The state can be useful for correlating requests and responses. Because your redirect_uri can be guessed, using a state value can increase your assurance that an incoming connection is the result of an authentication request. If you generate a random string or encode the hash of some client state (e.g., a cookie) in this state variable, you can validate the response to additionally ensure that the request and response originated in the same browser. This provides protection against attacks such as cross-site request forgery. See RFC-6749.
show_dialog Optional. Whether or not to force the user to approve the app again if they’ve already done so. If false (default), a user who has already approved the application may be automatically redirected to the URI specified by redirect_uri. If true, the user will not be automatically redirected and will have to approve the app again.

2. The user is asked to authorize access within the scopes

The Tapis Authorization service presents details of the scopes for which access is being sought. If the user is not logged in, they are prompted to do so using their API username and password.

When the user is logged in, they are asked to authorize access to the services defined in the scopes. By default all of the Core Science APIs fall under a single scope called PRODUCTION.

3. The user is redirected back to your specified URI

Let’s assume we specified the following callback address.
https://example.com/callback
A valid success response would be
https://example.com/callback#access_token=Vr17...amUa&token_type=bearer&expires_in=3600&state=867

After the user grants (or denies) access, the Tapis Authorization service redirects the user to the redirect_uri. If the user has granted access, the final URL will contain the following parameters in the URL fragment (after the #).

Fragment parameter Value
access_token An access token that can be provided in subsequent calls, for example to Tapis Profiles API.
token_type Value: "bearer"
expires_in The time period (in seconds) for which the access token is valid.
state The value of the state parameter supplied in the request.

If the user has denied access, there will be no access token and the final URL will have a query string containing the following parameters:

A failed response would resemble something like
https://example.com/callback?error=access_denied&state=867
Query parameter Value
error The reason authorization failed, for example: “access_denied”
state The value of the state parameter supplied in the request.
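Your application must handle both outcomes at the callback. A minimal Python sketch of parsing the success fragment or the error query string, using the example URLs above:

```python
from urllib.parse import urlparse, parse_qs

def parse_callback(url: str, expected_state: str) -> dict:
    """Extract token data from the fragment (success) or query string (denial)."""
    parsed = urlparse(url)
    # On success the parameters arrive in the URL fragment after '#';
    # on denial they arrive in the query string after '?'.
    data = {k: v[0] for k, v in parse_qs(parsed.fragment or parsed.query).items()}
    if data.get("state") != expected_state:
        raise ValueError("state mismatch: possible cross-site request forgery")
    return data

ok = parse_callback(
    "https://example.com/callback#access_token=Vr17...amUa&token_type=bearer"
    "&expires_in=3600&state=867", "867")
denied = parse_callback(
    "https://example.com/callback?error=access_denied&state=867", "867")
```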

4. Use the access token to access the Tapis REST APIs

curl -H "Authorization: Bearer 61e6...Mc96" https://api.tacc.utexas.edu/profiles/v2/me?pretty=true
The response would look something like this:
{
    "create_time": "20140905072223Z",
    "email": "nryan@mlb.com",
    "first_name": "Nolan",
    "full_name": "Nolan Ryan",
    "last_name": "Ryan",
    "mobile_phone": "(123) 456-7890",
    "phone": "(123) 456-7890",
    "status": "Active",
    "uid": 0,
    "username": "nryan"
}

The access token allows you to make requests to any of the Tapis REST APIs on behalf of the authenticated user.

Resource Owner Password Credentials

This method is suitable for scenarios where there is a high degree of trust between the end user and the client application, such as a desktop application, shell script, or server-to-server communication where user authorization is needed. This flow is described in RFC-6749.

1. Your application requests authorization

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=password \
    -d username=rjohnson \
    -d password=password \
    -d scope=PRODUCTION \
    https://api.tacc.utexas.edu/token
The response would look something like this:
{
    "access_token": "3Dsr...pv21",
    "expires_in": 14400,
    "refresh_token": "dyVa...MqR0",
    "token_type": "bearer"
}

The request is sent to the /token endpoint of the Tapis Authentication service. The request will include the following parameters in the request body:

Request body parameter Value
grant_type Required. Set it to "password"
username Required. The username of an active API user
password Required. The password of an active API user
scope Required. A space-separated list of scopes. Currently only PRODUCTION is supported

The header of this POST request must contain the following parameter:

Header parameter Value
Authorization Required. Base64-encoded string containing the client ID and client secret. The field must have the format: Authorization: Basic <base64 encoded client_id:client_secret>. (This can also be achieved with curl using the -u option and specifying the raw colon-separated client_id and client_secret.)
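The Basic header is simply the Base64 encoding of client_id:client_secret. A minimal Python sketch, using placeholder credentials:

```python
import base64

def basic_auth_header(client_id: str, client_secret: str) -> dict:
    """Build the HTTP Basic Authorization header from client credentials."""
    raw = f"{client_id}:{client_secret}".encode()
    return {"Authorization": "Basic " + base64.b64encode(raw).decode()}

# Placeholder credentials for illustration only.
headers = basic_auth_header("my_client_id", "my_client_secret")
```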

2. Use the access token to access the Tapis REST APIs

curl -sk -H "Authorization: Bearer 3Dsr...pv21" \
    https://api.tacc.utexas.edu/profiles/v2/me?pretty=true
The response would look something like this:
{
    "create_time": "20140905072223Z",
    "email": "rjohnson@mlb.com",
    "first_name": "Randy",
    "full_name": "Randy Johnson",
    "last_name": "Johnson",
    "mobile_phone": "(123) 456-7890",
    "phone": "(123) 456-7890",
    "status": "Active",
    "uid": 0,
    "username": "rjohnson"
}

The access token allows you to make requests to any of the Tapis REST APIs on behalf of the authenticated user.

3. Requesting access token from refresh token

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=refresh_token \
    -d refresh_token=dyVa...MqR0 \
    -d scope=PRODUCTION \
    https://api.tacc.utexas.edu/token
The response would look something like this:
{
    "access_token": "8erF...NGly",
    "expires_in": 14400,
    "token_type": "bearer"
}

Access tokens are deliberately set to expire after a short time, usually 4 hours, after which a new token may be granted by supplying the refresh token obtained during the original request.

The request is sent to the token endpoint of the Tapis Authorization service. The body of this POST request must contain the following parameters:

Request body parameter Value
grant_type Required. Set it to "refresh_token".
refresh_token Required. The refresh token returned from the authorization code exchange.
scope Required. A space-separated list of scopes. Currently only PRODUCTION is supported.
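Since tokens expire on a fixed schedule, clients typically track the expiry and refresh slightly early. A minimal sketch, where the field names match the /token responses shown in this section but the helper itself is hypothetical:

```python
import time
from urllib.parse import urlencode

class TokenCache:
    """Track an access token's expiry and build the refresh request body."""

    def __init__(self, access_token: str, refresh_token: str,
                 expires_in: int, skew: int = 60):
        self.access_token = access_token
        self.refresh_token = refresh_token
        self.expires_at = time.time() + expires_in
        self.skew = skew  # refresh a little early to avoid racing the deadline

    def needs_refresh(self) -> bool:
        return time.time() >= self.expires_at - self.skew

    def refresh_body(self) -> str:
        # Form-encoded body for POST https://${API_HOST}/token
        return urlencode({
            "grant_type": "refresh_token",
            "refresh_token": self.refresh_token,
            "scope": "PRODUCTION",
        })

tok = TokenCache("3Dsr...pv21", "dyVa...MqR0", expires_in=14400)
```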

Client Credentials

This method allows your client application to obtain a token directly using its own credentials, without prompting a user to log in. This flow is described in RFC-6749.

1. Your application requests authorization

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=client_credentials \
    -d scope=PRODUCTION \
    https://api.tacc.utexas.edu/token
The response would look something like this:
{
    "access_token": "61e6...Mc96",
    "expires_in": 14400,
    "token_type": "bearer"
}

The request is sent to the /token endpoint of the Tapis Authentication service. The request must include the following parameters in the request body:

Request body parameter Value
grant_type Required. Set it to "client_credentials".
scope Optional. A space-separated list of scopes. Currently only PRODUCTION is supported.

2. Use the access token to access the Tapis REST APIs

curl -sk -H "Authorization: Bearer 61e6...Mc96" \
     https://api.tacc.utexas.edu/profiles/v2/me
The response would look something like this:
{
    "email": "nryan@mlb.com",
    "firstName" : "Nolan",
    "lastName" : "Ryan",
    "position" : "null",
    "institution" : "Houston Astros",
    "phone": "(123) 456-7890",
    "fax" : null,
    "researchArea" : null,
    "department" : null,
    "city" : "Houston",
    "state" : "TX",
    "country" : "USA",
    "gender" : "M",
    "_links" : {
      "self" : {
        "href" : "https://api.tacc.utexas.edu/profiles/v2/nryan"
      },
      "users" : {
        "href" : "https://api.tacc.utexas.edu/profiles/v2/nryan/users"
      }
    }
}

The access token allows you to make requests to any of the Tapis REST APIs on behalf of the authenticated user.

Clients and API Keys

By now you already have a user account. Your user account identifies you to the web applications you interact with. A username and password is sufficient for interacting with an application because the application has a user interface, so it knows that the authenticated user is the same one interacting with it. The Tapis API does not have a user interface, so simply providing it a username and password is not sufficient. Tapis needs to know both the user on whose behalf it is acting as well as the client application that is making the call. Whereas every person has a single user account, they may leverage multiple services to do their daily work.

In different types of Tapis interactions, the user is the same, but the context in which they interact with Tapis is different. Further, the different Tapis interactions may all involve client applications developed by the same organization. The situation is further complicated when one or more third-party client applications are used to leverage the infrastructure. Tapis needs to track both the users and the client applications with whom it interacts. It does this through the issuance of API keys.

Tapis uses OAuth2 to authenticate users and make authorization decisions about which APIs client applications have permission to access. A full discussion of OAuth2 is outside the scope of this tutorial. You can read more about it on the OAuth2 website or from the websites of any of the many other service providers using it today. In this section, we will walk you through getting your API keys so we can stay focused on learning how to interact with the Tapis APIs.

Creating a new client application

In order to interact with any of the Tapis APIs, you will need to first get a set of API keys. You can get your API keys from the Clients service. The example below shows how to get your API keys using curl.

curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "clientName=my_cli_app" -d "description=Client app used for scripting up cool stuff" https://api.tacc.utexas.edu/clients/v2

Note: when using the Tapis CLI, the -S option will store the new API keys for future use so you don't need to manually enter them when you authenticate later.

The response to this call will look something like:

{
   "callbackUrl":"",
   "key":"gTgp...SV8a",
   "secret":"hZ_z3f...BOD6",
   "description":"Client app used for scripting up cool stuff",
   "name":"my_cli_app",
   "tier":"Unlimited",
   "_links":{
      "self":{
         "href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app"
      },
      "subscriber":{
         "href":"https://api.tacc.utexas.edu/profiles/v2/nryan"
      },
      "subscriptions":{
         "href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions/"
      }
   }
}

Your API keys should be kept in a secure place and not shared with others. This will prevent other, unauthorized client applications from impersonating your application. If you are developing a web application, you should also provide a valid callbackUrl when creating your keys. This will reduce the risk of your keys being reused even if they are compromised. You should also create a unique set of API keys for each client application you develop. This will allow you to better monitor your usage on a per-application basis and reduce the possibility of inadvertently hitting usage quotas due to cumulative usage across client applications.

Listing your existing client applications

curl -sku "$API_USERNAME:$API_PASSWORD" https://api.tacc.utexas.edu/clients/v2

The response to this call will look something like:

[
   {
      "callbackUrl":"",
      "key":"xn8b...0y3d",
      "description":"",
      "name":"DefaultApplication",
      "tier":"Unlimited",
      "_links":{
         "self":{
            "href":"https://api.tacc.utexas.edu/clients/v2/DefaultApplication"
         },
         "subscriber":{
            "href":"https://api.tacc.utexas.edu/profiles/v2/nryan"
         },
         "subscriptions":{
            "href":"https://api.tacc.utexas.edu/clients/v2/DefaultApplication/subscriptions/"
         }
      }
   },
   {
      "callbackUrl":"",
      "key":"gTgp...SV8a",
      "description":"Client app used for scripting up cool stuff",
      "name":"my_cli_app",
      "tier":"Unlimited",
      "_links":{
         "self":{
            "href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app"
         },
         "subscriber":{
            "href":"https://api.tacc.utexas.edu/profiles/v2/nryan"
         },
         "subscriptions":{
            "href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions/"
         }
      }
   }
]

Over time you may develop several client applications. Managing several sets of API keys can become tricky. You can see which applications you have created by querying the Clients service.

Deleting client registrations

curl -sku "$API_USERNAME:$API_PASSWORD" -X DELETE https://api.tacc.utexas.edu/clients/v2/my_cli_app

The response to this call is simply a null result object.

At some point you may need to delete a client. You can do this by requesting a DELETE on your client in the Clients service.

Listing current subscriptions

curl -sku "$API_USERNAME:$API_PASSWORD" https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions

The response to this call will look something like:

[
  {
     "context":"/apps",
     "name":"Apps",
     "provider":"admin",
     "status":"PUBLISHED",
     "version":"v2",
     "tier":"Unlimited",
     "_links":{
        "api":{
           "href":"https://api.tacc.utexas.edu/apps/v2/"
        },
        "client":{
           "href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client"
        },
        "self":{
           "href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client/subscriptions/"
        }
     }
  },
  {
     "context":"/files",
     "name":"Files",
     "provider":"admin",
     "status":"PUBLISHED",
     "version":"v2",
     "tier":"Unlimited"
     "_links":{
        "api":{
           "href":"https://api.tacc.utexas.edu/files/v2/"
        },
        "client":{
           "href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client"
        },
        "self":{
           "href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client/subscriptions/"
        }
     }
  },
  ...
]

When you register a new client application and get your API keys, you are given access to all the Tapis APIs by default. You can see the APIs you have access to by querying the subscriptions collection of your client.
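Given the subscriptions array shown above, a client can check programmatically whether it still has access to a particular API context before calling it. A hypothetical helper, using only the context and name fields from the response:

```python
def is_subscribed(subscriptions: list, context: str) -> bool:
    """Return True if any subscription entry matches the given API context."""
    return any(s.get("context") == context for s in subscriptions)

# Trimmed-down stand-in for the JSON array returned by the subscriptions listing.
subs = [
    {"context": "/apps", "name": "Apps", "version": "v2"},
    {"context": "/files", "name": "Files", "version": "v2"},
]
```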

Updating client subscriptions

curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "name=transforms" https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions

You can also use a wildcard to resubscribe to all active APIs.

curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "name=*" https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions
The response to this call will be a JSON array identical to the one returned when listing your subscriptions.

Over time, new APIs will be deployed. When this happens you will need to subscribe to the new APIs. You can do this by POSTing a request to the subscription collection with the information about the new API.

Systems

A system in Tapis represents a server or collection of servers. A server can be physical, virtual, or a collection of servers exposed through a single hostname or IP address. Systems are identified and referenced in Tapis by a unique ID unrelated to their IP address or hostname. Because of this, a single physical system may be registered multiple times. This allows different users to configure and use a system in whatever way they need for their specific purposes.

Systems come in two flavors: storage and execution. Storage systems are only used for storing and interacting with data. Execution systems are used for running apps (aka jobs or batch jobs) as well as storing and interacting with data.

The Systems service gives you the ability to add and discover storage and compute resources for use in the rest of the API. You may add as many or as few storage systems as you need to power your digital lab. When you register a system, it is private to you and you alone. Systems can also be published into the public space for all users to use. Depending on who is administering Tapis for your organization, this may have already happened and you may already have one or more storage systems available to you by default.

In this tutorial we walk you through how to discover, manage, share, and configure systems for your specific needs. This tutorial is best done in a hands-on manner, so if you do not have a compute or storage system of your own to use, you can grab a VM from our sandbox.

Discovering systems

tapis systems list -v
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/

The response will be something like this:

[
  {
    "id" : "user.storage",
    "name" : "Storage VM for the drug discovery portal",
    "type" : "STORAGE",
    "default" : false,
    "_links" : {
      "self" : {
        "href" : "https://api.tacc.utexas.edu/systems/v2/user.storage"
      }
    },
    "available": null,
    "description" : "SFTP on drugdiscovery for the drug discovery portal",
    "public" : true,
    "status" : "UP",
  },
  {
    "id" : "docker.tacc.utexas.edu",
    "name" : "Demo Docker VM",
    "type" : "EXECUTION",
    "default" : false,
    "_links" : {
      "self" : {
        "href" : "https://api.tacc.utexas.edu/systems/v2/docker.tacc.utexas.edu"
      }
    },
    "available": null,
    "description" : "Cloud VM used for Docker demonstrations and tutorials.",
    "public" : true,
    "status" : "UP"
  }
]

The Systems service allows you to list and search for systems you have registered and systems that have been shared with you. To get a list of all your systems, make a GET request on the Systems collection.

System descriptions can get rather verbose, so a summary object is returned when listing a resource collection. The summary object contains the most critical fields in order to reduce response size when retrieving a user's systems. You can customize this behavior using the filter query parameter.

The above response may vary depending on who administers Tapis for your organization. To customize this tutorial for your specific account, log in.

Filtering results

List all systems (up to the page limit)

tapis systems search -v --type eq STORAGE
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?type=storage

List only execution systems

tapis systems search -v --type eq EXECUTION
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?type=execution

List only public systems

tapis systems search --public eq TRUE
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?publicOnly=true

List only private systems

tapis systems search --public eq FALSE
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?privateOnly=true

Only return default systems

tapis systems search --default eq TRUE
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?default=true

You can further filter the results by type, scope, and default status. See the search section for further filtering options.
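The query string variants shown above can be built programmatically. A small Python sketch assembling the publicOnly, privateOnly, type, and default parameters used in the curl examples:

```python
from urllib.parse import urlencode

def systems_query(system_type=None, public_only=False,
                  private_only=False, default=None):
    """Build the query string for GET /systems/v2/ from the filter
    parameters shown above (type, publicOnly, privateOnly, default)."""
    params = {}
    if system_type:
        params["type"] = system_type
    if public_only:
        params["publicOnly"] = "true"
    if private_only:
        params["privateOnly"] = "true"
    if default is not None:
        params["default"] = "true" if default else "false"
    return urlencode(params)
```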

System details

tapis systems show -v hpc-tacc-jetstream
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/hpc-tacc-jetstream

The response will be something like this:

{
  "id": "hpc-tacc-jetstream",
  "name": "TACC Jetstream (Docker Host)",
  "type": "EXECUTION",
  "default": false,
  "_links": {
    "metadata": {
      "href": "https://api.sd2e.org/meta/v2/data/?q=%7B%22associationIds%22%3A%228014294480571067929-242ac11a-0001-006%22%7D"
    },
    "roles": {
      "href": "https://api.sd2e.org/systems/v2/hpc-tacc-jetstream/roles"
    },
    "self": {
      "href": "https://api.sd2e.org/systems/v2/hpc-tacc-jetstream"
    },
    "history": {
      "href": "https://api.sd2e.org/systems/v2/hpc-tacc-jetstream/history"
    }
  },
  "available": true,
  "description": "Linux container support via Docker 17.12.1-ce",
  "environment": null,
  "executionType": "CLI",
  "globalDefault": false,
  "lastModified": "2019-09-11T12:49:47.000-05:00",
  "login": {
    "proxy": null,
    "protocol": "SSH",
    "port": 22,
    "auth": {
      "type": "SSHKEYS"
    },
    "host": "129.114.17.137"
  },
  "maxSystemJobs": 10,
  "maxSystemJobsPerUser": 10,
  "owner": "sd2eadm",
  "public": true,
  "queues": [
    {
      "maxJobs": 128,
      "maxMemoryPerNode": 1,
      "default": false,
      "maxRequestedTime": "00:15:00",
      "name": "short",
      "description": "Rapid turnaround jobs",
      "maxNodes": 1,
      "maxProcessorsPerNode": 1,
      "mappedName": null,
      "maxUserJobs": 10,
      "customDirectives": "-A SD2E-Community"
    }
  ],
  "revision": 20,
  "scheduler": "FORK",
  "scratchDir": "",
  "site": "jetstream-cloud.org",
  "status": "UP",
  "storage": {
    "proxy": null,
    "protocol": "SFTP",
    "mirror": false,
    "port": 22,
    "auth": {
      "type": "SSHKEYS"
    },
    "host": "129.114.17.137",
    "rootDir": "/data/jobs",
    "homeDir": "/"
  },
  "uuid": "8014294480571067929-242ac11a-0001-006",
  "workDir": ""
}

To query for detailed information about a specific system, add the system id to the url and make another GET request.

This time, the response will be a JSON object with a full system description. The example above describes an execution system. In the next section we talk more about storage systems and how to register one of your own.

Storage systems

A storage system can be thought of as an individual data repository that you want to access through Tapis. The following JSON object shows how a basic storage system is described.

{
   "id":"sftp.storage.example.com",
   "name":"Example SFTP Storage System",
   "type":"STORAGE",
   "description":"My example storage system using SFTP to store data for testing",
   "storage":{
      "host":"storage.example.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/",
      "homeDir":"/home/systest",
      "auth":{
         "username":"systest",
         "password":"changeit",
         "type":"PASSWORD"
      }
   }
}

The first four attributes are common to both storage and execution systems. The storage attribute describes the connectivity and authentication information needed to connect to the remote system. Here we describe an SFTP server accessible on port 22 at host storage.example.com. We specify that we want the rootDir, or virtual system root exposed through Tapis, to be the system's physical root directory, and we want the authenticated user's home directory to be the homeDir, or virtual home directory and base of all relative paths given to Tapis. Finally, we tell Tapis to use password-based authentication and provide the necessary credentials.
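The rootDir/homeDir semantics described above can be sketched as follows. This is an illustrative approximation, not Tapis' actual path-resolution code: absolute paths are taken relative to rootDir, relative paths are taken relative to homeDir (which is itself expressed relative to rootDir).

```python
from posixpath import join, normpath

def resolve_virtual_path(root_dir: str, home_dir: str, requested: str) -> str:
    """Map an API path onto the remote filesystem per rootDir/homeDir rules."""
    if requested.startswith("/"):
        virtual = requested                # absolute: relative to rootDir
    else:
        virtual = join(home_dir, requested)  # relative: relative to homeDir
    # Confine the result beneath rootDir (simple prefix check for this sketch).
    resolved = normpath(join(root_dir, virtual.lstrip("/")))
    if not resolved.startswith(normpath(root_dir)):
        raise ValueError("path escapes the virtual root")
    return resolved
```

With the example system above (rootDir "/", homeDir "/home/systest"), the request path "data/file.txt" resolves to /home/systest/data/file.txt, while "/etc/hosts" resolves to /etc/hosts.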

This example is given as a simple illustration of how to describe a system for use by Tapis. In most situations you should NOT provide your username and password. In fact, if you are using a compute or storage system from your university or a government-funded lab, it is, at best, against the user agreement and, at worst, illegal to give your password to a third-party service such as Tapis. In these situations, use one of the many other authentication options such as SSH keys, X509 authentication, or a 3rd party authentication service like the MyProxy Gateway.

The full list of storage system attributes is described in the following table.

Attribute Type Description
available boolean Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the `status` attribute in that a system may be UP, but not available for use in Tapis. Defaults to true
description string Verbose description of this system.
id string Required: A unique identifier you assign to the system. A system id must be globally unique across a tenant and cannot be reused once deleted.
name string Required: Common display name for this system.
site string The site associated with this system. Primarily for logical grouping.
status UP, DOWN, MAINTENANCE, UNKNOWN The functional status of the system. Systems must be in UP status to be used.
storage JSON Object Required: Storage configuration defining how to connect to this system for data staging.
type STORAGE, EXECUTION Required: Must be STORAGE.
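Before submitting a definition, you may want to sanity-check it against the required attributes in the table above. A hypothetical client-side helper (the required-field sets are taken from the tables in this section; Tapis performs its own server-side validation):

```python
REQUIRED_SYSTEM_FIELDS = {"id", "name", "type", "storage"}
REQUIRED_STORAGE_FIELDS = {"host", "port", "protocol", "auth"}

def missing_fields(definition: dict) -> list:
    """Return a sorted list of required attributes absent from a storage
    system definition."""
    missing = sorted(REQUIRED_SYSTEM_FIELDS - definition.keys())
    storage = definition.get("storage")
    if isinstance(storage, dict):
        missing += sorted(f"storage.{f}"
                          for f in REQUIRED_STORAGE_FIELDS - storage.keys())
    return missing

good = {
    "id": "sftp.storage.example.com",
    "name": "Example SFTP Storage System",
    "type": "STORAGE",
    "storage": {"host": "storage.example.com", "port": 22,
                "protocol": "SFTP", "auth": {"type": "PASSWORD"}},
}
```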

Supported data and authentication protocols

The example above described a system accessible by SFTP. Tapis supports several other data and authentication protocols for interacting with your data; the full set of options is listed in the protocol and auth type rows of the tables below.

Sample storage system definition using the SFTP data protocol with password authentication:

{
   "id":"sftp.storage.example.com",
   "name":"Example SFTP Storage System",
   "status":"UP",
   "type":"STORAGE",
   "description":"My example storage system using SFTP to store data for testing",
   "site":"example.com",
   "storage":{
      "host":"storage.example.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/",
      "homeDir":"/home/systest",
      "auth":{
         "username":"systest",
         "password":"changeit",
         "type":"PASSWORD"
      }
   }
}

The storage object differs slightly depending on the protocol used. Descriptions of every attribute in the storage object and its children are given in the following tables.

storage attributes give basic connectivity information describing things like how to connect to the system and on what port.

Attribute Type Description
auth JSON object Required: A JSON object describing the default authentication credential for this system.
container string The container to use when interacting with an object store. Specifying a container provides isolation when exposing your cloud storage accounts so users do not have access to your entire storage account. This should be used in combination with delegated cloud credentials such as an AWS IAM user credential.
homeDir string The path on the remote system, relative to rootDir, to use as the virtual home directory for all API requests. This will be the base of any requested paths that do not begin with a '/'. Defaults to '/', thus being equivalent to rootDir.
host string Required: The hostname or ip address of the storage server
port int Required: The port number of the storage server.
mirror boolean Whether the permissions set on the server should be pushed to the storage system itself. Currently, this only applies to IRODS systems.
protocol FTP, GRIDFTP, IRODS, IRODS4, LOCAL, S3, SFTP Required: The protocol used to connect to the storage server.
publicAppsDir string The path on the remote system where apps will be stored if this system is used as the default public storage system.
proxy JSON Object The proxy server through which Tapis will tunnel when submitting jobs. Currently proxy servers will use the same authentication mechanism as the target server.
resource string The name of the default resource to use when defining an IRODS system.
rootDir string The path on the remote system to use as the virtual root directory for all API requests. Defaults to '/'.
zone string The name of the default zone to use when defining an IRODS system.

storage.auth attributes give authentication information describing how to authenticate to the system specified in the storage config above.

Attribute Type Description
credential string The credential used to authenticate to the remote system. Depending on the authentication protocol of the remote system, this could be an OAuth token or an X.509 certificate.
internalUsername string The username of the internal user associated with this credential.
password string The password on the remote system used to authenticate.
privateKey string The private ssh key used to authenticate to the remote system.
publicKey string The public ssh key used to authenticate to the remote system.
server JSON object A JSON object describing the authentication server from which a valid credential may be obtained. Currently only auth type X509 supports this attribute.
type APIKEYS, LOCAL, PAM, PASSWORD, SSHKEYS, or X509 Required: The type of authentication used to connect to the remote system.
username string The remote username used to authenticate.

storage.auth.server attributes give information about how to obtain a credential that can be used in the authentication process. Currently only systems using the X509 authentication can leverage this feature to communicate with MyProxy and MyProxy Gateway servers.

Attribute Type Description
name string A descriptive name given to the credential server
endpoint string Required: The endpoint of the authentication server.
port integer Required: The port on which to connect to the server.
protocol MPG, MYPROXY Required: The protocol with which to obtain an authentication credential.

storage.proxy configuration attributes give information about how to connect to a remote system through a proxy server. This often happens when the target system is behind a firewall or resides on a NAT. Currently proxy servers can only reuse the authentication configuration provided by the target system.

Attribute Type Description
name string Required: A descriptive name given to the proxy server.
host string Required: The hostname of the proxy server.
port integer Required: The port on which to connect to the proxy server. If null, the port in the parent storage config is used.

If you have not yet set up a system of your own, now is a good time to grab a sandbox system to use while you follow along with the rest of this tutorial.

Creating a new storage system

tapis systems create -v -F sftp-password.json
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@sftp-password.json" https://api.tacc.utexas.edu/systems/v2

The response from the service will be similar to the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": true,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

Congratulations, you just added your first system. This storage system can now be used by the Files service to manage data, the Transfer service as a source or destination of data movement, the Apps service as a application repository, and the Jobs Service as both a staging and archiving destination.

Notice that the JSON returned from the Systems service is different than what was submitted. Several fields have been added, and several others have been removed. The uuid of the system has been added; this is the same UUID that is used in notifications and metadata references. The status value was assigned a default since we did not specify it. Ditto for the site attribute.

Three new fields were also added. revision is the number of times this system has been updated; this being our first time registering the system, it is set to 1. public tells whether this system is published as a shared resource for all users; we will cover this more in the section on System scope. lastModified is a timestamp of the last time the system was updated.

In the storage object, the publicAppsDir and mirror fields were both added and set to their default values. In this example we are not using a proxy server, so it was defaulted to null. Last, and most important, all authentication information has been omitted from the response object. Regardless of the authentication type, no user credential information will ever be returned once it is stored.

Execution Systems

In contrast to storage systems, execution systems specify compute resources where application binaries can be run. In addition to the storage attribute found in storage systems, execution systems also have a login attribute describing how to connect to the remote system to submit jobs as well as several other attributes that allow Tapis to determine how to stage data and run software on the system. The full list of execution system attributes is given in the following tables.

Name Type Description
available boolean Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the status attribute in that a system may be UP, but not available for use in Tapis. Defaults to true
description string Verbose description of this system.
environment String List of key-value pairs that will be added to the environment prior to execution of any command.
executionType HPC, Condor, CLI Required: Specifies how jobs are submitted to the system. HPC and Condor will leverage a batch scheduler. CLI will fork processes.
id string Required: A unique identifier you assign to the system. A system id must be globally unique across a tenant and cannot be reused once deleted.
maxSystemJobs integer Maximum number of jobs that can be queued or running on a system across all queues at a given time. Defaults to unlimited.
maxSystemJobsPerUser integer Maximum number of jobs that can be queued or running on a system for an individual user across all queues at a given time. Defaults to unlimited.
name string Required: Common display name for this system.
queues JSON Array An array of batch queue definitions providing descriptive and quota information about the queues you want to expose on your system. If not specified, no other system queues will be available to jobs submitted using this system.
scheduler LSF, LOADLEVELER, PBS, SGE, CONDOR, FORK, COBALT, TORQUE, MOAB, SLURM, CUSTOM_LSF, CUSTOM_LOADLEVELER, CUSTOM_PBS, CUSTOM_SGE, CUSTOM_CONDOR, CUSTOM_COBALT, CUSTOM_TORQUE, CUSTOM_MOAB, CUSTOM_SLURM, UNKNOWN Required: The type of batch scheduler available on the system. This only applies to systems with executionType HPC and CONDOR. The CUSTOM_* version of each scheduler provides a mechanism for you to override the default scheduler directives added by Tapis and explicitly add your own through the customDirectives field in each of the batchQueue definitions for your system.
scratchDir string Path to use for a job scratch directory. This value is the first choice for creating a job's working directory at runtime. The path will be resolved relative to the rootDir value in the storage config if it begins with a "/", and relative to the system homeDir otherwise.
site string The site associated with this system. Primarily for logical grouping.
startupScript string Path to a script that will be run prior to execution of any command on this system. The path will be a standard path on the remote system. A limited set of system macros are supported in this field: rootDir, homeDir, systemId, and workDir. The standard set of runtime job attributes are also supported. Between the two sets of macros, you should be able to construct distinct paths per job, user, and app. Any environment variables defined in the system description will be added after this script is sourced. If this script fails, output will be logged to the .agave.log file in your job directory. Job submission will still continue regardless of the exit code of the script.
status UP, DOWN, MAINTENANCE, UNKNOWN The functional status of the system. Systems must be in UP status to be used.
storage JSON Object Required: The storage configuration defining how to connect to this system for data staging.
type STORAGE, EXECUTION Required: Must be EXECUTION.
workDir string Path to use for a job working directory. This value will be used if no scratchDir is given. The path will be resolved relative to the rootDir value in the storage config if it begins with a "/", and relative to the system homeDir otherwise.

Startup scripts

Every time Tapis establishes a connection to an execution system, local or remote, it will attempt to source the startupScript provided in your system definition. The value of startupScript may be an absolute path on the system (e.g., “/usr/local/bin/common_aliases.sh” or “/home/nryan/.bashrc”) or a path relative to the physical home directory of the account used to authenticate to the system (e.g., “.bashrc”, “.profile”, or “agave/scripts/startup.sh”).

The startupScript field supports the use of template variables which Tapis will resolve at runtime before establishing a connection. If you would prefer to specify the startup script as a virtualized path on the system, prepend ${SYSTEM_ROOT_DIR} to the path. If the system will be made public, you can specify a file relative to the home directory of the calling user by prefixing your startupScript value with ${SYSTEM_ROOT_DIR}/${SYSTEM_HOME_DIR}/${USERNAME}. A full list of the available variables is given in the following table.

The startupScript is NOT a virtual path relative to the system rootDir and homeDir. It is an actual path on the remote system. This is because the value can only be set by the system owner, so it is unlikely to pose a security issue, and the login account's home directory may not be visible from the virtualized file system exposed by the system definition. This gives the owner a way to properly configure their user environment while protecting assets they would otherwise choose not to expose.
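As a hedged illustration, the fragment below shows a startupScript value specified as a virtualized path using the supported template variables; the script location is hypothetical:

```json
{
    "startupScript": "${SYSTEM_ROOT_DIR}/agave/scripts/startup.sh"
}
```

Here ${SYSTEM_ROOT_DIR} resolves to the rootDir in the system's storage config before the connection is established, so the script ends up being sourced from an actual path on the remote host.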

Schedulers and system execution types

Tapis supports job execution both interactively and through batch queueing systems (aka schedulers). We cover the mechanics of job submission in the Job Management tutorial. Here we just point out that regardless of how your job is actually run on the underlying system, the process of submitting, monitoring, sharing, and otherwise interacting with your job through Tapis is identical. Describing the scheduler and execution types for your system is really just a matter of picking the most efficient and/or available mechanism for running jobs on your system.

As you saw in the table above, executionType refers to the classification of jobs going into the system and scheduler refers to the type of batch scheduler used on a system. These two fields help limit the range of job submission options used on a specific system. For example, it is not uncommon for an HPC system to accept jobs from both a Condor scheduler and a batch scheduler. It is also possible, though generally discouraged, to fork jobs directly on the command line. With so many options, how would users publishing apps on such a system know which mechanism to use? Specifying the execution type and scheduler narrows the options down to a single execution mechanism.

Thankfully, picking the right combination is pretty simple. The following table illustrates the available combinations.

executionType scheduler Description
HPC LSF, LOADLEVELER, PBS, SGE, COBALT, TORQUE, MOAB, SLURM Jobs will be submitted to the local scheduler using the appropriate scheduler commands. Systems with this execution type will not allow forked jobs.
CONDOR CONDOR Jobs will be submitted to the Condor scheduler running locally on the remote system. Tapis will not do any installation for you, so the setup and administration of the Condor server is up to you.
CLI FORK Jobs will be started as forked processes and monitored using the system process id.

For reference, the full list of template variables available in the startupScript field is given in the following table.

Variable Description
SYSTEM_ID ID of the system (ex. ssh.execute.example.com)
SYSTEM_UUID The UUID of the system
SYSTEM_STORAGE_PROTOCOL The protocol used to move data to and from this system
SYSTEM_STORAGE_HOST The storage host for this system
SYSTEM_STORAGE_PORT The storage port for this system
SYSTEM_STORAGE_RESOURCE The system resource for iRODS systems
SYSTEM_STORAGE_ZONE The system zone for iRODS systems
SYSTEM_STORAGE_ROOTDIR The virtual root directory exposed on this system
SYSTEM_STORAGE_HOMEDIR The home directory on this system relative to the SYSTEM_STORAGE_ROOTDIR
SYSTEM_STORAGE_AUTH_TYPE The storage authentication method for this system
SYSTEM_STORAGE_CONTAINER The object store bucket in which the rootDir resides
SYSTEM_LOGIN_PROTOCOL The protocol used to establish a session with this system (e.g., SSH or GSISSH; OpenSSH keys are not supported)
SYSTEM_LOGIN_HOST The login host for this system
SYSTEM_LOGIN_PORT The login port for this system
SYSTEM_LOGIN_AUTH_TYPE The login authentication method for this system
SYSTEM_OWNER The username of the user who created the system

When you are describing your system, consider the policies put in place by your system administrators. If the system you are defining has a scheduler, chances are they want you to use it.

Defining batch queues

Tapis supports the notion of multiple submit queues. On HPC systems, queues should map to actual batch scheduler queues on the target server. Additionally, queues are used by Tapis as a mechanism for implementing quotas on job throughput in a given queue or across an entire system. Queues are defined as a JSON array of objects assigned to the queues attribute. The following table summarizes all supported queue parameters.

Name Type Description
name string Arbitrary name for the queue. This will be used in the job submission process, so it should line up with the name of an actual queue on the execution system.
maxJobs integer Maximum number of jobs that can be queued or running within this queue at a given time. Defaults to 10. -1 for no limit
maxUserJobs integer Maximum number of jobs that can be queued or running by any single user within this queue at a given time. Defaults to 10. -1 for no limit
maxNodes integer Maximum number of nodes that can be requested for any job in this queue. -1 for no limit
maxProcessorsPerNode integer Maximum number of processors per node that can be requested for any job in this queue. -1 for no limit
maxMemoryPerNode string Maximum memory per node for jobs submitted to this queue in ###.#[E|P|T|G]B format.
maxRequestedTime string Maximum run time for any job in this queue given in hh:mm:ss format.
customDirectives string Arbitrary text that will be appended to the end of the scheduler directives in a batch submit script. This could include a project number, system-specific directives, etc.
default boolean True if this is the default queue for the system, false otherwise.

Configuring quotas

In the batch queues table above, several attributes exist to specify limits on the number of total jobs and user jobs in a given queue. Corresponding attributes exist in the execution system to specify limits on the number of total and user jobs across an entire system. These attributes, when used appropriately, can be used to tell Tapis how to enforce limits on the concurrent activity of any given user. They can also ensure that Tapis will not unfairly monopolize your systems as your application usage grows.

If you have ever used a shared HPC system before, you should be familiar with batch queue quotas. If not, the important thing to understand is that they are a critical tool to ensure fair usage of any shared resource. As the owner/administrator for your registered system, you can use the batch queues you define to enforce whatever usage policy you deem appropriate.

Consider an example where you are using a VM to run image analysis routines on demand through Tapis. Your server will become memory bound and experience performance degradation if too many processes are running at once. To avoid this, you can define a batch queue configuration that limits the number of simultaneous tasks that can run at once on your server.

Another example where quotas can be helpful is in properly partitioning your system resources. Consider a user analyzing unstructured data. The problem is computationally and memory intensive. To preserve resources, you could create one queue with a moderate maxJobs value and conservative maxMemoryPerNode, maxProcessorsPerNode, and maxNodes values to allow good throughput of small jobs. You could then create another queue with large maxMemoryPerNode, maxProcessorsPerNode, and maxNodes values while only allowing a single job to run at a time. This gives you both high throughput and high capacity on a single system.
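The partitioning strategy described above might be sketched as two queue definitions like the following; the queue names and limit values are hypothetical and should be adapted to your system:

```json
[
    {
        "name": "throughput",
        "maxJobs": 50,
        "maxUserJobs": 10,
        "maxNodes": 2,
        "maxProcessorsPerNode": 4,
        "maxMemoryPerNode": "8GB",
        "maxRequestedTime": "01:00:00",
        "customDirectives": null,
        "default": true
    },
    {
        "name": "capacity",
        "maxJobs": 1,
        "maxUserJobs": 1,
        "maxNodes": 64,
        "maxProcessorsPerNode": 16,
        "maxMemoryPerNode": "1TB",
        "maxRequestedTime": "48:00:00",
        "customDirectives": null,
        "default": false
    }
]
```

The first queue trades capacity for concurrency; the second reverses that trade-off while its maxJobs of 1 keeps the large allocations from starving the rest of the system.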

The following sample queue definitions illustrate some other interesting use cases.

{
    "name":"short_job",
    "mappedName": null,
    "maxJobs":100,
    "maxUserJobs":10,
    "maxNodes":32,
    "maxMemoryPerNode":"64GB",
    "maxProcessorsPerNode":12,
    "maxRequestedTime":"00:15:00",
    "customDirectives":null,
    "default":true
}

System login protocols

As with storage systems, Tapis supports several different protocols and mechanisms for job submission. We already covered scheduler and queue support. Here we illustrate the different login configurations possible. For brevity, only the value of the login JSON object is shown.

The full list of login configuration options is given in the following table. We omit the login.auth and login.proxy attributes as they are identical to those used in the storage config.

Attribute Type Description
auth JSON object Required: A JSON object describing the default login authentication credential for this system.
host string Required: The hostname or IP address of the server where the job will be submitted.
port int The port number of the server where the job will be submitted. Defaults to the default port of the protocol used.
protocol SSH, GSISSH, LOCAL Required: The protocol used to submit jobs for execution. *NOTE: OpenSSH Keys are not supported.
proxy JSON Object The proxy server through which Tapis will tunnel when submitting jobs. Currently proxy servers will use the same authentication mechanism as the target server.
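As a hedged example, a login object using SSH with password authentication might look like the following; the host and credential values are placeholders, and the auth object takes the same form as in the storage config:

```json
{
    "host": "execute.example.com",
    "port": 22,
    "protocol": "SSH",
    "proxy": null,
    "auth": {
        "username": "nryan",
        "password": "changeit",
        "type": "PASSWORD"
    }
}
```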

Scratch and work directories

In the Job Management tutorial we will dive into how Tapis manages the end-to-end lifecycle of running a job. Here we point out two relevant attributes that control where data is staged and where your job will physically run. The scratchDir and workDir attributes control where the working directories for each job will be created on an execution system. The following table summarizes the decision making process Tapis uses to determine where the working directories should be created.

rootDir value homeDir value scratchDir value Effective system path for job working directories
/ / (not set) /
/ / / /
/ / /scratch /scratch
/ /home/nryan (not set) /home/nryan
/ /home/nryan / /
/ /home/nryan /scratch /scratch
/home/nryan / (not set) /home/nryan
/home/nryan / / /home/nryan
/home/nryan / /scratch /home/nryan/scratch
/home/nryan /home (not set) /home/nryan/home
/home/nryan /home / /home/nryan
/home/nryan /home /scratch /home/nryan/scratch

While it is not required, it is a best practice to always specify scratchDir and workDir values for your execution systems and, whenever possible, place them outside of the system homeDir to ensure data privacy. The reason for this is that the file system available on many servers is actually made up of a combination of physically attached storage, mounted volumes, and network mounts. Oftentimes, your home directory will have a very conservative quota while the mounted storage will be essentially quota-free. As the above table shows, when you do not specify a scratchDir or workDir, Tapis will attempt to create your job work directories in your system homeDir. It is very likely that, in the course of running simulations, you will reach the quota on your home directory, causing that job and all future jobs to fail on the system until you clear up more space. To avoid this, we recommend specifying a location with sufficient available space to handle the work you want to do.

Another common error that arises from not specifying thoughtful scratchDir and workDir values for your execution systems is jobs failing with “permission denied” errors. This often happens when your scratchDir and/or workDir resolve to the actual system root. Usually the account you are using to access the system will not have permission to write to /, so all attempts to create a job working directory fail, accurately, with a “permission denied” error.
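Putting this guidance into practice, the relevant fields of an execution system definition might look like the following sketch; the paths are placeholders and should point at locations with generous quotas on your own system:

```json
{
    "scratchDir": "/scratch",
    "workDir": "/work",
    "storage": {
        "rootDir": "/",
        "homeDir": "/home/nryan"
    }
}
```

Because both paths begin with a "/", they resolve relative to rootDir, landing job directories on the scratch and work file systems rather than under the quota-bound home directory.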


Creating a new execution system

tapis systems create -v -F ssh-password.json
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@ssh-password.json" https://api.tacc.utexas.edu/systems/v2
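The contents of ssh-password.json are not shown above. As a hedged sketch, a definition consistent with the response below might look like the following; the hostnames, credentials, and queue values are illustrative only:

```json
{
    "id": "demo.execute.example.com",
    "name": "Example SSH Execution Host",
    "type": "EXECUTION",
    "executionType": "HPC",
    "scheduler": "SGE",
    "workDir": "/work",
    "scratchDir": "/scratch",
    "queues": [
        {
            "name": "normal",
            "maxJobs": 100,
            "maxUserJobs": 10,
            "maxNodes": 32,
            "maxMemoryPerNode": "64GB",
            "maxProcessorsPerNode": 12,
            "maxRequestedTime": "48:00:00",
            "default": true
        }
    ],
    "login": {
        "host": "texas.rangers.mlb.com",
        "port": 22,
        "protocol": "SSH",
        "auth": {
            "username": "nryan",
            "password": "changeit",
            "type": "PASSWORD"
        }
    },
    "storage": {
        "host": "texas.rangers.mlb.com",
        "port": 22,
        "protocol": "SFTP",
        "rootDir": "/home/nryan",
        "auth": {
            "username": "nryan",
            "password": "changeit",
            "type": "PASSWORD"
        }
    }
}
```

Note that the credentials in the auth objects are submitted but, as discussed earlier, are never returned in any response.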

The response from the server will be similar to the following

{
   "id":"demo.execute.example.com",
   "uuid":"0001323106792914-5056a550b8-0001-006",
   "name":"Example SSH Execution Host",
   "status":"UP",
   "type":"EXECUTION",
   "description":"My example system using ssh to submit jobs used for testing.",
   "site":"example.com",
   "revision":1,
   "public":false,
   "lastModified":"2013-07-02T10:16:11.000-05:00",
   "executionType":"HPC",
   "scheduler":"SGE",
   "environment":null,
   "startupScript":"./bashrc",
   "maxSystemJobs":100,
   "maxSystemJobsPerUser":10,
   "workDir":"/work",
   "scratchDir":"/scratch",
   "queues":[
      {
         "name":"normal",
         "maxJobs":100,
         "maxUserJobs":10,
         "maxNodes":32,
         "maxMemoryPerNode":"64GB",
         "maxProcessorsPerNode":12,
         "maxRequestedTime":"48:00:00",
         "customDirectives":null,
         "default":true
      },
      {
         "name":"largemem",
         "maxJobs":25,
         "maxUserJobs":5,
         "maxNodes":16,
         "maxMemoryPerNode":"2TB",
         "maxProcessorsPerNode":4,
         "maxRequestedTime":"96:00:00",
         "customDirectives":null,
         "default":false
      }
   ],
   "login":{
      "host":"texas.rangers.mlb.com",
      "port":22,
      "protocol":"SSH",
      "proxy":null,
      "auth":{
         "type":"PASSWORD"
      }
   },
   "storage":{
      "host":"texas.rangers.mlb.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/home/nryan",
      "homeDir":"",
      "proxy":null,
      "auth":{
         "type":"PASSWORD"
      }
   }
}

Disabling a system

Disable a system

tapis systems disable $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT --data-binary '{"action": "disable"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response will look something like the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": false,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

There may be times when you need to disable a system. If your system has scheduled maintenance periods, you may want to disable the system until the maintenance period ends. You can do this by making a PUT request on the system with a field named action set to “disable”, or simply by updating the status to “MAINTENANCE”. While disabled, all apps and jobs on the system will be disabled, and all file operations will be rejected. Once the system is restored, all operations will pick back up.

Enabling a system

Enable a system

tapis systems enable $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT --data-binary '{"action": "enable"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response will look something like the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": true,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

Similarly, to enable a system, make a PUT request with a field named action set to “enable”. Once re-enabled, all apps, jobs, and file operations on the system will resume.

Deleting systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The call will return an empty result.

In the event you wish to delete a system, you can make a DELETE request on the system URL. Deleting a system will disable the system and all applications published on that system from use. Any running jobs will continue to run, but all pending, archiving, paused, and staged jobs will be killed, and any data archived on that system will no longer be available. Restoring a deleted system requires intervention from your tenant admin. Once deleted, the system id cannot be reused at a later time. Use this operation with care.

If you simply wish to remove a system from service, you can update the system's status or available attribute, depending on whether you want to disable use of the system or its visibility.

Multi-user environments

If your application supports a multi-user environment and those users do not have API accounts, then you may run into a situation where you are juggling multiple user credentials for a single system. Tapis has a solution for this problem in the form of its Internal User feature. You can map your application users into a private user store Tapis provides you and assign those users credentials on your systems. This allows you to move seamlessly from community users to private users and back without having to alter your application code. For a deep discussion on the mechanics and implications of credential management with internal users, see the Internal User Credential Management guide.

System roles

Systems you register are private to you and you alone. You can, however, allow other Tapis clients to utilize the system you define by granting them a role on the system using the systems roles services. The available roles are given in the table below.

Role Description
GUEST Gives any authenticated user read-only access to the system. No file operations or job executions are allowed for users with GUEST access.
USER Gives a user the ability to run jobs and access data on the system.
PUBLISHER All the rights of USER as well as the ability to publish applications listing the system as an execution host.
ADMIN All the rights of PUBLISHER as well as the ability to edit and grant roles on the system details. Admins may use the system to access data and run jobs using the default credential assigned to the system, but they may not view or update any of the credentials stored by the system owner. It is not possible for anyone but the system owner to assign or leverage internal user credentials on a system.
OWNER Reserved for the user that originally created the system. This role is non-revokable.

Please see the Systems Roles tutorial for a deep discussion of system roles and how they are used.
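While the Systems Roles tutorial covers the details, as a rough sketch a role is granted by sending a JSON body like the following to the system's roles endpoint (the one shown in the _links.roles href of the system responses above); the username and role values here are illustrative:

```json
{
    "username": "rjohnson",
    "role": "USER"
}
```

This would allow the hypothetical user rjohnson to run jobs and access data on the system without giving them the ability to publish apps or administer the system.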

System scope

Throughout these tutorials and Beginner's Guides, we have referred to both public and private systems. In addition to roles, systems have a concept of scope associated with them. Not to be confused with OAuth scope mentioned in the Authentication Guide, system scope refers to the availability of a system to the general user community. The following table lists the available scopes and their meanings.

Scope Required role Description
private Admin System is visible and available for use to the owner and to anyone whom they grant a role.
read only Tenant admin Storage system is visible and available for data browsing and download by any API user. Write access is restricted unless explicitly granted to a specific user.
public Tenant admin System is visible and available to all users for reading and writing. Virtual user home directories are enforced and write access outside of a user's home directory is restricted unless explicitly granted by a system admin.

Private systems

All systems are private by default. This means that no one can use a system you register without you or another user with “admin” permissions granting them a role on that system. Most of the time, unless you are configuring a tenant for your organization, all the systems you register will stay private. Do not mistake the term private for isolated. Private simply means not public. Another way to think of private systems is as “invitation only.” You are free to share your system with as many or as few people as you want and it will still remain a private system.

Public systems

Public systems are available for use by every API user within your tenant. Once public, systems inherit specific behavior unique to their type. We will cover each system type in turn.

Public Storage Systems

Public storage systems enforce a virtual user home directory with implied user permissions. The following table gives a brief summary of the permission implications. You can read more in the Data Permissions tutorial.

rootDir homeDir URL path User permission
/ /home (empty) READ
/ /home / READ
/ /home /var READ
/ /home systest ALL
/ /home systest/some/subdir ALL
/ /home rjohnson NONE

Notice in the above example that on public systems, users will have implied ownership of a folder matching their username in the system’s homeDir. In the table, this means that user “systest” will have ownership of the physical home directory /home/systest on the system after it’s public. It is important that, before publishing a system, you make sure that the account used to access the system can actually write to these folders. Otherwise, users will not be able to access their data on the system you make public.

Before making a system public, make sure that you have a strategy for mapping API users to directories on the system you want to expose. If mapping to the /home folder on a Unix system, make sure the account used to access the system has write access to all user directories.

Public Execution Systems

Public execution systems do not share the same behavior as public storage systems. Unless explicit permission has been given, public execution systems are not accessible for data access by non-privileged users. This is because public systems allow all users to run applications on them and granting public access to the file system would expose user job data to all users. If you do need to expose the data on a public execution system, either register it again as a storage system (using an appropriate rootDir outside of the system scratchDir and workDir paths), or grant specific users a role on the system.

Publishing a system

To publish a system and make it public, you make a PUT request on the system’s url.

tapis systems publish -v $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"publish"}' \
    https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the public attribute set to true.

Unpublishing a system

tapis systems unpublish -v $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"unpublish"}' \
    https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the public attribute set to false.

To unpublish a system, make the same request with the action attribute set to unpublish.

Default systems

As you continue to use Tapis over time, it will not be uncommon for you to accumulate additional storage and execution systems through both self-registration and other people sharing their systems with you. It may even be the case that you have multiple public systems available to you. In this situation, it is helpful for both you and your users to specify what the default systems should be.

Default systems are the systems that are used when a user does not specify a system to use when performing a remote action in Tapis, for example, specifying an archivePath in a job request but no archiveSystem, or specifying a deploymentPath in an app description but no deploymentSystem. In these situations, Tapis will use the user’s default storage system.

Four types of default systems are possible. The following table describes them.

Type Scope Role needed to set Description
storage user default USER Default storage system for an individual user. This takes priority over any global defaults and will be used in all data operations in lieu of a system being specified for this user.
storage global default Tenant admin Default storage system for an entire tenant. This will be used as the default storage system whenever a user has not explicitly specified another. Only public systems may be made the global default.
execution user default USER Default execution system for an individual user. This takes priority over any global defaults and will be used in all app and job operations in lieu of an execution system being specified for this user. In the case of app registration, normal user role requirements apply.
execution global default Tenant admin Default execution system for an entire tenant. This will be used as the default execution system whenever a user has not explicitly specified another. Only public systems may be made the global default.

As a best practice, it is recommended to always specify the system you intend to use when interacting with Tapis. This will eliminate ambiguity in each request and make your actions more repeatable over time as the availability and configuration of the global and user default systems may change.

Setting user default system

To set a system as the user’s default, you make a PUT request on the system’s url. Only systems the user has access to may be used as their default.

tapis systems default set $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"setDefault"}' \
    https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the default attribute set to true.

Unsetting user default system

tapis systems default unset $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"unsetDefault"}' \
    https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the default attribute set to false.

To remove a system as the user’s default, make the same request with the action attribute set to unsetDefault. Keep in mind that you cannot remove the global default system from being the user’s default. You can only set a different one to replace it.

Setting global default system

Tenant administrators may wish to set default storage and execution systems for an entire tenant. These are called global default systems. There may be at most one system of each type set as a global default. To set a global default system, first make sure that the system is public. Only public systems may be set as a global default. Next, make sure you have administrator permissions for your tenant. Only tenant admins may publish systems and manage the global defaults. Lastly, make a PUT request on the system's url with an action attribute in the body set to setGlobalDefault.

tapis systems default set -G $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"setGlobalDefault"}' \
    https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

The response from the service will be the same system description we saw before, this time with both the default and public attributes set to true.

Setting global default systems does not preclude users from manually setting their own default systems. Any user-defined default systems will take precedence over the global default system setting for that user.

To remove a system from being the global default, make the same request with the action attribute set to unsetGlobalDefault.

tapis systems default unset -G $SYSTEM_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"unsetGlobalDefault"}' \
    https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID

This time the response from the service will have default set to false and public set to true.

Files

The Tapis Files service allows you to manage data across multiple storage systems using multiple protocols. It supports the file operations common to most file services, such as directory listing, renaming, copying, deleting, and upload/download. It also supports file importing from arbitrary locations, metadata assignment, and a full access control layer allowing you to keep your data private, share it with your colleagues, or make it publicly available.

Files service URL structure

Canonical URL for all file items accessible in the Platform

https://api.tacc.utexas.edu/files/v2/media/system/$SYSTEM_ID/$PATH

Every file and directory referenced through the Files service has a canonical URL, shown in the example above. The following table defines each component:

Token Description
$SYSTEM_ID The id of the system where the file or directory lives. These correspond to the ids returned from the Systems service.
$PATH (Optional:) The path on the remote system. By default, all paths are relative to the home directory defined in the system description. To specify an absolute path, prefix the path with a `/`. For more on path resolution, see the next section.

Tapis also supports the concept of default systems. If you exclude the /system/$SYSTEM_ID segments from the above URL, the Files service will assume you are referencing your default storage system. Thus, if your default system were api.tacc.cloud, the following two examples would be identical.

If api.tacc.cloud is your default storage system then

https://api.tacc.utexas.edu/files/v2/media/shared

is equivalent to this:

https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/shared

This comes in especially handy when referencing your default system paths in other contexts such as job requests and when interacting with the Tapis CLI. A good example of this situation is when you have a global default storage system accessible to all your users. In this case, most users will use that for all of their data staging and archiving needs. These users may find it easier not to even think about the system they are using. The default system support in the Files service allows them to do just that.

When building applications against the Files service, it is considered a best practice to always specify the intended system ID when constructing URL paths to avoid situations where users change their default systems. This will also provide long-term stability to your data references and make debugging much easier. You can read more about default systems in the Systems Guide.
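Constructing these URLs consistently in client code is straightforward. Below is a minimal sketch of a URL builder following the patterns above; the `files_media_url` helper is hypothetical and not part of any Tapis SDK:

```python
def files_media_url(path="", system_id=None,
                    api_host="api.tacc.utexas.edu"):
    """Build a Files media URL. Omitting system_id yields the shorter
    form that resolves against the user's default storage system."""
    base = "https://{}/files/v2/media".format(api_host)
    if system_id:
        base = "{}/system/{}".format(base, system_id)
    return "{}/{}".format(base, path) if path else base
```

With a relative path like `shared`, this reproduces the two equivalent URLs shown above, with and without the explicit system id.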

Understanding file paths

One powerful, but potentially confusing, feature of Tapis is its support for virtualizing system paths. Every registered system specifies both a root directory, rootDir, and a home directory, homeDir, in its storage configuration. rootDir tells Tapis the absolute path on the remote system that it should treat as /. Similar to the Linux chroot command, no request made to Tapis will ever resolve to a location outside of rootDir.

Type of storage system Examples of rootDir values
Linux
  • Actual system root directory, `/`
  • RAID array physically attached to the system
  • NFS mount you want to share
  • An arbitrary file path, such as your `$HOME` directory from which you want to serve application data.
Cloud
  • A bucket on S3
  • A folder/marker file in your object store
iRODS
  • A specific resource or zone you want to expose.
  • A collection you want to publish for use
  • Your personal home folder

homeDir specifies the path, relative to rootDir, that Tapis should use for relative paths. Since Tapis is stateless, there is no concept of a current working directory. Thus, when you specify a path to Tapis that does not begin with a /, Tapis will always prefix the path with the value of homeDir. The following table gives several examples of how different combinations of rootDir, homeDir, and URL paths will be resolved by Tapis.

"rootDir" value "homeDir" value Tapis URL path Resolved path on system
/ / -- /
/ / .. /
/ / home /home
/ / /home /home
/ /home/nryan -- /home/nryan
/ /home/nryan / /
/ /home/nryan .. /home
/ /home/nryan nryan /home/nryan/nryan
/ /home/nryan /nryan /nryan
/home/nryan / -- /home/nryan
/home/nryan / .. /home/nryan
/home/nryan /home / /home/nryan
/home/nryan /home .. /home/nryan
/home/nryan /home home /home/nryan/home/home
/home/nryan /home /bgibson /home/nryan/bgibson
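The resolution rules in the table above can be sketched in Python. This is an illustration of the documented behavior, not the service's actual implementation:

```python
import posixpath

def resolve(root_dir, home_dir, url_path=None):
    """Resolve a Tapis URL path against a system's rootDir and homeDir."""
    if not url_path:
        # No path given: default to the home directory
        raw = posixpath.join(root_dir, home_dir.lstrip("/"))
    elif url_path.startswith("/"):
        # Absolute paths are taken relative to rootDir
        raw = posixpath.join(root_dir, url_path.lstrip("/"))
    else:
        # Relative paths are prefixed with homeDir
        raw = posixpath.join(root_dir, home_dir.lstrip("/"), url_path)
    resolved = posixpath.normpath(raw)
    # chroot-like behavior: never resolve above rootDir
    root = posixpath.normpath(root_dir)
    if resolved != root and not resolved.startswith(root.rstrip("/") + "/"):
        resolved = root
    return resolved
```

For example, `resolve("/", "/home/nryan", "..")` yields `/home`, while `resolve("/home/nryan", "/", "..")` is clamped back to `/home/nryan`, matching the corresponding rows of the table.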

Transferring data

Before we talk about how to do basic operations on your data, let’s first talk about how you can move your data around. You already have a storage system available to you, so we will start with the “hello world” of data movement, uploading a file.

Uploading data

Uploading a file

tapis files upload agave://tacc.work.taccuser files/picksumipsum.txt
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    -F "fileToUpload=@files/picksumipsum.txt" \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan

The response will look something like this:

{
    "internalUsername": null,
    "lastModified": "2014-09-03T10:28:09.943-05:00",
    "name": "picksumipsum.txt",
    "nativeFormat": "raw",
    "owner": "nryan",
    "path": "/home/nryan/picksumipsum.txt",
    "source": "http://127.0.0.1/picksumipsum.txt",
    "status": "STAGING_QUEUED",
    "systemId": "api.tacc.cloud",
    "uuid": "0001409758089943-5056a550b8-0001-002",
    "_links": {
        "history": {
            "href": "https://api.tacc.utexas.edu/files/v2/history/system/api.tacc.cloud/nryan/picksumipsum.txt"
        },
        "self": {
            "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
        },
        "system": {
            "href": "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
        }
    }
}

You may upload data to a remote system by performing a multipart POST on the Files service. If you are using the Tapis CLI, you can perform recursive directory uploads. If you are manually calling curl or building an app with the Tapis SDK, you will need to implement the recursion yourself. You can take a look in the files-upload script to see how this is done. The example above uploads a file that we will use in the remainder of this tutorial.

You will see a progress bar while the file uploads, followed by a response from the server with a description of the uploaded file. Tapis does not block during data movement operations, so it may be just a moment before the file physically shows up on the remote system.
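As noted above, recursion is a client-side concern when calling the service directly: you walk the local tree, issue a mkdir for each directory, and a multipart POST for each file. The sketch below plans that sequence of calls without performing them; the `plan_recursive_upload` helper and its output format are illustrative assumptions, not part of any Tapis SDK:

```python
import os

def plan_recursive_upload(local_dir, system_id, remote_path,
                          api_host="api.tacc.utexas.edu"):
    """Return the ordered (action, url, detail) calls needed to mirror
    local_dir onto remote_path of system_id via the Files service."""
    base = "https://{}/files/v2/media/system/{}".format(api_host, system_id)
    ops = []
    root_name = os.path.basename(os.path.normpath(local_dir))
    # Create the top-level directory first
    ops.append(("mkdir", "{}/{}".format(base, remote_path), root_name))
    for dirpath, dirnames, filenames in os.walk(local_dir):
        rel = os.path.relpath(dirpath, local_dir)
        if rel == ".":
            remote = "{}/{}".format(remote_path, root_name)
        else:
            remote = "{}/{}/{}".format(remote_path, root_name, rel)
        for d in sorted(dirnames):
            # PUT {"action":"mkdir","path": d} against the parent directory
            ops.append(("mkdir", "{}/{}".format(base, remote), d))
        for f in sorted(filenames):
            # POST multipart fileToUpload=@<local file> to the directory URL
            ops.append(("upload", "{}/{}".format(base, remote),
                        os.path.join(dirpath, f)))
    return ops
```

Each "mkdir" entry corresponds to a PUT with a mkdir action, and each "upload" entry to a multipart POST like the curl example above.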

Importing data

You can also have Tapis download data from an external URL. Rather than making a multipart file upload request, you can pass in a JSON object with the URL and an optional target file name, type, and an array of notification subscriptions. Tapis supports several protocols for ingestion, listed in the next table.

Schema Details
http Supported with and without user info
https Supported with and without user info
ftp Anonymous FTP only
sftp User info required in URL
agave No user info supported.

To demonstrate how this works, we will import a README.md file from the Tapis Samples git repository in Bitbucket.

Download a file from a web accessible URL

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    --data '{ "url":"https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan

The response will look something like this:

{
    "name" : "README.md",
    "uuid" : "0001409758713912-5056a550b8-0001-002",
    "owner" : "nryan",
    "internalUsername" : null,
    "lastModified" : "2014-09-10T20:00:55.266-05:00",
    "source" : "https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md",
    "path" : "/home/nryan/README.md",
    "status" : "STAGING_QUEUED",
    "systemId" : "api.tacc.cloud",
    "nativeFormat" : "raw",
    "_links" : {
      "self" : {
        "href" : "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/README.md"
      },
      "system" : {
        "href" : "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
      },
      "history" : {
        "href" : "https://api.tacc.utexas.edu/files/v2/history/system/api.tacc.cloud/nryan/README.md"
      }
    }
}

Downloading data from a third party is done offline as an asynchronous activity, so the response from the server will come right away. One thing worth noting is that the file length given in the response will always be -1. This is because, generally speaking, Tapis does not know the actual source file size until after the response is sent back. The file size will be updated as the download progresses. You can track the progress by querying the destination file item's history. An entry will be present showing the progress of the download.

For this exercise, the file we just downloaded is just a few KB, so you should see it appear in your home folder on api.tacc.cloud almost immediately. If you were importing larger datasets, the transfer could take significantly longer depending on the network quality between Tapis and the source location. In this case, you would see the file size continue to increase until it completed. In the event of a failed transfer, Tapis will retry several times before canceling the transfer.

Tapis attempts to make smart decisions about how and when to transfer data. This includes leveraging third-party transfers whenever possible, scaling directory copies out horizontally, and taking advantage of chunked or parallel uploads. As a result, data may arrive in a non-deterministic way on the target system. This is normal and should be expected.

Transferring data

Transferring data between systems

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary '{"url":"agave://stampede.tacc.utexas.edu//etc/motd"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan

The response from the service will be the same as the one we received importing a file.

Much like downloading data, Tapis can manage the transfer of data between registered systems. This is, in fact, how data is staged prior to running a simulation. Data transfers are carried out asynchronously, so you can simply start a transfer and go about your business. Tapis will ensure it completes. If you would like a notification when the transfer completes or reaches a certain stage, you can subscribe to one or more email, webhook, and/or realtime notifications, and Tapis will alert you as the transfer progresses. The available file events are listed in the File history section below. For more information about the events and notifications systems, please see the Notifications Guide and Event Reference.
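As a sketch of subscribing to transfer events, the transfer request body can carry a notifications array alongside the source url. The event names come from the file events table later in this section, but treat the exact attribute names and the webhook/email targets as illustrative assumptions; see the Notifications Guide for the authoritative format:

```python
import json

# Hypothetical transfer request body with two notification subscriptions:
# a webhook fired when staging completes, and an email on final completion.
body = {
    "url": "agave://stampede.tacc.utexas.edu//etc/motd",
    "notifications": [
        {
            "url": "https://example.com/hooks/transfer",  # assumed endpoint
            "event": "STAGING_COMPLETED",
            "persistent": False  # fire once, then expire
        },
        {
            "url": "nryan@example.com",  # assumed address
            "event": "TRANSFORMING_COMPLETED",
            "persistent": False
        }
    ]
}
payload = json.dumps(body)  # send as the POST body, Content-Type: application/json
```

The payload would then be POSTed to the destination's media URL exactly as in the curl example below.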

In the example above, we transfer a file from stampede.tacc.utexas.edu to api.tacc.cloud. While the request looks pretty basic, there is a lot going on behind the scenes. Tapis will authenticate to both systems, check permissions, stream data out of Stampede using GridFTP and proxy it into api.tacc.cloud using the SFTP protocol, adjusting the transfer buffer size along the way to optimize throughput. Doing this by hand is both painful and error prone. Doing it with Tapis is nearly identical to copying a file from one directory to another on your local system.

One of the benefits of the Files service is that it frees you up to work in parallel and scale with your application demands. In the next example we will use the Files service to create redundant archives of a shared project directory.

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary '{"url":"agave://api.tacc.cloud/nryan/foo_project"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/nryan.storage1/

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary '{"url":"agave://api.tacc.cloud/nryan/foo_project"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/nryan.storage2/

Notice in the above examples that the Files service works identically regardless of whether the source is a file or directory. If the source is a file, it will copy the file. If the source is a directory, it will recursively process the contents until everything has been copied.

Basic data operations

Now that we understand how to move data into, out of, and between systems, we will look at how to perform file operations on the data. Again, remember that the Files service gives you a common REST interface to all your storage and execution systems regardless of the authentication mechanism or protocol they use. The examples below will use your default public storage system, but they would work identically with any storage system you have access to.

Directory listing

Listing a file or directory

tapis files list -v agave://tacc.work.taccuser/
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.tacc.utexas.edu/files/v2/listings/system/api.tacc.cloud/nryan

The response would look something like this:

[
    {
        "format": "folder",
        "lastModified": "2012-08-03T06:30:12.000-05:00",
        "length": 0,
        "mimeType": "text/directory",
        "name": ".",
        "path": "nryan",
        "permissions": "ALL",
        "system": "api.tacc.cloud",
        "type": "dir",
        "_links": {
            "self": {
                "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan"
            },
            "system": {
                "href": "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
            }
        }
    },
    {
    "format": "raw",
    "lastModified": "2014-09-10T19:47:44.000-05:00",
    "length": 3235,
    "mimeType": "text/plain",
    "name": "picksumipsum.txt",
    "path": "nryan/picksumipsum.txt",
    "permissions": "ALL",
    "system": "api.tacc.cloud",
    "type": "file",
    "_links": {
            "self": {
                "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
        },
        "system": {
            "href": "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
        }
    }
    }
]

Obtaining a directory listing, or information about a specific file, is done by making a GET request on the /files/v2/listings/ resource.

The response to this contains a summary listing of the contents of your home directory on api.tacc.cloud. Appending a file path to your commands above would give information on a specific file.
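Because the listing response is plain JSON, client code can post-process it directly, for example separating directories from files by the type attribute. A minimal sketch (the `split_listing` helper is hypothetical):

```python
def split_listing(listing):
    """Split a Files listing response into (directories, files) by the
    "type" attribute, skipping the "." self entry for the listed dir."""
    dirs = [i["name"] for i in listing
            if i["type"] == "dir" and i["name"] != "."]
    files = [i["name"] for i in listing if i["type"] == "file"]
    return dirs, files
```

Applied to the sample response above, this returns no subdirectories and the single file picksumipsum.txt.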

Move, copy, rename, delete

Basic file operations are available by making a PUT request on the file item's path under the /files/v2/media/ collection with the following parameters in the request body.

Attribute Description
action The action you want to perform. Select one of "move", "copy", "rename", "mkdir".
path Full path to the destination file or folder. This may be the name of a new directory or renamed file, or an absolute or relative Tapis path where the file or directory should be copied/moved.

Copying files and directories

Copy a file item within the same system.
tapis files copy AGAVE_URI DESTINATION
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"copy","path":"$DESTPATH"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH

The response from a copy operation will be a JSON object describing the new file or folder.

Copying can be performed on any remote system. Unlike the Unix cp command, all copy invocations in Tapis will overwrite the destination target if it exists. In the event of a directory collision, the contents of the two directory trees will be merged with the source overwriting the destination. Any overwritten files will maintain their provenance records and have an additional entry added to record the copy operation.

Moving files and directories

tapis files move AGAVE_URI DESTINATION
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"move","path":"$DESTPATH"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH

Moving can be performed on any remote system. Moving a file or directory will overwrite the destination target if it exists. Unlike copy operations, the destination will be completely replaced by the source in the event of a collision. No merge will take place. Further, the provenance of the source will replace that of the target.

Renaming files and directories

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"rename","path":"$NEWNAME"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH

Renaming, like copying and moving, is only applicable within the context of a single system. Unlike on Unix systems, renaming and moving are not synonymous. When specifying a new name for a file or directory, the new name is relative to the parent directory of the original file or directory. Also, if a file or directory already exists with that name, the operation will fail and an error message will be returned. All provenance information will follow the renamed file or directory.

Creating a new directory

tapis files mkdir AGAVE_URI DIRECTORY
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"mkdir","path":"$NEWDIR"}' \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH

Creating a new directory is a recursive action in Tapis. If the parent directories do not exist, they will be created on the fly. If a file or directory already exists with that name, the operation will fail and an error message will be returned.

Deleting a file item

tapis files delete AGAVE_URI
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH

A standard Tapis response with an empty result value will be returned. As with creating a directory, deleting a file or directory is a recursive action in Tapis. No prompt or warning will be given once the request is sent. It is up to you to implement such checks in your application logic and/or user interface.

File history

A full history of changes, permissions changes, and access events made through the Files API is recorded for every file and folder on registered Tapis systems. The recorded history events represent a subset of the events thrown by the Files API. Generally speaking, the events saved in a file item’s history represent mutations on the physical file item or its metadata.

Direct vs indirect events

Tapis will record both direct and indirect events made on a file item. Examples of direct events include transferring a directory from one system to another or renaming a file. An example of an indirect event is a user manually deleting a file from the command line. The table below contains a list of all the provenance actions recorded.

Event Description
CREATED File or directory was created
DELETED The file was deleted
RENAME The file was renamed
MOVED The file was moved to another path
OVERWRITTEN The file was overwritten
PERMISSION_GRANT A user permission was added
PERMISSION_REVOKE A user permission was deleted
STAGING_QUEUED File/folder queued for staging
STAGING File or directory is currently in flight
STAGING_FAILED Staging failed
STAGING_COMPLETED Staging completed successfully
PREPROCESSING Preparing file for processing
TRANSFORMING_QUEUED File/folder queued for transform
TRANSFORMING Transforming file/folder
TRANSFORMING_FAILED Transform failed
TRANSFORMING_COMPLETED Transform completed successfully
UPLOADED New content was uploaded to the file.
CONTENT_CHANGED Content changed within this file/folder. If a folder, this event will be thrown whenever content changes in any file within this folder at most one level deep.

Out of band file system changes

Tapis does not own the storage and execution systems you access through the Science APIs, so it cannot guarantee that every change made to the file system is recorded. Thus, Tapis takes a best-effort approach to provenance, allowing you to choose, through your own use of best practices, how thorough you want the provenance trail of your data to be.

Listing file history

List the history of a file item

tapis files history -v agave://tacc.work.taccuser/nryan/picksumipsum.txt
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.tacc.utexas.edu/files/v2/history/system/api.tacc.cloud/nryan/picksumipsum.txt

The response contains a list of the history events recorded for the file item:

[
  {
    "status": "DOWNLOAD",
    "created": "2016-09-20T19:47:56.000-05:00",
    "createdBy": "public",
    "description": "File was downloaded"
  },
  {
    "status": "STAGING_QUEUED",
    "created": "2016-09-20T19:48:12.000-05:00",
    "createdBy": "nryan",
    "description": "File/folder queued for staging"
  },
  {
    "status": "STAGING_COMPLETED",
    "created": "2016-09-20T19:48:16.000-05:00",
    "createdBy": "nryan",
    "description": "Staging completed successfully"
  },
  {
    "status": "TRANSFORMING_COMPLETED",
    "created": "2016-09-20T19:48:17.000-05:00",
    "createdBy": "nryan",
    "description": "Your scheduled transfer of http://129.114.97.92/picksumipsum.txt completed staging. You can access the raw file on iPlant Data Store at /home/nryan/picksumipsum.txt or via the API at https://api.tacc.utexas.edu/files/v2/media/system/data.agaveapi.co//nryan/picksumipsum.txt."
  }
]

Basic paginated listing of file item history events is available as shown in the example. Currently, the file history service is readonly. The only way to erase the history on a file item is to delete the file item through the API.

File metadata management

In many systems, the concept of metadata is directly tied to the notion of a file system. Tapis takes a broader view of metadata and supports it as its own first class resource in the REST API. For more information on how to leverage metadata in Tapis, please consult the Metadata Guide. In there we cover all aspects of how to manage, search, validate, and associate metadata across your entire digital lab.

File permissions

Tapis has a fine-grained permission model supporting use cases from creating and exposing readonly storage systems to sharing individual files and folders with one or more users. The permissions available for files items are listed in the following table. Please note that a user must have WRITE permissions to grant or revoke permissions on a file item.

Name Description
READ User can view, but not edit or execute the resource
WRITE User can edit, but not view or execute the resource
EXECUTE User can execute, but not view or edit the resource
READ_WRITE User can view and write the resource, but not execute
READ_EXECUTE User can view and execute the resource, but not edit it
WRITE_EXECUTE User can edit and execute the resource, but not view it
ALL User has full control over the resource
NONE User has all permissions revoked on the given resource
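These names map onto the boolean read/write/execute flags that appear in the permission responses shown below. A minimal sketch of the expansion (the helper itself is hypothetical, not part of any Tapis SDK):

```python
# Mapping from permission name to (read, write, execute) flags,
# following the table above.
PERMISSION_FLAGS = {
    "READ":          (True,  False, False),
    "WRITE":         (False, True,  False),
    "EXECUTE":       (False, False, True),
    "READ_WRITE":    (True,  True,  False),
    "READ_EXECUTE":  (True,  False, True),
    "WRITE_EXECUTE": (False, True,  True),
    "ALL":           (True,  True,  True),
    "NONE":          (False, False, False),
}

def to_flags(name):
    """Expand a permission name into the {"read","write","execute"}
    object used in Files permission responses."""
    read, write, execute = PERMISSION_FLAGS[name]
    return {"read": read, "write": write, "execute": execute}
```

For example, granting READ_WRITE should show up in a listing as read and write true with execute false, as in the grant example later in this section.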

Listing all permissions

List the permissions on a file item

tapis files pems list agave://tacc.work.taccuser/test_folder/picksumipsum.txt
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  'https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?pretty=true'

The response will look something like the following:

[
  {
    "username": "nryan",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true,
      "execute": true
    },
    "recursive": true,
    "_links": {
      "self": {
        "href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=nryan"
      },
      "file": {
        "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
      }
    }
  }
]

To list all permissions for a file item, make a GET request on the file item's permission collection.

List permissions for a specific user

List the permissions on a file item for a given user

tapis files pems show agave://tacc.work.taccuser rclemens
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username=rclemens

The response will look something like the following:

{
  "username":"rclemens",
  "permission":{
    "read":true,
    "write":true
  },
  "_links":{
    "self":{
      "href":"https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username=rclemens"
    },
    "parent":{
      "href":"https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt"
    },
    "profile":{
      "href":"https://api.tacc.utexas.edu/profiles/v2/rclemens"
    }
  }
}

Checking permissions for a single user is done using Tapis URL query search syntax.

Grant permissions

Grant read access to a file item

tapis files pems grant agave://tacc.work.taccuser rclemens READ
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST \
  --data '{"username":"rclemens", "permission":"READ"}' \
  https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt

Grant read and write access to a file item

tapis files pems grant agave://tacc.work.taccuser rclemens READ_WRITE
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST \
  --data '{"username":"rclemens", "permission":"READ_WRITE"}' \
  https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt

The response will look something like the following

[
  {
    "username": "rclemens",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true,
      "execute": false
    },
    "recursive": false,
    "_links": {
      "self": {
        "href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=rclemens"
      },
      "file": {
        "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://api.tacc.utexas.edu/profiles/v2/rclemens"
      }
    }
  }
]

To grant another user read access to your file item, assign them READ permission. To enable another user to update a file item, grant them READ_WRITE or ALL access.

Delete single user permissions

Delete permission for single user on a file item

tapis files pems revoke agave://tacc.work.taccuser rclemens
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"rclemens", "permission":"NONE"}' \
    https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt

A response similar to the following will be returned

[
  {
    "username": "rclemens",
    "internalUsername": null,
    "permission": {
      "read": false,
      "write": false,
      "execute": false
    },
    "recursive": false,
    "_links": {
      "self": {
        "href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=rclemens"
      },
      "file": {
        "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://api.tacc.utexas.edu/profiles/v2/rclemens"
      }
    }
  }
]

Permissions may be deleted for a single user by making a DELETE request on the file item's user permission resource. This will immediately revoke all permissions to the file item for that user.

Please note that ownership cannot be revoked or reassigned. The user who created the file item will always have ownership of that item.

Deleting all permissions

Delete all permissions on a file item

tapis files pems drop agave://tacc.work.taccuser
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"*", "permission":"NONE"}' \
    https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt

An empty response will be returned from the service. Permissions may be cleared for all users on a file item by making a DELETE request on the file item permission collection.

The above operation will delete all permissions for a file item, such that only the owner will be able to access it. Use with care.

Recursive operations

Recursively update or delete all permissions on a directory

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"*", "permission":"READ_WRITE", "recursive": true}' \
    https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/directory/

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/directory/?recursive=true

An empty response will be returned from the service on delete. Update will return something like the following.

[
  {
    "username": "nryan",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true,
      "execute": true
    },
    "recursive": true,
    "_links": {
      "self": {
        "href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=nryan"
      },
      "file": {
        "href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
      }
    }
  }
]

When dealing with directories, the permission operations you perform apply only to the directory item itself. Permissions will not automatically propagate to the directory contents. In cases where you want to recursively apply permissions to the entire directory tree, include the recursive attribute in your permission objects, or add it to your URL query parameters when making a DELETE request.

Publishing data

Tapis provides multiple ways to share your data with your colleagues and the general public. In addition to the standard permission model enabling you to share your data with one or more authenticated users within the Platform, you also have the ability to publish your data and make it available via an unauthenticated public URL. Unlike traditional web and cloud hosting, your data remains in its original location and is served in situ by Tapis upon user request.

Publishing a file or folder is simply a matter of granting the special public user READ permission on the file or folder. Similar to the way listings and permissions are exposed through unique paths in the Files API, published data is served from a custom /files/v2/download path. Public data URLs have the following structure:

https://api.tacc.utexas.edu/files/v2/download/<username>/system/<system_id>/<path>

Notice two things. First, a username is inserted after the download path element. This is needed because there is no authorized user for whom to validate system or file ownership on a public request. The username gives the context by which to verify the availability of the system and file item being requested. Second, the system_id is mandatory in public data requests. This ensures that the public URL remains the same even when the default storage system of the user who published it changes.

The following sections give examples of publishing files and folders in the Tapis Platform.

See the PostIts Guide for other ways to securely share your data with others.

Publishing individual files

Publish file item on your default storage system for public access

tapis files pems grant agave://tacc.work.taccuser/nryan/picksumipsum.txt public READ
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST \
  --data '{"username":"public", "permission":"READ"}' \
  https://api.tacc.utexas.edu/files/v2/pems/nryan/picksumipsum.txt

Publish file item on a named system for public access

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST \
  --data '{"username":"public", "permission":"READ"}' \
  https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/picksumipsum.txt

The response will look something like the following:

{
  "username": "public",
  "permission": {
    "read": true,
    "write": false,
    "execute": false
  },
  "recursive": false,
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/picksumipsum.txt?username.eq=public"
    },
    "file": {
      "href": "https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/public"
    }
  }
}

Publishing a file is simply a matter of granting the special public user READ permission on the file. Once published, the file will be available at the following URL:

https://api.tacc.utexas.edu/files/v2/download/nryan/system/data.iplantcollaborative.org/nryan/picksumipsum.txt

Publishing directories

Publish directory on your default storage system for public access

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST \
  --data '{"username":"public", "permission":"READ", "recursive": true}' \
  https://api.tacc.utexas.edu/files/v2/pems/nryan/public

Publish directory on a named system for public access

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
   -H "Content-Type: application/json" \
   -X POST \
   --data '{"username":"public", "permission":"READ", "recursive": true}' \
   https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/public

The response will look something like the following:

{
  "username": "public",
  "permission": {
    "read": true,
    "write": false,
    "execute": false
  },
  "recursive": true,
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/public?username.eq=public"
    },
    "file": {
      "href": "https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/public"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/public"
    }
  }
}

Publishing an entire directory is identical to publishing a single file item. To make all the contents of the directory public as well, include a recursive field in your request with a value of true. Once published, the directory and all its contents will be available for download. The above example will make every file and folder in the “nryan/public” directory of “data.iplantcollaborative.org” available for download at the following URL:

https://api.tacc.utexas.edu/files/v2/download/nryan/system/data.iplantcollaborative.org/nryan/public

Remember that whenever you publish a folder, anything you put in that folder becomes publicly available. As with any cloud storage service, think before blindly copying data into your cloud storage. If you want to restrict the duration or frequency with which your public data is accessed, see the PostIts Guide for other ways to securely share your data with others.

Publishing considerations

Publishing data through Tapis can be a great way to share and access data, but there are situations in which it may not be an ideal choice. Below we list several of the pitfalls users run into when publishing their data.

Large file publishing

Before publishing your large datasets, take a step back and consider how you might leverage the Files or Transfers API to reliably serve up your data. HTTP is not the fastest way to serve data, and it may not be the best usage pattern for applications hoping to consume it. Thinking through your use case is well worth the time, even if publishing ends up being the best approach.

Static website hosting

Website hosting is a fairly common use case for data publishing. The challenge is that your assets are still hosted remotely from our API servers and fetched on demand. This can create some heavy latency when serving up lots of assets. Depending on the nature of your backend storage solution, it may not easily handle access patterns common to the web. In those situations, you may see some files fail to load from time to time. If your site has many files, even a small failure rate can keep your site from reliably loading.

If you are going to use the file publishing service for web hosting, the following tips can help improve your overall experience.

  1. Whenever possible, reference versions of your css, fonts, and javascript dependencies hosted on public CDN. CloudFlare, Google, and Amazon all host public mirrors of the most popular javascript libraries and frameworks. Linking to those can greatly speed up your load time.
  2. Use a technology like Webpack to reduce the number of files needed to serve your application.
  3. Lazy load your assets with oclazyload, requirejs or including async attributes on your <script> elements.
  4. Store your assets on a storage system with as little connection and protocol overhead as possible. That means avoiding tape archives, gridftp, overprovisioned shared resources, and systems only accessible through a proxied connection. While the service will still work in all of these situations, it is common for the overhead involved in establishing a connection and authenticating to take longer than the actual file transfer when the file is small. Simply avoiding slower storage protocols can greatly speed up your application’s load time.

Apps

An app, in the context of Tapis, is executable code available for invocation through the Tapis Jobs service on a specific execution system. Put another way, an app is a piece of code that you can run on a specific system. If the same code needs to run on multiple systems, each combination of code and system must be defined as a separate app.

Apps are language agnostic and may or may not carry with them their own dependencies. (More on bundling your app in a moment.) Any code that can be forked at the command line or submitted to a batch scheduler can be registered as a Tapis app and run through the Jobs service.

The Apps service is the central registry for all Tapis apps. The Apps service provides permissions, validation, archiving, and revision information about each app in addition to the usual discovery capability. The rest of this tutorial explains in detail how to register an app to the Apps service, how to manage and share apps, and what the different application scopes mean.

Inputs and Parameters

In this section we take a detailed look at the inputs and parameters sections of your app descriptions. Each of these sections takes an array of JSON objects. Each JSON object represents either a data source that needs staging in prior to job execution or a primary value passed into your app as a parameter. In either case, the JSON object only requires an id by which to reference the object in a job request, and a type field indicating primary type if the object represents a parameter.

In practice, you will want to add some descriptive information, constraints, and runtime validation checks to reduce the amount of error users can run into when attempting to run your app. The full lists of app input and parameter attributes are provided in their respective sections below. However, before we dive deeper into the next section on app inputs, let’s first get a big picture view of what we are doing when we define our app’s input and parameters.

Input and Parameter Information Flow

When a user submits a job request in step 1, they specify the inputs and parameters needed to run that job. Those attributes are defined in your app description. The Jobs service will use your app description to validate the values in the job request and either reject it with a descriptive error message as in step 2, or accept it as in step 4. Once the job request is accepted, the values provided for the inputs and parameters given in the job request are used to replace their corresponding template placeholder values in the wrapper script. For example, the job request assigned a value of foo for the input with id equal to input1. Before submitting the job request to the remote system, the Jobs service will replace all occurrences of ${input1} in the app wrapper script with foo. The same will happen with param1 and param2. All occurrences of ${param1} will be replaced with bar and all occurrences of ${param2} will be replaced with 2, just as specified in the job request.

Note: Tapis will not handle variable quoting for you. It is up to you to handle any type casting, escaping, and quoting of template values necessary for your app’s logic.

As we look at how to define inputs and parameters for your app, keep this big picture in mind. The purpose of inputs is to specify data that needs to be staged prior to your job running and to tell your wrapper script about them. The purpose of parameters is to specify variables that need to be passed to your wrapper script. To do this, we only need a simple id by which to reference the values in a job request. The rest of what we discuss in this tutorial is the mechanism that Tapis provides for you to validate, describe, discover, and restrict application inputs and parameters to provide better user and developer experiences using your app.

Inputs

The inputs attribute of your app description contains a JSON array of input objects. An input represents one or more pieces of data that your app will use at runtime. That data can be a single file, a directory, or a response from a web service. It can reside on a system that Tapis knows about, or at a publicly accessible URL. Regardless of where it lives and what it is, Tapis will grab the data (recursively if need be) and copy it to your job’s working directory just before execution.

Note: In the Job management tutorial, we talk in detail about the job lifecycle. Here we simply point out that Tapis handles the staging of your app’s deploymentPath separately from the staging of your assets. Thus, as a best practice, it is preferable to include all of the assets your app needs to run in your deploymentPath rather than defining them as inputs. This allows Tapis to make better caching decisions and improves overall throughput when running a job.

A minimal input object contains a single inputs.[].id attribute that uniquely identifies it within the context of your app. Any alphanumeric value under 64 characters can be an identifier, but it must be unique among all the inputs and parameters in that app.

{
  "id": "input1"
}

Most of the time, such a minimal definition is not helpful. At the very least, you would want some descriptive information, a restriction on the cardinality, and potentially a default value. This can be achieved with the details, semantics, and value objects. The full list of input attributes is shown in the following table. We cover each attribute in the corresponding section below.

Name Type Description
id string Required: The textual id of this input. This value must be unique among all inputs and parameters in an app description.
details JSON object  
details.argument string A command line argument or flag to be prepended before the input value.
details.description string Human-readable description of the input. Often used to create contextual help in automatically generated UI.
details.label string Human-readable label for the input. Often implemented as text label next to the field in automatically generated UI.
details.showArgument boolean Whether to include the argument value for this input when performing the template variable replacement during job submission. If true, the details.argument value will be prepended, without spaces, to the actual input value(s).
details.repeatArgument boolean When multiple values are provided for this input, this attribute determines whether to include the argument value before each user-supplied value when performing the template variable replacement during job submission. The details.showArgument value must be true for this value to be applied.
semantics JSON object Describes the semantic definition of this input and the file types it represents. Multiple ontologies and values are supported.
semantics.fileTypes JSON array Array of string values describing the file types represented by this input. The types correspond to values from the Transforms service. Use “raw-0” for the time being.
semantics.minCardinality integer Minimum number of values this input must have.
semantics.maxCardinality integer Maximum number of values this input can have. A null value or value of -1 indicates no limit.
semantics.ontology JSON array List of ontology terms (or URIs pointing to ontology terms) applicable to the input. We recommend at least specifying an XML Schema simple type.
value JSON object A description of the anticipated value and the situations when it is required.
value.default string, JSON array The default value for this input. This value is optional except when value.required is true and value.visible is false. Values may be absolute or relative paths on the user’s default storage system, a Tapis URI, or any valid URL with a supported schema.
value.order integer The order in which this input should appear when auto-generating a command line invocation.
value.required boolean Required: Is specification of this input mandatory to run a job?
value.validator string Perl-formatted regular expression to restrict valid values.
value.visible boolean When automatically generating a UI, should this field be visible to end users? If false, users will not be able to set this value in their job request.
value.enquote boolean Should the value be surrounded in quotation marks prior to being injected into the wrapper template at job runtime?

Input details section

The inputs.[].details object contains information specifying how to describe an input in different contexts. The description and label values provide human readable information appropriate for a tool tip and form label respectively. Neither of these attributes are required, however they dramatically improve the readability of your app description if you include them.

Often you will need to translate your input value into actual command line arguments. By default, Tapis will replace all occurrences of your attribute inputs.[].id in your wrapper script with the value of that attribute in your job description. That means you are responsible for inserting any command line flags or arguments into the wrapper script yourself. This is a straightforward process; however, when an input is optional, the resulting command line could be broken if the user does not specify an input value in their job request. One way to work around this is to add a conditional check to the variable assignment and exclude the command line flag or argument if it does not have a value set. Another is to use the inputs.[].details.argument attribute.

The inputs.[].details.argument value describes the command line argument that corresponds to this input, and the inputs.[].details.showArgument attribute specifies whether the inputs.[].details.argument value should be injected into the wrapper template in front of the actual runtime value. The following table illustrates the result of these attributes in different scenarios.

argument showArgument Input value from job request Value injected into wrapper template
  true /etc/motd /etc/motd
-f true /etc/motd -f/etc/motd
-f (trailing space) true /etc/motd -f /etc/motd
-f false /etc/motd /etc/motd
--filename true /etc/motd --filename/etc/motd
--filename= true /etc/motd --filename=/etc/motd
--filename false /etc/motd /etc/motd

Input semantics section

The inputs.[].semantics object contains semantic information about the input. The minCardinality attribute specifies the minimum number of data sources that can be specified for the input. This attribute is used to validate the value(s) provided for the input in a job request. The ontology attribute specifies a JSON array of URLs pointing to the ontology definitions of this file type. (We recommend at least specifying an XML Schema simple type.) Finally, the fileTypes attribute contains a JSON array of file type strings as specified in the Transforms service. (In most situations you will leave the fileTypes attribute null or specify “raw-0” as the single file type in the array.)

Input value section

The inputs.[].value object contains the information needed to validate user-supplied input values in a job request. The validator attribute accepts a Perl regular expression which will be applied to the input value(s). Any submissions that do not match the validator expression will be rejected.

Note: If inputs.[].semantics.maxCardinality is greater than 1, multiple values will be accepted for the input. These values may be provided as a semicolon-delimited list or a JSON array. The values may be relative paths on the user’s default storage system, or URLs. Whatever value(s) the user provides, the validator will be applied independently to each entire value, not just the file name.

The default attribute allows you to specify a default value for the input. This will be used in lieu of a user-supplied value if the input is required, but not visible. All default values must match the validator expression, if provided.

The required attribute specifies whether the input must be specified during a job submission.

The visible attribute takes a boolean value specifying whether the input should be accepted as a user-supplied value in a job request. If false, the value will be ignored at job submission and the default value will be used instead. Whenever visible is set to false, required must be true.

The order attribute is used to specify the order in which inputs should be listed in the response from the API and in command-line generation. By default, order is set to zero. Thus, providing a value greater than zero is sufficient to force any single input to be listed last.
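Putting these attributes together, a fuller input definition might look like the following. All values here are illustrative, including the ontology term and validator pattern:

```json
{
  "id": "input1",
  "details": {
    "label": "Input file",
    "description": "Text file to process",
    "argument": "-f ",
    "showArgument": true
  },
  "semantics": {
    "minCardinality": 1,
    "maxCardinality": 1,
    "fileTypes": [ "raw-0" ],
    "ontology": [ "xs:string" ]
  },
  "value": {
    "default": "picksumipsum.txt",
    "order": 0,
    "required": true,
    "validator": "\\.txt$",
    "visible": true
  }
}
```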

Validating inputs

The previous section covered the different ways you can instruct Tapis to validate and restrict the data inputs to your app. When a user submits a job request, the checks are applied in the following order.

  1. visible
  2. required
  3. minCardinality
  4. maxCardinality
  5. validator

Once an input passes these tests, Tapis will check that it exists and that the user has permission to access the data. Assuming everything passes, the input is accepted and scheduled for staging.

Parameters

The parameters attribute of your app description contains a JSON array of parameter objects. A parameter represents one or more arguments that your app will use at runtime. Those arguments can be more or less anything you want them to be. If, for some reason, your app handles data staging on its own and you do not want Tapis to move the data on your behalf, but you do need a data reference passed in, you can define it as a parameter rather than an input.

A minimal parameter object contains a single id attribute that uniquely identifies it within the context of your app and a value.type attribute specifying the primary type of the parameter. Any alphanumeric value under 64 characters can be an identifier, but it must be unique among all the inputs and parameters in that app. The parameter type is restricted to a handful of primary types listed in the table below.

{
  "id": "parameter1",
  "value": {
    "type": "string"
  }
}

In most situations you will want some descriptive information and validation of the user-supplied values for this parameter. As with your app inputs, app parameters have details, semantics, and value objects that allow you to do just that. The full list of parameter attributes is shown in the following table. We cover each attribute in the corresponding section below.

Name Type Description
id string Required: The textual id of this parameter. This value must be unique among all inputs and parameters in an app description.
details JSON object  
details.argument string A command line argument or flag to be prepended before the parameter value.
details.description string Human-readable description of the parameter. Often used to create contextual help in automatically generated UI.
details.label string Human-readable label for the parameter. Often implemented as text label next to the field in automatically generated UI.
details.showArgument boolean Whether to include the argument value for this parameter when performing the template variable replacement during job submission. If true, the details.argument value will be prepended, without spaces, to the actual parameter value(s).
details.repeatArgument boolean When multiple values are provided for this parameter, this attribute determines whether to include the argument value before each user-supplied value when performing the template variable replacement during job submission. The details.showArgument value must be true for this value to be applied.
semantics JSON object Describes the semantic definition of this parameter. Multiple ontologies and values are supported.
semantics.minCardinality integer Minimum number of values this parameter must have.
semantics.maxCardinality integer Maximum number of values this parameter can have. A null value or value of -1 indicates no limit.
semantics.ontology JSON array List of ontology terms (or URIs pointing to ontology terms) applicable to the parameter. We recommend at least specifying an XML Schema simple type.
value JSON object A description of the anticipated value and the situations when it is required.
value.default string, JSON array The default value for this parameter. This value can be left blank except when value.required is true and value.visible is false. If the value.type of this parameter is enumeration, this value must be one of the specified value.enumValues. If the value.type is bool or flag, only boolean values are accepted here.
value.enumValues JSON array An array of values specifying the possible values this parameter may have when value.type is enumeration. Both JSON Objects and strings are supported in the array. If a JSON Object is given, the object must be a single value attribute. The key will be the value passed into the wrapper template. The value will be the display value shown when auto-generating the option element in the select box representing this input.
value.order integer The order in which this parameter should appear when auto-generating a command line invocation.
value.required boolean Required: Is specification of this parameter mandatory to run a job?
value.type string, number, enumeration, bool, flag JSON type for this parameter (used to generate and validate UI).
value.validator string Perl-formatted regular expression to restrict valid values.
value.visible boolean When automatically generating a UI, should this field be visible to end users? If false, users will not be able to set this value in their job request.
value.enquote boolean Should the value be surrounded in quotation marks prior to being injected into the wrapper template at job runtime?

Parameter details section

The parameters.[].details object contains information specifying how to describe a parameter in different contexts and is identical to the inputs.[].details object.

Parameter semantics section

The parameters.[].semantics object contains semantic information about the parameter. Unlike the inputs.[].semantics object, it only has a single attribute, ontology. The ontology attribute specifies a JSON array of URLs pointing to the ontology definitions of this parameter type. (We recommend at least specifying an XML Schema simple type.)

Parameter value section

The parameters.[].value object contains the information needed to validate user-supplied parameter values in a job request. The type attribute defines the primary type of this parameter’s values. The available types are:

  • number: any real number.
  • string: any JSON-escaped alphanumeric string.
  • bool: true or false.
  • flag: true or false. Identical to boolean, but only the argument value will be inserted into the wrapper template.
  • enumeration: a JSON array of string values or JSON objects representing the acceptable values for this parameter. If an array of JSON objects is given, each object should have a single attribute, with the key being a desired enumeration value and the value being a human-readable, descriptive name for the enumerated value. The benefit of using objects over strings is that objects provide a way to create more descriptive user interfaces by customizing both the content and value of an HTML select box’s option elements. An example of both is given below.
[
  "red",
  "white",
  "green",
  "black"
]

[
  { "red": "Deep Cherry Red" },
  { "white": "Bright White" },
  { "green": "Black Forest Green" },
  { "black": "Brilliant Black Crystal Pearl" }
]

The validator attribute accepts a Perl regular expression which will be applied to the parameter value(s). Any submissions that do not match the validator expression will be rejected. This attribute is available to parameters of type number and string. It is not available to bool or flag parameter types, or to enumeration parameters, as they require the enumValues attribute instead.

The default attribute allows you to specify a default value for the parameter. This will be used in lieu of a user-supplied value if the parameter is required, but not visible. All default values must match the appropriate validator if type is number or string, or be one of the values in the enumValues array if type is enumeration.

The enumValues attribute is a JSON array of alphanumeric values specifying the acceptable values for this input. This attribute only exists for enumeration parameter types.

The required attribute specifies whether the parameter must be specified during a job submission.

The visible attribute takes a boolean value specifying whether the parameter should be accepted as a user-supplied value in a job request. If false, the value will be ignored at job submission and the default value will be used instead. Whenever visible is set to false, required must be true.

The order attribute is used to specify the order in which parameters should be listed in the response from the API and in command-line generation. By default, order is set to 0. Thus, providing a value greater than zero is sufficient to force any single parameter to be listed last.
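Pulling these attributes together, an illustrative enumeration parameter definition might look like the one below. The id, labels, argument flag, and enumerated values are all hypothetical:

```json
{
  "id": "parameter1",
  "details": {
    "label": "Color",
    "argument": "--color=",
    "showArgument": true
  },
  "semantics": {
    "ontology": [ "xs:string" ]
  },
  "value": {
    "type": "enumeration",
    "default": "red",
    "enumValues": [
      { "red": "Deep Cherry Red" },
      { "white": "Bright White" }
    ],
    "required": true,
    "visible": true
  }
}
```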

Validating parameters

The previous section covered the different ways you can tell Tapis to validate and restrict the parameters to your app. When a user submits a job request, the checks are applied in the following order.

  1. visible
  2. required
  3. type
  4. validator / enumValues

Wrapper Templates

In order to run your application, you will need to create a wrapper template that calls your executable code. The wrapper template is a simple script that Tapis will filter and execute to start your app. The filtering Tapis applies to your wrapper script is to inject runtime values from a job request into the script to replace the template variables representing the inputs and parameters of your app.

The order in which wrapper templates are processed in HPC and Condor apps is as follows.

  1. environment variables injected.
  2. startupScript run.
  3. Scheduler directives prepended to the wrapper template.
  4. additionalDirectives concatenated after the scheduler directives.
  5. Custom modules concatenated after the additionalDirectives.
  6. inputs and parameters template variables replaced with values from the job request.
  7. Blacklist commands, if present, are disabled in the scripts.
  8. Resulting script is written to the remote job execution folder and executed.

The order in which wrapper templates are processed in CLI apps is as follows.

  1. Shell environment sourced
  2. environment variables injected
  3. startupScript run
  4. Custom modules prepended to the top of the wrapper
  5. inputs and parameters template variables replaced with values from the job request
  6. Blacklist commands, if present, are disabled in the scripts.
  7. Resulting script is forked into the background immediately.

Environment

Comes from the system definition. Handle in your script if you cannot change the system definition to suit your needs. Ship whatever you need with your app’s assets.

Modules

See more about Modules and Lmod. Can be used to customize your environment, locate your application, and improve portability between systems. Tapis does not install or manage the module installation on a particular system, however it does know how to interact with it. Specifying the modules needed to run your app either in your wrapper template or in your system definition can greatly help you during the development process.

Default job macros

Tapis provides information about the job, system, and user as predefined macros you can use in your wrapper templates. The full list of runtime job macros is given in the following table.

Variable Description
AGAVE_JOB_APP_ID The appId for which the job was requested.
AGAVE_JOB_ARCHIVE Binary boolean value indicating whether the current job will be archived after the wrapper template exits.
AGAVE_JOB_ARCHIVE_SYSTEM The system to which the job will be archived after the wrapper template exits.
AGAVE_JOB_ARCHIVE_URL The fully qualified URL to the archive folder where the job output will be copied if archiving is enabled, or the URL of the output listing
AGAVE_JOB_ARCHIVE_PATH The path on the archiveSystem where the job output will be copied if archiving is enabled.
AGAVE_JOB_BATCH_QUEUE The batch queue on the AGAVE_JOB_EXECUTION_SYSTEM to which the job was submitted.
AGAVE_JOB_EXECUTION_SYSTEM The Tapis execution system id where this job is running.
AGAVE_JOB_ID The unique identifier of the job.
AGAVE_JOB_MEMORY_PER_NODE The amount of memory per node requested at submit time.
AGAVE_JOB_NAME The slugified version of the name of the job. See the section on Special Characters for more information about slugs.
AGAVE_JOB_NAME_RAW The name of the job as given at submit time.
AGAVE_JOB_NODE_COUNT The number of nodes requested at submit time.
AGAVE_JOB_OWNER The username of the job owner.
AGAVE_JOB_PROCESSORS_PER_NODE The number of cores requested at submit time.
AGAVE_JOB_SUBMIT_TIME The time at which the job was submitted in ISO-8601 format.
AGAVE_JOB_TENANT The id of the tenant to which the job was submitted.
AGAVE_JOB_CALLBACK_RUNNING Represents a call back to the API stating the job has started.
AGAVE_JOB_CALLBACK_CLEANING_UP Represents a call back to the API stating the job is cleaning up.
AGAVE_JOB_CALLBACK_ALIVE Represents a call back to the API stating the job is still alive. This will essentially update the timestamp on the job and add an entry to the job's history record.
AGAVE_JOB_CALLBACK_NOTIFICATION Represents a call back to the API telling it to forward a notification to the registered endpoint for that job. If no notification is registered, this will be ignored.
AGAVE_JOB_CALLBACK_FAILURE Represents a call back to the API stating the job failed. Use this with caution as it will tell the API the job failed even if it has not yet completed. Upon receiving this callback, Tapis will abandon the job and skip any archiving that may have been requested. Think of this as kill -9 for the job lifecycle.
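
For example, a wrapper template can log job metadata and report status using these macros. In a real job, Tapis substitutes the values before execution; the sketch below emulates that locally with placeholder values:

```shell
# Emulate Tapis macro substitution locally; in a real wrapper you write
# ${AGAVE_JOB_ID} etc. and Tapis fills the values in before execution.
AGAVE_JOB_ID="0001409867973952-example"   # placeholder value
AGAVE_JOB_NAME="demo-sort"                # placeholder value
echo "Starting job ${AGAVE_JOB_ID} (${AGAVE_JOB_NAME})"
# In a real job the callback macro expands to a call back to the Jobs
# API; it is unset here, so the :- guard makes this line a no-op.
${AGAVE_JOB_CALLBACK_RUNNING:-true}
```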

Input data

Tapis will stage the files and folders you specify as inputs to your app. These will be available in the top level of your job directory at runtime. Additionally, the names of each of the inputs will be injected into your wrapper template for you to use in your application logic. Please be aware that Tapis will not attempt to resolve namespace conflicts between your app inputs. That means that if a job specifies two inputs with the same name, one will overwrite the other during the input staging phase of the job and, though the variable names will be correctly injected to the wrapper script, your job will most likely fail due to missing data.

See the table below for fields that must be defined for an app’s inputs:

Field Mandatory Type Description
id X string This is the "name" of the file. You will use this in your wrapper script later whenever you need to refer to the BAM file being sorted
value.default string The path, relative to X, of the default value for the input
value.order integer Ignore for now
value.required X boolean Is specification of this input mandatory to run a job?
value.validator string Perl-format regular expression to restrict valid values
value.visible boolean When automatically generating a UI, should this field be visible to end users?
semantics.ontology array[string] List of ontology terms (or URIs pointing to ontology terms) applicable to the input format
semantics.minCardinality integer Minimum number of values accepted for this input
semantics.maxCardinality integer Maximum number of values accepted for this input
semantics.fileTypes X array[string] List of Tapis file types accepted. Always use "raw-0" for the time being
details.description string Human-readable description of the input. Often implemented as contextual help in automatically generated UI
details.label string Human-readable label for the input. Often implemented as text label next to the field in automatically generated UI
details.argument string The command-line argument associated with specifying this input at run time
details.showArgument boolean Include the argument in the substitution done by Tapis when a run script is generated

Variable injection

If you refer back to the app definition we used in the App Management Tutorial, you will see there are multiple inputs and parameters defined for that app. Each input and parameter object had an id attribute. That id value is the attribute name you use to associate runtime values with app inputs and parameters. When a job is submitted to Tapis, prior to physically running the wrapper template, all instances of that id are replaced with the actual value from the job request. The example below shows our app description, a job request, and the resulting wrapper template at run time.

Type declarations

During the job submission process, Tapis stores your inputs and parameters as serialized JSON. When variable injection occurs, Tapis replaces all occurrences of each input and parameter id with the value provided in the job request. For Tapis to properly identify your input and parameter ids, wrap them in curly braces and prepend a dollar sign. For example, if you have a parameter with id param1, you would include it in your wrapper script as ${param1}. Matching is case sensitive at all times.
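
A local stand-in for the substitution step, assuming a parameter with id param1 supplied as 0.05 in the job request (myprog is a hypothetical executable):

```shell
# Template line as registered, then the same line after Tapis replaces
# ${param1} with the job-request value (emulated here with sed).
template='myprog --threshold ${param1}'
echo "$template" | sed 's/\${param1}/0.05/'   # myprog --threshold 0.05
```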

Boolean values

Boolean values are passed in as truthy values: true is injected as 1, and false is injected as an empty string.
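
In a wrapper template this means a boolean parameter can simply be tested for non-emptiness; flagTrue and flagFalse below stand in for the injected values:

```shell
flagTrue="1"    # what Tapis injects for true
flagFalse=""    # what Tapis injects for false
for f in "$flagTrue" "$flagFalse"; do
  # A non-empty test distinguishes the two injected forms
  if [ -n "$f" ]; then echo "set"; else echo "unset"; fi
done
```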

Cardinality

Cardinality is not used in resolving wrapper template variables.

Parameter Flags

If your parameter was of type “flag”, Tapis will replace all occurrences of the template variable with the value you provided for the argument field.
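
For example, if a flag parameter nameSort has argument -n, the template variable resolves to -n when the flag is set and to nothing when it is not. The values below emulate the injected result:

```shell
nameSort="-n"   # what Tapis injects when the flag parameter is set
echo samtools sort ${nameSort} ex1.bam sorted   # samtools sort -n ex1.bam sorted
```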

App packaging

Tapis API apps have a generalized structure that allows them to carry dependencies around with them. In the case below, package-name-version.dot.dot is a folder that you build on your local system, then store in your Tapis Cloud Storage in a designated location (we recommend /home/username/applications/app_folder_name). It contains binaries, support scripts, test data, etc., all in one package. Tapis essentially uses a rough form of containerized applications (more on this later). We suggest you set your apps up to look something like the following:

Tapis runs a job by first transferring a copy of this directory into a temporary directory on the target executionSystem. Then, the input data files (we’ll show you how to specify those later) are staged into place automatically. Next, Tapis writes a scheduler submit script (using a template you provide, i.e. script.template) and puts it in the queue on the target system. The Tapis service then monitors progress of the job and, assuming it completes, copies all newly-created files to the location specified when the job was submitted. Along the way, critical milestones and metadata are recorded in the job’s history.

Tapis app development proceeds via the following steps:

  1. Build the application locally on the executionSystem
  2. Ensure that you are able to run it directly on the executionSystem
  3. Describe the application using a Tapis app description
  4. Create a shell template for running the app
  5. Upload the application directory to a storageSystem
  6. Post the app description to the Tapis apps service
  7. Debug your app by running jobs and updating the app until it works as intended
  8. (Optional) Share the app with some friends to let them test it

Application metadata

Field Mandatory Type Description
checkpointable X boolean Application supports checkpointing
defaultMemoryPerNode integer Default RAM (GB) to request per compute node
defaultProcessorsPerNode integer Default processor count to request per compute node
defaultMaxRunTime integer Default maximum run time (hours:minutes:seconds) to request per compute node
defaultNodeCount integer Default number of compute nodes per job
defaultQueue string On HPC systems, default batch queue for jobs
deploymentPath X string Path relative to homeDir on deploymentSystem where application bundle will reside
deploymentSystem X string The Tapis-registered STORAGE system upon which you have write permissions where the app bundle resides
executionSystem X string A Tapis-registered EXECUTION system upon which you have execute and app registration permissions where jobs will run
helpURI X string A URL pointing to help or description for the app you are deploying
label X string Human-readable title for the app
longDescription string A short paragraph describing the functionality of the app
modules array[string] Ordered list of modules on systems that use lmod or modules
name X string unique, URL-compatible (no special chars or spaces) name for the app
ontology X array[string] List of ontology terms (or URIs pointing to ontology terms) associated with the app
parallelism X string Is your application capable of using more than a single compute node? (SERIAL or PARALLEL)
shortDescription X string Brief description of the app
storageSystem X string The Tapis-registered STORAGE system upon which you have write permissions. Default source of and destination for data consumed and emitted by the app
tags array[string] List of human-readable tags for the app
templatePath X string Path to the shell template file, relative to deploymentPath
testPath X string Path to the shell test file, relative to deploymentPath
version X string Preferred format: Major.minor.point integer values for app

Warning: The combination of name and version must be unique across the entire iPlant API namespace.

Parameter metadata

Field Mandatory Type Description
id X string This is the "name" of the parameter. At runtime, it will be replaced in your script template based on the value passed as part of the job specification
value.default string The default value for this parameter
value.order integer Ignore for now. Supports automatic generation of command lines.
value.required boolean Is specification of this parameter mandatory to run a job?
value.type string JSON type for this parameter (used to generate and validate UI). Valid values: "string", "number", "enumeration", "bool", "flag"
value.validator string Perl-formatted regular expression to restrict valid values
value.visible boolean When automatically generating a UI, should this field be visible to end users?
semantics.ontology array[string] List of ontology terms (or URIs pointing to ontology terms) applicable to the parameter. We recommend at least specifying an XSL Schema Simple Type.
details.description string Human-readable description of the parameter. Often used to create contextual help in automatically generated UI
details.label string Human-readable label for the parameter. Often implemented as text label next to the field in automatically generated UI
details.argument string The command-line argument associated with specifying this parameter at run time
details.showArgument boolean Include the argument in the substitution done by Tapis when a run script is generated

Output metadata

Field Mandatory Type Description
id X string This is the "name" of the output. It is not currently used by the wrapper script but may be in the future
value.default string If your app has a fixed-name output, specify it here
value.order integer Ignore for now
value.required X boolean Is specification of this input mandatory to run a job?
value.validator string Perl-format regular expression used to match output files
value.visible boolean When automatically generating a UI, should this field be visible to end users?
semantics.ontology array[string] List of ontology terms (or URIs pointing to ontology terms) applicable to the output format
semantics.minCardinality integer Minimum number of values expected for this output
semantics.maxCardinality integer Maximum number of values expected for this output
semantics.fileTypes X array[string] List of Tapis file types that may apply to the output. Always use "raw-0" for the time being
details.description string Human-readable description of the output
details.label string Human-readable label for the output
details.argument string The command-line argument associated with specifying this output at run time (not currently used)
details.showArgument boolean Include the argument in the substitution done by Tapis when a run script is generated (not currently used)

 Note: If the app you are working on doesn’t natively produce output with a predictable name, one thing you can do is add extra logic to your script to take the existing output and rename it to something you can control or predict.
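
A minimal sketch of that renaming step (the file names are illustrative):

```shell
# Stand-in for a tool that writes a timestamped, unpredictable name,
# then rename its output to a fixed name downstream steps can rely on.
touch "result_$(date +%s).bam"
mv result_*.bam sorted.bam
ls sorted.bam
```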

Tools and Utilities

  1. Stumped for ontology terms to apply to your Tapis app inputs, outputs, and parameters? You can search EMBL-EBI for ontology terms, and BioPortal can provide links to EDAM.
  2. Need to validate JSON files? Try JSONlint or JSONparser
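
You can also sanity-check a description locally before POSTing it. This sketch assumes python3 is available and uses a stand-in file name:

```shell
# Write a trivial description fragment and check that it parses as JSON.
cat > app-check.json <<'EOF'
{ "name": "demo-app", "version": "1.0", "inputs": [], "parameters": [] }
EOF
python3 -m json.tool app-check.json > /dev/null && echo "valid JSON"
```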

Build a samtools application bundle

# Log into Stampede
ssh stampede2.tacc.utexas.edu

# Unload system's samtools module if it happens to be loaded by default
module unload samtools

# All TACC systems have a directory that can be accessed as $WORK
cd $WORK

# Set up a project directory
mkdir tacc_prod
mkdir tacc_prod/src
mkdir -p tacc_prod/samtools-0.1.19/stampede2/bin
mkdir -p tacc_prod/samtools-0.1.19/stampede2/test

# Build samtools using the Intel C Compiler
# If you don't have icc, gcc will work but icc usually gives more efficient binaries
cd tacc_prod/src
wget "http://downloads.sourceforge.net/project/samtools/samtools/0.1.19/samtools-0.1.19.tar.bz2"
tar -jxvf samtools-0.1.19.tar.bz2
cd samtools-0.1.19
make CC=icc CFLAGS='-xCORE-AVX2 -axCORE-AVX512,MIC-AVX512 -O3'

# Copy the samtools binary and support scripts to the project bin directory
cp -R samtools bcftools misc ../../samtools-0.1.19/stampede2/bin/
cd ../../samtools-0.1.19/stampede2

# Test that samtools will launch
bin/samtools

  Program: samtools (Tools for alignments in the SAM format)
  Version: 0.1.19-44428cd

  Usage:   samtools <command> [options]

  Command: view        SAM <-> BAM conversion
           sort        sort alignment file
           mpileup     multi-way pileup...

# Package up the bin directory as a compressed archive
# and remove the original. This preserves the execute bit
# and other permissions and consolidates movement of all
# bundled dependencies in bin to a single operation. You
# can adopt a similar approach with lib and include.
tar -czf bin.tgz bin && rm -rf bin

Run samtools sort locally

Your first objective is to create a script that you know will run to completion under the Stampede scheduler and environment (or whatever executionSystem you’re working on). It will serve as a model for the template file you create later. In our case, we need to write a script that can be submitted to the Slurm scheduler. The standard is to use Bash for such scripts. You have five main objectives in your script:

  • Unpack binaries from bin.tgz
  • Extend your PATH to contain bin
  • Craft some option-handling logic to accept parameters from Tapis
  • Craft a command line invocation of the application you will run
  • Clean up when you’re done

First, you will need some test data in your current directory (i.e., $WORK/tacc_prod/samtools-0.1.19/stampede2/). You can use this test file

tapis files download agave://tacc.work.taccusershared/iplantcollaborative/example_data/Samtools_mpileup/ex1.bam

or you can use any other BAM file for your testing purposes. If you use another file, make sure to change the filename in your test script accordingly!

Now, author your script. You can paste the following code into a file called test-sort.sh or you can copy it from here.

#!/bin/bash

# Tapis automatically writes these scheduler
# directives when you submit a job but we have to
# do it by hand when writing our test

#SBATCH -p development
#SBATCH -t 00:30:00
#SBATCH -n 16
#SBATCH -A tacc.prod
#SBATCH -J test-samtools
#SBATCH -o test-samtools.o%j

# Set up inputs and parameters
# We're emulating passing these in from Tapis
# inputBam is the name of the file to be sorted
inputBam="ex1.bam"
# outputPrefix is a parameter that establishes
# the prefix for the final sorted file
outputPrefix="sorted"
# Parameter for memory used in sort operation, in bytes
maxMemSort=500000000
# Boolean: Sort by name instead of coordinate
nameSort=0

# Unpack the bin.tgz file containing samtools binaries
# If you are relying entirely on system-supplied binaries
# you don't need this bit
tar -xvf bin.tgz
# Extend PATH to include binaries in bin
# If you need to extend lib, include, etc
# the same approach is applicable
export PATH=$PATH:"$PWD/bin"

# Dynamically construct a command line
# by building an ARGS string then
# adding the command, file specifications, etc
#
# We're doing this in a way familiar to Tapis V1 users
# first. Later, we'll illustrate how to make use of
# Tapis V2's new parameter passing functions
#
# Start with empty ARGS...
ARGS=""
# Add -m flag if maxMemSort was specified
# You might want to add a constraint for how large maxMemSort
# can be based on the available memory on your executionSystem
if [ ${maxMemSort} -gt 0 ]; then ARGS="${ARGS} -m $maxMemSort"; fi

# Boolean handler for the name sort (-n) option
if [ ${nameSort} -eq 1 ]; then ARGS="${ARGS} -n "; fi

# Run the actual program
samtools sort ${ARGS} ${inputBam} ${outputPrefix}

# Now, delete the bin/ directory
rm -rf bin

Submit the job to the queue on Stampede…

chmod 700 test-sort.sh
sbatch test-sort.sh

You can monitor your jobs in the queue using

showq -u your_tacc_username

Assuming all goes according to plan, you’ll end up with a sorted BAM called sorted.bam, and your bin directory (but not the bin.tgz file) should be erased. Congratulations, you’re in the home stretch: it’s time to turn the test script into a Tapis app.

Craft a Tapis app description

In order for Tapis to know how to run an instance of the application, we need to provide quite a bit of metadata about the application. This includes a unique name and version, the location of the application bundle, the identities of the execution system and destination system for results, whether it’s an HPC or other kind of job, the default number of processors and memory it needs to run, and of course, all the inputs and parameters for the actual program. It seems a bit over-complicated, but only because you’re comfortable with the command line already. Your goal here is to allow your applications to be portable across systems and present a web-enabled, rationalized interface for your code to consumers.

Rather than have you write a description for “samtools sort” from scratch, let’s systematically dissect an existing file provided with the SDK. Go ahead and copy the file into place and open it in your text editor of choice. If you don’t have the SDK installed, you can download the JSON descriptions here.

cd $WORK/tacc_prod/samtools-0.1.19/stampede2/
wget 'https://github.com/TACC-Cloud/agave-docs/blob/doc_changes/docs/agave/guides/apps/samtools-sort.json'

Open up samtools-sort.json in a text editor or in your web browser and follow along below.

Overview

Your file samtools-sort.json is written in JSON, and conforms to a Tapis-specific data model. We will dive into key elements here:

To make this file work for you, you will be, at a minimum, editing:

  1. Its executionSystem to match your private instance of Stampede.
  2. Its deploymentPath to match your iPlant applications path
  3. The name of the app to something besides “samtools-sort”. We recommend “$your_cyverse_username-samtools-sort”.

Instructions for making these changes will follow.

All Tapis application descriptions have the following structure:

{
  "application_metadata":"value",
  "inputs":[],
  "parameters":[],
  "outputs":[]
}

There is a defined list of application metadata fields, some of which are mandatory. Inputs, parameters, and outputs are specified as an array of simple data structures, which are described earlier in the Application metadata section.

Inputs

To tell Tapis what files to stage into place before job execution, you need to define the app’s inputs in a JSON array. To implement the SAMtools sort app, you need to tell Tapis that a BAM file is needed to act as the subject of our sort:

{
  "id":"inputBam",
  "value":{
    "default":"",
    "order":0,
    "required":true,
    "validator":"",
    "visible":true
  },
  "semantics":{
    "ontology":[
      "http://sswapmeet.sswap.info/mime/application/X-bam"
    ],
    "minCardinality":1,
    "fileTypes":[
      "raw-0"
    ]
  },
  "details":{
    "description":"",
    "label":"The BAM file to sort",
    "argument":null,
    "showArgument":false
  }
}

For information on what these fields mean, see the input metadata table.

Note on paths: In this CyVerse-oriented tutorial, we assume you will stage data to and from “data.iplantcollaborative.org”, the default storage system for CyVerse users. In this case, you can use paths relative to homeDir on that system (i.e. vaughn/analyses/foobar). To add portability, marshal data from other storageSystems, or import from public servers, you can also specify fully qualified URIs.

Parameters

Parameters are specified in a JSON array, and are broadly similar to inputs. Here’s an example of the parameter we will define allowing users to specify how much RAM to use in a “samtools sort” operation.

{
  "id":"maxMemSort",
  "value":{
    "default":"500000000",
    "order":1,
    "required":true,
    "type":"number",
    "validator":"",
    "visible":true
  },
  "semantics":{
    "ontology":[
      "xs:integer"
    ]
  },
  "details":{
    "description":null,
    "label":"Maximum memory in bytes, used for sorting",
    "argument":"-m",
    "showArgument":false
  }
}

For information on what these fields mean, see the parameters metadata table.

Outputs

While we don’t support outputs 100% yet, Tapis apps are designed to participate in workflows. Thus, just as we define the list of valid and required inputs to an app, we also must (when we know them) define a list of its outputs. This allows it to “advertise” to consumers of Tapis services what it expects to emit, allowing apps to be chained together. Note that unlike inputs and parameters, output “id”s are NOT passed to the template file. If you must specify an output filename in the application json, do it as a parameter! Outputs are defined basically the same way as inputs:

{
  "id":"bam",
  "value":{
    "default":"sorted.bam",
    "order":0,
    "required":false,
    "validator":"",
    "visible":true
  },
  "semantics":{
    "ontology":[
      "http://sswapmeet.sswap.info/mime/application/X-bam"
    ],
    "minCardinality":1,
    "fileTypes":[
      "raw-0"
    ]
  },
  "details":{
    "description":"",
    "label":"Sorted BAM file",
    "argument":null,
    "showArgument":false
  }
}

For more info on these fields, see Output metadata table.

Craft a shell script template

Create sort.template using your test-sort.sh script as the starting point.

cp test-sort.sh sort.template

Now, open sort.template in the text editor of your choice. Delete the bash shebang line and the SLURM pragmas. Replace the hard-coded values for inputs and parameters with variables defined by your app description.

# Set up inputs...
# Since we don't check these when constructing the
# command line later, these will be marked as required
inputBam=${inputBam}
# and parameters
outputPrefix=${outputPrefix}
# Maximum memory for sort, in bytes
# Be careful: neither Tapis nor the scheduler will
# check that this is a reasonable value. In production
# you might want to enforce min/max bounds for this value
maxMemSort=${maxMemSort}
# Boolean: Sort by name instead of coordinate
nameSort=${nameSort}

# Unpack the bin.tgz file containing samtools binaries
tar -xvf bin.tgz
# Set the PATH to include binaries in bin
export PATH=$PATH:"$PWD/bin"

# Build up an ARGS string for the program
# Start with empty ARGS...
ARGS=""
# Add -m flag if maxMemSort was specified
if [ ${maxMemSort} -gt 0 ]; then ARGS="${ARGS} -m $maxMemSort"; fi

# Boolean handler for the name sort (-n) option
if [ ${nameSort} -eq 1 ]; then ARGS="${ARGS} -n "; fi

# Run the actual program
samtools sort ${ARGS} $inputBam ${outputPrefix}

# Now, delete the bin/ directory
rm -rf bin

Note

An additional example of creating a custom app using the tapis-cli can be found at Create a custom App Example.

Permissions

Apps have fine-grained permissions similar to those found in the Jobs and Files services. Using these, you can share your app with other Tapis users. App permissions are private by default, so when you first POST your app to the Apps service, you are the only one who can see it. You may share your app with other users by granting them varying degrees of permissions. The full list of app permission values is given in the following table.

Permission Description
READ Gives the ability to view the app description.
WRITE Gives the ability to update the app.
EXECUTE Gives the ability to submit jobs using the app
ALL Gives full READ and WRITE and EXECUTE permissions to the user.
READ_WRITE Gives full READ and WRITE permissions to the user
READ_EXECUTE Gives full READ and EXECUTE permissions to the user
WRITE_EXECUTE Gives full WRITE and EXECUTE permissions to the user

App permissions are distinct from all other roles and permissions and do not have implications outside the Apps service. This means that if you want to allow someone to run a job using your app, it is not sufficient to grant them READ_EXECUTE permissions on your app. They must also have an appropriate user role on the execution system on which the app will run. Similarly, if you do not have the right to publish on the executionSystem or access the deploymentPath on the deploymentSystem in your app description, you will not be able to publish your app.

Listing permissions

App permissions are managed through a set of URLs consistent with the permission operations elsewhere in the API. To query for a user’s permission for an app, perform a GET on the user’s unique app permissions url.

You can use the following CLI command:

tapis apps pems show -v $APP_ID $USERNAME
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://agave.iplantc.org/apps/v2/$APP_ID/pems/$USERNAME?pretty=true

The response from the service will be a JSON object representing the user permission. If the user does not have a permission for that app, the permission value will be NONE. By default, only you have permission to your private apps. Public apps will return a single permission for the public meta user rather than a permission for every user.

Show json response
{
"username": "$USERNAME",
"permission": {
  "read": true,
  "write": true,
  "execute": true
},
"_links": {
  "self": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/$USERNAME"
  },
  "app": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID"
  },
  "profile": {
    "href": "https://agave.iplantc.org/profiles/v2/$USERNAME"
  }
}
}

You can also query for all permissions granted on a specific app by making a GET request on the app’s permission collection.

tapis apps pems list -v $APP_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true

This time the service will respond with a JSON array of permission objects.

Show json response
{
  "username": "$USERNAME",
  "permission": {
    "read": true,
    "write": true,
    "execute": true
  },
  "_links": {
    "self": {
      "href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/$USERNAME"
    },
    "app": {
      "href": "https://agave.iplantc.org/apps/v2/$APP_ID"
    },
    "profile": {
      "href": "https://agave.iplantc.org/profiles/v2/$USERNAME"
    }
  }
}

Adding and updating permissions

Setting permissions is done by posting a JSON object containing a permission and username. Alternatively, you can POST just the permission and append the username to the URL.

tapis apps pems grant -v $APP_ID bgibson READ
Show curl
# Standard syntax to grant permissions to a specific user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "username=bgibson&permission=READ" https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true

# Abbreviated POST data to grant permission to a single user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "permission=READ" https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson?pretty=true

The response will contain a JSON object representing the permission that was just created.
Show json response
{
"username": "bgibson",
"permission": {
  "read": true,
  "write": false,
  "execute": false
},
"_links": {
  "self": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson"
  },
  "app": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID"
  },
  "profile": {
    "href": "https://agave.iplantc.org/profiles/v2/bgibson"
  }
}
}

Deleting permissions

Permissions can be deleted on a user-by-user basis, or all at once. To delete an individual user permission, make a DELETE request on the user’s app permission URL.

tapis apps pems revoke -v $APP_ID $USERNAME
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson?pretty=true

Show response

The CLI response will be:

{
"username": "bgibson",
"permission": {
  "read": true,
  "write": false,
  "execute": false
},
"_links": {
  "self": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson"
  },
  "app": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID"
  },
  "profile": {
    "href": "https://agave.iplantc.org/profiles/v2/bgibson"
  }
}
}
Successfully removed permission for bgibson on app $APP_ID

And the cURL response will be an empty result object.

You can accomplish the same thing by updating the user permission to an empty value.

tapis apps pems grant -v $APP_ID $USERNAME $PERMISSION
Show curl
# Delete permission for a single user by updating with an empty permission value
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN"  \
    -X POST -d "username=bgibson" -d "permission=NONE" \
    https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true

# Delete permission for a single user by updating with an empty permission value
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST -d "permission=" \
    https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson?pretty=true

Since this is an update operation, the resulting JSON permission object will be returned showing the user has no permissions to the app anymore.

Show json response
{
"username": "bgibson",
"permission": {
  "read": false,
  "write": false,
  "execute": false
},
"_links": {
  "self": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson"
  },
  "app": {
    "href": "https://agave.iplantc.org/apps/v2/$APP_ID"
  },
  "profile": {
    "href": "https://agave.iplantc.org/profiles/v2/bgibson"
  }
}
}

To delete all permissions for an app, make a DELETE request on the app’s permissions collection.

tapis apps pems drop $APP_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true

The response will be an empty result object.

App Publishing

In addition to traditional permissions, apps also have a concept of scope. Unless otherwise configured, apps are private to the owner and the users to whom they grant permission. Applications can, however, move from the private space into the public space for use by anyone. Moving an app into the public space is called publishing. Publishing an app gives it much greater exposure and results in increased usage by the user community. It also comes with increased responsibilities for the original owner as well as the API administrators. Several of these are listed below:

  • Public apps must run on public systems. This makes the app available to everyone.
  • Public apps must be vetted for performance, reliability, and security by the API administrators.
  • The original app author must remain available via email for ongoing support.
  • Public apps must be copied into a public repository and checksummed.
  • Updates to public apps must result in a snapshot of the original app being created and stored with its resulting checksum in a separate location.
  • API administrators must maintain and support the app throughout its lifetime.
Note:
 If you have an app you would like to see published, please contact your API administrators for more information.

Publishing an app

To publish an app, make a PUT request on the app resource. In this example, we publish the wc-osg-1.00 app.

tapis apps publish -e condor.opensciencegrid.org wc-osg-1.00
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT \
    --data-binary '{"action":"publish","executionSystem":"condor.opensciencegrid.org"}' \
    https://agave.iplantc.org/apps/v2/wc-osg-1.00?pretty=true

The response from the service will resemble the following:
Show json response
{
"id": "wc-osg-1.00u1",
"name": "wc-osg",
"icon": null,
"uuid": "8734854070765284890-242ac116-0001-005",
"parallelism": "SERIAL",
"defaultProcessorsPerNode": 1,
"defaultMemoryPerNode": 1,
"defaultNodeCount": 1,
"defaultMaxRunTime": null,
"defaultQueue": null,
"version": "1.00",
"revision": 1,
"isPublic": true,
"helpURI": "http://www.gnu.org/s/coreutils/manual/html_node/wc-invocation.html",
"label": "wc condor",
"shortDescription": "Count words in a file",
"longDescription": "",
"tags": [
  "gnu",
  "textutils"
],
"ontology": [
  "http://sswapmeet.sswap.info/algorithms/wc"
],
"executionType": "CONDOR",
"executionSystem": "condor.opensciencegrid.org",
"deploymentPath": "/agave/apps/wc-1.00",
"deploymentSystem": "public.storage.agave",
"templatePath": "/wrapper.sh",
"testPath": "/wrapper.sh",
"checkpointable": true,
"lastModified": "2016-09-15T04:48:17.000-05:00",
"modules": [
  "load TACC",
  "purge"
],
"available": true,
"inputs": [
  {
    "id": "query1",
    "value": {
      "validator": "",
      "visible": true,
      "required": false,
      "order": 0,
      "enquote": false,
      "default": [
        "read1.fq"
      ]
    },
    "details": {
      "label": "File to count words in: ",
      "description": "",
      "argument": null,
      "showArgument": false,
      "repeatArgument": false
    },
    "semantics": {
      "minCardinality": 1,
      "maxCardinality": -1,
      "ontology": [
        "http://sswapmeet.sswap.info/util/TextDocument"
      ],
      "fileTypes": [
        "text-0"
      ]
    }
  }
],
"parameters": [],
"outputs": [
  {
    "id": "outputWC",
    "value": {
      "validator": "",
      "order": 0,
      "default": "wc_out.txt"
    },
    "details": {
      "label": "Text file",
      "description": "Results of WC"
    },
    "semantics": {
      "minCardinality": 1,
      "maxCardinality": 1,
      "ontology": [
        "http://sswapmeet.sswap.info/util/TextDocument"
      ],
      "fileTypes": []
    }
  }
],
"_links": {
  "self": {
    "href": "https://api.tacc.utexas.edu/apps/v2/wc-osg-1.00u1"
  },
  "executionSystem": {
    "href": "https://api.tacc.utexas.edu/systems/v2/condor.opensciencegrid.org"
  },
  "storageSystem": {
    "href": "https://api.tacc.utexas.edu/systems/v2/public.storage.agave"
  },
  "history": {
    "href": "https://api.tacc.utexas.edu/apps/v2/wc-osg-1.00u1/history"
  },
  "metadata": {
    "href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%228734854070765284890-242ac116-0001-005%22%7D"
  },
  "owner": {
    "href": "https://papi.tacc.utexas.edu/profiles/v2/nryan"
  },
  "permissions": {
    "href": "https://api.tacc.utexas.edu/apps/v2/wc-osg-1.00u1/pems"
  }
}
}

Notice a few things about the response.

  1. Both the executionSystem and deploymentSystem have changed. Public apps must run and store their assets on public systems.
  2. We did not specify the deploymentSystem where the public app assets should be stored, so Tapis placed them on the default public storage system, public.storage.agave.
  3. We did not specify the deploymentPath where the public app assets should be stored, so Tapis placed them in the publicAppsDir of the deploymentSystem.
  4. The deploymentPath is now a zip archive rather than a folder. Tapis does this because, once published, the app can no longer be updated, so the assets are frozen and stored in a separate location, removed from user access.
  5. The id of the app has changed. It now has a u1 appended to the original app id. This indicates that it is a public app and that it has been updated a single time. If we were to publish the app again, the resulting id would be wc-osg-1.00u2. This differs from unpublished apps whose revision number increments without impacting the app id. Every time you publish an app, the id of the resulting public app will change.
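The id scheme described in item 5 can be illustrated with a short sketch (the helper below is hypothetical, not part of the Tapis API):

```python
def public_app_id(name, version, publish_count):
    """Derive the id a public app receives after being published
    publish_count times, per the scheme described above."""
    return "{}-{}u{}".format(name, version, publish_count)

# First publication of wc-osg version 1.00:
print(public_app_id("wc-osg", "1.00", 1))  # wc-osg-1.00u1
# Publishing the same app a second time bumps the suffix:
print(public_app_id("wc-osg", "1.00", 2))  # wc-osg-1.00u2
```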

Disabling an App

Disabling a public app is the closest equivalent to unpublishing it.

Unlike systems, it is not possible to unpublish an app. Once published, a deep copy of the app is stored in an external location with its own provenance trail. If you would like to remove a published app from further use, simply disable it.

tapis apps disable -v $APP_ID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X PUT -d "action=disable" \
    https://agave.iplantc.org/apps/v2/$APP_ID?pretty=true

The response will look identical to the one above, but with available set to false.

Cloning an app

You will often want to copy an existing app for use on another system, or simply to obtain a private copy of the app for your own use. This can be done using the clone functionality in the Apps service. The following tabs show how to do this using the unix curl command as well as the Tapis CLI.

tapis apps clone -n my-pyplot-demo -x 2.2 demo-pyplot-demo-advanced-0.1.0
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X PUT 'https://agave.iplantc.org/apps/v2/$APP_ID?pretty=true' \
    --data-urlencode action=clone \
    --data-urlencode name=$NEW_APP_NAME \
    --data-urlencode version=0.1.2 \
    --data-urlencode deploymentSystem=$STORAGE_SYSTEM \
    --data-urlencode executionSystem=$EXECUTION_SYSTEM

Note:
 When cloning public apps, the entire app bundle will be recreated on the deploymentSystem you specify or your default storage system. The same is not true for private apps. Cloning a private app will copy the job description, but not the app bundle. This is to honor the original ownership of the assets and prevent them from leaking out to the public space without the owner’s permission. If you need direct access to the app’s assets, request that the owner give you read access to the folder listed as the deploymentPath in the app description.

Jobs

The Jobs service is a basic execution service that allows you to run applications registered with the Apps service across multiple, distributed, heterogeneous systems through a common REST interface. The service manages all aspects of execution and job management, including data staging, job submission, monitoring, output archiving, event logging, sharing, and notifications. The Jobs service also provides a persistent reference to your job’s output data and a mechanism for sharing all aspects of your job with others. Each feature is described in more detail in the following sections.

Aloe Jobs Service (now in production)

Version 2.4 of the Jobs service is now in production. This version, code-named Aloe, is the rearchitected Jobs service with improved reliability, scalability, performance and serviceability.

A new version of the Jobs service documentation is being developed. Until the unified documentation is ready, please see the old Tapis Jobs service documentation for a basic understanding of the interface and the Aloe documentation in the links below for the up-to-date details.

The following links discuss details of the new production Jobs service:

New Jobs Architecture

Changes

Migration Guide

Aloe Beta Tester Guide

Job submission

Job submission is a term recycled from shared batch computing environments where a user would submit a request for a unit of computational work (called a job) to the batch scheduler, then head home for dinner while waiting for the computer to complete the job they gave it.

Originally the batch scheduler was a person, and the term batch came from their ability to process several submissions together. Later on, as human schedulers were replaced by software, the term stuck even though the process remained unchanged.

Today the term job submission means essentially the same thing: a user submits a request for a unit of work to be done. The primary difference is that today the wait time between submission and execution is often considerably shorter. On shared systems, such as many of the HPC systems originally targeted by Tapis, waiting for your job to start is the price you pay for the incredible performance you get once your job starts.

Tapis, too, adopts the concept of job submission, though it is not in and of itself a scheduler. In the context of the Tapis Jobs service, the process of running an application registered with the Apps service is referred to as submitting a job.

Unlike in the batch scheduling world, where each scheduler has its own job submission syntax and its own idiosyncrasies, the mechanism for submitting a job to Tapis is consistent regardless of the application or system on which you run. An HTML form or JSON object is posted to the Jobs service. The submission is validated, and the job is forwarded to the scheduling and execution services for processing.

Because Tapis takes an app-centric view of science, execution does not require knowing about the underlying systems on which an application runs. Simply knowing the parameters and inputs you want to use when running an app is sufficient to define a job. Tapis will handle the rest.

As mentioned previously, jobs are submitted by making an HTTP POST request with either an HTML form or a JSON object to the Jobs service. All job submissions must include a few mandatory values that define a basic unit of work. Table 1 lists the optional and required attributes of all job submissions.

Name Value(s) Description
name string Descriptive name of the job. This will be slugified and used as one component of directory names in certain situations.
appId string The unique name of the application being run by this job. This must be a valid application that the calling user has permission to run.
batchQueue string The batch queue on the execution system to which this job is submitted. Defaults to the app's defaultQueue property if specified. Otherwise a best-fit algorithm is used to match the job parameters to a queue on the execution system with sufficient capabilities to run the job.
nodeCount integer The number of nodes to use when running this job. Defaults to the app's defaultNodes property or 1 if no default is specified.
processorsPerNode integer The number of processors this application should utilize while running. Defaults to the app's defaultProcessorsPerNode property or 1 if no default is specified. If the application is not of executionType PARALLEL, this should be 1.
memoryPerNode string The maximum amount of memory needed per node for this application to run given in ####.#[E|P|T|G]B format. Defaults to the app's defaultMemoryPerNode property if it exists. GB are assumed if no magnitude is specified.
maxRunTime string The estimated compute time needed for this application to complete given in hh:mm:ss format. This value must be less than or equal to the max run time of the queue to which this job is assigned.
notifications* JSON array An array of one or more JSON objects describing an event and url which the service will POST to when the given event occurs. For more on Notifications, see the section on webhooks below.
archive* boolean Whether the output from this job should be archived. If true, all new files created by this application's execution will be archived to the archivePath in the user's default storage system.
archiveSystem* string System to which the job output should be archived. Defaults to the user's default storage system if not specified.
archivePath* string Location where the job output should be archived. A relative path or absolute path may be specified. If not specified, a unique folder will be created in the user's home directory of the archiveSystem at 'archive/jobs/job-$JOB_ID'

Table 1. The optional and required attributes common to all job submissions. Optional fields are marked with an asterisk.

Note

In this tutorial we use JSON for our examples; however, you could replace the JSON object with an HTML form, mapping JSON attributes and values to form attributes and values one for one, and get the same results. The exception is the notifications attribute, which is not accepted in HTML form submissions; each notification object would instead need to be sent, along with the returned job id, to the Notifications API after the job request is submitted.

In addition to the standard fields for all jobs, the application you specify in the appId field will also have its own set of inputs and parameters, specified during registration, that are unique to that app. (For more information about app registration and descriptions, see the Apps section.)

The following snippet shows a sample JSON job request that could be submitted to the Jobs service to run the pyplot-0.1.0 app from the Advanced App Example tutorial.

Show JSON job request
{
 "name":"pyplot-demo test",
 "appId":"demo-pyplot-demo-advanced-0.1.0",
 "inputs":{
   "dataset":[
     "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
     "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata2.csv"
   ]
 },
 "archive":false,
 "parameters":{
   "unpackInputs":false,
   "chartType":[
     "bar",
     "line"
   ],
   "width":1024,
   "height":512,
   "background":"#d96727",
   "showYLabel":true,
   "ylabel":"The Y Axis Label",
   "showXLabel":true,
   "xlabel":"The X Axis Label",
   "showLegend":true,
   "separateCharts":false
 },
 "notifications":[
   {
     "url":"$API_EMAIL",
     "event":"RUNNING"
   },
   {
     "url":"$API_EMAIL",
     "event":"FINISHED"
   },
   {
     "url":"http://requestbin.agaveapi.co/o1aiawo1?job_id=${JOB_ID}&status=${JOB_STATUS}",
     "event":"*",
     "persistent":true
   }
 ]
}

Notice that this example specifies a single input attribute, dataset. The pyplot-0.1.0 app definition specified that the dataset input attribute could accept more than one value (maxCardinality = 2). In the job request object, that translates to an array of string values. Each string represents a piece of data that Tapis will transfer into the job work directory prior to job execution. Any value accepted by the Files service when importing data is accepted here. Some examples of valid values are given in the following table.

Name Description
inputs/pyplot/testdata.csv A relative path on the user's default storage system.
/home/apiuser/inputs/pyplot/testdata.csv An absolute path on the user's default storage system.
agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv A Tapis URL explicitly specifying a source system and relative path.
agave://$PUBLIC_STORAGE_SYSTEM//home/apiuser/$API_USERNAME/inputs/pyplot/testdata.csv A Tapis URL explicitly specifying a source system and absolute path.
http://example.com/inputs/pyplot/testdata.csv A standard URL with any supported transfer protocol.

Table 2. Examples of the different syntaxes in which input values can be specified in the job request object. Here we assume that the validator for the input field is such that these values would pass.

The example job request also specifies a parameters object with the parameters defined in the pyplot-0.1.0 app description. Notice that the parameter type specified in the app description is reflected here. Numbers are given as numbers, not strings. Boolean and flag attributes are given as boolean true and false values. As with the inputs section, there is also a parameter, chartType, that accepts multiple values. In this case that translates to an array of string values. Had the parameter type been another primitive type, that type would be used in the array instead.
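To make the typing rules concrete, here is a minimal sketch (Python, standard library only) of how typed parameter values should serialize into the JSON request body; the parameter names are taken from the pyplot example above:

```python
import json

# Parameter values keep their native JSON types: numbers stay numbers,
# booleans stay booleans, and multi-valued parameters become arrays.
parameters = {
    "width": 1024,                 # number parameter -> JSON number
    "showLegend": True,            # flag parameter   -> JSON boolean
    "chartType": ["bar", "line"],  # multi-valued     -> JSON array of strings
}

body = json.dumps({"parameters": parameters})
print(body)
```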

Finally, we see a notifications array specifying that we want Tapis to send three notifications related to this job. The first is a one-time email when the job starts running. The second is a one-time email when the job reaches a terminal state. The third is a webhook to the URL we specified. More on notifications in the section on monitoring below.

Job submission validation

To get a template for the Job submission JSON for a particular app, you can use the following CLI command:

$ jobs-template $APP_ID > job.json

You can submit the job with the following CLI command:

$ tapis jobs submit -F job.json
Show cURL
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "@job.json" -H "Content-Type: application/json" https://agave.iplantc.org/jobs/v2/?pretty=true
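The same submission can be scripted. Below is a minimal sketch using only the Python standard library; BASE_URL, ACCESS_TOKEN, and the abbreviated job description are placeholder values you would replace with your own:

```python
import json
import urllib.request

BASE_URL = "https://agave.iplantc.org/jobs/v2/"  # your tenant's API host
ACCESS_TOKEN = "my-access-token"                 # placeholder token

def build_submit_request(job):
    """Build the POST request that submits a JSON job description,
    mirroring the curl call above."""
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(job).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + ACCESS_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )

job = {"name": "pyplot-demo test",
       "appId": "demo-pyplot-demo-advanced-0.1.0",
       "archive": False}
req = build_submit_request(job)
# urllib.request.urlopen(req) would perform the actual submission.
print(req.get_method(), req.full_url)
```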

If everything went well, you will receive a response that looks something like the following JSON object.

Show response
{
 "status" : "success",
 "message" : null,
 "version" : "2.2.14-red7223e",
 "result" : {
   "id" : "$JOB_ID",
   "name" : "$USERNAME-$APP_ID",
   "owner" : "$USERNAME",
   "appId" : "$APP_ID",
   "executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
   "batchQueue" : "normal",
   "nodeCount" : 1,
   "processorsPerNode" : 16,
   "memoryPerNode" : 32.0,
   "maxRunTime" : "01:00:00",
   "archive" : false,
   "retries" : 0,
   "localId" : null,
   "created" : "2018-01-26T15:01:44.000-06:00",
   "lastModified" : "2018-01-26T15:01:45.000-06:00",
   "outputPath" : null,
   "status" : "PENDING",
   "submitTime" : "2018-01-26T15:01:44.000-06:00",
   "startTime" : null,
   "endTime" : null,
   "inputs" : {
     "inputBam" : [ "agave://data.iplantcollaborative.org/shared/iplantcollaborative/example_data/Samtools_mpileup/ex1.bam" ]
   },
   "parameters" : {
     "nameSort" : true,
     "maxMemSort" : 800000000
   },
   "_links" : {
     "self" : {
       "href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007"
     },
     "app" : {
       "href" : "https://agave.iplantc.org/apps/v2/$APP_ID"
     },
     "executionSystem" : {
       "href" : "https://agave.iplantc.org/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
     },
     "archiveSystem" : {
       "href" : "https://agave.iplantc.org/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
     },
     "archiveData" : {
       "href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007/outputs/listings"
     },
     "owner" : {
       "href" : "https://agave.iplantc.org/profiles/v2/$USERNAME"
     },
     "permissions" : {
       "href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007/pems"
     },
     "history" : {
       "href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007/history"
     },
     "metadata" : {
       "href" : "https://agave.iplantc.org/meta/v2/data/?q=%7B%22associationIds%22%3A%221674389564419740136-242ac113-0001-007%22%7D"
     },
     "notifications" : {
       "href" : "https://agave.iplantc.org/notifications/v2/?associatedUuid=1674389564419740136-242ac113-0001-007"
     },
     "notification" : [ ]
   }
 }
}

Job monitoring

Once you submit your job request, the job is handed off to Tapis’s backend execution service. Your job may run right away, or it may wait in a batch queue on the execution system until the required resources are available. Either way, execution occurs completely asynchronously from submission. To monitor the status of your job, Tapis supports two different mechanisms: polling and webhooks.

Note:
 For the sake of brevity, we placed a detailed explanation of the job lifecycle in a separate, aptly titled post, The Job Lifecycle. There you will find detailed information about how, when, and why everything moves from place to place, and how you can peek behind the curtains.

Polling

If you have ever taken a long road trip with children, you are probably painfully aware of how polling works. Starting several minutes from the time you leave the house, a child asks, “Are we there yet?” You reply, “No.” Several minutes later the child again asks, “Are we there yet?” You again reply, “No.” This process continues until you finally arrive at your destination. This is called polling, and polling is bad.

Polling for your job status works the same way. After submitting your job, you start a while loop that queries the Jobs service for your job status until it detects that the job is in a terminal state. The following three URLs all return the status of your job. The first returns a list of abbreviated job descriptions, the second returns a full description of the job with the given $JOB_ID, exactly like the one returned when submitting the job, and the third returns a much smaller response object containing only the $JOB_ID and status.

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/?pretty=true
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/$JOB_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/$JOB_ID/status

Show json response
{
"id" : "$JOB_ID",
"name" : "$USERNAME-$APP_ID",
"owner" : "$USERNAME",
"appId" : "$APP_ID",
"executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
"batchQueue": "normal",
"nodeCount": 1,
"processorsPerNode": 16,
"memoryPerNode": 32,
"maxRunTime": "01:00:00",
"archive": false,
"retries": 0,
"localId": "659413",
"created": "2018-01-26T15:08:02.000-06:00",
"lastUpdated": "2018-01-26T15:09:55.000-06:00",
"outputPath": "$USERNAME/$JOB_ID-$APP_ID",
"status": "FINISHED",
"submitTime": "2018-01-26T15:09:45.000-06:00",
"startTime": "2018-01-26T15:09:53.000-06:00",
"endTime": "2018-01-26T15:09:55.000-06:00",
"inputs": {
  "inputBam": [
    "agave://data.iplantcollaborative.org/shared/iplantcollaborative/example_data/Samtools_mpileup/ex1.bam"
  ]
},
"parameters": {
  "nameSort": true,
  "maxMemSort": 800000000
},
"_links": {
  "self": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
  },
  "app": {
    "href": "https://api.tacc.utexas.edu/apps/v2/$APP_ID"
  },
  "executionSystem": {
    "href": "https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
  },
  "archiveSystem": {
    "href": "https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
  },
  "archiveData": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/outputs/listings"
  },
  "owner": {
    "href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
  },
  "permissions": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems"
  },
  "history": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/history"
  },
  "metadata": {
    "href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%22462259152402771480-242ac113-0001-007%22%7D"
  },
  "notifications": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=$JOB_ID"
  }
}
}

The full list of possible job statuses and events is given in the following table.

Event Description
PENDING Job accepted and queued for submission.
PROCESSING_INPUTS Identifying input files for staging
STAGING_INPUTS Transferring job input data to execution system
STAGED Job inputs staged to execution system
STAGING_JOB Job inputs staged to execution system
SUBMITTING Preparing job for execution and staging binaries to execution system
QUEUED Job successfully placed into queue
RUNNING Job started running
PAUSED Job execution paused by user
CLEANING_UP Job completed execution
ARCHIVING Transferring job output to archive system
ARCHIVING_FINISHED Job archiving complete
ARCHIVING_FAILED Job archiving failed
FINISHED Job complete
KILLED Job execution killed at user request
STOPPED Job execution intentionally stopped
FAILED Job failed
HEARTBEAT Job heartbeat received
CREATED The job was created
UPDATED The job was updated
DELETED The job was deleted
PERMISSION_GRANT User permission was granted
PERMISSION_REVOKE Permission was removed for a user on this job

Table 2. Job statuses listed in progressive order from job submission to completion, followed by events that can occur at any point in the job lifecycle.
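When polling, the usual question is simply whether the job has reached a terminal state. A minimal helper based on our reading of the table above (this grouping is illustrative, not an official Tapis constant):

```python
# Statuses after which a job will no longer change state
# (our grouping, based on the table above).
TERMINAL_STATUSES = {"FINISHED", "FAILED", "KILLED", "STOPPED"}

def is_terminal(status):
    """Return True once a job can no longer change state."""
    return status in TERMINAL_STATUSES

print(is_terminal("RUNNING"))   # False
print(is_terminal("FINISHED"))  # True
```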

Polling is an incredibly simple approach, but it is bad practice for two reasons. First, it does not scale well. Querying for one job status every few seconds does not take much effort, but querying for 100 takes quite a bit of time and puts unnecessary load on Tapis’s servers. Second, polling provides what is effectively a binary response: it tells you whether a job is done or not done, but it does not give you any information on what is actually going on with the job or where it is in the overall execution process.

The job history URL provides much more detailed information on the various state changes, system messages, and progress information associated with data staging. The syntax of the job history URL is as follows:

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/$JOB_ID/history?pretty=true

Show json response
{
"status":"success",
"message":null,
"version":"2.1.0-r6d11c",
"result":[
  {
    "created":"2014-10-24T04:47:45.000-05:00",
    "status":"PENDING",
    "description":"Job accepted and queued for submission."
  },
  {
    "created":"2014-10-24T04:47:47.000-05:00",
    "status":"PROCESSING_INPUTS",
    "description":"Attempt 1 to stage job inputs"
  },
  {
    "created":"2014-10-24T04:47:47.000-05:00",
    "status":"PROCESSING_INPUTS",
    "description":"Identifying input files for staging"
  },
  {
    "created":"2014-10-24T04:47:48.000-05:00",
    "status":"STAGING_INPUTS",
    "description":"Staging agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv to remote job directory"
  },
  {
    "progress":{
      "averageRate":0,
      "totalFiles":1,
      "source":"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
      "totalActiveTransfers":0,
      "totalBytes":3212,
      "totalBytesTransferred":3212
    },
    "created":"2014-10-24T04:47:48.000-05:00",
    "status":"STAGING_INPUTS",
    "description":"Copy in progress"
  },
  {
    "created":"2014-10-24T04:47:50.000-05:00",
    "status":"STAGED",
    "description":"Job inputs staged to execution system"
  },
  {
    "created":"2014-10-24T04:47:55.000-05:00",
    "status":"SUBMITTING",
    "description":"Preparing job for submission."
  },
  {
    "created":"2014-10-24T04:47:55.000-05:00",
    "status":"SUBMITTING",
    "description":"Attempt 1 to submit job"
  },
  {
    "created":"2014-10-24T04:48:08.000-05:00",
    "status":"RUNNING",
    "description":"Job started running"
  },
  {
    "created":"2014-10-24T04:48:12.000-05:00",
    "status":"CLEANING_UP"
  },
  {
    "created":"2014-10-24T04:48:15.000-05:00",
    "status":"FINISHED",
    "description":"Job completed. Skipping archiving at user request."
  }
]
}

Depending on the nature of your job and the reliability of the underlying systems, the response from this service can grow rather large, so be aware that this can be an expensive call for your client application to make. Everything we said before about polling job status applies to polling job history, with the additional caveat that you can chew through quite a bit of bandwidth polling this service, so keep that in mind if your application is bandwidth starved.

Oftentimes, however, polling is unavoidable. In these situations, we recommend using an exponential backoff to check job status. An exponential backoff is an algorithm that increases the time between retries as the number of failures increases.
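Such a backoff can be sketched in Python. Here get_status is a hypothetical callable that returns the current job status string (e.g. by GETting /jobs/v2/$JOB_ID/status), and the sleep parameter exists so the delay can be stubbed out in tests:

```python
import time

# Statuses after which a job will no longer change state (our grouping).
TERMINAL = {"FINISHED", "FAILED", "KILLED", "STOPPED"}

def poll_with_backoff(get_status, sleep=time.sleep,
                      base=2.0, cap=300.0, max_tries=20):
    """Poll get_status() until the job reaches a terminal state,
    doubling the wait between checks (1s, 2s, 4s, ...) up to cap seconds."""
    delay = 1.0
    status = get_status()
    for _ in range(max_tries):
        if status in TERMINAL:
            break
        sleep(delay)
        delay = min(delay * base, cap)
        status = get_status()
    return status

# Simulated run: the job "finishes" on the fourth check.
fake = iter(["PENDING", "QUEUED", "RUNNING", "FINISHED"])
print(poll_with_backoff(lambda: next(fake), sleep=lambda s: None))  # FINISHED
```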

Webhooks

Webhooks are the alternative, preferred way for your application to monitor the status of asynchronous actions in Tapis. If you are a Gang of Four disciple, webhooks are a mechanism for implementing the Observer pattern. They are widely used across the web, and chances are that something you’re using right now leverages them. In the context of Tapis, a webhook is a URL that you give to Tapis in advance of an event; when that event later occurs, Tapis POSTs a response to the URL. A webhook can be any web-accessible URL.

Note:
 For more information about webhooks, events, and notifications in Tapis, please see the Notifications and Events Guides.

The Jobs service provides several template variables for constructing dynamic URLs. Template variables can be included anywhere in your URL by surrounding the variable name in the following manner: ${VARIABLE_NAME}. When an event of interest occurs, the variables are resolved and the resulting URL is called.

The full list of template variables is given in the following table.

Variable Description
UUID The UUID of the job
EVENT The event which occurred
JOB_STATUS The status of the job at the time the event occurs
JOB_URL The url of the job within the API
JOB_ID The unique id used to reference the job within Tapis.
JOB_SYSTEM ID of the job execution system (ex. ssh.execute.example.com)
JOB_NAME The user-supplied name of the job
JOB_START_TIME The time when the job started running in ISO8601 format.
JOB_END_TIME The time when the job stopped running in ISO8601 format.
JOB_SUBMIT_TIME The time when the job was submitted to Tapis for execution by the user in ISO8601 format.
JOB_ARCHIVE_PATH The path on the archive system where the job output will be staged.
JOB_ARCHIVE_URL The Tapis URL for the archived data.
JOB_ERROR The error message explaining why a job failed. Null if completed successfully.

Table 3. Template variables available for use when defining webhooks for your job.
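Python’s string.Template uses the same ${NAME} placeholder syntax, so the resolution step can be illustrated (illustrative only — this is not how Tapis itself is implemented, and the values below are made up):

```python
from string import Template

# A webhook URL with two template variables, as in the job request example.
url = Template("http://requestbin.agaveapi.co/o1aiawo1"
               "?job_id=${JOB_ID}&status=${JOB_STATUS}")

# When the event fires, the variables are resolved and the URL is called.
resolved = url.safe_substitute(JOB_ID="0001414144065563-5056a550b8-0001-007",
                               JOB_STATUS="FINISHED")
print(resolved)
```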

Email

In situations where you do not have a persistent web address, or access to a backend service, you may find it more convenient to subscribe to email notifications rather than providing a webhook. Tapis supports email notifications as well. Simply specify a valid email address in the url field of your job submission notification object, and an email will be sent to that address when a relevant event occurs.

Stopping

Once your job is submitted, you have the ability to stop the job. This will kill the job on the system on which it is running.

You can kill a job with the following CLI command:

tapis jobs cancel $JOB_UUID
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "action=kill" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID

Show json response
{
"id" : "$JOB_ID",
"name" : "demo-pyplot-demo-advanced test-1414139896",
"owner" : "$API_USERNAME",
"appId" : "demo-pyplot-demo-advanced-0.1.0",
"executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
"batchQueue" : "debug",
"nodeCount" : 1,
"processorsPerNode" : 1,
"memoryPerNode" : 1.0,
"maxRunTime" : "01:00:00",
"archive" : false,
"retries" : 0,
"localId" : "10321",
"outputPath" : null,
"status" : "STOPPED",
"submitTime" : "2014-10-24T04:48:11.000-05:00",
"startTime" : "2014-10-24T04:48:08.000-05:00",
"endTime" : null,
"inputs" : {
  "dataset" : "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
},
"parameters" : {
  "chartType" : "bar",
  "height" : "512",
  "showLegend" : "false",
  "xlabel" : "Time",
  "background" : "#FFF",
  "width" : "1024",
  "showXLabel" : "true",
  "separateCharts" : "false",
  "unpackInputs" : "false",
  "ylabel" : "Magnitude",
  "showYLabel" : "true"
},
"_links" : {
  "self" : {
    "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
  },
  "app" : {
    "href" : "https://api.tacc.utexas.edu/apps/v2/demo-pyplot-demo-advanced-0.1.0"
  },
  "executionSystem" : {
    "href" : "https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
  },
  "archiveData" : {
    "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
  },
  "owner" : {
    "href" : "https://api.tacc.utexas.edu/profiles/v2/$API_USERNAME"
  },
  "permissions" : {
    "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
  },
  "history" : {
    "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
  },
  "metadata" : {
    "href" : "https://api.tacc.utexas.edu/meta/v2/data/?q={\"associationIds\":\"0001414144065563-5056a550b8-0001-007\"}"
  },
  "notifications" : {
    "href" : "https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
  }
}
}

Deleting a job

Over time, the number of jobs you have run can grow rather large. You can delete a job to remove it from your listing results:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://api.tacc.utexas.edu/jobs/v2/$JOB_ID

warning:Deleting a job will hide it from view, not permanently delete the record.

Resubmitting a job

You will often want to rerun a previous job as part of a pipeline, an automation, or a validation of earlier results. In these situations, it is convenient to use the resubmit feature of the Jobs service.

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "action=resubmit" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID

Resubmission provides you the options to enforce as much or as little rigor as you desire with respect to reproducibility in the job submission process. The following options are available to you for configuring a resubmission according to your requirements.

Field Type Description
ignoreInputConflicts boolean Whether to ignore discrepancies in the previous app inputs for the resubmitted job. If true, the resubmitted job will make a best-fit attempt at migrating the inputs.
ignoreParameterConflicts boolean Whether to ignore discrepancies in the previous app parameters for the resubmitted job. If true, the resubmitted job will make a best-fit attempt at migrating the parameters.
preserveNotifications boolean Whether to recreate the notifications of the original job for the resubmitted job.
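A resubmission request can be sketched as the "action=resubmit" POST shown above with these options included. Note that the helper below, and the assumption that the option fields travel alongside the action field in the request body, are for illustration only:

```python
import json

def build_resubmit_request(ignore_input_conflicts=False,
                           ignore_parameter_conflicts=False,
                           preserve_notifications=False):
    """Assemble a resubmission request body using the option names
    from the table above (hypothetical helper)."""
    return {
        "action": "resubmit",
        "ignoreInputConflicts": ignore_input_conflicts,
        "ignoreParameterConflicts": ignore_parameter_conflicts,
        "preserveNotifications": preserve_notifications,
    }

# Serialize for a POST with Content-Type: application/json
payload = json.dumps(build_resubmit_request(preserve_notifications=True))
```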

Outputs

Throughout the lifecycle of a job, your inputs, application assets, and outputs are copied from and shuffled between several different locations. Though it is possible in many instances to explicitly locate and view all the moving pieces of your job through the Files service, resolving where those pieces are given the status, execution system, storage systems, data protocols, login protocols, and execution mechanisms of your job at a given time is…challenging. It is important, however, that you have the ability to monitor your job’s output throughout the lifetime of the job.

To make tracking the output of a specific job easier, the Jobs service provides a special URL for referencing individual job outputs:

Show curl
curl -sk -H "Authorization: Bearer  $ACCESS_TOKEN" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/outputs/listings/?pretty=true

The syntax of this service is consistent with the Files service syntax, as is the JSON response from the service. The response would be similar to the following:

Show json response
{
"status" : "success",
"message" : null,
"version" : "2.1.0-r6d11c",
"result" : [ {
  "name" : "output",
  "path" : "/output",
  "lastModified" : "2014-11-06T13:34:35.000-06:00",
  "length" : 0,
  "permission" : "NONE",
  "mimeType" : "text/directory",
  "format" : "folder",
  "type" : "dir",
  "_links" : {
    "self" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/output"
    },
    "system" : {
      "href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
    },
    "parent" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
    }
  }
}, {
  "name" : "demo-pyplot-demo-advanced-test-1414139896.err",
  "path" : "/demo-pyplot-demo-advanced-test-1414139896.err",
  "lastModified" : "2014-11-06T13:34:27.000-06:00",
  "length" : 442,
  "permission" : "NONE",
  "mimeType" : "application/octet-stream",
  "format" : "unknown",
  "type" : "file",
  "_links" : {
    "self" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.err"
    },
    "system" : {
      "href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
    },
    "parent" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
    }
  }
}, {
  "name" : "demo-pyplot-demo-advanced-test-1414139896.out",
  "path" : "/demo-pyplot-demo-advanced-test-1414139896.out",
  "lastModified" : "2014-11-06T13:34:30.000-06:00",
  "length" : 1396,
  "permission" : "NONE",
  "mimeType" : "application/octet-stream",
  "format" : "unknown",
  "type" : "file",
  "_links" : {
    "self" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.out"
    },
    "system" : {
      "href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
    },
    "parent" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
    }
  }
}, {
  "name" : "demo-pyplot-demo-advanced-test-1414139896.pid",
  "path" : "/demo-pyplot-demo-advanced-test-1414139896.pid",
  "lastModified" : "2014-11-06T13:34:33.000-06:00",
  "length" : 6,
  "permission" : "NONE",
  "mimeType" : "application/octet-stream",
  "format" : "unknown",
  "type" : "file",
  "_links" : {
    "self" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.pid"
    },
    "system" : {
      "href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
    },
    "parent" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
    }
  }
}, {
  "name" : "testdata.csv",
  "path" : "/testdata.csv",
  "lastModified" : "2014-11-06T13:34:42.000-06:00",
  "length" : 3212,
  "permission" : "NONE",
  "mimeType" : "application/octet-stream",
  "format" : "unknown",
  "type" : "file",
  "_links" : {
    "self" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/testdata.csv"
    },
    "system" : {
      "href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
    },
    "parent" : {
      "href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
    }
  }
} ]
}

To download a file, you would use the following syntax:

Show curl
curl -sk -H "Authorization: Bearer  $ACCESS_TOKEN" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/outputs/media/$PATH

information_source:
 The Jobs output service follows the same conventions as the Files service. Thus, you may specify a Range header to retrieve a specific byte range. This is particularly helpful when tracking job progress, since it gives you a mechanism to tail the output and error log files.
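Tailing a log can be sketched as repeatedly requesting only the bytes not yet seen. The helper below builds a standard HTTP Range header; the polling loop using the `requests` library is shown as comments because it needs a live job, URL, and token (all hypothetical names here):

```python
def range_header(offset):
    """Build a Range header requesting every byte from `offset`
    to the end of the file."""
    return {"Range": "bytes=%d-" % offset}

# Example polling loop (assumes `requests`, a valid access_token,
# output_url, and a job_is_running condition -- all hypothetical):
#   import requests
#   offset = 0
#   while job_is_running:
#       headers = {"Authorization": "Bearer " + access_token}
#       headers.update(range_header(offset))
#       resp = requests.get(output_url, headers=headers)
#       offset += len(resp.content)   # advance past what we just read
```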

Regardless of job status, the above services will always point to the most recent location of the job data. If you choose for the Jobs service to archive your job after completion, the URL will point to the archive folder of the job. If you do not choose to archive your data, or if archiving fails, the URL will point to the execution folder created for your job at runtime. Because Tapis does not own any of the underlying hardware, it cannot guarantee that those locations will always exist. If, for example, the execution system enforces a purge policy, the output data may be deleted by the system administrators. Tapis will let you know if the data is no longer present, however, it cannot prevent it from being deleted. This is another reason that it is important to archive data you feel will be needed in the future.

Job Lifecycle Management

Tapis handles all of the end-to-end details involved with managing a job lifecycle for you. This can seem like black magic at times, so here we detail the overall lifecycle process every job goes through.

  1. Job request is made, validated, and saved.
  2. Job is queued up for execution. The job stays in a pending state until there are resources to run it. This means that the target execution system is online, the storage system with the app assets is online, and neither the user nor the system is over quota. a) If resources do not become available within 7 days, the job is killed. b) If resources are available, the job moves on.
  3. When resources are available to run the job on the execution system, a work directory is created on the execution system. The job work directory is created based on the following logic:
if (executionSystem.scratchDir exists)
then
    $jobDir = executionSystem.scratchDir
else if (executionSystem.workDir exists)
then
    $jobDir = executionSystem.workDir
else
    $jobDir = executionSystem.storage.homeDir
endif

$jobDir = $jobDir + "/" + job.owner + "/job-" + job.uuid
  4. The job inputs are staged to the job work directory, and the job status is updated to "INPUTS_STAGING". a) All inputs succeed and the job is updated to "STAGED". b) One or more inputs fail to transfer. The job status is set back to "PENDING" and staging will be attempted up to 2 more times. c) The user does not have permission to access one or more inputs. The job is set to "FAILED" and exits.
  5. The job again waits until resources are available to run it. Usually this is immediately after the inputs finish staging. a) If resources do not become available within 7 days, the job is killed. b) If resources are available, the job moves on.
  6. The app deploymentPath is copied from app.deploymentSystem to a temp directory on the API server. The Jobs API then processes the app.deploymentDir + "/" + app.templatePath file to create the .ipcexe file. The process goes as follows:
    1. Script headers are written. This includes scheduler directives for a batch app, or a shebang line for a forked app.
    2. Additional executionSystem[job.batchQueue].customDirectives are written
    3. "RUNNING" callback written
    4. Module commands are written
    5. executionSystem.environment is written
    6. wrapper script is filtered
      1. blacklisted commands are removed
      2. app parameter template variables are resolved against job parameter values.
      3. app input template variables are resolved against job input values
      4. blacklisted commands are removed again
    7. "CLEANING_UP" callback written
    8. All template macros are resolved.
    9. job.name.slugify + ".ipcexe" file written to temp directory
  7. App assets, including the wrapper template, are copied to the remote job work directory.
  8. A directory listing of the job work directory is written to a .agave.archive manifest file in the remote job work directory.
  9. A command line is generated to invoke the *.ipcexe file by the appropriate method for the execution system.
  10. The command line is run on the remote system. a) The command succeeds, and the scheduler/process/job id is captured and stored with the job record. b) The command fails; the job is returned to "STAGED" status and the command is retried up to 2 more times.
  11. Job is updated to "QUEUED".
  12. Job waits for a "RUNNING" callback, and adds a background process to monitor the job in case the callback never comes.
  13. The monitoring process checks the job status according to the following schedule:
* every 30 seconds for the first 5 minutes
* every minute for the next 30 minutes
* every 5 minutes for the next hour
* every 15 minutes for the next 12 hours
* every 30 minutes for the next 24 hours
* every hour for the next 14 days
  14. Job either calls back with a "CLEANING_UP" status update, or the monitoring process discovers the job no longer exists on the remote system.

  15. If job.archive is true, the job is sent to the archiving queue to stage outputs to job.archiveSystem.
    1. If resources do not become available within 7 days, the job is killed.
    2. When resources are available, the job moves on:
      1. Read the .agave.archive manifest file from the job work directory
      2. Begin a breadth first directory traversal of the job work directory
      3. If a file/folder is not in the .agave.archive manifest, copy it to the job.archivePath on the job.archiveSystem
      4. Delete the job work directory
  16. Update job status to "FINISHED"
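The work-directory resolution logic early in the lifecycle can be sketched directly from the pseudocode. Arguments are plain strings for illustration (pass None for a directory the execution system does not define); the parameter names are not the actual Tapis field names:

```python
def job_work_directory(scratch_dir, work_dir, home_dir, job_owner, job_uuid):
    """Resolve the job work directory: prefer the execution system's
    scratchDir, fall back to workDir, then to the storage homeDir."""
    base = scratch_dir or work_dir or home_dir
    return "%s/%s/job-%s" % (base, job_owner, job_uuid)

# e.g. a system defining a scratch directory:
job_work_directory("/scratch", "/work", "/home/nryan", "nryan", "0001-007")
```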
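The status-check backoff schedule listed above can be modeled as a lookup from elapsed monitoring time to polling interval. This is an illustrative model, not the Tapis source; boundaries are treated as cumulative:

```python
def check_interval(elapsed_seconds):
    """Return the seconds to wait before the next status check, given
    time since monitoring began; None once the schedule is exhausted."""
    MIN, HOUR, DAY = 60, 3600, 86400
    schedule = [
        (5 * MIN, 30),                            # first 5 minutes: every 30s
        (35 * MIN, MIN),                          # next 30 minutes: every minute
        (95 * MIN, 5 * MIN),                      # next hour: every 5 minutes
        (95 * MIN + 12 * HOUR, 15 * MIN),         # next 12 hours: every 15 minutes
        (95 * MIN + 36 * HOUR, 30 * MIN),         # next 24 hours: every 30 minutes
        (95 * MIN + 36 * HOUR + 14 * DAY, HOUR),  # next 14 days: hourly
    ]
    for limit, interval in schedule:
        if elapsed_seconds < limit:
            return interval
    return None
```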

Jobs Permissions and Sharing

As with the Systems, Apps, and Files services, your jobs have their own set of access controls. Using these, you can share your job and its data with other Tapis users. Job permissions are private by default. The permissions you give a job apply to the job, its outputs, its metadata, and the permissions themselves. Thus, by sharing a job with another user, you share all aspects of that job.

Job permissions are managed through a set of URLs consistent with the permissions URL elsewhere in the API.

Granting

Granting permissions is simply a matter of issuing a POST with the desired permission object to the job’s pems collection.

tapis jobs pems grant $JOB_UUID $USERNAME $PERMISSION
Show curl
# General grant
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST --data-binary '{"permission":"READ","username":"$USERNAME"}' \
    https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems

# Custom url grant
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST --data-binary '{"permission":"READ"}' \
    https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME

Show json response
{
"username": "$USERNAME",
"internalUsername": null,
"permission": {
  "read": true,
  "write": false
},
"_links": {
  "self": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME"
  },
  "parent": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
  },
  "profile": {
    "href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
  }
}
}

The available permission values are listed in Table 2.

Permission Description
READ Gives the ability to view the job status, and output data.
WRITE Gives the ability to perform actions, manage metadata, and set permissions.
ALL Gives full READ and WRITE permissions to the user.
READ_WRITE Synonymous with ALL. Gives full READ and WRITE permissions to the user.

Table 2. Supported job permission values.

Job permissions are distinct from file permissions. In many instances, your job output will be accessible via the Files and Jobs services simultaneously. Granting a user permissions to a job output file through the Files services does not alter the accessibility of that file through the Jobs service. It is important, then, that you consider to whom you grant permissions, and the implications of that decision in all areas of your application.

Listing

To find the permissions for a given job, make a GET on the job’s pems collection. Here we see that both the job owner and the user we just granted permission to appear in the response.

tapis jobs pems list -V $JOB_UUID
Show curl
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
  'https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/'

Show json response
[
{
  "username": "$API_USERNAME",
  "internalUsername": null,
  "permission": {
    "read": true,
    "write": true
  },
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007/pems/$API_USERNAME"
    },
    "parent": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/$API_USERNAME"
    }
  }
},
{
  "username": "$USERNAME",
  "internalUsername": null,
  "permission": {
    "read": true,
    "write": false
  },
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME"
    },
    "parent": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
    }
  }
}
]

Updating

Updating is exactly like granting permissions. Just POST to the same job’s pems collection.

tapis jobs pems grant $JOB_UUID $USERNAME $PERMISSION
Show curl
curl -sk -H "Authorization: Bearer  $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST --data-binary '{"permission":"READ_WRITE"}' \
    https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME

Show json response
{
"username": "$USERNAME",
"internalUsername": null,
"permission": {
  "read": true,
  "write": true
},
"_links": {
  "self": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME"
  },
  "parent": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
  },
  "profile": {
    "href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
  }
}
}

Deleting

To delete a permission, you can issue a DELETE request on the user permission resource we’ve been using, or update with an empty permission value.

tapis jobs pems revoke $JOB_UUID $USERNAME
Show curl
curl -sk -H "Authorization: Bearer  $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME

Notifications

Under the covers, the Tapis API is an event-driven distributed system implemented on top of a reliable, cloud-based messaging system. This means that every action either observed or taken by Tapis is tied to an event. The changing of a job from one status to another is an event. The granting of permissions on a file is an event. Editing a piece of metadata is an event, and to be sure, the moment you created an account with Tapis was an event. You get the idea.

Having such a fine-grain event system is helpful for the same reason that having a fine-grain permission model is helpful. It affords you the highest degree of flexibility and control possible to achieve the behavior you desire. With Tapis’s event system, you have the ability to alert your users (or yourself) the instant something occurs. You can be proactive rather than reactive, and you can begin orchestrating your complex tasks in a loosely coupled, asynchronous way.

Subscriptions

As consumers of Tapis, you have the ability to subscribe to events occurring on any resource to which you have access. By that we mean, for example, you could subscribe to events on your job and a job that someone shared with you, but you could not subscribe to events on a job submitted by someone else who has not shared the job with you. Basically, if you can see a resource, you can subscribe to its events.

The Notifications service is the primary mechanism by which you create and manage your event subscriptions. A typical use case is a user subscribing for an email alert when her job completes. The following JSON object represents a request for such a notification.

Example notification subscription request
{
"associatedUuid": "0001409758089943-5056a550b8-0001-002",
"event": "OVERWRITTEN",
"persistent": true,
"url": "nryan@rangers.mlb.com"
}

The associatedUuid value identifies the resource being watched. Here, we give the UUID of the picsumipsum.txt file we uploaded in the Files Guide. The event value is the name of the event about which she wants to be notified. This example is asking for an email to be sent whenever the file is overwritten. She could have just as easily specified an event of DELETED or RENAME to be notified when the file was deleted or renamed.

The persistent value specifies whether the notification should fire more than once. By default, all event subscriptions are transient. This is because the events themselves are transient. An event occurs, then it is over. There are, however, many situations where events could occur over and over again. Permission events, changes to metadata and data, application registrations on a system, job submissions to a system or queue, etc., all are transient events that can potentially occur many, many times. In these cases it is either not possible or highly undesirable to constantly resubscribe for the same event. The persistent attribute tells the notification service to keep a subscription alive until it is explicitly deleted.

information_source:
 In certain situations you may wish to subscribe to multiple events. You are free to add as many subscriptions as you wish; however, if you want to subscribe to all possible events for a given resource, use the wildcard value, *, as the event. This tells the Notifications service that you want to be notified of every event for that resource.
information_source:
 A listing of all Tapis’s resource-level events, grouped by resource, can be found in the Events section.

Continuing to work through the example, the url value specifies where the notification should be sent. In this example, our example user specified that she would like to be notified via email. Tapis supports both email and webhook notifications. If you are unfamiliar with webhooks, take a moment to glance at the webhooks.org page for a brief overview. If you are a Gang of Four disciple, webhooks are a mechanism for implementing the Observer Pattern. Webhooks are widely used across the web and chances are that something you’re using right now is leveraging them.

URL Macros

In the context of Tapis, a webhook is a URL to which Tapis will send a POST request when that event occurs. A webhook can be any web-accessible URL. While you cannot customize the POST content that Tapis sends (it is unique to the event), you can take advantage of the many template variables that Tapis provides to customize the URL at run time. The following table shows the webhook template variables available for one resource (apps); each resource exposes its own set of macros.

Variable Description
UUID The UUID of the app.
EVENT The event which occurred
APP_ID The application id (ex. sabermetrics-2.1)

The value of webhook template variables is that they allow you to build custom callbacks using the values of the resource's fields at run time. Several commonly used webhook URLs are shown below.

Receive a callback when a new user is created that includes the new user’s information
https://example.com/sendWelcome.php?username=${USERNAME}&email=${EMAIL}&firstName=${FIRST_NAME}&lastName=${LAST_NAME}&src=api.tacc.utexas.edu&nonce=1234567
Receive self-describing job status updates
http://example.com/job/${JOB_ID}?status=${STATUS}&lastUpdated=${LAST_UPDATED}
Get notified on all jobs going into and out of queues
http://example.com/system/${EXECUTION_SYSTEM}/queue/${QUEUE}?action=add
http://example.com/system/${EXECUTION_SYSTEM}/queue/${QUEUE}?action=subtract
Rerun an analysis when a file finishes staging
https://$TAPIS_BASE_URL/jobs/v2/a32487q98wasdfa9-09090b0b-007?action=resubmit
Use plus mailing to route job notifications to different folders
nryan+${EXECUTION_SYSTEM}+${JOB_ID}@gmail.com

Creating

Create a new notification subscription with the following CLI command:

tapis notifications create -F notification.json
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
  -H "Content-Type: application/json" \
  --data-binary '{"associatedUuid": "7554973644402463206-242ac114-0001-007", "event": "FINISHED", "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}" }' \
  https://api.tacc.utexas.edu/notifications/v2?pretty=true

Show json response
{
"id": "7612526206168863206-242ac114-0001-011",
"owner": "nryan",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "FINISHED",
"responseCode": null,
"attempts": 0,
"lastSent": null,
"success": false,
"persistent": false,
"status": "ACTIVE",
"lastUpdated": "2016-08-24T10:07:03.000-05:00",
"created": "2016-08-24T10:07:03.000-05:00",
"policy": {
  "retryLimit": 5,
  "retryRate": 5,
  "retryDelay": 0,
  "saveOnFailure": true,
  "retryStrategy": "NONE"
},
"_links": {
  "self": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011"
  },
  "history": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/history"
  },
  "attempts": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/attempts"
  },
  "owner": {
    "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
  },
  "job": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/7554973644402463206-242ac114-0001-007"
  }
}
}

Updating

Updating a subscription is done identically to creation except that the form or JSON is POSTed to the existing subscription URL. An example of doing this using curl as well as the CLI is given below.

The updated notification subscription object:

{
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "*",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}"
}

CLI command to update subscription, using the above JSON:

tapis notifications create -F notification.json 2699130208276770330-242ac114-0001-011
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
  -H "Content-Type: application/json" \
  --data-binary @notification.json \
  https://api.tacc.utexas.edu/notifications/v2/2699130208276770330-242ac114-0001-011

Show json response
{
"id": "7612526206168863206-242ac114-0001-011",
"owner": "nryan",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "*",
"responseCode": null,
"attempts": 0,
"lastSent": null,
"success": false,
"persistent": false,
"status": "ACTIVE",
"lastUpdated": "2016-08-24T10:07:03.000-05:00",
"created": "2016-08-24T10:07:03.000-05:00",
"policy": {
  "retryLimit": 5,
  "retryRate": 5,
  "retryDelay": 0,
  "saveOnFailure": true,
  "retryStrategy": "NONE"
},
"_links": {
  "self": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011"
  },
  "history": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/history"
  },
  "attempts": {
    "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/attempts"
  },
  "owner": {
    "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
  },
  "job": {
    "href": "https://api.tacc.utexas.edu/jobs/v2/7554973644402463206-242ac114-0001-007"
  }
}
}

Listing

You can get a list of your current notification subscriptions by performing a GET operation on the base /notifications collection. Adding the UUID of a notification will return just that notification. You can also query for all notifications assigned to a specific UUID by adding associatedUuid=$uuid. An example of querying all notifications using curl as well as the CLI is given below.

List all notification subscriptions with the following CLI command:

tapis notifications list -v
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/notifications/v2/

Show json response
[
{
  "id": "7612526206168863206-242ac114-0001-011",
  "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
  "associatedUuid": "7554973644402463206-242ac114-0001-007",
  "event": "*",
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "job": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/7554973644402463206-242ac114-0001-007"
    }
  }
},
{
  "id": "7404907487080223206-242ac114-0001-011",
  "url": "nryan@rangers.texas.mlb.com",
  "associatedUuid": "6904887394479903206-242ac114-0001-007",
  "event": "FINISHED",
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/notifications/v2/7404907487080223206-242ac114-0001-011"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "job": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/6904887394479903206-242ac114-0001-007"
    }
  }
},
{
  "id": "3676815741209931290-242ac114-0001-011",
  "url": "nryan@rangers.texas.mlb.com",
  "associatedUuid": "3717016635100491290-242ac114-0001-007",
  "event": "FINISHED",
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/notifications/v2/3676815741209931290-242ac114-0001-011"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "job": {
      "href": "https://api.tacc.utexas.edu/jobs/v2/3717016635100491290-242ac114-0001-007"
    }
  }
}
]

Unsubscribing

To unsubscribe from an event, perform a DELETE on the notification URL. Once deleted, a subscription cannot be restored. You can, however, create a new one. Keep in mind that if you do this, the UUID of the new notification will be different from that of the deleted one. An example of deleting a notification using curl as well as the CLI is given below.

Unsubscribe from a notification subscription with the following CLI command:

tapis notifications delete 2699130208276770330-242ac114-0001-011
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/notifications/v2/2699130208276770330-242ac114-0001-011

A standard Tapis response with an empty result will be returned.

Retry Policies

In some situations, Tapis may be unable to publish a specific notification. When this happens, Tapis will immediately retry the notification up to 5 times in an attempt to deliver it successfully. When delivery fails a 5th time, the notification is abandoned. If your application requires a more tenacious or methodical approach to retry delivery, you may provide a notification retry policy.

Example notification subscription object with custom retry policy:

{
  "url" : "$REQUEST_BIN?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
  "event" : "*",
  "persistent": true,
  "policy": {
      "retryStrategy": "IMMEDIATE",
      "retryLimit": 20,
      "retryRate": 5,
      "retryDelay": 0,
      "saveOnFailure": true
    }
}

Name Type Description
retryStrategy NONE, IMMEDIATE, DELAYED, EXPONENTIAL The retry strategy to employ. Default is IMMEDIATE
retryRate int; 0:86400 The frequency, in seconds, with which attempts should be made to deliver the message.
retryLimit int; 0:1440 The maximum number of attempts that should be made to deliver the message.
retryDelay int; 0:86400 The initial delay, in seconds, between the initial delivery attempt and the first retry.
saveOnFailure boolean Whether the failed message should be persisted if unable to be delivered within the retryLimit

Notification retry policies describe the strategy, frequency, delay, limit, and persistence to be applied when publishing an individual event for a given notification. The example above is our previous example with a notification policy included.
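
As an illustration of how these policy fields interact, the sketch below computes the wait before each delivery retry under the four strategies. The exact scheduling internals are not published here, so the interpretation (retryDelay before the first retry, retryRate as the base interval, doubling for EXPONENTIAL) is an assumption for illustration only.

```python
def retry_delays(strategy, retry_limit, retry_rate, retry_delay):
    """Yield the wait (in seconds) before each retry attempt.

    Assumed reading of the policy fields, not the actual Tapis scheduler:
    retryDelay precedes the first retry, retryRate spaces later ones.
    """
    if strategy == "NONE":
        return  # no retries at all
    for attempt in range(retry_limit):
        if strategy == "IMMEDIATE":
            yield 0
        elif strategy == "DELAYED":
            # fixed pause between attempts, after an initial delay
            yield retry_delay if attempt == 0 else retry_rate
        elif strategy == "EXPONENTIAL":
            # base rate doubles after each failed attempt
            yield retry_delay if attempt == 0 else retry_rate * 2 ** (attempt - 1)

print(list(retry_delays("DELAYED", 5, 5, 0)))      # [0, 5, 5, 5, 5]
print(list(retry_delays("EXPONENTIAL", 5, 5, 0)))  # [0, 5, 10, 20, 40]
```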

Failed deliveries

By providing a retry policy where saveOnFailure is true, failed messages will be persisted and made available for querying at a later time. This is a great way to handle missed work due to a server failure, maintenance downtime, etc.

To query failed attempts for a specific notification, perform a GET on the notification's attempts collection:

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://$API_BASE_URL/notifications/$API_VERSION/229681451607921126-8e1831906a8e-0001-042/attempts

A list of notification attempts will be returned.

Show json response
[
{
  "id" : "229681451607921126-8e1831906a8e-0001-042",
  "url" : "https://httpbin.org/status/500",
  "event" : "SENT",
  "associatedUuid" : "5833036796741676570-b0b0b0bb0b-0001-011",
  "startTime" : "2016-06-19T22:21:02.266-05:00",
  "endTime" : "2016-06-19T22:21:03.268-05:00",
  "response" : {
    "code" : 500,
    "message" : ""
  },
  "_links" : {
    "self" : {
      "href" : "https://$API_BASE_URL/notifications/$API_VERSION/229123105859441126-8e1831906a8e-0001-011/attempts/229681451607921126-8e1831906a8e-0001-042"
    },
    "notification" : {
      "href" : "https://$API_BASE_URL/notifications/$API_VERSION/5833036796741676570-b0b0b0bb0b-0001-011"
    },
    "profile" : {
      "href" : "https://$API_BASE_URL/profiles/$API_VERSION/ipcservices"
    }
  }
}
]

Note: There is no way to save successful notification deliveries.

PostIts

The PostIts service is a URL shortening service similar to bit.ly, goo.gl, and t.co. It allows you to create pre-authenticated, disposable URLs to any resource in the Tapis Platform. You have control over the lifetime and number of times the URL can be redeemed, and you can expire a PostIt at any time. The most common use of PostIts is to create URLs to files that you can share with others without having to upload the files to a third-party service. Any time you need to share your science with the world, PostIts can help.

Creating PostIts

To create a PostIt, send a POST request to the PostIts service with the target url you want to share. In this example, we are sharing a file we have in Tapis’s cloud storage account.

In the response you see standard fields such as created timestamp and the postit token. You also see several fields that lead into the discussion of another aspect of PostIts, such as the ability to restrict usage and expire them on demand.

When creating a postit, you have the option to specify a number of allowed uses and an expiration, or to create an unlimited postit. If maxUses or lifetime is not provided, the default values will be applied regardless of whether the postit is unlimited. If the postit is unlimited, these values act only as placeholders and are not used when redeeming.

Default parameters:

  • maxUses - 1
  • lifetime - 30 days
  • unlimited - false

You can create a postit with either content type ‘application/json’ or ‘application/x-www-form-urlencoded’. The target URL must contain the base URL for the correct tenant. The url must also point to one of the following Tapis services: JOBS, FILES, APPS or SYSTEMS.
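
The expiration calculation can be sketched as follows: expiresAt is the creation time plus lifetime seconds, with the defaults from the list above applied when a value is omitted. This is an illustration of the arithmetic, not service code.

```python
from datetime import datetime, timedelta

# Defaults taken from the list above.
DEFAULT_MAX_USES = 1
DEFAULT_LIFETIME = 30 * 24 * 3600  # 30 days, in seconds

def postit_expiry(created_at, lifetime=DEFAULT_LIFETIME):
    """expiresAt is the creation time plus lifetime seconds."""
    return created_at + timedelta(seconds=lifetime)

created = datetime(2020, 9, 30, 21, 51, 31)
print(postit_expiry(created, lifetime=600))  # 10 minutes after creation
print(postit_expiry(created))                # default: 30 days after creation
```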

APPLICATION/JSON examples

Creating a postit with maxUses and lifetime:

Show cURL
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d '{"maxUses": 3, "lifetime": 600, "url": "<target_url>"}' -H "Content-Type: application/json" 'https://api.tacc.utexas.edu/postits/v2?pretty=true'

Creating unlimited postit:

Show cURL
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d '{"unlimited": true, "url": "<target_url>"}' -H "Content-Type: application/json" 'https://api.tacc.utexas.edu/postits/v2?pretty=true'

APPLICATION/X-WWW-FORM-URLENCODED examples

Creating a postit with maxUses and lifetime:

Show cURL
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "maxUses=3&lifetime=600&url=<target_url>" 'https://api.tacc.utexas.edu/postits/v2?pretty=true'

Creating unlimited postit:

Show cURL
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "unlimited=true&url=<target_url>" 'https://api.tacc.utexas.edu/postits/v2?pretty=true'

CLI example

(Note: CLI does not currently support unlimited postits)

Show CLI Command
tapis postits create \
 -m 10 \
 -L 86400 \
 https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt

Example Postit Creation Response

Show JSON Response
{
  "creator": "jstubbs",
  "createdAt": "2020-09-30T21:51:31-05:00",
  "expiresAt": "2020-10-01T00:14:51-05:00",
  "remainingUses": 10,
  "postit": "0feb1aa5-01aa-4445-b580-a008064a4c44-010",
  "numberUsed": 0,
  "tenantId": "tacc.prod",
  "status": "ACTIVE",
  "noauth": false,
  "url": "https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org//home/jstubbs/picksumipsum.txt",
  "method": "GET",
  "_links":{
    "self":{
      "href":"https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010"
    },
    "profile":{
      "href":"https://api.tacc.utexas.edu/profiles/v2/jstubbs"
    },
    "file":{
      "href":"https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org//home/jstubbs/picksumipsum.txt"
    },
    "update":{
      "href":"https://api.tacc.utexas.edu/postits/v2/update/0feb1aa5-01aa-4445-b580-a008064a4c44-010"
    },
    "list":{
      "href":"https://api.tacc.utexas.edu/postits/v2/listing/0feb1aa5-01aa-4445-b580-a008064a4c44-010"
    }
  }
}

Available parameters to create a postit.

JSON Parameter JSON Type Description
maxUses integer The number of times a postit can be redeemed. Must be at least 1. Negative values are not allowed.
lifetime integer How long the postit will live, in seconds. This number is used to generate the expiration date and time by adding the seconds to the current date and time. The resulting expiration time must be before January 19, 2038.
force boolean Appends the force argument to the curl command.
unlimited boolean True to create a postit that does not have an expiration date or max uses.
url string The url to be redeemed by the postit. *Always required.
noauth boolean Legacy parameter that will be accepted, but ignored by the new Aloe service.
internalUsername string Legacy parameter that will be accepted, but ignored by the new Aloe service.
method string Legacy parameter that will be accepted, but ignored by the new Aloe service.
warning:If you intend to use a PostIt as a link in a web page or a messaging service like Slack, HipChat, Facebook, Twitter, etc., which unfurl URLs for display, you should set the maximum uses greater than 4 due to the number of preflight requests made to the URL for display. Failing to do so will result in the URL showing up in the feed, but failing to resolve when clicked to download.

Listing PostIts

To list all currently active PostIts, see the following commands:

Show CLI Command
tapis postits list -v
Show cURL
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/?pretty=true'

The curl interface also allows listing postits by status: append status=<status> to the URL query. For example, the following curl would return all expired postits. See the table below for other status options.

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    'https://api.tacc.utexas.edu/postits/v2/?pretty=true&status=expired'

Status Fields

Status Description
ACTIVE Postit is redeemable.
EXPIRED_AND_NO_USES Postit is both expired and out of remaining uses.
EXPIRED Postit has expired.
NO_USES Postit is out of remaining uses.
REVOKED The postit has been revoked. Can no longer redeem nor update this postit.
NOT_FOUND (Not a status) Indicates status could not be calculated.
ALL (Not a status) Indicates to include all statuses.

Listing Single PostIt

You can list the information for any PostIt UUID, as long as it is on the same tenant.

List a single postit

Show cURL
curl -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/listing/0feb1aa5-01aa-4445-b580-a008064a4c44-010'

Updating PostIts

The creator of a postit and tenant admins can update a postit. You may update maxUses, lifetime, and unlimited. If a postit transitions from unlimited to limited without maxUses and lifetime, the current expiration and remaining uses are used. When updating the lifetime, a new expiration time is calculated from the lifetime sent in; it does not add on to the current expiration time.

If you need to update other fields, such as url, you will need to revoke this postit and create a new one.

Update a postit from unlimited to limited, in JSON format

Show cURL
curl -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/update/0feb1aa5-01aa-4445-b580-a008064a4c44-010' \
    -X POST -d '{"maxUses": 100, "lifetime": 2000, "unlimited": false}' -H "Content-type: application/json"

Update a postit from limited to unlimited, in form-urlencoded format

Show cURL
curl -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/update/0feb1aa5-01aa-4445-b580-a008064a4c44-010' \
    -X POST -d "unlimited=true"

Redeeming PostIts

You redeem a PostIt by making a non-authenticated HTTP request on the PostIt URL. In the above example, that would be https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010. Every time you make a GET request on the PostIt, the remainingUses field decrements by 1 and the numberUsed field increments by 1. This continues until remainingUses hits 0 or the PostIt passes its expiresAt time. If a postit is unlimited, the remainingUses field does not decrement, and the expiresAt field is not used. However, the postit retains these original values in case it is later reverted to a limited postit.
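
The redemption bookkeeping described above can be sketched as follows. This is a client-side model of the rules for illustration; the real service enforces them server-side.

```python
from datetime import datetime

def redeem(postit, now):
    """Return True if a GET on the PostIt URL would succeed.

    Sketch of the documented redemption rules, not service code.
    """
    if postit["unlimited"]:
        return True  # counters and expiry are ignored for unlimited postits
    if now >= postit["expiresAt"] or postit["remainingUses"] <= 0:
        return False
    postit["remainingUses"] -= 1
    postit["numberUsed"] += 1
    return True

p = {"unlimited": False, "remainingUses": 1, "numberUsed": 0,
     "expiresAt": datetime(2030, 1, 1)}
print(redeem(p, datetime(2024, 1, 1)))  # True, and remainingUses drops to 0
print(redeem(p, datetime(2024, 1, 1)))  # False: no uses left
```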

cURL command for redeeming a PostIt, which would download the picksumipsum.txt file from your storage system to the user’s machine:

Show cURL
curl -s -o picksumipsum.txt 'https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010'

warning:There will be no response for redeeming PostIts, even if the redemption fails.

Forcing PostIt Browser Downloads

If you are using PostIts in a browser environment, you can force a file download by adding force=true to the PostIt URL query. If the target URL is a file item, the name of the file item will be included in the Content-Disposition header so the downloaded file has the correct file name. You may also add the same query parameter to any target file item to force the Content-Disposition header from the Files API.
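
Appending force=true to an existing URL query can be done safely with the standard library, preserving any parameters already present (such as pretty=true):

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def with_force(url):
    """Append force=true to a PostIt (or Files) URL's query string."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["force"] = "true"
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_force("https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010"))
```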

Expiring PostIts

In addition to setting expiration parameters when you create a PostIt, you can manually expire a PostIt at any time by making an authenticated DELETE request on the PostIt URL. This will instantly expire, or revoke, the PostIt from further use. A revoked postit cannot be updated.

Manually expiring a PostIt with CLI:

Show CLI Command
tapis postits delete 0feb1aa5-01aa-4445-b580-a008064a4c44-010

Show cURL
curl -k -H "Authorization: Bearer $AUTH_TOKEN" -X DELETE 'https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010?pretty=true'

Metadata v2

The Tapis Metadata service allows you to manage metadata and associate it with Tapis entities via associated UUIDs. It supports JSON schema for structured JSON metadata; it also accepts any valid JSON-formatted metadata or plain text String when no schema is specified. As with other Tapis services, a full access control layer is available, enabling you to keep your metadata private or share it with your colleagues.

Metadata Structure

Key-value metadata item
{
  "name": "some metadata",
  "value": "A model organism..."
}
Structured metadata item, metadata.json
{
  "name":"some metadata",
  "value":{
    "title":"Example Metadata",
    "properties":{
      "species":"arabidopsis",
      "description":"A model organism..."
    }
  }
}

Every metadata item has four user-settable fields: name, value, schemaId, and associationIds.

The name field is just that: a user-defined name you give to your metadata item. There is no uniqueness constraint on the name field, so it is up to your application to enforce whatever naming policy it sees fit.

Depending on your application needs, you may use the Metadata service as a key-value store, a document store, or both. When using it as a key-value store, you provide text for the value field. When fetching data, you can search by exact value or full-text search as needed.

When using the Metadata service as a document store, you provide a JSON object or array for the value field. In this use case you can leverage additional functionality such as structured queries, atomic updates, etc.

Either use case is acceptable and fully supported. Your application needs will determine the best approach for you to take.

Associations

Each metadata item also has an optional associationIds field. This field contains a JSON array of Tapis UUIDs to which this metadata item applies. This provides a convenient grouping mechanism by which to organize logically-related resources. One common example is creating a metadata item to represent a “data collection” and associating files and folders that may be geographically distributed under that “data collection”. Another is creating a metadata item to represent a “project”, then sharing the “project” with other users involved in the “project”.

Metadata items can also be associated with other metadata items to create hierarchical relationships. Building on the “project” example, additional metadata items could be created for “links”, “videos”, and “experiments” to hold references for categorized groups of postits, video file items, and jobs respectively. Such a model translates well to a user interface layer and eliminates a large amount of boilerplate code in your application.
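
The hierarchical relationship described above can be sketched client-side: child items point at their parent through associationIds, and a category listing is just a filter over that array. The item names and UUIDs below are hypothetical.

```python
# Hypothetical metadata items: a "project" with child category items,
# linked to it through associationIds as described above.
items = {
    "uuid-project": {"name": "project", "associationIds": []},
    "uuid-links": {"name": "links", "associationIds": ["uuid-project"]},
    "uuid-videos": {"name": "videos", "associationIds": ["uuid-project"]},
}

def children_of(uuid, items):
    """Names of items whose associationIds reference the given UUID."""
    return sorted(m["name"] for m in items.values() if uuid in m["associationIds"])

print(children_of("uuid-project", items))  # ['links', 'videos']
```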

information_source:
 The associationIds field does not carry with it any special permissions or behavior. It is simply a link between a metadata item and the resources it represents.

Creating Metadata

Create a new metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model organism..."}}, "name": "mustard plant"}' \
    https://api.tacc.utexas.edu/meta/v2/data?pretty=true

Show Tapis CLI
tapis meta create -v -V '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model organism..."}}, "name": "mustard plant"}'

The response will look something like the following:
Show json response

{
  "uuid": "4054837257140638186-242ac116-0001-012",
  "schemaId": null,
  "internalUsername": null,
  "owner": "sgopal",
  "associationIds": [],
  "name": "sgopal.c41109da13893b6f.200414T000224Z",
  "value": {
    "value": {
      "title": "Example Metadata",
      "properties": {
        "species": "arabidopsis",
        "description": "A model organism..."
      }
    },
    "name": "mustard plant"
  },
  "created": "2020-04-13T19:02:24.336-05:00",
  "lastUpdated": "2020-04-13T19:02:24.336-05:00",
  "_links": {
    "self": {
      "href": "https://api.sd2e.org/meta/v2/data/4054837257140638186-242ac116-0001-012"
    },
    "permissions": {},
    "owner": {},
    "associationIds": []
  }
}


New metadata items are created via a POST to the metadata collection URL. As mentioned before, there is no uniqueness constraint placed on metadata items, so repeatedly POSTing the same metadata item to the service will create duplicate entries, each with its own unique UUID assigned by the service.

Updating Metadata

Update a metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model plant organism..."}}, "name": "some metadata", "associationIds":["179338873096442342-242ac113-0001-002","6608339759546166810-242ac114-0001-007"]}' \
    https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012?pretty=true

Show Tapis CLI
tapis meta update -v -V '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model plant organism..."}}, "name": "some metadata", "associationIds":["179338873096442342-242ac113-0001-002","6608339759546166810-242ac114-0001-007"]}' 7341557475441971686-242ac11f-0001-012

The response will look something like the following:
Show json response
{
  "uuid": "7341557475441971686-242ac11f-0001-012",
  "schemaId": null,
  "internalUsername": null,
  "associationIds": [
    "179338873096442342-242ac113-0001-002",
    "6608339759546166810-242ac114-0001-007"
  ],
  "lastUpdated": "2016-08-29T05:51:39.908-05:00",
  "name": "some metadata",
  "value": {
    "title": "Example Metadata",
    "properties": {
      "species": "arabidopsis",
      "description": "A model plant organism..."
    }
  },
  "created": "2016-08-29T05:43:18.618-05:00",
  "owner": "nryan",
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012"
    },
    "permissions": {
      "href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    },
    "associationIds": [
      {
        "rel": "179338873096442342-242ac113-0001-002",
        "href": "https://api.tacc.utexas.edu/files/v2/media/system/storage.example.com//",
        "title": "file"
      },
      {
        "rel": "6608339759546166810-242ac114-0001-007",
        "href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007",
        "title": "job"
      }
    ]
  }
}

Updating metadata is done by POSTing an updated metadata object to the existing resource. When updating, it is important to note that it is not possible to change the metadata uuid, owner, lastUpdated or created fields. Those fields are managed by the service.

Deleting Metadata

Delete a metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012?pretty=true

Show Tapis CLI
tapis meta delete 7341557475441971686-242ac11f-0001-012

An empty response will be returned from the service.

To delete a metadata item, simply make a DELETE request on the metadata resource.

warning:Deleting a metadata item will permanently delete the item and all its permissions, etc.

Metadata details

Fetching a metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012?pretty=true

Show Tapis CLI
tapis meta show -v 6877878304112316906-242ac116-0001-012

The response will look something like the following:
Show json response
{
  "uuid": "6877878304112316906-242ac116-0001-012",
  "schemaId": null,
  "internalUsername": null,
  "owner": "sgopal",
  "associationIds": [],
  "name": "sgopal.c41109da13893b6f.200414T001817Z",
  "value": {
    "value": {
      "title": "Example Metadata",
      "properties": {
        "species": "arabidopsis",
        "description": "A model organism..."
      }
    },
    "name": "mustard plant"
  },
  "created": "2020-04-13T19:18:17.567-05:00",
  "lastUpdated": "2020-04-13T19:18:17.567-05:00",
  "_links": {
    "self": {
      "href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012"
    },
    "permissions": {
      "href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012/pems"
    },
    "owner": {
      "href": "https://api.sd2e.org/profiles/v2/sgopal"
    },
    "associationIds": []
  }
}

To fetch a detailed description of a metadata item, make a GET request on the resource URL. The response will be the full metadata item representation. There are two points of interest in the example response. The first is that the response does not have an id field; instead, it has a uuid field, which serves as its ID. This is a holdover to support legacy consumers and will be changed in the next major release.

The second point of interest is the _links.associationIds array in the hypermedia response. This contains an expanded representation of the associationIds field in the body. The objects in this array are similar to the information you would receive by calling the UUID API to resolve each of the associationIds values. By leveraging the information in the hypermedia response, you can save several round trips when resolving basic information about the resources the associationIds represent.

information_source:
 In the event you need the entire resource representations for each associationIds value, you can simply explode the json array into a comma-separated string and call the UUID API with expand=true in the query.
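
Exploding the array into a comma-separated string is a one-liner; the UUIDs below are the example values from the update response above, and the exact UUID API endpoint shape is an assumption based on the text:

```python
# Turn an associationIds array into the comma-separated string the
# UUID API expects (endpoint shape assumed, not confirmed by this doc).
association_ids = ["179338873096442342-242ac113-0001-002",
                   "6608339759546166810-242ac114-0001-007"]
uuids = ",".join(association_ids)
url = f"https://api.tacc.utexas.edu/uuids/v2/?uuids={uuids}&expand=true"
print(url)
```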

Metadata browsing

Listing your metadata
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://api.tacc.utexas.edu/meta/v2/data?limit=1&pretty=true'

Show Tapis CLI
tapis meta list -v -l 1

The response will look something like the following:
Show json response
[
  {
    "uuid": "6877878304112316906-242ac116-0001-012",
    "owner": "sgopal",
    "associationIds": [],
    "name": "sgopal.c41109da13893b6f.200414T001817Z",
    "value": {
      "value": {
        "title": "Example Metadata",
        "properties": {
          "species": "arabidopsis",
          "description": "A model organism..."
        }
      },
      "name": "mustard plant"
    },
    "created": "2020-04-13T19:18:17.567-05:00",
    "lastUpdated": "2020-04-13T19:18:17.567-05:00",
    "_links": {
      "self": {
        "href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012"
      },
      "permissions": {
        "href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012/pems"
      },
      "owner": {
        "href": "https://api.sd2e.org/profiles/v2/sgopal"
      },
      "associationIds": []
    }
  }
]

To browse your Metadata, make a GET request against the /meta/v2/data collection. This will return all the metadata items you created or to which you have been granted READ access, including any items that have been shared with the public or world users. In practice, users will have many metadata items created and shared with them as part of normal use of the platform, so pagination and search become important aspects of interacting with the service.

For admins, who have implicit access to all metadata, the default listing response will be a paginated list of every metadata item in the tenant. To avoid this, admin users can append privileged=false to bypass implicit permissions and only return the metadata items they own or to which they have been granted explicit access.

Metadata Validation

Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{"schemaId": "4736020169528054246-242ac11f-0001-013", "value": {"title": "Example Metadata", "properties": {"description": "A model organism..."}}, "name": "some metadata"}' \
    https://api.tacc.utexas.edu/meta/v2/data

Show Tapis CLI
tapis meta create -v -V '{"schemaId": "4736020169528054246-242ac11f-0001-013", "value": {"title": "Example Metadata", "properties": {"description": "A model organism..."}}, "name": "some metadata"}'

The response will look something like the following:
Show json response
{
  "status" : "error",
  "message" : "Metadata value does not conform to schema.",
  "version" : "2.1.8-r8bb7e86"
}

It is often necessary to validate metadata for format or simple quality control. The Metadata service can validate the value of a metadata item against a predefined JSON Schema definition. To leverage this feature, you must first register your JSON Schema definition with the Metadata Schemata service, then reference the UUID of that metadata schema resource in the schemaId field.
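
The kind of check a schema performs can be illustrated with a toy validator. The real service validates against a full JSON Schema document; this hand-rolled sketch only checks for required properties and exists purely to show why the example request is rejected.

```python
# Minimal illustration of required-property validation. The real
# Metadata service uses JSON Schema; this sketch is not its code.
def validate(value, required):
    """Return the list of required properties missing from value."""
    missing = [k for k in required if k not in value.get("properties", {})]
    return missing

value = {"title": "Example Metadata",
         "properties": {"description": "A model organism..."}}
print(validate(value, ["species", "description"]))  # ['species'] -> rejected
```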

Given our previous example metadata schema object, the request shown above would fail due to a missing "species" value in the metadata item's value field.

Metadata Searching

Searching metadata for all items with name like “mustard plant”
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -G \
    --data-urlencode 'q={"name": "mustard plant"}' \
    https://api.tacc.utexas.edu/meta/v2/data

Show Tapis CLI
tapis meta search --name like "mustard"

The response will look something like the following:
Show json response
[
  {
    "uuid": "7341557475441971686-242ac11f-0001-012",
    "schemaId": null,
    "internalUsername": null,
    "associationIds": [
      "179338873096442342-242ac113-0001-002",
      "6608339759546166810-242ac114-0001-007"
    ],
    "lastUpdated": "2016-08-29T05:51:39.908-05:00",
    "name": "some metadata",
    "value": {
      "title": "Example Metadata",
      "properties": {
        "species": "arabidopsis",
        "description": "A model plant organism..."
      }
    },
    "created": "2016-08-29T05:43:18.618-05:00",
    "owner": "nryan",
    "_links": {
      "self": {
        "href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012"
      },
      "permissions": {
        "href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
      },
      "owner": {
        "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
      },
      "associationIds": [
        {
          "rel": "179338873096442342-242ac113-0001-002",
          "href": "https://api.tacc.utexas.edu/files/v2/media/system/storage.example.com//",
          "title": "file"
        },
        {
          "rel": "6608339759546166810-242ac114-0001-007",
          "href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007",
          "title": "job"
        }
      ]
    }
  }
]

In addition to retrieving Metadata via its UUID, the Metadata service supports MongoDB query syntax. Just add q=<value> to the URL query string of your GET request on the metadata collection. This differs from other APIs, but provides a richer syntax to query and filter responses.

If you wanted to look up Metadata corresponding to a specific value within its JSON Metadata value, you can specify this using a JSON object such as {"name": "mustard plant"}. Remember that, in order to send JSON in a URL query string, it must first be URL encoded. Luckily this is easily handled for us by curl and the Tapis CLI.

The given query will return all metadata with name, “mustard plant” that you have permission to access.
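
When building the request yourself rather than through curl or the CLI, the URL encoding looks like this. The sketch reproduces what curl's --data-urlencode does for the q parameter:

```python
import json
from urllib.parse import urlencode

# Build the q=<value> query string by hand, the way curl's
# --data-urlencode constructs it for us.
query = {"name": "mustard plant"}
qs = urlencode({"q": json.dumps(query)})
url = "https://api.tacc.utexas.edu/meta/v2/data?" + qs
print(url)
```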

Search Examples

metadata search by exact name
{"name": "mustard plant"}
metadata search by field in value
{"value.type": "a plant"}
metadata search for values with any field matching an item in the given array
{ "value.profile.status": { "$in": [ "active", "paused" ] } }
metadata search for items with a name matching a case-insensitive regex
{ "name": { "$regex": "^Cactus.*", "$options": "i"}}
metadata search for value by regex matched against each line of a value
{ "value.description": { "$regex": ".*monocots.*", "$options": "m"}}
metadata search for value by conditional queries
{
   "$or":[
      {
         "value.description":{
            "$regex":[
               ".*prickly pear.*",
               ".*tapis.*",
               ".*century.*"
            ],
            "$options":"i"
         }
      },
      {
         "value.title":{
            "$regex":".*Cactus$"
         },
         "value.order":{
            "$regex":"Agavoideae"
         }
      }
   ]
}

Some common search syntax examples. Consult the MongoDB Query Documentation for more examples and full syntax documentation.

Metadata Permissions

The Metadata service supports permissions for both Metadata and Schemata consistent with that of a number of other Tapis services. If no permissions are explicitly set, only the owner of the Metadata and tenant administrators can access it.

The permissions available for Metadata and Metadata Schemata are listed in the following table. Please note that a user must have WRITE permissions to grant or revoke permissions on a metadata or schema item.

Name Description
READ User can view the resource
WRITE User can edit, but not view, the resource
READ_WRITE User can view and edit the resource
ALL User can view, edit, and manage permissions on the resource
NONE User has no access to the resource
information_source:
 You need to change the uuids and usernames for the queries below to work.

Listing all permissions

List all permissions on a Metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems?pretty=true

Show Tapis CLI
tapis meta pems list -v 6877878304112316906-242ac116-0001-012

The response will look something like the following:
Show json response
{
  "username": "sgopal",
  "permission": {
    "read": true,
    "write": true
  },
  "_links": {
    "self": {
      "href": "https://api.sd2e.org/meta/v2/6877878304112316906-242ac116-0001-012/pems/sgopal"
    },
    "parent": {
      "href": "https://api.sd2e.org/meta/v2/6877878304112316906-242ac116-0001-012"
    },
    "profile": {
      "href": "https://api.sd2e.org/meta/v2/sgopal"
    }
  }
}

To list all permissions for a metadata item, make a GET request on the metadata item’s permission collection.

List permissions for a specific user

List the permissions on Metadata for a given user
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/sgopal?pretty=true

Show Tapis CLI
tapis meta pems show -v 6877878304112316906-242ac116-0001-012 sgopal

The response will look something like the following:
Show json response
{
  "username":"sgopal",
  "permission":{
    "read":true,
    "write":true
  },
  "_links":{
    "self":{
      "href":"https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012/pems/sgopal"
    },
    "parent":{
      "href":"https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012"
    },
    "profile":{
      "href":"https://api.tacc.utexas.edu/meta/v2/sgopal"
    }
  }
}

Checking permissions for a single user is simply a matter of adding the username of the user in question to the end of the metadata permission collection.

Grant permissions

Grant read access to a metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    --data '{"permission":"READ"}' \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/rclemens?pretty=true

Show Tapis CLI
tapis meta pems grant -v 6877878304112316906-242ac116-0001-012 rclemens READ

Grant read and write access to a metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    --data '{"permission":"READ_WRITE"}' \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/rclemens?pretty=true

Show Tapis CLI
tapis meta pems grant -v 6877878304112316906-242ac116-0001-012 rclemens READ_WRITE

The response will look something like the following:
Show json response
{
  "username": "rclemens",
  "permission": {
    "read": true,
    "write": true
  },
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012/pems/rclemens"
    },
    "parent": {
      "href": "https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012"
    },
    "profile": {
      "href": "https://api.tacc.utexas.edu/meta/v2/sgopal"
    }
  }
}

To grant another user read access to your metadata item, assign them READ permission. To enable another user to update a metadata item, grant them READ_WRITE or ALL access.

Delete single user permissions

Delete permission for single user on a Metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/rclemens?pretty=true

Show Tapis CLI
tapis meta pems revoke 6877878304112316906-242ac116-0001-012 rclemens

An empty response will come back from the API.

Permissions may be deleted for a single user by making a DELETE request on the metadata user permission resource. This will immediately revoke all permissions to the metadata item for that user.

Note: Ownership cannot be revoked or reassigned. The user who created the metadata item will always have ownership of that item.

Deleting all permissions

Delete all permissions on a Metadata item
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems?pretty=true

Show Tapis CLI
tapis meta pems drop 6877878304112316906-242ac116-0001-012

An empty response will be returned from the service.

Permissions may be deleted for all users by making a DELETE request on the metadata item’s permission collection.

Warning: The above operation will delete all permissions for a Metadata item, such that only the owner will be able to access it. Use with care.

Metadata Schemata

Schemata can be provided in JSON Schema form. The service will validate that the schema is valid JSON and store it. To validate a metadata item against a stored schema, pass the schema's UUID as the schemaId parameter when uploading the metadata item. If no schemaId is provided, the Metadata service will accept any JSON object or plain text string and store it accordingly. This approach gives Tapis a high degree of flexibility in handling structured and unstructured metadata alike.

For more on JSON Schema please see http://json-schema.org/

Note: The metadata service supports both JSON Schema v3 and v4. No additional work is needed on your part to specify which version you want to use; the service will autodetect the version and validate accordingly.
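As a rough illustration of what version autodetection can key on (the service's actual logic is internal and may differ), a client could guess the draft from the optional $schema keyword or from the shape of the required field:

```python
import json

def detect_schema_draft(schema: dict) -> str:
    """Guess the JSON Schema draft of a document.

    Illustrative only; the Tapis service's real detection is internal.
    """
    declared = schema.get("$schema", "")
    if "draft-03" in declared:
        return "v3"
    if "draft-04" in declared:
        return "v4"
    # Draft 3 used "required" as a boolean on each property;
    # draft 4 moved it to a top-level array of property names.
    if isinstance(schema.get("required"), list):
        return "v4"
    return "unknown"

doc = json.loads('''{
  "title": "Example Schema",
  "type": "object",
  "properties": {"species": {"type": "string"}},
  "required": ["species"]
}''')
print(detect_schema_draft(doc))
```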

To add a metadata schema to the repository, POST a JSON Schema document as described below.

Creating schemata

Example JSON Schema document, schema.json
{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "species": {
      "type": "string"
    }
  },
  "required": [
    "species"
  ]
}
Creating a new metadata schema
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST -H "Content-Type: application/json" \
    --data-binary '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" } },"required": ["species"] }' \
    https://api.tacc.utexas.edu/meta/v2/schemas/

The response will look something like the following:
Show json response
{
  "uuid": "4736020169528054246-242ac11f-0001-013",
  "internalUsername": null,
  "lastUpdated": "2016-08-29T04:52:11.474-05:00",
  "schema": {
    "title": "Example Schema",
    "type": "object",
    "properties": {
      "species": {
        "type": "string"
      }
    },
    "required": [
      "species"
    ]
  },
  "created": "2016-08-29T04:52:11.474-05:00",
  "owner": "nryan",
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013"
    },
    "permissions": {
      "href": "https://papi.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013/pems"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    }
  }
}

To create a new metadata schema that can be used to validate metadata items upon addition or updating, POST a JSON Schema document to the service.

More JSON Schema examples can be found in the Tapis Samples project.

Updating schema

Update a metadata schema
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" }, "description": {"type":"string"} },"required": ["species"] }' \
    https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013

Show Tapis CLI
tapis meta update -v <<< '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" }, "description": {"type":"string"} },"required": ["species"] }' 4736020169528054246-242ac11f-0001-013

The response will look something like the following:
Show json response
{
  "uuid": "4736020169528054246-242ac11f-0001-013",
  "internalUsername": null,
  "lastUpdated": "2016-08-29T04:52:11.474-05:00",
  "schema": {
    "title": "Example Schema",
    "type": "object",
    "properties": {
      "species": {
        "type": "string"
      }
    },
    "required": [
      "species"
    ]
  },
  "created": "2016-08-29T04:52:11.474-05:00",
  "owner": "nryan",
  "_links": {
    "self": {
      "href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013"
    },
    "permissions": {
      "href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013/pems"
    },
    "owner": {
      "href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
    }
  }
}

Metadata schemata are updated by POSTing an updated schema object to the existing resource. Note that the uuid, owner, lastUpdated, and created fields cannot be changed; they are managed by the service.

Deleting schema

Delete a metadata schema
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013

An empty response will be returned from the service.

To delete a metadata schema, simply make a DELETE request on the metadata schema resource.

Warning: Deleting a metadata schema permanently deletes the schema and all of its history, permissions, etc. Remaining metadata items are not automatically updated, so updates to metadata items that still reference the deleted schema will fail.

Specifying schemata as $ref

When building new JSON Schema definitions, it is often helpful to break each object out into its own definition and use $ref fields to reference them. The metadata service supports such references between metadata schema resources. Simply provide the fully qualified URL of another valid metadata schema resource as the value of a $ref field, and Tapis will resolve the reference internally, applying the requesting user's authentication and authorization to the request for the referenced resource.

Warning: When using Tapis Metadata Schemata as external references in a JSON Schema definition, make sure you grant at least READ permission on every referenced Tapis Metadata Schema resource needed to resolve the JSON Schema definition.
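For example (the referenced schema UUID below is hypothetical), a schema might point at another registered metadata schema resource through a $ref field:

```python
import json

# Hypothetical UUID of an already-registered metadata schema that the
# requesting user has at least READ permission on.
REFERENCED_SCHEMA = (
    "https://api.tacc.utexas.edu/meta/v2/schemas/"
    "0001409867973952-5056a550b8-0001-013"
)

schema = {
    "title": "Sample With Reference",
    "type": "object",
    "properties": {
        # Tapis resolves this $ref internally, applying the requesting
        # user's permissions to the referenced schema resource.
        "organism": {"$ref": REFERENCED_SCHEMA},
        "notes": {"type": "string"},
    },
}

print(json.dumps(schema, indent=2))
```

The printed document is what you would POST to the schemas collection, exactly as in the creation example above.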

Metadata v3

Meta V3 is a REST API Microservice for MongoDB which provides server-side Data, Identity and Access Management for Web and Mobile applications.

Overview

Meta V3 is:

A stateless microservice. With Meta V3, projects can focus on building Angular or other frontend applications, because most of the server-side logic necessary to manage database operations, authentication/authorization, and related APIs is handled automatically, without the need to write any server-side code except for the UX/UI.

For example, to insert data into MongoDB a developer just has to create client-side JSON documents and then execute POST operations via HTTP to Meta V3. Other features of a modern MongoDB installation, such as flexible schemas, GeoJSON, and aggregation pipelines, further ease development.

Every tenant has access to at least one database where they can store and manage json documents. Documents sit at the bottom of a nested hierarchy that begins with a database housing one or more collections. Collections house json documents, whose structure is left up to the administrators of the tenant database.

Permissions for access to databases, collections and documents must be predefined before accessing those resources. The definitions for access are defined within the Security Kernel API of Tapis V3 and must be added by a tenant or service administrator. See the Permissions section below for some examples of permissions definitions and access to resources in the Meta V3 API.

Getting Started

Create a document

Suppose we have a database named MyTstDB and a collection named MyCollection. To add a json document to MyCollection, we can do the following:

With CURL:

$ curl -v -X POST -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" --data '{"name": "test document slt 7.21.2020-14:27","jimmyList": ["1","3"],"description": "new whatever"}'  $BASE_URL/v3/meta/MyTstDB/MyCollection?basic=true

The response body will be empty with a status code of 201 “Created” unless the “basic” URL query parameter is set to true, in which case a Tapis basic response is returned along with the “_id” of the newly created document. A more detailed discussion of autogenerated and specified ids can be found in the “Create a Document” section of “Document Resources”.

{
  "result": {
    "_id": "5f189316e37f7b5a692285f3"
  },
  "status": "201",
  "message": "Created",
  "version": "0.0.1"
}
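For instance, a client can pull the autogenerated “_id” out of the basic response above for use in later requests (the response body below is copied verbatim from the example):

```python
import json

# Example basic response from the create-document call above.
body = '''{
  "result": {"_id": "5f189316e37f7b5a692285f3"},
  "status": "201",
  "message": "Created",
  "version": "0.0.1"
}'''

response = json.loads(body)
doc_id = response["result"]["_id"]
print(doc_id)  # the id to use in later GET/PUT/PATCH/DELETE calls
```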

List documents

Using our MyTstDB/MyCollection resources we can ask for a default list of documents in MongoDB's default sort order. The document we created earlier should be listed with a new “_id” field that was autogenerated by MongoDB.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/MyTstDB/MyCollection

The response will be an array of json documents from MyCollection:

[
  {
    "_id": {
      "$oid": "5f189316e37f7b5a692285f3"
    },
    "name": "test document slt 7.21.2020-14:27",
    "jimmyList": [
      "1",
      "3"
    ],
    "description": "new whatever",
    "_etag": {
      "$oid": "5f189316296c81742a6a3e4c"
    }
  },
  {
    "_id": {
      "$oid": "5f1892ece37f7b5a692285e9"
    },
    "name": "test document slt 7.21.2020-14:25",
    "jimmyList": [
      "1",
      "3"
    ],
    "description": "new whatever",
    "_etag": {
      "$oid": "5f1892ec296c81742a6a3e4b"
    }
  }
]

Get a document

If we know the “_id” of a created document, we can ask for it directly.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/MyTstDB/MyCollection/5f1892ece37f7b5a692285e9

The response will be the json document from MyCollection with the “_id” of 5f1892ece37f7b5a692285e9:

{
  "_id": {
    "$oid": "5f1892ece37f7b5a692285e9"
  },
  "name": "test document slt 7.21.2020-14:25",
  "jimmyList": [
    "1",
    "3"
  ],
  "description": "new whatever",
  "_etag": {
    "$oid": "5f1892ec296c81742a6a3e4b"
  }
}

Find a document

We can pass a query parameter named “filter” and set the value to a json MongoDB query document. Let’s find a document by a specific “name”.

With CURL:

$ curl -v -G -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" --data-urlencode filter='{"name": "test document slt 7.21.2020-14:25"}' $BASE_URL/v3/meta/MyTstDB/MyCollection

The response will be an array of json documents from MyCollection:

[
  {
    "_id": {
      "$oid": "5f1892ece37f7b5a692285e9"
    },
    "name": "test document slt 7.21.2020-14:25",
    "jimmyList": [
      "1",
      "3"
    ],
    "description": "new whatever",
    "_etag": {
      "$oid": "5f1892ec296c81742a6a3e4b"
    }
  }
]
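Because the filter value is itself a JSON document, it must be URL-encoded when placed on the query string (curl's `-G --data-urlencode` handles this for you). A minimal Python sketch of building the same request URL:

```python
import json
from urllib.parse import urlencode

base_url = "https://example.tapis.host"  # stand-in for $BASE_URL
filter_doc = {"name": "test document slt 7.21.2020-14:25"}

# urlencode percent-escapes the braces, quotes, and colons in the
# serialized filter document.
query = urlencode({"filter": json.dumps(filter_doc)})
url = f"{base_url}/v3/meta/MyTstDB/MyCollection?{query}"
print(url)
```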

Resources

General resources

An unauthenticated health check is included in the Meta V3 API to let any user know the current condition of the service.

Health Check

An unauthenticated request for the health status of Meta V3 API.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json" $BASE_URL/v3/meta/

The response will be a Basic Tapis response on health:

{
  "result": "",
  "status": "200",
  "message": "OK",
  "version": "0.0.1"
}

Root resources

The Root resource space represents the root namespace for databases on the MongoDb host. All databases are located here. Requests to this space are limited to READ only for tenant administrators.

List DB Names

A request to the Root resource will list Database names found on the server. This request has been limited to those users with tenant administrative roles.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/

The response will be a json list of database names:

[
    "StreamsDevDB",
    "v1airr"
]

Database resources

The Database resource is the top level for many tenant projects. The resource maps directly to a named MongoDB database on the database server. Database names are case-sensitive and must match exactly when making requests for collections or documents.

List Collection Names

This request will return a list of collection names from the specified database {db}. The permissions for access to the database are set prior to access.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}

Here is an example response:

[
  "streams_alerts_metadata",
  "streams_channel_metadata",
  "streams_instrument_index",
  "streams_project_metadata",
  "streams_templates_metadata",
  "tapisKapa-local"
]

Get DB Metadata

This request will return the metadata properties associated with the database. The core server generates an etag, stored in the database's _properties collection, which is required for future deletion.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/_meta

Here is an example response:

{
   "_id": "_meta",
   "_etag": { "$oid": "5ef6232b296c81742a6a3e02" }
}

Create DB

TODO: this implementation is not exposed. Creation of a database by tenant administrators is scheduled for inclusion in an administrative interface API in a future release.

This request will create a new named database in the MongoDb root space by a tenant or service administrator.

With CURL:

$ curl -v -X PUT -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '' $BASE_URL/v3/meta/{db}

Here is an example response:

{ }

Delete a DB

TODO: this implementation is not exposed. Deletion of a database by tenant administrators is scheduled for inclusion in an administrative interface API in a future release.

This request will delete a named database in the MongoDb root space by a tenant or service administrator.

With CURL:

$ curl -v -X DELETE -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}

Here is an example response:


{ }

Collection Resources

The Collection resource allows requests for managing and querying json documents within a MongoDB collection.

Create a Collection

You can create a new collection of documents by specifying a collection name under a specific database. /v3/meta/{db}/{collection}

With CURL:

$ curl -v -X PUT -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}

Here is an example response:

Empty response with HTTP status of 201

List Documents

A default number of documents from the collection is returned as an array of documents.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}

The response will look like the following:

[
  {
    "_id": {
      "$oid": "5f1892ece37f7b5a692285e9"
    },
    "name": "test document slt 7.21.2020-14:25",
    "description": "new whatever",
    "_etag": {
      "$oid": "5f1892ec296c81742a6a3e4b"
    }
  },
  {
    "_id": {
      "$oid": "5f1892ece37f7b5a69228533"
    },
    "name": "test document slt 7.21.2020-14:25",
    "description": "new whatever",
    "_etag": {
      "$oid": "5f1892ec296c81742a6a3e444"
    }
  }
]

List Documents Large Query

For filter queries too large to submit as a URL query parameter, the filter document can be submitted in the request body to the collection's _filter endpoint. The matching documents are returned as an array.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d @FILENAME $BASE_URL/v3/meta/{db}/{collection}/_filter

The response will look like the following:

[
  {
    "_id": {
      "$oid": "5f1892ece37f7b5a692285e9"
    },
    "name": "test document slt 7.21.2020-14:25",
    "description": "new whatever",
    "_etag": {
      "$oid": "5f1892ec296c81742a6a3e4b"
    }
  },
  {
    "_id": {
      "$oid": "5f1892ece37f7b5a69228533"
    },
    "name": "test document slt 7.21.2020-14:25",
    "description": "new whatever",
    "_etag": {
      "$oid": "5f1892ec296c81742a6a3e444"
    }
  }
]

Delete a Collection

This administrative method is only available to tenant or meta administrators and requires an If-Match header containing the Etag for the collection. The Etag value, if not already known, can be retrieved from the “_meta” call for the collection.

With CURL:

$ curl -v -X DELETE -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -H "If-Match:{etag}" $BASE_URL/v3/meta/{db}/{collection}

Here is an example response:

Empty response body with status code 204
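A sketch of constructing the DELETE request with the required If-Match header using Python's standard library (the request is built but not sent; the host, token, and Etag values are placeholders):

```python
import urllib.request

base_url = "https://example.tapis.host"  # stand-in for $BASE_URL
jwt = "dummy-token"                      # stand-in for $jwt
etag = "5f2b2b7a204ce7637579c85f"        # from the collection's _meta call

req = urllib.request.Request(
    f"{base_url}/v3/meta/MyTstDB/MyCollection",
    method="DELETE",
    headers={
        "X-Tapis-Token": jwt,
        "If-Match": etag,  # required for collection deletion
    },
)

# Inspect the request instead of sending it.
print(req.get_method(), req.get_full_url())
```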

Get Collection Size

You can find the size, or number of documents, in a given collection by calling “_size” on the collection.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_size

Here is an example response:

TODO

Get Collection Metadata

You can find the metadata properties of a given collection by calling “_meta” on a collection. This would include the Etag value for a collection that is needed for deletion.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_meta

Here is an example response:

{
  "_id": "_meta",
  "_etag": {
    "$oid": "5f2b2b7a204ce7637579c85f"
  }
}

Document Resources

Document resources are json documents found in a collection. Reading, creating, deleting, and updating documents, along with batch processing, make up the operations that can be applied to documents in a collection. There are various ways to retrieve one or more documents from a collection, including a filter query parameter whose value is a MongoDB query document. Batch addition of documents, as well as batch updates based on queries, is also supported.

Create a Document

Submitting a json document in the request body of a POST request creates a new document in the specified collection with a MongoDB autogenerated “_id”. Batch document addition is possible by POSTing an array of new documents in the request body for the specified collection. The rules for “_id” creation operate the same way for multiple documents as they do for a single document.

The default representation returned is an empty response body along with a 201 HTTP status code “Created”. However, if an additional query parameter named “basic” is added with the value “true”, a basic Tapis response is returned along with the newly created “_id” of the document.

With CURL:

$ curl -v -X POST -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '{"docName":"test doc"}' $BASE_URL/v3/meta/{db}/{collection}

Here is an example response:

Empty response

Multiple documents can be added to a collection by POSTing a json array of documents. The batch addition of documents only supports the default response.

With CURL:

$ curl -v -X POST -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '[{"docName":"test doc1"},{"docName":"test doc2"}]' $BASE_URL/v3/meta/{db}/{collection}

The response body will be empty.

Get a Document

Get a specific document by its “_id”.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt"  $BASE_URL/v3/meta/{db}/{collection}/{document_id}

The response will be the json document with the matching “_id”:

{
  "_id": {
    "$oid": "5f1892ece37f7b5a692285e9"
  },
  "name": "test document slt 7.21.2020-14:25",
  "description": "new whatever",
  "_etag": {
    "$oid": "5f1892ec296c81742a6a3e4b"
  }
}

Replace a Document

This call replaces an existing document identified by document id (“_id”), with the json supplied in the request body.

With CURL:

$ curl -v -X PUT -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '{"docName":"test doc another one"}' $BASE_URL/v3/meta/{db}/{collection}/{document_id}

Here is an example response:

TODO

Modify a Document

This call will replace a portion of a document identified by document id (“_id”) with the supplied json.

With CURL:

$ curl -v -X PATCH -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '{"docName":"test changed"}' $BASE_URL/v3/meta/{db}/{collection}/{document_id}

Here is an example response:

TODO

Delete Document

Deleting a document with a specific document id (“_id”) removes it from the collection.

With CURL:

$ curl -v -X DELETE -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/{document_id}

Here is an example response:

TODO

Index Resources

Indexes can help speed up queries on your collection, and the API gives you the ability to define and manage your indexes. You can create an index for a collection, list the indexes for a collection, and delete an index. Indexes cannot be updated; they must be deleted and recreated.

List Indexes

List the indexes defined for a collection.

With CURL:

$ curl -v -X GET -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt"  $BASE_URL/v3/meta/{db}/{collection}/_indexes

Here is an example response:

TODO

Create Index

Create a new index with a new name. To create an index you have to specify the keys and the index options. Let’s create a unique, sparse index on the property qty and name our index “qtyIndex”.

PUT /v3/meta/{db}/{collection}/_indexes/qtyIndex

{"keys": {"qty": 1},"ops": {"unique": true, "sparse": true }}

With CURL:

$ curl -v -X PUT -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '{ "keys":  <keys>, "ops": <options> }' $BASE_URL/v3/meta/{db}/{collection}/_indexes/{indexName}

Here is an example response:

TODO
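As a sketch of assembling the request body shown above (the "keys" values use MongoDB's convention of 1 for ascending and -1 for descending, and "ops" carries standard MongoDB index options):

```python
import json

# Unique, sparse ascending index on "qty", mirroring the qtyIndex example.
index_body = {
    "keys": {"qty": 1},                       # 1 = ascending, -1 = descending
    "ops": {"unique": True, "sparse": True},  # MongoDB index options
}

# This JSON string is what you would PUT to .../_indexes/qtyIndex.
print(json.dumps(index_body))
```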

Delete Index

Remove a named Index from the index list.

With CURL:

$ curl -v -X DELETE -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_indexes/{indexName}

Here is an example response:

TODO

Aggregation Resources

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. Aggregations in the API are predefined and added to a collection's properties. They may also be parameterized for use with multiple sets of inputs.

Create an Aggregation

Create an aggregation pipeline by adding the aggregation to the collection for future execution. The aggregation may have variables that are defined so that a future request may pass variable values for aggregation execution. See “Execute an Aggregation”.

{ "aggrs" : [
      { "stages" : [ { "$match" : { "name" : { "$var" : "n" } } },
            { "$group" : { "_id" : "$name",
                  "avg_age" : { "$avg" : "$age" }
                } }
          ],
        "type" : "pipeline",
        "uri" : "example-pipeline"
      }
    ]
}
Property   Mandatory   Description
type       yes         "pipeline" for aggregation pipeline operations
uri        yes         specifies the URI when the operation is bound under the path /<db>/<collection>/_aggrs
stages     yes         the MongoDB aggregation pipeline stages

For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/

With CURL:

$ curl -v -X PUT -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" \
     -d '{ "aggrs" : [{ "stages" : [ { "$match" : { "name" : { "$var" : "n" } } },{ "$group" : { "_id" : "$name","avg_age" : { "$avg" : "$age" }} } ], "type" : "pipeline","uri" : "example-pipeline"}]}' \
     $BASE_URL/v3/meta/{db}/{collection}

Here is an example response:

TODO
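To make the example pipeline's semantics concrete, here is the same match-then-group computation sketched locally in Python (purely illustrative; the sample documents are made up, and the real work happens inside MongoDB):

```python
from statistics import mean

docs = [
    {"name": "alice", "age": 30},
    {"name": "alice", "age": 40},
    {"name": "bob", "age": 25},
]

# $match stage: keep documents whose name equals the bound variable "n".
n = "alice"
matched = [d for d in docs if d["name"] == n]

# $group stage: group by name and average the ages.
avg_age = mean(d["age"] for d in matched)
print({"_id": n, "avg_age": avg_age})
```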

Execute an Aggregation

TODO

With CURL:

$ curl -v -X POST -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '' $BASE_URL/v3/meta/

Here is an example response:

TODO

Delete an Aggregation

TODO

With CURL:

$ curl -v -X POST -H "Content-Type:application/json"  -H "X-Tapis-Token:$jwt" -d '' $BASE_URL/v3/meta/

Here is an example response:

TODO

User Profiles

The Tapis hosted identity service (profiles service) is a RESTful web service that gives organizations a way to create and manage the user accounts within their Tapis tenant. The service is backed by a redundant LDAP instance hosted in multiple datacenters making it highly available. Additionally, passwords are stored using the openldap md5crypt algorithm.

Tenant administrators can manage only a basic set of fields on each user account within LDAP itself. For more complex profiles, we recommend combining the profiles service with the metadata service. See the section on Extending the Basic Profile with the Metadata Service below.

The service uses OAuth2 for authentication, and users must have special privileges to create and update user accounts within the tenant. Please work with the Tapis development team to make sure your admins have the user-account-manager role.

In addition to the web service, there is also a basic front-end web application providing user sign up. The web application will suffice for basic user profiles and can be used as a starting point for more advanced use cases.

This service should NOT be used for authenticating users. For details on using OAuth for authentication, see the Authorization Guide.


Creating

Create a user account with the following command:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    -d "username=testuser" \
    -d "password=abcd123" \
    -d "email=testuser@test.com" \
    https://api.tacc.utexas.edu/profiles/v2

Show json response
{
  "message":"User created successfully.",
  "result":{
    "email":"testuser@test.com",
    "first_name":"",
    "full_name":"testuser",
    "last_name":"testuser",
    "mobile_phone":"",
    "phone":"",
    "status":"Active",
    "uid":null,
    "username":"testuser"
  },
  "status":"success",
  "version":"2.0.0-SNAPSHOT-rc3fad"
}

Create a user account by sending a POST request to the profiles service, providing an access token of a user with the user-account-manager role. The fields username, password and email are required to create a new user.

Creating and managing accounts requires a special user-account-manager role. As a best practice, we recommend setting up a separate, dedicated account to handle user management. Please work with the Tapis developer team if this is of interest to your organization.

The complete list of available fields and their descriptions is provided in the table below.

Field Name Description Required?
username The username for the user; must be unique across the tenant Yes
email The email address for the user Yes
password The password for the user Yes
first_name First name of the user No
last_name Last name of the user No
phone User’s phone number No
mobile_phone User’s mobile phone number No

Note that the service does not do any password strength enforcement or other password management policies. We leave it to each organization to implement the policies best suited for their use case.


Extending with Metadata

Here is an example metadata object for extending a user profile:

Show json example
{
  "name":"user_profile",
  "value":{
    "firstName":"Test",
    "lastName":"User",
    "email":"testuser@test.com",
    "city":"Springfield",
    "state":"IL",
    "country":"USA",
    "phone":"636-555-3226",
    "gravatar":"http://www.gravatar.com/avatar/ed53e691ee322e24d8cc843fff68ebc6"
  }
}

Save the extended profile document to the metadata service with the following CLI command:

tapis meta update -v -F profile_example.json
Show curl
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    -F "fileToUpload=@profile_ex" \
    https://api.tacc.utexas.edu/meta/v2/data/?pretty=true

Show json response
{
  "status" : "success",
  "message" : null,
  "version" : "2.1.0-rc0c5a",
  "result" : {
    "uuid" : "0001429724043699-5056a550b8-0001-012",
    "owner" : "jstubbs",
    "schemaId" : null,
    "internalUsername" : null,
    "associationIds" : [ ],
    "lastUpdated" : "2015-04-22T12:34:03.698-05:00",
    "name" : "user_profile",
    "value" : {
      "firstName" : "Test",
      "lastName" : "User",
      "email" : "testuser@test.com",
      "city" : "Springfield",
      "state" : "IL",
      "country" : "USA",
      "phone" : "636-555-3226",
      "gravatar" : "http://www.gravatar.com/avatar/ed53e691ee322e24d8cc843fff68ebc6"
    },
    "created" : "2015-04-22T12:34:03.698-05:00",
    "_links" : {
      "self" : {
        "href" : "https://api.tacc.utexas.edu/meta/v2/data/0001429724043699-5056a550b8-0001-012"
      }
    }
  }
}

We do not expect the fields above to provide full support for anything but the most basic profiles. The recommended strategy is to use the profiles service in combination with the metadata service (see Metadata Guide) to store additional information. The metadata service allows you to create custom types using JSON schema, making it more flexible than standard LDAP from within a self-service model. Additionally, the metadata service includes a rich query interface for retrieving users based on arbitrary JSON queries.

The general approach used by existing tenants has been to create a single entry per user where the entry contains all additional profile data for the user. Every metadata item representing a user profile can be identified using a fixed string for the name attribute (e.g., user_profile). The value of the metadata item contains a unique identifier for the user (e.g. username or email address) along with all the additional fields you wish to track on the profile. One benefit of this approach is that it cleanly delineates multiple classes of profiles, for example admin_profile, developer_profile, mathematician_profile, etc. When consuming this information in a web interface, such user-type grouping makes presentation significantly easier.

Another issue to consider when extending user profile information through the Metadata service is ownership. If you create the user’s account, then prompt them to login before entering their extended data, it is possible to create the user’s metadata record under their account. This has the advantage of giving the user full ownership over the information, however it also opens up the possibility that the user, or a third-party application, could modify or delete the record.

A better approach is to use a service account to create all extended profile metadata records and grant the user READ access on the record. This still allows third-party applications to access the user’s information at their request, but prevents any malicious things from happening.

For even quicker access, you can associate the metadata record with the UUID of the user through the associationIds attribute. See the Metadata Guide for more information about efficient storing and searching of metadata.
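Putting these recommendations together, here is a sketch of the metadata item a service account might create for a user; the field names follow the earlier example, and the user UUID is hypothetical:

```python
import json

user_uuid = "0001409867973952-5056a550b8-0001-007"  # hypothetical user UUID

profile_item = {
    "name": "user_profile",         # fixed string identifying this class of profile
    "associationIds": [user_uuid],  # link the record to the user's UUID
    "value": {
        "username": "testuser",     # unique identifier for the user
        "firstName": "Test",
        "lastName": "User",
        "email": "testuser@test.com",
    },
}

# This document would be POSTed to the metadata service by the service
# account, which would then grant the user READ permission on it.
print(json.dumps(profile_item, indent=2))
```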


Updating

Update a user profile with the following CLI command:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X PUT \
    -d "password=abcd123&email=testuser@test.com&first_name=Test&last_name=User" \
    https://api.tacc.utexas.edu/profiles/v2/testuser

Show JSON response
{
  "message":"User updated successfully.",
  "result":{
    "create_time":"20150421153504Z",
    "email":"testuser@test.com",
    "first_name":"Test",
    "full_name":"Test User",
    "last_name":"User",
    "mobile_phone":"",
    "phone":"",
    "status":"Active",
    "uid":0,
    "username":"testuser"
  },
  "status":"success",
  "version":"2.0.0-SNAPSHOT-rc3fad"
}

Updates to existing users can be made by sending a PUT request to https://api.tacc.utexas.edu/profiles/v2/ and passing the fields to update. For example, we can add a gravatar attribute to the account we created above.
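As a sketch, the form-encoded body for such an update can be assembled programmatically; the gravatar value below is purely illustrative:

```python
from urllib.parse import urlencode

# Build the form-encoded body for a profile update (PUT). The first three
# fields mirror the curl example above; "gravatar" is the extra attribute
# being added (value shown is a placeholder).
fields = {
    "email": "testuser@test.com",
    "first_name": "Test",
    "last_name": "User",
    "gravatar": "https://www.gravatar.com/avatar/00000000000000000000000000000000",
}
body = urlencode(fields)
```

The resulting string is what you would pass to curl with -d against the profile resource URL.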


Deleting

Delete a user profile with the following curl command:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -X DELETE https://api.tacc.utexas.edu/profiles/v2/testuser

Show JSON response
{
  "message": "User deleted successfully.",
  "result": {},
  "status": "success",
  "version": "2.0.0-SNAPSHOT-rc3fad"
}

To delete an existing user, make a DELETE request on their profile resource.

Deleting a user is a destructive action and cannot be undone. Consider the implications of user deletion and the impact on their existing metadata before doing so.


Registration Web Application

The account creation web app provides a simple form to enable user self-signup.

Tapis web app sign in

The web application also provides an email loop for verification of new accounts. The code is open source and freely available from Bitbucket: Account Creation Web Application

Most likely you will want to customize the branding and other aspects of the application, but for simple use cases, the Tapis team can deploy a stock instance of the application in your tenant. Work with the Tapis developer team if this is of interest to your organization.

UUID


The Tapis UUID service resolves the type and representation of one or more Tapis UUIDs. This is helpful, for instance, when you need to expand the hypermedia response of another resource, get the URL corresponding to a UUID, or fetch the representations of multiple resources in a single request.

Resolving a single UUID

Resolving a UUID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://api.tacc.utexas.edu/uuid/v2/0001409758089943-5056a550b8-0001-002
The response will look something like this:
{
  "uuid":"0001409758089943-5056a550b8-0001-002",
  "type":"FILE",
  "_links":{
    "file":{
      "href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
    }
  }
}

A single UUID can be resolved by making a GET request on the UUID resource. The response will include the UUID and the type of the resource to which it is associated. The canonical resource URL is available in the hypermedia response. All calls to the UUID API are authenticated; however, no permission checks are made during basic resolution.
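A minimal sketch of consuming that response: the resource type and canonical URL can be read directly from the hypermedia document. In this example the link key happens to be the lower-cased type; that correspondence is an observation from the sample response, not a documented guarantee:

```python
import json

# Sample response from the single-UUID resolution above.
response = json.loads("""{
  "uuid":"0001409758089943-5056a550b8-0001-002",
  "type":"FILE",
  "_links":{
    "file":{
      "href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
    }
  }
}""")

resource_type = response["type"]
canonical_url = response["_links"][resource_type.lower()]["href"]
```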

Expanding a UUID query

Resolving a uuid to a full resource representation
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://api.tacc.utexas.edu/uuid/v2/0001409758089943-5056a550b8-0001-002?expand=true&pretty=true"
The response will include the entire representation of the resource just as if you queried the Files API.
{
  "internalUsername":null,
  "lastModified":"2014-09-03T10:28:09.943-05:00",
  "name":"picksumipsum.txt",
  "nativeFormat":"raw",
  "owner":"nryan",
  "path":"/home/nryan/picksumipsum.txt",
  "source":"http://127.0.0.1/picksumipsum.txt",
  "status":"STAGING_QUEUED",
  "systemId":"data.iplantcollaborative.org",
  "uuid":"0001409758089943-5056a550b8-0001-002",
  "_links":{
    "history":{
      "href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
    },
    "self":{
      "href":"https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
    },
    "system":{
      "href":"https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
    }
  }
}

Oftentimes you need more information about the resource associated with a UUID. You can save yourself an API request by adding expand=true to the URL query. The resulting response, if successful, will include the full representation of the resource associated with the UUID, just as if you had called its URL directly. Filtering is also supported, so you can specify just the fields you want returned in the response.

Resolving multiple UUIDs

Resolving multiple UUIDs.
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://api.tacc.utexas.edu/uuid/v2/?uuids.eq=0001409758089943-5056a550b8-0001-002,0001414144065563-5056a550b8-0001-007&pretty=true"
The response will be similar to the following.
[
  {
    "uuid":"0001409758089943-5056a550b8-0001-002",
    "type":"FILE",
    "url":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt",
    "_links":{
      "file":{
        "href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
      }
    }
  },
  {
    "uuid":"0001414144065563-5056a550b8-0001-007",
    "type":"JOB",
    "url":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007",
    "_links":{
      "file":{
        "href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }
]

To resolve multiple UUIDs, make a GET request on the UUID collection and pass the UUIDs as a comma-separated list in the uuids.eq query parameter. The response will contain a list of resolved resources in the same order that you requested them.
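A sketch of assembling the bulk-resolution URL; the bulk_uuid_url helper is hypothetical:

```python
def bulk_uuid_url(uuids, base="https://api.tacc.utexas.edu/uuid/v2/"):
    # Join the UUIDs into a comma-separated list for the uuids.eq parameter.
    return f"{base}?uuids.eq=" + ",".join(uuids)

url = bulk_uuid_url([
    "0001409758089943-5056a550b8-0001-002",
    "0001414144065563-5056a550b8-0001-007",
])
```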

Expanding multiple UUIDs

Resolving multiple UUIDs to their resource representations
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://api.tacc.utexas.edu/uuid/v2/?uuids.eq=0001409758089943-5056a550b8-0001-002,0001414144065563-5056a550b8-0001-007&expand=true&pretty=true"
The response will include an array of the expanded representations in the order they were requested in the URL query.
Show JSON response
[
  {
    "id":"$JOB_ID",
    "name":"demo-pyplot-demo-advanced test-1414139896",
    "owner":"$API_USERNAME",
    "appId":"demo-pyplot-demo-advanced-0.1.0",
    "executionSystem":"$PUBLIC_EXECUTION_SYSTEM",
    "batchQueue":"debug",
    "nodeCount":1,
    "processorsPerNode":1,
    "memoryPerNode":1.0,
    "maxRunTime":"01:00:00",
    "archive":false,
    "retries":0,
    "localId":"10321",
    "outputPath":null,
    "status":"STOPPED",
    "submitTime":"2014-10-24T04:48:11.000-05:00",
    "startTime":"2014-10-24T04:48:08.000-05:00",
    "endTime":null,
    "inputs":{
      "dataset":"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
    },
    "parameters":{
      "chartType":"bar",
      "height":"512",
      "showLegend":"false",
      "xlabel":"Time",
      "background":"#FFF",
      "width":"1024",
      "showXLabel":"true",
      "separateCharts":"false",
      "unpackInputs":"false",
      "ylabel":"Magnitude",
      "showYLabel":"true"
    },
    "_links":{
      "self":{
        "href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
      },
      "app":{
        "href":"https://api.tacc.utexas.edu/apps/v2/demo-pyplot-demo-advanced-0.1.0"
      },
      "executionSystem":{
        "href":"https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
      },
      "archiveData":{
        "href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
      },
      "owner":{
        "href":"https://api.tacc.utexas.edu/profiles/v2/$API_USERNAME"
      },
      "permissions":{
        "href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
      },
      "history":{
        "href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
      },
      "metadata":{
        "href":"https://api.tacc.utexas.edu/meta/v2/data/?q=%7b%22associationIds%22%3a%220001414144065563-5056a550b8-0001-007%22%7d"
      },
      "notifications":{
        "href":"https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
      }
    }
  },
  {
    "internalUsername":null,
    "lastModified":"2014-09-03T10:28:09.943-05:00",
    "name":"picksumipsum.txt",
    "nativeFormat":"raw",
    "owner":"nryan",
    "path":"/home/nryan/picksumipsum.txt",
    "source":"http://127.0.0.1/picksumipsum.txt",
    "status":"STAGING_QUEUED",
    "systemId":"data.iplantcollaborative.org",
    "uuid":"0001409758089943-5056a550b8-0001-002",
    "_links":{
      "history":{
        "href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
      },
      "self":{
        "href":"https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
      },
      "system":{
        "href":"https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
      }
    }
  }
]

Expansion also works when querying UUIDs in bulk. Simply add expand=true to the URL query in your request and the full resource representation of each UUID will be returned in an array, with the original request order maintained. If any of the resolutions fails due to a permission violation or server error, the error response object will be returned in place of that resource representation.
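Since successes and failures can be mixed in one response, a client should check each array element. The sketch below assumes the standard Tapis V2 error wrapper ({"status": "error", "message": ...}); the exact error shape may vary:

```python
# Separate successful expansions from per-item error objects in a bulk
# expanded response. An element is treated as an error if it carries
# status == "error" (assumed error wrapper); everything else is a resource.
def partition_results(results):
    ok, failed = [], []
    for item in results:
        if isinstance(item, dict) and item.get("status") == "error":
            failed.append(item)
        else:
            ok.append(item)
    return ok, failed

ok, failed = partition_results([
    {"uuid": "0001409758089943-5056a550b8-0001-002", "type": "FILE"},
    {"status": "error", "message": "Permission denied"},
])
```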

Events


Events underpin everything in the Tapis Platform. This section covers the events available to each resource.

Tooling

Sometimes the hardest part of a new project is taking the first step. Tapis Tooling helps make that first step a little easier through reference web applications, boilerplate integration scripts, and integrations with popular CMSs and frameworks through native plugins and modules.

Jupyter Hub

Jupyter notebooks (formerly IPython notebooks) provide users with interactive computing documents that contain both computer code and a mix of rich text elements such as data visualizations, text paragraphs, hyperlinks, and formatted equations. The code cells in a notebook can be executed interactively, cell by cell, with the results of each execution displayed beneath the cell. Notebooks can also be exported to a serialized, JSON-formatted file and executed like a traditional program.

JupyterHub is an open source project to provide multi-user hosted notebook servers as a service. When a user signs in to JupyterHub, a notebook server with pre-configured software is automatically launched for them. The Tapis team integrated JupyterHub into its identity and access management stack and made several additional enhancements and customizations to enable the use of the Tapis language SDKs, such as agavepy, the CLI, persistent storage, and multiple kernels directly from notebooks with minimal setup. The Tapis deployment of JupyterHub, which runs each user’s notebook server in a Docker container to further enhance reproducibility, is freely available for use in the Tapis Public Tenant.

You can get started with JupyterHub today at https://jupyter.tacc.cloud.

Command Line Interface

The Tapis command-line interface (CLI) is a complete interface to the Tapis REST API. The scripts include support for creating persistent authentication sessions, creating and renaming apps, registering and sharing systems, uploading and managing data, creating PostIts, and more. Whether you have an existing project looking to leverage Tapis for back-end processing, want to integrate Tapis into your existing scripted solutions, or are new to Tapis and just want to kick the tires, the Tapis CLI is a powerful tool. The Tapis CLI can be checked out from the Tapis git repository.

git clone https://github.com/TACC-Cloud/tapis-cli.git

For more information on using the Tapis CLI for common tasks, consult the Tutorials section, which references it in all of its examples, or check out the Tapis Samples project for sample data and examples of how to use it to populate and interact with your tenant. You can also check out the Tapis CLI Documentation.
