
Cromwell on AWS Batch

Cromwell is a workflow management system for scientific workflows developed by the Broad Institute. It supports job execution using AWS Batch.

Full Stack Deployment

If you need a Cromwell server backed by AWS now and will worry about the details later, use the CloudFormation template below.

Name: Cromwell All-in-One
Description: Creates all resources needed to run Cromwell on AWS: an S3 bucket, an AWS Batch environment, and a Cromwell server instance.

When the stack above is complete, navigate to the HostName listed in its outputs to access Cromwell via its SwaggerUI, which provides a simple web interface for submitting workflows.


Requirements

To get started using Cromwell on AWS you'll need the following setup in your AWS account:

  • The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the Getting Started section.
  • Custom Compute Resource (Launch Template or AMI) with Cromwell Additions
  • EC2 Instance as a Cromwell Server

The documentation and CloudFormation templates here will help you get these set up.

Note

For a Cromwell server that will run multiple workflows, or workflows with many steps (e.g. ones with large scatter steps), it is recommended to set up a database to store workflow metadata.

Custom Compute Resource with Cromwell Additions

Follow the instructions on creating a custom compute resource with the following changes:

  • specify the scratch mount point as /cromwell_root
  • make sure that Cromwell additions are included in the resource by selecting "cromwell" as the resource type.

Once complete, you will have a resource ID to give to AWS Batch when setting up compute environments.
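As a sketch of that step, the resource ID (a launch template ID) can be referenced when creating a managed compute environment with the AWS CLI. Every name, subnet, security group, account, and template ID below is a placeholder to replace with your own values:

```shell
# Definition for a managed compute environment that uses the custom launch
# template; all IDs and ARNs below are placeholders
cat > compute-environment.json <<'EOF'
{
    "computeEnvironmentName": "cromwell-compute-env",
    "type": "MANAGED",
    "serviceRole": "arn:aws:iam::<account-id>:role/AWSBatchServiceRole",
    "computeResources": {
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-11111111"],
        "securityGroupIds": ["sg-22222222"],
        "instanceRole": "ecsInstanceRole",
        "launchTemplate": {"launchTemplateId": "lt-0123456789abcdef0"}
    }
}
EOF

# Create the compute environment from the definition above (requires the
# AWS CLI with credentials configured)
aws batch create-compute-environment \
    --cli-input-json file://compute-environment.json \
    || echo "Could not create the compute environment (check AWS CLI credentials)"
```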

Cromwell Server

To ensure the highest level of security and robustness for long-running workflows, it is recommended that you use an EC2 instance as your Cromwell server for submitting workflows to AWS Batch.

A couple of things to note:

  • This server does not need to be permanent. In fact, when you are not running workflows, you should stop or terminate the instance so that you are not paying for resources you are not using.

  • You can launch a Cromwell server just for yourself and exactly when you need it.

  • This server does not need to be in the same VPC as the one that Batch will launch instances in.

The following CloudFormation template will create a CromwellServer instance with Cromwell installed, running, and preconfigured to operate with an S3 Bucket and Batch Queue that you define at launch.

Name: Cromwell Server
Description: Creates an EC2 instance and an IAM instance profile to run Cromwell.

Once the stack is created, you can access the server in a web browser via the instance's hostname. There you should see Cromwell's SwaggerUI, which provides a simple web interface for submitting workflows.

The CloudFormation template above also integrates the server with Amazon CloudWatch for monitoring Cromwell's log output, and with AWS Systems Manager for performing maintenance or gaining terminal access.

For details on how this instance was constructed - e.g. if you want to customize it for your purposes - check out the template source and read the sections below.

Cromwell server requirements

This instance needs the following:

  • Java 8 (per Cromwell's requirements)
  • The latest version of Cromwell with AWS Batch backend support (v35+)
  • Permissions to
    • read from the S3 bucket used for input and output data
    • submit / describe / cancel / terminate jobs to AWS Batch queues

The permissions above can be added to the instance via policies in an instance profile. Example policies are shown below:

Access to AWS Batch

Lets the Cromwell server instance submit and get info about AWS Batch jobs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CromwellServer-BatchPolicy",
            "Effect": "Allow",
            "Action": [
                "batch:DescribeJobQueues",
                "batch:DeregisterJobDefinition",
                "batch:TerminateJob",
                "batch:DescribeJobs",
                "batch:CancelJob",
                "batch:SubmitJob",
                "batch:RegisterJobDefinition",
                "batch:DescribeJobDefinitions",
                "batch:ListJobs",
                "batch:DescribeComputeEnvironments"
            ],
            "Resource": "*"
        }
    ]
}

Access to S3

Lets the Cromwell server instance read and write data from/to S3 - e.g. the per-job return codes written to rc.txt files.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CromwellServer-S3Policy",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<bucket-name>",
                "arn:aws:s3:::<bucket-name>/*"
            ]
        }
    ]
}
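One way to attach these policies, sketched with the AWS CLI. The role and profile names are illustrative, and batch-policy.json / s3-policy.json are assumed to be the two policy documents above saved locally:

```shell
# Trust policy letting EC2 instances assume the role
cat > ec2-trust.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
EOF

# Save the two policy documents above as batch-policy.json and s3-policy.json,
# then run the IAM calls; they are skipped when no credentials are configured
if aws sts get-caller-identity > /dev/null 2>&1; then
    aws iam create-role --role-name CromwellServerRole \
        --assume-role-policy-document file://ec2-trust.json
    aws iam put-role-policy --role-name CromwellServerRole \
        --policy-name CromwellServer-BatchPolicy \
        --policy-document file://batch-policy.json
    aws iam put-role-policy --role-name CromwellServerRole \
        --policy-name CromwellServer-S3Policy \
        --policy-document file://s3-policy.json
    aws iam create-instance-profile \
        --instance-profile-name CromwellServerProfile
    aws iam add-role-to-instance-profile \
        --instance-profile-name CromwellServerProfile \
        --role-name CromwellServerRole
fi
```

The instance profile can then be selected when launching the EC2 instance.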

Configuring Cromwell to use AWS Batch

The following is an example *.conf file to use the AWSBackend.

// cromwell.conf
include required(classpath("application"))

webservice {
    interface = localhost
    port = 8000
}

// this stanza controls how fast Cromwell submits jobs to AWS Batch
// and avoids running into API request limits
system {
    job-rate-control {
        jobs = 1
        per = 2 second
    }
}

// this stanza defines how your server will authenticate with your AWS
// account.  it is recommended to use the "default-credential-provider" scheme.
aws {
  application-name = "cromwell"
  auths = [{
      name = "default"
      scheme = "default"
  }]

  // you must provide your operating region here - e.g. "us-east-1"
  // this should be the same region your S3 bucket and AWS Batch resources
  // are created in
  region = "<your region>"
}

engine {
  filesystems {
    s3 { auth = "default" }
  }
}

backend {
  // this configures the AWS Batch Backend for Cromwell
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        root = "s3://<your-s3-bucket-name>/cromwell-execution"
        auth = "default"

        numSubmitAttempts = 3
        numCreateDefinitionAttempts = 3

        default-runtime-attributes {
          queueArn: "<your-queue-arn>"
        }

        filesystems {
          s3 {
            auth = "default"
          }
        }
      }
    }
  }
}

The above file uses the default credential provider chain for authorization.

Replace the following with values appropriate for your account and workload:

  • <your region> : the AWS region your S3 bucket and AWS Batch environment are deployed into - e.g. us-east-1
  • <your-s3-bucket-name> : the name of the S3 bucket you will use for inputs and outputs from tasks in the workflow.
  • <your-queue-arn> : the Amazon Resource Name (ARN) of the AWS Batch queue you want to use for your tasks.

Start the Cromwell server

Note

The CloudFormation template above automatically starts Cromwell on launch. Use the instructions below if you are provisioning your own EC2 instance.

Log into your server using SSH. If you set up a port tunnel, you can interact with Cromwell's REST API from your local machine:

$ ssh -L localhost:8000:localhost:8000 ec2-user@<cromwell server host or ip>

This port tunnel only needs to be open for submitting workflows. You do not need to be connected to the server while a workflow is running.

Launch the server using the following command:

$ java -Dconfig.file=cromwell.conf -jar cromwell-35.jar server

Note

If you plan on having this server run for a while, it is recommended you use a utility like screen or tmux so that you can log out while keeping Cromwell running. Alternatively, you could start Cromwell as a detached process in the background using nohup.
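For example, a detached start with nohup (assuming cromwell.conf and cromwell-35.jar sit in the current directory) might look like:

```shell
# Start Cromwell in the background, immune to hangups, logging to cromwell.log
nohup java -Dconfig.file=cromwell.conf -jar cromwell-35.jar server > cromwell.log 2>&1 &

# Record the process id so the server can be stopped later with: kill $(cat cromwell.pid)
echo $! > cromwell.pid
```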

You should now be able to access Cromwell's SwaggerUI from a web browser on your local machine by navigating to:

http://localhost:8000/
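You can also confirm the server is responding from a terminal; this assumes the port tunnel above is open:

```shell
# Query Cromwell's version endpoint through the tunnel; prints a small JSON
# document (e.g. the Cromwell version) when the server is up
curl -s http://localhost:8000/engine/v1/version \
    || echo "Cromwell server not reachable on localhost:8000"
```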

Running a workflow

To submit a workflow to your Cromwell server, you can use any of the following:

  • Cromwell's SwaggerUI in a web-browser
  • a REST client like Insomnia or Postman
  • the command line with curl

After submitting a workflow, you can monitor the progress of tasks via the AWS Batch console.
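As one sketch of the command-line route, the snippet below writes a minimal hello-world WDL file and submits it to the server's REST API over the port tunnel. The workflow itself and the localhost:8000 address are illustrative:

```shell
# A minimal WDL workflow used purely for illustration
cat > hello.wdl <<'EOF'
task hello {
  command { echo "Hello from AWS Batch" }
  runtime { docker: "ubuntu:latest" }
  output { String out = read_string(stdout()) }
}

workflow hello_wf {
  call hello
}
EOF

# Submit it; on success Cromwell returns JSON containing the new workflow id
curl -s -X POST http://localhost:8000/api/workflows/v1 \
    -F workflowSource=@hello.wdl \
    || echo "Cromwell server not reachable on localhost:8000"

# Later, check progress with the returned id (replace <workflow-id>):
# curl -s http://localhost:8000/api/workflows/v1/<workflow-id>/status
```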

The next section provides some examples of running Cromwell on AWS.