Skip to content

Core: Custom Compute Resources

Genomics is a data-heavy workload and requires some modification to the defaults used by AWS Batch for job processing. To efficiently use resources, AWS Batch places multiple jobs on an worker instance. The data requirements for individual jobs can range from a few MB to 100s of GB. Instances running workflow jobs will not know beforehand how much space is required, and need scalable storage to meet unpredictable runtime demands.

To handle this use case, we can use a process that monitors a mountpoint on an instance and expands free space as needed based on capacity thresholds. This can be done using logical volume management and attaching EBS volumes as needed to the instance like so:

Autoscaling EBS storage

The above process - "EBS autoscaling" - requires a few small dependencies and a simple daemon installed on the host instance.

By default, AWS Batch uses the Amazon ECS-Optimized AMI to launch instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large storage requirements noted above, require customization of the base AMI. Because the provisioning requirements for EBS autoscaling are fairly simple and light weight, one can use an EC2 Launch Template to customize instances.

EC2 Launch Template

The simplest method for customizing an instance is to use an EC2 Launch Template. This works best if your customizations are relatively light - such as installing a few small utilities or making specific configuration changes.

This is because Launch Templates run a UserData script when an instance first launches. The longer these scripts / customizations take to complete, the longer it will be before your instance is ready for work.

Launch Templates are capable of pre-configuring a lot of EC2 instance options. Since this will be working with AWS Batch, which already does a lot of automatic instance configuration on its own, you only need to supply a UserData script like the one below:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

packages:
- jq
- btrfs-progs
- sed
- wget
- unzip
# add more package names here if you need them

runcmd:
- curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "/tmp/awscliv2.zip"
- unzip -q /tmp/awscliv2.zip -d /tmp
- /tmp/aws/install

###
# add provisioning commands here
###

--==BOUNDARY==--

Info

The MIME boundaries are required in the UserData script if the Launch Template is to be used with AWS Batch. See AWS Batch's Launch Template Support documentation to learn more.

In the set of provisioning commands you can add steps that retrieve and install Amazon EBS Autoscale. For example, this documentation comes with a tarball that is publicly hosted on S3, allowing you to run the following to provision your instances:

cd /opt && wget https://aws-genomics-workflows.s3.amazonaws.com/artifacts/amazon-ebs-autoscale.tgz && tar -xzf amazon-ebs-autoscale.tgz
sh /opt/ebs-autoscale/install.sh

The above will install an ebs-autoscale daemon on an instance. By default it will add a 100GB EBS volume to the logical volume mounted at /scratch.

The mount point is specific to your use case, and /scratch is considered a generic default. For instances launched by AWS Batch for running containerized jobs, a good option is to apply Amazon EBS Autoscaling to /var/lib/docker - the location where Docker stores container volumes and metadata and is the location that containers use for their filesystems. Making this auto-expand allows containers to pull in any amount of data they need.

Flexible provisioning

Provisioning needs can change over time. For instance, updrades to existing software or swapping out tools altogether. While you could explicitly write all the commands you initially need in UserData, any future changes will require creating a new revision of the Launch Template and subsequently require rebuilding AWS Batch resources. For more flexibility, you can store your provisioning scripts on S3 and have the Launch Template retrieve and run them. This allows you to create a generic and relatively immutable Launch Template, and update provisioning steps by uploading new scripts to S3.

You can automate this further using a Git repository like AWS CodeCommit to track versions of your provisioning scripts and use CI/CD pipelines with AWS CodeBuild and AWS CodePipeline to deploy new releases to S3.

All of the above can be create in an automated fashion with CloudFormation. Look at the source code for the template below to see how this is done.

Name Description Source Launch Stack
Provisioning Code Creates and installs code and artifacts used to run subsequent templates and provision EC2 instances cloud_download play_arrow

Info

The launch button has been disabled above since this template is part of a set of nested templates. It is not recommended to launch it independently of its intended parent stack.

Creating an EC2 Launch Template

Instructions on how to create a launch template are below. Once your Launch Template is created, you can reference it when you setup resources in AWS Batch to ensure that jobs run therein have your customizations available to them.

Automated via CloudFormation

You can use the following CloudFormation template to create a Launch Template suitable for your needs.

Name Description Source Launch Stack
EC2 Launch Template Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks. cloud_download play_arrow

Info

The launch button has been disabled above since this template is part of a set of nested templates. It is not recommended to launch it independently of its intended parent stack.

Manually via the AWS Console

  • Go to the EC2 Console
  • Click on "Launch Templates" (under "Instances")
  • Click on "Create launch template"
  • Under "Launch template name and description"

    • Use "genomics-workflow-template" for "Launch template name"
  • Under "Storage volumes"

    • Click "Add new volume" - this will add an entry called "Volume 3 (custom)"
      • Set Size to 100 GiB
      • Set Delete on termination to Yes
      • Set Device name to /dev/xvdba
      • Set Volume type to General purpose SSD (gp2)
      • Set Encrypted to Yes

Info

Volume 1 is used for the root filesystem. The default size of 8GB is typically sufficient.

Volume 2 is the default volume used by Amazon ECS Optimized AMIs for Docker image and metadata volume using Docker's devicemapper driver. We'll be replacing this volume with the custom one you created, but need to keep it for compatibility.

Volume 3 (the one you created above) will be used for job scratch space. This will be mapped to /var/lib/docker which is used for container storage - i.e. what each running container will use to create its internal filesystem.

  • Expand the "Advanced details" section

    • Add the following script to User data
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

packages:
- jq
- btrfs-progs
- sed
- wget
- unzip
# add more package names here if you need them

runcmd:
- curl -s "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "/tmp/awscliv2.zip"
- unzip -q /tmp/awscliv2.zip -d /tmp
- /tmp/aws/install

###
# add provisioning commands here
###

--==BOUNDARY==--

Important

You will need to replace # add provisioning commands here with something appropriate

  • Click on "Create launch template"

Custom AMIs

A slightly more involved method for customizing an instance is to create a new AMI based on the ECS Optimized AMI. This is good if you have a lot of customization to do - lots of software to install and/or need large datasets preloaded that will be needed by all your jobs.

You can learn more about how to create your own AMIs in the EC2 userguide.

If needed, you can use Custom AMIs and Lauch Templates together - e.g. for a case where you need to ship preloaded datesets or use AMI packaged software from the AWS Marketplace and use the flexible provisioning options Launch Templates provide.

Note

This is considered advanced use. All documentation and CloudFormation templates hereon assumes use of EC2 Launch Templates.