
AWS Batch is an always-on job scheduler and resource orchestrator that lets you easily and efficiently run thousands of containerized applications.

Workflow builders love it for scaling their workloads, from machine learning to genomics. Batch scales from one job to millions of jobs, and takes away the chore of spinning up fleets of compute instances and keeping them busy.


Scale for all your needs

AWS Batch efficiently and dynamically provisions and scales compute on your behalf. Batch can scale from one job to millions of jobs. Our largest analysis (so far) used Batch to orchestrate over five million vCPUs across multiple AWS Regions. And once your work is done, Batch handles scaling down those resources too!

Batch leverages AWS scaling technologies like EC2 Fleet and Spot Fleet, which are used by thousands of customers every day to elastically meet their computing demands.
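As a rough sketch of what that looks like in practice, the boto3 snippet below creates a managed compute environment whose capacity Batch is free to scale between zero and a few thousand vCPUs. The environment name, subnet and security group IDs, and instance role name are placeholders you would swap for your own.

import boto3

batch = boto3.client("batch")

# A managed compute environment: Batch scales EC2 capacity between
# minvCpus and maxvCpus based on the jobs waiting in attached queues.
batch.create_compute_environment(
    computeEnvironmentName="my-managed-ce",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",
        "allocationStrategy": "BEST_FIT_PROGRESSIVE",
        "minvCpus": 0,                 # scale all the way down when idle
        "maxvCpus": 4096,              # ceiling Batch may scale up to
        "instanceTypes": ["optimal"],  # let Batch pick instance sizes
        "instanceRole": "ecsInstanceRole",
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
    },
    # serviceRole omitted: Batch falls back to its service-linked role.
)

With minvCpus set to zero, no instances run while the attached queues are empty.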


Cost and throughput optimized

AWS Batch optimizes for throughput and cost. It does so by scaling compute resources to process jobs in the job queue, using allocation strategies that fit your business needs and budget. Batch can also use EC2 Spot Instances to save up to 90% compared to On-Demand prices, with a preference for instance types that are less likely to be interrupted.
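To sketch how that maps onto the API, the computeResources block below is a Spot-backed variation on the compute environment shown earlier; it would be passed to the same create_compute_environment call, with the same placeholder IDs.

# Spot-backed variant of the computeResources block shown earlier.
compute_resources = {
    "type": "SPOT",
    # Prefer Spot pools with the most spare capacity, i.e. the ones
    # least likely to be reclaimed mid-job.
    "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
    "minvCpus": 0,
    "maxvCpus": 1024,
    "instanceTypes": ["optimal"],
    "instanceRole": "ecsInstanceRole",
    "subnets": ["subnet-0123456789abcdef0"],
    "securityGroupIds": ["sg-0123456789abcdef0"],
}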


Secure by design

Responsibility for security at AWS is shared between you and AWS. AWS protects the infrastructure that runs all the services it offers, while you protect the assets you run on AWS.

AWS Batch uses IAM to control and monitor the AWS resources that your jobs can access, such as Amazon DynamoDB tables. Through IAM, you can also define policies for different users in your organization. For example, administrators can be granted full access permissions to any AWS Batch API operation, developers can have limited permissions related to configuring compute environments and registering jobs, and end users can be restricted to the permissions needed to submit and delete jobs.
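As an illustration of that last point, here is a hypothetical IAM policy for end users that allows submitting, inspecting, and stopping jobs and nothing else. The policy name and the exact set of actions are examples, not a prescription.

import json
import boto3

iam = boto3.client("iam")

# Illustrative policy: end users may submit, inspect, and stop jobs,
# but cannot touch compute environments, queues, or job definitions.
end_user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "batch:SubmitJob",
                "batch:CancelJob",
                "batch:TerminateJob",
                "batch:DescribeJobs",
                "batch:ListJobs",
            ],
            "Resource": "*",
        }
    ],
}

iam.create_policy(
    PolicyName="BatchEndUserAccess",
    PolicyDocument=json.dumps(end_user_policy),
)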


Advanced scheduling capabilities

With AWS Batch, you can set up multiple queues with different priority levels. Batch jobs are stored in the queues until compute resources are available to run them. The AWS Batch scheduler evaluates when, where, and how to run jobs that have been submitted to a queue based on the resource requirements of each job. The scheduler evaluates the priority of each queue and runs jobs in priority order on optimal compute resources (for example, memory-optimized versus CPU-optimized), as long as those jobs have no outstanding dependencies.
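A minimal sketch of those two ideas, assuming the compute environment from earlier and an already registered job definition (both names are placeholders): two queues with different priorities, plus a job that waits on another job's success.

import boto3

batch = boto3.client("batch")

# Two queues sharing one compute environment; when both hold runnable
# jobs, the higher-priority queue is served first.
for name, priority in [("high-priority", 100), ("low-priority", 10)]:
    batch.create_job_queue(
        jobQueueName=name,
        state="ENABLED",
        priority=priority,
        computeEnvironmentOrder=[
            {"order": 1, "computeEnvironment": "my-managed-ce"},
        ],
    )

# Submit a job, then a second job that only starts once the first succeeds.
first = batch.submit_job(
    jobName="prepare-data",
    jobQueue="high-priority",
    jobDefinition="my-job-definition",
)
batch.submit_job(
    jobName="analyze-data",
    jobQueue="high-priority",
    jobDefinition="my-job-definition",
    dependsOn=[{"jobId": first["jobId"]}],
)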


Integrated monitoring and logging

AWS Batch displays key operational metrics for your batch jobs in the AWS Management Console. You can view metrics related to compute capacity, as well as metrics for running, pending, and completed jobs. Logs for your jobs (for example, STDERR and STDOUT) are available in the console and are also written to Amazon CloudWatch Logs. You can leverage this information to provide insights on your jobs and the instances used to run them.
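For example, the sketch below pulls a finished job's log output back through the APIs, assuming the default /aws/batch/job log group and a placeholder job ID.

import boto3

batch = boto3.client("batch")
logs = boto3.client("logs")

job_id = "00000000-0000-0000-0000-000000000000"  # placeholder job ID

# The job's container section reports the CloudWatch Logs stream it wrote to.
job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
stream_name = job["container"]["logStreamName"]

# By default, Batch jobs log to the /aws/batch/job log group.
response = logs.get_log_events(
    logGroupName="/aws/batch/job",
    logStreamName=stream_name,
    startFromHead=True,
)
for event in response["events"]:
    print(event["message"])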


Cloud-native

AWS Batch was built on the cloud, using AWS cloud technologies. This means you can integrate AWS Batch with services like Amazon CloudWatch, AWS Lambda, and AWS Step Functions to process events, orchestrate jobs, manage data, and handle other mission-critical tasks across your entire business, not just your HPC workloads.
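As one small example of that event-driven style, a Lambda function can hand work to Batch whenever an event arrives. The queue and job definition names below are placeholders, and the handler is a sketch rather than a complete integration.

import boto3

batch = boto3.client("batch")

def handler(event, context):
    # Sketch of an event-driven Lambda handler: whatever triggered the
    # function (EventBridge, S3, Step Functions, ...), submit a Batch job.
    response = batch.submit_job(
        jobName="event-driven-job",
        jobQueue="high-priority",           # placeholder queue name
        jobDefinition="my-job-definition",  # placeholder job definition
    )
    return {"jobId": response["jobId"]}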


Latest compute

Your AWS Batch jobs run on Amazon EC2. With over 500 (and growing) instance types available, you can tailor your Batch compute environments to specific workloads. You can leverage the latest x86 CPUs from Intel and AMD, AWS Graviton (our Arm-based processors), and accelerators like AWS Trainium or powerful NVIDIA GPUs. If a new instance type meets your needs, adding it to your production infrastructure is as simple as changing a configuration setting.
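As a sketch of what such a configuration change might look like, the call below points an existing managed compute environment at Graviton instance families. It assumes the environment was created with an allocation strategy that supports infrastructure updates (such as BEST_FIT_PROGRESSIVE), and the environment name is a placeholder.

import boto3

batch = boto3.client("batch")

# Swap the environment's instance selection over to Graviton families.
batch.update_compute_environment(
    computeEnvironment="my-managed-ce",
    computeResources={
        "instanceTypes": ["c7g", "m7g"],
    },
)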


Learn more about AWS Batch

First, let’s make sure you’re comfortable that AWS Batch is the right tool for your workloads. AWS also has a command line tool for standing up traditional HPC clusters called AWS ParallelCluster. AWS ParallelCluster builds on many of the same AWS technologies as AWS Batch, so it is also scalable, flexible, and adaptable to a wide range of use cases.

If you’re familiar with using a traditional HPC resource - like a Slurm cluster - you may wonder what makes AWS Batch different. To find out, read our post to help you choose between AWS Batch and AWS ParallelCluster.


Workflow engines love Batch

If you’re working with Nextflow or Cromwell natively, then you’ll probably love finding out about the AWS Genomics CLI, which does pretty much all the boring setup work for you and gets you running Nextflow pipelines in around half an hour (from a standing start).

Use cases