AWS Batch is an always-on job scheduler and resource orchestrator that lets you easily and efficiently run thousands of containerized applications.
Workflow builders love it for scaling their workloads, from machine learning to genomics. Batch scales from one job to millions of jobs, and takes away the chore of spinning up fleets of compute instances and keeping them busy.
Scale for all your needs
AWS Batch efficiently and dynamically provisions and scales compute on your behalf. Batch can scale from one job to millions of jobs. Our largest analysis (so far) used Batch to orchestrate over five million vCPUs across multiple AWS Regions. And once your work is done, Batch handles scaling down those resources too!
Batch leverages AWS scaling technologies like EC2 Fleet and Spot Fleet, which are used by thousands of customers every day to elastically meet their computing demands.
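To give you a feel for what this looks like in practice, here's a minimal sketch using the AWS SDK for Python (boto3) that creates a managed compute environment able to scale from zero vCPUs (nothing running, nothing billed) up to 10,000 vCPUs. The names, subnet, security group, and IAM role ARNs are placeholders you'd swap for your own.

```python
import boto3

batch = boto3.client("batch")

# A managed compute environment: Batch provisions instances when jobs are
# waiting and scales back to zero when the queues drain.
# All names, IDs, and ARNs below are placeholders for illustration.
batch.create_compute_environment(
    computeEnvironmentName="demo-ondemand-ce",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",                    # On-Demand instances
        "allocationStrategy": "BEST_FIT_PROGRESSIVE",
        "minvCpus": 0,                    # scale all the way down when idle
        "maxvCpus": 10000,                # upper bound on the fleet
        "instanceTypes": ["optimal"],     # let Batch pick from the C, M, and R families
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::111122223333:role/AWSBatchServiceRole",
)
```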
Cost and throughput optimized
AWS Batch optimizes for throughput and cost. It does so by scaling compute resources to process jobs in the job queue using allocation strategies that fit your business needs and budget. Batch can also use EC2 Spot Instances to save you up to 90% compared to On-Demand prices, with a preference for instance types that are less likely to be interrupted.
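As a sketch (not a recommendation), pointing the compute environment above at Spot capacity is mostly a matter of swapping in a different computeResources block; the network and IAM identifiers are again placeholders.

```python
# Swap this into the create_compute_environment call above to run on Spot.
spot_compute_resources = {
    "type": "SPOT",
    # Prefer deep Spot capacity pools, which are less likely to be interrupted.
    "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
    "bidPercentage": 100,                 # never pay more than the On-Demand price
    "minvCpus": 0,
    "maxvCpus": 10000,
    "instanceTypes": ["optimal"],
    "subnets": ["subnet-0123456789abcdef0"],
    "securityGroupIds": ["sg-0123456789abcdef0"],
    "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
}
```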
Secure by design
Responsibility for security at AWS is shared between you and AWS. AWS protects the infrastructure that runs all the services it offers, while you protect the assets you run on AWS.
AWS Batch uses IAM to control and monitor the AWS resources that your jobs can access, such as Amazon DynamoDB tables. Through IAM, you can also define policies for different users in your organization. For example, administrators can be granted full access permissions to any AWS Batch API operation, developers can have limited permissions related to configuring compute environments and registering jobs, and end users can be restricted to the permissions needed to submit and delete jobs.
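As an illustration (not a production-ready policy), the sketch below attaches an inline IAM policy that limits a hypothetical end user to submitting, listing, describing, and cancelling jobs; the user name is a placeholder, and in practice you would scope the resources more tightly.

```python
import json

import boto3

iam = boto3.client("iam")

# Inline policy for an "end user" persona: submit and manage jobs, but no
# permission to create or modify compute environments or job queues.
end_user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "batch:SubmitJob",
                "batch:CancelJob",
                "batch:TerminateJob",
                "batch:DescribeJobs",
                "batch:ListJobs",
            ],
            "Resource": "*",  # tighten to specific queue/job-definition ARNs in practice
        }
    ],
}

iam.put_user_policy(
    UserName="batch-end-user",            # placeholder user
    PolicyName="BatchEndUserAccess",
    PolicyDocument=json.dumps(end_user_policy),
)
```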
Advanced scheduling capabilities
With AWS Batch, you can set up multiple queues with different priority levels. Batch jobs are stored in the queues until compute resources are available to run them. The AWS Batch scheduler evaluates when, where, and how to run jobs that have been submitted to a queue based on the resource requirements of each job. The scheduler evaluates the priority of each queue and runs jobs in priority order on optimal compute resources (for example, memory-optimized compared to CPU-optimized), as long as those jobs have no outstanding dependencies.
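Here's a minimal sketch of that setup with boto3: two queues sharing one compute environment (the higher priority number wins when both have runnable jobs), plus a job that is held until its dependency succeeds. The compute environment ARN and job definition name are placeholders.

```python
import boto3

batch = boto3.client("batch")

# Placeholder ARN for the compute environment created earlier.
ce_arn = "arn:aws:batch:us-east-1:111122223333:compute-environment/demo-ondemand-ce"

# Two queues on the same compute environment; the higher priority value wins.
for name, priority in [("prod-queue", 100), ("dev-queue", 1)]:
    batch.create_job_queue(
        jobQueueName=name,
        state="ENABLED",
        priority=priority,
        computeEnvironmentOrder=[{"order": 1, "computeEnvironment": ce_arn}],
    )

# Submit a preprocessing job, then a job that only starts once it succeeds.
prep = batch.submit_job(jobName="prep", jobQueue="prod-queue", jobDefinition="my-jobdef")
batch.submit_job(
    jobName="analyze",
    jobQueue="prod-queue",
    jobDefinition="my-jobdef",
    dependsOn=[{"jobId": prep["jobId"]}],   # held until "prep" completes successfully
)
```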
Integrated monitoring and logging
AWS Batch displays key operational metrics for your batch jobs in the AWS Management Console. You can view metrics related to compute capacity, as well as metrics for running, pending, and completed jobs. Logs for your jobs (for example, STDERR and STDOUT) are available in the console and are also written to Amazon CloudWatch Logs. You can leverage this information to provide insights on your jobs and the instances used to run them.
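If you'd rather pull a job's logs programmatically, one rough sketch: look up the job's log stream from describe_jobs, then read it from the /aws/batch/job log group. The job ID below is a placeholder, and the log stream only exists once the job has started running.

```python
import boto3

batch = boto3.client("batch")
logs = boto3.client("logs")

job_id = "00000000-0000-0000-0000-000000000000"  # placeholder job ID

# The job's container carries the CloudWatch Logs stream name once it has started.
job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
stream = job["container"]["logStreamName"]

# By default, Batch jobs write STDOUT/STDERR to the /aws/batch/job log group.
events = logs.get_log_events(
    logGroupName="/aws/batch/job",
    logStreamName=stream,
    startFromHead=True,
)
for event in events["events"]:
    print(event["message"])
```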
Cloud-native
AWS Batch was built on the cloud, using AWS cloud technologies. This means you can integrate AWS Batch with services like Amazon CloudWatch, AWS Lambda, and AWS Step Functions to process events, orchestrate jobs, manage data, and handle other mission-critical tasks across your entire business, not just your HPC workloads.
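One common pattern is a small Lambda function that turns an event into a Batch job. The sketch below assumes an EventBridge rule routes S3 "Object Created" notifications to the function; the queue and job definition names are placeholders.

```python
import re

import boto3

batch = boto3.client("batch")


def handler(event, context):
    # Assumes an EventBridge "Object Created" notification from Amazon S3
    # is routed to this function (placeholder wiring).
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]

    # Job names only allow letters, numbers, hyphens, and underscores.
    job_name = "process-" + re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]

    response = batch.submit_job(
        jobName=job_name,
        jobQueue="prod-queue",            # placeholder queue name
        jobDefinition="my-jobdef",        # placeholder job definition
        containerOverrides={
            "environment": [
                {"name": "INPUT_S3_URI", "value": f"s3://{bucket}/{key}"}
            ]
        },
    )
    return {"jobId": response["jobId"]}
```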
Latest compute
Your AWS Batch jobs run on Amazon EC2. With over 500 (and growing) instance types available, you can tailor your Batch compute environments to specific workloads. You can leverage the latest x86 CPUs from Intel and AMD, AWS Graviton (our Arm-based processors), and accelerators like AWS Trainium or powerful NVIDIA GPUs. If a new instance type meets your needs, adding it to your production infrastructure is as simple as changing a configuration setting.
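For instance, asking for a GPU is just another resource requirement on the job definition. A sketch, with a placeholder container image and resource sizes:

```python
import boto3

batch = boto3.client("batch")

# A job definition that requests one NVIDIA GPU alongside vCPU and memory.
# The container image is a placeholder; Batch places the job on a GPU-capable
# instance type available in the compute environment (e.g., the G or P families).
batch.register_job_definition(
    jobDefinitionName="gpu-training-jobdef",
    type="container",
    containerProperties={
        "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/training:latest",
        "command": ["python", "train.py"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "8"},
            {"type": "MEMORY", "value": "32768"},   # MiB
            {"type": "GPU", "value": "1"},
        ],
    },
)
```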
Learn More About AWS Batch
First, let’s make sure you’re comfortable that AWS Batch is the right tool for your workloads. AWS also has a command line tool for standing up traditional HPC clusters called AWS ParallelCluster. AWS ParallelCluster builds on many of the same AWS technologies as AWS Batch, so it is also scalable, flexible, and adaptable to a wide range of use cases.
If you’re familiar with using a traditional HPC resource - like a SLURM cluster - you may wonder what makes AWS Batch different. To find out, read our post to help you choose between AWS Batch and AWS ParallelCluster.
Workflow engines love Batch
If you’re working with Nextflow or Cromwell natively, then you’ll probably love finding out about the AWS Genomics CLI, which does pretty much all the boring setup work for you so you can be running Nextflow pipelines in around half an hour (from a standing start).
- Genomics Workflows on AWS - Cromwell
- Cromwell on AWS - blog post by Mark Schreiber about an improved integration.
Use cases
- Data Science workflows at insitro: advanced service features from AWS Batch and AWS Glue - [Part 1] [Part 2]
- Bayesian ML Models at Scale with AWS Batch [Blog Post] || [Video] - with the data science team from Ampersand in New York.
- Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx
- Optimize Protein Folding Costs with OpenFold on AWS Batch
- Protein folding in the cloud - a protein primer with Brian Loyal
- AlphaFold vs OpenFold - accelerating time to result in protein folding
- Analyzing Genomic Data using Amazon Genomics CLI and Amazon SageMaker
- Accelerating drug discovery with Amazon EC2 Spot Instances
- Running 20k simulations in 3 days to accelerate early stage drug discovery with AWS Batch
- miniWDL workflows with 100% cloud elasticity, and no DevOps geekery
- Nextflow Tower and how it makes it easy to manage a lot of infrastructure quickly.
- Genomics workflow setup made easy with AWS Genomics CLI
- Getting Started with NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation