AWS Batch is an always-on job scheduler and resource orchestrator that lets you easily and efficiently run thousands of containerized applications.
Workflow builders love it for scaling their workloads, from machine learning to genomics. Batch scales from one job to millions of jobs, and takes away the chore of spinning up fleets of compute instances and keeping them busy.
Scale for all your needs
AWS Batch efficiently and dynamically provisions and scales compute on your behalf. Batch can scale from one job to millions of jobs. Our largest analysis (so far) used Batch to orchestrate over five million vCPUs across multiple AWS Regions. And once your work is done, Batch handles scaling down those resources too!
Batch leverages AWS scaling technologies like EC2 Fleet and Spot Fleet, which are used by thousands of customers every day to elastically meet their computing demands.
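To give you a feel for what this looks like in practice, here's a minimal sketch using the AWS SDK for Python (boto3) that creates a managed compute environment able to scale from zero vCPUs (nothing running, nothing billed) up to 10,000 vCPUs. The names, subnet, security group, and IAM role ARNs are placeholders you'd swap for your own.

```python
import boto3

batch = boto3.client("batch")

# A managed compute environment: Batch provisions instances when jobs are
# waiting and scales back to zero when the queues drain.
# All names, IDs, and ARNs below are placeholders for illustration.
batch.create_compute_environment(
    computeEnvironmentName="demo-ondemand-ce",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",                    # On-Demand instances
        "allocationStrategy": "BEST_FIT_PROGRESSIVE",
        "minvCpus": 0,                    # scale all the way down when idle
        "maxvCpus": 10000,                # upper bound on the fleet
        "instanceTypes": ["optimal"],     # let Batch pick from the C, M, and R families
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::111122223333:role/AWSBatchServiceRole",
)
```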
Cost and throughput optimized
AWS Batch optimizes for throughput and cost. It does so by scaling compute resources to process jobs in the job queue using allocation strategies that fit your business needs and budget. Batch can also use EC2 Spot Instances to save you up to 90% compared to On-Demand prices, with a preference for instance types that are less likely to be interrupted.
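As a sketch (not a recommendation), pointing the compute environment above at Spot capacity is mostly a matter of swapping in a different computeResources block; the network and IAM identifiers are again placeholders.

```python
# Swap this into the create_compute_environment call above to run on Spot.
spot_compute_resources = {
    "type": "SPOT",
    # Prefer deep Spot capacity pools, which are less likely to be interrupted.
    "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
    "bidPercentage": 100,                 # never pay more than the On-Demand price
    "minvCpus": 0,
    "maxvCpus": 10000,
    "instanceTypes": ["optimal"],
    "subnets": ["subnet-0123456789abcdef0"],
    "securityGroupIds": ["sg-0123456789abcdef0"],
    "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
}
```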
Secure by design
Responsibility for security at AWS is shared between you and AWS. AWS protects the infrastructure that runs all the services it offers, while you protect the assets you run on AWS.
AWS Batch uses IAM to control and monitor the AWS resources that your jobs can access, such as Amazon DynamoDB tables. Through IAM, you can also define policies for different users in your organization. For example, administrators can be granted full access permissions to any AWS Batch API operation, developers can have limited permissions related to configuring compute environments and registering jobs, and end users can be restricted to the permissions needed to submit and delete jobs.
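As an illustration (not a production-ready policy), the sketch below attaches an inline IAM policy that limits a hypothetical end user to submitting, listing, describing, and cancelling jobs; the user name is a placeholder, and in practice you would scope the resources more tightly.

```python
import json

import boto3

iam = boto3.client("iam")

# Inline policy for an "end user" persona: submit and manage jobs, but no
# permission to create or modify compute environments or job queues.
end_user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "batch:SubmitJob",
                "batch:CancelJob",
                "batch:TerminateJob",
                "batch:DescribeJobs",
                "batch:ListJobs",
            ],
            "Resource": "*",  # tighten to specific queue/job-definition ARNs in practice
        }
    ],
}

iam.put_user_policy(
    UserName="batch-end-user",            # placeholder user
    PolicyName="BatchEndUserAccess",
    PolicyDocument=json.dumps(end_user_policy),
)
```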
Advanced scheduling capabilities
With AWS Batch, you can set up multiple queues with different priority levels. Batch jobs are stored in the queues until compute resources are available to run them. The AWS Batch scheduler evaluates when, where, and how to run jobs that have been submitted to a queue based on the resource requirements of each job. The scheduler evaluates the priority of each queue and runs jobs in priority order on optimal compute resources (for example, memory-optimized compared to CPU-optimized), as long as those jobs have no outstanding dependencies.
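Here's a minimal sketch of that setup with boto3: two queues sharing one compute environment (the higher priority number wins when both have runnable jobs), plus a job that is held until its dependency succeeds. The compute environment ARN and job definition name are placeholders.

```python
import boto3

batch = boto3.client("batch")

# Placeholder ARN for the compute environment created earlier.
ce_arn = "arn:aws:batch:us-east-1:111122223333:compute-environment/demo-ondemand-ce"

# Two queues on the same compute environment; the higher priority value wins.
for name, priority in [("prod-queue", 100), ("dev-queue", 1)]:
    batch.create_job_queue(
        jobQueueName=name,
        state="ENABLED",
        priority=priority,
        computeEnvironmentOrder=[{"order": 1, "computeEnvironment": ce_arn}],
    )

# Submit a preprocessing job, then a job that only starts once it succeeds.
prep = batch.submit_job(jobName="prep", jobQueue="prod-queue", jobDefinition="my-jobdef")
batch.submit_job(
    jobName="analyze",
    jobQueue="prod-queue",
    jobDefinition="my-jobdef",
    dependsOn=[{"jobId": prep["jobId"]}],   # held until "prep" completes successfully
)
```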
Integrated monitoring and logging
AWS Batch displays key operational metrics for your batch jobs in the AWS Management Console. You can view metrics related to compute capacity, as well as metrics for running, pending, and completed jobs. Logs for your jobs (for example, STDERR and STDOUT) are available in the console and are also written to Amazon CloudWatch Logs. You can leverage this information to provide insights on your jobs and the instances used to run them.
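If you'd rather pull a job's logs programmatically, one rough sketch: look up the job's log stream from describe_jobs, then read it from the /aws/batch/job log group. The job ID below is a placeholder, and the log stream only exists once the job has started running.

```python
import boto3

batch = boto3.client("batch")
logs = boto3.client("logs")

job_id = "00000000-0000-0000-0000-000000000000"  # placeholder job ID

# The job's container carries the CloudWatch Logs stream name once it has started.
job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
stream = job["container"]["logStreamName"]

# By default, Batch jobs write STDOUT/STDERR to the /aws/batch/job log group.
events = logs.get_log_events(
    logGroupName="/aws/batch/job",
    logStreamName=stream,
    startFromHead=True,
)
for event in events["events"]:
    print(event["message"])
```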
Cloud-native
AWS Batch was built on the cloud, using AWS cloud technologies. This means you can integrate AWS Batch with services like Amazon CloudWatch, AWS Lambda, and AWS Step Functions to process events, orchestrate jobs, manage data, and handle other mission-critical tasks across your entire business, not just your HPC workloads.
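One common pattern is a small Lambda function that turns an event into a Batch job. The sketch below assumes an EventBridge rule routes S3 "Object Created" notifications to the function; the queue and job definition names are placeholders.

```python
import re

import boto3

batch = boto3.client("batch")


def handler(event, context):
    # Assumes an EventBridge "Object Created" notification from Amazon S3
    # is routed to this function (placeholder wiring).
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]

    # Job names only allow letters, numbers, hyphens, and underscores.
    job_name = "process-" + re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]

    response = batch.submit_job(
        jobName=job_name,
        jobQueue="prod-queue",            # placeholder queue name
        jobDefinition="my-jobdef",        # placeholder job definition
        containerOverrides={
            "environment": [
                {"name": "INPUT_S3_URI", "value": f"s3://{bucket}/{key}"}
            ]
        },
    )
    return {"jobId": response["jobId"]}
```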
Latest compute
Your AWS Batch jobs run on Amazon EC2. With over 500 (and growing) instance types available, you can tailor your Batch compute environments to specific workloads. You can leverage the latest x86 CPUs from Intel and AMD, AWS Graviton (our Arm-based processors), and accelerators like AWS Trainium or powerful NVIDIA GPUs. If a new instance type meets your needs, adding it to your production infrastructure is as simple as changing a configuration setting.
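For instance, asking for a GPU is just another resource requirement on the job definition. A sketch, with a placeholder container image and resource sizes:

```python
import boto3

batch = boto3.client("batch")

# A job definition that requests one NVIDIA GPU alongside vCPU and memory.
# The container image is a placeholder; Batch places the job on a GPU-capable
# instance type available in the compute environment (e.g., the G or P families).
batch.register_job_definition(
    jobDefinitionName="gpu-training-jobdef",
    type="container",
    containerProperties={
        "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/training:latest",
        "command": ["python", "train.py"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "8"},
            {"type": "MEMORY", "value": "32768"},   # MiB
            {"type": "GPU", "value": "1"},
        ],
    },
)
```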
Learn More About AWS Batch
First, let’s make sure you’re comfortable that AWS Batch is the right tool for your workloads. AWS also has a command line tool for standing up traditional HPC clusters called AWS ParallelCluster. AWS ParallelCluster builds on many of the same AWS technologies as AWS Batch, so it is also scalable, flexible, and adaptable to a wide range of use cases.
If you’re familiar with using a traditional HPC resource - like a SLURM cluster - you may wonder what makes AWS Batch different. To find out, read our post to help you choose between AWS Batch and AWS ParallelCluster.
Workflow engines love Batch
If you’re working with Nextflow or Cromwell natively, then you’ll probably love finding out about the AWS Genomics CLI, which does pretty much all the boring setup work for you so you can be running Nextflow pipelines in around half an hour (from a standing start).
- Genomics Workflows on AWS - Cromwell
- Cromwell on AWS - blog post by Mark Schreiber about an improved integration.
Use cases
- Data Science workflows at insitro: advanced service features from AWS Batch and AWS Glue - [Part 1] [Part 2]
- Bayesian ML Models at Scale with AWS Batch [Blog Post] || [Video] - with the data science team from Ampersand in New York.
- Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx
- Optimize Protein Folding Costs with OpenFold on AWS Batch
- Protein folding in the cloud - a protein primer with Brian Loyal
- AlphaFold vs OpenFold - accelerating time to result in protein folding
- Analyzing Genomic Data using Amazon Genomics CLI and Amazon SageMaker
- Accelerating drug discovery with Amazon EC2 Spot Instances
- Running 20k simulations in 3 days to accelerate early stage drug discovery with AWS Batch
- miniWDL workflows with 100% cloud elasticity, and no DevOps geekery
- Nextflow Tower and how it makes it easy to manage a lot of infrastructure quickly.
- Genomics workflow setup made easy with AWS Genomics CLI
- Getting Started with NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation