AWS ParallelCluster helps you build and manage cost-efficient HPC clusters on AWS.
ParallelCluster HPC systems are powered by the latest compute architectures and are as scalable as AWS itself. They feature advanced networking and high-performance storage to handle challenging workloads. Yet they are also easy to use, with familiar software environments, and are designed with a high degree of security and compliance in mind to help protect your data.
And they’re cloud-ready so you can integrate them with other services and stacks.
Your HPC clusters run on Amazon EC2. With more than 500 instance types available (and growing), you can tailor your cluster’s compute architecture to specific workloads. You can build clusters powered by the latest x86 CPUs from Intel and AMD, AWS Graviton (our Arm-based processors), and accelerators like AWS Trainium or powerful NVIDIA GPUs. If a new instance type meets your needs, adding it to your production infrastructure is as simple as changing a configuration setting.
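As a minimal sketch of that idea, the snippet below writes a queue definition in the ParallelCluster 3 YAML configuration format, then derives a copy that swaps an Intel instance type for an AMD one. The queue and compute-resource names, and the choice of c6i.32xlarge and c6a.48xlarge, are illustrative assumptions. Both are x86_64 types, because a cluster's head node and compute nodes generally need to share an architecture. This is a fragment, not a complete cluster config (a real file also needs head node and networking settings, among others). On a running cluster you would apply such a change with `pcluster update-cluster`, which for some settings may require stopping the compute fleet first; check the update policy for your version.

```shell
# Hypothetical queue definition in ParallelCluster 3 YAML format.
# "compute" and "cpu" are placeholder names.
cat > queue-intel.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: cpu
          InstanceType: c6i.32xlarge
EOF

# Moving the queue to a different (same-architecture) instance type
# changes a single value in the file.
sed 's/InstanceType: c6i.32xlarge/InstanceType: c6a.48xlarge/' \
  queue-intel.yaml > queue-amd.yaml
cat queue-amd.yaml
```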
Rather than being statically provisioned for your peak workload, your AWS-based clusters can be dynamic. This means they scale up their resource footprint when you have jobs to run, then scale back down again when you don’t. ParallelCluster builds on Slurm, a popular open-source job scheduler, and integrates it with AWS scaling technologies like EC2 Fleet and Spot Fleet used by thousands of customers every day to elastically meet their computing demands.
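A sketch of how that elasticity is typically expressed in a ParallelCluster 3 configuration, assuming a single Spot queue: `MinCount: 0` lets the queue shrink to zero instances when idle, `MaxCount` caps how far Slurm can scale it out, and `ScaledownIdletime` sets how many minutes an idle dynamic node waits before it is terminated. The queue name, instance type, and numbers are placeholders, not recommendations.

```shell
# Hypothetical elastic queue fragment (ParallelCluster 3 YAML).
cat > scaling.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ScaledownIdletime: 10   # minutes an idle dynamic node waits before termination
  SlurmQueues:
    - Name: spot
      CapacityType: SPOT    # use Spot capacity for this queue
      ComputeResources:
        - Name: cpu
          InstanceType: c6a.48xlarge
          MinCount: 0       # no always-on nodes: scale to zero when idle
          MaxCount: 64      # upper bound on nodes launched for queued jobs
EOF
cat scaling.yaml
```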
Elastic Fabric Adapter (EFA), our high-performance network interface for data-center-scale workloads, is available on many modern EC2 instance types, and ParallelCluster sets it up for you. You can also reduce inter-instance latency with placement groups, which put your HPC EC2 instances in close physical proximity in our data centers. Your clusters can also use any combination of public and private networking in your VPCs, SSH bastion hosts, static IP addresses, and policy-based network security groups, all configurable using ParallelCluster.
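The fragment below sketches how EFA and a cluster placement group might be enabled for one queue in ParallelCluster 3 YAML. It assumes c5n.18xlarge as an EFA-capable instance type and uses a placeholder subnet ID. EFA is switched on per compute resource, while the placement group is a queue-level networking setting.

```shell
# Hypothetical MPI queue with EFA and a placement group (ParallelCluster 3 YAML).
# subnet-0123456789abcdef0 is a placeholder; use a subnet from your own VPC.
cat > efa-queue.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: mpi
      ComputeResources:
        - Name: c5n
          InstanceType: c5n.18xlarge
          MinCount: 0
          MaxCount: 16
          Efa:
            Enabled: true   # attach and configure EFA on these nodes
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
        PlacementGroup:
          Enabled: true     # launch the queue's nodes close together
EOF
cat efa-queue.yaml
```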
Data is just as important as compute in the modern HPC universe. ParallelCluster helps you provision, manage, and use high-performance shared filesystems built securely on your choice of AWS storage technologies. By default, your cluster’s home directory is exported to all compute nodes from an Amazon Elastic Block Store (EBS) volume. You can add more shared filesystems built on Amazon FSx for Lustre (as well as FSx for NetApp ONTAP and FSx for OpenZFS). If you choose FSx for Lustre, you can integrate with Amazon S3 and Amazon File Cache to manage the data lifecycle, control costs, and integrate with on-premises storage.
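As a sketch, an FSx for Lustre scratch filesystem linked to an S3 prefix could be declared like this in a ParallelCluster 3 config. The mount point, storage name, capacity, and bucket (`my-example-bucket`) are hypothetical placeholders; valid capacities depend on the deployment type, so check the FSx documentation before choosing one.

```shell
# Hypothetical shared-storage fragment (ParallelCluster 3 YAML).
cat > storage.yaml <<'EOF'
SharedStorage:
  - MountDir: /fsx
    Name: scratch
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200        # GiB
      DeploymentType: SCRATCH_2
      ImportPath: s3://my-example-bucket/input   # placeholder S3 location
EOF
cat storage.yaml
```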
You can choose from Amazon Linux, CentOS, or Ubuntu operating systems to build your clusters. Each features a best-in-class user environment that supports modern systems for managing software, such as Environment Modules, Conda, and Spack. You can access your clusters directly over SSH, or indirectly via AWS Systems Manager or EC2 Instance Connect. You can also use interactive desktops and visualization tools via NICE DCV, which is built into ParallelCluster systems. On the management side, ParallelCluster has a graphical interface (and a CLI) to model and control all the resources you need for your HPC applications. It also has a web service API you can use for advanced workflow management.
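For example, the operating system, SSH key, and NICE DCV desktop are each a configuration setting. The fragment below assumes Ubuntu 22.04 (`ubuntu2204`); the accepted `Os` values vary by ParallelCluster release. The instance type, subnet ID, and key pair name are placeholders. Once the cluster is running, `pcluster ssh` and `pcluster dcv-connect` are the CLI entry points for a shell and a desktop session, respectively.

```shell
# Hypothetical OS and head node fragment (ParallelCluster 3 YAML).
cat > headnode.yaml <<'EOF'
Image:
  Os: ubuntu2204       # supported values depend on the ParallelCluster version
HeadNode:
  InstanceType: c6a.2xlarge
  Networking:
    SubnetId: subnet-0123456789abcdef0
  Ssh:
    KeyName: my-key-pair   # an existing EC2 key pair (placeholder name)
  Dcv:
    Enabled: true          # enable a NICE DCV desktop on the head node
EOF
cat headnode.yaml
```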
Secure by design
Responsibility for security at AWS is shared between you and AWS. AWS protects the infrastructure that runs all the services it offers, while you protect the assets you run on AWS. ParallelCluster helps with this by offering secure-by-default configurations for user authentication and authorization, networking access, software installation and updates, and more. You can build on these foundations with additional filesystem encryption, IAM policies and roles, integration with Active Directory, networking configuration, and secrets management. These are all provided by AWS services and are usable from within ParallelCluster.
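Two of those building blocks, sketched as a ParallelCluster 3 fragment: an extra AWS-managed IAM policy attached to the head node (read-only S3 access, chosen only as an example) and an additional shared EBS volume with encryption turned on. The volume size, type, and mount point are placeholders. Without a `KmsKeyId`, the volume is encrypted with the account's default EBS key.

```shell
# Hypothetical security-related fragment (ParallelCluster 3 YAML).
cat > security.yaml <<'EOF'
HeadNode:
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
SharedStorage:
  - MountDir: /shared
    Name: shared
    StorageType: Ebs
    EbsSettings:
      VolumeType: gp3
      Size: 100          # GiB
      Encrypted: true    # encrypt the volume at rest
EOF
cat security.yaml
```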
Today’s HPC systems are often integrated with complex instrumentation and business systems, many of which are native to the web. Because ParallelCluster systems are built on the cloud using cloud technologies, you can integrate them with services like Amazon CloudWatch, AWS Lambda, and AWS Step Functions to process events, orchestrate jobs, manage data, and handle other mission-critical tasks that involve non-HPC systems.
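As one concrete integration point, ParallelCluster 3 can ship cluster logs to Amazon CloudWatch Logs and create a CloudWatch dashboard for the cluster. The fragment below sketches those settings with a 14-day log retention, an arbitrary value for illustration. Event-driven integrations with Lambda or Step Functions are built outside the cluster configuration and are not shown here.

```shell
# Hypothetical monitoring fragment (ParallelCluster 3 YAML).
cat > monitoring.yaml <<'EOF'
Monitoring:
  DetailedMonitoring: false
  Logs:
    CloudWatch:
      Enabled: true        # send cluster logs to CloudWatch Logs
      RetentionInDays: 14  # illustrative retention period
  Dashboards:
    CloudWatch:
      Enabled: true        # create a CloudWatch dashboard for the cluster
EOF
cat monitoring.yaml
```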
Learn More About ParallelCluster
First, let’s make sure ParallelCluster is the right path for your AWS workloads. We also have a fully cloud-native offering, AWS Batch. Batch builds on many of the same AWS technologies as ParallelCluster, so it is also scalable, flexible, and adaptable to a wide range of use cases. We have an in-depth blog post that explains when to choose AWS Batch versus AWS ParallelCluster. Once you’ve read it, feel free to keep exploring the resources below, which describe key features and configurations of ParallelCluster.
- Introducing AWS ParallelCluster 3 - This blog post describes the features of ParallelCluster 3.
- ParallelCluster 3 - built by customers - In this Tech Shorts video (17m), the ParallelCluster product team talks about new features in version 3.
- ParallelCluster UI - Introducing ParallelCluster’s new UI.
- ParallelCluster 3's config file - Learn how the ParallelCluster configuration file is an example of infrastructure as code (13m).
- Customizing ParallelCluster 3 AMIs - Learn about customizing the virtual machine image that powers your HPC cluster.
Build with ParallelCluster
There are two ways to get ParallelCluster:
We suggest using ParallelCluster UI, a web-based interface for designing and deploying your clusters. It helps you easily integrate fast file systems (like Lustre), visual desktops, and tools to control your spend using Slurm accounting. It takes a few minutes to deploy your own private ParallelCluster console in your account - see the tutorial here.
If you’re familiar with AWS already, or just want a CLI, we have that too. You can install the ParallelCluster Python package on nearly any modern computer. The procedure is documented here but starts with:
$ python3 -m pip install --upgrade "aws-parallelcluster>3"
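After installing the package, a cluster's lifecycle is driven by a handful of `pcluster` subcommands. The script below is a sketch that prints them in order and only executes them when `DRY_RUN=0` is set, so it can be read and run safely without AWS credentials. The cluster name and config path are hypothetical. `pcluster configure` is interactive and writes the config file, and `create-cluster` returns while creation continues in the background, so `describe-cluster` is how you check progress.

```shell
# Hypothetical names; replace with your own.
CLUSTER=demo
CONFIG=cluster-config.yaml

# Echo each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

run pcluster version
run pcluster configure --config "$CONFIG"
run pcluster create-cluster --cluster-name "$CLUSTER" --cluster-configuration "$CONFIG"
run pcluster describe-cluster --cluster-name "$CLUSTER"
# When you are finished, remove the cluster to stop charges:
# run pcluster delete-cluster --cluster-name "$CLUSTER"
```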
You can run the ParallelCluster workshops in your own AWS account. Most take 1-2 hours to complete and cost between $5 and $15.
Scale with ParallelCluster
ParallelCluster delivers a canonical Beowulf cluster experience on AWS, with added twists like elasticity and built-in support for fast storage and networking. That means you can run virtually any workload you like on AWS and expect to see great results.
The tabs here feature videos and blog posts describing how different customers have scaled with ParallelCluster; hopefully you’ll recognize some that match your needs. If not, reach out!
We’re always adding new content to this site, the AWS HPC Blog, and the Tech Shorts YouTube channel. Check back often for more articles and videos on ParallelCluster and other HPC technologies from AWS.