SC'24

We want to help you get to your first “a ha!” moment in the least possible time - here’s where to start

You probably saw some of our expert builders creating a complex HPC architecture live during the conference. If you were excited about the speed with which it’s possible to deploy completely new architectures and workflows, we’ve pulled some recipes and other resources together that you can adapt to suit your environment.

If you can’t find what you need reach out to us at ask-hpc@amazon.com.

10am-11am - Secure and successful foundation for HPC on AWS with NIST SP 800-223

This session lays out the foundational plumbing for a secure and scalable HPC environment on the cloud. We show architectural best-practices for cost optimization, security, and compliance alignment. We’ll show you how to achieve reproducibility using infrastructure-as-code techniques that will automate (and validate) your environments for deploying workloads.

Resources

We have a blog post, and a recipe that will walk you through the details of setting up a compliant environment.
If you’re running AWS ParallelCluster instead, then this post about securing them in an isolated VPC will help.
If your compliance needs don’t extend to NIST SP 800, then we suggest you look at our network stacks in the HPC Recipes Library - either HPC Basic or HPC Large Scale should do a great job for you.

11am-12pm - Re-imagining HPC with AWS Parallel Computing Service

We introduce AWS Parallel Computing Service (AWS PCS), a new managed service that transforms how organizations operate HPC environments in the cloud. We’ll demonstrate how PCS can eliminate operational overhead while preserving the familiar Slurm experience your users rely on. In real time, we’ll build a production-ready HPC cluster that integrates elastic storage from both NFS and high-performance Lustre, along with advanced compute capabilities like GPUs and our EFA networking. We’ll show you how a single infrastructure can efficiently support both large-scale simulation and model training workloads, simplify operations but maintaining the performance your applications demand.

Resources for PCS

We have a LOT of tutorials and recipes for on-boarding to PCS.

We have a playlist of all our PCS tutorials in our Tech Shorts channel on YouTube.
If you just want to get started with one-click, so you can kick the tires and see how PCS works, then go straight to this tutorial from Matt Vaughn.

12pm-1pm - Deploy a Terascale Lustre file system In minutes, not months

Come see a live build of a massive 1 Terabyte per second shared file system using Amazon FSx for Lustre. We’ll accelerate a large data-set from Amazon S3 and make use of file system semantics and metadata acceleration. You’ll get to see the simplicity of on-demand provisioning of an HPC cluster, too, faster than it took you to even specify the system you’re working on today.

Resources

Amazon FSx for Lustre is probably one of the easiest services to launch and use. Randy Seamens does a great job of explaining it in both a blog post, and a Tech Short on YouTube where you’ll see it in action, driving up to 1 TByte/s of I/O.

There are also some specific recipes available in the HPC Recipes Library, including for alternatives to Lustre if, for example, you’re looking for a scalable solution to /home instead (in which case, check out EFS Simple).

1pm-2m - Providing workspaces and visualization tool for data analysis, and managing budgets

With our Slurm cluster managed with AWS Parallel Computing Service, we’ll explore different options for the end-user experience. We’ll first use a PCS-managed group of Login nodes to provide SSH access to the cluster. Next, we’ll illustrate how you can “Bring your own Login node” to the Research and Engineering Studio (RES) on AWS to provide individual or shared virtual graphical desktops. We’ll demonstrate visualization capabilities using Paraview in RES, powered by Amazon DCV, with hooks into Slurm’s job management.

Resources

We have all the details in a new blog post including a recipe you can swipe from the HPC Recipes Library, so you don’t have to do the hard work.

You can find out more about Research and Engineering Studio on AWS from it’s homepage, our walkthough video on Tech Shorts. We’ve also published a blog post about RES-ready images for AWS ParallelCluster - this same option is also available for Parallel Computing Service, it’s just that the PCS version is easier, because PCS is a full-managed service.

2pm-3pm - Building your performant applications on different architectures on AWS

Now we’ll explore how to build, optimize, and run codes on our clusters making use of the Elastic Fabric Adaptor (EFA) which supports all our CPU and GPU architectures. We’ll show a software build of OpenFOAM using Spack. Then we’ll test this build on multiple architectures, and discuss how to validate that the build has been built to leverage best of each underlying architecture. Finally, we’ll submit a large-scale batch array of simulations that we’ll use later for trainng an ML model for rapid automotive design.

Resources

You’ll probably have seen Matt or one of our other builders using Spack to install performant binaries from source without any fuss, and if this gets you interested in Spack, we think that’s a great thing.

You can see Matt installing Spack as part of a machine image pipeline in EC2 Image Builder on Tech Shorts. If you need some background on why Spack helps, you can’t do better than this blog post from 2022 (an oldie, but it’s an ever green message).

AWS is a major sponsor of the High Performance Software Foundation, of which Spack is a founding project. We deeply believe in the important of open-source to the HPC community, and we encourage you to do a deep dive on Spack to see how it can help in your own environment.

3pm-4pm - Cloud native HPC: Taking a different path to the same destination (an end-to-end AI 4 CAE workflow)

So far we’ve shown how to recreate a traditional HPC environment on the cloud, giving your users with prior knowledge of HPC systems a robust and performant system to leverage for their research. What if your users are researchers have no prior knowledge of HPC and instead prefer to work in data science environments like Jupyter Notebooks? This sessions shows how you can leverage a cloud-native scheduler, AWS Batch, and a popular data science workflow framework, Metaflow, to achieve massive scale for your workloads.

Resources

You’ll see more on this in coming weeks as we prep some content around the work we’ve been doing with our friends from MetaFlow.

4pm-5pm - Using advanced HPC to change how we do engineering design

We can now work through the whole workflow we’ve been building to show you a glimpse of the near future. The simulation runs we kicked off will now serve as a training dataset for building a deep-learning surrogate model. Then we’ll take a hundred unseen geometries and do large-scale batch-inference using the pre-trained models, visualizing the results with our graphical desktops.

Resources

Neil Ashton has been leading this project in AWS, and has presented a number of papers on the topic. He stopped by Tech Shorts to talk about surrogate models and how they can be deployed in aerospace industries, and CAE more generally.

Building an AMD-based HPC cluster in a few minutes using AWS Parallel Computing Service

If you saw our talk at the AMD booth, where we showed you how easy it is to make huge performance gains by running you workloads on our AMD-based Amazon EC2 instances - like the C7a and Hpc7a families - then you’ll be happy to know it’s not hard to get running with this in your own AWS account right away.

We’ve created a short video on HPC Tech Shorts to walk you through building an AMD-based PCS cluster in very little time.

You can launch our custom one-click launcher for AMD, which is part of our HPC Recipes Library - a vast collection of compatible stacks that you can plug together to assemble really complex architectures.

The Recipe Library enables you to get to a final working solution quickly, skipping the learning curve, but still adopting safe and secure best practices. And all the templates we ship are well commented, and documented, so when you’re ready to dive into the details, it’s not hard to learn what’s going on. That means you can make meaningful modifications, forking the recipes, and making it suit your needs for your HPC site.

It’s not for nothing that we say it’s like standing on the shoulders of … someone else :-)

Thanks to our friends at AMD for creating great processors, and helping us prove these stacks.