AWS Unpacked #2: Compute

Categories: AWS

The Engine Room of the Cloud: Flexible Compute for Every Business Need

TL;DR
AWS Compute services form the backbone of cloud-based applications by delivering scalable, flexible, and on-demand computing power. At the center is Amazon EC2, which allows you to spin up virtual machines in minutes, tailor them to your workload, and pay only for what you use. But compute in AWS goes far beyond just EC2. Services like Elastic Load Balancing (ELB), Auto Scaling Groups (ASG), and Capacity Reservations work together to ensure your applications are resilient, high-performing, and cost-efficient under any load. Whether you're handling unpredictable traffic spikes, planning for steady growth, or ensuring mission-critical uptime, AWS Compute gives you the control and agility to architect with confidence.

Amazon EC2: Instances & Key Features

Amazon EC2 (Elastic Compute Cloud) lets you launch virtual servers in AWS. You pick an AMI (machine image with an OS and software) and choose an instance type (size/family). Instances run in Availability Zones and you pay per use. Below are the main instance families:

  • General purpose (e.g. M, T): Balanced CPU, memory, and networking. Good for web servers, dev/test, small databases. (Analogy: a well-rounded sedan.)
  • Compute optimized (C): High CPU-to-memory ratio. Ideal for compute-bound tasks like high-performance computing (HPC) and batch processing. (Like a sports car with more horsepower.)
  • Memory optimized (R, X): High memory-to-CPU ratio. Best for in-memory databases or caching (e.g. Redis, big data analytics). (Think of a moving van carrying lots of stuff.)
  • Storage optimized (I, D, H): Lots of very fast local (NVMe) storage. Designed for workloads that need massive read/write IOPS to local disks (e.g. NoSQL databases, data warehousing). (Like a pickup truck hauling heavy goods.)
  • Accelerated (P, G, F): Instances with GPUs or FPGAs. Use for graphics rendering, machine learning, or specialized compute tasks.

When you look at EC2 instance types, you’ll see names like t3.micro, m5.large, c6g.4xlarge, etc. These names aren’t random; they follow a consistent structure that tells you a lot about the instance. Let’s break one down with m5.2xlarge as an example:

  • m: Instance family (general purpose in this case)
  • 5: Generation (newer generations have better performance and features)
  • 2xlarge: Size (more vCPUs, memory, and network performance compared to xlarge)
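The naming scheme is regular enough to decode programmatically. Here’s a small, purely illustrative Python sketch (no AWS calls; the parsing rules are a simplification of the real scheme and won’t cover every exotic type):

```python
import re

def parse_instance_type(name):
    """Split an EC2 instance type like 'm5.2xlarge' into its parts.

    Returns the family (letters), generation (digit), any attribute
    suffix letters (e.g. 'g' for Graviton), and the size.
    """
    prefix, size = name.split(".")
    m = re.match(r"([a-z]+)(\d)([a-z]*)$", prefix)
    if not m:
        raise ValueError(f"unrecognized instance type: {name}")
    family, generation, attributes = m.groups()
    return {
        "family": family,
        "generation": int(generation),
        "attributes": attributes,
        "size": size,
    }

print(parse_instance_type("m5.2xlarge"))
# {'family': 'm', 'generation': 5, 'attributes': '', 'size': '2xlarge'}
print(parse_instance_type("c6g.4xlarge"))
# {'family': 'c', 'generation': 6, 'attributes': 'g', 'size': '4xlarge'}
```

The suffix letters (like the g in c6g, which indicates AWS Graviton/ARM processors) carry extra information about processor type or networking.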

Here are the most important features of EC2:

  • SSH / EC2 Instance Connect: Most Linux instances use SSH (port 22) for login. You launch with a key pair, or use EC2 Instance Connect to push a one-time key via the console/CLI. (Windows uses RDP on port 3389.) Always ensure the Security Group allows SSH or RDP inbound from your IP.
  • Elastic IP (EIP): A static public IPv4 address allocated to your AWS account. You can attach it to an instance’s network interface. The IP stays yours until you release it. (Analogy: your permanent phone number in the cloud.)
  • AMI (Amazon Machine Image): A template for your instance. AMIs include the OS, any software, and boot configurations. You must specify an AMI (from AWS, community, or one you created) when launching an instance. You can copy AMIs between regions or share them with other accounts.
  • Placement Groups: Control instance placement for networking/performance:
    • Cluster placement packs instances close together in one AZ for ultra-low latency (useful for HPC). The trade-off: if the underlying hardware fails, all instances fail together, and launches can fail if AWS doesn’t have enough capacity on the same hardware.
    • Spread placement keeps a small number of instances on distinct hardware across different AZs (reducing correlated failures). Downside: you are limited to 7 running instances per AZ per placement group.
    • AWS also offers Partition placement, which spreads instances across partitions (i.e. EC2 instances on many different racks in the same AZ; a partition group can also span multiple AZs in the same region). Used for large distributed systems. Use cases: HDFS, HBase, Cassandra, Kafka.
  • Elastic Network Interfaces (ENI): Virtual network cards for an instance. Each instance has a primary ENI (with one private IP and security groups). You can attach additional ENIs (up to limits) on multi-NIC instance types. ENIs carry attributes (IPs, SGs, MAC) with them if moved. Use ENIs for high availability (swap to standby instance) or multiple IPs on one server. ENIs are bound to a specific AZ.
  • Hibernate: When you stop a Linux instance, you can hibernate it instead. EC2 saves the in-memory RAM state to the EBS root volume, and on the next start the instance boots back into its previous state (processes resume). It’s slower to start than a plain boot, but great if your app needs a warm cache. (Analogy: putting a laptop to sleep instead of rebooting.) Requirements: the root volume must be an encrypted EBS volume (not an instance store), and the instance RAM must be less than 150 GB. Hibernation is available for On-Demand, Reserved, and Spot instances, and an instance cannot stay hibernated for more than 60 days.
  • User Data Scripts: When you launch an EC2 instance, you often want it to do something as soon as it starts up, like install software, update packages, or configure the server for a specific purpose. User Data is a script (typically a shell script for Linux, PowerShell for Windows) that you provide when launching an EC2 instance. It runs only once, right after the instance boots for the first time. It’s part of what’s called bootstrapping: automatically preparing your instance when it first launches. Some common use cases include: installing an Apache/Nginx web server, pulling application code from GitHub, setting environment variables, installing updates and security patches, mounting EFS or other storage, creating cron jobs, writing startup logs to S3.
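As a hedged sketch, a bootstrapped launch might look like this as boto3 run_instances parameters (no API call is made here; the AMI ID, key name, and security group ID are placeholders):

```python
# The shell script below runs once, as root, on first boot.
USER_DATA = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
echo "Hello from $(hostname -f)" > /var/www/html/index.html
"""

def build_run_instances_params(ami_id, instance_type, key_name, sg_id):
    """Assemble the dict you would pass to ec2_client.run_instances(**params)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": [sg_id],
        "MinCount": 1,
        "MaxCount": 1,
        # boto3 handles base64-encoding the User Data for you.
        "UserData": USER_DATA,
    }

params = build_run_instances_params(
    "ami-0123456789abcdef0", "t3.micro", "my-key", "sg-0123456789abcdef0")
```

To actually launch, you would call `boto3.client("ec2").run_instances(**params)` with real IDs from your account.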

EC2 Pricing Options

EC2 offers several pricing models for different use cases:

  • On-Demand: No long-term commitment. You pay per second (60s minimum) for instances. It’s flexible but more expensive. (Good for spikes or unpredictable usage.)
  • Reserved Instances (RIs): Commit to a 1- or 3-year term for a specific instance configuration, and get a big discount (up to ~72% off On-Demand for a 3-year, all-upfront commitment). RIs are actually billing discounts applied to matching usage. You choose between Standard (more discount, less flexible) or Convertible Reserved Instances (swap types, slightly lower discount). Think of RIs like buying a device on a payment plan: you save money by committing upfront.
  • Savings Plans: Commit to a consistent amount of compute spend ($/hr) for 1 or 3 years to save up to ~72%. More flexible than RIs: you get the discount no matter which EC2 instance type or even region you use. (AWS now recommends Savings Plans for most cost savings.) It’s like an energy contract: you promise to use a certain amount each month, and you pay less per unit in return.
  • Spot Instances: Use spare AWS capacity at up to 90% off the On-Demand price. Extremely low cost, but your instances can be interrupted if AWS needs the capacity back, or if the Spot price rises above your max price (which you may optionally specify when you request the Spot instance; if not specified, the current market price is used). AWS gives a two-minute warning before reclaiming an instance. Spot is great for flexible, fault-tolerant tasks (batch jobs, analytics). There are two types of Spot requests:
    • One-time Spot requests: the instance is requested once and runs until terminated, interrupted, or stopped. Use case: stateless batch jobs, CI/CD pipelines, test environments.
    • Persistent Spot requests: if the instance is interrupted or terminated, AWS automatically tries to launch it again. The request persists until you cancel it.

    Spot Fleets are collections of Spot (and optionally On-Demand) instance requests managed as a single group. A fleet automatically diversifies across instance types and AZs to maintain capacity. You define a target capacity, a list of instance types, and an allocation strategy (lowest price, capacity optimized, or diversified).

  • Dedicated Hosts: book an entire physical server and control instance placement. This lets you address compliance requirements and use your existing server-bound software licenses (per-socket, per-core, per-VM), because you have visibility and control over the underlying hardware (sockets, cores, etc.).
  • Dedicated Instances: no other customers share your hardware, but you can’t manage the host itself and therefore have no control over the hardware (sockets, cores, etc.), which makes them slightly cheaper than Dedicated Hosts. Ideal for workloads needing isolation without host management.
  • Capacity Reservations: reserve capacity in a specific AZ for any duration, to ensure you always have capacity available (for example, in an emergency). This is not a discount: you are charged for holding the capacity whether you use it or not. Combine it with Regional Reserved Instances or Savings Plans to benefit from billing discounts as well.
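The Spot Fleet described above might be configured like this. This is a hedged sketch of the config shape boto3’s `ec2_client.request_spot_fleet(SpotFleetRequestConfig=config)` accepts; the AMI ID and IAM role ARN are placeholders:

```python
def build_spot_fleet_config(target_capacity, launch_specs):
    """Assemble a Spot Fleet request config (illustrative only)."""
    return {
        "TargetCapacity": target_capacity,
        "AllocationStrategy": "capacityOptimized",  # or lowestPrice / diversified
        "Type": "maintain",  # persistent fleet: replaces interrupted instances
        "IamFleetRole": "arn:aws:iam::123456789012:role/fleet-role",
        "LaunchSpecifications": launch_specs,
    }

# Diversify across instance types and AZs so an interruption in one
# capacity pool doesn't take out the whole fleet.
specs = [
    {"ImageId": "ami-0123456789abcdef0", "InstanceType": t,
     "Placement": {"AvailabilityZone": az}}
    for t in ("m5.large", "m5a.large", "m4.large")
    for az in ("eu-west-1a", "eu-west-1b")
]
config = build_spot_fleet_config(10, specs)
```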

Choose the right mix: steady-state workloads often use RIs/Savings Plans for discount, while on-demand handles bursts and Spot handles fault-tolerant work.
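That mix can be sanity-checked with back-of-envelope arithmetic. The hourly rates below are hypothetical, chosen only to illustrate how a blended strategy compares to running everything On-Demand:

```python
# Hypothetical rates: RI at ~40% off, Spot at ~70% off On-Demand.
ON_DEMAND = 0.10   # $/hr
RESERVED = 0.06    # $/hr effective
SPOT = 0.03        # $/hr
HOURS = 730        # hours in a month

# Scenario: 4-instance steady baseline, plus 2 extra instances used
# 200 hrs for bursts and 2 Spot instances used 100 hrs for batch work.
all_on_demand = 6 * HOURS * ON_DEMAND
mixed = (4 * HOURS * RESERVED      # steady baseline on RIs
         + 2 * 200 * ON_DEMAND     # bursts on On-Demand
         + 2 * 100 * SPOT)         # fault-tolerant work on Spot

print(f"all on-demand: ${all_on_demand:.2f}")  # all on-demand: $438.00
print(f"mixed:         ${mixed:.2f}")          # mixed:         $221.20
```

Even with made-up numbers, the shape of the result holds: committing the steady baseline and pushing interruptible work to Spot roughly halves the bill in this scenario.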

Understanding the Layers: TCP/IP and OSI Basics

Before we delve into the details of Load Balancing, an important feature of EC2, it is important that we understand the concept of “layers” in networking, as referred to in the OSI model. This model helps describe how data moves through a network. It’s not something you configure directly, but it helps to understand what’s happening “under the hood,” especially when learning about services like Elastic Load Balancing (ELB). The OSI model has 7 layers, each with a specific role.

Layer | Name | What It Does | Example
7 | Application | Interfaces directly with software applications | HTTP, HTTPS, SMTP
6 | Presentation | Translates data formats, handles encryption | SSL/TLS, JPEG, ASCII
5 | Session | Manages sessions or conversations between apps | API sessions, NetBIOS
4 | Transport | Delivers data between devices reliably or quickly | TCP, UDP
3 | Network | Routes data between networks (IP addressing) | IP, ICMP
2 | Data Link | Moves data between physically connected devices | Ethernet, MAC addresses
1 | Physical | Actual physical hardware and transmission | Cables, Wi-Fi, switches

Layers 3, 4, and 7 are the most important to understand better, so let’s delve a bit deeper into those.

Layer 3 - The Network Layer

Layer 3 is all about routing data between networks. If you think of Layer 4 (TCP/UDP) as the delivery truck handling how data gets sent, Layer 3 is the GPS—figuring out where it needs to go across a web of networks. What does layer 3 actually do?

  • IP Addressing: Assigns unique addresses to devices (like 192.0.2.1) so they can find and talk to each other.
  • Routing: Finds the best path for data to travel from one network to another (e.g., from a VPC in Cape Town to one in Dublin).
  • Packet Forwarding: Moves data packets from router to router (or hop to hop) toward their destination.
  • Subnetting: Divides networks into smaller chunks for better organization and efficiency.

Layer 4 – The Transport Layer (TCP/UDP)

This layer is responsible for getting data from one machine to another reliably or quickly, depending on the protocol used:

  • TCP (Transmission Control Protocol): Reliable, connection-based. Think of it like a phone call — it makes sure the other side picks up, and if any part of the message is lost, it’ll resend it.
  • UDP (User Datagram Protocol): Fast, but no guarantees. Think of it like throwing a flyer into the wind: you send it and hope it gets there.

Elastic Load Balancers that operate at Layer 4 are called “TCP Load Balancers” or “Network Load Balancers”. They don’t look inside the contents of the traffic — they just forward it based on the destination port and IP.

Layer 7 – The Application Layer (HTTP/HTTPS)

This is the top layer where actual applications like web browsers and web servers talk to each other. When you load a website using HTTP or HTTPS, that’s Layer 7 traffic.

Application Load Balancers (ALB) operate here. They understand the content of the request — things like:

  • The URL (/login vs /products)
  • HTTP headers
  • Cookies

Load Balancing: ALB, NLB, GWLB (and CLB)

AWS Elastic Load Balancing offers multiple types:

  • Application Load Balancer (ALB): Operates at Layer 7 (HTTP/HTTPS). Ideal for web applications and microservices. ALB can route by URL path, HTTP headers, or hostname; attach multiple TLS certificates (via SNI); and maintain sticky sessions using cookies. It also supports WebSockets and HTTP/2. The application servers don’t see the client’s IP directly: the true client IP is inserted in the X-Forwarded-For header, and the client’s port and protocol are available in X-Forwarded-Port and X-Forwarded-Proto. An ALB can also automatically redirect HTTP traffic to HTTPS, enforcing secure communication without any changes to your application code: you configure a listener rule on port 80 (HTTP) that issues a 301 (permanent) redirect to the same URL over HTTPS. This centralizes SSL enforcement at the load balancer instead of within your application, and it’s simple to set up via the AWS Console or CLI.
  • Network Load Balancer (NLB): Operates at Layer 4 (TCP/UDP). Extremely high throughput and low latency. NLB preserves the client’s source IP, and each AZ node gets a static IP. Use NLB for non-HTTP protocols or massive scale. (For example, use NLB for a gaming server or VOIP service where preserving IP and ultra performance matters.)
  • Gateway Load Balancer (GWLB): Operates at Layer 3 (network) and is used to deploy third-party appliances. Think of GWLB as a transparent gateway: all traffic flows through it on a fixed port (6081 with GENEVE encapsulation) to a fleet of virtual appliances (like firewalls or IDS) that scale automatically. Only use GWLB when integrating partner networking/security appliances.
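The HTTP-to-HTTPS redirect mentioned above boils down to a single listener action. As a hedged sketch, here is roughly the shape boto3’s `elbv2_client.create_listener(..., DefaultActions=[action])` expects for the port-80 listener (treat as illustrative, not a full setup):

```python
def https_redirect_action():
    """Default action for a port-80 listener: 301-redirect to HTTPS."""
    return {
        "Type": "redirect",
        "RedirectConfig": {
            "Protocol": "HTTPS",
            "Port": "443",
            # The #{...} placeholders keep the host, path, and query
            # exactly as the client requested them.
            "Host": "#{host}",
            "Path": "/#{path}",
            "Query": "#{query}",
            "StatusCode": "HTTP_301",  # permanent redirect
        },
    }

action = https_redirect_action()
```

The port-443 listener then carries the TLS certificate and forwards to your target group as usual.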

Features:

  • ELBs provide a static DNS name rather than a static IP address, which is a key design feature to support high availability and scalability. The DNS name (like my-loadbalancer-1234567890.us-west-2.elb.amazonaws.com) remains consistent even if the underlying infrastructure changes (such as EC2 instance replacements or availability zone rebalancing), making it a reliable and persistent endpoint for clients and services to connect to. This is preferable to hardcoding IP addresses, which may change due to the dynamic nature of AWS infrastructure. Here’s how this works in practice:
    • Static DNS Name: Always points to the current set of healthy targets behind the load balancer, automatically updating as needed.
    • IP Addresses (dynamic): The actual IPs returned by the DNS name may vary over time or per availability zone, depending on how AWS routes traffic.
    • Static IP (possible with NLBs): If you absolutely need static IPs, use a Network Load Balancer (NLB), which supports Elastic IP addresses, allowing you to associate fixed, public IPs with your load balancer.
    • Application & Gateway Load Balancers: These don’t support static IPs, but their DNS names are stable and fully managed by AWS.

      Using the static DNS name ensures your application remains decoupled from infrastructure changes, which is especially important in auto-scaling or multi-AZ environments.

  • Cross-Zone Balancing: By default, each load balancer node sends traffic only to targets in its own AZ. Enabling cross-zone load balancing lets each node distribute requests evenly across all enabled AZs, which flattens the load distribution. (Without cross-zone, if AZ A has 2 instances and AZ B has 8, the instances in A would get much more per-instance traffic.)
  • Sticky Sessions: ALBs (and Classic LBs) can enable session affinity. Using either application-generated cookies or load-balancer-generated cookies, you can “stick” a client to one target so its session data stays on that server. For example, a shopping cart session might remain on the same instance.
  • SSL/SNI: ALBs support multiple TLS certificates on one listener using SNI, so you can host many HTTPS sites (with different domains) behind one ALB. NLB now also supports TLS termination and multiple certs.
  • Health Checks: All ELBs perform health checks. The load balancer only sends traffic to healthy targets (it stops sending to any instance that fails the health check). This works with ASGs so that unhealthy instances are automatically replaced.
  • Connection Draining (Deregistration Delay): When you remove or terminate an instance behind a Classic Load Balancer or deregister from a target group, connection draining lets existing connections finish before shutdown. (ALBs/NLBs call this deregistration delay.) Without it, in-flight requests are cut off. Think of it like waiting for all customers in line to be served before closing a checkout register.
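The cross-zone example above (AZ A with 2 instances, AZ B with 8) is easy to verify with a toy simulation. Assume 100 requests arrive, split evenly between the two load balancer nodes (50 each); the function below is purely illustrative:

```python
def per_instance_load(requests_per_node, zones, cross_zone):
    """Requests each instance receives, per AZ, with/without cross-zone."""
    instances = dict(zones)  # AZ name -> instance count
    if cross_zone:
        # Every node spreads its requests across ALL instances.
        total = sum(instances.values())
        per = len(instances) * requests_per_node / total
        return {az: per for az in instances}
    # Each node only sends to instances in its own AZ.
    return {az: requests_per_node / n for az, n in instances.items()}

zones = {"az-a": 2, "az-b": 8}
print(per_instance_load(50, zones, cross_zone=False))
# {'az-a': 25.0, 'az-b': 6.25} -- AZ A instances are hit 4x harder
print(per_instance_load(50, zones, cross_zone=True))
# {'az-a': 10.0, 'az-b': 10.0} -- load is flattened
```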

Analogy: Think of a load balancer as a smart traffic cop at a busy intersection, directing cars (requests) to various lanes (instances) to keep traffic flowing smoothly. Cross-zone balancing is like having cops coordinate lanes across bridges (AZs), and sticky sessions are like seating one customer at the same booth in a restaurant each visit.

Auto Scaling Groups (ASG)

An Auto Scaling Group lets you maintain and dynamically adjust a fleet of EC2 instances. You define a minimum, desired, and maximum capacity, and a launch template/configuration (AMI, instance type, etc.). ASG can span multiple AZs for resilience.
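As a hedged sketch, that definition (min/desired/max plus a launch template, spanning multiple AZs) might look like this as boto3 `autoscaling_client.create_auto_scaling_group(**params)` parameters; the group name, launch template ID, and subnet IDs are placeholders:

```python
params = {
    "AutoScalingGroupName": "web-asg",
    "MinSize": 2,               # never fewer than 2 instances
    "DesiredCapacity": 3,       # where the group starts
    "MaxSize": 10,              # hard ceiling for scale-out
    "LaunchTemplate": {
        "LaunchTemplateId": "lt-0123456789abcdef0",
        "Version": "$Latest",
    },
    # Subnets in different AZs, so the group survives an AZ outage.
    "VPCZoneIdentifier": "subnet-aaaa1111,subnet-bbbb2222",
    "HealthCheckType": "ELB",   # replace instances the ELB marks unhealthy
    "HealthCheckGracePeriod": 300,
}
assert params["MinSize"] <= params["DesiredCapacity"] <= params["MaxSize"]
```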

Dynamic Scaling

Dynamic scaling automatically adjusts your Auto Scaling Group (ASG) capacity based on real-time demand, using CloudWatch alarms and metrics. There are several types of dynamic scaling policies:

  • Target Tracking Scaling (Recommended)

    This policy automatically scales your ASG in or out to maintain a specific target for a chosen metric—like keeping average CPU utilization at 50%. It works similarly to a thermostat: AWS adjusts resources to maintain your defined threshold.

    • Use Case: Maintain stable performance under varying load.
    • Common Metrics: CPU utilization, request count per target, or custom CloudWatch metrics.
  • Step Scaling

    Step scaling allows you to define thresholds with corresponding scaling actions. For example:

    • If CPU > 70%, add 2 instances.
    • If CPU > 90%, add 4 instances.

    This approach offers more granular control based on how far metrics deviate from the norm.

    • Use Case: When scaling response should vary based on metric severity.
  • Simple Scaling (Legacy)

    Simple scaling triggers a fixed adjustment (e.g., add 1 instance) when a metric crosses a threshold. It includes a cooldown period before the next scaling action.

    • Use Case: Very basic scaling needs (now largely replaced by target tracking or step scaling).
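The step-scaling example above (add 2 instances past 70% CPU, 4 past 90%) reduces to a small decision function. A toy sketch, purely to illustrate the logic:

```python
def step_scale_out(cpu_percent):
    """Instances to add for a given average CPU, per the step policy."""
    if cpu_percent > 90:
        return 4   # severe breach: scale hard
    if cpu_percent > 70:
        return 2   # moderate breach: scale gently
    return 0       # within normal range: do nothing

assert step_scale_out(65) == 0
assert step_scale_out(75) == 2
assert step_scale_out(95) == 4
```

In the real service, CloudWatch alarms evaluate the metric and the ASG applies the matching step adjustment; the thresholds and adjustments are yours to define.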

Scheduled Scaling

Scheduled scaling lets you plan scaling actions ahead of time by specifying when to increase or decrease capacity. For example, you might scale out at 8 AM every weekday to handle predictable increases in traffic.

  • Use Case: Predictable workloads such as office hours, batch processing, or marketing events.
  • Example: Scale to 10 instances at 8 AM, then reduce to 3 instances at 6 PM.
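That schedule maps to two scheduled actions. As a hedged sketch, here is roughly the shape boto3’s `autoscaling_client.put_scheduled_update_group_action(**action)` expects (the group name is a placeholder; cron recurrences are evaluated in UTC unless you set a time zone):

```python
scale_out = {
    "AutoScalingGroupName": "web-asg",
    "ScheduledActionName": "business-hours-start",
    "Recurrence": "0 8 * * MON-FRI",   # 08:00, every weekday
    "DesiredCapacity": 10,
}
scale_in = {
    "AutoScalingGroupName": "web-asg",
    "ScheduledActionName": "business-hours-end",
    "Recurrence": "0 18 * * MON-FRI",  # 18:00, every weekday
    "DesiredCapacity": 3,
}
```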

Predictive Scaling

Predictive scaling uses machine learning to forecast future traffic and automatically scale your ASG in advance. AWS analyzes historical data to anticipate traffic spikes or drops, improving response time and resource efficiency.

  • Use Case: Applications with consistent traffic patterns (e.g., daily or weekly cycles).
  • Benefit: Avoids lag by proactively scaling before a spike hits.

Other features

  • Instance Warm-Up: When scaling out, new instances take time to become ready. ASG lets you configure a warm-up period so that new instances aren’t counted against scaling metrics before they’re ready, preventing the group from overshooting.
  • Health Checks & Replacement: ASG integrates with ELB health checks and EC2 status checks. If an instance is deemed unhealthy, ASG will terminate and replace it.
  • Termination Policies: When scaling in, ASG selects which instance(s) to terminate. By default, it first tries to keep AZs balanced (if one AZ has extra instances, it scales in there). Then it looks for instances using outdated launch configurations or that are closest to the next billing hour. The default is generally: keep balanced, remove oldest launch config, then oldest instance. You can override this order (e.g. choose to terminate newest or oldest) or protect certain instances from scale-in.
  • Scaling cooldown: after a scaling activity, the ASG enters a cooldown period (300 seconds / 5 minutes by default). During this period, the ASG will not launch or terminate additional instances, giving metrics time to stabilize.

Security Groups & Common Ports

Security Groups (SGs) are stateful VPC firewalls for your EC2 instances. Key facts:

  • Inbound vs Outbound: Each SG rule specifies allowed traffic by protocol, port range, and source/destination. By default, inbound is deny all and outbound is allow all. You must explicitly open inbound ports for your use case. For example:
    • Port 22 (TCP) for SSH access to Linux.
    • Port 3389 for RDP to Windows.
    • Ports 80/443 for web servers (HTTP/HTTPS).
    • Port 21 for FTP (if you’re running an FTP server).
    • DB ports (e.g. 3306 for MySQL, 5432 for PostgreSQL), usually open only to application servers or VPC.

    If an inbound rule is missing (e.g. no rule for port 22), the connection is blocked. (A common exam pitfall: forgetting to open port 22, so you suddenly “can’t log in!” to your instance.)

  • Stateful: SGs are stateful. This means return traffic for an allowed outbound request is automatically allowed back in, even if no inbound rule explicitly permits it. Likewise, response traffic to allowed inbound connections is allowed out. (You don’t need a matching inbound rule for the response.)
  • Region/VPC specific: SGs are locked down to a region/VPC combination.
  • Associations: You can attach multiple SGs to an instance (they are additive). SG rules affect all instances using that SG. SGs operate at the instance’s network interface level. There is no extra charge for SGs. You can also attach one SG to multiple instances.
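The common ports listed above translate directly into ingress rules. As a hedged sketch, here is roughly the shape boto3’s `ec2_client.authorize_security_group_ingress(GroupId=..., IpPermissions=rules)` expects; the admin CIDR is a placeholder (in practice, lock SSH down to your own IP):

```python
def ingress_rule(port, cidr, description):
    """One TCP ingress rule in the IpPermissions shape."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }

rules = [
    ingress_rule(22, "203.0.113.10/32", "SSH from admin IP only"),
    ingress_rule(80, "0.0.0.0/0", "HTTP from anywhere"),
    ingress_rule(443, "0.0.0.0/0", "HTTPS from anywhere"),
]
```

Because SGs are stateful, no matching outbound rules are needed for the responses to this traffic.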

Analogy: Think of a security group as the bouncer at a club door. You tell the bouncer which kinds of guests (IPs/protocols/ports) to let in. If the bouncer doesn’t have your name (no rule), you can’t enter. Once you’re inside and send a drink to a friend (outbound traffic), the friend’s thanks (return traffic) can come in automatically because the bouncer recognizes your table (stateful).

Storage (EBS vs EFS) – Quick Overview

While a deeper dive is for another post, here’s a snapshot:

  • Amazon EBS: High-performance block storage for EC2. These volumes attach to a single instance (like a virtual hard disk). They come in SSD and HDD flavors for different workloads. EBS volumes live in one AZ, but you can snapshot them to S3 and restore across AZs/regions. Data on EBS persists independently of the instance – stopping an instance does not wipe its EBS. (Use EBS for boot/root volumes, databases, and any data that must survive restarts.)
  • Amazon EFS: Fully-managed file storage (NFS v4.0/4.1). It can be mounted concurrently by many instances (even across AZs). EFS automatically scales capacity as you add/remove files, so you never run out of space. Use EFS for shared file systems (e.g. home directories, media repositories). It’s like a network-attached drive that grows on demand.
  • Some EC2 instance types include instance store volumes, which are temporary block storage physically attached to the host machine. These volumes offer very fast I/O performance, making them ideal for use cases like caching, scratch space, or buffer storage during processing. However, the key limitation is that data on instance store is ephemeral: it disappears if the instance is stopped, terminated, or fails. Unlike EBS, instance store volumes cannot be detached, moved to another instance, or used for persistent storage. They are included in the price of the instance and are only available on certain instance families such as I3 or D3. For most applications, EBS or S3 should be used when durability is important.

High Performance Computing (HPC) on EC2

High Performance Computing (HPC) refers to the use of powerful compute resources to solve complex problems that require high levels of computation, memory, and networking. It’s commonly used in fields like computational fluid dynamics, seismic modeling, genomics, financial risk modeling, weather prediction, and machine learning.

With EC2, AWS provides a flexible and scalable platform to run HPC workloads without having to maintain traditional on-premises supercomputers. But to get true HPC-level performance in the cloud, it’s important to go beyond just choosing powerful instance types — networking and orchestration matter too.

Enhanced Networking on EC2

Enhanced Networking is AWS’s umbrella term for technologies that offer higher bandwidth, lower latency, and lower jitter between EC2 instances — crucial for distributed HPC workloads.

There are two main implementations of Enhanced Networking on AWS:

  1. Elastic Network Adapter (ENA)

    ENA is the current-generation enhanced networking option available on most modern EC2 instances. It supports up to 100 Gbps of bandwidth and provides low-latency, high-throughput networking for general-purpose and compute-intensive applications. ENA is the go-to option for most workloads that require good network performance but don’t need tightly coupled parallel processing (like high-volume web servers, large-scale analytics, or basic distributed computing).

  2. Intel 82599 VF (Virtual Function) Interface

    This is the older, legacy implementation of enhanced networking that predates ENA. It’s only available on specific instance families (like C3, R3, and I2) and supports up to 10 Gbps. It’s still technically Enhanced Networking, but most new applications and workloads should use ENA wherever possible due to better performance and broader compatibility.

Elastic Fabric Adapter (EFA)

Elastic Fabric Adapter (EFA) is a specialized network interface that builds on ENA but goes further by allowing EC2 instances to use low-latency, OS-bypass networking. This is critical for tightly coupled HPC applications that need to pass messages rapidly between nodes — for example, applications built with MPI (Message Passing Interface).

EFA enables EC2 to support HPC workloads that traditionally required InfiniBand-style interconnects in on-premises clusters. It’s available on select instance types (like C5n, C6gn, and HPC6id) and must be used with supported Linux AMIs and drivers. With EFA, you can run applications like computational chemistry simulations or physics solvers with near-native network latency.

ENA vs. EFA

Feature | ENA | EFA
Purpose | High-throughput networking | Low-latency, tightly coupled HPC
Use Cases | Web apps, analytics, microservices | MPI-based HPC, scientific simulations
Latency | Low | Ultra-low (OS bypass)
OS/Driver Requirements | None (standard AMIs) | Custom drivers and supported AMIs
Instance Support | Most modern EC2 types | Select instances only

In short: ENA is for general enhanced networking, while EFA is purpose-built for HPC workloads that demand ultra-low-latency internode communication.

Orchestrating HPC Workloads: AWS Batch vs. AWS ParallelCluster

Running HPC in the cloud isn’t just about launching instances — you need automation and orchestration. AWS offers two primary services to manage HPC jobs:

  1. AWS Batch

    AWS Batch is a fully managed service for running batch computing jobs. It dynamically provisions compute resources (usually EC2 or Fargate) based on the volume of jobs in the queue. While not specific to HPC, it works well for loosely coupled parallel workloads that don’t require inter-node communication (e.g. image processing, Monte Carlo simulations, video rendering).

    You define job queues, compute environments, and job definitions, and Batch handles everything else — scaling, retries, and execution.

  2. AWS ParallelCluster

    ParallelCluster is an open-source cluster management tool that lets you create and manage HPC clusters in AWS using familiar job schedulers like Slurm, PBS, or AWS Batch (as a scheduler backend). It’s better suited for traditional HPC users who are familiar with running tightly coupled, multi-node jobs that require shared storage, job schedulers, and specialized networking (like EFA).

    It supports features like:

    • Head nodes and compute nodes
    • Shared storage (EFS or FSx for Lustre)
    • MPI libraries and job schedulers
    • EFA networking
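The AWS Batch workflow described above (queues, compute environments, job definitions) ends with submitting jobs. As a hedged sketch, here is roughly the shape boto3’s `batch_client.submit_job(**params)` expects; the queue and job definition names are placeholders:

```python
def build_submit_job_params(name, queue, job_def, command):
    """Assemble submit_job parameters (illustrative only)."""
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": job_def,
        # Override the container's default command for this run.
        "containerOverrides": {"command": command},
    }

params = build_submit_job_params(
    "render-frame-0042", "hpc-queue", "render-job:3",
    ["python", "render.py", "--frame", "42"])
```

Batch then handles placement, retries, and scaling of the underlying compute environment; you never SSH into the workers.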

Which to use?

Feature | AWS Batch | AWS ParallelCluster
Best For | Parallel jobs, stateless tasks | HPC clusters with shared storage and schedulers
Networking Requirements | Standard ENA usually sufficient | EFA often required for tightly coupled jobs
Skill Level Needed | More abstracted, minimal cluster knowledge | Requires HPC/scheduler knowledge
Setup Complexity | Low (managed service) | Medium to high (you configure your own cluster)

So, if you’re running simulations that don’t talk to each other, go with AWS Batch. If you’re doing multi-node MPI workloads that rely on fast inter-node communication, AWS ParallelCluster is your tool.

About the Author

Dawie Loots is a data scientist with a keen interest in using technology to solve real-world problems. He combines data science expertise with his background as a Chartered Accountant and executive leader to help organizations build scalable, value-driven solutions.
