Review and, where needed, request increases for the quotas below before running ./deploy.sh.
Quota increases that require AWS review can take 24–72 hours, and P-family Spot requests can take up to 5 business days, so request them well in advance.
All quotas are per-region unless noted. Find them in the AWS Service Quotas console.

Summary table

| Service | Quota name | Quota code | Default | Recommended target | Action required? |
| --- | --- | --- | --- | --- | --- |
| EC2 | Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances (vCPUs) | L-1216C47A | 32 | 256 | Yes, raise before deploy |
| EC2 | All G and VT Spot Instance Requests (vCPUs) | L-3819A6DF | 0–32 | 128 | Yes, often 0 by default |
| EC2 | All P Spot Instance Requests (vCPUs) | L-7212CCBC | 0 | 384 | Yes, always 0 by default |
| IAM | Managed policies attached to a role | L-0DA4ABF3 | 10 | 25 | Yes, raise before deploy |
| VPC | VPCs per region | L-F678F1CE | 5 | 10 | Yes, raise proactively |
| VPC | NAT gateways per Availability Zone | L-FE5A380F | 5 per AZ | 10 | Yes, raise proactively |
| EC2 | Elastic IP addresses | L-0263D0A3 | 5 | 20 | Yes, raise proactively |
| VPC | Interface endpoints per VPC | (see VPC service quotas) | 50 | 100 | Yes if using existing VPC |
| S3 | Buckets | L-DC2B2D3D | 100 | 150 | Yes, raise proactively |
| ELB | Application Load Balancers | L-53DA6B97 | 20 | 50 | Yes, raise proactively |
| ELB | Network Load Balancers | L-69A177A2 | 20 | 50 | Yes, raise proactively |
| EKS | Clusters per region | L-1194D53C | 100 | 100 | No |
| IAM | Roles per account | L-FE177D64 | 1000 | 1000 | No |
| IAM | Customer managed policies per account | L-E95E4862 | 1500 | 1500 | No |

Detailed breakdown

1. EC2 On-Demand vCPUs (Standard instances) — raise this first

Quota code: L-1216C47A · Default: 32 vCPUs

The Terraform deployment creates two managed node groups, and Karpenter then launches additional CPU nodes from the m-family (gen 5+, 2+ vCPUs) to run Impala services.

Initial managed node groups (created by ./deploy.sh):

| Node group | Instance type | vCPUs | Min → Desired → Max |
| --- | --- | --- | --- |
| CPU nodes | m6a.4xlarge | 16 | 0 → 1 → 2 |
| Karpenter bootstrap | c6a.large / m6a.large / c5a.large / r6a.large | 2 | 0 → 1 → 2 |

Minimum at deploy time: 18 vCPUs (1 × m6a.4xlarge + 1 Karpenter bootstrap node).

After ./deploy.sh completes, Karpenter launches additional m-family nodes to schedule Impala services (mongodb, rabbitmq, prometheus, openai-api, planner, deployer, scaler, etc.). These workloads collectively require several nodes; the On-Demand quota must cover the full steady-state fleet.

Recommended target: 256 vCPUs. This covers a full Impala service fleet plus headroom for Karpenter burst scaling without requiring another quota request mid-operation.

All instance types above are “Standard” family and count against the same quota. GPU nodes use separate Spot quotas; see sections 2 and 3.
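The deploy-time arithmetic can be sketched in shell. This is a minimal illustration, not part of the deployment tooling: the function names are hypothetical, and the per-node vCPU counts (16 for m6a.4xlarge, 2 for a bootstrap node) are taken from the node group table.

```shell
# vCPUs used by the two managed node groups at their desired size of 1 each.
# Values come from the node group table: m6a.4xlarge = 16, bootstrap node = 2.
deploy_min_vcpus() {
  cpu_node_vcpus=16
  bootstrap_vcpus=2
  echo $(( cpu_node_vcpus + bootstrap_vcpus ))
}

# How many additional nodes of a given size fit under a quota
# once the deploy-time minimum is subtracted.
# Usage: extra_nodes_within_quota <quota_vcpus> <vcpus_per_node>
extra_nodes_within_quota() {
  quota=$1
  per_node=$2
  remaining=$(( quota - $(deploy_min_vcpus) ))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo $(( remaining / per_node ))
}

echo "Deploy-time minimum: $(deploy_min_vcpus) vCPUs"
echo "Extra 16-vCPU nodes at quota 32:  $(extra_nodes_within_quota 32 16)"
echo "Extra 16-vCPU nodes at quota 256: $(extra_nodes_within_quota 256 16)"
```

At the default quota of 32, only 14 vCPUs remain after the initial node groups, so not even one more 16-vCPU node fits. Karpenter may pick smaller m-family sizes, so treat the node counts as a rough guide only.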

2. GPU Spot instances — G family (g6e) — likely 0 by default

Quota code: L-3819A6DF (All G and VT Spot Instance Requests) · Default: 0–32 vCPUs (varies by region; often 0)

Karpenter’s GPU node pool uses g6e.xlarge or g6e.2xlarge Spot instances for inference workloads:

| Instance type | vCPUs | GPU | GPU memory | Spot vCPUs consumed |
| --- | --- | --- | --- | --- |
| g6e.xlarge | 4 | 1× NVIDIA L40S | 48 GB | 4 |
| g6e.2xlarge | 8 | 1× NVIDIA L40S | 48 GB | 8 |

Karpenter selects the cheapest available type. To run a single GPU node you need at least 8 vCPUs in this quota (to allow either size to be selected).

Recommended target: 128 vCPUs. This is enough for up to 32× g6e.xlarge or 16× g6e.2xlarge concurrently, giving Karpenter room to scale inference workloads without stalling on quota.

Check your current quota:
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-3819A6DF \
  --region us-east-1 \
  --query 'Quota.Value'
g6e instances are not available in all regions. Before requesting the quota, verify availability:
aws ec2 describe-instance-type-offerings \
  --filters Name=instance-type,Values=g6e.xlarge \
  --query 'InstanceTypeOfferings[].Location' --output text
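To relate a quota value to node counts, a small helper can do the division and compare the value printed by get-service-quota against what you need. This is a sketch with hypothetical function names; it does no AWS calls itself and expects you to pass in the quota value.

```shell
# Number of Spot nodes of one instance size that fit in a vCPU quota.
# Usage: nodes_for_quota <quota_vcpus> <vcpus_per_node>
nodes_for_quota() {
  echo $(( $1 / $2 ))
}

# Compare a current quota value (as printed by get-service-quota, e.g. "32.0")
# with the vCPUs required, and report whether an increase is needed.
# Usage: quota_status <current_quota> <required_vcpus>
quota_status() {
  awk -v cur="$1" -v need="$2" 'BEGIN {
    if (cur + 0 >= need + 0) print "ok"; else print "increase-needed"
  }'
}

echo "g6e.xlarge nodes at 128 vCPUs:  $(nodes_for_quota 128 4)"
echo "g6e.2xlarge nodes at 128 vCPUs: $(nodes_for_quota 128 8)"
echo "Quota 0.0 vs 8 vCPUs needed:    $(quota_status 0.0 8)"
```

The comparison goes through awk because the Service Quotas API reports values as decimals, which shell integer tests do not accept.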

3. GPU Spot instances — P family (p5, p5en) — always 0 by default

Quota code: L-7212CCBC (All P Spot Instance Requests) · Default: 0 vCPUs

The GPU node pool also lists p5.48xlarge and p5en.48xlarge for high-end workloads. These have a default Spot quota of 0 and require a manual AWS review to increase.

| Instance type | vCPUs | GPUs | GPU memory | Notes |
| --- | --- | --- | --- | --- |
| p5.48xlarge | 192 | 8× NVIDIA H100 SXM5 | 640 GB | Requires dedicated capacity |
| p5en.48xlarge | 192 | 8× NVIDIA H200 | 1,128 GB | Requires dedicated capacity |

Recommended target: 384 vCPUs (2× p5.48xlarge concurrently).

To use p5/p5en Spot instances:
  1. Open a Service Quota increase request for L-7212CCBC.
  2. Include a brief justification (ML inference workloads).
  3. AWS typically responds within 1–5 business days; capacity is regionally limited.
If p5/p5en Spot capacity is unavailable in your region, Karpenter will fall back to g6e instances automatically. If neither GPU family is available or quota is insufficient, GPU workloads will remain pending.
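Because P-family requests go through manual review, it can help to check whether a request is already open before filing another. The sketch below uses the Service Quotas request-history command; the function names are hypothetical, and treating PENDING and CASE_OPENED as the "still open" states is an assumption about how you want to interpret the status field.

```shell
# List past and pending increase requests for the P-family Spot quota.
# Prints one line per request: STATUS <tab> DESIRED_VALUE.
# Usage: p_spot_request_history <region>
p_spot_request_history() {
  aws service-quotas list-requested-service-quota-change-history-by-quota \
    --service-code ec2 \
    --quota-code L-7212CCBC \
    --region "$1" \
    --query 'RequestedQuotas[].[Status,DesiredValue]' \
    --output text
}

# Succeeds (exit 0) if any request is still awaiting a decision.
# Usage: has_open_request <region>
has_open_request() {
  p_spot_request_history "$1" | grep -Eq '^(PENDING|CASE_OPENED)'
}

# Example (requires AWS credentials):
#   has_open_request us-east-1 && echo "A request is already open; wait for AWS."
```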

4. VPC interface endpoints per VPC

Quota name: Interface endpoints per VPC (search under the “Amazon VPC” service in the Service Quotas console) · Default: 50

The networking stack creates 13 interface endpoints by default, plus 1 more when Impala Connect is enabled (enable_impala_connect = true, which is the default):

| Endpoint | AWS service |
| --- | --- |
| ecr.dkr | Container image pulls |
| ecr.api | ECR management |
| sts | IAM token exchange |
| eks | EKS API plane |
| eks-auth | EKS Pod Identity |
| ec2 | EC2 API (Karpenter, node groups) |
| sqs | Karpenter interrupt queue |
| ssm | Systems Manager on nodes |
| kms | Encryption at rest |
| elasticloadbalancing | ALB/NLB provisioning |
| xray | Distributed tracing |
| logs | CloudWatch Logs |
| route53 | Private DNS (cross-region, us-east-1) |
| impala_connect | Impala Connect PrivateLink (default on) |

Total: 14 interface endpoints with defaults. Each entry in s3_cross_region_access_regions adds one more S3 interface endpoint per configured region.

This is well within the default quota of 50 for a fresh VPC. However, if you set existing_vpc_id to reuse an existing VPC that already hosts many endpoints, check your current count before deploying:
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=<your-vpc-id> Name=vpc-endpoint-type,Values=Interface \
  --query 'length(VpcEndpoints)' --output text
Recommended target: 100 — request this when creating a new VPC too, so future endpoint additions never require a second quota request.
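The endpoint arithmetic for an existing VPC can be sketched as follows. The function names are hypothetical; the inputs are the existing count (from the describe-vpc-endpoints command above), whether Impala Connect is enabled, and the number of entries in s3_cross_region_access_regions.

```shell
# Interface endpoints the networking stack adds to a VPC:
# 13 base endpoints, +1 if Impala Connect is enabled, +1 per cross-region S3 entry.
# Usage: new_endpoints <impala_connect: true|false> <cross_region_count>
new_endpoints() {
  base=13
  extra=0
  if [ "$1" = "true" ]; then extra=1; fi
  echo $(( base + extra + $2 ))
}

# Succeeds if existing + new endpoints stay within the quota.
# Usage: fits_endpoint_quota <existing_count> <new_count> <quota>
fits_endpoint_quota() {
  [ $(( $1 + $2 )) -le "$3" ]
}

total=$(new_endpoints true 0)
echo "New endpoints with defaults: $total"
if fits_endpoint_quota 40 "$total" 50; then
  echo "An existing VPC with 40 endpoints fits within 50"
else
  echo "An existing VPC with 40 endpoints would exceed 50"
fi
```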

5. VPCs per region

Quota code: L-F678F1CE · Default: 5

./deploy.sh creates 1 new VPC (CIDR 10.0.0.0/16 by default) unless you set existing_vpc_id to reuse an existing one.

Check your current count:
aws ec2 describe-vpcs --query 'length(Vpcs)' --output text
Recommended target: 10 — request this proactively regardless of current count.

6. NAT gateways per Availability Zone

Quota code: L-FE5A380F · Default: 5 per AZ

The default deployment creates 1 NAT gateway (single-AZ, cost-optimized). One Elastic IP is consumed per NAT gateway.

List the subnets that host your available NAT gateways (each subnet belongs to exactly one AZ):
aws ec2 describe-nat-gateways \
  --filter Name=state,Values=available \
  --query 'NatGateways[].SubnetId' --output text
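Recommended target: 10 per AZ. Request this proactively. If you later switch to single_nat_gateway = false for HA, the deployment places 1 NAT gateway in each of 3 AZs, which uses 1 of the 5 allowed per AZ by default. The higher target leaves headroom when other workloads in the account already run NAT gateways in the same AZs.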
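The describe-nat-gateways command above prints subnet IDs rather than a per-AZ count. One way to get the per-AZ figure is to resolve each subnet to its Availability Zone and tally the results. The sketch below assumes a hypothetical function name and makes one describe-subnets call per NAT gateway, which is fine for small counts.

```shell
# Count available NAT gateways per Availability Zone.
# describe-nat-gateways returns subnet IDs, so each one is resolved to its AZ.
# Usage: nat_gateways_per_az <region>
nat_gateways_per_az() {
  region=$1
  subnets=$(aws ec2 describe-nat-gateways \
    --region "$region" \
    --filter Name=state,Values=available \
    --query 'NatGateways[].SubnetId' --output text)
  if [ -z "$subnets" ]; then
    return 0
  fi
  for subnet in $subnets; do
    aws ec2 describe-subnets \
      --region "$region" \
      --subnet-ids "$subnet" \
      --query 'Subnets[0].AvailabilityZone' --output text
  done | sort | uniq -c
}

# Example (requires AWS credentials):
#   nat_gateways_per_az us-east-1
```

Each output line is a count followed by an AZ name; compare the largest count with the per-AZ quota.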

7. Elastic IP addresses

Quota code: L-0263D0A3 · Default: 5

The deployment allocates 1 Elastic IP for the NAT gateway (more if you set single_nat_gateway = false, one per AZ).

Check your current allocation:
aws ec2 describe-addresses --query 'length(Addresses)' --output text
Recommended target: 20 — request this proactively. HA NAT (3 AZs) consumes 3 EIPs; other AWS resources in the account share this pool.

8. S3 buckets

Quota code: L-DC2B2D3D · Default: 100 per account (account-wide, not per region)

The Terraform storage stack creates 2 buckets. The Helm charts reference up to 2 additional customer-managed buckets (models storage and a dedicated batches bucket), for a total of 2–4 new buckets:

| Bucket | Created by | Purpose |
| --- | --- | --- |
| Resources bucket | Terraform (storage stack) | Batch metadata, logs, metrics |
| Batches bucket | Terraform (storage stack) | Customer batch files |
| Models bucket | Customer-provided (optional) | Tokenizer / model weights (read-only) |
| Additional batches bucket | Customer-provided (optional) | Separate batches storage |

Check your current count:
aws s3api list-buckets --query 'length(Buckets)' --output text
Recommended target: 150 — request this proactively. The default 100 is account-wide and fills up faster than expected across environments.

9. EBS volumes — persistent storage from Helm workloads

EBS volumes are provisioned automatically by the EBS CSI driver when Helm charts are deployed. The following PersistentVolumeClaims are created by default (all gp3, encrypted):

| Workload | Volume size | Note |
| --- | --- | --- |
| MongoDB | 20 Gi | Database data directory |
| Prometheus | 50 Gi × 2 replicas | Metrics retention (15 days / 40 GB) |
| RabbitMQ | 20 Gi | Message queue persistence |
| Data-sync client | 1 Gi | State tracking |
| Alertmanager | 10 Gi | Disabled by default; enabled if alerts are configured |

Minimum persistent storage at chart deploy time: ~141 GiB across 5 volumes.

Additionally, every Karpenter-provisioned node consumes an EBS root/data volume:
  • CPU nodes (Karpenter-managed): 30 Gi gp3 root volume per node
  • GPU nodes (g6e/p5): 4 Gi root (Bottlerocket OS) + 50 Gi data volume per node
EBS volume count and total storage are unlikely to reach AWS quota limits, because the footprint here is a few hundred GiB. The gp3 IOPS and throughput used by the Prometheus and MongoDB volumes may still matter on small instance types. With VolumeBindingMode: WaitForFirstConsumer, each volume is created only after its pod is scheduled, so it lands in the same Availability Zone as that pod's node.
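The persistent storage total can be recomputed for non-default settings with a short helper. This is a sketch with hypothetical function names; the sizes are the ones listed in the PersistentVolumeClaim table, and node root volumes are not included.

```shell
# Total PersistentVolumeClaim capacity (GiB) requested by the charts.
# Sizes come from the table: MongoDB 20, RabbitMQ 20, data-sync 1,
# Prometheus 50 per replica, Alertmanager 10 when enabled.
# Usage: pvc_total_gib <prometheus_replicas> <alertmanager: true|false>
pvc_total_gib() {
  total=$(( 20 + 20 + 1 + 50 * $1 ))
  if [ "$2" = "true" ]; then total=$(( total + 10 )); fi
  echo "$total"
}

# Number of volumes for the same inputs (3 fixed + 1 per Prometheus replica).
# Usage: pvc_count <prometheus_replicas> <alertmanager: true|false>
pvc_count() {
  count=$(( 3 + $1 ))
  if [ "$2" = "true" ]; then count=$(( count + 1 )); fi
  echo "$count"
}

echo "Defaults: $(pvc_total_gib 2 false) GiB across $(pvc_count 2 false) volumes"
echo "3 replicas + Alertmanager: $(pvc_total_gib 3 true) GiB across $(pvc_count 3 true) volumes"
```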

10. Load balancers (ALB + NLB)

ALB quota code: L-53DA6B97 · NLB quota code: L-69A177A2 · Default: 20 each

Created by Terraform (./deploy.sh):
  • 1 NLB — PrivateLink endpoint service (enable_eks_api_privatelink = true by default)
Created by Helm charts (via AWS Load Balancer Controller):
  • 1 ALB — impala-services ingress (routes deployer, scaler, hardware-estimator)
  • 0–3 optional internal NLBs — one each for deployer, mongodb, and prometheus when internalNLB.enabled: true is set in their chart values (used for cross-region / satellite cluster connectivity)
Maximum total with these options: 1 ALB + 4 NLBs (setting enable_openai_api_privatelink = true adds one more NLB). The default limit of 20 leaves plenty of room for this, but the AWS Load Balancer Controller can create additional load balancers for other Kubernetes services over time.

Recommended target: 50 for both ALB and NLB. Request both proactively.

Check current usage first:
aws elbv2 describe-load-balancers --query 'LoadBalancers[].Type' --output text | tr '\t' '\n' | sort | uniq -c

11. EKS clusters

Quota code: L-1194D53C · Default: 100

The deployment creates 1 EKS cluster. The default limit is very unlikely to be an issue.

12. IAM — roles, policies, and attachments per role

12a. IAM roles

Quota code: L-FE177D64 · Default: 1000 per account

The deployment creates 12 IAM roles in total (all features enabled):

| Role | Stack | Conditional on |
| --- | --- | --- |
| eks_cluster | iam-base | Always |
| eks_nodes | iam-base | Always |
| impala_access_role | iam-base | Always |
| workload_role | iam-oidc | OIDC provider available |
| karpenter | iam-oidc | enable_karpenter = true + OIDC |
| vpc_cni_role | iam-oidc | OIDC provider available |
| aws_load_balancer_controller | iam-oidc | OIDC provider available |
| scaler | iam-oidc | enable_scaler = true + OIDC |
| external_dns | iam-oidc | OIDC provider available |
| ebs_csi | eks-csi-roles | OIDC provider available |
| s3_csi | eks-csi-roles | OIDC provider available |
| eni_reconcile (Lambda) | privatelink | enable_eks_api_privatelink = true |

Rarely a problem, but check if your account is near the limit:
aws iam list-roles --query 'length(Roles)' --output text

12b. Customer managed policies

Quota code: L-E95E4862 · Default: 1500 per account

The deployment creates 15 customer managed policies (all features enabled):

| Policy | Stack | Conditional on |
| --- | --- | --- |
| ecr_read_access_policy | iam-base | Always |
| service_quotas_readonly_policy | iam-base | Always |
| models_s3_readonly_access_policy | iam-base | Always |
| ec2_cluster_access_policy | iam-base | Always |
| resources_s3_full_access_policy | iam-base | s3_resources_bucket_arn provided |
| batches_s3_full_access_policy | iam-base | s3_batches_bucket_arn provided |
| batches_s3_list_access_policy | iam-base | s3_batches_bucket_arn provided |
| eks_admin_access_policy | iam-base | cluster_arn provided |
| karpenter_cluster_access_policy | iam-base | enable_karpenter = true + cluster_arn |
| ec2_read_access_policy | iam-oidc | OIDC provider available |
| aws_load_balancer_controller_policy | iam-oidc | OIDC provider available |
| sqs_node_metrics_access_policy | iam-oidc | enable_node_monitoring = true |
| sqs_node_lifecycle_access_policy | iam-oidc | enable_node_monitoring = true |
| sqs_launch_failure_access_policy | iam-oidc | enable_node_monitoring = true |
| scaler_eks_access_policy | iam-oidc | enable_scaler = true |

Check current count:
aws iam list-policies --scope Local --query 'length(Policies)' --output text

12c. Managed policies attached to a role — raise this before deploy

This is the most likely IAM quota to be hit. The deployment will fail at the iam-base apply step unless this quota is raised first.
Quota code: L-0DA4ABF3 · Default: 10 per role

The eks_nodes role receives 11 managed policy attachments when all default features are enabled:

| # | Policy | Type |
| --- | --- | --- |
| 1 | AmazonEKSWorkerNodePolicy | AWS managed |
| 2 | AmazonEKS_CNI_Policy | AWS managed |
| 3 | AmazonEBSCSIDriverPolicy | AWS managed |
| 4 | AmazonSSMManagedInstanceCore | AWS managed |
| 5 | AmazonEC2ContainerRegistryReadOnly | AWS managed |
| 6 | ecr_read_access_policy | Customer managed |
| 7 | ec2_cluster_access_policy | Customer managed |
| 8 | models_s3_readonly_access_policy | Customer managed |
| 9 | resources_s3_full_access_policy | Customer managed (conditional) |
| 10 | batches_s3_full_access_policy | Customer managed (conditional) |
| 11 | karpenter_cluster_access_policy | Customer managed (conditional) |

With the default configuration (enable_karpenter = true, S3 buckets provided), eks_nodes has 11 attachments, one over the default limit of 10.

Recommended target: 25. This leaves room for the current 11 attachments plus future policy additions without another quota request.

Request the increase:
aws service-quotas request-service-quota-increase \
  --service-code iam \
  --quota-code L-0DA4ABF3 \
  --desired-value 25 \
  --region us-east-1
IAM is a global service. The quota is account-wide, not per-region — request it once in us-east-1 regardless of your deployment region.
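Before deploying, you can compare the planned attachment count with the quota currently applied to the account. The sketch below uses hypothetical function names, and the actual name of the node role in your account is not given in this guide, so pass it explicitly if you want to inspect an existing role.

```shell
# Number of managed policies currently attached to a role.
# Usage: attached_policy_count <role_name>
attached_policy_count() {
  aws iam list-attached-role-policies \
    --role-name "$1" \
    --query 'length(AttachedPolicies)' --output text
}

# Current value of the "managed policies attached to a role" quota (e.g. "10.0").
attachment_quota() {
  aws service-quotas get-service-quota \
    --service-code iam --quota-code L-0DA4ABF3 \
    --region us-east-1 \
    --query 'Quota.Value' --output text
}

# Succeeds if <planned_attachments> fits within the current quota.
# Usage: attachments_fit <planned_attachments>
attachments_fit() {
  awk -v q="$(attachment_quota)" -v n="$1" 'BEGIN { exit !(n + 0 <= q + 0) }'
}

# Example (requires AWS credentials):
#   attachments_fit 11 || echo "Raise L-0DA4ABF3 before running ./deploy.sh"
```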

Checking and requesting increases

  1. Open the AWS Service Quotas console in the target region.
  2. Search for the quota by name or code.
  3. Select Request quota increase and enter the desired value.
  4. On-Demand EC2 vCPU increases are usually auto-approved within a few hours. Spot GPU quotas (especially P family) require manual AWS review — submit these first, they take the longest.
Request via CLI (replace <your-region> with your deployment region):
# IAM: managed policies per role (global — request once in us-east-1)
aws service-quotas request-service-quota-increase \
  --service-code iam --quota-code L-0DA4ABF3 \
  --desired-value 25 --region us-east-1

# On-Demand Standard vCPUs
aws service-quotas request-service-quota-increase \
  --service-code ec2 --quota-code L-1216C47A \
  --desired-value 256 --region <your-region>

# G/VT Spot vCPUs (g6e inference nodes)
aws service-quotas request-service-quota-increase \
  --service-code ec2 --quota-code L-3819A6DF \
  --desired-value 128 --region <your-region>

# P Spot vCPUs (p5/p5en nodes)
aws service-quotas request-service-quota-increase \
  --service-code ec2 --quota-code L-7212CCBC \
  --desired-value 384 --region <your-region>

# VPCs per region
aws service-quotas request-service-quota-increase \
  --service-code vpc --quota-code L-F678F1CE \
  --desired-value 10 --region <your-region>

# NAT gateways per AZ
aws service-quotas request-service-quota-increase \
  --service-code vpc --quota-code L-FE5A380F \
  --desired-value 10 --region <your-region>

# Elastic IP addresses
aws service-quotas request-service-quota-increase \
  --service-code ec2 --quota-code L-0263D0A3 \
  --desired-value 20 --region <your-region>

# Application Load Balancers
aws service-quotas request-service-quota-increase \
  --service-code elasticloadbalancing --quota-code L-53DA6B97 \
  --desired-value 50 --region <your-region>

# Network Load Balancers
aws service-quotas request-service-quota-increase \
  --service-code elasticloadbalancing --quota-code L-69A177A2 \
  --desired-value 50 --region <your-region>

# S3 buckets (account-wide — request once in us-east-1)
aws service-quotas request-service-quota-increase \
  --service-code s3 --quota-code L-DC2B2D3D \
  --desired-value 150 --region us-east-1
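
To avoid filing requests for quotas that already meet the target, the individual commands above can be wrapped in a check. This is a sketch, not part of ./deploy.sh: ensure_quota is a hypothetical helper name, and it assumes get-service-quota returns a value for the quota in question.

```shell
# Request an increase only when the current quota is below the target.
# Usage: ensure_quota <service_code> <quota_code> <target> <region>
ensure_quota() {
  current=$(aws service-quotas get-service-quota \
    --service-code "$1" --quota-code "$2" --region "$4" \
    --query 'Quota.Value' --output text)
  if awk -v c="$current" -v t="$3" 'BEGIN { exit !(c + 0 >= t + 0) }'; then
    echo "$2: current $current >= target $3, nothing to do"
  else
    echo "$2: current $current < target $3, requesting increase"
    aws service-quotas request-service-quota-increase \
      --service-code "$1" --quota-code "$2" \
      --desired-value "$3" --region "$4" \
      --query 'RequestedQuota.Status' --output text
  fi
}

# Example (requires AWS credentials; replace the region):
#   ensure_quota ec2 L-1216C47A 256 us-east-1
#   ensure_quota ec2 L-3819A6DF 128 us-east-1
```

The last line printed for a new request is the status AWS returns for it, which you can use to track the review.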

Notes on configuration variants

| Deployment option | Additional quota impact |
| --- | --- |
| single_nat_gateway = false (HA NAT) | 1 NAT gateway + 1 EIP per AZ (typically ×3) |
| enable_eks_api_privatelink = false | Saves 1 NLB |
| enable_openai_api_privatelink = true | +1 NLB, +1 VPC endpoint service |
| existing_vpc_id = ... (bring your own VPC) | No new VPC or NAT gateway consumed |
| internalNLB.enabled: true (per-chart) | +1 NLB per enabled service (deployer, mongodb, prometheus) |
| GPU nodes disabled (Karpenter GPU pool removed) | No G/VT or P Spot quota needed |
| Alertmanager enabled | +1 EBS volume (10 Gi gp3) |
| Prometheus replicas > 2 | +1 × 50 Gi EBS per additional replica |

Need help?

Reach out to your Impala contact directly.