AWS Service Networking Guide: EKS, ECS, Lambda, RDS, and More

Deploy AWS services with confidence. Learn networking requirements for EKS, ECS Fargate, Lambda, RDS, ElastiCache, ALB, API Gateway, and MSK with production-ready Terraform examples.

Jun 30, 2024
16 min read

AWS Service Networking: Practical Deployment Guide

This is a companion guide to AWS Networking Best Practices. While that article covers VPC fundamentals, this one focuses on the practical networking requirements for deploying specific AWS services.

Understanding VPC fundamentals is essential, but the real challenge comes when deploying actual AWS services. Each service has unique networking requirements that can trip up even experienced engineers. This guide covers what you need to know to successfully deploy the most common AWS services.

Quick Reference: Networking by Service

Service | Subnet Type | Key Requirements
EKS | Private (large /19+) | Subnet tags, CNI IP planning, pod security groups
ECS Fargate | Private | awsvpc mode, ALB integration, service discovery
Lambda | Private (only if needed) | NAT for internet, VPC endpoints for AWS services
RDS | Database (isolated) | Subnet group across 2+ AZs, no public access
ElastiCache | Database (isolated) | Subnet group, cluster mode considerations
ALB/NLB | Public or Private | Cross-zone, target group networking
API Gateway | N/A (managed) | VPC Links for private integrations
MSK (Kafka) | Private | 3 AZs recommended, high-throughput planning

EKS (Elastic Kubernetes Service)

EKS networking is one of the most complex topics in AWS. The key thing to understand is that AWS VPC CNI assigns real VPC IP addresses to each pod—this is different from other Kubernetes distributions that use overlay networks.

Why this matters: A node with 30 pods needs 30+ IP addresses. If you’re running 100 nodes with 30 pods each, that’s 3,000+ IPs just for pods. Plan your subnets accordingly.

Critical: Subnet Tags for EKS

EKS uses specific tags to auto-discover subnets for load balancer placement. Without these tags, your Kubernetes services won’t be able to provision external or internal load balancers. The kubernetes.io/role/elb tag tells EKS which subnets can host internet-facing load balancers, while kubernetes.io/role/internal-elb identifies subnets for internal load balancers. Additionally, the cluster tag helps EKS identify which subnets belong to which cluster when you have multiple clusters in the same VPC.

EKS VPC with Required Subnet Tags
terraform
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "eks-vpc"
  cidr = "10.0.0.0/16"

  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Use /19 for private subnets (8,192 IPs each) - EKS pods consume IPs fast
  private_subnets = ["10.0.0.0/19", "10.0.32.0/19", "10.0.64.0/19"]
  public_subnets  = ["10.0.96.0/24", "10.0.97.0/24", "10.0.98.0/24"]

  # Isolated database subnets, referenced by the RDS/ElastiCache examples later in this guide
  database_subnets = ["10.0.99.0/24", "10.0.100.0/24", "10.0.101.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true

  # REQUIRED: Tags for EKS subnet auto-discovery
  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1 # Public LBs (internet-facing)
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1 # Internal LBs
  }

  tags = {
    "kubernetes.io/cluster/my-cluster" = "shared"
  }
}

EKS Security Groups

EKS requires careful security group configuration to allow communication between the control plane and worker nodes. The control plane needs to reach nodes on ports 1025-65535 for kubelet communication, while nodes need to reach the control plane API on port 443. Additionally, nodes must be able to communicate with each other for pod-to-pod networking. The configuration below sets up these security groups with the minimum required rules.

EKS Cluster and Node Security Groups
terraform
# Cluster security group (control plane to nodes)
resource "aws_security_group" "eks_cluster" {
  name        = "eks-cluster-sg"
  description = "EKS cluster security group"
  vpc_id      = module.vpc.vpc_id

  tags = {
    Name = "eks-cluster-sg"
  }
}

# Node security group
resource "aws_security_group" "eks_nodes" {
  name        = "eks-nodes-sg"
  description = "EKS node security group"
  vpc_id      = module.vpc.vpc_id

  tags = {
    Name = "eks-nodes-sg"
  }
}

# Rules are defined as separate resources so the two groups can reference
# each other without creating a Terraform dependency cycle.

# Allow nodes to communicate with the control plane API
resource "aws_security_group_rule" "cluster_ingress_nodes" {
  description              = "Nodes to cluster API"
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
}

# Control plane to nodes over the ephemeral port range
resource "aws_security_group_rule" "cluster_egress_nodes" {
  description              = "Cluster to nodes"
  type                     = "egress"
  from_port                = 1025
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
}

# Node to node communication (required for pod networking)
resource "aws_security_group_rule" "nodes_ingress_self" {
  description       = "Node to node"
  type              = "ingress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  security_group_id = aws_security_group.eks_nodes.id
  self              = true
}

# Control plane to nodes
resource "aws_security_group_rule" "nodes_ingress_cluster" {
  description              = "Cluster API to nodes"
  type                     = "ingress"
  from_port                = 1025
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_cluster.id
}

# All outbound from nodes
resource "aws_security_group_rule" "nodes_egress_all" {
  description       = "All outbound"
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  security_group_id = aws_security_group.eks_nodes.id
  cidr_blocks       = ["0.0.0.0/0"]
}
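
For completeness, here is a minimal sketch of how the cluster itself might consume the VPC and security group above. The IAM role (aws_iam_role.eks_cluster) is an assumption and is not defined elsewhere in this guide.

EKS Cluster Wiring (sketch)
terraform
# Minimal EKS cluster wired to the VPC and security group above.
# NOTE: aws_iam_role.eks_cluster is assumed to exist and is not shown here.
resource "aws_eks_cluster" "main" {
  name     = "my-cluster" # Must match the kubernetes.io/cluster/<name> subnet tag
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids              = module.vpc.private_subnets
    security_group_ids      = [aws_security_group.eks_cluster.id]
    endpoint_private_access = true
    endpoint_public_access  = true # Restrict with public_access_cidrs in production
  }
}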

EKS IP Planning

  • Default CNI: Each pod gets a VPC IP. An m5.large can run ~29 pods (based on ENI limits)
  • Prefix delegation: Enable it to get 16 IPs per ENI slot, supporting 110+ pods per node (see the sketch after this list)
  • Secondary CIDR: Add a 100.64.0.0/16 CIDR for pod IPs if you are running out of space
  • Subnet sizing: Use /19 minimum for production (8,187 usable IPs after AWS reserves 5 per subnet)
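
Prefix delegation is controlled by the VPC CNI's environment variables. One way to manage it in Terraform is through the vpc-cni managed add-on; a minimal sketch, assuming the aws_eks_cluster.main resource from the earlier sketch:

Enabling VPC CNI Prefix Delegation (sketch)
terraform
# Managed VPC CNI add-on with prefix delegation turned on.
# Each ENI slot then hands out a /28 prefix (16 IPs) instead of a single IP.
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "vpc-cni"

  configuration_values = jsonencode({
    env = {
      ENABLE_PREFIX_DELEGATION = "true"
      WARM_PREFIX_TARGET       = "1"
    }
  })
}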

ECS (Elastic Container Service)

ECS networking depends on your launch type (Fargate vs EC2) and network mode. For modern deployments, use Fargate with awsvpc network mode—each task gets its own ENI with a VPC IP address.

Fargate Networking

ECS Fargate with awsvpc network mode gives each task its own elastic network interface (ENI) with a private IP address from your VPC. This provides better security isolation and allows you to apply security groups directly to tasks. The configuration below shows a complete ECS service setup including the cluster, service with network configuration, ALB integration, and the security group that controls traffic to your containers.

ECS Fargate Service with awsvpc Networking
terraform
# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "production-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# ECS Service with awsvpc networking
resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  # awsvpc network mode - each task gets its own ENI
  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false # Use NAT Gateway for outbound
  }

  # ALB integration
  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }

  # Service discovery for internal communication
  service_registries {
    registry_arn = aws_service_discovery_service.app.arn
  }
}

# Security group for ECS tasks
resource "aws_security_group" "ecs_tasks" {
  name        = "ecs-tasks-sg"
  description = "Security group for ECS tasks"
  vpc_id      = module.vpc.vpc_id

  # Allow traffic from ALB
  ingress {
    description     = "From ALB"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Allow task-to-task communication (for service mesh/discovery)
  ingress {
    description = "Task to task"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  egress {
    description = "All outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "ecs-tasks-sg"
  }
}

Service Discovery with Cloud Map

AWS Cloud Map provides DNS-based service discovery for your ECS services, allowing containers to find each other using friendly DNS names instead of hardcoded IP addresses. When you register a service with Cloud Map, ECS automatically registers and deregisters task IP addresses as tasks start and stop. Other services can then resolve the service name to get the current IP addresses of healthy tasks.

ECS Service Discovery with Cloud Map
terraform
# Private DNS namespace for service discovery
resource "aws_service_discovery_private_dns_namespace" "main" {
  name        = "local"
  description = "Private DNS namespace for ECS services"
  vpc         = module.vpc.vpc_id
}

# Service discovery service
resource "aws_service_discovery_service" "app" {
  name = "my-app"

  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.main.id

    dns_records {
      ttl  = 10
      type = "A"
    }

    routing_policy = "MULTIVALUE"
  }

  health_check_custom_config {
    failure_threshold = 1
  }
}

# Other services can now reach this service at: my-app.local

Lambda Networking

Here’s the most important thing about Lambda networking: Don’t put Lambda in a VPC unless you have to.

Lambda functions run in AWS-managed VPCs by default and can access the internet and AWS services without any VPC configuration. Only configure VPC access when you need to reach private resources like RDS, ElastiCache, or internal APIs.

When to use VPC with Lambda:

  • Accessing RDS, Aurora, or ElastiCache in private subnets
  • Calling internal APIs or services in your VPC
  • Compliance requirements mandating VPC isolation

When NOT to use VPC with Lambda:

  • Calling public APIs or AWS services (use VPC Endpoints instead)
  • Simple functions that don’t need private resource access
  • When cold start latency is critical (VPC attachment can still add cold start latency, though far less than the 1-10 seconds seen before Hyperplane ENIs)

Lambda in VPC

When you do need VPC access for Lambda, the configuration is straightforward but has important implications. Lambda creates ENIs in your specified subnets, which means you need enough IP addresses to handle concurrent executions. The security group should follow least-privilege principles—only allow outbound traffic to the specific resources your function needs to access. Adding VPC endpoints for AWS services like Secrets Manager and SSM avoids routing that traffic through NAT Gateway, reducing both cost and latency.

Lambda with VPC Access and Endpoints
terraform
# Lambda function with VPC access
resource "aws_lambda_function" "vpc_lambda" {
  filename      = "function.zip"
  function_name = "my-vpc-function"
  role          = aws_iam_role.lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"

  # VPC configuration - only add if you need private resource access
  vpc_config {
    subnet_ids         = module.vpc.private_subnets
    security_group_ids = [aws_security_group.lambda.id]
  }

  # Increase timeout to account for cold starts
  timeout = 30

  environment {
    variables = {
      DB_HOST = aws_db_instance.main.address
    }
  }
}

# Security group for Lambda
resource "aws_security_group" "lambda" {
  name        = "lambda-sg"
  description = "Security group for Lambda functions"
  vpc_id      = module.vpc.vpc_id

  # Outbound HTTPS (for AWS services via NAT or VPC Endpoints)
  egress {
    description = "HTTPS outbound"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "lambda-sg"
  }
}

# Egress to the databases is defined as separate rules so this security group
# and the database security groups (which allow ingress from Lambda) can
# reference each other without a dependency cycle.

# Outbound to RDS
resource "aws_security_group_rule" "lambda_to_rds" {
  description              = "To RDS"
  type                     = "egress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.lambda.id
  source_security_group_id = aws_security_group.rds.id
}

# Outbound to ElastiCache
resource "aws_security_group_rule" "lambda_to_redis" {
  description              = "To ElastiCache"
  type                     = "egress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.lambda.id
  source_security_group_id = aws_security_group.redis.id
}

# IMPORTANT: Add VPC Endpoints to avoid NAT Gateway costs
resource "aws_vpc_endpoint" "lambda_endpoints" {
  for_each = toset([
    "secretsmanager", # For database credentials
    "ssm",            # For parameter store
    "logs",           # For CloudWatch Logs
  ])

  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "${each.value}-endpoint"
  }
}
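
The endpoints above reference aws_security_group.vpc_endpoints, which isn't defined elsewhere in this guide. A minimal sketch that allows HTTPS from anywhere inside the VPC, assuming the module exposes vpc_cidr_block (the terraform-aws-modules VPC module does):

Security Group for Interface VPC Endpoints (sketch)
terraform
# Interface endpoints only need to accept HTTPS from clients inside the VPC.
resource "aws_security_group" "vpc_endpoints" {
  name        = "vpc-endpoints-sg"
  description = "Security group for interface VPC endpoints"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS from inside the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }

  tags = {
    Name = "vpc-endpoints-sg"
  }
}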

Lambda VPC Best Practices

  • Use VPC Endpoints: Avoid NAT Gateway costs for AWS service calls (Secrets Manager, SSM, S3)
  • Provisioned Concurrency: Reduces cold starts for VPC Lambdas by keeping execution environments warm (see the sketch after this list)
  • Subnet sizing: Lambda can consume many IPs during scale-out. Use /19 subnets minimum
  • Security groups: Be specific—only allow outbound to resources Lambda actually needs
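
As a sketch of the provisioned concurrency point above (the alias name and concurrency count are illustrative; the function must be published with publish = true so a numbered version exists):

Provisioned Concurrency for a VPC Lambda (sketch)
terraform
# Point an alias at the published version and keep a fixed number of
# execution environments warm. Requires publish = true on the function.
resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.vpc_lambda.function_name
  function_version = aws_lambda_function.vpc_lambda.version
}

resource "aws_lambda_provisioned_concurrency_config" "vpc_lambda" {
  function_name                     = aws_lambda_function.vpc_lambda.function_name
  provisioned_concurrent_executions = 5
  qualifier                         = aws_lambda_alias.live.name
}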

RDS and Aurora

Database networking is straightforward but critical to get right. Databases should always be in isolated subnets with no internet access.

RDS Networking

RDS requires a DB subnet group that spans at least two Availability Zones for high availability. The database should never be publicly accessible in production—all access should come through your application tier or bastion hosts. The security group should only allow connections from the specific security groups that need database access, following the principle of least privilege.

RDS with Subnet Group and Security
terraform
# Database subnet group (required for RDS)
resource "aws_db_subnet_group" "main" {
  name       = "main-db-subnet-group"
  subnet_ids = module.vpc.database_subnets

  tags = {
    Name = "main-db-subnet-group"
  }
}

# RDS instance
resource "aws_db_instance" "main" {
  identifier            = "production-db"
  engine                = "postgres"
  engine_version        = "15.4"
  instance_class        = "db.r6g.large"
  allocated_storage     = 100
  max_allocated_storage = 500
  storage_encrypted     = true

  db_name  = "myapp"
  username = "dbadmin" # "admin" is rejected as a reserved word by the postgres engine
  password = var.db_password

  # Networking
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  publicly_accessible    = false # NEVER set to true in production

  # High availability
  multi_az = true

  tags = {
    Name = "production-db"
  }
}

# RDS security group
resource "aws_security_group" "rds" {
  name        = "rds-sg"
  description = "Security group for RDS"
  vpc_id      = module.vpc.vpc_id

  # Only allow from app tier
  ingress {
    description     = "PostgreSQL from app tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app_tier.id]
  }

  # Allow from Lambda
  ingress {
    description     = "PostgreSQL from Lambda"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }

  # No egress needed for RDS
  tags = {
    Name = "rds-sg"
  }
}

RDS Proxy for Lambda

When Lambda connects to RDS, each invocation can create a new database connection. This quickly exhausts connection limits. RDS Proxy solves this by pooling connections. The proxy maintains a pool of database connections and multiplexes Lambda invocations across them, dramatically reducing the connection overhead and allowing your database to handle many more concurrent Lambda executions.

RDS Proxy for Connection Pooling
terraform
# RDS Proxy for connection pooling
resource "aws_db_proxy" "main" {
  name                   = "rds-proxy"
  debug_logging          = false
  engine_family          = "POSTGRESQL"
  idle_client_timeout    = 1800
  require_tls            = true
  role_arn               = aws_iam_role.rds_proxy.arn
  vpc_security_group_ids = [aws_security_group.rds_proxy.id]
  vpc_subnet_ids         = module.vpc.database_subnets

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "REQUIRED"
    secret_arn  = aws_secretsmanager_secret.db_credentials.arn
  }

  tags = {
    Name = "rds-proxy"
  }
}

# Lambda connects to proxy endpoint instead of RDS directly
# Proxy handles connection pooling automatically
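
The proxy still needs to be pointed at the database, and Lambda at the proxy. A minimal sketch of the remaining wiring (the connection pool percentages are illustrative):

Registering the RDS Target Behind the Proxy (sketch)
terraform
# Pool settings for the proxy's default target group
resource "aws_db_proxy_default_target_group" "main" {
  db_proxy_name = aws_db_proxy.main.name

  connection_pool_config {
    max_connections_percent      = 75
    max_idle_connections_percent = 25
  }
}

# Register the RDS instance behind the proxy
resource "aws_db_proxy_target" "main" {
  db_proxy_name          = aws_db_proxy.main.name
  target_group_name      = aws_db_proxy_default_target_group.main.name
  db_instance_identifier = aws_db_instance.main.identifier
}

# Lambda should then use the proxy endpoint, e.g.:
#   DB_HOST = aws_db_proxy.main.endpoint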

ElastiCache (Redis/Memcached)

ElastiCache follows similar patterns to RDS—isolated subnets, no public access. Redis clusters should be deployed with replication across multiple AZs for high availability, and encryption should be enabled both at rest and in transit. The security group should only allow connections from the application tiers that need cache access.

ElastiCache Redis Cluster
terraform
# ElastiCache subnet group
resource "aws_elasticache_subnet_group" "main" {
  name       = "main-cache-subnet-group"
  subnet_ids = module.vpc.database_subnets
}

# Redis cluster
resource "aws_elasticache_replication_group" "main" {
  replication_group_id = "production-redis"
  description          = "Production Redis cluster"
  node_type            = "cache.r6g.large"
  num_cache_clusters   = 2 # Primary + replica
  port                 = 6379

  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = var.redis_auth_token

  automatic_failover_enabled = true
  multi_az_enabled           = true

  tags = {
    Name = "production-redis"
  }
}

# Redis security group
resource "aws_security_group" "redis" {
  name        = "redis-sg"
  description = "Security group for ElastiCache Redis"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "Redis from app tier"
    from_port   = 6379
    to_port     = 6379
    protocol    = "tcp"
    security_groups = [
      aws_security_group.app_tier.id,
      aws_security_group.lambda.id,
      aws_security_group.ecs_tasks.id
    ]
  }

  tags = {
    Name = "redis-sg"
  }
}

Application Load Balancer (ALB)

ALBs are typically deployed in public subnets for internet-facing applications, or private subnets for internal services. Internet-facing ALBs need to be in public subnets with routes to an Internet Gateway, while internal ALBs can sit in private subnets. Cross-zone load balancing, which spreads traffic evenly across registered targets in all enabled Availability Zones, is always enabled for ALBs; it is only optional (and off by default) for NLBs.

Public and Internal ALB Configuration
terraform
# Internet-facing ALB
resource "aws_lb" "public" {
  name               = "public-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  enable_deletion_protection = true
  # Cross-zone load balancing is always on for ALBs; the
  # enable_cross_zone_load_balancing argument only applies to NLBs/GWLBs.

  tags = {
    Name = "public-alb"
  }
}

# Internal ALB (for service-to-service communication)
resource "aws_lb" "internal" {
  name               = "internal-alb"
  internal           = true
  load_balancer_type = "application"
  security_groups    = [aws_security_group.internal_alb.id]
  subnets            = module.vpc.private_subnets

  tags = {
    Name = "internal-alb"
  }
}

# ALB security group
resource "aws_security_group" "alb" {
  name        = "alb-sg"
  description = "Security group for public ALB"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from internet (redirect to HTTPS)"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "alb-sg"
  }
}

# Egress to targets is defined as separate rules so this security group and
# the target security groups (which allow ingress from the ALB) can reference
# each other without a dependency cycle.
resource "aws_security_group_rule" "alb_to_app_tier" {
  description              = "To app tier targets"
  type                     = "egress"
  from_port                = 0
  to_port                  = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.alb.id
  source_security_group_id = aws_security_group.app_tier.id
}

resource "aws_security_group_rule" "alb_to_ecs_tasks" {
  description              = "To ECS task targets"
  type                     = "egress"
  from_port                = 0
  to_port                  = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.alb.id
  source_security_group_id = aws_security_group.ecs_tasks.id
}
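
The ECS service earlier registers its tasks with aws_lb_target_group.app, which isn't shown above. For Fargate tasks in awsvpc mode the target group must use target_type = "ip"; a minimal sketch (the health check path is an assumption):

IP-Mode Target Group for Fargate Tasks (sketch)
terraform
# Target group for the ECS service; awsvpc tasks register by IP, not instance ID.
resource "aws_lb_target_group" "app" {
  name        = "my-app-tg"
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip" # Required for Fargate / awsvpc network mode
  vpc_id      = module.vpc.vpc_id

  health_check {
    path                = "/health" # Assumed health endpoint
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 30
  }
}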

API Gateway with VPC Integration

API Gateway is a managed service that doesn’t run in your VPC, but you can connect it to private resources using VPC Links. This allows API Gateway to route requests to private ALBs, NLBs, or directly to private IP addresses. VPC Links create a private connection between API Gateway and your VPC, keeping traffic off the public internet.

API Gateway with VPC Link to Private ALB
terraform
# VPC Link for API Gateway to reach private resources
resource "aws_apigatewayv2_vpc_link" "main" {
  name               = "main-vpc-link"
  security_group_ids = [aws_security_group.vpc_link.id]
  subnet_ids         = module.vpc.private_subnets

  tags = {
    Name = "main-vpc-link"
  }
}

# HTTP API with VPC integration
resource "aws_apigatewayv2_api" "main" {
  name          = "my-api"
  protocol_type = "HTTP"
}

# Integration to private ALB
resource "aws_apigatewayv2_integration" "private_alb" {
  api_id             = aws_apigatewayv2_api.main.id
  integration_type   = "HTTP_PROXY"
  integration_uri    = aws_lb_listener.internal.arn
  integration_method = "ANY"
  connection_type    = "VPC_LINK"
  connection_id      = aws_apigatewayv2_vpc_link.main.id
}

# Security group for VPC Link
resource "aws_security_group" "vpc_link" {
  name        = "api-gateway-vpc-link-sg"
  description = "Security group for API Gateway VPC Link"
  vpc_id      = module.vpc.vpc_id

  egress {
    description     = "To internal ALB"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.internal_alb.id]
  }

  tags = {
    Name = "api-gateway-vpc-link-sg"
  }
}

S3 Access Patterns

S3 is a regional service that doesn’t run in your VPC, but you can control access patterns using Gateway Endpoints and bucket policies. Gateway Endpoints are free and should always be used—they route S3 traffic through AWS’s private network instead of the internet. You can also restrict bucket access to only allow requests that come through your VPC endpoint, providing an additional layer of security for sensitive data.

S3 Gateway Endpoint with VPC-Restricted Bucket Policy
terraform
# Gateway Endpoint for S3 (FREE - always use this)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"

  route_table_ids = concat(
    module.vpc.private_route_table_ids,
    module.vpc.database_route_table_ids
  )

  tags = {
    Name = "s3-gateway-endpoint"
  }
}

# S3 bucket policy - restrict to VPC endpoint only
resource "aws_s3_bucket_policy" "private_bucket" {
  bucket = aws_s3_bucket.private.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AllowVPCEndpointOnly"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.private.arn,
          "${aws_s3_bucket.private.arn}/*"
        ]
        Condition = {
          StringNotEquals = {
            "aws:sourceVpce" = aws_vpc_endpoint.s3.id
          }
        }
      }
    ]
  })
}

# For S3 access from specific VPCs only
resource "aws_s3_bucket_policy" "vpc_restricted" {
  bucket = aws_s3_bucket.internal.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AllowVPCOnly"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.internal.arn,
          "${aws_s3_bucket.internal.arn}/*"
        ]
        Condition = {
          StringNotEquals = {
            "aws:sourceVpc" = module.vpc.vpc_id
          }
        }
      }
    ]
  })
}

MSK (Managed Streaming for Apache Kafka)

MSK requires careful network planning due to high throughput requirements. Kafka brokers should be deployed across three Availability Zones for durability and high availability. The cluster should be kept private with no public access, and security groups should allow the necessary ports for Kafka client connections (9094 for TLS), Zookeeper (2181), and inter-broker communication.

MSK Kafka Cluster Configuration
terraform
# MSK cluster
resource "aws_msk_cluster" "main" {
  cluster_name           = "production-kafka"
  kafka_version          = "3.5.1"
  number_of_broker_nodes = 3 # One per AZ

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = module.vpc.private_subnets # Must be in 3 AZs
    security_groups = [aws_security_group.msk.id]

    storage_info {
      ebs_storage_info {
        volume_size = 1000 # GB
      }
    }

    connectivity_info {
      public_access {
        type = "DISABLED" # Keep private
      }
    }
  }

  encryption_info {
    encryption_in_transit {
      client_broker = "TLS"
      in_cluster    = true
    }
  }

  tags = {
    Name = "production-kafka"
  }
}

# MSK security group
resource "aws_security_group" "msk" {
  name        = "msk-sg"
  description = "Security group for MSK"
  vpc_id      = module.vpc.vpc_id

  # Kafka broker ports
  ingress {
    description = "Kafka TLS"
    from_port   = 9094
    to_port     = 9094
    protocol    = "tcp"
    security_groups = [
      aws_security_group.app_tier.id,
      aws_security_group.ecs_tasks.id
    ]
  }

  # Zookeeper (if needed)
  ingress {
    description     = "Zookeeper"
    from_port       = 2181
    to_port         = 2181
    protocol        = "tcp"
    security_groups = [aws_security_group.app_tier.id]
  }

  # Broker to broker communication
  ingress {
    description = "Inter-broker"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "msk-sg"
  }
}
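
Clients need the broker connection strings, which MSK exposes as attributes on the cluster resource. A small sketch of surfacing them as outputs:

Exposing MSK Connection Strings (sketch)
terraform
# TLS bootstrap brokers (host:9094 pairs) for Kafka clients
output "kafka_bootstrap_brokers_tls" {
  value = aws_msk_cluster.main.bootstrap_brokers_tls
}

# Zookeeper connection string (only needed by older tooling)
output "kafka_zookeeper_connect" {
  value = aws_msk_cluster.main.zookeeper_connect_string
}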

Service Networking Checklist

Before Deploying Any AWS Service

Before deploying any AWS service, verify:

  1. Subnet selection: Right subnet type (public/private/database) for the service
  2. Security groups: Least-privilege rules allowing only required traffic
  3. VPC Endpoints: Add Gateway Endpoints (S3, DynamoDB) and Interface Endpoints for frequently used services
  4. DNS resolution: Enable DNS hostnames and DNS support in the VPC settings (see the sketch after this checklist)
  5. IP capacity: Ensure subnets have enough IPs for the service’s scaling needs
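
A minimal sketch of checklist items 3 and 4 on a hand-rolled VPC (the resource names and the private route table reference are illustrative; the terraform-aws-modules VPC module used earlier handles the DNS settings via its enable_dns_support and enable_dns_hostnames inputs):

VPC DNS Settings and a DynamoDB Gateway Endpoint (sketch)
terraform
# DNS support and hostnames must be on for interface endpoints,
# EKS, and private hosted zones to resolve correctly.
resource "aws_vpc" "example" {
  cidr_block           = "10.1.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

# DynamoDB Gateway Endpoint - free, like the S3 endpoint shown earlier.
# aws_route_table.private is assumed to exist in this sketch.
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.example.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}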

Next Steps

With these service-specific configurations, you’re ready to deploy production workloads on AWS.

Need help? Contact Quabyt for AWS architecture and implementation support.
