AWS Networking Best Practices: VPC, Transit Gateway, and Beyond

Master AWS networking with this comprehensive guide. Learn VPC design, security groups, Transit Gateway, Direct Connect, and cost optimization strategies with production-ready examples.

Jun 29, 2024

AWS Networking: Production-Ready Guide

This is Part 1 of our Cloud Networking series. If you haven’t read the overview, start with Cloud Networking Done Right: Series Overview.

Quick Start: Deploy Your First AWS VPC

Get a production-ready VPC running in 5 minutes:

# Save as main.tf and run: terraform init && terraform apply
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"
  name    = "my-production-vpc"
  cidr    = "10.0.0.0/16" # 65,536 IP addresses
  # Deploy across 3 AZs for high availability
  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
  # Private subnets: For application servers, no direct internet access
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  # Public subnets: For load balancers, NAT gateways
  public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  # Database subnets: Isolated tier for RDS
  database_subnets = ["10.0.201.0/24", "10.0.202.0/24", "10.0.203.0/24"]
  # NAT Gateway configuration - High availability
  enable_nat_gateway     = true
  single_nat_gateway     = false # Set true for dev/test to save ~$64/month
  one_nat_gateway_per_az = true
  # Enable DNS
  enable_dns_hostnames = true
  enable_dns_support   = true
  # FREE Gateway Endpoints for S3 and DynamoDB save $0.045/GB in NAT Gateway
  # data processing. The enable_s3_endpoint and enable_dynamodb_endpoint
  # arguments were removed from this module in v3.0, so with version ~> 5.0
  # define the endpoints as separate aws_vpc_endpoint resources
  # (see "VPC Endpoints: Save Money on NAT Gateway").
  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
# Output for use in other modules
output "vpc_id" {
  value = module.vpc.vpc_id
}
output "private_subnet_ids" {
  value = module.vpc.private_subnets
}

Estimated Monthly Cost: roughly $100-150 (3 NAT Gateways at about $33 each, plus data processing)

AWS VPC Architecture

Before diving into individual components, let’s understand how they work together in a typical production VPC. This architecture shows a highly available, multi-tier application deployed across two Availability Zones.

What you’re seeing:

  • Public subnets host internet-facing resources (load balancers, NAT Gateways)
  • Private subnets host application servers with no direct internet access
  • Database subnets provide an additional isolation layer for sensitive data
  • Multiple AZs ensure high availability—if one AZ fails, the other continues serving traffic

[Diagram: VPC (10.0.0.0/16) in us-east-1 across two Availability Zones. AZ 1a holds public subnet 10.0.101.0/24, private subnet 10.0.1.0/24, and database subnet 10.0.201.0/24. AZ 1b holds public subnet 10.0.102.0/24, private subnet 10.0.2.0/24, and database subnet 10.0.202.0/24. Each public subnet contains an Application Load Balancer node and a NAT Gateway, each private subnet an app server, and the database subnets hold the RDS primary and standby with replication between them. Internet traffic enters through the Internet Gateway; app servers send outbound traffic through the NAT Gateways.]

Understanding AWS VPC Components

1. VPC (Virtual Private Cloud)

What it is: Your isolated network in AWS where you launch resources.

Key Characteristics:

  • Regional resource (doesn’t span regions)
  • Requires CIDR block (e.g., 10.0.0.0/16)
  • Can have up to 5 IPv4 CIDR blocks by default (the quota can be raised)
  • Supports IPv4 and IPv6

VPC Best Practices

  • Use /16 for production VPCs - Provides 65,536 IP addresses for growth
  • Private IP ranges only - Use 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16
  • Plan for growth - Running out of IPs requires complex migration
  • Document IP allocation - Maintain clear records to avoid overlaps

2. Subnets

What they are: Segments of your VPC CIDR block, confined to a single Availability Zone.

Types:

  • Public Subnet: Has route to Internet Gateway, resources can have public IPs
  • Private Subnet: No direct internet access, uses NAT Gateway for outbound
  • Database Subnet: Isolated subnet for databases, no internet access

Subnet Best Practices

  • Multi-AZ deployment - Create subnets across multiple Availability Zones for high availability
  • Use /24 for most subnets - Provides 256 addresses, of which 251 are usable (AWS reserves 5 in every subnet)
  • Clear naming convention - Use descriptive names like prod-public-us-east-1a, prod-private-us-east-1a
  • Reserve ranges - Keep subnet ranges available for future expansion
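
The sizing rules above can be checked with Python's standard `ipaddress` module. This is a minimal sketch rather than a planning tool: the helper names `plan_subnets` and `usable_ips` are hypothetical, and the tier offsets (1, 101, 201) simply mirror the Quick Start layout.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS reserves in every subnet

# Third-octet starting points per tier, copied from the Quick Start (an assumption,
# not an AWS requirement).
TIER_OFFSETS = {"private": 1, "public": 101, "database": 201}


def plan_subnets(vpc_cidr: str, az_count: int) -> dict:
    """Carve one /24 per tier per AZ out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    all_24s = list(vpc.subnets(new_prefix=24))  # a /16 yields 256 /24 blocks
    return {
        tier: [str(all_24s[start + i]) for i in range(az_count)]
        for tier, start in TIER_OFFSETS.items()
    }


def usable_ips(cidr: str) -> int:
    """Addresses left for your resources after the AWS reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


plan = plan_subnets("10.0.0.0/16", az_count=3)
for tier, cidrs in plan.items():
    print(tier, cidrs)
print("usable addresses in a /24:", usable_ips("10.0.1.0/24"))
```

Keeping the plan in code makes it easy to spot overlaps before a second VPC or a peering connection is added.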

3. Internet Gateway (IGW)

The Internet Gateway is your VPC’s connection to the public internet. It’s a simple concept but critical to understand: without an IGW, your VPC is completely isolated from the internet.

How it works:

  • Attached to your VPC (one IGW per VPC)
  • Performs NAT for instances with public IP addresses
  • Horizontally scaled, redundant, and highly available by AWS
  • No bandwidth constraints or throughput limits

Key Points:

  • The gateway itself is free: no hourly charges or data processing fees (standard data transfer and public IPv4 address charges still apply)
  • Only works for resources with public IP addresses
  • Requires a route in your route table pointing 0.0.0.0/0 to the IGW

When to use:

  • Public-facing resources like Application Load Balancers
  • Bastion hosts that need direct internet access
  • Any resource that needs to be reachable from the internet

4. NAT Gateway

NAT Gateway solves a common problem: your private instances need to download updates, call external APIs, or access AWS services, but you don’t want to give them public IPs. NAT Gateway provides outbound internet connectivity while keeping instances completely private.

How it works:

  • Deployed in a public subnet (needs internet access itself)
  • Private instances route their outbound traffic through the NAT Gateway
  • NAT Gateway translates private IPs to its public IP
  • Return traffic is automatically routed back to the originating instance
  • Important: Only works for outbound traffic—inbound connections are blocked

Cost: $0.045/hour ($32/month) + $0.045/GB data processed

Best Practices:

  • High availability: Deploy one NAT Gateway per AZ (prevents single point of failure)
  • Cost optimization: Use single NAT Gateway in dev/test environments
  • Avoid NAT charges: Use VPC Endpoints for AWS services (S3, DynamoDB, etc.)
  • Monitor costs: Data processing fees can add up—review VPC Flow Logs to identify high-traffic sources

Cost Optimization:

# Development/Test: Single NAT Gateway
# (name, cidr, azs, and subnets omitted for brevity)
module "vpc_dev" {
  source             = "terraform-aws-modules/vpc/aws"
  enable_nat_gateway = true
  single_nat_gateway = true # Saves $64/month (2 NAT Gateways)
}
# Production: One NAT Gateway per AZ
module "vpc_prod" {
  source                 = "terraform-aws-modules/vpc/aws"
  enable_nat_gateway     = true
  one_nat_gateway_per_az = true # High availability
}
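
To see what the two layouts cost, the NAT Gateway prices quoted above can be put into a small calculation. This is a sketch that assumes a 730-hour month and a hypothetical 500 GB of processed traffic; the function name `nat_monthly_cost` is illustrative.

```python
HOURS_PER_MONTH = 730
NAT_HOURLY_USD = 0.045  # per NAT Gateway, from the pricing above
NAT_PER_GB_USD = 0.045  # data processing, from the pricing above


def nat_monthly_cost(gateways: int, gb_processed: float) -> float:
    """Hourly charge for every gateway plus the per-GB processing fee."""
    hourly = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH
    data = gb_processed * NAT_PER_GB_USD
    return round(hourly + data, 2)


dev = nat_monthly_cost(gateways=1, gb_processed=500)
prod = nat_monthly_cost(gateways=3, gb_processed=500)
print(f"single NAT Gateway: ${dev:.2f}/month")
print(f"one per AZ (3 AZs): ${prod:.2f}/month")
print(f"difference:         ${prod - dev:.2f}/month")
```

With a 730-hour month the difference comes to about $65.70, in line with the rounded ~$64 figure used in this guide.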

5. Route Tables

Route tables are like GPS for your VPC traffic—they determine where network packets go. Every subnet must be associated with a route table, and that route table’s rules determine how traffic flows.

How they work:

  • Each route has a destination (CIDR block) and a target (where to send traffic)
  • Routes are evaluated from most specific to least specific
  • Local routes (within VPC) are automatically added and can’t be deleted
  • Each subnet can only be associated with one route table

Key Concepts:

  • Main route table: Created automatically with your VPC, used by default for any subnet without explicit association
  • Custom route tables: Create these for specific routing needs (public vs private subnets)
  • Best practice: Don’t use the main route table for production subnets—create explicit route tables

Example Route Table (Private Subnet):

Destination      Target
10.0.0.0/16      local (VPC)
0.0.0.0/0        nat-xxxxx (NAT Gateway)

Example Route Table (Public Subnet):

Destination      Target
10.0.0.0/16      local (VPC)
0.0.0.0/0        igw-xxxxx (Internet Gateway)
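
The "most specific route wins" rule is longest-prefix matching. The sketch below models it for the private subnet table above; it is a simplified illustration (the `select_route` helper is hypothetical and ignores details such as prefix lists and propagated routes).

```python
import ipaddress


def select_route(routes: dict, destination_ip: str) -> str:
    """Return the target of the matching route with the longest prefix."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination_ip}")
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


private_routes = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-xxxxx",
}

# An address inside the VPC matches both routes; the /16 is more specific.
print(select_route(private_routes, "10.0.2.15"))
# Anything else only matches the default route and goes to the NAT Gateway.
print(select_route(private_routes, "52.94.0.10"))
```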

AWS Security: Defense in Depth

AWS networking security follows a “defense in depth” strategy—multiple layers of security controls that work together. If one layer is breached, others provide backup protection. Think of it like a castle with multiple walls, moats, and gates.

The security layers:

  • Edge Protection: AWS WAF, AWS Shield
  • Network Layer: AWS Network Firewall, Network ACLs
  • Instance Layer: Security Groups, IAM Roles
  • Application Layer: Application Security

Traffic from the Internet passes through these layers from the edge inward.

Security Groups (Most Important!)

Security Groups are your most important security control in AWS. They act as virtual firewalls for your EC2 instances, controlling both inbound and outbound traffic. Understanding security groups is essential for any AWS deployment.

What makes them special:

  • Stateful: If you allow inbound traffic on port 443, the response traffic is automatically allowed back out—you don’t need a separate outbound rule
  • Deny by default: All inbound traffic is blocked unless you explicitly allow it. All outbound traffic is allowed by default
  • Instance-level: Applied to ENIs (Elastic Network Interfaces), not subnets
  • Dynamic: Changes take effect immediately—no need to restart instances
  • Security group references: You can allow traffic from another security group instead of IP ranges (powerful for microservices)

Common mistake: New users often confuse security groups with NACLs. Security groups are stateful and deny by default; NACLs are stateless and require explicit allow/deny rules.

Production Example:

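# The tiers reference each other, and inline ingress/egress blocks that
# cross-reference security groups create a Terraform dependency cycle.
# The groups are therefore declared without inline rules, and each rule is
# a standalone resource (requires a recent AWS provider version).
# Web tier security group
resource "aws_security_group" "web_tier" {
  name        = "web-tier-sg"
  description = "Security group for web tier"
  vpc_id      = module.vpc.vpc_id
  tags = {
    Name = "web-tier-sg"
    Tier = "web"
  }
}
# Allow HTTPS from anywhere
resource "aws_vpc_security_group_ingress_rule" "web_https" {
  security_group_id = aws_security_group.web_tier.id
  description       = "HTTPS from internet"
  from_port         = 443
  to_port           = 443
  ip_protocol       = "tcp"
  cidr_ipv4         = "0.0.0.0/0"
}
# Allow HTTP (redirect to HTTPS at ALB)
resource "aws_vpc_security_group_ingress_rule" "web_http" {
  security_group_id = aws_security_group.web_tier.id
  description       = "HTTP from internet"
  from_port         = 80
  to_port           = 80
  ip_protocol       = "tcp"
  cidr_ipv4         = "0.0.0.0/0"
}
# Outbound to app tier only
resource "aws_vpc_security_group_egress_rule" "web_to_app" {
  security_group_id            = aws_security_group.web_tier.id
  description                  = "To app tier"
  from_port                    = 8080
  to_port                      = 8080
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.app_tier.id
}
# Application tier security group
resource "aws_security_group" "app_tier" {
  name        = "app-tier-sg"
  description = "Security group for application tier"
  vpc_id      = module.vpc.vpc_id
  tags = {
    Name = "app-tier-sg"
    Tier = "application"
  }
}
# Only allow traffic from web tier
resource "aws_vpc_security_group_ingress_rule" "app_from_web" {
  security_group_id            = aws_security_group.app_tier.id
  description                  = "From web tier"
  from_port                    = 8080
  to_port                      = 8080
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.web_tier.id
}
# Outbound to database tier only
resource "aws_vpc_security_group_egress_rule" "app_to_db" {
  security_group_id            = aws_security_group.app_tier.id
  description                  = "To database tier"
  from_port                    = 5432
  to_port                      = 5432
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.db_tier.id
}
# Outbound HTTPS for API calls
resource "aws_vpc_security_group_egress_rule" "app_https" {
  security_group_id = aws_security_group.app_tier.id
  description       = "HTTPS for external APIs"
  from_port         = 443
  to_port           = 443
  ip_protocol       = "tcp"
  cidr_ipv4         = "0.0.0.0/0"
}
# Database tier security group
resource "aws_security_group" "db_tier" {
  name        = "db-tier-sg"
  description = "Security group for database tier"
  vpc_id      = module.vpc.vpc_id
  # No outbound internet access: Terraform removes the default allow-all
  # egress rule, and no egress rules are defined for this group.
  # (Add specific egress rules if needed)
  tags = {
    Name = "db-tier-sg"
    Tier = "database"
  }
}
# Only allow traffic from app tier
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db_tier.id
  description                  = "PostgreSQL from app tier"
  from_port                    = 5432
  to_port                      = 5432
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.app_tier.id
}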

Network ACLs (NACLs)

Network ACLs are often misunderstood and overused. They’re stateless firewalls that operate at the subnet level, providing an additional layer of security beyond Security Groups. However, for most use cases, Security Groups alone are sufficient.

Key difference from Security Groups:

  • Stateless: Unlike Security Groups, NACLs don’t track connection state. If you allow inbound traffic on port 443, you must also explicitly allow the return traffic on ephemeral ports (1024-65535)
  • Subnet-level: Applied to entire subnets, not individual instances
  • Rule evaluation: Rules are numbered and evaluated in order (lowest number first)
  • Allow and Deny: Can explicitly deny traffic (Security Groups can only allow)

When to use NACLs:

  • Block specific IPs: Deny traffic from known malicious IP ranges
  • Compliance requirements: Some regulations require subnet-level controls
  • Additional defense layer: Defense in depth strategy
  • Temporary blocks: Quick way to block traffic to an entire subnet

Best Practice: Start with the default NACL (allows all traffic) and only add custom NACLs when you have a specific security requirement. Most applications don’t need them.
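
The two behaviors that trip people up, ordered evaluation and statelessness, can be modeled in a few lines. This is a deliberately simplified sketch: the `NaclRule` type and `evaluate` function are hypothetical, and real NACL rules also match on protocol and CIDR, which are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str  # "allow" or "deny"
    port_from: int
    port_to: int


def evaluate(rules: list, port: int) -> str:
    """Lowest-numbered matching rule wins; unmatched traffic is denied."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # the implicit final rule of every NACL


inbound = [NaclRule(100, "allow", 443, 443)]
outbound_missing = []  # nobody added a rule for the response traffic
outbound_fixed = [NaclRule(100, "allow", 1024, 65535)]  # ephemeral ports

client_port = 50321  # hypothetical ephemeral port chosen by the client

print("request in on 443:        ", evaluate(inbound, 443))
print("response out, no rule:    ", evaluate(outbound_missing, client_port))
print("response out, with rule:  ", evaluate(outbound_fixed, client_port))
```

A security group would have allowed the response automatically. With a NACL, the request is accepted but the reply is dropped until the outbound ephemeral-port rule exists.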

VPC Flow Logs

VPC Flow Logs are your network traffic recorder—they capture information about IP traffic going to and from network interfaces in your VPC. Think of them as your VPC’s black box, essential for troubleshooting, security analysis, and cost optimization.

What they capture:

  • Source and destination IP addresses
  • Source and destination ports
  • Protocol (TCP, UDP, ICMP)
  • Number of packets and bytes
  • Action taken (ACCEPT or REJECT)
  • Timestamps
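
In the default (version 2) format, each record is a single space-separated line containing these fields. The parser below is a minimal sketch for that default format only; custom formats have different fields, and the sample record is illustrative rather than taken from a real account.

```python
FLOW_LOG_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
INTEGER_FIELDS = (
    "version", "srcport", "dstport", "protocol",
    "packets", "bytes", "start", "end",
)


def parse_flow_log_record(line: str) -> dict:
    """Split a default-format flow log line into named, typed fields."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(
            f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}"
        )
    record = dict(zip(FLOW_LOG_FIELDS, values))
    for name in INTEGER_FIELDS:
        # Records with no traffic data use "-" in place of a value.
        record[name] = None if record[name] == "-" else int(record[name])
    return record


sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_log_record(sample)
print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"])
print(record["bytes"], "bytes,", record["action"])
```

Protocol is the IANA protocol number, so 6 is TCP, 17 is UDP, and 1 is ICMP.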

Why you need them:

  • Troubleshooting: Diagnose why connections are failing (security group rules, routing issues)
  • Security analysis: Detect unusual traffic patterns, potential attacks, or data exfiltration
  • Cost optimization: Identify which resources are generating high data transfer costs
  • Compliance: Many regulations require network traffic logging
  • Forensics: Investigate security incidents after they occur

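Cost consideration: Flow Logs are charged based on the amount of data ingested (~$0.50 per GB when delivered to CloudWatch Logs). For high-traffic VPCs, this can add up. Flow Logs have no sampling option, so reduce volume by enabling them only on specific subnets or network interfaces, by capturing only REJECT traffic, or by delivering to S3, which is cheaper.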

Implementation:

# VPC Flow Logs to CloudWatch
# (aws_iam_role.flow_logs is assumed to be defined elsewhere and must allow
# the vpc-flow-logs.amazonaws.com service to write to the log group)
resource "aws_flow_log" "vpc_flow_logs" {
  log_destination      = aws_cloudwatch_log_group.flow_logs.arn
  log_destination_type = "cloud-watch-logs"
  traffic_type         = "ALL" # or "ACCEPT" or "REJECT"
  vpc_id               = module.vpc.vpc_id
  iam_role_arn         = aws_iam_role.flow_logs.arn
  tags = {
    Name = "vpc-flow-logs"
  }
}
resource "aws_cloudwatch_log_group" "flow_logs" {
  name              = "/aws/vpc-flow-log/main-vpc"
  retention_in_days = 90 # Adjust based on compliance needs
  tags = {
    Name = "vpc-flow-logs"
  }
}
# VPC Flow Logs to S3 (cheaper for long-term storage)
# (aws_s3_bucket.flow_logs is assumed to be defined elsewhere)
resource "aws_flow_log" "vpc_flow_logs_s3" {
  log_destination      = aws_s3_bucket.flow_logs.arn
  log_destination_type = "s3"
  traffic_type         = "ALL"
  vpc_id               = module.vpc.vpc_id
  # Parquet format for Athena queries
  destination_options {
    file_format        = "parquet"
    per_hour_partition = true
  }
  tags = {
    Name = "vpc-flow-logs-s3"
  }
}

VPC Endpoints: Save Money on NAT Gateway

Here’s a common problem: Your EC2 instances in private subnets need to access AWS services like S3 or DynamoDB. Without VPC Endpoints, this traffic goes through your NAT Gateway, costing you $0.045/GB in data processing fees. For high-traffic applications, this can mean hundreds of dollars per month in unnecessary costs.

The solution: VPC Endpoints provide private connectivity to AWS services without going through the NAT Gateway or internet. Traffic stays on AWS’s private network, improving security and reducing costs.

Two types of VPC Endpoints:

  1. Gateway Endpoints (FREE): For S3 and DynamoDB only
  2. Interface Endpoints (about $7/month per endpoint per AZ + data): For most other AWS services

Cost savings example: If your application transfers 1TB/month to S3 through NAT Gateway:

  • Without VPC Endpoint: 1000 GB × $0.045 = $45/month
  • With Gateway Endpoint: $0/month

That’s $540/year saved per TB of S3 traffic!

Gateway Endpoints (FREE!)

For S3 and DynamoDB only:

# S3 Gateway Endpoint - FREE, no hourly charges
# (var.region is assumed to be declared elsewhere, e.g. "us-east-1")
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  # Associate with private subnet route tables
  route_table_ids = module.vpc.private_route_table_ids
  tags = {
    Name = "s3-gateway-endpoint"
  }
}
# DynamoDB Gateway Endpoint - FREE
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = module.vpc.private_route_table_ids
  tags = {
    Name = "dynamodb-gateway-endpoint"
  }
}

Savings: $0.045/GB that would go through NAT Gateway

Interface Endpoints (Paid)

For other AWS services:

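Cost: $0.01/hour (about $7.30/month) per endpoint per AZ + $0.01/GB data processed. An endpoint placed in subnets in three AZs, as in the example below, costs about $21.90/month before data charges.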

# ECR API Endpoint
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
  tags = {
    Name = "ecr-api-endpoint"
  }
}
# ECR Docker Endpoint
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
  tags = {
    Name = "ecr-dkr-endpoint"
  }
}
# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoints" {
  name        = "vpc-endpoints-sg"
  description = "Security group for VPC endpoints"
  vpc_id      = module.vpc.vpc_id
  ingress {
    description = "HTTPS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }
  tags = {
    Name = "vpc-endpoints-sg"
  }
}

When to use Interface Endpoints:

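  • High traffic to AWS services (roughly 200 GB/month or more per endpoint AZ)
  • Break-even point: an endpoint costs about $7.30/month per AZ plus $0.01/GB, against $0.045/GB through the NAT Gateway, so it pays for itself at about 210 GB/month per AZ
  • Services: ECR, CloudWatch Logs, Systems Manager, Secrets Manager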
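
The break-even traffic volume follows directly from the prices quoted in this section. The sketch below assumes a 730-hour month and that the hourly endpoint charge is billed once per AZ the endpoint is deployed in; check the current AWS price list before relying on the exact figures.

```python
HOURS_PER_MONTH = 730
NAT_PER_GB_USD = 0.045        # NAT Gateway data processing
ENDPOINT_HOURLY_USD = 0.01    # per interface endpoint, per AZ
ENDPOINT_PER_GB_USD = 0.01    # interface endpoint data processing


def break_even_gb(az_count: int = 1) -> float:
    """Monthly GB at which an interface endpoint costs the same as NAT."""
    fixed_cost = ENDPOINT_HOURLY_USD * HOURS_PER_MONTH * az_count
    saving_per_gb = NAT_PER_GB_USD - ENDPOINT_PER_GB_USD
    return fixed_cost / saving_per_gb


for azs in (1, 3):
    print(f"{azs} AZ(s): break-even at about {break_even_gb(azs):.0f} GB/month")
```

Below the break-even volume the endpoint is still worth considering when instances have no NAT path at all, since it is then the only private route to the service.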

Transit Gateway: Hub-and-Spoke Architecture

As your AWS environment grows, connecting multiple VPCs becomes complex. VPC Peering works for 2-3 VPCs, but a full mesh needs N×(N-1)/2 peering connections: 10 for 5 VPCs and 45 for 10. Transit Gateway solves this by acting as a central hub. Each VPC connects once to the Transit Gateway, and the Transit Gateway handles routing between all VPCs.

Why use Transit Gateway:

  • Simplified connectivity: Connect 10 VPCs with 10 attachments instead of 45 peering connections
  • Centralized routing: Manage all inter-VPC routing in one place
  • On-premises integration: Connect your data center once, reach all VPCs
  • Scalability: Supports thousands of VPCs per Transit Gateway
  • Segmentation: Use route tables to control which VPCs can talk to each other

How it works:

  1. Create a Transit Gateway in your region
  2. Attach VPCs to the Transit Gateway (one attachment per VPC)
  3. Configure route tables on the Transit Gateway to control traffic flow
  4. Update VPC route tables to send inter-VPC traffic to the Transit Gateway

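Cost: $0.05/hour per attachment (about $36/month for each attached VPC or VPN connection) + $0.02/GB data processed

Trade-off: VPC Peering has no hourly charge, so Transit Gateway is rarely cheaper on the bill alone. It starts to pay off operationally at around 3+ VPCs, where the number of peering connections and route entries to maintain grows quickly. Below that, VPC Peering is usually the better choice.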
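
The growth in connection count is easy to see by evaluating the N×(N-1)/2 formula next to the one-attachment-per-VPC hub model. A small sketch, with hypothetical function names:

```python
def peering_connections(vpcs: int) -> int:
    """Connections needed for a full mesh of VPC peerings."""
    return vpcs * (vpcs - 1) // 2


def tgw_attachments(vpcs: int) -> int:
    """Hub-and-spoke needs one Transit Gateway attachment per VPC."""
    return vpcs


print(f"{'VPCs':>4}  {'peerings':>8}  {'attachments':>11}")
for n in (2, 3, 5, 10, 20):
    print(f"{n:>4}  {peering_connections(n):>8}  {tgw_attachments(n):>11}")
```

At 3 VPCs the two models need the same number of connections; beyond that the mesh grows quadratically while the hub grows linearly.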

[Diagram: AWS Transit Gateway as the central hub. Attached to it are the Production VPC (10.0.0.0/16, applications), the Development VPC (10.1.0.0/16, dev apps), the Shared Services VPC (10.2.0.0/16, DNS and Active Directory), and the on-premises data center via VPN/Direct Connect.]

Implementation:

# Transit Gateway
resource "aws_ec2_transit_gateway" "main" {
  description                     = "Main Transit Gateway"
  default_route_table_association = "enable"
  default_route_table_propagation = "enable"
  dns_support                     = "enable"
  vpn_ecmp_support                = "enable"
  tags = {
    Name = "main-tgw"
  }
}
# Attach Production VPC
resource "aws_ec2_transit_gateway_vpc_attachment" "production" {
  subnet_ids         = module.production_vpc.private_subnets
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = module.production_vpc.vpc_id
  dns_support        = "enable"
  tags = {
    Name = "tgw-attachment-production"
  }
}
# Attach Development VPC
resource "aws_ec2_transit_gateway_vpc_attachment" "development" {
  subnet_ids         = module.development_vpc.private_subnets
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = module.development_vpc.vpc_id
  dns_support        = "enable"
  tags = {
    Name = "tgw-attachment-development"
  }
}
# Route from Production VPC to Transit Gateway
resource "aws_route" "production_to_tgw" {
  count                  = length(module.production_vpc.private_route_table_ids)
  route_table_id         = module.production_vpc.private_route_table_ids[count.index]
  destination_cidr_block = "10.0.0.0/8" # All internal traffic; the VPC's own local route is more specific and still wins
  transit_gateway_id     = aws_ec2_transit_gateway.main.id
}

When to Use Transit Gateway

  • Multiple VPCs - 3+ VPCs that need to communicate with each other
  • Hybrid cloud - On-premises connectivity via VPN or Direct Connect
  • Centralized control - Need centralized routing and security inspection
  • Cost consideration - The per-attachment hourly and per-GB charges are usually worth the simpler operations at ~3+ VPCs

Direct Connect: Dedicated Network Connection

Direct Connect is AWS’s solution for dedicated, private connectivity between your data center and AWS. Unlike VPN connections that go over the public internet, Direct Connect uses a dedicated fiber connection, providing predictable performance and lower latency.

Why use Direct Connect instead of VPN:

  • Higher bandwidth: 1 Gbps, 10 Gbps, or 100 Gbps vs VPN’s 1.25 Gbps per tunnel
  • Consistent performance: Dedicated bandwidth with predictable latency
  • Lower data transfer costs: $0.02/GB vs $0.09/GB for internet egress
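  • Better security: Traffic never touches the public internet (it is not encrypted by default, so add MACsec or a VPN over Direct Connect if you need encryption in transit)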
  • Compliance: Some regulations require private connectivity

How it works:

  1. Order a Direct Connect port at an AWS Direct Connect location (colocation facility)
  2. Work with a network provider to establish physical connectivity
  3. Create a Virtual Interface (VIF) to connect to your VPC or Transit Gateway
  4. Configure BGP routing between your router and AWS
  5. Traffic flows over the dedicated connection

Setup time: Typically 2-4 weeks (requires physical circuit provisioning)

Common use cases:

  • Data migration: Transfer large datasets to AWS (faster than internet upload)
  • Hybrid applications: Low-latency connectivity for hybrid cloud workloads
  • Disaster recovery: Reliable connection for replication and backup
  • Production workloads: Predictable performance for mission-critical applications

Cost:

  • Port hours: $0.30/hour for 1Gbps = $216/month
  • Data transfer out: $0.02/GB (cheaper than internet)

Comparison: Direct Connect vs Site-to-Site VPN

Bandwidth:

  • Direct Connect: 1 Gbps, 10 Gbps, or 100 Gbps dedicated
  • Site-to-Site VPN: Up to 1.25 Gbps per tunnel (can use multiple tunnels)

Latency:

  • Direct Connect: Low and consistent (private connection)
  • Site-to-Site VPN: Higher and variable (depends on internet)

Cost:

  • Direct Connect: $216/month (1 Gbps port) + $0.02/GB egress
  • Site-to-Site VPN: $36/month (VPN connection) + $0.09/GB egress

Setup Time:

  • Direct Connect: 2-4 weeks (physical circuit provisioning)
  • Site-to-Site VPN: Minutes (fully self-service)

Reliability:

  • Direct Connect: 99.9% SLA (recommend VPN backup)
  • Site-to-Site VPN: 99.95% SLA (two tunnels for HA)

Best for:

  • Direct Connect: Production workloads, large data transfers, consistent performance needs
  • Site-to-Site VPN: Dev/test environments, backup connectivity, quick setup

Pro tip: Use both! Direct Connect for primary connectivity with VPN as backup.
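
Using only the AWS-side prices from the comparison above, the monthly egress volume at which Direct Connect becomes cheaper than VPN can be estimated. This is a rough sketch: it ignores the circuit or partner fees that a Direct Connect link also incurs, which can be significant.

```python
DX_PORT_MONTHLY_USD = 216.00  # 1 Gbps port, from the comparison above
DX_PER_GB_USD = 0.02
VPN_MONTHLY_USD = 36.00       # one VPN connection
VPN_PER_GB_USD = 0.09


def monthly_cost(fixed_usd: float, per_gb_usd: float, egress_gb: float) -> float:
    """Fixed monthly charge plus egress data transfer."""
    return fixed_usd + per_gb_usd * egress_gb


def crossover_gb() -> float:
    """Egress volume where both options cost the same per month."""
    return (DX_PORT_MONTHLY_USD - VPN_MONTHLY_USD) / (VPN_PER_GB_USD - DX_PER_GB_USD)


print(f"crossover at about {crossover_gb():.0f} GB/month of egress")
for gb in (500, 5000):
    dx = monthly_cost(DX_PORT_MONTHLY_USD, DX_PER_GB_USD, gb)
    vpn = monthly_cost(VPN_MONTHLY_USD, VPN_PER_GB_USD, gb)
    print(f"{gb:>5} GB: Direct Connect ${dx:.2f}, VPN ${vpn:.2f}")
```

On these numbers the crossover sits at roughly 2.5 TB of egress per month, so for smaller volumes the case for Direct Connect rests on latency and consistency rather than price.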

Implementation:

# Direct Connect Gateway
resource "aws_dx_gateway" "main" {
  name = "main-dx-gateway"
  # Must differ from the Transit Gateway ASN, which defaults to 64512
  amazon_side_asn = 64513
}
# Associate with Transit Gateway
resource "aws_dx_gateway_association" "main" {
  dx_gateway_id         = aws_dx_gateway.main.id
  associated_gateway_id = aws_ec2_transit_gateway.main.id
  allowed_prefixes = [
    "10.0.0.0/8",
    "172.16.0.0/12"
  ]
}
# Site-to-Site VPN as backup
# (aws_customer_gateway.main is assumed to be defined elsewhere)
resource "aws_vpn_connection" "backup" {
  customer_gateway_id = aws_customer_gateway.main.id
  transit_gateway_id  = aws_ec2_transit_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = false # Use BGP so routes fail over automatically
  tags = {
    Name = "backup-vpn"
  }
}

AWS Networking Cost Optimization

Monthly Cost Breakdown Example

Production VPC (us-east-1):
  NAT Gateways (3 AZs):
    - Hourly: 3 × $0.045 × 730 hours = $98.55
    - Data processing: 500GB × $0.045 = $22.50
    - Subtotal: $121.05
  VPC Endpoints (Interface, priced for one AZ each):
    - ECR API: $7.30
    - ECR DKR: $7.30
    - CloudWatch Logs: $7.30
    - Subtotal: $21.90
  Load Balancers:
    - Application Load Balancer: $16.20 + $8/LCU
    - Subtotal: ~$25
  Data Transfer:
    - Inter-AZ: 200GB × $0.01 = $2.00
    - Internet egress: 1TB × $0.09 = $90.00
    - Subtotal: $92.00
  Total Monthly Cost: ~$260
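
The same breakdown can be kept as a small script so the total updates when traffic assumptions change. This sketch reuses the figures above (730-hour month, 1 TB counted as 1000 GB, load balancer taken as the ~$25 estimate); the item names are illustrative.

```python
HOURS_PER_MONTH = 730

line_items = {
    "NAT Gateway hours (3 AZs)": 3 * 0.045 * HOURS_PER_MONTH,
    "NAT Gateway data (500 GB)": 500 * 0.045,
    "Interface endpoints (3, one AZ each)": 3 * 0.01 * HOURS_PER_MONTH,
    "Application Load Balancer (estimate)": 25.00,
    "Inter-AZ transfer (200 GB)": 200 * 0.01,
    "Internet egress (1000 GB)": 1000 * 0.09,
}

total = sum(line_items.values())

# Largest items first, with each item's share of the bill.
for name, cost in sorted(line_items.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<40} ${cost:>8.2f}  {cost / total:5.1%}")
print(f"{'Total':<40} ${total:>8.2f}")
```

Sorting by cost shows where optimization effort pays off first: NAT Gateway hours and internet egress together make up most of this bill.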

Cost Optimization Strategies

Add Gateway Endpoints (FREE)

Deploy S3 and DynamoDB Gateway Endpoints at no cost to eliminate NAT Gateway charges for AWS service traffic. Can save $0.045/GB in data processing fees.

Single NAT Gateway for Dev/Test

Use one NAT Gateway instead of three in non-production environments to save ~$64/month. Accept the reduced availability for cost savings.

Release Unused Resources

Delete unused Elastic IPs ($3.60/month each) and load balancers ($16-25/month each). Regular audits prevent waste.

VPC Peering Over Transit Gateway

For simple connectivity between 2-3 VPCs, use VPC Peering (no hourly charge) instead of Transit Gateway (about $36/month per attachment + data charges).

Interface Endpoints for High Traffic

Add Interface Endpoints (about $7/month per AZ) for services that send more than roughly 200 GB/month through the NAT Gateway. Above that break-even point they are cost-effective for ECR, CloudWatch, etc.

Monitor Data Transfer Patterns

Use VPC Flow Logs and Cost Explorer to identify and optimize expensive cross-AZ and internet data transfer patterns.

Common AWS Networking Issues

Issue 1: Can’t SSH to EC2 in Private Subnet

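Solution: Use AWS Systems Manager Session Manager (no SSH port needed!). The instance needs the IAM role below and outbound HTTPS access to the Systems Manager endpoints, through either a NAT Gateway or interface endpoints.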

# IAM role for EC2 instances
resource "aws_iam_role" "ec2_ssm" {
  name = "ec2-ssm-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}
# Attach SSM policy
resource "aws_iam_role_policy_attachment" "ec2_ssm" {
  role       = aws_iam_role.ec2_ssm.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# Instance profile
resource "aws_iam_instance_profile" "ec2_ssm" {
  name = "ec2-ssm-profile"
  role = aws_iam_role.ec2_ssm.name
}

Connect via CLI:

aws ssm start-session --target i-1234567890abcdef0

Issue 2: High NAT Gateway Costs

Diagnosis:

# Confirm that Flow Logs are enabled for the VPC
aws ec2 describe-flow-logs --filter "Name=resource-id,Values=vpc-xxxxx"
# Then find the heaviest flows with Athena (SQL, run against your flow log table)
SELECT
  srcaddr,
  dstaddr,
  SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 20;

Solution: Add VPC Endpoints for AWS services

Issue 3: VPC Peering Not Working

Checklist:

# 1. Check peering connection status
aws ec2 describe-vpc-peering-connections \
--filters "Name=status-code,Values=active"
# 2. Verify route tables in BOTH VPCs
aws ec2 describe-route-tables \
--filters "Name=route.destination-cidr-block,Values=10.1.0.0/16"
# 3. Check security groups allow peer VPC CIDR
aws ec2 describe-security-groups --group-ids sg-xxxxx
# 4. Verify no overlapping CIDR blocks

Troubleshooting Tools

VPC Reachability Analyzer

Test connectivity without sending packets:

# Create analysis
aws ec2 create-network-insights-path \
--source i-source-instance \
--destination i-dest-instance \
--protocol tcp \
--destination-port 443
# Start analysis
aws ec2 start-network-insights-analysis \
--network-insights-path-id nip-xxxxx
# Get results
aws ec2 describe-network-insights-analyses \
--network-insights-analysis-ids nia-xxxxx

VPC Flow Logs Analysis

Query with Athena:

-- Top talkers
SELECT
  srcaddr,
  dstaddr,
  SUM(bytes) AS total_bytes,
  SUM(packets) AS total_packets, -- COUNT(*) would count log records, not packets
  COUNT(*) AS flow_records
FROM vpc_flow_logs
WHERE date = '2024-06-23'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 20;
-- Rejected connections (security issues)
SELECT
  srcaddr,
  dstaddr,
  srcport,
  dstport,
  COUNT(*) AS reject_count
FROM vpc_flow_logs
WHERE action = 'REJECT'
GROUP BY srcaddr, dstaddr, srcport, dstport
ORDER BY reject_count DESC;

Next Steps

You now have the knowledge to build production-ready networks on AWS!

Need help? Contact Quabyt for AWS networking architecture and implementation support.
