
How I Cut AWS Costs by 60% for a 100K-User E-Learning Platform

The Problem

When I took over the ALW platform infrastructure, the AWS bill was growing every month and nobody clearly owned what was driving the costs. We were serving 100,000+ concurrent users on Open edX in the AWS me-south-1 (Bahrain) region, and the team had no cost visibility at all.

Here is exactly what I found and fixed — resulting in a 60% reduction in monthly AWS spend.

AWS Cost Explorer: your first stop before any optimization work

Step 1 — Get Visibility First

Before cutting anything, I set up proper cost allocation tags across every resource:

  • Environment (prod / staging / dev)
  • Platform (alw / edly / ilmx)
  • Team (devops / backend / frontend)
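Tags only help if every resource actually carries them. Below is a minimal Python sketch of a tag-compliance check. It assumes tags arrive in the `[{"Key": ..., "Value": ...}]` list shape that boto3's EC2 calls return. `REQUIRED_TAGS` and `tag_problems` are hypothetical names, not part of any AWS API.

```python
from __future__ import annotations

# Hypothetical policy: the three cost allocation tag keys listed above,
# each restricted to its allowed values.
REQUIRED_TAGS = {
    "Environment": {"prod", "staging", "dev"},
    "Platform": {"alw", "edly", "ilmx"},
    "Team": {"devops", "backend", "frontend"},
}


def tag_problems(tags: list[dict]) -> list[str]:
    """Return human-readable problems for one resource's tag list."""
    tag_map = {t["Key"]: t["Value"] for t in tags}
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        if key not in tag_map:
            problems.append(f"missing tag {key}")
        elif tag_map[key] not in allowed:
            problems.append(f"unexpected {key}={tag_map[key]}")
    return problems


if __name__ == "__main__":
    resource_tags = [
        {"Key": "Environment", "Value": "prod"},
        {"Key": "Platform", "Value": "alw"},
    ]
    print(tag_problems(resource_tags))  # ['missing tag Team']
```

For resources other than EC2, the Resource Groups Tagging API (`get_resources`) returns tags in the same Key/Value shape, so the same check can run across services.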

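Without tags, you are flying blind. Enable Cost Explorer, activate these keys as cost allocation tags in the Billing console (Cost Explorer cannot group spend by a tag until it is activated), and set up a monthly budget alert at an 80% threshold immediately.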
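The 80% alert can be created through the AWS Budgets API. This sketch only builds the keyword arguments for boto3's `budgets.create_budget`; the budget name, limit, account ID and email address are placeholders, and `monthly_budget_request` is a hypothetical helper.

```python
def monthly_budget_request(account_id: str, limit_usd: float, email: str,
                           threshold_pct: float = 80.0) -> dict:
    """Build kwargs for budgets.create_budget with one actual-spend alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-aws-spend",  # hypothetical name
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    # Fire when actual (not forecasted) spend passes the threshold.
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email},
                ],
            }
        ],
    }


if __name__ == "__main__":
    req = monthly_budget_request("123456789012", 5000, "devops@example.com")
    print(req["Budget"]["BudgetLimit"])  # {'Amount': '5000.00', 'Unit': 'USD'}
    # With credentials configured, this could be sent with:
    #   boto3.client("budgets").create_budget(**req)
```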

Step 2 — EC2 Rightsizing

I used AWS Compute Optimizer to identify overprovisioned instances and found that most application servers were consistently running at under 20% CPU utilization.
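The "consistently under 20%" criterion can be expressed directly on CPU samples. The sketch below reads "consistently" as "the 95th percentile stays under 20%", which is an assumption and not Compute Optimizer's own logic; the instance IDs and sample values are hypothetical (in practice they could come from CloudWatch `CPUUtilization` datapoints).

```python
from __future__ import annotations

from statistics import quantiles

# Hypothetical rule: flag an instance when the 95th percentile of its CPU
# samples stays below this threshold.
CPU_THRESHOLD_PCT = 20.0


def p95(samples: list[float]) -> float:
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    # "inclusive" interpolates within the observed range, never beyond it.
    return quantiles(samples, n=20, method="inclusive")[-1]


def rightsizing_candidates(cpu_by_instance: dict[str, list[float]],
                           threshold_pct: float = CPU_THRESHOLD_PCT) -> list[str]:
    """Return instance IDs whose p95 CPU utilization is under the threshold."""
    return sorted(
        instance_id
        for instance_id, samples in cpu_by_instance.items()
        if len(samples) >= 2 and p95(samples) < threshold_pct
    )


if __name__ == "__main__":
    cpu = {
        "i-app1": [12.0, 15.0, 9.0, 18.0, 11.0],   # steady and low
        "i-app2": [35.0, 60.0, 42.0, 55.0, 48.0],  # genuinely busy
        "i-spiky": [5.0, 6.0, 4.0, 5.0, 50.0],     # low mean, real peaks
    }
    print(rightsizing_candidates(cpu))  # ['i-app1']
```

Using a high percentile instead of the mean is the design choice here: `i-spiky` averages 14% CPU and would be flagged on the mean, but its peaks show that halving its capacity would hurt.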

Actions taken:

  • Downgraded app servers from m5.xlarge to m5.large (half the vCPUs and memory, saving 50% on compute per instance)
  • Switched dev/staging to t3.medium burstable instances
  • Scheduled non-prod instances to stop at 8 PM and start again at 8 AM daily

Saving: ~35% of EC2 bill
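The 8 PM / 8 AM schedule for non-prod instances boils down to a simple time-window check. In this sketch the times are naive and assumed to be in the team's local timezone. A scheduled job (for example an EventBridge rule invoking a Lambda function; this is an assumption, not necessarily how it was wired up here) would compare `desired_state` with each instance's current state and call `ec2.stop_instances` or `ec2.start_instances`.

```python
from datetime import datetime, time

# Hypothetical schedule matching the bullet above: non-prod instances run
# from 08:00 to 20:00 every day and are stopped the rest of the time.
START = time(8, 0)
STOP = time(20, 0)


def desired_state(now: datetime) -> str:
    """Return the state a non-prod instance should be in at `now`."""
    return "running" if START <= now.time() < STOP else "stopped"


def weekly_runtime_fraction() -> float:
    """Fraction of the week a scheduled instance accrues compute hours."""
    hours_per_day = STOP.hour - START.hour  # 12
    return (hours_per_day * 7) / (24 * 7)


if __name__ == "__main__":
    print(desired_state(datetime(2024, 1, 15, 9, 30)))  # running
    print(desired_state(datetime(2024, 1, 15, 22, 0)))  # stopped
    print(weekly_runtime_fraction())                    # 0.5
```

Running 12 of every 24 hours halves non-prod instance-hours. Stopped instances still pay for their attached EBS volumes, so the saving applies to compute only.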

CloudWatch metrics revealing consistently low CPU utilization on overprovisioned instances

Step 3 — Reserved Instances for Production

Production workloads run 24/7 with predictable load. Switched from On-Demand to 1-year Reserved Instances for all production EC2 and RDS.

Saving: ~40% on reserved resources
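The reason Reserved Instances suit 24/7 production, and not part-time workloads, is simple arithmetic: a reservation is paid for every hour of the term, while On-Demand is paid only for hours used. The hourly rates in this sketch are hypothetical placeholders, not real AWS prices, and the helper names are illustrative.

```python
from __future__ import annotations

# Hypothetical hourly rates for one instance; real prices vary by instance
# type, region, term and payment option.
HOURS_PER_YEAR = 24 * 365


def annual_costs(on_demand_hourly: float, ri_effective_hourly: float,
                 utilization: float = 1.0) -> tuple[float, float]:
    """Return (on_demand_cost, reserved_cost) for one instance over a year.

    On-Demand is billed only for the hours the instance runs; a 1-year
    reservation is paid for every hour of the term regardless of use.
    """
    on_demand = on_demand_hourly * HOURS_PER_YEAR * utilization
    reserved = ri_effective_hourly * HOURS_PER_YEAR
    return on_demand, reserved


def breakeven_utilization(on_demand_hourly: float, ri_effective_hourly: float) -> float:
    """Share of the year the instance must run before the reservation pays off."""
    return ri_effective_hourly / on_demand_hourly


if __name__ == "__main__":
    od, ri = annual_costs(0.10, 0.06)  # hypothetical $/hour, running 24/7
    print(f"saving at 24/7: {1 - ri / od:.0%}")                    # saving at 24/7: 40%
    print(f"break-even: {breakeven_utilization(0.10, 0.06):.0%}")  # break-even: 60%
```

With these placeholder rates, an instance that runs less than 60% of the year is cheaper On-Demand, which is why the reservations were limited to always-on production.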

Step 4 — CloudFront for Static Assets

Open edX serves a huge amount of static content: course videos, images, JS/CSS. We were serving all of it directly from S3 with no CDN and paying full data transfer costs.

I set up a CloudFront distribution in front of S3:

  • Cache TTL set to 1 year for versioned assets
  • Edge locations in Saudi Arabia and Pakistan cut latency for users there by 60%
  • Data transfer costs dropped dramatically

Saving: ~25% of data transfer costs
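A 1-year TTL is only safe for assets whose filename changes whenever their content does. One way to apply that rule is to set a `Cache-Control` header per object at upload time, for example via `s3.upload_file(path, bucket, key, ExtraArgs={"CacheControl": header})`. CloudFront respects that header within the minimum/maximum TTL of the distribution's cache policy. The hashed-filename pattern and the 5-minute fallback TTL below are assumptions for illustration.

```python
import re

# Hypothetical naming convention: versioned assets carry a content hash in
# the filename, e.g. "static/css/main.3f2a1b9c4d5e.css".
HASHED_NAME = re.compile(r"\.[0-9a-f]{8,}\.[A-Za-z0-9]+$")
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000
SHORT_TTL_SECONDS = 300  # assumption: 5 minutes for unversioned files


def cache_control_for(key: str) -> str:
    """Return a Cache-Control header value for an S3 object key."""
    if HASHED_NAME.search(key):
        # New content means a new filename, so a 1-year TTL is safe.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    # Unversioned files keep a short TTL so edits reach users quickly.
    return f"public, max-age={SHORT_TTL_SECONDS}"


if __name__ == "__main__":
    print(cache_control_for("static/css/main.3f2a1b9c4d5e.css"))
    # public, max-age=31536000, immutable
    print(cache_control_for("static/index.html"))
    # public, max-age=300
```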

Step 5 — S3 Lifecycle Policies

I found 2 TB+ of old course backups, log files, and unused media sitting in S3 Standard storage.

Applied lifecycle rules:

  • Move to S3 Standard-IA (Infrequent Access) after 30 days
  • Move to Glacier after 90 days
  • Delete logs older than 1 year

Saving: ~40% of S3 bill
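The three lifecycle rules map onto the `LifecycleConfiguration` payload of S3's PutBucketLifecycleConfiguration API. In this sketch the rule IDs and the `logs/` prefix are hypothetical; adjust the prefix to wherever logs actually live in the bucket. `GLACIER` is the API's storage-class name for S3 Glacier Flexible Retrieval.

```python
def lifecycle_configuration(log_prefix: str = "logs/") -> dict:
    """Build a LifecycleConfiguration mirroring the rules listed above."""
    return {
        "Rules": [
            {
                "ID": "tier-down-old-objects",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            },
            {
                "ID": "expire-old-logs",  # hypothetical rule name
                "Filter": {"Prefix": log_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": 365},
            },
        ]
    }


if __name__ == "__main__":
    cfg = lifecycle_configuration()
    print([rule["ID"] for rule in cfg["Rules"]])
    # ['tier-down-old-objects', 'expire-old-logs']
    # With credentials configured, this could be applied with:
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="my-bucket", LifecycleConfiguration=cfg)
```

S3 does not allow a transition to STANDARD_IA earlier than 30 days after object creation, so 30 days is the earliest valid value for the first rule.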

Step 6 — Kill Hidden Resources

I ran a full audit with AWS Trusted Advisor and found:

  • 12 unattached EBS volumes still being charged
  • 8 unused Elastic IPs
  • 3 idle Load Balancers with zero traffic
  • An old NAT Gateway in a dev VPC that nobody was using

Deleted all of them immediately.

Saving: $200+/month in hidden waste
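Part of this audit can be scripted for the quarterly re-check. Trusted Advisor did the work here; the sketch below is an alternative that filters the response dicts returned by boto3's `ec2.describe_volumes()` and `ec2.describe_addresses()`. It relies only on the fields it reads. Idle load balancers and NAT gateways need CloudWatch traffic metrics and are not covered.

```python
from __future__ import annotations

# Sketch: find billed-but-unused EBS volumes and Elastic IPs in EC2 API
# responses. Only the fields read below are relied on.


def unattached_volumes(describe_volumes_resp: dict) -> list[str]:
    # State "available" means the volume exists but is attached to nothing.
    return [v["VolumeId"] for v in describe_volumes_resp.get("Volumes", [])
            if v.get("State") == "available"]


def unused_elastic_ips(describe_addresses_resp: dict) -> list[str]:
    # No AssociationId means the address is allocated but not attached.
    return [a["PublicIp"] for a in describe_addresses_resp.get("Addresses", [])
            if "AssociationId" not in a]


if __name__ == "__main__":
    # Trimmed, hypothetical responses in the shape the EC2 API returns.
    volumes = {"Volumes": [
        {"VolumeId": "vol-0aaa", "State": "in-use"},
        {"VolumeId": "vol-0bbb", "State": "available"},
    ]}
    addresses = {"Addresses": [
        {"PublicIp": "203.0.113.10", "AssociationId": "eipassoc-0ccc"},
        {"PublicIp": "203.0.113.11"},
    ]}
    print(unattached_volumes(volumes))    # ['vol-0bbb']
    print(unused_elastic_ips(addresses))  # ['203.0.113.11']
```

In a real account, `describe_volumes` is paginated, so iterate with `ec2.get_paginator("describe_volumes")`. Alternatively, let the API filter server-side with `Filters=[{"Name": "status", "Values": ["available"]}]`.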

Final Result

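Category         | Before | After | Saving
-----------------|--------|-------|-------
EC2 Compute      | $X     | $X    | 45%
RDS Database     | $X     | $X    | 38%
S3 Storage       | $X     | $X    | 40%
Data Transfer    | $X     | $X    | 55%
Hidden Resources | $X     | $0    | 100%
Total            | $X     | $X    | 60%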

The feeling when the AWS bill drops 60% 💸

Key Takeaways

  1. Tag everything before you optimize anything
  2. Rightsizing gives the fastest win with lowest risk
  3. Reserved Instances are worth it for stable production workloads
  4. CloudFront pays for itself on any platform with heavy static assets
  5. Run a hidden resource audit every quarter — waste accumulates silently