Professional Summary
Passionate Cloud & DevOps Engineer with expertise in AWS services, Infrastructure as Code, and CI/CD automation.
Experienced in designing and implementing scalable, secure cloud architectures using Terraform, GitHub Actions, and
serverless-first principles. AWS Certified Solutions Architect – Associate with a strong focus on FinOps,
observability, and production reliability.
Featured Projects
Event-driven AWS serverless pipeline that automatically detects infrastructure issues via CloudWatch Alarms,
runs three parallel data collectors (metrics, logs, deploy context), and feeds the results to Amazon Bedrock
(Claude) to generate root-cause hypotheses with confidence levels — advisory-only, no auto-remediation.
- Parallel fan-out via Step Functions: Metrics Collector, Logs Collector & Deploy Context Collector run simultaneously to minimise latency
- AI analysis with Bedrock (Claude): root-cause hypotheses, supporting evidence, and actionable remediation recommendations
- Multi-channel notifications: Slack webhooks + SNS email with rich formatting
- 90-day DynamoDB incident store with TTL, queryable by resource ARN, severity, or time range
- Full observability: structured JSON logging, X-Ray distributed tracing, custom CloudWatch metrics
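The 90-day incident store relies on DynamoDB TTL, which expects a Number attribute holding a Unix epoch timestamp in seconds. A minimal sketch of how such an item could be built is shown here; the attribute names (`resource_arn`, `created_at`, `expires_at`) and the function name are assumptions for illustration, not the project's actual schema.

```python
import time
from typing import Optional

RETENTION_DAYS = 90          # retention period stated in the project description
SECONDS_PER_DAY = 86_400


def build_incident_item(resource_arn: str, severity: str, now: Optional[int] = None) -> dict:
    """Return an item whose TTL attribute falls 90 days after creation.

    All attribute names here are hypothetical.
    """
    created_at = int(time.time()) if now is None else now
    return {
        "resource_arn": resource_arn,   # assumed partition key
        "created_at": created_at,       # assumed sort key, epoch seconds
        "severity": severity,
        # DynamoDB TTL reads this as an epoch-seconds Number
        "expires_at": created_at + RETENTION_DAYS * SECONDS_PER_DAY,
    }
```

Passing `now` explicitly keeps the function deterministic in tests; in a Lambda handler it would normally be left unset.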
Dockerized MCP (Model Context Protocol) server that acts as a senior DevOps & Cloud mentor inside your IDE.
Parses live GitHub repos, CI/CD pipelines, and Terraform files to give structured, production-grade feedback
— teaching how to think like a DevOps engineer, not just what to do.
- 11 MCP tools: repo analysis, CI/CD review, Terraform audit, AWS cost advisor, skill tracker, learning path engine
- 4 review modes — Mentor, Review, Debug, Interview — each changes tone and depth of feedback
- CI/CD engine checks: OIDC vs long-lived creds, action pinning, missing permissions blocks, job timeouts, concurrency groups
- Terraform engine (HCL2 parsing): flags hardcoded secrets, missing remote backend, IAM wildcards, S3 without encryption
- Skill tracker persists progress in SQLite and generates a personalised learning roadmap
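As an illustration of the action-pinning check listed above, the sketch below scans workflow text for `uses:` references that are not pinned to a full 40-character commit SHA. It is a simplified, line-based approximation under stated assumptions (no YAML parsing, local and `docker://` references skipped), and the function name is hypothetical rather than taken from the project.

```python
import re
from typing import List

# A reference counts as pinned when it ends in a full 40-character commit SHA.
PINNED = re.compile(r"@[0-9a-f]{40}$")
# Matches `uses: owner/repo@ref`, with or without a leading list dash.
USES = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")


def find_unpinned_actions(workflow_text: str) -> List[str]:
    """Return action references that use a tag or branch instead of a commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions and container images are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not PINNED.search(ref):
            unpinned.append(ref)
    return unpinned
```

A reference such as `actions/checkout@v4` would be reported, while the same action pinned to a 40-character SHA would pass.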
Modular Terraform + GitHub Actions system for ephemeral AWS EKS clusters that spin up on demand,
auto-scale GPU nodes with Karpenter, run batch AI workloads (Kubernetes Jobs), collect results,
then self-destruct — eliminating idle cluster costs entirely.
- Full lifecycle orchestration: Deploy → Run → Collect → Destroy via a single CI/CD pipeline
- Karpenter provisioner with CPU & GPU NodePool support for LLM inference workloads
- Modular IaC: VPC, EKS, IAM, IAM_EKS, and OIDC provider as independent Terraform modules
- OIDC identity federation — GitHub Actions as trusted identity provider, zero static AWS keys
Automated AWS cost-optimization engine that hunts orphaned "zombie" resources
(EBS volumes, NAT Gateways, RDS instances, Elastic IPs) across every enabled region,
triggered by EventBridge on a weekly Sunday cron schedule.
- CloudWatch metrics-based detection — flags resources by actual usage patterns, not just status
- Cross-region: dynamically fetches all enabled AWS regions and audits each independently
- Shift-left security: tfsec + flake8 in CI pipeline before any deployment
- Modular IaC using Terraform `moved` blocks for zero-destroy refactoring
- Sample output:
{"estimated_monthly_savings": "$347.50"}
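The usage-based detection described above can be sketched as follows: a resource is treated as a zombie when none of its metric datapoints in the lookback window exceeds a threshold. The data structure, field names, threshold, and cost figures are all assumptions for illustration; the real engine would obtain datapoints from CloudWatch and may apply different rules per resource type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceUsage:
    resource_id: str
    region: str
    datapoints: List[float]     # usage metric values over the lookback window
    monthly_cost_usd: float     # hypothetical cost estimate supplied by the caller


def is_zombie(usage: ResourceUsage, threshold: float = 0.0) -> bool:
    """Idle when no datapoint exceeds the threshold; an empty series counts as idle."""
    return all(point <= threshold for point in usage.datapoints)


def summarize(usages: List[ResourceUsage]) -> dict:
    """Collect idle resources and total their estimated monthly cost."""
    zombies = [u for u in usages if is_zombie(u)]
    savings = sum(u.monthly_cost_usd for u in zombies)
    return {
        "zombie_ids": [u.resource_id for u in zombies],
        "estimated_monthly_savings": f"${savings:.2f}",
    }
```

Treating an empty series as idle is a design choice in this sketch: some resources emit no metrics at all while unused, so absence of data is read as absence of activity.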
Production-ready containerized web app: React (TypeScript + Nginx) frontend, Node.js (Express) backend,
and PostgreSQL 15 on RDS Multi-AZ — deployed on ECS Fargate behind an ALB with full VPC isolation,
enterprise security, and complete observability.
- 3-tier: ALB → ECS Fargate frontend → ECS Fargate backend → RDS PostgreSQL (private subnet)
- Security: Secrets Manager, WAF, SSL/TLS termination at ALB, security groups per tier
- Observability: Prometheus metrics, CloudWatch logs, health checks, and alerting
- Cost-optimised: Fargate Spot, RDS auto-pause, log retention policies, right-sizing
- Performance targets: <2s page load, <500ms API response, 99.9% uptime
Serverless resume website with a live visitor counter. Static assets served from S3 via CloudFront OAC,
visitor count stored in DynamoDB and incremented atomically by a Python Lambda behind API Gateway —
full IaC with modular Terraform, OIDC-based CI/CD, and custom domain at www.ericchiu.page.
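The atomic increment can be expressed as a single DynamoDB `UpdateItem` call using an `ADD` expression, which avoids a read-modify-write race between concurrent visitors. The sketch below assumes a boto3 `Table` resource is passed in; the key name `id`, the attribute name `visits`, and the default item ID are hypothetical. A small in-memory stand-in is included so the example runs without AWS access.

```python
def increment_visitor_count(table, site_id: str = "resume") -> int:
    """Atomically add 1 to the counter and return the new value.

    `table` is expected to behave like a boto3 DynamoDB Table resource.
    """
    response = table.update_item(
        Key={"id": site_id},                     # assumed key schema
        UpdateExpression="ADD visits :inc",      # ADD creates the attribute if missing
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    # boto3 returns numbers as Decimal, so convert before serialising to JSON.
    return int(response["Attributes"]["visits"])


class InMemoryTable:
    """Tiny stand-in for a DynamoDB table, for local experimentation only."""

    def __init__(self):
        self.visits = 0

    def update_item(self, **kwargs):
        self.visits += kwargs["ExpressionAttributeValues"][":inc"]
        return {"Attributes": {"visits": self.visits}}
```

In a Lambda handler, `table` would typically come from `boto3.resource("dynamodb").Table(...)`, with the table name read from an environment variable.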