🎯 Task 12 Goal: Set up CloudWatch Logs & Metrics to monitor the whole system - CENTRALIZED LOGS + DASHBOARD + ALARMS
Task 12 collects logs and metrics from the entire system:
🔗 Monitoring Architecture
ECS Fargate (NestJS) ─────┐
                          │
API Gateway ──────────────┼──→ CloudWatch Logs
                          │       │
DynamoDB ─────────────────┤       ├──→ CloudWatch Metrics
                          │       │         │
CI/CD Pipeline ───────────┘       │         ├──→ Alarms
                                  │         │
CloudTrail ───────────────────────┘         └──→ Dashboard
→ Centralized logs → Metrics → Alarms + monitoring dashboard
{
"family": "vinashoes-user-service",
"taskRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskRole",
"executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"containerDefinitions": [
{
"name": "user-service",
"image": "ACCOUNT.dkr.ecr.ap-southeast-1.amazonaws.com/vinashoes-user-service:latest",
"portMappings": [
{
"containerPort": 3000,
"protocol": "tcp"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/vinashoes-user-service",
"awslogs-region": "ap-southeast-1",
"awslogs-stream-prefix": "ecs"
}
}
}
]
}
📁 Log Groups Organization
Create a separate log group for each microservice:
/ecs/vinashoes-user-service
/ecs/vinashoes-product-service
/ecs/vinashoes-order-service
/ecs/vinashoes-cart-service
/ecs/vinashoes-payment-service
Log Group Settings:
Log group name: "/ecs/vinashoes-user-service"
Retention setting: 7 days (to keep costs down)
Repeat for all services:
- /ecs/vinashoes-product-service
- /ecs/vinashoes-order-service
- /ecs/vinashoes-cart-service
- /ecs/vinashoes-payment-service
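Since every service follows the same naming pattern, the per-service `logConfiguration` blocks can be generated rather than typed by hand. The following is a minimal sketch, assuming the five service names listed above and the `ap-southeast-1` region from the task definition; the helper names `log_group_name` and `log_configuration` are hypothetical.

```python
import json

# Service names taken from the log group list above.
SERVICES = ["user", "product", "order", "cart", "payment"]
REGION = "ap-southeast-1"


def log_group_name(service: str) -> str:
    """Return the log group name for one microservice."""
    return f"/ecs/vinashoes-{service}-service"


def log_configuration(service: str) -> dict:
    """Build the logConfiguration block for one container definition."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group_name(service),
            "awslogs-region": REGION,
            "awslogs-stream-prefix": "ecs",
        },
    }


if __name__ == "__main__":
    # Print one block per service, ready to paste into a task definition.
    for service in SERVICES:
        print(json.dumps(log_configuration(service), indent=2))
```

Each printed block can be dropped into the matching task definition's container entry.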
# Update the ECS service with the new task definition
# (passing only the family name selects the latest ACTIVE revision)
aws ecs update-service \
--cluster vinashoes-cluster \
--service vinashoes-user-service \
--task-definition vinashoes-user-service \
--region ap-southeast-1
/ecs/vinashoes-user-service
🔐 API Gateway Logging Requirements
API Gateway needs an IAM role to write logs:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:PutLogEvents",
"logs:GetLogEvents",
"logs:FilterLogEvents"
],
"Resource": "*"
}
]
}
CloudWatch log role ARN: "arn:aws:iam::ACCOUNT:role/APIGatewayCloudWatchLogsRole"
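The permissions policy above only covers what the role may do; the role also needs a trust policy so that API Gateway can assume it. The source does not show one, so the following is a sketch of the usual shape, with the helper name `api_gateway_trust_policy` being hypothetical.

```python
import json


def api_gateway_trust_policy() -> dict:
    """Trust policy that lets the API Gateway service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "apigateway.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Save this output to a file to use it when creating the role.
    print(json.dumps(api_gateway_trust_policy(), indent=2))
```

The printed JSON could be saved to a file and passed to `aws iam create-role` through `--assume-role-policy-document file://...`.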
Default Route Settings:
✅ Detailed CloudWatch Metrics
✅ CloudWatch Logs
Log Level: INFO
✅ Log full requests/responses data
✅ Data trace
Stage Configuration:
Stage name: "prod"
Logs/Tracing:
✅ Enable CloudWatch Logs
Log Level: INFO
✅ Log full requests/responses
✅ Enable detailed CloudWatch Metrics
Default DynamoDB Metrics:
- ConsumedReadCapacityUnits
- ConsumedWriteCapacityUnits
- ProvisionedReadCapacityUnits
- ProvisionedWriteCapacityUnits
- ThrottledRequests
- SystemErrors
- UserErrors
📝 DynamoDB Audit Logging
CloudTrail captures DynamoDB API calls for auditing:
# CloudTrail config for DynamoDB auditing
CloudTrail Configuration:
Trail name: "vinashoes-dynamodb-audit"
Event Type:
✅ Management events
⚠️ Data events (optional, incurs additional cost)
Storage Location:
S3 bucket: "vinashoes-cloudtrail-logs"
CodeBuild automatically sends build logs to CloudWatch:
CodeBuild Log Groups (created automatically):
- /aws/codebuild/vinashoes-backend-build
Log Stream Format:
- [build-id]/[phase]
Retention: 30 days (set explicitly; new log groups never expire by default)
CodePipeline sends execution events to CloudWatch:
Pipeline Events:
- Pipeline execution started
- Stage execution started/succeeded/failed
- Action execution started/succeeded/failed
Event Targets:
- CloudWatch Logs
- SNS notifications (optional)
- Lambda functions (optional)
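Pipeline events reach these targets through an event rule that selects which events to forward. As a rough sketch, the pattern below selects failed pipeline executions; the pipeline name `vinashoes-backend-pipeline` is an assumption (the source does not name the pipeline), and `matches` is a simplified local check, not the full matching logic of the real service.

```python
import json


def pipeline_failure_pattern(pipeline_name: str) -> dict:
    """Event pattern selecting FAILED executions of one pipeline."""
    return {
        "source": ["aws.codepipeline"],
        "detail-type": ["CodePipeline Pipeline Execution State Change"],
        "detail": {"pipeline": [pipeline_name], "state": ["FAILED"]},
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified check: every pattern field must list the event's value."""
    for key, expected in pattern.items():
        if isinstance(expected, dict):
            nested = event.get(key)
            if not isinstance(nested, dict) or not matches(expected, nested):
                return False
        elif event.get(key) not in expected:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical pipeline name, used only for illustration.
    print(json.dumps(pipeline_failure_pattern("vinashoes-backend-pipeline"), indent=2))
```

Restricting the rule to the `FAILED` state keeps notification volume low; started and succeeded events can stay in the logs without paging anyone.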
🚨 ECS Critical Alarms
Key monitoring points for ECS:
Alarm Configuration:
Alarm name: "ECS-UserService-HighCPU"
Description: "User service CPU usage > 80%"
Metric:
Namespace: "AWS/ECS"
MetricName: "CPUUtilization"
Dimensions:
ServiceName: "vinashoes-user-service"
ClusterName: "vinashoes-cluster"
Threshold:
Comparison: "GreaterThanThreshold"
Threshold: 80
Evaluation Periods: 2 out of 2
Period: 300 seconds (5 minutes)
Actions:
Alarm: Send SNS notification
OK: Send SNS notification
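The alarm settings above map fairly directly onto the input of `aws cloudwatch put-metric-alarm`. The sketch below builds that input as JSON. Two values are assumptions not stated in the source: the `Average` statistic and the SNS topic ARN `arn:aws:sns:ap-southeast-1:ACCOUNT:vinashoes-alerts`. The function name `ecs_cpu_alarm` is hypothetical as well.

```python
import json


def ecs_cpu_alarm(service: str, cluster: str, sns_topic_arn: str,
                  threshold: float = 80.0) -> dict:
    """Build put-metric-alarm input for a high-CPU alarm on one ECS service."""
    return {
        "AlarmName": "ECS-UserService-HighCPU",
        "AlarmDescription": f"User service CPU usage > {threshold:g}%",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ServiceName", "Value": service},
            {"Name": "ClusterName", "Value": cluster},
        ],
        "Statistic": "Average",          # assumption: not given in the source
        "Period": 300,                   # 5 minutes
        "EvaluationPeriods": 2,          # "2 out of 2"
        "DatapointsToAlarm": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify when entering ALARM
        "OKActions": [sns_topic_arn],     # notify when returning to OK
    }


if __name__ == "__main__":
    alarm = ecs_cpu_alarm(
        "vinashoes-user-service",
        "vinashoes-cluster",
        "arn:aws:sns:ap-southeast-1:ACCOUNT:vinashoes-alerts",  # hypothetical topic
    )
    print(json.dumps(alarm, indent=2))
```

Saved as `alarm.json`, the output could be applied with `aws cloudwatch put-metric-alarm --cli-input-json file://alarm.json`.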
Alarm Configuration:
Alarm name: "APIGateway-High5XXErrors"
Description: "API Gateway 5XX errors > 10 in 5 minutes"
Metric:
Namespace: "AWS/ApiGateway"
MetricName: "5XXError"
Dimensions:
ApiName: "vinashoes-api"
Stage: "prod"
Threshold:
Comparison: "GreaterThanThreshold"
Threshold: 10
Statistic: Sum
Period: 300 seconds
Alarm Configuration:
Alarm name: "DynamoDB-UserThrottling"
Description: "DynamoDB User table throttling detected"
Metric:
Namespace: "AWS/DynamoDB"
MetricName: "ThrottledRequests"
Dimensions:
TableName: "User"
Operation: "Query"
Threshold:
Comparison: "GreaterThanThreshold"
Threshold: 0
Period: 300 seconds
📊 Dashboard Organization
Organize the dashboard by service layer:
Dashboard Configuration:
Dashboard name: "VinaShoesProductionMonitoring"
Widgets Configuration:
- ECS Services Health (Line chart)
- API Gateway Request Rate (Number widget)
- DynamoDB Consumed Capacity (Stacked area)
- CI/CD Pipeline Status (Number widget)
ECS Metrics Widget:
{
"type": "metric",
"width": 12,
"height": 6,
"properties": {
"metrics": [
["AWS/ECS", "CPUUtilization", "ServiceName", "vinashoes-user-service", "ClusterName", "vinashoes-cluster"],
["...", "vinashoes-product-service", ".", "."],
["...", "vinashoes-order-service", ".", "."]
],
"period": 300,
"stat": "Average",
"region": "ap-southeast-1",
"title": "ECS Services CPU Utilization"
}
}
API Gateway Metrics Widget:
{
"type": "metric",
"width": 12,
"height": 6,
"properties": {
"metrics": [
["AWS/ApiGateway", "Count", "ApiName", "vinashoes-api", "Stage", "prod"],
[".", "Latency", ".", ".", ".", "."],
[".", "4XXError", ".", ".", ".", "."],
[".", "5XXError", ".", ".", ".", "."]
],
"period": 300,
"stat": "Sum",
"region": "ap-southeast-1",
"title": "API Gateway Metrics"
}
}
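The two widgets above still need `x`/`y` positions and a surrounding `widgets` array before they form a complete dashboard body. The sketch below assembles one under a few assumptions: the widgets sit side by side on the 24-column grid, metric rows are written out in full rather than with the `.` shorthand, and the helper names `metric_widget` and `dashboard_body` are hypothetical.

```python
import json

REGION = "ap-southeast-1"


def metric_widget(title: str, metrics: list, stat: str, x: int, y: int,
                  width: int = 12, height: int = 6, period: int = 300) -> dict:
    """Build one metric widget positioned at (x, y) on the dashboard grid."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": width, "height": height,
        "properties": {
            "metrics": metrics,
            "period": period,
            "stat": stat,
            "region": REGION,
            "title": title,
        },
    }


def dashboard_body() -> dict:
    """Assemble the ECS and API Gateway widgets into one dashboard body."""
    ecs_services = ["vinashoes-user-service", "vinashoes-product-service",
                    "vinashoes-order-service"]
    ecs_metrics = [
        ["AWS/ECS", "CPUUtilization", "ServiceName", name,
         "ClusterName", "vinashoes-cluster"]
        for name in ecs_services
    ]
    api_metrics = [
        ["AWS/ApiGateway", metric, "ApiName", "vinashoes-api", "Stage", "prod"]
        for metric in ("Count", "Latency", "4XXError", "5XXError")
    ]
    return {
        "widgets": [
            metric_widget("ECS Services CPU Utilization", ecs_metrics, "Average", x=0, y=0),
            metric_widget("API Gateway Metrics", api_metrics, "Sum", x=12, y=0),
        ]
    }


if __name__ == "__main__":
    print(json.dumps(dashboard_body(), indent=2))
```

The printed JSON could be passed to `aws cloudwatch put-dashboard` as the `--dashboard-body` value, together with `--dashboard-name`.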
| Component | Status | Details |
|---|---|---|
| ✅ ECS Logs | ACTIVE | Container logs from all microservices |
| ✅ API Gateway Logs | ACTIVE | Access logs + execution logs |
| ✅ DynamoDB Metrics | ACTIVE | Automatic performance metrics |
| ✅ CI/CD Logs | ACTIVE | CodeBuild + CodePipeline logs |
| ✅ CloudWatch Alarms | CONFIGURED | CPU, Memory, 5XX, Throttling alarms |
| ✅ Dashboard | LIVE | Realtime monitoring view |
🎉 Complete Monitoring Setup!
Log Sources:
Alerting:
Visibility:
Critical Alarms:
ECS CPU Usage: > 80% for 5 minutes
ECS Memory Usage: > 90% for 5 minutes
API Gateway 5XX: > 10 errors in 5 minutes
DynamoDB Throttling: > 0 throttled requests
Warning Alarms:
ECS CPU Usage: > 60% for 10 minutes
API Gateway Latency: > 2000ms average
DynamoDB Consumed Capacity: > 80% of provisioned
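To make the two tiers concrete, the thresholds above can be written down as a small lookup table and checked against a metric reading. This is only an illustrative sketch: the metric keys and the `classify` function are hypothetical names, and the duration windows ("for 5 minutes", "for 10 minutes") are ignored, since CloudWatch evaluates those through the alarm's period settings.

```python
# Threshold values copied from the critical and warning lists above.
THRESHOLDS = {
    "ecs_cpu_percent": {"warning": 60, "critical": 80},
    "ecs_memory_percent": {"critical": 90},
    "api_5xx_count": {"critical": 10},
    "api_latency_ms": {"warning": 2000},
    "dynamodb_throttled_requests": {"critical": 0},
    "dynamodb_capacity_percent": {"warning": 80},
}


def classify(metric: str, value: float) -> str:
    """Return CRITICAL, WARNING, or OK for a single metric reading."""
    levels = THRESHOLDS[metric]
    if "critical" in levels and value > levels["critical"]:
        return "CRITICAL"
    if "warning" in levels and value > levels["warning"]:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for metric, value in [("ecs_cpu_percent", 72), ("api_5xx_count", 14)]:
        print(metric, value, classify(metric, value))
```

Every comparison is strictly greater-than, matching the `GreaterThanThreshold` operator used in the alarm definitions.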
🎯 Production Monitoring Tips
Log Management:
Alarm Strategy:
Dashboard Design:
Cost Optimization:
Problem: ECS logs do not appear
# Check ECS task execution role permissions
aws iam get-role-policy \
--role-name ecsTaskExecutionRole \
--policy-name CloudWatchLogsPolicy
# Verify log group exists
aws logs describe-log-groups \
--log-group-name-prefix "/ecs/vinashoes"
Problem: API Gateway logs missing
Problem: Alarms not triggering
Log Volume Management:
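# List the metric filters on the log group (ingestion volume itself is reported by the IncomingBytes metric in the AWS/Logs namespace)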
aws logs describe-metric-filters \
--log-group-name "/ecs/vinashoes-user-service"
# Set up log retention
aws logs put-retention-policy \
--log-group-name "/ecs/vinashoes-user-service" \
--retention-in-days 7
Next Task: Task 13 - Security & Compliance monitoring with AWS Config and CloudTrail 🚀
Delete the log groups for all services:
# Delete ECS log groups
aws logs delete-log-group --log-group-name "/ecs/vinashoes-user-service"
aws logs delete-log-group --log-group-name "/ecs/vinashoes-product-service"
aws logs delete-log-group --log-group-name "/ecs/vinashoes-order-service"
aws logs delete-log-group --log-group-name "/ecs/vinashoes-cart-service"
aws logs delete-log-group --log-group-name "/ecs/vinashoes-payment-service"
# Delete API Gateway log groups (execution log groups are named after the REST API ID)
aws logs delete-log-group --log-group-name "API-Gateway-Execution-Logs_YOUR_API_ID/prod"
aws logs delete-log-group --log-group-name "API-Gateway-Access-Logs_vinashoes-api/prod"
# Delete CodeBuild log groups
aws logs delete-log-group --log-group-name "/aws/codebuild/vinashoes-backend-build"
Delete all monitoring alarms:
# Delete ECS alarms
aws cloudwatch delete-alarms --alarm-names \
"ECS-UserService-HighCPU" \
"ECS-UserService-HighMemory" \
"ECS-ServiceCount-Zero"
# Delete API Gateway alarms
aws cloudwatch delete-alarms --alarm-names \
"APIGateway-High5XXErrors" \
"APIGateway-HighLatency"
# Delete DynamoDB alarms
aws cloudwatch delete-alarms --alarm-names \
"DynamoDB-UserThrottling" \
"DynamoDB-HighConsumedCapacity"
Delete the monitoring dashboard:
aws cloudwatch delete-dashboards --dashboard-names "VinaShoesProductionMonitoring"
Disable CloudWatch logging for API Gateway:
# Turn off logging and detailed metrics for the prod stage
aws apigateway update-stage \
--rest-api-id YOUR_API_ID \
--stage-name prod \
--patch-operations \
'op=replace,path=/*/*/logging/loglevel,value=OFF' \
'op=replace,path=/*/*/metrics/enabled,value=false'
Remove CloudWatch permissions from the ECS task roles:
# Detach CloudWatch policies from the ECS task execution role
aws iam detach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess
# Remove the API Gateway CloudWatch role (assumes the AWS managed policy AmazonAPIGatewayPushToCloudWatchLogs was attached)
aws iam detach-role-policy \
--role-name APIGatewayCloudWatchLogsRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs
aws iam delete-role --role-name APIGatewayCloudWatchLogsRole
⚠️ CloudWatch Cleanup Order:
CloudWatch pricing structure:
| Service Component | Free Tier | Paid Rate | Estimated Cost |
|---|---|---|---|
| Logs Ingestion | 5 GB/month | $0.50/GB | $10-50/month |
| Logs Storage | - | $0.03/GB/month | $3-10/month |
| Metrics | 10 metrics | $0.30/metric/month | $5-15/month |
| Alarms | - | $0.10/alarm/month | $3-10/month |
| Dashboard | 3 dashboards | $3/dashboard/month | $3/month |
| API Requests | 1M requests | $0.01/1K requests | $1-5/month |
Estimated cost for an e-commerce platform:
Base CloudWatch Cost:
Logs Ingestion: $25/month (50 GB of logs)
Logs Storage: $5/month (150 GB stored)
Custom Metrics: $10/month (30 metrics)
Alarms: $5/month (50 alarms)
Dashboard: $3/month (1 dashboard)
Monitoring & Alerting:
API Requests: $2/month (200K requests)
Cross-region: $1/month (minimal)
Total Monthly Cost: $51/month
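The estimate can be reproduced from the unit prices in the pricing table. The sketch below multiplies each usage figure by its rate, ignoring the free tier and treating the $1 cross-region charge as a flat fee; the dictionary keys and the `monthly_cost` function are hypothetical names. Without rounding each line up to whole dollars, the total comes to $49.50 rather than $51: storage is $4.50 rather than $5, and custom metrics are $9 rather than $10.

```python
# Unit prices in USD, copied from the pricing table.
RATES = {
    "logs_ingestion_gb": 0.50,
    "logs_storage_gb": 0.03,
    "custom_metrics": 0.30,
    "alarms": 0.10,
    "dashboards": 3.00,
    "api_requests_thousands": 0.01,
}

# Monthly usage figures from the estimate (200K requests = 200 thousands).
USAGE = {
    "logs_ingestion_gb": 50,
    "logs_storage_gb": 150,
    "custom_metrics": 30,
    "alarms": 50,
    "dashboards": 1,
    "api_requests_thousands": 200,
}


def monthly_cost(usage: dict, rates: dict, flat_fees: float = 0.0):
    """Return (per-item costs, total) in USD, ignoring the free tier."""
    line_items = {name: round(usage[name] * rates[name], 2) for name in usage}
    total = round(sum(line_items.values()) + flat_fees, 2)
    return line_items, total


if __name__ == "__main__":
    items, total = monthly_cost(USAGE, RATES, flat_fees=1.0)  # $1 cross-region
    for name, cost in items.items():
        print(f"{name}: ${cost:.2f}")
    print(f"total: ${total:.2f}")
```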
Reducing CloudWatch costs:
Optimization Tactics:
1. Log Retention:
- ECS logs: 7-day retention
- API Gateway: 30 days for prod
- Archive old logs to S3 Glacier
2. Sampling & Filtering:
- Enable log sampling for high-volume services
- Use metric filters instead of storing all logs
3. Alarm Optimization:
- Combine related alarms
- Use composite alarms to reduce the alarm count
4. Dashboard Efficiency:
- Use a single dashboard with multiple widgets
- Remove unused metrics from the dashboard
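Log sampling happens inside the application, before records ever reach CloudWatch. The services in this guide are NestJS, so the Python sketch below only illustrates the idea: keep every warning and error, and keep a fixed fraction of lower-severity records. The class name `SamplingFilter` and the 10% rate are assumptions.

```python
import logging
import zlib


class SamplingFilter(logging.Filter):
    """Keep every WARNING+ record and a deterministic fraction of the rest."""

    def __init__(self, sample_rate: float):
        super().__init__()
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0 and 1")
        self.sample_rate = sample_rate

    def filter(self, record: logging.LogRecord) -> bool:
        # Never drop warnings or errors.
        if record.levelno >= logging.WARNING:
            return True
        # Hash the rendered message so the same message always gets the same decision.
        bucket = zlib.crc32(record.getMessage().encode("utf-8")) % 10_000
        return bucket < self.sample_rate * 10_000


if __name__ == "__main__":
    logger = logging.getLogger("user-service")
    handler = logging.StreamHandler()
    handler.addFilter(SamplingFilter(sample_rate=0.1))  # keep roughly 10% of INFO logs
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    for i in range(20):
        logger.info("request %d handled", i)
    logger.error("payment failed")  # always emitted
```

Hashing the message instead of drawing a random number makes the decision repeatable, which helps when comparing runs. The trade-off is that identical messages are either all kept or all dropped.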
Monitoring Benefits vs. Cost:
| Benefit Type | Value | Cost Impact |
|---|---|---|
| MTTR Reduction | Cuts time to fix issues by 70% | $50K+ per outage |
| Performance Optimization | Improves response time by 30% | $20K+ per second of slowdown |
| Proactive Monitoring | Detects issues before users are affected | $100K+ downtime prevention |
| Operational Efficiency | Automated alerts reduce manual monitoring | 10 hours/week saved |
| Compliance | Audit trails for security compliance | Priceless |
ROI Calculation:
Track CloudWatch spending:
# Check CloudWatch cost for January 2024 (the End date is exclusive)
aws ce get-cost-and-usage \
--time-period Start=2024-01-01,End=2024-02-01 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=SERVICE \
--filter '{
"Dimensions": {
"Key": "SERVICE",
"Values": ["AmazonCloudWatch"]
}
}'
# Monitor log volume
aws logs describe-log-groups \
--query 'logGroups[*].{logGroupName:logGroupName,storedBytes:storedBytes}' \
--output table
# Check metrics usage
aws cloudwatch list-metrics \
--namespace "AWS/ECS" \
--query 'Metrics[*].{MetricName:MetricName,Dimensions:Dimensions}'
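The output of `describe-log-groups` is easier to act on once it is ranked. The sketch below takes the command's default JSON output (without the `--query` reshaping) and reports the largest groups by stored size, along with any group that has no retention policy. The function names are hypothetical and the sample data is made up for illustration.

```python
import json


def largest_log_groups(describe_output: dict, top: int = 3) -> list:
    """Return (name, stored GiB) for the biggest log groups, largest first."""
    groups = describe_output.get("logGroups", [])
    ranked = sorted(groups, key=lambda g: g.get("storedBytes", 0), reverse=True)
    return [
        (g["logGroupName"], round(g.get("storedBytes", 0) / 1024**3, 2))
        for g in ranked[:top]
    ]


def without_retention(describe_output: dict) -> list:
    """Return names of log groups that never expire (no retentionInDays)."""
    return [
        g["logGroupName"]
        for g in describe_output.get("logGroups", [])
        if "retentionInDays" not in g
    ]


if __name__ == "__main__":
    # Made-up sample in the shape of `aws logs describe-log-groups --output json`.
    sample = json.loads("""
    {"logGroups": [
      {"logGroupName": "/ecs/vinashoes-user-service", "storedBytes": 5368709120, "retentionInDays": 7},
      {"logGroupName": "/ecs/vinashoes-order-service", "storedBytes": 1073741824},
      {"logGroupName": "/aws/codebuild/vinashoes-backend-build", "storedBytes": 2147483648}
    ]}
    """)
    print(largest_log_groups(sample))
    print(without_retention(sample))
```

Groups that appear in both results are usually the first candidates for a retention policy.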
💡 Cost Management Best Practices
Log Management:
Cost Monitoring:
Optimization:
Scaling Considerations:
🚀 Production-Ready AWS Microservices Platform with Complete Observability! 🚀