AWS X-Ray Distributed Tracing

🎯 Task 13 goal: Set up AWS X-Ray for distributed tracing - TRACE REQUESTS END-TO-END

🔍 X-Ray Tracing Overview

Task 13 enables distributed tracing:

  • 🕸️ Service Map: Visualize microservices architecture
  • 📊 Request Tracing: Track API Gateway → ECS → DynamoDB
  • Performance Analysis: Detect bottlenecks
  • 🐛 Error Detection: Identify failures

Tracing Flow: User Request → API Gateway → ECS → DynamoDB → X-Ray Analysis
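
Each hop in this flow is tied together by the X-Amzn-Trace-Id HTTP header, which carries the trace ID from API Gateway to the services behind it. The following is a minimal sketch of how that header can be parsed; the function name parseTraceHeader and the TraceHeader type are illustrative and not part of any AWS SDK.

```typescript
// Parsed form of an X-Amzn-Trace-Id header value.
interface TraceHeader {
  root?: string;     // trace ID shared by every segment of the request
  parent?: string;   // ID of the segment that made the downstream call
  sampled?: boolean; // true when the request was selected for tracing
}

// Split "Root=...;Parent=...;Sampled=1" into its key/value fields.
function parseTraceHeader(value: string): TraceHeader {
  const header: TraceHeader = {};
  for (const part of value.split(";")) {
    const idx = part.indexOf("=");
    if (idx < 0) continue; // skip malformed fragments
    const key = part.slice(0, idx).trim().toLowerCase();
    const val = part.slice(idx + 1).trim();
    if (key === "root") header.root = val;
    else if (key === "parent") header.parent = val;
    else if (key === "sampled") header.sampled = val === "1";
  }
  return header;
}

// Sample value in the format used by the X-Ray documentation.
const exampleHeader = parseTraceHeader(
  "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
);
console.log(exampleHeader.root, exampleHeader.sampled);
```

Logging the Root value alongside application logs makes it possible to look up the matching trace in the X-Ray console.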

Prerequisites

  • ✅ Task 9: ECS services running
  • ✅ Task 10: API Gateway active
  • ✅ Task 7: DynamoDB tables
  • ✅ Task 12: CloudWatch monitoring

1. API Gateway X-Ray (5 minutes)

Enable X-Ray for API Gateway:

  1. API Gateway Console → Stage → prod → Settings:
     ✅ Enable X-Ray Tracing
     Sampling rate: 10% (adjusted through X-Ray sampling rules)
  2. Test tracing:
     curl -X GET https://your-api-gateway-url/api/users

2. ECS X-Ray Integration (15 minutes)

Update the ECS task definition to add the X-Ray daemon as a sidecar container:

{
  "containerDefinitions": [
    {
      "name": "user-service",
      "image": "ACCOUNT.dkr.ecr.ap-southeast-1.amazonaws.com/vinashoes-user-service:latest",
      "environment": [
        {
          "name": "AWS_XRAY_DAEMON_ADDRESS",
          "value": "localhost:2000"
        }
      ],
      "dependsOn": [
        {
          "containerName": "xray-daemon",
          "condition": "START"
        }
      ]
    },
    {
      "name": "xray-daemon",
      "image": "amazon/aws-xray-daemon:latest",
      "cpu": 32,
      "memoryReservation": 256,
      "portMappings": [
        {
          "containerPort": 2000,
          "protocol": "udp"
        }
      ]
    }
  ]
}
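
Before registering a task definition like the one above, it can help to check that the sidecar wiring is complete. The sketch below assumes the task definition has been loaded as a plain object; checkXRaySidecar and the interface names are hypothetical helpers, not part of the AWS SDK.

```typescript
// Minimal shape of the task definition fields used in this guide.
interface ContainerDefinition {
  name: string;
  environment?: { name: string; value: string }[];
  dependsOn?: { containerName: string; condition: string }[];
  portMappings?: { containerPort: number; protocol?: string }[];
}
interface TaskDefinition {
  containerDefinitions: ContainerDefinition[];
}

// Return a list of problems; an empty list means the sidecar wiring looks complete.
function checkXRaySidecar(
  task: TaskDefinition,
  appName: string,
  daemonName: string = "xray-daemon"
): string[] {
  const problems: string[] = [];
  const daemon = task.containerDefinitions.find((c) => c.name === daemonName);
  const app = task.containerDefinitions.find((c) => c.name === appName);

  if (!daemon) {
    problems.push(`container "${daemonName}" is missing`);
  } else {
    // The daemon receives segments over UDP port 2000.
    const ports = daemon.portMappings || [];
    if (!ports.some((p) => p.containerPort === 2000 && p.protocol === "udp")) {
      problems.push(`"${daemonName}" does not map port 2000/udp`);
    }
  }

  if (!app) {
    problems.push(`container "${appName}" is missing`);
    return problems;
  }
  // The SDK reads the daemon address from this environment variable.
  const env = app.environment || [];
  if (!env.some((e) => e.name === "AWS_XRAY_DAEMON_ADDRESS")) {
    problems.push(`"${appName}" does not set AWS_XRAY_DAEMON_ADDRESS`);
  }
  // The application should start only after the daemon container.
  const deps = app.dependsOn || [];
  if (!deps.some((d) => d.containerName === daemonName)) {
    problems.push(`"${appName}" does not depend on "${daemonName}"`);
  }
  return problems;
}
```

Calling checkXRaySidecar(taskDefinition, "user-service") on the JSON above should return an empty array.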

Required IAM permissions:

{
  "Effect": "Allow",
  "Action": [
    "xray:PutTraceSegments",
    "xray:PutTelemetryRecords",
    "xray:GetSamplingRules",
    "xray:GetSamplingTargets"
  ],
  "Resource": "*"
}

3. NestJS X-Ray SDK (10 minutes)

Install X-Ray SDK:

npm install aws-xray-sdk-core aws-xray-sdk-express aws-sdk

Configure in main.ts:

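import { NestFactory } from '@nestjs/core';
import * as AWSXRay from 'aws-xray-sdk-express';
// Assumes the default NestJS project layout for the root module
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Open an X-Ray segment named 'user-service' for every incoming request
  app.use(AWSXRay.openSegment('user-service'));
  app.enableCors();
  // Register the SDK's closing middleware for the segment
  app.use(AWSXRay.closeSegment());

  await app.listen(3000);
}

bootstrap();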

Trace DynamoDB calls:

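import { Injectable } from '@nestjs/common';
import * as AWSXRay from 'aws-xray-sdk-core';

// Wrap the AWS SDK (v2) so that each client call is recorded as a subsegment
const AWS = AWSXRay.captureAWS(require('aws-sdk'));

@Injectable()
export class UsersService {
  private dynamoDB = new AWS.DynamoDB.DocumentClient();

  // Fetch one user by primary key; the call shows up as a DynamoDB subsegment
  async findOne(id: string) {
    const params = { TableName: 'User', Key: { id } };
    return await this.dynamoDB.get(params).promise();
  }
}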

4. Deploy and Test (10 minutes)

Deploy updated services:

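# Build and push the new image with X-Ray (build with the same tag that is pushed)
docker build -t ACCOUNT.dkr.ecr.ap-southeast-1.amazonaws.com/vinashoes-user-service:latest .
docker push ACCOUNT.dkr.ecr.ap-southeast-1.amazonaws.com/vinashoes-user-service:latest

# Update the ECS service; a family name without a revision
# resolves to the latest ACTIVE revision of the task definition
aws ecs update-service \
  --cluster vinashoes-cluster \
  --service vinashoes-user-service \
  --task-definition vinashoes-user-service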

Test tracing:

# Generate test requests
curl -X GET https://your-api-gateway-url/api/users/123
curl -X GET https://your-api-gateway-url/api/users/nonexistent

# Check X-Ray Console → Service map

5. X-Ray Analysis (5 minutes)

Service Map Analysis:

  • API Gateway → ECS Services → DynamoDB
  • Response time distribution
  • Error rate per service
  • Service dependencies

Performance Insights:

Example Trace Analysis:
  Total Duration: 245ms
  - API Gateway: 5ms
  - User Service: 180ms
    - Business logic: 20ms  
    - DynamoDB call: 160ms (BOTTLENECK!)
  - Response: 60ms
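
The breakdown above can also be computed. The following sketch finds the slowest leaf step in a trace and its share of the total duration. The Span shape is a simplified stand-in for illustration, not the X-Ray segment document format, and the numbers are the example figures from this section.

```typescript
// Simplified timing tree; a parent's duration includes its children.
interface Span {
  name: string;
  durationMs: number;
  children?: Span[];
}

interface Bottleneck {
  path: string;
  durationMs: number;
  sharePercent: number; // share of the root duration, one decimal place
}

// Collect leaf spans so that a parent is never counted on top of its children.
function collectLeaves(
  span: Span,
  prefix: string,
  out: { path: string; durationMs: number }[]
): void {
  const path = prefix ? `${prefix} > ${span.name}` : span.name;
  if (!span.children || span.children.length === 0) {
    out.push({ path, durationMs: span.durationMs });
    return;
  }
  for (const child of span.children) collectLeaves(child, path, out);
}

// Pick the leaf span with the largest duration.
function findBottleneck(root: Span): Bottleneck {
  const leaves: { path: string; durationMs: number }[] = [];
  collectLeaves(root, "", leaves);
  let slowest = { path: root.name, durationMs: 0 };
  for (const leaf of leaves) {
    if (leaf.durationMs > slowest.durationMs) slowest = leaf;
  }
  const share = Math.round((slowest.durationMs / root.durationMs) * 1000) / 10;
  return { path: slowest.path, durationMs: slowest.durationMs, sharePercent: share };
}

const exampleTrace: Span = {
  name: "Request",
  durationMs: 245,
  children: [
    { name: "API Gateway", durationMs: 5 },
    {
      name: "User Service",
      durationMs: 180,
      children: [
        { name: "Business logic", durationMs: 20 },
        { name: "DynamoDB call", durationMs: 160 },
      ],
    },
    { name: "Response", durationMs: 60 },
  ],
};

console.log(findBottleneck(exampleTrace));
```

For the example trace, the DynamoDB call accounts for 160 of 245 ms, about 65.3% of the request.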

CloudWatch ServiceLens Integration:

  • Single monitoring view
  • Correlation with metrics and logs
  • Automated insights

6. Task 13 Complete!

📋 Checklist

Component | Status
✅ API Gateway Tracing | ACTIVE
✅ ECS X-Ray Integration | DEPLOYED
✅ NestJS SDK | INTEGRATED
✅ Service Map | VISIBLE
✅ Performance Analysis | READY

🎯 Key Benefits

✅ Complete End-to-End Tracing:

  • Request flow visibility
  • Performance bottleneck detection
  • Error correlation
  • Service dependency mapping
  • CloudWatch integration

💡 Production Tips

Sampling Strategy:
  Production: 5-10%
  Development: 100%
  Critical paths: Always sample
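
X-Ray sampling rules combine a reservoir (a number of requests per second that are always traced) with a fixed rate applied to the remaining requests. The class below is a simplified local model of that decision, intended only to show how the percentages above behave. ReservoirSampler is a hypothetical name, and the real SDK coordinates sampling with the X-Ray service rather than deciding purely in-process.

```typescript
// Simplified model of a sampling rule: reservoir first, then fixed rate.
class ReservoirSampler {
  private reservoirPerSecond: number;
  private fixedRate: number;
  private currentSecond = -1; // epoch second of the current window
  private usedThisSecond = 0; // reservoir slots consumed in this window

  constructor(reservoirPerSecond: number, fixedRate: number) {
    this.reservoirPerSecond = reservoirPerSecond;
    this.fixedRate = fixedRate;
  }

  // Decide whether the request arriving at nowMs should be traced.
  // The random source is injectable so the behavior can be tested.
  shouldSample(nowMs: number, random: () => number = Math.random): boolean {
    const second = Math.floor(nowMs / 1000);
    if (second !== this.currentSecond) {
      this.currentSecond = second;
      this.usedThisSecond = 0;
    }
    if (this.usedThisSecond < this.reservoirPerSecond) {
      this.usedThisSecond++;
      return true; // reservoir slot available: always trace
    }
    return random() < this.fixedRate; // otherwise trace a fixed fraction
  }
}

// Profiles matching the strategy above (the reservoir sizes are assumptions).
const productionSampler = new ReservoirSampler(1, 0.05);
const developmentSampler = new ReservoirSampler(0, 1.0);
console.log(productionSampler.shouldSample(Date.now()), developmentSampler.shouldSample(Date.now()));
```

With a reservoir of 1, even a low-traffic service records at least one trace per second of activity, while the fixed rate keeps trace volume bounded under load.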

Cost Management:
  - Use intelligent sampling
  - Plan around X-Ray's fixed 30-day trace retention
  - Focus on critical services

Troubleshooting:
  - Check X-Ray daemon logs
  - Verify IAM permissions
  - Monitor sampling rates

Next: Task 14 - Security monitoring with AWS Config 🚀


7. Resource Cleanup

7.1. Disable X-Ray Tracing in API Gateway

Disable X-Ray for the API Gateway stage:

# Disable X-Ray tracing for the prod stage
aws apigateway update-stage \
  --rest-api-id YOUR_API_ID \
  --stage-name prod \
  --patch-operations op=replace,path=/tracingEnabled,value=false

7.2. Remove the X-Ray Daemon from ECS

Update the ECS task definition to remove the X-Ray daemon:

{
  "containerDefinitions": [
    {
      "name": "user-service",
      "image": "ACCOUNT.dkr.ecr.ap-southeast-1.amazonaws.com/vinashoes-user-service:latest",
      "environment": []
    }
  ]
}

Deploy updated task definition:

# Update the ECS service with the new task definition
aws ecs update-service \
  --cluster vinashoes-cluster \
  --service vinashoes-user-service \
  --task-definition vinashoes-user-service:NEW_VERSION \
  --force-new-deployment

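7.3. Remove IAM Permissions

Remove the X-Ray permissions from the ECS role they were attached to:

# Detach the X-Ray managed policy. The X-Ray daemon uses the task role,
# so replace ecsTaskExecutionRole if the policy was attached to a different role.
aws iam detach-role-policy \
  --role-name ecsTaskExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

# Or remove the specific permissions from a custom policy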

7.4. X-Ray Data Cleanup

X-Ray traces and service maps need no manual cleanup:

# X-Ray has no API or CLI command for deleting individual traces.
# Trace data expires automatically after 30 days.

⚠️ X-Ray Cleanup Order:

  1. Disable tracing in API Gateway
  2. Update the ECS task definition (remove the X-Ray daemon)
  3. Deploy the new service
  4. Remove the IAM permissions
  5. Let the remaining traces expire (30 days)

8. Cost Analysis

8.1. X-Ray Pricing Overview

AWS X-Ray pricing structure:

Service Component | Free | Paid | Estimated Cost
Traces | 100,000 traces/month | $0.000005/trace | $0.50/month
Retrieved Traces | - | $0.000005/retrieved trace | $0.10/month
Analytics Queries | - | $0.000005/query | $0.05/month
Service Map | Free | - | $0/month
CloudWatch Integration | Free | - | $0/month

8.2. Monthly Cost Breakdown

Estimated cost for an e-commerce platform:

Base X-Ray Cost:
  Traces: $0.50/month (100K billable traces)
  Retrieved Traces: $0.10/month (20K retrieved)
  Analytics: $0.05/month (10K queries)

Sampling & Storage:
  Sampling Rate: 10% (records 90% fewer traces)
  Storage: Free (30-day retention)

Total Monthly Cost: $0.65/month
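
The arithmetic behind this estimate can be reproduced with a few lines of code. The sketch below uses the per-unit rate quoted in this guide ($0.000005) for every line item; the rate, the 1,000,000-request traffic figure, and all names are assumptions for illustration, so check them against current AWS pricing before relying on the output.

```typescript
// Monthly usage figures, counted after the free tier has been applied.
interface MonthlyUsage {
  billableTraces: number;
  retrievedTraces: number;
  analyticsQueries: number;
}

interface CostBreakdown {
  tracesUsd: number;
  retrievedUsd: number;
  analyticsUsd: number;
  totalUsd: number;
}

// $0.000005 per unit, expressed in millionths of a dollar to avoid rounding drift.
const MICRO_USD_PER_UNIT = 5;

// Number of traces recorded for a given request volume and sampling rate.
function recordedTraces(requests: number, samplingRate: number): number {
  return Math.round(requests * samplingRate);
}

// Price each line item at the guide's rate and sum the result.
function estimateMonthlyCost(usage: MonthlyUsage): CostBreakdown {
  const traces = usage.billableTraces * MICRO_USD_PER_UNIT;
  const retrieved = usage.retrievedTraces * MICRO_USD_PER_UNIT;
  const analytics = usage.analyticsQueries * MICRO_USD_PER_UNIT;
  const toUsd = (microUsd: number): number => microUsd / 1000000;
  return {
    tracesUsd: toUsd(traces),
    retrievedUsd: toUsd(retrieved),
    analyticsUsd: toUsd(analytics),
    totalUsd: toUsd(traces + retrieved + analytics),
  };
}

// Hypothetical traffic: 1,000,000 requests sampled at 10% gives 100K traces.
const exampleUsage: MonthlyUsage = {
  billableTraces: recordedTraces(1000000, 0.1),
  retrievedTraces: 20000,
  analyticsQueries: 10000,
};
console.log(estimateMonthlyCost(exampleUsage));
```

With the figures from this section, the total comes to $0.65 per month.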

8.3. Cost Optimization Strategy

Reducing X-Ray cost:

Optimization Tactics:
  1. Smart Sampling:
     - Production: 5-10% sampling
     - Development: 100% sampling
     - Critical paths: Always trace

  2. Retention Policy:
     - 30 days is enough for most debugging
     - Archive important traces

  3. Query Optimization:
     - Use filters to reduce retrieved traces
     - Schedule analytics queries

  4. Service Selection:
     - Enable only for critical services
     - Disable for background jobs
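
The smart sampling tactic maps onto X-Ray sampling rules, where a lower Priority number is evaluated first. The sketch below builds two rule documents in the shape accepted by create-sampling-rule. The rule names, the /api/orders/* path, and the reservoir sizes are hypothetical values chosen for illustration.

```typescript
// Fields of an X-Ray sampling rule document.
interface SamplingRuleSpec {
  RuleName: string;
  Priority: number;      // lower numbers are evaluated first
  FixedRate: number;     // fraction sampled once the reservoir is used up
  ReservoirSize: number; // requests per second that are always sampled
  ServiceName: string;
  ServiceType: string;
  Host: string;
  HTTPMethod: string;
  URLPath: string;
  ResourceARN: string;
  Version: number;
}

// Build a rule that matches on URL path only; other matchers are wildcards.
function makeRule(
  name: string,
  priority: number,
  urlPath: string,
  fixedRate: number,
  reservoirSize: number
): SamplingRuleSpec {
  return {
    RuleName: name,
    Priority: priority,
    FixedRate: fixedRate,
    ReservoirSize: reservoirSize,
    ServiceName: "*",
    ServiceType: "*",
    Host: "*",
    HTTPMethod: "*",
    URLPath: urlPath,
    ResourceARN: "*",
    Version: 1,
  };
}

const samplingRules: SamplingRuleSpec[] = [
  makeRule("critical-orders", 10, "/api/orders/*", 1.0, 5), // always trace
  makeRule("production-default", 100, "*", 0.05, 1),        // 5% of the rest
];

// Print one JSON document per rule.
for (const rule of samplingRules) {
  console.log(JSON.stringify({ SamplingRule: rule }));
}
```

Each printed document can be saved to a file and passed to aws xray create-sampling-rule --cli-input-json file://rule.json.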

8.4. ROI Analysis

Observability Benefits vs Cost:

Benefit Type | Value | Cost Impact
Debugging Time | Reduces MTTR by 50% | $10K+ per outage
Performance Optimization | Improves response time | $5K+ per second of slowness
Service Reliability | Reduces downtime | $50K+ per hour of downtime
Development Efficiency | Faster troubleshooting | 5 hours/day saved
Customer Experience | Higher satisfaction | 2-5% revenue increase

ROI Calculation:

  • Annual Cost: $8 ($0.65/month × 12 = $7.80, rounded up)
  • Annual Benefit: $200K+ (debugging + reliability)
  • ROI: 2,500,000% (benefit ÷ cost = 25,000×)
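
The ROI figure can be checked with a short calculation. This sketch follows the guide's own definition (benefit ÷ cost) and also shows the stricter net definition, (benefit - cost) ÷ cost, for comparison. The function names are illustrative, and the $200K benefit is the estimate stated above, not a measured value.

```typescript
// Annual cost from a monthly figure.
function annualCostUsd(monthlyUsd: number): number {
  return monthlyUsd * 12;
}

// The guide's definition: benefit divided by cost, as a percentage.
function benefitToCostPercent(benefitUsd: number, costUsd: number): number {
  return (benefitUsd / costUsd) * 100;
}

// Net ROI: gain after subtracting the cost, as a percentage of the cost.
function netRoiPercent(benefitUsd: number, costUsd: number): number {
  return ((benefitUsd - costUsd) / costUsd) * 100;
}

const roundedAnnualCost = Math.round(annualCostUsd(0.65)); // $7.80 rounds to $8
const annualBenefit = 200000;

console.log(benefitToCostPercent(annualBenefit, roundedAnnualCost));
console.log(netRoiPercent(annualBenefit, roundedAnnualCost));
```

Both definitions give a ratio in the millions of percent here, because the annual cost is only a few dollars.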

8.5. Cost Monitoring

Tracking X-Ray spend:

# Check X-Ray cost (End is exclusive, so use the first day of the next month)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --filter '{
    "Dimensions": {
      "Key": "SERVICE",
      "Values": ["AWS X-Ray"]
    }
  }'

# Monitor trace volume
aws xray get-trace-summaries \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-31T23:59:59Z \
  --query 'TraceSummaries[*].{Id:Id,Duration:Duration,ResponseTime:ResponseTime}'
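
The output of get-trace-summaries can be condensed into a few numbers for a weekly review. The sketch below assumes the TraceSummaries array has already been loaded from the CLI's JSON output; summarizeTraces is a hypothetical helper, and the p95 uses the nearest-rank method.

```typescript
// Fields selected by the --query expression above (durations are in seconds).
interface TraceSummary {
  Id: string;
  Duration: number;
  ResponseTime: number;
}

interface TraceStats {
  count: number;
  avgDurationSec: number;
  p95DurationSec: number;
}

// Compute trace count, mean duration, and nearest-rank 95th percentile.
function summarizeTraces(traces: TraceSummary[]): TraceStats {
  if (traces.length === 0) {
    return { count: 0, avgDurationSec: 0, p95DurationSec: 0 };
  }
  const durations = traces.map((t) => t.Duration).sort((a, b) => a - b);
  let sum = 0;
  for (const d of durations) sum += d;
  // Nearest-rank: the smallest value with at least 95% of samples at or below it.
  const rank = Math.ceil(0.95 * durations.length);
  return {
    count: durations.length,
    avgDurationSec: sum / durations.length,
    p95DurationSec: durations[rank - 1],
  };
}

// Made-up sample data for illustration only.
const sampleSummaries: TraceSummary[] = [
  { Id: "trace-a", Duration: 1, ResponseTime: 1 },
  { Id: "trace-b", Duration: 2, ResponseTime: 2 },
  { Id: "trace-c", Duration: 3, ResponseTime: 3 },
  { Id: "trace-d", Duration: 4, ResponseTime: 4 },
];
console.log(summarizeTraces(sampleSummaries));
```

Tracking the count week over week shows whether sampling changes are having the intended effect on trace volume.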

💡 Cost Management Best Practices

Sampling Strategy:

  • Production: 5-10% for cost-effective monitoring
  • Development: 100% for full debugging
  • Critical services: Always sample

Cost Monitoring:

  • Set CloudWatch billing alerts at a $5/month threshold
  • Monitor trace volume weekly
  • Review sampling rates monthly

Optimization:

  • Use X-Ray only for critical request paths
  • Plan around the fixed 30-day trace retention
  • Leverage CloudWatch ServiceLens integration

Scaling Considerations:

  • High-traffic: Raising the sampling rate raises cost
  • Multi-service: Cost scales with the number of services
  • Global: Cross-region traces add minimal cost

🚀 Production-Ready AWS Microservices Platform with Complete Observability! 🚀