Chapter 38: AWS X-Ray - Distributed Tracing

End-to-End Application Tracing

38.1 Overview

AWS X-Ray helps developers analyze and debug distributed applications in production, such as those built on a microservices architecture.

AWS X-Ray Overview

- Traces: request tracking and timing
- Segments: subsegments and metadata
- Service Map: visual view of service dependencies
- Sampling Rules: sampling configuration and rate

Key Features

| Feature | Description |
|---|---|
| Traces | End-to-end request tracking |
| Segments | Service-level processing details |
| Service Map | Visual representation of services |
| Sampling | Control data collection rate |
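
To make the trace and segment terms in the table concrete, here is a minimal sketch of a hand-built segment document. The trace ID layout (`1-`, then 8 hex digits of epoch seconds, then 24 random hex digits) and the field names follow the X-Ray segment document format. The helper names (`new_trace_id`, `new_entity_id`, `build_segment`) and all sample values are hypothetical; in practice the X-Ray SDK generates these documents for you.

```python
import json
import secrets
import time


def new_trace_id(now=None):
    """Return a trace ID: version 1, epoch seconds as 8 hex digits, 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def new_entity_id():
    """Segment and subsegment IDs are 16 hex digits."""
    return secrets.token_hex(8)


def build_segment(name, start, end, trace_id, subsegments=()):
    """Assemble a segment document (hypothetical helper, not an SDK call)."""
    doc = {
        "name": name,
        "id": new_entity_id(),
        "trace_id": trace_id,
        "start_time": start,
        "end_time": end,
    }
    if subsegments:
        doc["subsegments"] = list(subsegments)
    return doc


trace_id = new_trace_id(now=1704067200)

# One subsegment describing a single operation inside the service.
query = {
    "name": "database_query",
    "id": new_entity_id(),
    "start_time": 1704067200.10,
    "end_time": 1704067200.35,
    "annotations": {"user_id": "12345"},          # indexed, searchable
    "metadata": {"default": {"table": "users"}},  # not indexed
}

segment = build_segment("my-flask-app", 1704067200.0, 1704067200.5, trace_id, [query])
print(json.dumps(segment, indent=2))
```

The nesting mirrors the hierarchy: one trace ID shared by every segment of a request, one segment per service, and subsegments for individual operations.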
38.2 X-Ray Concepts
Trace and Segment Hierarchy

X-Ray Trace Hierarchy

- Trace
  - Unique ID for the entire request
  - Collection of segments
  - Tracks the request through all services
  - Segment (Service)
    - Processing on a single service
    - Start/end time
    - HTTP method, URL, response code
    - Subsegment (Operation)
      - Specific operation within the service
      - Database query, HTTP call, function call
      - Custom metadata

Service Map
X-Ray Service Map Example

- Client sends requests to the Load Balancer (ELB)
- Load Balancer routes to the Web Service (EC2) and the API Service (Lambda)
- Web Service (EC2) calls the Database (RDS)
- API Service (Lambda) calls the Cache (ElastiCache)

Colors indicate health:

- Green: Successful calls
- Yellow: Client errors (4xx)
- Red: Server faults (5xx)
- Purple: Throttling (429)

38.3 X-Ray Integration
Supported Services

X-Ray Supported Services

Compute

- Amazon EC2
- AWS Lambda
- Amazon ECS
- Amazon EKS
- AWS Elastic Beanstalk

Networking

- Elastic Load Balancing (ALB/NLB)
- Amazon API Gateway
- Amazon CloudFront

Database

- Amazon RDS
- Amazon DynamoDB
- Amazon ElastiCache

Messaging

- Amazon SNS
- Amazon SQS
- Amazon Kinesis

X-Ray SDK Integration
Application Code

```javascript
// Node.js Example
const AWSXRay = require('aws-xray-sdk');
const app = require('express')();

// Enable X-Ray for all routes
app.use(AWSXRay.express.openSegment('MyApp'));

app.get('/', (req, res) => {
  const segment = req.segment;
  segment.addAnnotation('user', 'alice');
  segment.addMetadata('custom', 'data');
  res.send('Hello World');
});

app.use(AWSXRay.express.closeSegment());
```

AWS SDK Instrumentation

```javascript
// Capture AWS SDK calls
const AWS = AWSXRay.captureAWS(require('aws-sdk'));

// All AWS SDK calls are automatically traced
const s3 = new AWS.S3();

// await is only valid inside an async function in CommonJS modules
async function readFile() {
  return s3.getObject({ Bucket: 'my-bucket', Key: 'file' }).promise();
}
```

38.4 Sampling Rules
Sampling Configuration

X-Ray Sampling Rules

Default Sampling Rule

- ReservoirSize: 1 (the first request each second is always recorded)
- FixedRate: 0.05 (5% of additional requests)
- HTTPMethod: *
- URLPath: *
- ServiceName: *
- ServiceType: *

Custom Sampling Rules

Rule: HighTrafficAPI

- Priority: 1
- HTTPMethod: GET
- URLPath: /api/health
- FixedRate: 0.01 (1% sampling)
- ReservoirSize: 10 (per second)

Rule: CriticalAPI

- Priority: 2
- HTTPMethod: POST
- URLPath: /api/checkout
- FixedRate: 1 (100% sampling)

38.5 Annotations and Metadata
Annotations vs Metadata

X-Ray Annotations vs Metadata

Annotations

- Indexed for search
- Key-value pairs
- Limited to string, number, boolean
- Use for filtering traces

Example:

```javascript
segment.addAnnotation('userId', 'user-123');
segment.addAnnotation('orderId', 45678);
segment.addAnnotation('isPremium', true);
```

Metadata

- Not indexed
- Can be any object
- Use for additional context

Example:

```javascript
segment.addMetadata('request', {
  headers: req.headers,
  body: req.body
});
```

38.6 CLI Commands
```bash
# Get sampling rules
aws xray get-sampling-rules

# Create sampling rule (identify the rule by name or by ARN, not both)
aws xray create-sampling-rule \
  --sampling-rule '{
    "RuleName": "MyRule",
    "ResourceARN": "*",
    "Priority": 1,
    "FixedRate": 0.1,
    "ReservoirSize": 10,
    "ServiceName": "*",
    "ServiceType": "*",
    "Host": "*",
    "HTTPMethod": "*",
    "URLPath": "*",
    "Version": 1
  }'

# Get service graph (timestamps are epoch seconds)
aws xray get-service-graph \
  --start-time 1704067200 \
  --end-time 1704153600

# Get trace summaries
aws xray get-trace-summaries \
  --start-time 1704067200 \
  --end-time 1704153600

# Get traces by IDs
aws xray batch-get-traces \
  --trace-ids '["trace-id-1","trace-id-2"]'

# Put trace segments
aws xray put-trace-segments \
  --trace-segment-documents 'file://segments.json'

# Put telemetry records
aws xray put-telemetry-records \
  --telemetry-records 'file://telemetry.json'

# Delete sampling rule
aws xray delete-sampling-rule \
  --rule-name "MyRule"

# Update sampling rule
aws xray update-sampling-rule \
  --sampling-rule-update '{
    "RuleName": "MyRule",
    "FixedRate": 0.05,
    "ReservoirSize": 20
  }'
```

38.7 Advanced X-Ray Features
Cross-Account Tracing

Cross-Account X-Ray Tracing

- Account A (Production): the Lambda function and API Gateway emit trace data
- Account B (Monitoring): the X-Ray console and Service Map display it

Configuration:

```bash
# Enable cross-account tracing
# ACCOUNT-B is a placeholder for the 12-digit account ID
aws xray put-resource-policy \
  --policy-name CrossAccountPolicy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": { "AWS": "arn:aws:iam::ACCOUNT-B:root" },
        "Action": [
          "xray:PutTraceSegments",
          "xray:PutTelemetryRecords"
        ],
        "Resource": "*"
      }
    ]
  }'
```

X-Ray SDK Integration Examples
```python
# Python (Flask) X-Ray Integration
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware
from flask import Flask

app = Flask(__name__)

# Configure X-Ray
xray_recorder.configure(
    service='my-flask-app',
    sampling=True,
    context_missing='LOG_ERROR'
)

# Add X-Ray middleware
XRayMiddleware(app, xray_recorder)

@app.route('/api/users')
@xray_recorder.capture('get_users')
def get_users():
    # Custom subsegment
    with xray_recorder.in_subsegment('database_query') as subsegment:
        subsegment.put_annotation('user_id', '12345')
        subsegment.put_metadata('query', {'table': 'users'})
        # Database query here
        users = query_database()

    return {'users': users}

# Custom annotations and metadata
@app.route('/api/orders/<order_id>')
def get_order(order_id):
    segment = xray_recorder.current_segment()
    segment.put_annotation('order_id', order_id)
    segment.put_metadata('request_info', {
        'source': 'web',
        'version': '1.0'
    })
    return get_order_from_db(order_id)
```

```javascript
// Node.js X-Ray Integration
const AWSXRay = require('aws-xray-sdk');
const AWS = AWSXRay.captureAWS(require('aws-sdk'));
const express = require('express');

const app = express();

// Enable X-Ray middleware
app.use(AWSXRay.express.openSegment('MyApp'));

app.get('/api/products', async (req, res) => {
  const segment = AWSXRay.getSegment();

  // Add annotation
  segment.addAnnotation('product_category', 'electronics');

  // Add metadata
  segment.addMetadata('request_id', req.headers['x-request-id']);

  // Custom subsegment
  AWSXRay.captureAsyncFunc('fetchProducts', async (subsegment) => {
    try {
      const products = await fetchProductsFromDB();
      subsegment.close();
      res.json(products);
    } catch (err) {
      subsegment.close(err);
      res.status(500).json({ error: err.message });
    }
  });
});

app.use(AWSXRay.express.closeSegment());
```

X-Ray with AWS Lambda
```yaml
# SAM template with X-Ray enabled
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs18.x
      Tracing: Active  # Enable X-Ray tracing
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - xray:PutTraceSegments
                - xray:PutTelemetryRecords
              Resource: '*'
```

CloudWatch ServiceLens Integration
CloudWatch ServiceLens Architecture

CloudWatch ServiceLens brings together:

- X-Ray Traces: request paths, errors
- CloudWatch Metrics: latency, errors, invocations
- App Insights: runtime metrics, errors
- Service Map: dependencies, health, latency

Benefits:

- Correlate traces with metrics
- Visualize service dependencies
- Identify bottlenecks quickly
- Root cause analysis

X-Ray Insights
X-Ray Insights Features

Insights automatically detects:

1. High latency operations
   - Slow database queries
   - External API calls
   - Resource contention
2. Error patterns
   - Recurring exceptions
   - HTTP 5xx errors
   - Timeout issues
3. Anomalies
   - Traffic spikes
   - Latency outliers
   - Error rate changes

38.8 Best Practices
X-Ray Best Practices

1. Use annotations for searchable data
   - Add userId, orderId, etc. for filtering
   - Keep annotations simple (string, number, boolean)
2. Use subsegments for detailed operations
   - Trace database queries
   - Trace external HTTP calls
   - Trace internal function calls
3. Configure sampling for high-traffic applications
   - Reduce costs
   - Focus on critical paths
4. Integrate with CloudWatch ServiceLens
   - Combined view of metrics and traces
   - Enhanced monitoring
5. Use the X-Ray daemon on EC2/ECS
   - Collects segments from the application
   - Sends them to the X-Ray service

38.9 Why This Matters in DevOps/SRE
X-Ray is essential for distributed tracing in microservices architectures. SREs use it to understand request flows, identify bottlenecks, and troubleshoot latency issues across services.
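
As a rough illustration of bottleneck hunting, the sketch below walks a segment document and reports its slowest nested subsegment. The sample document and the `walk` and `slowest_subsegment` helpers are hypothetical; real documents could come from `aws xray batch-get-traces`, where each segment's `Document` field is a JSON string.

```python
import json


def walk(entity, path=()):
    """Yield (path, duration_seconds) for every subsegment under entity."""
    for sub in entity.get("subsegments", []):
        sub_path = path + (sub["name"],)
        yield sub_path, sub["end_time"] - sub["start_time"]
        yield from walk(sub, sub_path)


def slowest_subsegment(segment):
    """Return the (path, duration) pair with the longest duration, or None."""
    return max(walk(segment), key=lambda item: item[1], default=None)


# Hypothetical segment document, for illustration only.
document = json.loads("""
{
  "name": "checkout-api", "start_time": 100.0, "end_time": 101.2,
  "subsegments": [
    {"name": "DynamoDB", "start_time": 100.1, "end_time": 100.2},
    {"name": "payment-gateway", "start_time": 100.2, "end_time": 101.1,
     "subsegments": [
       {"name": "tls-handshake", "start_time": 100.2, "end_time": 100.4}
     ]}
  ]
}
""")

path, duration = slowest_subsegment(document)
print(" > ".join(path), f"{duration:.3f}s")
```

A parent subsegment's duration includes its children, so a real analysis might subtract child time to find where the time is actually spent.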
X-Ray in DevOps/SRE

SRE Observability for Microservices:

1. End-to-End Tracing
   - Track requests across multiple services
   - Identify where latency is introduced
   - Visualize the dependency map with Service Map
2. Error Analysis
   - Identify failing services quickly
   - Trace errors back to root cause
   - Analyze error patterns across services
3. Performance Optimization
   - Find the slowest operations in the request path
   - Optimize database queries and external calls
   - Set performance SLOs based on trace data

38.10 Linux Systems Perspective

X-Ray Automation from Arch Linux
```bash
# Install X-Ray daemon on Arch Linux
sudo pacman -S aws-cli-v2 jq
yay -S aws-xray-daemon

# Run X-Ray daemon
sudo systemctl enable xray
sudo systemctl start xray
```

```bash
#!/bin/bash
# ~/bin/xray-trace-summary.sh
# Get trace summary
set -euo pipefail

# Trace IDs have the form 1-<8 hex digits>-<24 hex digits>
TRACE_ID="${1:-1-12345678-1234567890abcdef12345678}"

echo "=== Trace: $TRACE_ID ==="
# Timestamps are epoch seconds; --query applies a client-side JMESPath filter
aws xray get-trace-summaries \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --query "TraceSummaries[?Id=='$TRACE_ID']"

# Analyze service performance: list services that recorded errors
aws xray get-service-graph \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --output json | jq '.Services[] | select((.SummaryStatistics.ErrorStatistics.TotalCount // 0) > 0)'
```

38.11 Common Mistakes & Anti-Patterns
X-Ray Anti-Patterns

❌ Mistake 1: Not Using Subsegments

- Problem: Only seeing top-level timing
- Impact: Can't identify slow database queries/API calls
- Fix: Add subsegments for database and HTTP calls

❌ Mistake 2: Sampling Too Aggressively

- Problem: Missing important traces
- Impact: Can't reproduce intermittent issues
- Fix: Use adaptive sampling, increase rate for errors

❌ Mistake 3: Not Using Annotations for Filtering

- Problem: Can't search traces by user/order ID
- Impact: Hard to investigate user-specific issues
- Fix: Add annotations for user ID, request ID

❌ Mistake 4: Ignoring X-Ray in Lambda

- Problem: Missing Lambda traces
- Impact: Incomplete view of serverless applications
- Fix: Enable X-Ray active tracing in Lambda config

38.12 Interview Questions
Conceptual Questions

- Q: Explain the difference between segments and subsegments in X-Ray.
  - A: Segments represent the work done by a single service. They contain metadata about the request, response, and timing. Subsegments are subdivisions of segments that provide detailed timing for downstream calls (database queries, HTTP requests, AWS service calls). Subsegments help identify exactly where in the request path time is being spent.
- Q: How does X-Ray sampling work?
  - A: X-Ray uses sampling rules to limit the number of traces collected. The default rule records the first request each second (the reservoir) and 5% of any additional requests (the fixed rate). You can create custom rules that match on service name, service type, host, HTTP method, and URL path, each with its own reservoir size and fixed rate. Because the reservoir guarantees a minimum number of traces per second, low-traffic services are traced at a proportionally higher rate than high-traffic ones.
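
The interaction between the reservoir and the fixed rate can be sketched as a small simulation. This is a simplified local model, not the SDK's implementation: the `Sampler` class and `simulate` function are hypothetical, and the real service also coordinates reservoir quotas across instances.

```python
import random


class Sampler:
    """Simplified model: per-second reservoir first, then a fixed-rate coin flip."""

    def __init__(self, reservoir_size, fixed_rate, rng=None):
        self.reservoir_size = reservoir_size
        self.fixed_rate = fixed_rate
        self.rng = rng or random.Random()
        self._second = None
        self._used = 0

    def should_sample(self, now):
        second = int(now)
        if second != self._second:  # a new second refills the reservoir
            self._second = second
            self._used = 0
        if self._used < self.reservoir_size:
            self._used += 1
            return True
        return self.rng.random() < self.fixed_rate


def simulate(requests_per_second, seconds, sampler):
    """Count sampled requests for evenly spaced traffic."""
    sampled = 0
    for s in range(seconds):
        for i in range(requests_per_second):
            if sampler.should_sample(s + i / requests_per_second):
                sampled += 1
    return sampled


# Default-like rule: reservoir of 1 per second, then 5% of the rest.
low = simulate(2, 60, Sampler(1, 0.05, random.Random(1)))
high = simulate(1000, 60, Sampler(1, 0.05, random.Random(1)))
print(f"low traffic:  {low}/120 sampled")
print(f"high traffic: {high}/60000 sampled")
```

Under this model the low-traffic service has at least half of its requests traced, while the high-traffic service converges toward the fixed rate.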
Scenario-Based Questions
- Q: A user reports slow API responses. How would you use X-Ray to diagnose?
  - A: Use the X-Ray Service Map to see which services have high latency. Look at trace details to identify the slowest segment or subsegment. Check whether the time goes to database queries, external API calls, or Lambda execution. Use annotations to filter by user ID and trace that user's requests. Compare latency percentiles (p50, p95, p99).
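
The percentile comparison in the last step can be sketched with the standard library. The durations below are made-up sample values and `latency_percentiles` is a hypothetical helper; in practice the input could be taken from the `Duration` field (in seconds) of each summary returned by `aws xray get-trace-summaries`.

```python
import statistics


def latency_percentiles(durations):
    """Return p50, p95 and p99 for a list of durations in seconds."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical durations: mostly fast requests with a slow tail.
durations = [0.12] * 90 + [0.45] * 8 + [2.5, 3.1]

result = latency_percentiles(durations)
for name, value in result.items():
    print(f"{name}: {value:.2f}s")
```

A large gap between p50 and p99, as in this sample, suggests that a small share of requests hits a slow path worth inspecting in individual traces.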
38.13 Exam Tips
Key Exam Points

1. X-Ray provides end-to-end request tracing
2. Traces contain segments, segments contain subsegments
3. Service Map visualizes application architecture
4. Annotations are indexed for search
5. Metadata is not indexed, can be any object
6. Sampling rules control data collection rate
7. X-Ray daemon runs on EC2/ECS to send data
8. Lambda integration is automatic with X-Ray enabled
9. X-Ray supports cross-account tracing
10. CloudWatch ServiceLens combines X-Ray and CloudWatch

38.14 Summary
Chapter 38 Summary

X-Ray Core Concepts

- Traces: End-to-end request tracking
- Segments: Service-level processing
- Subsegments: Operation-level details
- Service Map: Visual architecture

Key Features

- Distributed tracing
- Error analysis
- Performance monitoring
- Dependency mapping

Integration

- X-Ray SDK for application code
- AWS SDK auto-instrumentation
- Native integration with Lambda, API Gateway, etc.