Chapter 38: AWS X-Ray - Distributed Tracing

AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture.

AWS X-Ray Overview
+------------------------------------------------------------------+
| |
| +------------------------+ |
| | AWS X-Ray | |
| +------------------------+ |
| | |
| +---------------------+---------------------+ |
| | | | | |
| v v v v |
| +----------+ +----------+ +----------+ +----------+ |
| | Traces | | Segments | | Service | | Sampling | |
| | | | | | Map | | Rules | |
| | - Request| | - Subseg | | | | | |
| | Track | | - Metadata| | - Visual | | - Config | |
| | - Timing | | | | - Depend | | - Rate | |
| +----------+ +----------+ +----------+ +----------+ |
| |
+------------------------------------------------------------------+
Feature     | Description
------------+----------------------------------
Traces      | End-to-end request tracking
Segments    | Service-level processing details
Service Map | Visual representation of services
Sampling    | Control data collection rate

X-Ray Trace Hierarchy
+------------------------------------------------------------------+
| |
| Trace |
| +------------------------------------------------------------+ |
| | | |
| | - Unique ID for entire request | |
| | - Collection of segments | |
| | - Tracks request through all services | |
| | | |
| | +--------------------------------------------------------+ | |
| | | Segment (Service) | | |
| | | +----------------------------------------------------+ | | |
| | | | - Processing on a single service | | | |
| | | | - Start/end time | | | |
| | | | - HTTP method, URL, response code | | | |
| | | | | | | |
| | | | +------------------------------------------------+ | | | |
| | | | | Subsegment (Operation) | | | | |
| | | | | +----------------------------------------------+ | | | | |
| | | | | | - Specific operation within service | | | | | |
| | | | | | - Database query, HTTP call, function call | | | | | |
| | | | | | - Custom metadata | | | | | |
| | | | | +----------------------------------------------+ | | | | |
| | | | +------------------------------------------------+ | | | |
| | | +----------------------------------------------------+ | | |
| | +--------------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
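The hierarchy above maps onto the JSON segment document that X-Ray accepts. The following is a minimal sketch, assuming only the documented ID formats (a trace ID of `1-`, 8 hex digits of epoch seconds, and 24 random hex digits; 16 hex digits for segment and subsegment IDs). The helper names and the `web-service` / `db-query` values are hypothetical.

```python
import json
import os
import time


def new_trace_id(now=None):
    """Return a trace ID: version 1, 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_entity_id():
    """Return a 16-hex-digit ID for a segment or subsegment."""
    return os.urandom(8).hex()


def build_segment(name, trace_id, start, end, subsegments=()):
    """Assemble one segment document with its subsegments nested inside it."""
    return {
        "name": name,
        "id": new_entity_id(),
        "trace_id": trace_id,  # shared by every segment of the same request
        "start_time": start,   # epoch seconds, fractional
        "end_time": end,
        "subsegments": [
            {
                "name": sub_name,
                "id": new_entity_id(),
                "start_time": sub_start,
                "end_time": sub_end,
            }
            for sub_name, sub_start, sub_end in subsegments
        ],
    }


trace_id = new_trace_id()
t0 = time.time()
segment = build_segment(
    "web-service",  # hypothetical service name
    trace_id,
    t0,
    t0 + 0.250,
    subsegments=[("db-query", t0 + 0.010, t0 + 0.180)],  # hypothetical operation
)
print(json.dumps(segment, indent=2))
```

In practice the X-Ray SDK generates these documents for you; building one by hand is mainly useful for seeing how trace, segment, and subsegment nest.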
X-Ray Service Map Example
+------------------------------------------------------------------+
| |
| +------------------------+ |
| | Client | |
| +------------------------+ |
| | |
| v |
| +------------------------+ |
| | Load Balancer | |
| | (ELB) | |
| +------------------------+ |
| | |
| +---------------+---------------+ |
| | | |
| v v |
| +----------------+ +----------------+ |
| | Web Service | | API Service | |
| | (EC2) | | (Lambda) | |
| +----------------+ +----------------+ |
| | | |
| v v |
| +----------------+ +----------------+ |
| | Database | | Cache | |
| | (RDS) | | (ElastiCache)| |
| +----------------+ +----------------+ |
| |
| Colors indicate health: |
| - Green: Successful calls |
| - Yellow: Client errors (4xx) |
| - Red: Server faults (5xx) |
| - Purple: Throttling (429) |
| |
+------------------------------------------------------------------+
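The service map colors follow from how X-Ray categorizes responses: 4xx responses count as errors, 429 as throttling, and 5xx as faults. A small sketch of that mapping is shown here; the function and dictionary names are hypothetical and not part of any SDK.

```python
def classify(status_code):
    """Map an HTTP status code to the category X-Ray records it under."""
    if status_code == 429:
        return "throttle"
    if 400 <= status_code < 500:
        return "error"
    if 500 <= status_code < 600:
        return "fault"
    return "ok"


# Colors used for each category on the service map
NODE_COLORS = {
    "ok": "green",
    "error": "yellow",
    "fault": "red",
    "throttle": "purple",
}

for code in (200, 404, 429, 503):
    print(code, classify(code), NODE_COLORS[classify(code)])
```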

X-Ray Supported Services
+------------------------------------------------------------------+
| |
| Compute |
| +------------------------------------------------------------+ |
| | - Amazon EC2 | |
| | - AWS Lambda | |
| | - Amazon ECS | |
| | - Amazon EKS | |
| | - AWS Elastic Beanstalk | |
| +------------------------------------------------------------+ |
| |
| Networking |
| +------------------------------------------------------------+ |
| | - Elastic Load Balancing (ALB) | |
| | - Amazon API Gateway | |
| | - Amazon CloudFront | |
| +------------------------------------------------------------+ |
| |
| Database |
| +------------------------------------------------------------+ |
| | - Amazon RDS | |
| | - Amazon DynamoDB | |
| | - Amazon ElastiCache | |
| +------------------------------------------------------------+ |
| |
| Messaging |
| +------------------------------------------------------------+ |
| | - Amazon SNS | |
| | - Amazon SQS | |
| | - Amazon Kinesis | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
X-Ray SDK Integration
+------------------------------------------------------------------+
| |
| Application Code |
| +------------------------------------------------------------+ |
| | | |
| | // Node.js Example | |
| | const AWSXRay = require('aws-xray-sdk'); | |
| | const app = require('express')(); | |
| | | |
| | // Enable X-Ray for all routes | |
| | app.use(AWSXRay.express.openSegment('MyApp')); | |
| | | |
| | app.get('/', (req, res) => { | |
| | const segment = req.segment; | |
| | segment.addAnnotation('user', 'alice'); | |
| | segment.addMetadata('custom', 'data'); | |
| | res.send('Hello World'); | |
| | }); | |
| | | |
| | app.use(AWSXRay.express.closeSegment()); | |
| | | |
| +------------------------------------------------------------+ |
| |
| AWS SDK Instrumentation |
| +------------------------------------------------------------+ |
| | | |
| | // Capture AWS SDK calls | |
| | const AWS = AWSXRay.captureAWS(require('aws-sdk')); | |
| | | |
| | // All AWS SDK calls are automatically traced | |
| | const s3 = new AWS.S3(); | |
| | await s3.getObject({Bucket: 'my-bucket', Key: 'file'}).promise(); | |
| | | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

X-Ray Sampling Rules
+------------------------------------------------------------------+
| |
| Default Sampling Rule |
| +------------------------------------------------------------+ |
| | | |
| | - ReservoirSize: 1 (first request each second) | |
| | - FixedRate: 0.05 (5% of additional requests) | |
| | - HTTPMethod: * | |
| | - Path: * | |
| | - ServiceName: * | |
| | - ResourceType: * | |
| | | |
| +------------------------------------------------------------+ |
| |
| Custom Sampling Rules |
| +------------------------------------------------------------+ |
| | | |
| | Rule: HighTrafficAPI | |
| | +------------------------------------------------------+ | |
| | | - Priority: 1 | | |
| | | - HTTPMethod: GET | | |
| | | - Path: /api/health | | |
| | | - FixedRate: 0.01 (1% sampling) | | |
| | | - ReservoirSize: 10 (per second) | | |
| | +------------------------------------------------------+ | |
| | | |
| | Rule: CriticalAPI | |
| | +------------------------------------------------------+ | |
| | | - Priority: 2 | | |
| | | - HTTPMethod: POST | | |
| | | - Path: /api/checkout | | |
| | | - FixedRate: 1 (100% sampling) | | |
| | +------------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
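How a rule's priority, reservoir, and fixed rate combine into one sampling decision can be sketched in a few lines. This is a simplified local model, not the SDK's implementation: real reservoirs are coordinated across hosts by the X-Ray service, and `fnmatch` only approximates X-Ray's `*` and `?` wildcards. The `Rule`, `match_rule`, and `Sampler` names are hypothetical; the rule values are the ones from the diagram above, plus a default rule at priority 10000.

```python
import fnmatch
import random
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int
    http_method: str
    url_path: str
    fixed_rate: float
    reservoir_size: int = 0


def match_rule(rules, method, path):
    """Return the matching rule with the lowest priority number, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatch.fnmatchcase(method, rule.http_method) and fnmatch.fnmatchcase(
            path, rule.url_path
        ):
            return rule
    return None


class Sampler:
    """Reservoir first, then fixed rate, tracked per rule and per one-second window."""

    def __init__(self, rng=random.random):
        self._rng = rng
        self._used = {}  # (rule name, second) -> reservoir slots already taken

    def should_sample(self, rule, now_second):
        key = (rule.name, now_second)
        used = self._used.get(key, 0)
        if used < rule.reservoir_size:
            self._used[key] = used + 1
            return True  # still inside this second's reservoir
        return self._rng() < rule.fixed_rate  # beyond the reservoir: sample a fraction


rules = [
    Rule("HighTrafficAPI", 1, "GET", "/api/health", 0.01, 10),
    Rule("CriticalAPI", 2, "POST", "/api/checkout", 1.0),
    Rule("Default", 10000, "*", "*", 0.05, 1),
]
sampler = Sampler(rng=lambda: 0.5)  # fixed "random" value so the run is repeatable
rule = match_rule(rules, "GET", "/api/health")
decisions = [sampler.should_sample(rule, now_second=0) for _ in range(12)]
print(rule.name, decisions.count(True))  # HighTrafficAPI 10
```

With twelve health checks in the same second, the first ten are taken from the reservoir and the remaining two fall through to the 1% fixed rate, which the pinned random value never satisfies.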

X-Ray Annotations vs Metadata
+------------------------------------------------------------------+
| |
| Annotations |
| +------------------------------------------------------------+ |
| | | |
| | - Indexed for search | |
| | - Key-value pairs | |
| | - Limited to string, number, boolean | |
| | - Use for filtering traces | |
| | | |
| | Example: | |
| | +------------------------------------------------------+ | |
| | | segment.addAnnotation('userId', 'user-123'); | | |
| | | segment.addAnnotation('orderId', 45678); | | |
| | | segment.addAnnotation('isPremium', true); | | |
| | +------------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------+ |
| |
| Metadata |
| +------------------------------------------------------------+ |
| | | |
| | - Not indexed | |
| | - Can be any object | |
| | - Use for additional context | |
| | | |
| | Example: | |
| | +------------------------------------------------------+ | |
| | | segment.addMetadata('request', { | | |
| | | headers: req.headers, | | |
| | | body: req.body | | |
| | | }); | | |
| | +------------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
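One way to apply the distinction above is to route fields automatically: simple values with filter-safe keys become annotations, and everything else becomes metadata. The sketch below assumes annotation keys should contain only letters, digits, and underscores so they work in filter expressions; `split_fields` is a hypothetical helper, not an SDK function.

```python
import re

_KEY_RE = re.compile(r"^[A-Za-z0-9_]+$")


def split_fields(fields):
    """Split a dict into (annotations, metadata) following the rules above."""
    annotations, metadata = {}, {}
    for key, value in fields.items():
        simple = isinstance(value, (str, int, float, bool))
        if simple and _KEY_RE.match(key):
            annotations[key] = value  # indexed, usable for filtering
        else:
            metadata[key] = value  # stored with the trace but not searchable
    return annotations, metadata


annotations, metadata = split_fields(
    {
        "userId": "user-123",
        "orderId": 45678,
        "isPremium": True,
        "request": {"headers": {"accept": "*/*"}},
    }
)
print(annotations)
print(metadata)
```

Each resulting pair would then be passed to the SDK's annotation or metadata call on the current segment or subsegment.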

# Get sampling rules
aws xray get-sampling-rules

# Create sampling rule (identify it by RuleName or RuleARN, not both)
aws xray create-sampling-rule \
  --sampling-rule '{
    "RuleName": "MyRule",
    "ResourceARN": "*",
    "Priority": 1,
    "FixedRate": 0.1,
    "ReservoirSize": 10,
    "ServiceName": "*",
    "ServiceType": "*",
    "Host": "*",
    "HTTPMethod": "*",
    "URLPath": "*",
    "Version": 1
  }'

# Get service graph
aws xray get-service-graph \
  --start-time 1704067200 \
  --end-time 1704153600

# Get trace summaries
aws xray get-trace-summaries \
  --start-time 1704067200 \
  --end-time 1704153600

# Get traces by IDs
aws xray batch-get-traces \
  --trace-ids '["trace-id-1","trace-id-2"]'

# Put trace segments
aws xray put-trace-segments \
  --trace-segment-documents 'file://segments.json'

# Put telemetry records
aws xray put-telemetry-records \
  --telemetry-records 'file://telemetry.json'

# Delete sampling rule
aws xray delete-sampling-rule \
  --rule-name "MyRule"

# Update sampling rule
aws xray update-sampling-rule \
  --sampling-rule-update '{
    "RuleName": "MyRule",
    "FixedRate": 0.05,
    "ReservoirSize": 20
  }'

Cross-Account X-Ray Tracing
+------------------------------------------------------------------+
| |
| Account A (Production) Account B (Monitoring) |
| +------------------------+ +------------------------+ |
| | | | | |
| | +----------+ | | +----------+ | |
| | | Lambda | | | | X-Ray | | |
| | | Function |--------->|----->| | Console | | |
| | +----------+ | | +----------+ | |
| | | | | |
| | +----------+ | | +----------+ | |
| | | API | | | | Service | | |
| | | Gateway |--------->|----->| | Map | | |
| | +----------+ | | +----------+ | |
| | | | | |
| +------------------------+ +------------------------+ |
| |
| Configuration: |
| +------------------------------------------------------------+ |
| | # Enable cross-account tracing | |
| | aws xray put-resource-policy \ | |
| | --policy-name CrossAccountPolicy \ | |
| | --policy-document '{ | |
| | "Version": "2012-10-17", | |
| | "Statement": [ | |
| | { | |
| | "Effect": "Allow", | |
| | "Principal": { "AWS": "arn:aws:iam::ACCOUNT-B:root" }, | |
| | "Action": [ | |
| | "xray:PutTraceSegments", | |
| | "xray:PutTelemetryRecords" | |
| | ], | |
| | "Resource": "*" | |
| | } | |
| | ] | |
| | }' | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
# Python (Flask) X-Ray Integration
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware
from flask import Flask

app = Flask(__name__)

# Configure X-Ray
xray_recorder.configure(
    service='my-flask-app',
    sampling=True,
    context_missing='LOG_ERROR'
)

# Add X-Ray middleware
XRayMiddleware(app, xray_recorder)

@app.route('/api/users')
@xray_recorder.capture('get_users')
def get_users():
    # Custom subsegment
    with xray_recorder.in_subsegment('database_query') as subsegment:
        subsegment.put_annotation('user_id', '12345')
        subsegment.put_metadata('query', {'table': 'users'})
        # Database query here
        users = query_database()
    return {'users': users}

# Custom annotations and metadata
@app.route('/api/orders/<order_id>')
def get_order(order_id):
    segment = xray_recorder.current_segment()
    segment.put_annotation('order_id', order_id)
    segment.put_metadata('request_info', {
        'source': 'web',
        'version': '1.0'
    })
    return get_order_from_db(order_id)

// Node.js X-Ray Integration
const AWSXRay = require('aws-xray-sdk');
const AWS = AWSXRay.captureAWS(require('aws-sdk'));
const express = require('express');
const app = express();

// Enable X-Ray middleware
app.use(AWSXRay.express.openSegment('MyApp'));

app.get('/api/products', async (req, res) => {
  const segment = AWSXRay.getSegment();
  // Add annotation
  segment.addAnnotation('product_category', 'electronics');
  // Add metadata
  segment.addMetadata('request_id', req.headers['x-request-id']);
  // Custom subsegment
  AWSXRay.captureAsyncFunc('fetchProducts', async (subsegment) => {
    try {
      const products = await fetchProductsFromDB();
      subsegment.close();
      res.json(products);
    } catch (err) {
      subsegment.close(err);
      res.status(500).json({ error: err.message });
    }
  });
});

app.use(AWSXRay.express.closeSegment());

# SAM template with X-Ray enabled
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs18.x
      Tracing: Active # Enable X-Ray tracing
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - xray:PutTraceSegments
                - xray:PutTelemetryRecords
              Resource: '*'

CloudWatch ServiceLens Architecture
+------------------------------------------------------------------+
| |
| +------------------------+ |
| | CloudWatch | |
| | ServiceLens | |
| +------------------------+ |
| | |
| +---------------------+---------------------+ |
| | | | | |
| v v v v |
| +----------+ +----------+ +----------+ +----------+ |
| | X-Ray | | CloudWatch| | App | | Service | |
| | Traces | | Metrics | | Insights | | Map | |
| | | | | | | | | |
| | - Request| | - Latency| | - Runtime| | - Deps | |
| | Paths | | - Errors | | Metrics| | - Health | |
| | - Errors | | - Invoc. | | - Errors | | - Latency| |
| +----------+ +----------+ +----------+ +----------+ |
| |
| Benefits: |
| +------------------------------------------------------------+ |
| | - Correlate traces with metrics | |
| | - Visualize service dependencies | |
| | - Identify bottlenecks quickly | |
| | - Root cause analysis | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
X-Ray Insights Features
+------------------------------------------------------------------+
| |
| Insights automatically detects: |
| +------------------------------------------------------------+ |
| | | |
| | 1. High latency operations | |
| | +----------------------------------------------------+ | |
| | | - Slow database queries | | |
| | | - External API calls | | |
| | | - Resource contention | | |
| | +----------------------------------------------------+ | |
| | | |
| | 2. Error patterns | |
| | +----------------------------------------------------+ | |
| | | - Recurring exceptions | | |
| | | - HTTP 5xx errors | | |
| | | - Timeout issues | | |
| | +----------------------------------------------------+ | |
| | | |
| | 3. Anomalies | |
| | +----------------------------------------------------+ | |
| | | - Traffic spikes | | |
| | | - Latency outliers | | |
| | | - Error rate changes | | |
| | +----------------------------------------------------+ | |
| | | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

X-Ray Best Practices
+------------------------------------------------------------------+
| |
| 1. Use annotations for searchable data |
| +------------------------------------------------------------+ |
| | - Add userId, orderId, etc. for filtering | |
| | - Keep annotations simple (string, number, boolean) | |
| +------------------------------------------------------------+ |
| |
| 2. Use subsegments for detailed operations |
| +------------------------------------------------------------+ |
| | - Trace database queries | |
| | - Trace external HTTP calls | |
| | - Trace internal function calls | |
| +------------------------------------------------------------+ |
| |
| 3. Configure sampling for high-traffic applications |
| +------------------------------------------------------------+ |
| | - Reduce costs | |
| | - Focus on critical paths | |
| +------------------------------------------------------------+ |
| |
| 4. Integrate with CloudWatch ServiceLens |
| +------------------------------------------------------------+ |
| | - Combined view of metrics and traces | |
| | - Enhanced monitoring | |
| +------------------------------------------------------------+ |
| |
| 5. Use X-Ray daemon on EC2/ECS |
| +------------------------------------------------------------+ |
| | - Collects segments from application | |
| | - Sends to X-Ray service | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
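Best practice 5 relies on the SDK handing segments to the daemon over UDP. A minimal sketch of that handoff follows, assuming the daemon's default listener on 127.0.0.1 port 2000 and its JSON header line; the function names and the sample segment values are hypothetical.

```python
import json
import socket

DAEMON_ADDRESS = ("127.0.0.1", 2000)  # the daemon's default UDP listener
HEADER = json.dumps({"format": "json", "version": 1})


def build_datagram(segment):
    """Prefix the segment document with the daemon header, separated by a newline."""
    return (HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_to_daemon(segment, address=DAEMON_ADDRESS):
    """Fire-and-forget send: UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_datagram(segment), address)


# Hypothetical segment values, for illustration only
segment = {
    "name": "web-service",
    "id": "70de5b6f19ff9a0a",
    "trace_id": "1-581cf771-a006649127e371903a2de979",
    "start_time": 1478293361.271,
    "end_time": 1478293361.449,
}
payload = build_datagram(segment)
print(payload.decode("utf-8").splitlines()[0])
```

The daemon buffers what it receives and uploads it in batches, which is why the application itself never calls the X-Ray API directly.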

X-Ray is essential for distributed tracing in microservices architectures. SREs use it to understand request flows, identify bottlenecks, and troubleshoot latency issues across services.

X-Ray in DevOps/SRE
+------------------------------------------------------------------+
| |
| SRE Observability for Microservices: |
| |
| 1. End-to-End Tracing |
| +----------------------------------------------------------+ |
| | - Track requests across multiple services | |
| | - Identify where latency is introduced | |
| | - Visualize dependency map with Service Map | |
| +----------------------------------------------------------+ |
| |
| 2. Error Analysis |
| +----------------------------------------------------------+ |
| | - Identify failing services quickly | |
| | - Trace errors back to root cause | |
| | - Analyze error patterns across services | |
| +----------------------------------------------------------+ |
| |
| 3. Performance Optimization |
| +----------------------------------------------------------+ |
| | - Find slowest operations in request path | |
| | - Optimize database queries and external calls | |
| | - Set performance SLOs based on trace data | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

# Install X-Ray daemon on Arch Linux
sudo pacman -S aws-cli-v2 jq
yay -S aws-xray-daemon
# Run X-Ray daemon
sudo systemctl enable xray
sudo systemctl start xray
# Get trace summary
#!/bin/bash
# ~/bin/xray-trace-summary.sh
set -euo pipefail
TRACE_ID="${1:-1-12345678-1234567890abcdef12345678}"
echo "=== Trace: $TRACE_ID ==="
# Timestamps are Unix epoch seconds; --end-time is required
aws xray get-trace-summaries \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --query "TraceSummaries[?Id=='$TRACE_ID']"

# Analyze service performance (list services that recorded errors)
aws xray get-service-graph \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --output json | jq '.Services[] | select(.SummaryStatistics.ErrorStatistics.TotalCount > 0)'
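
The `get-service-graph` response reports request counts rather than rates, so error and fault rates have to be derived. The sketch below works on the saved JSON output and assumes the `Services[].SummaryStatistics` shape with `TotalCount`, `ErrorStatistics.TotalCount`, and `FaultStatistics.TotalCount`; the function name and the sample data are made up for illustration.

```python
def error_rates(service_graph):
    """Yield (service name, error rate, fault rate) for services that saw traffic."""
    for service in service_graph.get("Services", []):
        stats = service.get("SummaryStatistics", {})
        total = stats.get("TotalCount", 0)
        if not total:
            continue  # no requests recorded, so a rate is undefined
        errors = stats.get("ErrorStatistics", {}).get("TotalCount", 0)
        faults = stats.get("FaultStatistics", {}).get("TotalCount", 0)
        yield service.get("Name", "?"), errors / total, faults / total


# Made-up sample in the shape of the CLI output
sample_graph = {
    "Services": [
        {
            "Name": "web-service",
            "SummaryStatistics": {
                "TotalCount": 200,
                "ErrorStatistics": {"TotalCount": 10},
                "FaultStatistics": {"TotalCount": 4},
            },
        },
        {"Name": "idle-service", "SummaryStatistics": {"TotalCount": 0}},
    ]
}

for name, error_rate, fault_rate in error_rates(sample_graph):
    print(f"{name}: errors={error_rate:.1%} faults={fault_rate:.1%}")
```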

X-Ray Anti-Patterns
+------------------------------------------------------------------+
| |
| ❌ Mistake 1: Not Using Subsegments |
| +----------------------------------------------------------+ |
| | Problem: Only seeing top-level timing | |
| | Impact: Can't identify slow database queries/API calls| |
| | Fix: Add subsegments for database and HTTP calls | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 2: Sampling Too Aggressively |
| +----------------------------------------------------------+ |
| | Problem: Missing important traces | |
| | Impact: Can't reproduce intermittent issues | |
| | Fix: Use adaptive sampling, increase rate for errors | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 3: Not Using Annotations for Filtering |
| +----------------------------------------------------------+ |
| | Problem: Can't search traces by user/order ID | |
| | Impact: Hard to investigate user-specific issues | |
| | Fix: Add annotations for user ID, request ID | |
| +----------------------------------------------------------+ |
| |
| ❌ Mistake 4: Ignoring X-Ray in Lambda |
| +----------------------------------------------------------+ |
| | Problem: Missing Lambda traces | |
| | Impact: Incomplete view of serverless applications | |
| | Fix: Enable X-Ray active tracing in Lambda config | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

  1. Q: Explain the difference between segments and subsegments in X-Ray.

    • A: Segments represent the work done by a single service. They contain metadata about the request, response, and timing. Subsegments are subdivisions of segments that provide detailed timing for downstream calls (database queries, HTTP requests, AWS service calls). Subsegments help identify exactly where in the request path time is being spent.
  2. Q: How does X-Ray sampling work?

    • A: X-Ray uses sampling rules to limit the number of traces collected. The default rule records the first request each second (the reservoir) and 5% of any additional requests (the fixed rate). You can create custom rules that match on service name, service type, host, HTTP method, and URL path, each with its own reservoir size and fixed rate. Rules are evaluated in ascending priority order, and the first match applies. The reservoir guarantees a minimum number of traces per second when traffic is low, while the fixed rate bounds the share of requests traced as traffic grows.
  3. Q: A user reports slow API responses. How would you use X-Ray to diagnose?

    • A: Use the X-Ray Service Map to see which services have high latency. Look at trace details to identify the slowest segment or subsegment. Check whether the time goes to database queries, external API calls, or Lambda execution. Use annotations to filter by user ID and trace that user's requests. Compare latency percentiles (p50, p95, p99).
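
The diagnosis steps in the last answer can be sketched against trace data: walk a segment document to find the slowest leaf subsegment, then summarize request durations as percentiles. This is an illustrative sketch only; the function names and the sample segment are hypothetical, and the percentile method (inclusive quantiles) is one reasonable choice among several.

```python
import statistics


def leaf_durations(entity):
    """Yield (name, seconds) for every subsegment that has no children."""
    for sub in entity.get("subsegments", []):
        if sub.get("subsegments"):
            yield from leaf_durations(sub)
        else:
            yield sub["name"], sub["end_time"] - sub["start_time"]


def latency_percentiles(durations):
    """Return p50, p95 and p99 using the inclusive quantile method."""
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical segment: times are seconds relative to the request start
sample_segment = {
    "name": "api-service",
    "start_time": 0.0,
    "end_time": 0.42,
    "subsegments": [
        {"name": "auth", "start_time": 0.0, "end_time": 0.02},
        {
            "name": "fetchProducts",
            "start_time": 0.02,
            "end_time": 0.40,
            "subsegments": [
                {"name": "db-query", "start_time": 0.03, "end_time": 0.35},
                {"name": "serialize", "start_time": 0.35, "end_time": 0.39},
            ],
        },
    ],
}

slowest_name, slowest_seconds = max(leaf_durations(sample_segment), key=lambda item: item[1])
print("slowest leaf:", slowest_name)

percentiles = latency_percentiles(list(range(1, 101)))  # stand-in durations in ms
print(percentiles)
```

Only leaf subsegments are compared because a parent's duration always includes its children, so the parent would otherwise win every time.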

Exam Tip

Key Exam Points
+------------------------------------------------------------------+
| |
| 1. X-Ray provides end-to-end request tracing |
| |
| 2. Traces contain segments, segments contain subsegments |
| |
| 3. Service Map visualizes application architecture |
| |
| 4. Annotations are indexed for search |
| |
| 5. Metadata is not indexed, can be any object |
| |
| 6. Sampling rules control data collection rate |
| |
| 7. X-Ray daemon runs on EC2/ECS to send data |
| |
| 8. Lambda runs the X-Ray daemon for you once active tracing is on |
| |
| 9. X-Ray supports cross-account tracing |
| |
| 10. CloudWatch ServiceLens combines X-Ray and CloudWatch |
| |
+------------------------------------------------------------------+

Chapter 38 Summary
+------------------------------------------------------------------+
| |
| X-Ray Core Concepts |
| +------------------------------------------------------------+ |
| | - Traces: End-to-end request tracking | |
| | - Segments: Service-level processing | |
| | - Subsegments: Operation-level details | |
| | - Service Map: Visual architecture | |
| +------------------------------------------------------------+ |
| |
| Key Features |
| +------------------------------------------------------------+ |
| | - Distributed tracing | |
| | - Error analysis | |
| | - Performance monitoring | |
| | - Dependency mapping | |
| +------------------------------------------------------------+ |
| |
| Integration |
| +------------------------------------------------------------+ |
| | - X-Ray SDK for application code | |
| | - AWS SDK auto-instrumentation | |
| | - Native integration with Lambda, API Gateway, etc. | |
| +------------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
