Chapter 98: Change Management
Overview
Change management is a critical process in IT operations that ensures all changes to production systems are properly reviewed, tested, and documented. It minimizes the risk of service disruptions while enabling the organization to respond quickly to business needs. This chapter covers the complete change management lifecycle, industry frameworks (ITIL, COBIT), practical implementation, and DevOps/Agile approaches to change management. Understanding change management is essential for DevOps and SRE roles, as it forms the backbone of safe software delivery and infrastructure modifications.
98.1 Change Process
Change Workflow
Change Lifecycle:

┌───────────┐
│  CREATE   │ ──► Create change request with details
│  Request  │     - What, Why, When, How, Risk
└─────┬─────┘
      │
      ▼
┌───────────┐
│  REVIEW   │ ──► Technical and business review
│           │     - Assess impact, dependencies, risks
└─────┬─────┘
      │
      ▼
┌───────────┐
│ APPROVAL  │ ──► Get authorization to proceed
│           │     - Based on risk level
└─────┬─────┘
      │
      ▼
┌───────────┐
│ IMPLEMENT │ ──► Execute the change
│           │     - Follow documented procedure
└─────┬─────┘
      │
      ▼
┌───────────┐
│  VERIFY   │ ──► Confirm change achieved desired result
│           │     - Test functionality, monitoring
└─────┬─────┘
      │
      ▼
┌───────────┐
│   CLOSE   │ ──► Document lessons, update knowledge base
│           │     - Complete change record
└───────────┘

Change Types:

EMERGENCY CHANGES
- Unplanned, immediate implementation required
- Post-implementation approval (within 24-48 hours)
- Requires incident ticket linkage
- Minimal documentation, but must be captured

STANDARD CHANGES
- Pre-approved, routine changes
- Low risk, well-understood
- No additional approval needed
- Examples: security patches, configuration updates

NORMAL CHANGES
- Full review and approval process
- Moderate to high risk
- Requires CAB approval for high-risk
- Examples: infrastructure upgrades, new deployments
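The three change types above can be thought of as a routing decision. The following is a minimal sketch of that idea, not a reference implementation: `ChangeRequest`, `approval_path`, the `high_risk` flag, and the returned strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    EMERGENCY = "emergency"
    STANDARD = "standard"
    NORMAL = "normal"


@dataclass
class ChangeRequest:
    summary: str
    change_type: ChangeType
    high_risk: bool = False                 # hypothetical flag set during review
    incident_ticket: Optional[str] = None   # hypothetical ID such as "INC-1234"


def approval_path(change: ChangeRequest) -> str:
    """Return the approval route implied by the change type."""
    if change.change_type is ChangeType.EMERGENCY:
        # Emergency changes must be linked to an incident ticket.
        if not change.incident_ticket:
            raise ValueError("emergency change requires a linked incident ticket")
        return "implement now, approve within 24-48 hours"
    if change.change_type is ChangeType.STANDARD:
        # Standard changes are pre-approved and need no further sign-off.
        return "pre-approved"
    # Normal changes always get a full review; high-risk ones also go to the CAB.
    return "full review + CAB approval" if change.high_risk else "full review"
```

For example, `approval_path(ChangeRequest("Upgrade DB cluster", ChangeType.NORMAL, high_risk=True))` returns `"full review + CAB approval"`.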
Risk Assessment Matrix

Risk Score = Impact × Likelihood

Impact Levels:
  Level 1 (Low)      - No customer impact, easy rollback
  Level 2 (Moderate) - Minor impact, quick rollback possible
  Level 3 (High)     - Significant impact, careful planning
  Level 4 (Critical) - Major outage potential, extensive testing

Likelihood:
  1 - Rare | 2 - Unlikely | 3 - Possible | 4 - Likely | 5 - Almost Certain

Risk Score Matrix:
┌──────────┬──────┬──────────┬──────────┬────────┬────────────────┐
│ Impact   │ Rare │ Unlikely │ Possible │ Likely │ Almost Certain │
├──────────┼──────┼──────────┼──────────┼────────┼────────────────┤
│ Critical │    4 │        8 │       12 │     16 │             20 │
│ High     │    3 │        6 │        9 │     12 │             15 │
│ Moderate │    2 │        4 │        6 │      8 │             10 │
│ Low      │    1 │        2 │        3 │      4 │              5 │
└──────────┴──────┴──────────┴──────────┴────────┴────────────────┘

Risk Acceptance:
  1-4:   Low Risk    - Approve automatically
  5-9:   Medium Risk - Manager approval
  10-14: High Risk   - CAB approval required
  15-20: Critical    - Executive approval required

Questions to Answer:
- What could go wrong?
- What's the blast radius?
- What's the rollback plan?
- How do we verify success?
- What's the backout time?
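The risk matrix above reduces to a multiplication and a threshold lookup. The sketch below uses the impact levels, likelihood values, and approval bands from the matrix; the function and dictionary names are assumptions made for this example.

```python
# Impact and likelihood scales, taken from the risk matrix.
IMPACT = {"low": 1, "moderate": 2, "high": 3, "critical": 4}
LIKELIHOOD = {
    "rare": 1,
    "unlikely": 2,
    "possible": 3,
    "likely": 4,
    "almost certain": 5,
}


def risk_score(impact: str, likelihood: str) -> int:
    """Risk Score = Impact x Likelihood."""
    return IMPACT[impact.lower()] * LIKELIHOOD[likelihood.lower()]


def approval_level(score: int) -> str:
    """Map a risk score (1-20) to the approval band from the matrix."""
    if not 1 <= score <= 20:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 4:
        return "automatic approval"
    if score <= 9:
        return "manager approval"
    if score <= 14:
        return "CAB approval"
    return "executive approval"
```

A change with high impact and a possible likelihood of failure scores 3 × 3 = 9 and needs manager approval; a critical change that is likely to fail scores 4 × 4 = 16 and needs executive approval.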
98.2 Change Management Frameworks

ITIL Change Management
Key Elements:

Change Manager:
- Owns the change process
- Coordinates with stakeholders
- Ensures compliance

Change Advisory Board (CAB):
- Reviews high-risk changes
- Includes: IT management, technical leads, security
- Weekly meetings (typically)
- Makes approval/rejection decisions

Emergency Change Advisory Board (ECAB):
- Subset of CAB for emergency changes
- Quick decision-making
- Meets ad-hoc as needed

Change Model Components:
- Request for Change (RFC)
- Impact assessment
- Implementation plan
- Test plan
- Rollback procedure
- Post-implementation review

Metrics:
- % of successful changes
- % of changes requiring rollback
- Average change implementation time
- Number of emergency changes
- Change lead time (request to implementation)
- Number of rejected changes
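Several of the metrics listed above can be derived directly from closed change records. The sketch below assumes a hypothetical `ChangeRecord` shape; real tools expose different fields, so treat the names as placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class ChangeRecord:
    requested_at: datetime
    implemented_at: datetime
    succeeded: bool
    rolled_back: bool = False
    emergency: bool = False


def change_metrics(records: List[ChangeRecord]) -> Dict[str, float]:
    """Compute success rate, rollback rate, emergency count, and lead time."""
    if not records:
        raise ValueError("at least one change record is required")
    total = len(records)
    # Lead time runs from request to implementation.
    lead_times = [r.implemented_at - r.requested_at for r in records]
    total_lead_hours = sum(lead_times, timedelta()).total_seconds() / 3600
    return {
        "success_rate_pct": 100.0 * sum(r.succeeded for r in records) / total,
        "rollback_rate_pct": 100.0 * sum(r.rolled_back for r in records) / total,
        "emergency_changes": sum(r.emergency for r in records),
        "avg_lead_time_hours": total_lead_hours / total,
    }
```

Tracking these per week or per month makes trends visible, for instance a rising rollback rate after a process change.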
98.3 Best Practices

Implementation Best Practices
# ============================================================
# CHANGE MANAGEMENT BEST PRACTICES
# ============================================================

# Change Advisory Board (CAB)
# - Meet regularly (weekly recommended)
# - Review all normal and high-risk changes
# - Include diverse stakeholders
# - Document all decisions

# Example CAB Meeting Agenda:
# 1. Review previous action items (5 min)
# 2. New change requests (30 min)
# 3. Emergency changes (10 min)
# 4. Metrics review (5 min)
# 5. Process improvements (10 min)

# Automation
infrastructure_as_code:
  - Use Terraform, Ansible, CloudFormation
  - Version control all changes
  - Peer review via pull requests

ci_cd_pipelines:
  - Automated testing at each stage
  - Automated deployment
  - Automated rollback capabilities
  - Blue-green or canary deployments

# Change Categories and Automation
standard_changes:
  - Security patches (automated)
  - SSL certificate renewal
  - Database index changes
  - Configuration updates

# Normal changes require manual approval
normal_changes:
  - New service deployment
  - Infrastructure modifications
  - Application updates

emergency_changes:
  - Security vulnerability fixes
  - Critical bug fixes
  - Outage remediation

# Communication
notify_stakeholders:
  - Email notifications
  - Chat bot updates
  - Status page updates
  - Team standups

# Post-Implementation Review (PIR)
# Schedule within 48-72 hours
# Discuss:
# - What went well?
# - What could be improved?
# - Lessons learned
# - Action items
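The best practices above call for canary deployments with automated rollback. One simple way to decide whether to promote or roll back a canary is to compare its error rate with the baseline. The sketch below is only an illustration: the function name, the `max_ratio`, `min_requests`, and `floor_rate` parameters, and their default values are all assumptions, and production systems usually apply more robust statistical checks.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,     # assumed tolerance: canary may be 1.5x baseline
    min_requests: int = 500,    # assumed minimum sample before deciding
    floor_rate: float = 0.001,  # assumed error rate that is always acceptable
) -> str:
    """Return "wait", "promote", or "rollback" for a canary release."""
    if baseline_requests <= 0:
        raise ValueError("baseline must have served at least one request")
    if canary_requests < min_requests:
        # Not enough traffic yet to judge the canary.
        return "wait"
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # The floor keeps a zero-error baseline from failing every canary.
    allowed_rate = max(baseline_rate * max_ratio, floor_rate)
    return "promote" if canary_rate <= allowed_rate else "rollback"
```

A deployment pipeline could call this on a schedule during rollout and trigger the rollback step as soon as it returns `"rollback"`.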
DevOps Approach to Change Management

Traditional ITSM → DevOps:
  Heavy documentation ──► Working software over documentation
  Manual approvals    ──► Automated approvals
  Long cycles         ──► Short, frequent changes
  Fear of change      ──► Embrace change
  Siloed teams        ──► Cross-functional teams
  Big bang releases   ──► Incremental deployments

DevOps Change Principles:
1. Make small, frequent changes (trunk-based development)
2. Keep everything in version control
3. Automate testing and deployment
4. Use feature flags for gradual rollouts
5. Use canary releases to detect issues early
6. Roll back automatically on failure
7. Use telemetry to detect problems quickly
8. Hold blameless post-mortems

SRE Change Process:
- Changes must:
  1. Be rolled out gradually
  2. Be monitored closely
  3. Have a quick rollback path
  4. Have explicit success criteria
- Error Budget: ship changes faster while the error budget is healthy
- Toil Reduction: automate repetitive changes
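The SRE notes above tie change velocity to the error budget. As a rough sketch, the remaining budget can be computed from an availability SLO and request counts, then mapped to a change policy. The function names, the policy labels, and the 25% caution threshold are assumptions for this example; each team sets its own policy.

```python
def error_budget_remaining(
    slo_target: float, total_requests: int, failed_requests: int
) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


def change_policy(remaining: float, caution_below: float = 0.25) -> str:
    """Map remaining budget to a hypothetical change policy label."""
    if remaining <= 0:
        return "freeze"    # budget spent: reliability work and emergencies only
    if remaining < caution_below:
        return "caution"   # budget low: slow down, require extra review
    return "normal"        # budget healthy: ship at the usual pace
```

With a 99.9% SLO and 1,000,000 requests, the budget is about 1,000 failures. After 250 failures roughly 75% of the budget remains, so `change_policy` returns `"normal"`.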
98.4 Change Management Tools

Tool Implementation
# Example: Change Management in Jira Service Management
# (illustrative schema, not a Jira import format)
fields:
  change_type:
    - Emergency
    - Standard
    - Normal

  risk_assessment:
    impact:
      - Critical
      - High
      - Moderate
      - Low
    likelihood:
      - Almost Certain
      - Likely
      - Possible
      - Unlikely
      - Rare

  implementation_plan:
    type: text
    required: true

  rollback_plan:
    type: text
    required: true

  test_plan:
    type: text
    required: true

workflow:
  states:
    - Draft
    - Submitted
    - Under Review
    - Approved
    - Rejected
    - In Progress
    - Verified
    - Rolled Back
    - Closed

  transitions:
    - from: Draft
      to: Submitted
      trigger: Submit

    - from: Submitted
      to: Under Review
      trigger: Start Review

    # Scores above 9 need CAB or executive approval
    - from: Under Review
      to: Approved
      trigger: Approve
      condition: risk_score <= 9

    - from: Under Review
      to: Rejected
      trigger: Reject

    - from: Approved
      to: In Progress
      trigger: Implement

    - from: In Progress
      to: Verified
      trigger: Verify Success

    # A failed verification ends in Rolled Back, not Verified
    - from: In Progress
      to: Rolled Back
      trigger: Rollback
      condition: verification_failed

    - from: Verified
      to: Closed
      trigger: Close

notifications:
  - on_approval_required: email to CAB
  - on_implementation_start: email to stakeholders
  - on_verification_complete: email to requester
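A workflow like the one above is a small state machine, and modelling it in code makes illegal transitions easy to catch. The sketch below is one possible reading of that workflow, not how any particular tool implements it. It assumes a failed verification leads to a separate `Rolled Back` state and that a `Close` trigger moves `Verified` to `Closed`; the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

# (current state, trigger) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Draft", "Submit"): "Submitted",
    ("Submitted", "Start Review"): "Under Review",
    ("Under Review", "Approve"): "Approved",
    ("Under Review", "Reject"): "Rejected",
    ("Approved", "Implement"): "In Progress",
    ("In Progress", "Verify Success"): "Verified",
    ("In Progress", "Rollback"): "Rolled Back",  # assumption of this sketch
    ("Verified", "Close"): "Closed",             # assumption of this sketch
}

MAX_DIRECT_APPROVAL_SCORE = 9  # from the `risk_score <= 9` condition


class ChangeTicket:
    def __init__(self, risk_score: int) -> None:
        self.risk_score = risk_score
        self.state = "Draft"
        self.history: List[str] = ["Draft"]

    def fire(self, trigger: str, verification_failed: bool = False) -> str:
        """Apply a trigger, enforcing the workflow's conditions."""
        next_state = TRANSITIONS.get((self.state, trigger))
        if next_state is None:
            raise ValueError(f"{trigger!r} is not allowed from {self.state!r}")
        if trigger == "Approve" and self.risk_score > MAX_DIRECT_APPROVAL_SCORE:
            raise PermissionError("risk score above 9 needs escalated approval")
        if trigger == "Rollback" and not verification_failed:
            raise ValueError("rollback requires a failed verification")
        self.state = next_state
        self.history.append(next_state)
        return next_state


if __name__ == "__main__":
    ticket = ChangeTicket(risk_score=6)
    for step in ("Submit", "Start Review", "Approve", "Implement",
                 "Verify Success", "Close"):
        ticket.fire(step)
    print(" -> ".join(ticket.history))
```

Keeping the transitions in a single table means the allowed paths can be reviewed at a glance and tested independently of the ticketing tool.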
98.5 Interview Questions

Q1: What is the change management process in your organization?

A1:
- Explain the workflow: Request → Review → Approve → Implement → Verify → Close
- Describe change types (Emergency, Standard, Normal)
- Risk assessment process
- Approval hierarchy
- Tools used (Jira, ServiceNow, etc.)
- Post-implementation reviews

Q2: What's the difference between emergency and standard changes?

A2:
- Emergency: Immediate, unplanned, post-approval acceptable
- Standard: Pre-approved, routine, low-risk
- Normal: Full review process, moderate to high risk

Q3: How do you handle a change that causes an incident?

A3:
- Immediately trigger rollback
- Log incident and link to change record
- Notify stakeholders
- Fix issue before re-attempting
- Conduct post-mortem after resolution
- Update change process if needed

Q4: How does DevOps change traditional change management?

A4:
- Smaller, frequent changes vs large releases
- Automated testing reduces risk
- Feature flags enable instant rollback
- Canary deployments catch issues early
- Blameless culture encourages reporting
- Faster recovery through automation

Q5: What factors do you consider in change risk assessment?

A5:
- Impact: Customer, revenue, data, compliance
- Likelihood: Probability of failure
- Complexity: Number of components affected
- Dependencies: What else might be affected
- Rollback complexity: How hard to undo
- Test coverage: How well is it tested
- Team experience: Familiar with the change?

Q6: How would you implement a change management process from scratch?

A6:
1. Define change types and criteria
2. Create RFC template
3. Establish approval workflow
4. Form CAB (if needed)
5. Select/configure tool
6. Train team members
7. Start with pilot
8. Iterate based on feedback
9. Measure and improve

Q7: How do you balance speed of delivery with change control?

A7:
- Use risk-based categorization
- Automate low-risk changes
- Use feature flags for gradual rollouts
- Implement canary deployments
- Trust but verify: Automated testing
- Error budgets allow faster delivery when stable
- Good telemetry reduces uncertainty

Q8: Describe a time when you had to push back on a change request.

A8:
- Example: Insufficient testing, high risk without adequate rollback
- Explained concerns to stakeholder
- Proposed alternatives
- Reached compromise
- Documented decision

Q9: What is a CAB and when is it necessary?

A9:
Change Advisory Board:
- Cross-functional team reviewing changes
- Needed for high-risk changes
- Provides collective decision-making
- Includes: IT management, technical leads, security, business reps
- Not needed for low-risk, pre-approved changes

Q10: How do you measure change management effectiveness?

A10:
- Change success rate
- Rollback rate
- Change lead time
- Emergency change percentage
- Mean time to recovery for change-related incidents
- Stakeholder satisfaction
- Process compliance rate

Quick Reference
Change Workflow:
Request → Review → Approve → Implement → Verify → Close

Change Types:
- Emergency: Immediate, post-approval OK
- Standard: Pre-approved, low-risk
- Normal: Full review, moderate-high risk

Risk Assessment:
- Impact × Likelihood
- Low (1-4): Approve automatically
- Medium (5-9): Manager approval
- High (10-14): CAB approval
- Critical (15-20): Executive approval

Best Practices:
- Small, frequent changes
- Everything in version control
- Automated testing
- Rollback ready
- Canary deployments
- Post-implementation review

Summary
- Process: Request → Review → Approve → Implement → Verify → Close
- Types: Emergency (post-approval), Standard (pre-approved), Normal (full process)
- Risk: Impact × Likelihood determines approval level
- CAB: Change Advisory Board reviews high-risk changes
- DevOps: Small changes, automation, quick rollback
Next Chapter
Chapter 99: Incident Management
Last Updated: February 2026