Chapter 98: Change Management
Overview
Change management is the IT operations process that ensures every change to a production system is reviewed, tested, and documented. It reduces the risk of service disruption while still letting the organization respond quickly to business needs. This chapter covers the change management lifecycle, industry frameworks (ITIL, COBIT), practical implementation, and DevOps/Agile approaches to change. It matters for DevOps and SRE roles because it underpins safe software delivery and infrastructure modification.
98.1 Change Process
Change Workflow
CHANGE WORKFLOW

Change Lifecycle:

  CREATE      ──► Create change request with details
  (Request)       - What, Why, When, How, Risk
     │
     ▼
  REVIEW      ──► Technical and business review
                  - Assess impact, dependencies, risks
     │
     ▼
  APPROVAL    ──► Get authorization to proceed
                  - Based on risk level
     │
     ▼
  IMPLEMENT   ──► Execute the change
                  - Follow documented procedure
     │
     ▼
  VERIFY      ──► Confirm change achieved desired result
                  - Test functionality, monitoring
     │
     ▼
  CLOSE       ──► Document lessons, update knowledge base
                  - Complete change record

Change Types:

  EMERGENCY CHANGES
  - Unplanned, immediate implementation required
  - Post-implementation approval (within 24-48 hours)
  - Requires incident ticket linkage
  - Minimal documentation, but must be captured

  STANDARD CHANGES
  - Pre-approved, routine changes
  - Low risk, well-understood
  - No additional approval needed
  - Examples: security patches, configuration updates

  NORMAL CHANGES
  - Full review and approval process
  - Moderate to high risk
  - Requires CAB approval for high-risk
  - Examples: infrastructure upgrades, new deployments

Risk Assessment Matrix
RISK ASSESSMENT

Risk Score = Impact × Likelihood

Impact Levels:

  Level 1 (Low)      - No customer impact, easy rollback
  Level 2 (Moderate) - Minor impact, quick rollback possible
  Level 3 (High)     - Significant impact, careful planning
  Level 4 (Critical) - Major outage potential, extensive testing

Likelihood:

  1 - Rare | 2 - Unlikely | 3 - Possible | 4 - Likely | 5 - Almost Certain

Risk Score Matrix:

  ┌──────────┬──────┬──────────┬──────────┬────────┬────────────────┐
  │ Impact   │ Rare │ Unlikely │ Possible │ Likely │ Almost Certain │
  ├──────────┼──────┼──────────┼──────────┼────────┼────────────────┤
  │ Critical │  4   │    8     │    12    │   16   │       20       │
  │ High     │  3   │    6     │    9     │   12   │       15       │
  │ Moderate │  2   │    4     │    6     │   8    │       10       │
  │ Low      │  1   │    2     │    3     │   4    │       5        │
  └──────────┴──────┴──────────┴──────────┴────────┴────────────────┘

Risk Acceptance:

  1-4:   Low Risk    - Approve automatically
  5-9:   Medium Risk - Manager approval
  10-14: High Risk   - CAB approval required
  15-20: Critical    - Executive approval required

Questions to Answer:
  - What could go wrong?
  - What's the blast radius?
  - What's the rollback plan?
  - How do we verify success?
  - What's the backout time?

98.2 Change Management Frameworks
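The Risk Assessment Matrix from 98.1 (score = impact × likelihood, mapped to an approval level) is simple enough to express in code. The following is a minimal Python sketch rather than a reference implementation: the names `Impact`, `Likelihood`, `risk_score`, and `approval_level` are hypothetical, and the thresholds are copied from the Risk Acceptance bands.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Impact levels 1-4 from the risk assessment matrix."""
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4


class Likelihood(IntEnum):
    """Likelihood levels 1-5 from the risk assessment matrix."""
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    ALMOST_CERTAIN = 5


def risk_score(impact: Impact, likelihood: Likelihood) -> int:
    """Risk Score = Impact x Likelihood (range 1-20)."""
    return int(impact) * int(likelihood)


def approval_level(score: int) -> str:
    """Map a risk score to the approval band it falls in."""
    if not 1 <= score <= 20:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 4:
        return "automatic"   # 1-4: low risk
    if score <= 9:
        return "manager"     # 5-9: medium risk
    if score <= 14:
        return "cab"         # 10-14: high risk
    return "executive"       # 15-20: critical


score = risk_score(Impact.HIGH, Likelihood.POSSIBLE)
print(score, approval_level(score))  # prints: 9 manager
```

Keeping the thresholds in one function means the approval policy can be changed in a single place when the organization adjusts its risk appetite.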
ITIL Change Management
ITIL CHANGE MANAGEMENT

Key Elements:

  Change Manager:
  - Owns the change process
  - Coordinates with stakeholders
  - Ensures compliance

  Change Advisory Board (CAB):
  - Reviews high-risk changes
  - Includes: IT management, technical leads, security
  - Weekly meetings (typically)
  - Makes approval/rejection decisions

  Emergency Change Advisory Board (ECAB):
  - Subset of CAB for emergency changes
  - Quick decision-making
  - Meets ad-hoc as needed

  Change Model Components:
  - Request for Change (RFC)
  - Impact assessment
  - Implementation plan
  - Test plan
  - Rollback procedure
  - Post-implementation review

Metrics:
  - % of successful changes
  - % of changes requiring rollback
  - Average change implementation time
  - Number of emergency changes
  - Change lead time (request to implementation)
  - Number of rejected changes

98.3 Best Practices
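Several of the ITIL metrics listed for change management (success rate, rollback rate, number of emergency changes, lead time from request to implementation) can be derived from plain change records. The sketch below assumes a hypothetical `ChangeRecord` shape and a hypothetical `change_metrics` helper; a real tool such as Jira or ServiceNow would expose its own fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class ChangeRecord:
    """Hypothetical minimal change record used only for this sketch."""
    change_type: str            # "emergency", "standard", or "normal"
    requested_at: datetime
    implemented_at: datetime
    succeeded: bool
    rolled_back: bool


def change_metrics(records: List[ChangeRecord]) -> Dict[str, float]:
    """Compute a few change management metrics over a set of records."""
    if not records:
        raise ValueError("at least one change record is required")
    total = len(records)
    lead_times = [r.implemented_at - r.requested_at for r in records]
    total_lead = sum(lead_times, timedelta())
    return {
        "success_rate_pct": 100.0 * sum(r.succeeded for r in records) / total,
        "rollback_rate_pct": 100.0 * sum(r.rolled_back for r in records) / total,
        "emergency_changes": sum(r.change_type == "emergency" for r in records),
        "avg_lead_time_hours": total_lead.total_seconds() / 3600 / total,
    }


t0 = datetime(2026, 1, 5, 9, 0)
sample = [
    ChangeRecord("normal", t0, t0 + timedelta(hours=48), True, False),
    ChangeRecord("standard", t0, t0 + timedelta(hours=2), True, False),
    ChangeRecord("emergency", t0, t0 + timedelta(hours=1), True, False),
    ChangeRecord("normal", t0, t0 + timedelta(hours=25), False, True),
]
print(change_metrics(sample))
```

With the four sample records, three succeed and one is rolled back, so the success rate is 75% and the rollback rate is 25%.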
Implementation Best Practices
# ============================================================
# CHANGE MANAGEMENT BEST PRACTICES
# ============================================================

# Change Advisory Board (CAB)
# - Meet regularly (weekly recommended)
# - Review all normal and high-risk changes
# - Include diverse stakeholders
# - Document all decisions

# Example CAB Meeting Agenda:
# 1. Review previous action items (5 min)
# 2. New change requests (30 min)
# 3. Emergency changes (10 min)
# 4. Metrics review (5 min)
# 5. Process improvements (10 min)

# Automation
# Infrastructure as Code
# - Use Terraform, Ansible, CloudFormation
# - Version control all changes
# - Peer review via pull requests

# CI/CD Pipelines
# - Automated testing at each stage
# - Automated deployment
# - Automated rollback capabilities
# - Blue-green or canary deployments

# Change Categories and Automation
standard_changes:
  - Security patches (automated)
  - SSL certificate renewal
  - Database index changes
  - Configuration updates

# Normal changes require manual approval
normal_changes:
  - New service deployment
  - Infrastructure modifications
  - Application updates

emergency_changes:
  - Security vulnerability fixes
  - Critical bug fixes
  - Outage remediation

# Communication
notify_stakeholders:
  - Email notifications
  - Chat bot updates
  - Status page updates
  - Team standups

# Post-Implementation Review (PIR)
# Schedule within 48-72 hours
# Discuss:
# - What went well?
# - What could be improved?
# - Lessons learned
# - Action items

DevOps Approach to Change Management
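The change categories listed under Implementation Best Practices (standard, normal, emergency) lend themselves to automated routing, which is one way "automated approvals" can look in practice. The sketch below is illustrative only: `ChangeRequest`, `route`, and the returned path names are hypothetical, and sending non-high-risk normal changes to manager approval is an assumption based on the risk acceptance bands, not a rule stated for normal changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """Hypothetical change request used only for this sketch."""
    summary: str
    change_type: str                      # "standard", "normal", or "emergency"
    high_risk: bool = False
    incident_ticket: Optional[str] = None


def route(change: ChangeRequest) -> str:
    """Pick an approval path from the change type."""
    if change.change_type == "standard":
        # Pre-approved, routine, low risk: no additional approval needed.
        return "pre-approved"
    if change.change_type == "emergency":
        # Emergency changes must be linked to an incident ticket and are
        # approved after implementation.
        if not change.incident_ticket:
            raise ValueError("emergency changes must link an incident ticket")
        return "post-implementation-approval"
    if change.change_type == "normal":
        # High-risk normal changes go to the CAB (assumption: others go to
        # a manager).
        return "cab-approval" if change.high_risk else "manager-approval"
    raise ValueError(f"unknown change type: {change.change_type!r}")


print(route(ChangeRequest("Renew SSL certificate", "standard")))
# prints: pre-approved
print(route(ChangeRequest("Upgrade database cluster", "normal", high_risk=True)))
# prints: cab-approval
```

Rejecting an emergency change without an incident ticket at routing time keeps the linkage requirement from being skipped under pressure.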
DEVOPS APPROACH TO CHANGES

Traditional ITSM → DevOps:

  Heavy documentation ──► Working software over documentation
  Manual approvals    ──► Automated approvals
  Long cycles         ──► Short, frequent changes
  Fear of change      ──► Embrace change
  Siloed teams        ──► Cross-functional teams
  Big bang releases   ──► Incremental deployments

DevOps Change Principles:

  1. Make small, frequent changes (trunk-based development)
  2. Everything in version control
  3. Automated testing and deployment
  4. Feature flags for gradual rollouts
  5. Canary releases to detect issues early
  6. Automated rollback on failure
  7. Telemetry to detect problems quickly
  8. Blameless post-mortems

SRE Change Process:

  - Changes must:
      1. Be rolled out gradually
      2. Be monitored closely
      3. Have a quick rollback path
      4. Have explicit success criteria

  - Error budget: ship changes faster while the error budget is healthy, and slow down when it is depleted
  - Toil reduction: automate repetitive changes

98.4 Change Management Tools
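Two of the SRE ideas from the DevOps approach, error budgets and gradual rollout with explicit success criteria, are easy to prototype as small tooling. The sketch below makes several assumptions: the function names, the rollout stages, and the `observe_error_rate` callback are all hypothetical, and a real pipeline would read error rates from its monitoring system instead of a lambda.

```python
from typing import Callable, List, Sequence


def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (1.0 = untouched)."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("SLO target and request count leave no error budget")
    return 1.0 - failed_requests / allowed_failures


def canary_rollout(stages: Sequence[int],
                   observe_error_rate: Callable[[int], float],
                   max_error_rate: float) -> List[str]:
    """Advance through rollout percentages, stopping at the first stage
    whose observed error rate breaks the success criterion."""
    log: List[str] = []
    for pct in stages:
        rate = observe_error_rate(pct)
        if rate > max_error_rate:
            log.append(f"rollback at {pct}% (error rate {rate:.3f})")
            return log
        log.append(f"{pct}% ok")
    log.append("rollout complete")
    return log


# A 99.9% SLO over 1,000,000 requests allows about 1,000 failures;
# 250 failures leaves roughly three quarters of the budget.
print(round(error_budget_remaining(0.999, 1_000_000, 250), 2))  # prints: 0.75

# Simulated telemetry: errors spike once half of the traffic is on the change.
print(canary_rollout([1, 10, 50, 100],
                     lambda pct: 0.001 if pct < 50 else 0.05,
                     max_error_rate=0.01))
# prints: ['1% ok', '10% ok', 'rollback at 50% (error rate 0.050)']
```

A team could combine the two, for example by only starting a rollout when the remaining budget is above an agreed threshold.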
Tool Implementation
# Example: Change Management in Jira Service Management (illustrative layout)
fields:
  change_type:
    - Emergency
    - Standard
    - Normal

  risk_assessment:
    impact:
      - Critical
      - High
      - Moderate
      - Low
    likelihood:
      - Almost Certain
      - Likely
      - Possible
      - Unlikely
      - Rare

  implementation_plan:
    type: text
    required: true

  rollback_plan:
    type: text
    required: true

  test_plan:
    type: text
    required: true

workflow:
  states:
    - Draft
    - Submitted
    - Under Review
    - Approved
    - Rejected
    - In Progress
    - Verified
    - Rolled Back   # added so a failed verification does not end in "Verified"
    - Closed

  transitions:
    - from: Draft
      to: Submitted
      trigger: Submit

    - from: Submitted
      to: Under Review
      trigger: Start Review

    # Scores above 9 need CAB or executive approval first
    - from: Under Review
      to: Approved
      trigger: Approve
      condition: risk_score <= 9

    - from: Under Review
      to: Rejected
      trigger: Reject

    - from: Approved
      to: In Progress
      trigger: Implement

    - from: In Progress
      to: Verified
      trigger: Verify Success

    - from: In Progress
      to: Rolled Back
      trigger: Rollback
      condition: verification_failed

    - from: Verified
      to: Closed
      trigger: Close

  notifications:
    - on_approval_required: email to CAB
    - on_implementation_start: email to stakeholders
    - on_verification_complete: email to requester

98.5 Interview Questions
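The workflow configuration in 98.4 amounts to a small state machine: each transition is allowed only from one state and only for one trigger. The Python sketch below shows how a tool might enforce that. It is not how Jira Service Management implements workflows; `ChangeTicket` and `TRANSITIONS` are hypothetical names, the "Rolled Back" target for a failed verification is an assumption, and raising `PermissionError` for scores above 9 is one possible way to model the `risk_score <= 9` condition.

```python
from typing import Dict, List, Tuple

# (current state, trigger) -> next state, following the workflow in 98.4.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Draft", "Submit"): "Submitted",
    ("Submitted", "Start Review"): "Under Review",
    ("Under Review", "Approve"): "Approved",
    ("Under Review", "Reject"): "Rejected",
    ("Approved", "Implement"): "In Progress",
    ("In Progress", "Verify Success"): "Verified",
    ("In Progress", "Rollback"): "Rolled Back",   # assumed target state
}


class ChangeTicket:
    """Hypothetical change ticket that only moves along known transitions."""

    def __init__(self, risk_score: int) -> None:
        self.risk_score = risk_score
        self.state = "Draft"
        self.history: List[str] = ["Draft"]

    def fire(self, trigger: str) -> str:
        key = (self.state, trigger)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{trigger!r} is not allowed from state {self.state!r}")
        # Guard taken from the config: approve here only if risk_score <= 9.
        if trigger == "Approve" and self.risk_score > 9:
            raise PermissionError(
                "risk score above 9 needs escalated approval")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


ticket = ChangeTicket(risk_score=6)
for trigger in ["Submit", "Start Review", "Approve", "Implement", "Verify Success"]:
    ticket.fire(trigger)
print(ticket.state)  # prints: Verified
```

Keeping the transitions in a lookup table makes the allowed paths easy to audit against the documented workflow.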
CHANGE MANAGEMENT INTERVIEW QUESTIONS

Q1: What is the change management process in your organization?
A1:
- Explain the workflow: Request → Review → Approve → Implement → Verify → Close
- Describe change types (Emergency, Standard, Normal)
- Risk assessment process
- Approval hierarchy
- Tools used (Jira, ServiceNow, etc.)
- Post-implementation reviews

Q2: What's the difference between emergency and standard changes?
A2:
- Emergency: Immediate, unplanned, post-approval acceptable
- Standard: Pre-approved, routine, low-risk
- Normal: Full review process, moderate to high risk

Q3: How do you handle a change that causes an incident?
A3:
- Immediately trigger rollback
- Log incident and link to change record
- Notify stakeholders
- Fix issue before re-attempting
- Conduct post-mortem after resolution
- Update change process if needed

Q4: How does DevOps change traditional change management?
A4:
- Smaller, frequent changes vs large releases
- Automated testing reduces risk
- Feature flags enable instant rollback
- Canary deployments catch issues early
- Blameless culture encourages reporting
- Faster recovery through automation

Q5: What factors do you consider in change risk assessment?
A5:
- Impact: Customer, revenue, data, compliance
- Likelihood: Probability of failure
- Complexity: Number of components affected
- Dependencies: What else might be affected
- Rollback complexity: How hard to undo
- Test coverage: How well is it tested
- Team experience: Familiar with the change?

Q6: How would you implement a change management process from scratch?
A6:
1. Define change types and criteria
2. Create RFC template
3. Establish approval workflow
4. Form CAB (if needed)
5. Select/configure tool
6. Train team members
7. Start with pilot
8. Iterate based on feedback
9. Measure and improve

Q7: How do you balance speed of delivery with change control?
A7:
- Use risk-based categorization
- Automate low-risk changes
- Use feature flags for gradual rollouts
- Implement canary deployments
- Trust but verify: Automated testing
- Error budgets allow faster delivery when stable
- Good telemetry reduces uncertainty

Q8: Describe a time when you had to push back on a change request.
A8:
- Example: Insufficient testing, high risk without adequate rollback
- Explained concerns to stakeholder
- Proposed alternatives
- Reached compromise
- Documented decision

Q9: What is a CAB and when is it necessary?
A9:
Change Advisory Board:
- Cross-functional team reviewing changes
- Needed for high-risk changes
- Provides collective decision-making
- Includes: IT management, technical leads, security, business reps
- Not needed for low-risk, pre-approved changes

Q10: How do you measure change management effectiveness?
A10:
- Change success rate
- Rollback rate
- Change lead time
- Emergency change percentage
- Mean time to recovery for change-related incidents
- Stakeholder satisfaction
- Process compliance rate

Quick Reference
Change Workflow:
Request → Review → Approve → Implement → Verify → Close

Change Types:
- Emergency: Immediate, post-approval OK
- Standard: Pre-approved, low-risk
- Normal: Full review, moderate-high risk

Risk Assessment:
- Impact × Likelihood
- Low (1-4): Approve automatically
- Medium (5-9): Manager approval
- High (10-14): CAB approval
- Critical (15-20): Executive approval

Best Practices:
- Small, frequent changes
- Everything in version control
- Automated testing
- Rollback ready
- Canary deployments
- Post-implementation review

Summary
- Process: Request → Review → Approve → Implement → Verify → Close
- Types: Emergency (post-approval), Standard (pre-approved), Normal (full process)
- Risk: Impact × Likelihood determines approval level
- CAB: Change Advisory Board reviews high-risk changes
- DevOps: Small changes, automation, quick rollback
Next Chapter
Chapter 99: Incident Management
Last Updated: February 2026