
Change Management

Change management is a critical process in IT operations that ensures all changes to production systems are properly reviewed, tested, and documented. It minimizes the risk of service disruptions while enabling the organization to respond quickly to business needs. This chapter covers the complete change management lifecycle, industry frameworks (ITIL, COBIT), practical implementation, and DevOps/Agile approaches to change management. Understanding change management is essential for DevOps and SRE roles, as it forms the backbone of safe software delivery and infrastructure modifications.


┌─────────────────────────────────────────────────────────────────────────┐
│ CHANGE WORKFLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CHANGE LIFECYCLE │ │
│ ├─────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌─────────┐ │ │
│ │ │ CREATE │ ──► Create change request with details │ │
│ │ │ Request │ - What, Why, When, How, Risk │ │
│ │ └────┬────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────┐ │ │
│ │ │ REVIEW │ ──► Technical and business review │ │
│ │ │ │ - Assess impact, dependencies, risks │ │
│ │ └────┬────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────┐ │ │
│ │ │ APPROVAL│ ──► Get authorization to proceed │ │
│ │ │ │ - Based on risk level │ │
│ │ └────┬────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────┐ │ │
│ │ │IMPLEMENT│ ──► Execute the change │ │
│ │ │ │ - Follow documented procedure │ │
│ │ └────┬────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────┐ │ │
│ │ │ VERIFY │ ──► Confirm change achieved desired result │ │
│ │ │ │ - Test functionality, monitoring │ │
│ │ └────┬────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────┐ │ │
│ │ │ CLOSE │ ──► Document lessons, update knowledge base │ │
│ │ │ │ - Complete change record │ │
│ │ └─────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Change Types: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ EMERGENCY CHANGES │ │
│ │ - Unplanned, immediate implementation required │ │
│ │ - Post-implementation approval (within 24-48 hours) │ │
│ │ - Requires incident ticket linkage │ │
│ │ - Minimal documentation, but must be captured │ │
│ │ │ │
│ │ STANDARD CHANGES │ │
│ │ - Pre-approved, routine changes │ │
│ │ - Low risk, well-understood │ │
│ │ - No additional approval needed │ │
│ │ - Examples: security patches, configuration updates │ │
│ │ │ │
│ │ NORMAL CHANGES │ │
│ │ - Full review and approval process │ │
│ │ - Moderate to high risk │ │
│ │ - Requires CAB approval for high-risk │ │
│ │ - Examples: infrastructure upgrades, new deployments │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
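
The three change types above differ mainly in when approval happens. The following is a minimal Python sketch of that routing, assuming a hypothetical `ChangeRequest` record; the field names (`incident_id`, `high_risk`) and the returned strings are illustrative and not taken from any particular tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    EMERGENCY = "emergency"
    STANDARD = "standard"
    NORMAL = "normal"


@dataclass
class ChangeRequest:
    # Hypothetical record; field names are assumptions for illustration.
    summary: str
    change_type: ChangeType
    incident_id: Optional[str] = None  # emergency changes link to an incident
    high_risk: bool = False


def approval_path(change: ChangeRequest) -> str:
    """Describe when the change is approved, based on its type."""
    if change.change_type is ChangeType.EMERGENCY:
        # Emergency changes are implemented first and approved afterwards,
        # but they must be tied to an incident ticket.
        if not change.incident_id:
            raise ValueError("emergency change requires a linked incident ticket")
        return "implement now; retrospective approval within 24-48 hours"
    if change.change_type is ChangeType.STANDARD:
        return "pre-approved; no additional approval needed"
    # Normal changes always get a full review; high-risk ones also go to the CAB.
    if change.high_risk:
        return "full review, then CAB approval"
    return "full review and approval"
```

Raising an error for an emergency change without an incident link is one possible design choice; a real tool might instead block the submission form.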
┌─────────────────────────────────────────────────────────────────────────┐
│ RISK ASSESSMENT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Risk Score = Impact × Likelihood │
│ │
│ Impact Levels: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Level 1 (Low) - No customer impact, easy rollback │ │
│ │ Level 2 (Moderate) - Minor impact, quick rollback possible │ │
│ │ Level 3 (High) - Significant impact, careful planning │ │
│ │ Level 4 (Critical)- Major outage potential, extensive testing│ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Likelihood: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 1 - Rare | 2 - Unlikely | 3 - Possible | 4 - Likely | 5 - Almost Certain │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Risk Score Matrix: │
│ ┌──────────┬──────┬──────────┬──────────┬────────┬────────────────┐ │
│ │ Impact   │ Rare │ Unlikely │ Possible │ Likely │ Almost Certain │ │
│ ├──────────┼──────┼──────────┼──────────┼────────┼────────────────┤ │
│ │ Critical │  4   │    8     │    12    │   16   │       20       │ │
│ │ High     │  3   │    6     │    9     │   12   │       15       │ │
│ │ Moderate │  2   │    4     │    6     │   8    │       10       │ │
│ │ Low      │  1   │    2     │    3     │   4    │       5        │ │
│ └──────────┴──────┴──────────┴──────────┴────────┴────────────────┘ │
│ │
│ Risk Acceptance: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 1-4: Low Risk - Approve automatically │ │
│ │ 5-9: Medium Risk - Manager approval │ │
│ │ 10-14: High Risk - CAB approval required │ │
│ │ 15-20: Critical - Executive approval required │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Questions to Answer: │
│ - What could go wrong? │
│ - What's the blast radius? │
│ - What's the rollback plan? │
│ - How do we verify success? │
│ - What's the backout time? │
│ │
└─────────────────────────────────────────────────────────────────────────┘
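
The scoring and approval thresholds above can be expressed directly in code. This is a small Python sketch, assuming the level names and score bands from the Risk Assessment box; the function names `risk_score` and `approval_level` are hypothetical.

```python
# Numeric levels taken from the Impact and Likelihood scales above.
IMPACT = {"low": 1, "moderate": 2, "high": 3, "critical": 4}
LIKELIHOOD = {
    "rare": 1,
    "unlikely": 2,
    "possible": 3,
    "likely": 4,
    "almost certain": 5,
}


def risk_score(impact: str, likelihood: str) -> int:
    """Risk Score = Impact x Likelihood."""
    return IMPACT[impact.lower()] * LIKELIHOOD[likelihood.lower()]


def approval_level(score: int) -> str:
    """Map a risk score to the approval band from the Risk Acceptance table."""
    if not 1 <= score <= 20:
        raise ValueError(f"score out of range: {score}")
    if score <= 4:
        return "automatic approval"
    if score <= 9:
        return "manager approval"
    if score <= 14:
        return "CAB approval"
    return "executive approval"
```

For example, a High impact change with a Possible likelihood scores 3 × 3 = 9, which falls in the manager approval band, while Critical × Likely scores 16 and needs executive approval.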

┌─────────────────────────────────────────────────────────────────────────┐
│ ITIL CHANGE MANAGEMENT │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Key Elements: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Change Manager: │ │
│ │ - Owns the change process │ │
│ │ - Coordinates with stakeholders │ │
│ │ - Ensures compliance │ │
│ │ │ │
│ │ Change Advisory Board (CAB): │ │
│ │ - Reviews high-risk changes │ │
│ │ - Includes: IT management, technical leads, security │ │
│ │ - Weekly meetings (typically) │ │
│ │ - Makes approval/rejection decisions │ │
│ │ │ │
│ │ Emergency Change Advisory Board (ECAB): │ │
│ │ - Subset of CAB for emergency changes │ │
│ │ - Quick decision-making │ │
│ │ - Meets ad-hoc as needed │ │
│ │ │ │
│ │ Change Model Components: │ │
│ │ - Request for Change (RFC) │ │
│ │ - Impact assessment │ │
│ │ - Implementation plan │ │
│ │ - Test plan │ │
│ │ - Rollback procedure │ │
│ │ - Post-implementation review │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Metrics: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ - % of successful changes │ │
│ │ - % of changes requiring rollback │ │
│ │ - Average change implementation time │ │
│ │ - Number of emergency changes │ │
│ │ - Change lead time (request to implementation) │ │
│ │ - Number of rejected changes │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
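
Several of the metrics above can be derived from closed change records. The sketch below is one way to compute them in Python, assuming a hypothetical `ChangeRecord` shape; a real ITSM tool will expose different field names.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List


@dataclass
class ChangeRecord:
    # Hypothetical record shape; adapt the fields to your tool's export.
    requested_at: datetime
    implemented_at: datetime
    succeeded: bool
    rolled_back: bool
    emergency: bool


def change_metrics(records: List[ChangeRecord]) -> Dict[str, float]:
    """Compute success rate, rollback rate, emergency count and lead time."""
    if not records:
        raise ValueError("no change records to summarize")
    total = len(records)
    # Lead time is measured from request to implementation, in hours.
    lead_hours = [
        (r.implemented_at - r.requested_at).total_seconds() / 3600
        for r in records
    ]
    return {
        "success_rate_pct": 100 * sum(r.succeeded for r in records) / total,
        "rollback_rate_pct": 100 * sum(r.rolled_back for r in records) / total,
        "emergency_count": sum(r.emergency for r in records),
        "avg_lead_time_hours": mean(lead_hours),
    }
```

Tracking these per week or per month makes trends visible, for instance a rising share of emergency changes.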

# ============================================================
# CHANGE MANAGEMENT BEST PRACTICES
# ============================================================
# Change Advisory Board (CAB)
# - Meet regularly (weekly recommended)
# - Review all normal and high-risk changes
# - Include diverse stakeholders
# - Document all decisions
# Example CAB Meeting Agenda:
# 1. Review previous action items (5 min)
# 2. New change requests (30 min)
# 3. Emergency changes (10 min)
# 4. Metrics review (5 min)
# 5. Process improvements (10 min)
# Automation

# Infrastructure as Code
# - Use Terraform, Ansible, or CloudFormation
# - Version control all changes
# - Peer review via pull requests

# CI/CD Pipelines
# - Automated testing at each stage
# - Automated deployment
# - Automated rollback capabilities
# - Blue-green or canary deployments

# Change Categories and Automation
standard_changes:
  - Security patches (automated)
  - SSL certificate renewal
  - Database index changes
  - Configuration updates
normal_changes:  # require manual approval
  - New service deployment
  - Infrastructure modifications
  - Application updates
emergency_changes:
  - Security vulnerability fixes
  - Critical bug fixes
  - Outage remediation

# Communication
notify_stakeholders:
  - Email notifications
  - Chat bot updates
  - Status page updates
  - Team standups

# Post-Implementation Review (PIR)
# Schedule within 48-72 hours
# Discuss:
# - What went well?
# - What could be improved?
# - Lessons learned
# - Action items
┌─────────────────────────────────────────────────────────────────────────┐
│ DEVOPS APPROACH TO CHANGES │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Traditional ITSM → DevOps: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Heavy documentation ──► Working software over documentation │ │
│ │ Manual approvals ──► Automated approvals │ │
│ │ Long cycles ──► Short, frequent changes │ │
│ │ Fear of change ──► Embrace change │ │
│ │ Siloed teams ──► Cross-functional teams │ │
│ │ Big bang releases ──► Incremental deployments │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ DevOps Change Principles: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 1. Make small, frequent changes (Trunk-based development) │ │
│ │ 2. Everything in version control │ │
│ │ 3. Automated testing and deployment │ │
│ │ 4. Feature flags for gradual rollouts │ │
│ │ 5. Canary releases to detect issues early │ │
│ │ 6. Automated rollback on failure │ │
│ │ 7. Telemetry to detect problems quickly │ │
│ │ 8. Blameless post-mortems │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ SRE Change Process: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ - Every change must: │ │
│ │ 1. Be rolled out gradually │ │
│ │ 2. Be monitored closely │ │
│ │ 3. Have a quick rollback path │ │
│ │ 4. Have explicit success criteria │ │
│ │ │ │
│ │ - Error Budget: Ship changes faster while the error budget is healthy │ │
│ │ - Toil Reduction: Automate repetitive changes │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
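
The SRE points above (gradual rollout, explicit success criteria, quick rollback, error budgets) can be illustrated with a short Python sketch. It assumes an availability SLO measured over a 30-day window and a caller-supplied `error_rate_at` callback standing in for real telemetry; all function names here are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def can_ship_risky_change(
    slo: float, downtime_minutes: float, window_days: int = 30
) -> bool:
    """A healthy error budget means there is still downtime left to spend."""
    return downtime_minutes < error_budget_minutes(slo, window_days)


def staged_rollout(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> Tuple[str, int]:
    """Increase traffic stage by stage; stop at the first failed check.

    `stages` are traffic percentages, `error_rate_at(pct)` returns the
    observed error rate at that percentage (an assumed telemetry hook),
    and `max_error_rate` is the explicit success criterion.
    """
    if not stages:
        raise ValueError("at least one rollout stage is required")
    for pct in stages:
        if error_rate_at(pct) > max_error_rate:
            # Success criterion violated: report a rollback at this stage.
            return ("rolled_back", pct)
    return ("completed", stages[-1])
```

With a 99.9% SLO, the budget works out to roughly 43.2 minutes of downtime per 30 days. A rollout over stages such as 1%, 10%, 50%, 100% limits the blast radius because a regression that only appears at 50% never reaches the remaining traffic.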

change_request_workflow.yaml
# Example: Change Management in Jira Service Management
fields:
  change_type:
    - Emergency
    - Standard
    - Normal
  risk_assessment:
    impact:
      - Critical
      - High
      - Moderate
      - Low
    likelihood:
      - Almost Certain
      - Likely
      - Possible
      - Unlikely
      - Rare
  implementation_plan:
    type: text
    required: true
  rollback_plan:
    type: text
    required: true
  test_plan:
    type: text
    required: true
workflow:
  states:
    - Draft
    - Submitted
    - Under Review
    - Approved
    - Rejected
    - In Progress
    - Verified
    - Rolled Back
    - Closed
  transitions:
    - from: Draft
      to: Submitted
      trigger: Submit
    - from: Submitted
      to: Under Review
      trigger: Start Review
    # Scores of 10 or more need CAB or executive approval instead
    - from: Under Review
      to: Approved
      trigger: Approve
      condition: risk_score <= 9
    - from: Under Review
      to: Rejected
      trigger: Reject
    - from: Approved
      to: In Progress
      trigger: Implement
    - from: In Progress
      to: Verified
      trigger: Verify Success
    # A failed verification ends in Rolled Back, not Verified
    - from: In Progress
      to: Rolled Back
      trigger: Rollback
      condition: verification_failed
    - from: Verified
      to: Closed
      trigger: Close
notifications:
  - on_approval_required: email to CAB
  - on_implementation_start: email to stakeholders
  - on_verification_complete: email to requester
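
A workflow definition like the one above is essentially a state machine. The following Python sketch shows how the transitions could be enforced in code; it is not how Jira Service Management implements workflows. As an assumption of this sketch, a failed verification moves the change to a separate `Rolled Back` state, and the `next_state` helper is hypothetical.

```python
from typing import Dict, Optional, Tuple

# (current state, trigger) -> next state, mirroring the workflow transitions.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Draft", "Submit"): "Submitted",
    ("Submitted", "Start Review"): "Under Review",
    ("Under Review", "Approve"): "Approved",
    ("Under Review", "Reject"): "Rejected",
    ("Approved", "Implement"): "In Progress",
    ("In Progress", "Verify Success"): "Verified",
    # Assumption: rollback lands in its own state rather than Verified.
    ("In Progress", "Rollback"): "Rolled Back",
}


def next_state(state: str, trigger: str, risk_score: Optional[int] = None) -> str:
    """Return the state reached by firing `trigger` from `state`."""
    key = (state, trigger)
    if key not in TRANSITIONS:
        raise ValueError(f"no transition {trigger!r} from {state!r}")
    # The Approve transition carries the condition risk_score <= 9.
    if trigger == "Approve" and (risk_score is None or risk_score > 9):
        raise PermissionError("this approval path requires risk_score <= 9")
    return TRANSITIONS[key]
```

Keeping the transitions in a single table makes illegal moves, such as implementing a change that was never approved, fail loudly instead of silently.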

┌─────────────────────────────────────────────────────────────────────────┐
│ CHANGE MANAGEMENT INTERVIEW QUESTIONS │
├─────────────────────────────────────────────────────────────────────────┤
Q1: What is the change management process in your organization? │
A1: │
- Explain the workflow: Request → Review → Approve → Implement → Verify → Close │
- Describe change types (Emergency, Standard, Normal) │
- Risk assessment process │
- Approval hierarchy │
- Tools used (Jira, ServiceNow, etc.) │
- Post-implementation reviews │
─────────────────────────────────────────────────────────────────────────┤
Q2: What's the difference between emergency and standard changes? │
A2: │
- Emergency: Immediate, unplanned, post-approval acceptable │
- Standard: Pre-approved, routine, low-risk │
- Normal: Full review process, moderate to high risk │
─────────────────────────────────────────────────────────────────────────┤
Q3: How do you handle a change that causes an incident? │
A3: │
- Immediately trigger rollback │
- Log incident and link to change record │
- Notify stakeholders │
- Fix issue before re-attempting │
- Conduct post-mortem after resolution │
- Update change process if needed │
─────────────────────────────────────────────────────────────────────────┤
Q4: How does DevOps change traditional change management? │
A4: │
- Smaller, frequent changes vs large releases │
- Automated testing reduces risk │
- Feature flags enable instant rollback │
- Canary deployments catch issues early │
- Blameless culture encourages reporting │
- Faster recovery through automation │
─────────────────────────────────────────────────────────────────────────┤
Q5: What factors do you consider in change risk assessment? │
A5: │
- Impact: Customer, revenue, data, compliance │
- Likelihood: Probability of failure │
- Complexity: Number of components affected │
- Dependencies: What else might be affected │
- Rollback complexity: How hard to undo │
- Test coverage: How well is it tested │
- Team experience: Familiar with the change? │
─────────────────────────────────────────────────────────────────────────┤
Q6: How would you implement a change management process from scratch? │
A6: │
1. Define change types and criteria │
2. Create RFC template │
3. Establish approval workflow │
4. Form CAB (if needed) │
5. Select/configure tool │
6. Train team members │
7. Start with pilot │
8. Iterate based on feedback │
9. Measure and improve │
─────────────────────────────────────────────────────────────────────────┤
Q7: How do you balance speed of delivery with change control? │
A7: │
- Use risk-based categorization │
- Automate low-risk changes │
- Use feature flags for gradual rollouts │
- Implement canary deployments │
- Trust but verify: Automated testing │
- Error budgets allow faster delivery when stable │
- Good telemetry reduces uncertainty │
─────────────────────────────────────────────────────────────────────────┤
Q8: Describe a time when you had to push back on a change request. │
A8: │
- Example: Insufficient testing, high risk without adequate rollback │
- Explained concerns to stakeholder │
- Proposed alternatives │
- Reached compromise │
- Documented decision │
─────────────────────────────────────────────────────────────────────────┤
Q9: What is a CAB and when is it necessary? │
A9: │
Change Advisory Board: │
- Cross-functional team reviewing changes │
- Needed for high-risk changes │
- Provides collective decision-making │
- Includes: IT management, technical leads, security, business reps │
- Not needed for low-risk, pre-approved changes │
─────────────────────────────────────────────────────────────────────────┤
Q10: How do you measure change management effectiveness? │
A10: │
- Change success rate │
- Rollback rate │
- Change lead time │
- Emergency change percentage │
- Mean time to recovery for change-related incidents │
- Stakeholder satisfaction │
- Process compliance rate │
└─────────────────────────────────────────────────────────────────────────┘

Change Workflow:
Request → Review → Approve → Implement → Verify → Close
Change Types:
- Emergency: Immediate, post-approval OK
- Standard: Pre-approved, low-risk
- Normal: Full review, moderate-high risk
Risk Assessment:
- Impact × Likelihood
- Low (1-4): Approve automatically
- Medium (5-9): Manager approval
- High (10-14): CAB approval
- Critical (15-20): Executive approval
Best Practices:
- Small, frequent changes
- Everything in version control
- Automated testing
- Rollback ready
- Canary deployments
- Post-implementation review

  • Process: Request → Review → Approve → Implement → Verify → Close
  • Types: Emergency (post-approval), Standard (pre-approved), Normal (full process)
  • Risk: Impact × Likelihood determines approval level
  • CAB: Change Advisory Board reviews high-risk changes
  • DevOps: Small changes, automation, quick rollback



Last Updated: February 2026