Your company follows Site Reliability Engineering principles. You are writing a postmortem for an incident, triggered by a software change, that severely affected users. You want to prevent severe incidents from happening in the future.
What should you do?
- A. Identify engineers responsible for the incident and escalate to their senior management.
- B. Ensure that test cases that catch errors of this type are run successfully before new software releases.
- C. Follow up with the employees who reviewed the changes and prescribe practices they should follow in the future.
- D. Design a policy that will require on-call teams to immediately call engineers and management to discuss a plan of action if an incident occurs.
Answer(s): B
Explanation:
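Because the incident was triggered by a software change, the most direct way to prevent a recurrence is to add test cases that catch this type of error and require them to pass before each new release. This follows the Site Reliability Engineering practice of testing for reliability. The other choices do not address the cause: identifying the responsible engineers for escalation, or prescribing practices to the reviewers, assigns blame to individuals and conflicts with SRE's blameless postmortem culture, while a policy requiring on-call teams to call engineers and management changes how an incident is handled after it starts rather than preventing it.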
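As a rough illustration of the chosen answer, the sketch below shows a regression test acting as a release gate. Everything in it is hypothetical and not taken from the question: `parse_timeout` stands in for the code whose change caused the incident, `DEFAULT_TIMEOUT_S` is an assumed default, and `release_gate` is an invented helper name. It uses only Python's standard `unittest` module.

```python
import unittest

# Hypothetical default, assumed for illustration only.
DEFAULT_TIMEOUT_S = 30


def parse_timeout(config: dict) -> int:
    """Hypothetical function that the faulty change broke.

    Returns the configured timeout in seconds, or the default if unset.
    """
    value = config.get("timeout_s")
    if value is None:
        return DEFAULT_TIMEOUT_S
    return int(value)


class TimeoutRegressionTest(unittest.TestCase):
    """Regression tests added as a postmortem action item."""

    def test_missing_timeout_uses_default(self):
        # The class of error from the (hypothetical) incident:
        # a missing setting must fall back to the default.
        self.assertEqual(parse_timeout({}), DEFAULT_TIMEOUT_S)

    def test_explicit_timeout_is_respected(self):
        self.assertEqual(parse_timeout({"timeout_s": "5"}), 5)


def release_gate() -> bool:
    """Run the regression suite; return True only if every test passes."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TimeoutRegressionTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A release pipeline could call `release_gate()` and stop the rollout whenever it returns `False`, so a change that reintroduces the same class of error does not reach users.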