You support a large service with a well-defined Service Level Objective (SLO). The development team deploys new releases of the service multiple times a week. If a major incident causes the service to miss its SLO, you want the development team to shift its focus from working on features to improving service reliability.
What should you do before a major incident occurs?
- A. Develop an appropriate error budget policy in cooperation with all service stakeholders.
- B. Negotiate with the product team to always prioritize service reliability over releasing new features.
- C. Negotiate with the development team to reduce the release frequency to no more than once a week.
- D. Add a plugin to your Jenkins pipeline that prevents new releases whenever your service is out of SLO.
Answer(s): A
Explanation:
Reason: The incident has not occurred yet, and the development team is already shipping new features multiple times a week. Option A asks you to define an error budget policy, not the error budget itself, which already follows from the SLO. In practice, this means bringing all stakeholders together to agree in advance on how the error budget may be spent and what happens when it is exhausted, so that feature delivery and reliability stay balanced.
The goals of this policy are to:
- Protect customers from repeated SLO misses
- Provide an incentive to balance reliability with other features
https://sre.google/workbook/error-budget-policy/
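To make the idea concrete, here is a minimal Python sketch of one rule an error budget policy might contain: once the budget for the current SLO window is spent, feature releases pause and the team works on reliability. The names `SloWindow`, `error_budget_remaining`, and `release_decision`, and the freeze-at-zero threshold, are assumptions made for illustration. They are not taken from the question or from any particular tool, and a real policy would be agreed on by the stakeholders.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO measurement window (hypothetical structure)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests served during the window
    failed_requests: int   # requests that violated the SLO during the window


def error_budget_remaining(window: SloWindow) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_failures = (1.0 - window.slo_target) * window.total_requests
    if allowed_failures <= 0:
        raise ValueError("no error budget available (100% target or empty window)")
    return 1.0 - window.failed_requests / allowed_failures


def release_decision(window: SloWindow) -> str:
    """Assumed policy rule: freeze feature releases once the budget is exhausted."""
    if error_budget_remaining(window) <= 0:
        return "freeze features; reliability work only"
    return "feature releases allowed"


if __name__ == "__main__":
    healthy = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=500)
    breached = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
    print(release_decision(healthy))
    print(release_decision(breached))
```

With a 99.9% target and one million requests, about 1,000 failures are allowed, so 500 failures leave roughly half the budget and 1,500 failures overspend it. The point of agreeing on such a rule before an incident is that the shift from features to reliability is then automatic rather than negotiated under pressure.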