You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an important subset of the application's functionality, a data-intensive reporting feature, is consistently failing with an HTTP 500 error.
When you investigate your application's dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend's persistent disk (PD). Now you need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?
- As the I/O wait times aggregated across all report generation backends
- As the proportion of report generation requests that result in a successful response
- As the application's report generation queue size compared to a known-good threshold
- As the reporting backend PD throughput capacity compared to a known-good threshold
Answer(s): B (the proportion of report generation requests that result in a successful response)
Explanation:
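An availability SLI should measure what users experience. I/O wait time, queue size, and PD throughput are internal signals that may correlate with failures but do not directly show whether requests succeed. According to the SRE Workbook, one potential SLI is as follows: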
* Type of service: Request-driven
* Type of SLI: Availability
* Description: The proportion of requests that resulted in a successful response.
https://sre.google/workbook/implementing-slos/
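As a minimal sketch of how such an SLI could be computed, the Python snippet below takes the HTTP status codes of report generation requests over some window and returns the proportion that succeeded. The function name `availability_sli` and the rule that any status below 500 counts as a success are assumptions made for illustration. Neither comes from the question or the SRE Workbook, and a real service would choose its own definition of a "successful response".

```python
from typing import Iterable, Optional


def availability_sli(status_codes: Iterable[int]) -> Optional[float]:
    """Return the fraction of requests that succeeded.

    Assumption (illustrative only): a response counts as successful
    when its HTTP status code is below 500. Returns None when there
    were no requests, because the ratio is undefined in that case.
    """
    codes = list(status_codes)
    if not codes:
        return None
    good = sum(1 for code in codes if code < 500)
    return good / len(codes)


# Hypothetical sample: three successful responses and one HTTP 500.
sample = [200, 200, 500, 200]
sli = availability_sli(sample)
```

The ratio is built only from request outcomes, so it would drop during the incident described above regardless of whether the root cause was disk I/O, queue growth, or something else.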