Mean time to recovery definition
In the context of information technologies (IT), the mean time to recovery is the average time it takes to completely restore an IT system to normal operations after a failure. Mean time to recovery is closely tied to the concept of the “mean time to repair,” but it includes further elements — namely, the additional time needed to detect the problem, notify the appropriate parties, and initiate repairs.
Measuring the mean time to recovery
The mean time to recovery covers the length of time between when the failure occurs and the point at which the system has been completely restored. You calculate the mean time to repair by dividing the total recovery time spent over the duration of a maintenance contract by the total number of failures.
Mean time to recovery components
- Detection: Time taken to identify that a failure has occurred, usually initiated by monitoring systems, user reports, or automated alerts.
- Notification: Time taken to alert the repairing party about the failure (for example, by submitting a request form or calling the IT department).
- Diagnosis: Time spent determining the root cause of the failure, including analyzing logs, examining configurations, and investigating dependencies.
- Repair: Total time needed to fix the issue and restore the service to full functionality.
- Testing: Time spent on tests to make sure the service is functioning as expected after the repair. The IT technicians may need to test different scenarios to finally approve the solution.