DevOps metrics

DevOps metrics to track on the go

Fouad Roumieh

--

The purpose of this article is to provide a set of SMART metrics and goals for monitoring the overall success or failure of developing and maintaining a system across its various components.

1. Change success/failure rate

The purpose of this metric is to measure the percentage of successful changes versus failed ones in production. Failed releases/changes are those that fail during deployment, as well as those that deploy successfully but cause a disruption afterwards, such as slowing the system down or triggering a bug report.

How to measure?

Divide the number of failed deployments by the total number of deployments over the period to get the failure percentage.

Ideal metric value

100% successful deployments
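
As a minimal sketch, assuming your CI/CD tool can export each deployment with a failed/successful flag (the record fields below are hypothetical):

    from datetime import date

    # Hypothetical deployment records exported from a CI/CD tool; "failed" covers
    # both deployments that failed outright and those that caused an incident afterwards.
    deployments = [
        {"date": date(2023, 5, 2), "failed": False},
        {"date": date(2023, 5, 9), "failed": True},
        {"date": date(2023, 5, 16), "failed": False},
        {"date": date(2023, 5, 23), "failed": False},
    ]

    failed = sum(1 for d in deployments if d["failed"])
    failure_rate = failed / len(deployments) * 100
    print(f"Change failure rate: {failure_rate:.1f}% (success: {100 - failure_rate:.1f}%)")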

2. Mean time to detect (MTTD)

The purpose of this metric is to measure the total time it takes to detect a defect/bug in the system. We shouldn’t be too worried about minor bugs that affect a very small area or function of the system, or cases where few customers are affected and no major function is impacted.

How to measure?

Chart the detection time of the top ten incidents each month, in hours.

Ideal metric value

Shortest possible, and ideally before customers notice
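
A minimal sketch of the calculation, assuming each incident record carries a timestamp for when the defect first occurred and when it was detected (the field names are hypothetical):

    from datetime import datetime

    # Hypothetical incident records: when the defect first occurred in production
    # versus when monitoring (or a customer) detected it.
    incidents = [
        {"occurred": datetime(2023, 5, 1, 9, 0), "detected": datetime(2023, 5, 1, 13, 0)},
        {"occurred": datetime(2023, 5, 8, 22, 0), "detected": datetime(2023, 5, 9, 6, 0)},
    ]

    hours = [(i["detected"] - i["occurred"]).total_seconds() / 3600 for i in incidents]
    mttd = sum(hours) / len(hours)
    print(f"MTTD: {mttd:.1f} hours")  # chart this value month over month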

3. Mean time to recovery (MTTR)

The purpose of this metric is to measure the total time it takes to recover from a defect/bug in the system after it has been detected. As with MTTD, we shouldn’t be too worried about minor bugs that affect a very small area of the system, or cases where few customers are affected and no major function is impacted.

How to measure?

Chart the recovery time of the top ten incidents each month, in hours.

Ideal metric value

Shortest possible, and aligned with the SLA policy
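
A similar sketch for recovery, this time also checking each duration against an assumed SLA target (the field names and the four-hour target are hypothetical):

    from datetime import datetime, timedelta

    SLA_RECOVERY = timedelta(hours=4)  # assumed SLA target; adjust to your policy

    # Hypothetical incident records: detection time versus time service was restored.
    incidents = [
        {"detected": datetime(2023, 5, 1, 13, 0), "recovered": datetime(2023, 5, 1, 15, 30)},
        {"detected": datetime(2023, 5, 9, 6, 0), "recovered": datetime(2023, 5, 9, 12, 0)},
    ]

    durations = [i["recovered"] - i["detected"] for i in incidents]
    mttr = sum(d.total_seconds() for d in durations) / len(durations) / 3600
    breaches = sum(1 for d in durations if d > SLA_RECOVERY)
    print(f"MTTR: {mttr:.1f} hours, SLA breaches: {breaches}/{len(durations)}")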

4. Time to market (TTM)

The purpose of this metric is to measure the time it takes us to ship a new feature to production, from its conception until its release to the market. Features can be major or medium-sized; we shouldn’t count small features, such as one developed for a single customer based on their needs. Tracking this metric helps surface any business, resource, or technical blockers that are slowing the overall evolution of the system. A fast TTM also means you are competitive with similar service providers, while a slow TTM means you could be falling behind the competition, whether in technology or in features. In addition, it should tell you where to improve your processes to remove blockers: a manual deployment process, for example.

How to measure?

Chart the TTM of all medium and large features on a monthly basis

Ideal metric value

Meeting the deadlines
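
A minimal sketch, assuming each feature is tracked with a conception date, a release date, and a size classification (all hypothetical fields), and counting only medium and big features per the definition above:

    from datetime import date

    # Hypothetical feature records: conception date versus production release date.
    features = [
        {"name": "feature-A", "size": "big", "conceived": date(2023, 1, 10), "released": date(2023, 4, 2)},
        {"name": "feature-B", "size": "medium", "conceived": date(2023, 2, 1), "released": date(2023, 3, 15)},
        {"name": "feature-C", "size": "small", "conceived": date(2023, 3, 1), "released": date(2023, 3, 8)},
    ]

    # Small features are excluded from the TTM chart.
    for f in features:
        if f["size"] in ("medium", "big"):
            print(f"{f['name']}: {(f['released'] - f['conceived']).days} days to market")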

5. Deployment frequency

Deployment frequency is how often new code changes are deployed to production. It’s a measure of a team’s speed and agility. This metric tracks how quickly teams can release new features or fixes.

Elite and high-performing teams have the ability — and capacity — to make multiple deployments in a day, on-demand.

There is no single way to interpret this metric: frequent deployments can be a healthy sign if we are shipping more successful features that help the system evolve, but they can also mean we are deploying bug fixes one after another.

An indicator of poor deployment frequency, for example, is a backlog loaded with features or bugs while we deploy only once a week.

How to measure?

Count deployments related to shipping new features or bug fixes on a monthly basis

Ideal metric value

More deployments related to shipping features and fewer related to bug fixes
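
A minimal sketch of the monthly count, assuming each deployment log entry is tagged with its purpose (the tags and field names are hypothetical):

    from collections import Counter

    # Hypothetical deployment log entries tagged by purpose.
    deployments = [
        {"month": "2023-05", "type": "feature"},
        {"month": "2023-05", "type": "bugfix"},
        {"month": "2023-05", "type": "feature"},
        {"month": "2023-06", "type": "bugfix"},
    ]

    counts = Counter((d["month"], d["type"]) for d in deployments)
    for (month, kind), n in sorted(counts.items()):
        print(f"{month} {kind}: {n}")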

6. Customer-reported bugs

The purpose of this metric is to count the total number of bugs reported by the system’s end users. There are different benefits to tracking such a metric, like checking how much better we are getting at testing our system before shipping features to end users, or whether we have proper observability in place. The lower the customer-reported bug count, the stronger the indication that our testing process is working; obviously, we don’t want to use our customers for quality control.

How to measure?

Count the total number of bugs reported by end users on a monthly basis

Ideal metric value

0 bugs reported by customers
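
A minimal sketch of the count, assuming support tickets are tagged with their kind and source (hypothetical fields) so that internally found defects are excluded:

    from collections import Counter

    # Hypothetical support tickets; only bugs reported by end users count,
    # not defects found internally by the team.
    tickets = [
        {"month": "2023-05", "kind": "bug", "source": "customer"},
        {"month": "2023-05", "kind": "bug", "source": "internal"},
        {"month": "2023-06", "kind": "bug", "source": "customer"},
    ]

    per_month = Counter(t["month"] for t in tickets
                        if t["kind"] == "bug" and t["source"] == "customer")
    for month, n in sorted(per_month.items()):
        print(f"{month}: {n} customer-reported bugs")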

7. Net Promoter Score (NPS)

Although this metric is not directly a DevOps one, in my opinion any set of metrics should track user feedback. NPS is a measure used to gauge customer loyalty, satisfaction, and enthusiasm with a company, calculated by asking customers one question: “On a scale from 0 to 10, how likely are you to recommend this product/company to a friend or colleague?” Aggregate NPS scores help businesses improve their service, customer support, delivery, and so on, for increased customer loyalty. In the end, we are building the system for our customers, and they are the best judges of whether it is doing the job or not.

How to measure?

Send a quarterly or half-yearly survey to customers. NPS is calculated by subtracting the percentage of customers who answer the NPS question with a 6 or lower (known as “detractors”) from the percentage of customers who answer with a 9 or 10 (known as “promoters”).

Ideal metric value

70% or above as promoters
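
A minimal sketch of that calculation on a sample batch of survey answers (the scores are made up for illustration):

    # Survey answers to the NPS question, on a 0-10 scale.
    scores = [10, 9, 9, 8, 7, 6, 10, 3, 9, 10]

    promoters = sum(1 for s in scores if s >= 9)        # answered 9 or 10
    detractors = sum(1 for s in scores if s <= 6)       # answered 0 through 6
    nps = (promoters - detractors) / len(scores) * 100  # passives (7-8) are ignored
    print(f"NPS: {nps:.0f}")  # (6 - 2) / 10 * 100 = 40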
