Alexandre Couëdelo
February 3rd, 2023
DORA Metrics Best Trackers Comparison Guide [Feb 2023]
If you're a DevOps team looking for ways to improve your performance, you've probably heard of the Accelerate and DORA metrics. Tracking your performance with these metrics is a challenge, requiring good tooling. You might be tempted to build your own tracking solution, but you don’t need to! Due to the growing popularity of DORA metrics, several tracker tools are available on the market.
But how do you choose the best DORA metrics tracker for you? This handy comparison guide will help!
In 2021, I reviewed five of the most popular tools out there at the time (Faros, Haystack, LinearB, Sleuth and Velocity). In less than a year, the number of competing products has exploded. This year, we’ll review four more trackers, for a total of nine candidates:
- Faros
- Haystack
- Jellyfish
- LinearB
- Propelo
- Sleuth
- Swarmia
- Uplevel
- Velocity
For each of these tools, we'll discuss their different features and help you decide which one works for your team. Also, we’ll focus particularly on customization, so you can find the tracker that best fits your unique needs.
Before jumping into our comparisons, let's review some basics, clarifying what the four DORA metrics are and why we care.
What are DORA metrics?
A fundamental claim of the DevOps approach is that we can achieve the fast delivery of reliable software. This may seem counter-intuitive; if you’re constantly updating your software to make it more reliable, then how can you deliver it quickly? However, when implemented well, continuous integration and delivery practices can help you achieve this goal.
To determine how well a team is implementing DevOps practices, we look to DORA metrics:
- Change Lead Time tracks the time from when a developer starts writing code for a feature or a change to when that change is released to end users.
- Deployment Frequency tracks how often an organization deploys code to production or otherwise releases changes to end users.
- Mean Time to Recovery is the time it takes to restore a service that impacts users, averaged across all incidents in an organization.
- Change Failure Rate is the ratio of the number of deployments that caused a failure to the total number of deployments.
The first two metrics—Change Lead Time and Deployment Frequency—are temporal metrics, and their objective is to measure speed or throughput. The last two—Mean Time to Recovery and Change Failure Rate—are quality metrics, and their objective is to measure the system's reliability.
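To make these definitions concrete, here is a minimal sketch of how the four metrics could be computed from deployment and incident records. This is not how any particular tracker implements them; the data shapes, field meanings, and the use of a simple mean (DORA reporting often uses medians and performance buckets) are assumptions made purely for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records; real trackers ingest these from Git, CI/CD,
# and incident-management systems.
deployments = [
    # (first_commit_time, deploy_time, caused_failure)
    (datetime(2023, 1, 2, 9), datetime(2023, 1, 2, 15), False),
    (datetime(2023, 1, 3, 10), datetime(2023, 1, 4, 11), True),
    (datetime(2023, 1, 5, 8), datetime(2023, 1, 5, 12), False),
]
incidents = [
    # (started, resolved)
    (datetime(2023, 1, 4, 11, 30), datetime(2023, 1, 4, 13, 0)),
]
period_days = 7  # observation window

# Change Lead Time: time from first commit to production deploy (hours).
lead_time_h = mean((deploy - commit).total_seconds() / 3600
                   for commit, deploy, _ in deployments)

# Deployment Frequency: deployments per day over the window.
deploys_per_day = len(deployments) / period_days

# Mean Time to Recovery: average time from incident start to resolution (hours).
mttr_h = mean((resolved - started).total_seconds() / 3600
              for started, resolved in incidents)

# Change Failure Rate: failed deployments over total deployments.
change_failure_rate = sum(failed for *_, failed in deployments) / len(deployments)

print(f"Lead time: {lead_time_h:.1f} h | Frequency: {deploys_per_day:.2f}/day | "
      f"MTTR: {mttr_h:.1f} h | CFR: {change_failure_rate:.0%}")
```

In practice, a tracker's accuracy depends almost entirely on how reliably it can populate records like these, which is what the rest of this comparison digs into.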
Tracking these metrics enables your organization to see where it stands compared to other organizations in the industry. Every year, DORA (DevOps Research and Assessment) surveys and categorizes companies based on their performance in these metrics, and that research is summarized in an annual State of DevOps Report.
Tracker tools are important because they help you automate the process of measuring your performance. They help you stay on track, improve your process, and release fast and reliable software.
Evaluating DORA metrics trackers
In order to evaluate the numerous tracker tools available, we’ve divided our survey into three broad categories:
1. Metrics measurement
This first category validates whether or not a tool tracks each of the four DORA metrics and whether that tracking is performed accurately. Tracking the metrics alone is not enough, as the ideal tracker should show your metrics in an easy-to-read dashboard and provide proper reporting to identify trends and problems in your process. And it can only do that with an accurate model of actual work being done.
Modeling how a team works is not one size fits all, but basic components include:
- How you group code, infrastructure, feature flag, and manual changes together
- How change flows through your different environments
- The time spent and the real work done in the different phases of your software development life cycle
- Your overall developer deployment workflow or how an individual takes a piece of work from concept through to successful launch
For more on how to accurately model and understand your engineering efficiency and DORA metrics, check out our Accuracy Matters white paper.
2. Developer friendliness
The next broad category of evaluation is developer friendliness. Developers are at the core of your business operations, so a tracker tool must make the DORA metrics useful to them, not just to managers and executives via dashboards and reports. The ideal tracker should empower developers with feedback in the development and deployment process, focusing on team performance over individual performance.
3. Integrations and customizations
This last category in our evaluation aims to help you find the tracker that fits your unique needs. The ideal tool integrates across the full DevOps loop (plan ➤ code ➤ build ➤ test ➤ release ➤ deploy ➤ monitor). Beyond coverage, we prefer tools that can be customized to fit how we already work rather than those that force an organization to change its processes just to calculate metrics accurately. A tracker tool should serve the organization, not the other way around.
Now that we’ve provided a brief overview of our approach, let’s dive into the results.
Metrics measurement
When it comes to this category, we’re looking for those trackers that capture all of the DORA metrics accurately and display those metrics compellingly. We asked the following questions when we reviewed each tracker:
- Does this tool track all four DORA metrics?
- Does this tool track these metrics accurately?
- Does this tool provide dashboards to visualize an organization’s performance?
- Does this tool provide reporting to identify trends and issues?
The table below shows our assessment of how each tracker tool stands up to each of these questions. To read the results in the table, use the following key:
- ✅ = Meets the criteria
- 🟧 = Partially meets the criteria but has some minor issues
- ❌ = Does not meet the criteria
After answering the questions for each of the tracker tools, we assigned a grade based on how well each tool meets the different criteria overall.
| | Tracks all DORA Metrics | Tracks with accuracy | Provides dashboards | Provides reports | Grade |
|---|---|---|---|---|---|
Faros | ✅ | 🟧 Change Failure Rate calculation is oversimplified as the ratio of incidents to deployments or bugs to releases. | ✅ Offers a simple, premade dashboard for DORA metrics. | 🟧 No real reporting or analysis of the results; you have to build your own analysis. | A
Haystack | ✅ | ❌ Deployment Frequency infers deployments from 7 possible Git events, which also impacts Change Lead Time accuracy. Change Failure Rate relies on hotfixes or on configuring Jira so that certain tickets count as deployment failures, which leaves some failures uncovered. | ❌ The main dashboard does not include MTTR. DORA metrics are not their main focus, so they don’t provide a consolidated dashboard for them. | ✅ The reporting system is highly customizable. You can send a report or link to a reporting dashboard. | B
Jellyfish | ✅ | ✅ Uses deployment and incident APIs to build the metrics. | ✅ Clear dashboard focuses on one metric at a time. | ✅ Reporting tools provide custom analysis of trends per metric. | A+ |
LinearB | ✅ | ❌ Uses tickets to define MTTR, calculating the mean time between a production bug ticket being opened and closed. Calculates Change Failure Rate incorrectly, as the number of incidents divided by the number of deployments. | ✅ Dedicated dashboard with the DORA metrics. | ❌ The main focus is on the Change Lead Time and Deployment Frequency metrics. | B
Propelo | ✅ | ✅ Some configuration is needed to get accurate metrics, as it looks mainly at tickets before integration. | ✅ Shows DORA metrics clearly. Can deep-dive into each metric to see details and find the root cause of metric degradation. | ✅ Reports for tracking metric improvements, business insights, and comparisons to the State of DevOps Report. | A+
Sleuth | ✅ | ✅ Uses webhooks to ingest data, so the data is based on actual events (e.g., deployments). Also integrates with monitoring and observability systems, in addition to incident management systems. As a result, all failures are accounted for, not just incident-inducing failures. Offers more options to detect failure; sources of failure are now categorized to give better insight to fail rate increases. Also tracks feature flags. | ✅ Project and team metrics dashboards are well designed for presenting DORA metrics. | ✅ New labeling allows analyzing trends based on properties (e.g., TDD vs non-TDD, backend vs frontend, staging vs production). | A+ |
Swarmia | ✅ | 🟧 Incorrect Change Lead Time calculation: cycle time focuses on PRs rather than spanning from first commit to production deployment. | ❌ MTTR is hidden in the deployment section. | ❌ Lacks a clear report for DORA metrics, with no way to know how you perform over time. | B
Uplevel | ❌ Focuses on scrum metrics and cycle time. More of a productivity tracker. | ❌ Measures lead time, but only based on issue tracking. | ❌ No DORA dashboards. | ❌ No DORA metrics reporting. | D |
Velocity | ❌ Tracks Deployment Frequency and Change Lead Time, but not MTTR or Change Failure Rate. | ❌ Change Lead Time is based on a ticket being closed or a pull request merged. It's more of a Scrum metric than a DORA metric. | 🟧 Polished dashboard for the metrics it tracks. | ❌ No DORA metrics dashboard, hence no DORA reports. | C
Let’s explain how we arrived at the above assessment.
Top scorers: Sleuth, Jellyfish, Propelo, and Faros
The standouts in this category were Sleuth, Jellyfish, and Propelo, which scored A+ grades. They offer excellent features and care deeply about providing accurate DORA metrics.
Faros, coming in slightly behind, oversimplifies Change Failure Rate, calculating it as the ratio of incidents to deployments or bugs to releases. Determining the Change Failure Rate with Faros is difficult because it means filtering on the cause of an incident or a bug. In addition, Faros encourages you to build your own analytics, so reporting in Faros is not as straightforward as it is with the other tools.
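To see why that matters, consider a hypothetical month with 20 deployments, two of which cause incidents, plus three more incidents unrelated to any deployment: the true Change Failure Rate is 2/20 = 10%, but a naive incidents-to-deployments ratio reports 5/20 = 25% unless you first filter incidents by cause.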
Less focus on DORA Metrics: LinearB, Swarmia, and Haystack
LinearB, Swarmia, and Haystack all scored lower in this category, missing out on some of the key features that make a good DORA metrics tracker. While they all track and display the four DORA metrics, it’s clear that the DORA metrics are not the main focus of these tools.
Swarmia almost gets the metrics right, except that Change Lead Time is focused on pull requests; Swarmia focuses instead on calculating development Cycle Time. The DORA metrics are displayed in a separate dashboard, and there is no consolidated report for those metrics.
Haystack counts deployments based on Git events, which can lead to inaccurate metric calculations. Although Haystack provides an interesting dashboard, it promotes its own metrics instead of those from DORA.
LinearB calculates Mean Time to Recovery based on open and closed production bug tickets, and this approach brings some limitations when accounting for failures. Similarly to Faros, LinearB also oversimplifies its calculation of Change Failure Rate. Lastly, LinearB provides a good dashboard, but the analysis and reporting of the metrics seem oversimplified when compared to the other tracker tools.
Little focus on DORA Metrics: Uplevel and Velocity
Finally, we have Uplevel and Velocity, which seem to be popular engineering metrics tools. However, they do not focus on DORA metrics but rather on Cycle Time, emphasizing the speed of your delivery process over the reliability of your software.
As we evaluated the nine different trackers for this category, we found three tiers of tools. The top tier focuses on the DORA metrics and aims to provide the most accurate representation. Tools in the middle tier incorporate the DORA metrics in their system but don’t emphasize them. Lastly, we have those tracker tools that focus primarily on DevOps process speed rather than on the DORA metrics.
With this broad category covered, let’s proceed to consider how each tool scored regarding developer friendliness.
Developer friendliness
All the tools in our survey focus on providing development feedback. Collecting metrics about your development lifecycle is not enough. We expect those tools to deliver actionable feedback to developers. Here are the questions we asked:
- Does the tool provide actionable feedback for developers regarding the development process?
- Does the tool provide actionable feedback for developers regarding the deployment process?
- Does the tool refrain from providing individual metrics?
- Does the tool refrain from using proxy metrics?
Some engineering metrics tools track individual developer performance. With such metrics, it is tempting for managers to reduce team problems to a single individual. However, we recommend focusing on team performance to bring overall improvement. This approach fosters a blameless culture that nurtures team morale, avoiding an unhealthy focus on individual performance.
In addition, we want to look for tools that avoid increasing the toil on the developer that comes from tracking questionable performance metrics—what we call “proxy metrics”—like the number of lines of code changed or pull requests opened. These proxy metrics distort the view of your DevOps process and can lead to decisions that are not in the best interests of your team.
We start with the table showing our evaluation of each tool in this category, and then we’ll follow it up with a detailed explanation.
| | Development feedback | Deployment feedback | No individual metrics | No proxy metrics | Score |
|---|---|---|---|---|---|
Faros | 🟧 Must build this yourself using n8n.io (integrated in the platform). | 🟧 Must build this yourself using n8n.io (integrated in the platform). | ❌ It’s possible to drill down into individual contributions. | ✅ They do not promote proxy metrics out of the box; most of the dashboards are centered around value stream mapping. | B |
Haystack | ✅ Their Haystack Notification tool provides daily updates for developers and weekly updates for teams to identify bottlenecks. | ❌ Not available | 🟧 Includes per-member metrics that must be manually turned off. | ✅ Focuses on DORA metrics | B |
Jellyfish | ✅ Offers development metrics for sprint planning, sprint review, work in progress and PR size. | ❌ Not available. | ✅ Focuses on team-based metrics to measure teams’ progress | ✅ DORA metrics are a new addition to the platform. | A |
LinearB | ✅ Slack integration provides helpful support to improve the development workflow and coordinate between team members. | ❌ Missing from the WorkerB tool. | ❌ A tool for managers to assign work to developers, resulting in individual metrics. | ❌ Tracks the number of pull requests, work activity, number of commits, working hours, workload, and work in progress. | C
Propelo | 🟧 Doesn’t come out of the box; has to be built. | 🟧 Integrates with CI/CD. Lets you define SLA and create alerts. | 🟧 Individual metrics can be enabled. Once enabled, there are many options to compare people’s productivity. | ✅ | B |
Sleuth | ✅ The latest release includes Work in Progress dashboards that show in-flight work that hasn’t yet deployed, and highlights emerging risks that could negatively affect team efficiency, allowing for immediate action to remedy them. | ✅ Offers rule-based deployment approval workflow, with a simple thumbs up approval in Slack. | ✅ No leaderboard here; only team or project-based metrics. | ✅ Focuses on DORA metrics | A+ |
Swarmia | ✅ A unique feature is the working agreement, which can be used to set policies on work in progress and workflow to improve the development process. | ❌ The deployment dashboard is the only interaction with deployments; lacks extra automated insight. | ✅ All metrics are team- or organization-based. | ✅ SPACE is the underlying framework. | A
Uplevel | ✅ Provides goal-based notifications to find blockers and bottlenecks. | ❌ None | ❌ Tracks individual contributor metrics for managers, like always-on time, context switching, lack of deep work, and others. | ❌ Provides many metrics to managers, like time in meetings and contributions to sprints. | C
Velocity | ✅ Provides feedback. | ❌ None | ❌ Can track on a per-individual basis as well as per-team basis. | ❌ Tracks several metrics outside of DORA that could be considered questionable proxy metrics. | C |
Development feedback
Every tool we evaluated uses email and/or Slack notifications to keep developers up-to-date. However, the way this is achieved differs from tool to tool.
Sleuth, Haystack, and Velocity provide an interesting Slack standup feature that summarizes for developers the significant events of the previous day. Similarly, Uplevel provides a daily update on sprint health and potential blockers.
Swarmia and LinearB focus on finding bottlenecks and notifying teams when issues and pull requests are idle for too long, helping teams to collaborate. Swarmia has an interesting feature called “working agreements” that lets you select limits and improvement targets to improve collaboration.
Propelo and Faros take the same approach, letting users create their own notification workflows. While this offers a lot of customization, developers don’t get the out-of-the-box experience that the other tools offer.
Deployment feedback
While we see a lot of effort made to help developers ship code and close pull requests, only a few tools provide a concrete solution to help developers feel engaged with the deployment process. Propelo and Faros can provide this via their configurable and flexible ChatOps systems. However, Sleuth is the only platform that integrates directly with the deployment process, with both approval workflow integrations and deployment notifications.
Individual and proxy metrics
It’s important to note that not all tools track individual metrics. Some tools—such as Sleuth, Jellyfish, and Swarmia—focus exclusively on team and organization metrics. This matters because it allows developers to measure their performance and improve their processes without feeling that their privacy is being invaded.
Propelo and Haystack are noteworthy in the sense that individual metrics are not available out of the box, but they can be enabled upon request.
Regarding proxy metrics, our assessment runs parallel to that of individual metrics. If a tool provides individual metrics, then it will undoubtedly provide every imaginable way to measure those individual performances, and that includes proxy metrics.
From my evaluation, it seems that every tracker tool provides actionable feedback to help developers during the development process. We could even say that most tools in our survey focus on development feedback. However, only a few provide concrete solutions for developers to feel engaged with the deployment process.
Additionally, managers should be careful not to reduce team problems to a single individual when looking at performance metrics. Instead, they should focus exclusively on the team's performance as a whole, avoiding tools that do not share that vision.
Let’s finish our comparison by looking at what these tools offer regarding integrations. In other words, how well would they fit in your stack?
Integrations and customization
We evaluated aspects in this final category according to the following criteria:
- Issue tracking integration: Does the tool help to bridge the gap between issues/stories/epics and the actual work behind them?
- Codebase integration: Does the tool integrate with your code repository (whether you prefer a monorepo or microservices, a Git flow, or a trunk-based approach)?
- CI/CD integration: Does the tool integrate with your CI/CD pipeline to accurately account for deployments?
- Monitoring integration: Does the tool help identify deployment issues and improve metrics accuracy?
- Automated data collection: How simple is the data ingestion process for this tool?
- Customization: Can we import additional data and create dashboards the way we want?
Let’s look at how our tools performed for each of these criteria.
| | Issue Tracking | Codebase | CI/CD | Monitoring | Automated Collection | Customization | Score |
|---|---|---|---|---|---|---|---|
Faros | ✅ Automatically pulls data from issue tracking and analyzes it | ✅ Default is GitHub Flow, but can be customized to accommodate different workflows | ✅ Provides a CLI or offers the option to use Airbyte | 🟧 Possible to do using Airbyte, but not out of the box | 🟧 Provides integrations that collect and transform the data automatically. | ✅ Highly customizable, based on manipulating the database schema. Register new schemas and populate the content with Airbyte. | A
Haystack | ✅ Jira, GitHub, GitLab. Their integration pushes the concept a bit further by helping keep issue tracking in sync with what is happening on the code side. | ✅ GitHub, GitLab, and Bitbucket. Allows all types of flow (hence the many deployment configuration options in terms of Git events). | ❌ None | ❌ None | ✅ Everything is automated, but the data sources are limited to the code source and issue tracking. | ✅ Lets you create a custom dashboard and filter metrics to allow customized reports for anyone. | B
Jellyfish | ✅ Integrates with most | ❌ No notification integration. | ✅ | ✅ Default integration with PagerDuty. Also provides the Incident API to offer flexibility when calculating deployment failures/failure rate. | 🟧 A few integrations, but mostly they help you integrate with their API. | ❌ Customizations not possible | B
LinearB | ✅ Jira, GitHub Issues, Asana, Monday, ClickUp, Linear | ❌ Implies usage of GitHub Flow with multiple repos and pull requests; no trunk-based or monorepo support | ✅ Provides an API endpoint for Deployment Frequency | ❌ None | ✅ Must define how to identify production bugs from tickets. | ❌ Fairly static in terms of customization | C
Propelo | ✅ Integrates with most | 🟧 Somewhat difficult out-of-the-box integration | ✅ Mostly pre-configured; use API or webhooks for others. | ✅ PagerDuty, Salesforce | ✅ 40 different integrations | ✅ Customizable dashboards. Can import and store internal data as long as there’s an API (useful for comparing existing metrics with internal metrics, like DORA and VSM). | A
Sleuth | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | A |
Swarmia | ✅ Integration is possible with Jira and Linear. | ❌ Only works for GitHub. Deployment is based on webhooks, but this reduces the accuracy of the other metrics. | ✅ Can use webhooks to integrate with any CI/CD. | 🟧 Can send a webhook on failure so that any alerting system can be integrated. | 🟧 Everything works out of the box except for the failure rate. | ❌ Customizations not possible. | B
Uplevel | ✅ This is their main target. | ❌ Does not look into the code base. | ❌ None | ❌ None | ✅ Automates everything, but integrations are limited to calendars, management tools and issue tracking. | ❌ None | C |
Velocity | ✅ Integrates with most | ❌ No trunk-based development or monorepo support | ❌ No CI/CD integration; data is collected from Jira/issue tracking | ❌ None | ✅ Automates everything | ❌ Mostly static; provides a default dashboard. | C
Issue tracking, codebase, and CI/CD
One of the main reasons teams use issue trackers is to manage the development process; collecting metrics about this process can help you measure performance. It’s worthwhile to note that every metrics tracker tool we evaluated integrates with the most popular issue trackers.
The codebase and CI/CD integrations are the most important criteria to consider, as they can directly impact data gathering for DORA metrics calculations. Ideally, you want a tool that can adapt to your workflow.
For instance, Swarmia only works with GitHub and assumes you are using what is commonly called GitHub Flow. That is an important consideration. Trunk-based development, in contrast, is one of the key recommendations from DORA’s State of DevOps Report, yet many tools don’t support it, which should be a red flag. Tools like LinearB, Velocity, and Uplevel all integrate with your codebase but are very rigid, basing metrics on pull requests, and they do not support a trunk-based development approach.
Top integration performers: Propelo, Faros, and Sleuth
Overall, for integrations and customization, three tools stand out: Propelo, Faros, and Sleuth. They all support trunk-based development. They all have CI/CD integration via webhooks or plugins, letting you choose which jobs in your pipeline represent deployments. They also all offer integration with a monitoring and alerting system. Finally, they can all ingest data from the monitoring system to provide an accurate representation of the different sources of failure.
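As an illustration of what that webhook-style CI/CD integration typically involves, here is a hedged sketch of a final pipeline step that reports a deployment to a tracker. The endpoint URL, token variable, and payload fields are placeholders invented for this example, not the actual API of Propelo, Faros, or Sleuth; each tool documents its own event format.

```python
import json
import os
import urllib.request

# Hypothetical example: the last CI/CD job notifies the metrics tracker that a
# deployment happened. The URL, token, and fields below are placeholders.
payload = {
    "service": "checkout-api",
    "environment": "production",
    "sha": os.environ.get("CI_COMMIT_SHA", "unknown"),
    "deployed_at": "2023-02-03T12:00:00Z",
}

req = urllib.request.Request(
    "https://tracker.example.com/api/deployments",  # placeholder endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('TRACKER_API_TOKEN', '')}",
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print("Deployment event accepted with status", resp.status)
```

The point is that the tracker counts deployments from real pipeline events rather than inferring them from Git activity, which is what keeps Deployment Frequency and Change Lead Time accurate.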
Jellyfish also lets you collect metrics from different systems; however, it does not provide as much out-of-the-box integration as its competitors.
Propelo, Faros, and Sleuth all have their respective strengths and weaknesses. Propelo’s support for trunk-based development seems like it is still evolving. For Faros, integration with monitoring is available but not straightforward, as it forces you to define the data structure before ingesting the data into the platform. Meanwhile, Sleuth intentionally does not support the creation of custom dashboards or ingesting other types of metrics besides DORA metrics.
Conclusion
In order to accurately measure the performance of your DevOps team, it is important to use a tracker that integrates well with your codebase and your development process. Integration with CI/CD and monitoring systems is essential to provide accurate measurement of the DORA metrics.
The final table below summarizes the grades for each tool within each category.
| | Metrics measurement | Developer friendliness | Integrations and customization |
|---|---|---|---|
Sleuth | A+ | A+ | A |
Propelo | A+ | B | A |
Faros | A | B | A |
Jellyfish | A+ | A | B |
Swarmia | B | A | B |
LinearB | B | C | C |
Haystack | B | B | B |
Uplevel | D | C | C |
Velocity | C | C | C |
The trackers reviewed in this article offer a variety of features, but some are more suited for DORA metrics calculations than others. Sleuth and Propelo are good choices for teams that believe in using those metrics to improve their adoption of the DevOps approach.
Faros and Jellyfish are close contenders; however, the tools fall short when it comes to providing actionable feedback. Other tools may claim to track DORA metrics because of the popularity of those metrics, but they don’t give DORA metrics the place they deserve within the tool.
Propelo and Faros could be considered data platforms designed to ingest, process, and display your engineering metrics, leaving the configuration of alerts and developer interactions up to you. However, both of those tools provide individual performance metrics, and the misuse of those metrics can kill morale.
Sleuth alone has adopted the approach of gathering and displaying only the proven metrics supported by DORA's research. This enables Sleuth to provide actionable feedback that can both speed up development and reduce deployment pain.