Engineering Improvement Runbook | Engineering Automations
Dylan Etkin
June 23rd, 2023
Guardrails: Engineering teams head for a deadly crash without them
Every team has guardrails, whether you recognize them or not. They’re a form of automation that can have significant impact on your software development process and the people doing the work. They’re another way to give toil the boot and keep developers in the flow.
We’ve made the case for engineering automation in a previous article; here’s how guardrails, as automations, ensure that agreed-upon boundaries and ways of working are codified into team processes.
Guardrails defined
Guardrails define the boundaries a team agrees not to cross. They establish consistency, which creates safety, maintains your developers’ sanity, and makes things maintainable. Guardrails give a certain amount of visibility into the development process and provide the “why” behind the “what.”
Guardrails can:
- Be as simple as holding a weekly standup or monthly planning meeting.
- Set cultural standards, like agreeing that when an incident occurs, team members avoid making new changes to respect the SREs who are fighting that incident.
- Be a straightforward check to ensure a pull request description is complete before it’s merged.
Or, guardrails can be more progressive and enforced with various tools. Compared to a manual deployment process where it’s easy to skip steps, a fully automated CI/CD pipeline with strong tooling makes adding guardrails easy.
For example, at Sleuth, we started with very basic linting in our CI/CD pipeline to check if a formatter runs. Then, we added Mypy to do static analysis. Every error it reported was motivation to ratchet the count down to zero so we could reason about the code more intelligently. Because we had a fully automated process, we could adopt new tools, onboard, and add more layers of safety, security, and scanning quickly.
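One way to picture ratcheting static-analysis errors toward zero is the minimal sketch below. It is an illustration, not Sleuth's actual pipeline: the function names and the idea of a stored baseline count are assumptions. It counts error lines in a Mypy report and fails the check only when the count rises above the last accepted baseline.

```python
def count_mypy_errors(report: str) -> int:
    """Count the lines in a Mypy text report that flag an error."""
    # Mypy prints errors as "path:line: error: message"; notes and the
    # final "Found N errors" summary line do not contain ": error:".
    return sum(1 for line in report.splitlines() if ": error:" in line)


def ratchet(current: int, baseline: int) -> tuple[bool, int]:
    """Return (passed, new_baseline) for a hypothetical ratchet check.

    The check fails if errors increased. When errors decrease, the
    baseline is lowered so the count can never creep back up.
    """
    if current > baseline:
        return False, baseline
    return True, current


if __name__ == "__main__":
    sample = (
        "app.py:3: error: Incompatible types in assignment  [assignment]\n"
        "app.py:9: note: See the documentation\n"
        "Found 1 error in 1 file (checked 1 source file)\n"
    )
    passed, new_baseline = ratchet(count_mypy_errors(sample), baseline=5)
    print(passed, new_baseline)
```

Where the baseline lives (a file in the repository, a CI variable) is a team decision; the point is that the pipeline, not a person, refuses regressions.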
How guardrails influence teams
Perhaps the biggest impact guardrails can have is aligning a group of engineers around how they work, what is acceptable, and what isn’t. They encode ways of working into practice, going beyond reliance on a social contract.
This makes it easier for new people to join and actually get work done. It makes engineering projects more maintainable over time. It increases the safety involved in delivery so you don’t impact your customers when you deploy or make a change. It also makes it easier to know what went wrong when something goes wrong.
Guardrails can also be put in place to change the culture. If you need to make a disruptive cultural change and finally do things differently, put strong guardrails in place to drive that change. It might be painful for people at first, but it effects change quickly. And once your devs feel the weight of toil starting to lift, they’ll be on board.
Pains and gains of implementing guardrails
When I started at Statuspage, something had gone wrong with the “meta Statuspage” — a status page for Statuspage, so to speak. It was a robust version of a staging environment.
It hadn't been deployed in several months. The team culture had evolved to be fine with that site not working. That led us to question if it was truly a guardrail or not.
Culturally, we needed the guardrail to be that the meta site had to deploy first, and it had to be successful in order to make changes to the customer-facing system. So, we had to fix the issues in order to make it stick.
We made the guardrail a required part of the automated CI/CD process and a required stage of every deployment. Each change was pushed to the meta Statuspage first; if that push failed, the change to the main site failed with it, which kept the meta Statuspage from getting out of sync. We would know there was a problem immediately and have to fix that problem immediately.
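The shape of that gate can be sketched in a few lines. This is an illustrative outline only, not the pipeline Statuspage ran: `deploy_meta` and `deploy_main` are hypothetical callables standing in for whatever each deployment stage actually does.

```python
from typing import Callable


def gated_deploy(
    deploy_meta: Callable[[], bool],
    deploy_main: Callable[[], bool],
) -> str:
    """Deploy the meta site first; touch the main site only if that succeeds."""
    if not deploy_meta():
        # The guardrail: a broken meta deployment blocks the main deployment.
        return "blocked: meta deployment failed"
    if not deploy_main():
        return "failed: main deployment failed"
    return "deployed"


if __name__ == "__main__":
    # A failing meta stage means the main stage is never attempted.
    print(gated_deploy(lambda: False, lambda: True))
```

Because the main stage is unreachable without a successful meta stage, nobody has to remember the rule; the ordering itself enforces it.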
There was some pain in fixing something that was very broken, but the upside was that we never got into that situation again.
It also drove home, culturally, that there was no ambiguity around whether having that site in sync was important. Everyone on the team knew it was important, because there was no way to deploy without getting past that guardrail.
Guardrails let developers make simple improvements
Guardrails are one of the lowest-level tools developers have for making improvements, because they can provide more context.
If devs want to understand the why behind the what, for example, they can put a guardrail in place that checks for an issue key in every commit message in their repository. If the key is missing, the check forces them to add it.
This adds uniformity so that when somebody else tries to understand a complicated bug, they can follow the link to that issue, read it, and understand the context behind the change. Then, they’re off to the races without having to reconstruct that context themselves.
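An issue-key check like this can be small. The sketch below assumes a Jira-style key format such as PROJ-123; that pattern and the function names are illustrative assumptions, so adjust them to whatever your issue tracker uses.

```python
import re

# Assumed key format: uppercase project prefix, a hyphen, then digits
# (for example PROJ-123). Strings like "UTF-8" also match, so a real
# check may want to restrict the prefix to known project keys.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def find_issue_keys(message: str) -> list[str]:
    """Return every issue key mentioned in a commit message."""
    return ISSUE_KEY.findall(message)


def check_commit_message(message: str) -> bool:
    """Pass only when the message references at least one issue key."""
    return bool(find_issue_keys(message))


if __name__ == "__main__":
    for msg in ("PROJ-123 fix retry loop", "fix retry loop"):
        verdict = "ok" if check_commit_message(msg) else "rejected: no issue key"
        print(f"{msg!r}: {verdict}")
```

One place to run it is a Git `commit-msg` hook, which receives the path to the message file as its first argument and aborts the commit when it exits with a non-zero status.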
How to implement guardrails
Adding guardrails can be easy or hard, depending on your tooling and how mature it is. It’s a classic build-versus-buy situation.
If a tool has domain over a certain set of your work, that tool can offer some guardrails fairly easily. For example, if you use GitHub for pull requests and code review, you can add checks over the pull request itself.
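As a rough idea of what a pull request check might verify, the sketch below tests whether a description is complete. The required headings are a hypothetical template, and the code deliberately leaves out how the description is fetched from GitHub and how the result is reported back as a check.

```python
# Hypothetical template headings; replace with your team's own.
REQUIRED_SECTIONS = ("## Summary", "## Testing")


def missing_sections(
    description: str,
    required: tuple[str, ...] = REQUIRED_SECTIONS,
) -> list[str]:
    """Return required headings that are absent or have no text beneath them."""
    lines = [line.strip() for line in description.splitlines()]
    missing = []
    for heading in required:
        if heading not in lines:
            missing.append(heading)
            continue
        # Collect the text between this heading and the next heading.
        body = []
        for line in lines[lines.index(heading) + 1:]:
            if line.startswith("#"):
                break
            body.append(line)
        if not any(body):
            missing.append(heading)
    return missing


if __name__ == "__main__":
    draft = "## Summary\nAdds a retry to the uploader.\n\n## Testing\n"
    print(missing_sections(draft))
```

A check built this way would block the merge whenever the returned list is not empty and name the sections that still need filling in.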
It’s more challenging when you want to set guardrails for something that spans multiple tools.
If you want to build a guardrail that depends on incident management, issue tracking, source code hosting and your chat ops (Slack or Microsoft Teams), the barrier to entry can be quite high, because you’ll have four different inputs, APIs, and events.
So, implementing a guardrail can be as easy as clicking a button in GitHub, or it can mean devoting weeks of a senior engineer’s time to building a bespoke guardrail that works for your team. And don’t forget the maintenance required after you build it.
I’ve worked and talked with many teams that decided to build. But down the road, they inevitably wonder if they could have bought instead and avoided the time-consuming maintenance work they find themselves doing.
You can improve with guardrails
Consider the guardrails you have in place already — no matter how minor you think they are — and pay attention to how they work for your team. Do they work? Can you adjust them to improve your efficiency even more? What guardrail would help your developers?
Regardless of what you have or choose to put in place, automating guardrails enforces a way of working within the development process that takes the question out of how things are done. They’re efficient, effective, and can have a big impact.
Stay tuned for our next article on engineering automations, which will focus on how notifications and smart nudges mitigate toil for developers by driving awareness of conditions that require their attention so they can take immediate and relevant action.