Top Practices for Runbook Automation

What is runbook automation?

Runbooks, also known as playbooks, are documents that walk you through a specific task step by step. For example, a runbook for spinning up a new server might ask about the server’s purpose and estimated load, then lead you to the appropriate instructions and settings. Runbooks ease the cognitive load of these common tasks by clearly outlining the process for each. Runbook automation goes a step further: the documented steps are encoded as scripts or workflows that a tool can execute on demand or in response to a trigger, such as an alert.

A runbook template: Key steps to consider

To create runbooks that automatically use a variety of services, you’ll need to understand how each service functions and how the services connect. Map these connections, record who owns each service, and note how automation tools can control it. This architecture map and service owner repository lay a solid foundation for future runbooks.

  1. Lay out key procedures and checklist tasks: Common tasks often share common steps, and subtask procedures like auditing, version control, and deployment are likely to overlap. Identify these key steps, clearly define each process, and compile them into a list. Future runbook authors should reuse steps from this list where possible for consistency.
  2. Identify methods to bake into automation: A list of key procedures that recur across many tasks is also a great starting point for finding automation opportunities. Look for steps that can be scripted, and for ways to have one script trigger the next. Keep your automated steps modular so they can be baked into a variety of runbooks.
  3. Keep your resources up to date: The architecture map, service owner repository, and list of common tasks aren’t meant to be created once and left untouched. Add “update these resources” as a checklist task in any procedure that modifies them, and schedule regular checks to ensure they’re current. Each review is also a chance to learn from them again and spot new opportunities to automate and optimize.
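As a minimal sketch of modular, chainable steps, the Python below models each recurring procedure as a small function that reads and updates a shared context, and a runbook as an ordered list of those functions. The step names (`audit`, `tag_version`, `deploy`) and the `Runbook` class are hypothetical stand-ins, not part of any particular automation tool; real steps would call your own tooling.

```python
from dataclasses import dataclass, field
from typing import Callable

# A step receives a shared context dict and mutates it; it raises on failure.
Step = Callable[[dict], None]

# Hypothetical reusable steps drawn from a "common procedures" list.
def audit(ctx: dict) -> None:
    ctx.setdefault("log", []).append("audit")

def tag_version(ctx: dict) -> None:
    ctx["version"] = ctx.get("version", 0) + 1
    ctx.setdefault("log", []).append(f"tag v{ctx['version']}")

def deploy(ctx: dict) -> None:
    ctx.setdefault("log", []).append(f"deploy v{ctx['version']}")

@dataclass
class Runbook:
    name: str
    steps: list[Step] = field(default_factory=list)

    def run(self, ctx: dict) -> dict:
        # Steps run in order; an exception in one step stops the rest,
        # so each step effectively triggers the next only on success.
        for step in self.steps:
            step(ctx)
        return ctx

# Two runbooks assembled from the same modular steps.
release = Runbook("release", [audit, tag_version, deploy])
hotfix = Runbook("hotfix", [tag_version, deploy])
```

Because `release` and `hotfix` reuse the same step functions, improving a step once updates every runbook that includes it. For example, `release.run({"version": 3})["log"]` yields `['audit', 'tag v4', 'deploy v4']`.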

How to write simple runbooks for complex workflows

One of the most powerful features of automated runbooks, or playbooks, is their ability to follow long conditional paths to complete complex tasks. Consider a runbook that updates the settings for a variety of development environments. The automation tool may need to check many variables and deploy a different change for each combination, which quickly produces a decision tree with many branches. Working out by hand which branch applies is tedious, but an automated runbook evaluates the conditions and finds the correct branch with ease.
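One way to keep a branching runbook simple is to express the tree as an ordered list of rules and pick the first one that matches, rather than nesting conditionals. The sketch below assumes each environment is described by hypothetical `name`, `os`, and `has_db` fields; the rule conditions and change labels are illustrative only.

```python
from typing import Callable

# Hypothetical environment record; a real runbook would read these
# from an inventory or configuration management system.
Env = dict

# Ordered rules: the first predicate that matches selects the branch.
# More specific combinations must come before more general ones.
RULES: list[tuple[Callable[[Env], bool], str]] = [
    (lambda e: e["os"] == "windows" and e["has_db"], "windows-with-db settings"),
    (lambda e: e["os"] == "windows", "windows settings"),
    (lambda e: e["has_db"], "linux-with-db settings"),
    (lambda e: True, "default linux settings"),  # catch-all branch
]

def choose_change(env: Env) -> str:
    """Return the change for the first rule whose predicate matches env."""
    for matches, change in RULES:
        if matches(env):
            return change
    raise ValueError("no rule matched")  # unreachable while the catch-all exists

def plan(envs: list[Env]) -> dict[str, str]:
    """Map each environment name to the change the runbook would deploy."""
    return {e["name"]: choose_change(e) for e in envs}
```

Adding a new combination means adding one rule in the right position, which keeps the runbook readable as the number of branches grows.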

Make creating new automated runbooks easy

To get the most out of runbook automation, encourage developers to implement automated runbooks wherever possible; they help create guardrails around specific processes. Never assume that an area of development and operations is beyond automation: even in the most nuanced projects, you’ll find simpler subtasks that can be automated. Likewise, consider automating even seemingly novel tasks. The investment can pay big dividends if those tasks end up recurring.

Integrate runbook automation into every aspect of DevOps

There are opportunities to automate and save time in even the most nuanced aspects of development and operations. To take advantage of them, your automated runbooks should hook into every tool in your stack. One way to make this connection is to favor tools that can be controlled easily from external scripts, so the orchestrating runbook automation tool can send them custom instructions.
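A minimal sketch of an orchestrator driving external tools through their command-line interfaces, using Python’s standard `subprocess` module. The `run_tool` and `run_sequence` helpers are hypothetical, and the demo commands simply invoke the Python interpreter as stand-ins for the real tools in your stack.

```python
import subprocess
import sys

def run_tool(cmd: list[str], timeout: float = 60.0) -> str:
    """Run one external command and return its stdout; raise if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout

def run_sequence(commands: list[list[str]]) -> list[str]:
    """Run commands in order, stopping at the first failure."""
    outputs = []
    for cmd in commands:
        outputs.append(run_tool(cmd))
    return outputs

if __name__ == "__main__":
    # Stand-in commands that call the current interpreter; replace them
    # with your own tools' CLIs.
    steps = [
        [sys.executable, "-c", "print('step 1')"],
        [sys.executable, "-c", "print('step 2')"],
    ]
    print(run_sequence(steps))
```

Passing each command as a list rather than a shell string avoids shell-quoting problems, and stopping at the first non-zero exit code keeps a failed step from silently feeding later ones.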

Have automated runbooks for reliability events

One of the most helpful uses of automated runbooks is incident response, where they increase the speed and consistency of resolution. Create automated runbooks for your common troubleshooting processes, and have them trigger in response to outages, extreme load, or breaches of your SLOs.
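The sketch below shows one way to wire alerts to automated troubleshooting runbooks: a lookup table maps alert types to handler functions, and a simple check flags an SLO breach when the measured success ratio drops below target. The alert fields (`type`, `service`), the handler names, and the fallback to paging a human are assumptions for illustration, not a prescribed design.

```python
from typing import Callable

# Hypothetical troubleshooting runbooks; real ones would call your tooling.
def restart_unhealthy_pods(alert: dict) -> str:
    return f"restarted pods for {alert['service']}"

def scale_out(alert: dict) -> str:
    return f"added capacity to {alert['service']}"

def page_on_call(alert: dict) -> str:
    return f"paged on-call for {alert['service']}"

# Map alert types to the runbook that should run automatically.
RUNBOOKS: dict[str, Callable[[dict], str]] = {
    "outage": restart_unhealthy_pods,
    "high_load": scale_out,
}

def slo_breached(good_events: int, total_events: int, target: float) -> bool:
    """True when the measured success ratio falls below the SLO target."""
    if total_events == 0:
        return False  # no traffic, nothing to judge
    return good_events / total_events < target

def handle(alert: dict) -> str:
    # Alert types without an automated runbook fall back to a human.
    runbook = RUNBOOKS.get(alert["type"], page_on_call)
    return runbook(alert)
```

For instance, `slo_breached(9_950, 10_000, 0.999)` returns `True`, since a 99.5% success ratio is below a 99.9% target.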

Blameless helps you get the most out of automated runbooks

The Blameless SRE platform provides many tools to help you get the most from runbook automation:

  • The platform encourages the creation of new automated runbooks by highlighting the procedural logic behind complex tasks.
  • Key runbook activities are automatically captured in Blameless’ incident retrospectives, also known as postmortems, allowing teams to focus on building more insightful incident narratives.
  • Blameless reliability insights can highlight areas where certain incidents or workflows can benefit from more automation.
