SREview Issue #9: January 2021

New year, new SRE! We’ve said goodbye to 2020 and hello to 2021. Here are some of the most exciting Tweets, content, and events in the SRE and resilience engineering community so far this year.


Tweets that have us twittering

Coders often talk about refactoring, but I’d like to see more “prefactorings” — refactoring done to make a subsequent change simpler. Put these into their own commits (or even PRs!) which are verifiably “no-impact”. Use them to make your “real” change more obvious and surgical.

- Tim Hockin (@thockin) January 4, 2021

Abstraction teaches us that we must elide details in order to be able to reason about the behavior of complex systems.

Resilience engineering teaches us that we will inevitably abstract away too many details.

- Lorin Hochstein E_TOO_MANY_FAILURE_MODES (@lhochstein) January 3, 2021

In order of complexity
Uptime < Availability < Reliability < Trust

- Patrick Debois — #thinktogether (@patrickdebois) January 11, 2021


Modern Operations Best Practices from Engineering Leaders at New Relic and Tenable: Top experts discuss how teams can embrace SRE best practices and make cultural shifts towards blamelessness.

How Chaos Engineering Helps You Reduce Cloud Spend: Learn to right-size your infrastructure, be smart about redundancy, and more with chaos engineering.

Little Known Ways to Better Use Your Error Budgets: Featured in SRE Weekly #252, this piece explains how error budgets can help cross-functional teams across the organization.

Home Alone: a Post-Incident Review: A blameless retrospective of how the McCallister family incident occurred and what the contributing factors were.

Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial, and Incognia: Experts discuss best practices for responding to incidents, scaling for reliability, and engineering with the customer in mind.

Writing Runbook Documentation When You’re An SRE: Tips and tricks for writing effective runbook documentation when you aren’t a technical writer.

Give it a whirl

Teams have a new tool in their tool belts. Blameless Runbook Documentation is available for early access.


Runbooks are an industry best practice that empowers teams to codify the incident response process and make it repeatable and consistent. These sets of instructions help teams resolve incidents faster, with greater confidence and less toil.

Fill out this form to see Runbook Documentation in action.


Blameless Bi-Weekly Demo, February 2 at 8 AM PST: Check out a live demo of Blameless as we walk you through operations best practices, and get your questions answered.

DeveloperWeek 2021, February 17–19: Discover the latest in developer technologies, languages, platforms, and tools. Register here for a FREE open pass, and catch our own Matt Davis giving a talk.

Want to contribute?

If you’re looking to share your insights with the SRE and resilience engineering community, we’d love to partner with you on content. Fill out our form here and we’ll reach out!

Originally published at

