Is love in the air? We think so. While we don’t have chocolate or flowers for you, we have something just as sweet. Here are some of the most exciting Tweets, content, and events happening in the SRE and resilience engineering community this February.
Tweets that have us twittering
How much of what is labeled “human error” is simply real world limits of human perception, information processing and other constraints of being human in a complex world?
- matt scanlon (@picudoc13) February 8, 2021
The ability to roll back safely is important, but once you have a reasonable feedback loop and gradual rollout, the vast majority of your problems in prod will be long-extant problems that were surfaced under just the right conditions.
- Vallery Lancey (@vllry) February 15, 2021
Tired: blameless post-mortems
Wired: gripping accounts of the incident experience
- Lorin Hochstein E_TOO_MANY_FAILURE_MODES (@lhochstein) February 13, 2021
“I’m Just Doing my Job,” An SRE Myth: Blameless SRE Darrell Pappa writes about how organizations can become more customer-centric. Featured in SRE Weekly #256.
On Not Being a Cog in the Machine: Honeycomb’s first SRE, Fred Hebert, shares his thoughts on human processes, socio-technical systems, and observability.
Communication Tool Down? Here are 3 Ways to Handle it: Learn how to work through a communication tooling failure via chaos engineering, eliminating SPOFs, and more.
Slack’s Outage on January 4th 2021: Laura Nolan writes an in-depth retrospective on Slack’s recent incident.
4 Tips on Preparing for a [Great] Failure: SRE techniques for mitigating the impact of system failure, including building runbooks, assessing with SLOs, monitoring metrics, and more.
How Cloud Services Platform Teams Can Drive The Adoption Of Effective SRE Practices: Tina Huang writes about using cloud transformations to drive SRE adoption.
Give it a whirl
Teams have a new tool in their tool belts. Blameless Runbook Documentation is available for early access.
Runbooks are an industry best practice: they let teams codify the incident response process and make it repeatable and consistent. These sets of instructions allow teams to resolve incidents faster, with greater confidence and less toil.
Fill out this form to see Runbook Documentation in action.
Blameless Bi-Weekly Demo March 2 at 8 AM PST: Check out a live demo of Blameless as we walk you through operations best practices, and get your questions answered.
Want to contribute?
If you’re looking to share your insights with the SRE and resilience engineering community, we’d love to partner with you on content. Fill out our form here and we’ll reach out!
Originally published at https://www.blameless.com.