SREview Issue #6: October 2020

BOO! Did we scare you? We couldn’t help it; we’re just so happy it’s spooky season. Here’s the October issue of SREview! This monthly zine features epic Tweets, content, and events from the SRE and resilience engineering community.

Take our survey on SRE Maturity & SLO Adoption: it takes only 5–10 minutes, and five lucky winners will each receive a $100 Amazon gift card!

Tweets that have us twittering

distributed systems is an attempt to answer the question “is it possible for something to be broken and still work at the same time”

- Senior Oops Engineer (@ReinH) September 22, 2020

My favourite metaphor for tech debt is dishes. you gotta do the dishes, not just the cooking.

If the dishes are done, then when the customer orders a new thing you can make it right away.

- Brenda Wallace, Potato Enthusiast (@BR3NDA) September 23, 2020

Maybe I need to write a blog post called “On Call For Managers”. If you’re asking engineers to be on call for their code — and you should — you owe in return:

- enough time to fix what’s broken
- hands to do the work
- closely track how often they are interrupted/woken
- ..etc

- Charity Majors (@mipsytipsy) September 25, 2020

SREading

The Comprehensive Guide on SLIs, SLOs, and Error Budgets: This 27-page guide walks through how to set SLIs and SLOs that matter, so you can make data-informed decisions.

Here’s your Complete Definition of Software Reliability: In this blog post, we’ll break down what software reliability means in terms of perception, team operation, and customer happiness.

Alerting on SLOs: Glitch’s Mads Hartmann writes about the team’s progress in adopting SLOs, including the motivation behind implementation.

How to Construct a Reliability Model for your Organization: In this post, we’ll construct a basic reliability model and show you how to create one for your own organization.

Four Things I Wish I Knew as the New CTO of a Startup: Isabel Nyo writes about her experiences as CTO of a small startup and key lessons learned over the course of a year.

Give it a whirl

Blameless automates toil and creates guardrails during incidents, streamlines learning from incidents, and much more.

Try out our free sandbox today.

Upcoming events

Blameless Bi-Weekly Demo October 20 at 8 AM PST: Check out a live demo of Blameless as we walk you through operations best practices, and get your questions answered.

Unscripted Conference October 21–22: DevOps practitioners and technology leaders gather to learn and share stories of simplified software delivery at scale, but with a twist.

Achieving Zero Downtime October 22 at 10 AM PST: Learn from Cindy Sridharan how to conduct zero-downtime deployments at the latest 99 Percent DevOps Talk from Lightstep.

Want to contribute?

If you’re looking to share your insights with the SRE and resilience engineering community, we’d love to partner with you on content. Fill out our form here and we’ll reach out!

Originally published at https://www.blameless.com.

Giving you all you need to know about Site Reliability Engineering. https://www.blameless.com/blog/
