Blameless

70 Followers

Apr 20, 2021

SREview Issue #12 April 2021

Spring is here! We have rain! We have flowers! We have allergies! We also have some of the most exciting Tweets, content, and events happening in the SRE and resilience engineering community this month. Tweets that have us twittering: We need to stop with the “they need to feel their own pain” framing for service…

Site Reliability

4 min read

Apr 19, 2021

Resilience in Action E6: Oversize Coffee Mugs, SLOs, and ML with Todd Underwood

Audio here. Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at…

Site Reliability

22 min read

Apr 13, 2021

What are MTTx Metrics Good For? Let’s Find Out.

By: Emily Arnott

Data helps best-in-class teams make the right decisions. Analyzing your system’s metrics shows you where to invest time and resources. A common type of metric is Mean Time to X, or MTTx. These metrics detail the average time it takes for something to happen. …

Site Reliability

6 min read

Published in FAUN Publication · Apr 12, 2021

Having On-call Nightmares? Runbooks can Help you Wake Up.

By: Harry Hull

The nightmare: You aren’t sure how long you’ve been here, but the view outside the window sure is soothing. Before you can fully take in your surroundings, a siren rips you back into the conscious world. …

Site Reliability

6 min read

Apr 6, 2021

SRE Leaders Panel: SRE Adoption as Organizational Transformation

Blameless recently had the privilege of hosting SRE leaders Kurt Andersen, SRE Architect at Blameless; Vanessa Yiu, Executive Director, Enterprise Architecture at Goldman Sachs; and Tony Hansmann, Former Global CTO at Pivotal Software, Inc. …

Site Reliability

30 min read

Published in FAUN Publication · Apr 5, 2021

So you Want an SRE Tool. Do you Build, Buy, or Open Source?

By: Emily Arnott

As your organization’s reliability needs grow, you may consider investing in SRE tools. Tooling can make many processes more efficient, consistent, and repeatable. When you decide to invest in tooling, one of the major decisions is how you’ll source your tools. …

Site Reliability

6 min read

Published in FAUN Publication · Mar 30, 2021

How to Analyze Incidents Better with the Right Metrics

By: Emily Arnott

An important SRE best practice is analyzing and learning from incidents. When an incident occurs, you shouldn’t think of it as a setback, but as an opportunity to grow. Good incident analysis involves building an incident retrospective. This document will contain everything from incident metrics to the…

Site Reliability

6 min read

Mar 23, 2021

SREview Issue #11 March 2021

Is it spring yet? Or spring still? Time sure is strange nowadays. At least we have a ton to look forward to in the next few weeks! Here are some of the most exciting Tweets, content, and events happening in the SRE and resilience engineering community this month. Tweets that have us twittering

Site Reliability

2 min read

Mar 22, 2021

How to Scale for Reliability and Trust

By: Emily Arnott

As more people depend on your product, reliability expectations tend to grow. For a service to continue succeeding, it has to be one customers can rely upon. At the same time, as you bring on more customers, the technical demands put on your service increase as well. …

Site Reliability

6 min read

Published in FAUN Publication · Mar 16, 2021

How to Analyze Contributing Factors Blamelessly

SRE advocates addressing problems blamelessly. When something goes wrong, don’t try to determine who is at fault. Instead, look for systemic causes. Adopting this approach has many benefits, from the practical to the cultural. Your system will become more resilient as you learn from each failure. …

Site Reliability

6 min read

Giving you all you need to know about Site Reliability Engineering. https://www.blameless.com/blog/

Following
  • Robert Barron
  • Jason Weiland
  • Nassos Michas
  • Netflix Technology Blog
  • Jamie Allen