How to Scale for Reliability and Trust

  • Design services that can remain reliable while scaling
  • Balance reliability and development velocity
  • Respond to incidents using best practices
  • Build trust when incidents occur through good communication

Designing services that stay reliable while scaling

  1. Platform stability: Make sure that all services run in similar environments. If you’re using Kubernetes, run everything in Kubernetes. Codify your infrastructure with Terraform as much as you can, or with another infrastructure-as-code tool you prefer; the important thing is to prioritize consistency. This uniformity helps de-risk spinning up new environments.
  2. Fast and small releases: Invest in automated testing; it is crucial for continuous delivery. If every change ships to production, there is no time to test each one manually.
  3. Observability: Implement distributed tracing and other observability tooling, then verify that it works by asking novel questions of your system and checking whether the answers are accurate.
  4. Service ownership: Tie all of this into your on-call rotation. When something breaks, make sure that someone who understands the product can tackle the problem, ideally the person who built the service.
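
To make the observability point concrete, here is a minimal in-process sketch of distributed tracing in Python. It assumes a hypothetical checkout request that calls two hypothetical downstream services; the `Span` record, the `start_span` helper, and the in-memory `SPANS` list are illustrative names, not any tracing library's API. Every span carries a trace ID shared by the whole request plus a pointer to its parent, which is what lets you answer a question you didn't plan for, such as "which downstream call dominated this request's latency?"

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str             # shared by every span in one request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    start: float
    duration: float = 0.0


# Hypothetical in-memory store standing in for a tracing backend.
SPANS: List[Span] = []

# Tracks the active span so that nested calls become its children.
_current = ContextVar("current_span", default=None)


@contextmanager
def start_span(name: str) -> Iterator[Span]:
    parent = _current.get()
    span = Span(
        name=name,
        # Reuse the parent's trace ID, or start a new trace at the root.
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        start=time.perf_counter(),
    )
    token = _current.set(span)
    try:
        yield span
    finally:
        span.duration = time.perf_counter() - span.start
        _current.reset(token)
        SPANS.append(span)  # children finish (and are recorded) before their parent


def handle_checkout() -> None:
    """Hypothetical request handler calling two hypothetical downstream services."""
    with start_span("checkout"):
        with start_span("inventory-service"):
            time.sleep(0.005)
        with start_span("payment-service"):
            time.sleep(0.05)


def slowest_child(trace_id: str) -> str:
    """A 'novel question': which direct child of the root span took longest?"""
    spans = [s for s in SPANS if s.trace_id == trace_id]
    root = next(s for s in spans if s.parent_id is None)
    children = [s for s in spans if s.parent_id == root.span_id]
    return max(children, key=lambda s: s.duration).name


handle_checkout()
root = SPANS[-1]  # the root span finishes last
print(slowest_child(root.trace_id))  # payment-service
```

In a real system the spans cross process boundaries, so the trace and parent IDs would be propagated in request headers and exported to a tracing backend (OpenTelemetry is a common choice) rather than kept in a list. The check itself stays the same: if the traces can't answer a question like this accurately, the observability isn't working yet.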
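
Service ownership can also be made checkable. The sketch below assumes a hypothetical registry that maps each service to the team that built it and that team's current on-call engineer; `OWNERS`, `FALLBACK`, `route_alert`, and `unowned` are illustrative names, not part of any paging tool's API. Alerts go to the owning team, and any service missing from the registry surfaces as a gap to close rather than silently paging whoever happens to be around.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Owner:
    team: str
    on_call: str  # whoever is currently on call for that team


# Hypothetical registry: each service maps to the team that built it.
OWNERS: Dict[str, Owner] = {
    "checkout": Owner(team="payments", on_call="alice"),
    "search": Owner(team="discovery", on_call="bob"),
}

# Hypothetical catch-all rotation for services nobody has claimed yet.
FALLBACK = Owner(team="platform", on_call="incident-commander")


def route_alert(service: str) -> Owner:
    """Page the owning team's on-call engineer, or the fallback if unowned."""
    return OWNERS.get(service, FALLBACK)


def unowned(services: Iterable[str]) -> List[str]:
    """List services with no owner: gaps to close before an incident hits them."""
    return sorted(s for s in services if s not in OWNERS)


print(route_alert("checkout").on_call)             # alice
print(unowned(["checkout", "billing", "search"]))  # ['billing']
```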

Balancing reliability and development velocity

Recover from outages faster with incident response

Reassure customers with good communication
