Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

Ebook, 1,017 pages, 7 hours

Author: Jeremy Proffitt
ISBN: 9781804614563

By Jeremy Proffitt and Rod Anami

About this ebook

Site reliability engineering is all about continuous improvement, finding the balance between business and product demands while working within technological limitations to drive higher revenue. But quantifying and understanding reliability, handling resources, and meeting developer requirements can sometimes be overwhelming. With a focus on reliability from an infrastructure and coding perspective, Becoming a Rockstar SRE brings forth the site reliability engineer (SRE) persona using real-world examples.
This book will acquaint you with the role of an SRE, followed by the why and how of site reliability engineering. It walks you through the jobs of an SRE, from the automation of CI/CD pipelines and reducing toil to reliability best practices. You’ll learn what creates bad code and how to circumvent it with reliable design and patterns. The book also guides you through interacting and negotiating with businesses and vendors on various technical matters, exploring observability and outages, and understanding why and how to craft an excellent runbook. Finally, you’ll learn how to elevate your site reliability engineering career, including certifications and interview tips and questions.
By the end of this book, you’ll be able to identify and measure reliability, reduce downtime, troubleshoot outages, and enhance productivity to become a true rockstar SRE!

Language: English

Publisher: Packt Publishing

Release date: Apr 28, 2023

ISBN: 9781804614563

byThe AI in Business Podcast
0 ratings
0% found this document useful
77 | All Things Serverless: James and Amy talk about everything Serverless and how it fits into modern Web Development. They discuss Serverless Functions, hosting platforms (Netlify, Vercel, and Cloudflare), frameworks and tools, benefits, Edge Functions, and more.
Podcast episode
77 | All Things Serverless: James and Amy talk about everything Serverless and how it fits into modern Web Development. They discuss Serverless Functions, hosting platforms (Netlify, Vercel, and Cloudflare), frameworks and tools, benefits, Edge Functions, and more.
byCOMPRESSEDfm
0 ratings
0% found this document useful
#134 - A Developer-Centric Approach to Measuring and Improving Productivity - Margaret-Anne Storey & Abi Noda
Podcast episode
#134 - A Developer-Centric Approach to Measuring and Improving Productivity - Margaret-Anne Storey & Abi Noda
byTech Lead Journal
0 ratings
0% found this document useful
Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI: The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, production-like data available for developing and testing your software, analytics, and machine learning projects. In this episode Adam Kamor explores the factors that make this such a complex problem to solve, the approach that he and his team have taken to turn it into a reliable product, and how you can start using it to replace your own collection of scripts.
Podcast episode
Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI: The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, production-like data available for developing and testing your software, analytics, and machine learning projects. In this episode Adam Kamor explores the factors that make this such a complex problem to solve, the approach that he and his team have taken to turn it into a reliable product, and how you can start using it to replace your own collection of scripts.
byData Engineering Podcast
0 ratings
0% found this document useful
Adriana Villela - On Being a Serial Refactorer: Robby has a chat with Adriana Villela, a Senior Developer Advocate at Lightstep, about coding software “beautifully”, why she values being a serial refactorer, the importance of a debugger, her involvement with the OpenTelemetry project and the standardization of observability protocols, trace-based testing, and so much more. Tune in to benefit from the software engineering wisdom she had to share.
Podcast episode
Adriana Villela - On Being a Serial Refactorer: Robby has a chat with Adriana Villela, a Senior Developer Advocate at Lightstep, about coding software “beautifully”, why she values being a serial refactorer, the importance of a debugger, her involvement with the OpenTelemetry project and the standardization of observability protocols, trace-based testing, and so much more. Tune in to benefit from the software engineering wisdom she had to share.
byMaintainable
0 ratings
0% found this document useful

Skip carousel

In Conversation with Rajesh Dhuddu Global Head, Blockchain & Metaverse Practice, Tech Mahindra
Techfastly
Article
In Conversation with Rajesh Dhuddu Global Head, Blockchain & Metaverse Practice, Tech Mahindra
Nov 1, 2022
6 min read
There’s A New Career In Town
True Love
Article
There’s A New Career In Town
Oct 21, 2019
2 min read
Learning to Love What I Don’t Know
Inc.
Article
Learning to Love What I Don’t Know
Nov 1, 2017
LIKE MANY WHO HAVE made the leap into Startupland, I guessed from the outset that I had a lot to learn. I was right. Indeed, I jumped into the wormhole of blind spots and unknown unknowns. This has been especially true on matters technological. At Io
2 min read
In Conversation with Surbhi Rathore
Techfastly
Article
In Conversation with Surbhi Rathore
Oct 1, 2021
4 min read
Getting The edge
The European Business Review
Article
Getting The edge
Feb 25, 2021
7 min read
Data Fabric
PC Pro Magazine
Article
Data Fabric
Aug 13, 2020
3 min read
Digital Trust Is On The Horizon
The European Business Review
Article
Digital Trust Is On The Horizon
Mar 1, 2022
11 min read
Harvey Cameron And Nero Motion Combine Forces
NZ Marketing
Article
Harvey Cameron And Nero Motion Combine Forces
Dec 8, 2023
2 min read
5 Ways To Improve ThePerformance Of A Home Network
Residential Tech Today
Article
5 Ways To Improve ThePerformance Of A Home Network
Apr 27, 2021
2 min read
It As The Whipping Boy: Mistakenly Confusing ‘Enterprise It’ With ‘Consumer It’
The European Business Review
Article
It As The Whipping Boy: Mistakenly Confusing ‘Enterprise It’ With ‘Consumer It’
Jul 31, 2020
As users of digital technologies in their personal lives, many executives pine for their internal IT systems to give them a similar experience and to be just like IT is in their daily lives. They point to the simplicity, ease of use and hassle free n
9 min read
IBM Boss: Big Companies Were Not Prepared For The Pandemic
Evening Standard
Article
IBM Boss: Big Companies Were Not Prepared For The Pandemic
Sep 4, 2020
Sreeram Visvanathan is the new chief executive of IBM UK and Ireland. The 53 year-old is from Bangalore in India and previously led IBM’s global Public Sector team. In the middle of London Tech Week, he talked to the Evening Standard about the future
4 min read
Businesses Were Not Prepared For The Pandemic, Warns IBM Chief Executive
Evening Standard
Article
Businesses Were Not Prepared For The Pandemic, Warns IBM Chief Executive
Sep 7, 2020
4 min read
Prabhu’s Journey Beyond The Professional
Facility Management
Article
Prabhu’s Journey Beyond The Professional
Jun 27, 2019
6 min read
COMPETITIVE ADVANTAGE THROUGH SOFTWARE: Contrasting Enterprises & Startups
The European Business Review
Article
COMPETITIVE ADVANTAGE THROUGH SOFTWARE: Contrasting Enterprises & Startups
Feb 4, 2019
6 min read
Arnab PANDEY
Techfastly
Article
Arnab PANDEY
Apr 1, 2021
11 min read
“The Biggest Problem I See When People Are Working From Home Is A Poorly Designed Network”
PC Pro Magazine
Article
“The Biggest Problem I See When People Are Working From Home Is A Poorly Designed Network”
Jun 8, 2023
6 min read
10 Questions Every IT Department Should Be Able To Answer (BUT PROBABLY CAN’T)
PC Pro Magazine
Article
10 Questions Every IT Department Should Be Able To Answer (BUT PROBABLY CAN’T)
Jul 8, 2021
6 min read
DETER, DETECT, DELAY: Thwarting The Digital Intruders
The European Business Review
Article
DETER, DETECT, DELAY: Thwarting The Digital Intruders
Nov 25, 2021
6 min read
Design Of Awesome
NZ Marketing
Article
Design Of Awesome
Mar 20, 2018
8 min read
The Blending of Home and Office Tech
Residential Tech Today
Article
The Blending of Home and Office Tech
Jun 22, 2019
4 min read
A Human-Centric Approach to Cybersecurity Branded Content
Inc.
Article
A Human-Centric Approach to Cybersecurity Branded Content
Aug 13, 2024
REDTRACETECH.COM Digital threats are evolving with alarming velocity, but RedTrace Technologies is redefining the cybersecurity landscape, transcending traditional cybersecurity approaches. Instead of offering one-size-fits-all solutions, RedTrace di
1 min read
Data-driven Decision Making That Uses Data, Mind And Heart
The European Business Review
Article
Data-driven Decision Making That Uses Data, Mind And Heart
Jan 31, 2020
14 min read
Cybersecurity Made Simple: Taming The Password
The European Business Review
Article
Cybersecurity Made Simple: Taming The Password
Mar 1, 2022
8 min read
“Be Global But Act Local because Each Economy Is Unique”
Business Today
Article
“Be Global But Act Local because Each Economy Is Unique”
Dec 8, 2023
6 min read
Building Trends, Building Momentum
Facility Management
Article
Building Trends, Building Momentum
Oct 14, 2019
3 min read
Jobs Of The Future
True Love
Article
Jobs Of The Future
Jan 26, 2023
5 min read
All Change – But Which Platform? Confronting Shift In The Telecom Sector
The European Business Review
Article
All Change – But Which Platform? Confronting Shift In The Telecom Sector
Nov 25, 2021
Q Thank you for joining us today, Mr Peters! Would you mind giving us a little backstory on how your interest in telecommunications came about? A My college background (Fairfield University) was in physics and neuroscience, so my introduction to tel
8 min read
Quantum Leap
Marketing
Article
Quantum Leap
Jul 11, 2019
6 min read
Real-World Experience
Residential Tech Today
Article
Real-World Experience
Jan 30, 2019
Richard Millson often seems like the smartest guy in the room. There’s a confidence, bordering on arrogance, sure, but he’s not one of those people who thinks he has all of the answers but turns out to be all bluster. Millson actually seems to know a
6 min read
“Why Are The Stupid Rules There In The First Place? Because Someone Had To Tick A Compliance Box”
PC Pro Magazine
Article
“Why Are The Stupid Rules There In The First Place? Because Someone Had To Tick A Compliance Box”
Jul 9, 2022
I hate passwords with a vengeance. In the main because they are so badly abused, from a security perspective, by so many people. I’m not just talking about the person on the Clapham omnibus who keeps their passwords simple and shared between multiple
7 min read

Related categories

Skip carousel

Reviews for Becoming a Rockstar SRE

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Becoming a Rockstar SRE - Jeremy Proffitt

BIRMINGHAM—MUMBAI

Becoming a Rockstar SRE

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Mohd Riyan Khan

Publishing Product Manager: Surbhi Suman

Senior Editor: Romy Dias

Technical Editor: Shruthi Shetty

Copy Editor: Safis Editing

Project Coordinator: Ashwin Kharwa

Proofreader: Safis Editing

Indexer: Tejal Daruwale Soni

Production Designer: Alishon Mendonca

Marketing Coordinator: Agnes D’souza

First published: March 2023

Production reference: 1290323

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-922-4

www.packtpub.com

For my wonderful wife, who still likes me after 18 years. I like you too.

– Jeremy Proffitt

To my God, wife Tati, and son Gabe.

– Rod Anami

Contributors

About the authors

Jeremy Proffitt (born January 1977) is obsessed with constantly improving systems and solving problems with an unmatched sense of urgency – the definition of a Site Reliability Engineer (SRE). A master of solutions and technological knowledge, Jeremy is a rockstar SRE with AWS professional certifications in Architecture and DevOps – and has routinely saved millions in potential lost revenue in his career. In his free time, Jeremy enjoys spending time in his rockstar-appropriate technology cave and loves venturing into 3D printing, electronics, and Internet of Things (IoT) projects. By day, Jeremy currently manages a team of top SRE and DevOps talent driving constant improvement and is often cited in the company as a visionary in terms of observability and emergency response.

To the leaders who have helped me see the truth in our work and friends who have stood by and given me the encouragement to follow the wonders of technology, often while in awe of their own work, I say thank you! To my arch-enemies, you have been a wonderful addition that has always challenged me to become better. And finally, to my wife, Jamie, who I still desperately love after 18 years – and mind you, still likes me – I still remember our first date when you took my arm, you stole my heart, and in all our years, I’ve never felt you let go once.

Rod Anami is a seasoned engineer who works with cloud infrastructure and software engineering technologies. As one of the SREs at the Kyndryl CoE, he coaches other SREs on running IT modernization, transformation, and automation projects for clients worldwide. Rod leads the global SRE guild inside Kyndryl, where he helps plant and grow SRE chapters in many countries. Rod is certified as an SRE, technical specialist, and DevOps engineer professional at the ultimate level. He holds AWS, HashiCorp, Azure, and Kubernetes certifications, among many others. He is passionate about contributing to open source software at large with Node.js libraries.

I want to thank my wonderful wife, Tatiana, and my beloved son, Gabriel, for giving me the space and support needed to write this book, and my parents, Shizuo and Rita, for raising me with solid character. I am grateful to the Google site reliability engineering organization for making this fantastic approach and profession open source. I want to thank Kyndryl for backing me on this journey. I have had many bosses and leaders: good, bad, and inspiring ones. I want to mention a few who impacted my career immensely by helping me acquire the skills and knowledge for this book: Marcos Cimmino, Tara Sims, Andy Barnes, and Gene Brown. Nothing great is accomplished alone: it requires effort, endurance, enjoyment, colleagues, and God.

About the reviewers

Chris Smith is a strategic IT leader with a proven track record across the financial services industry. His passion is leading organization-wide transformational efforts for Fortune 500 institutions within digital and contact center technology and operations. He is skilled at driving agile adoption, building an engineering-first mindset, and facilitating cloud modernization of core banking services at scale.

Itohanoghosa Eregie is the founder of techinanutshellhack, a platform dedicated to explaining technology concepts in their simplest form through short LinkedIn video clips about cloud and SRE concepts. She worked as a software developer at Cyberspace Limited before finding her passion as a platform engineer, which earned her an opportunity to work with Dell EMC as a resident platform engineer for MTN Nigeria, one of Africa’s largest telecommunications companies. She is currently employed by Altoros Americas as a VMware Tanzu engineer involved in customer engagement. Itohan is passionate about building resilient systems in the cloud and ensuring organizations adhere to SRE practices.

Brannen Taylor has almost 30 years of experience in corporate IT across the healthcare, managed services, power, hosted DR, and financial services industries. He has worked with organizations ranging from small mom-and-pop operations to ITIL-heavy Fortune 10 companies. He was a network engineer for 20 years and has been a network operations manager for the past 2 years. He holds certifications from vendors such as Nortel, Cisco, and Palo Alto, a few vendor-agnostic certifications, and many cloud certifications from AWS and Azure. He is now moving into Network DevOps (NetDevOps), focusing on Nautobot, Ansible, and various vendor SDKs. He enjoys scuba diving with his wife and friends and has two grown children.

I would like to thank God for leading me into a career that I love. I want to thank my children for only eye-rolling me a little when I launch into an explanation about binary when they ask me how email works. I want to thank my wife Lara for putting up with me being on call these past 23 years, working unexpectedly long days, nights, and weekends, and non-stop studying. Thank you to my colleagues and the friends I’ve made along the way.

Gene Brown is the Vice President and a Distinguished Engineer at Kyndryl. He leads the SRE profession and certification program and is the global site reliability engineering leader. He is responsible for driving the enablement of SREs across Kyndryl’s countries, practices, and strategic markets through a Center of Excellence with SRE chapter leaders across the services organization globally.

Gene enjoys spending time with clients interested in adopting SRE and likes comparing notes on what has worked well and how to overcome the challenges that come with cultural change. Gene was the co-founder of IBM’s and Kyndryl’s SRE profession with a focus on certifying SREs based on their applied experience in the field of site reliability engineering.

Table of Contents

Preface

Part 1 - Understanding the Basics of Who, What, and Why

SRE Job Role – Activities and Responsibilities

Making this journey personal

SRE driving forces

SRE skills

SRE traits

Understanding the mindset and hobbies of an SRE

SRE affinity game

SRE guiding principles

SRE hobbies

DevOps engineers versus SRE versus others

DevOps and site reliability engineers

Software and site reliability engineers

Describing an SRE’s main responsibilities

An overview of the daily activities of an SRE

People that inspire

Jeremy’s recognition – Paul Tyma, former CTO, LendingTree

Rod’s recognition – Ingo Averdunk, Distinguished Engineer, IBM, and Gene Brown, Distinguished Engineer, Kyndryl

Summary

Further reading

Fundamental Numbers – Reliability Statistics

SLA commitment – a conversation, not a number

Internal partner SLAs

External partner SLAs

The cost of more 9s in an SLA

A final word on SLAs

Defining and leveraging SLOs and SLIs

SLOs

SLOs and time

Tracking outage frequency with the MTBF

Measuring the downtime with the MTTR

Understanding the customer and revenue impact

Transparency in outages

The rockstar SRE’s SLA

Summary

Imperfect Habits – Duct Tape Architecture and Spaghetti Code

The business of software development – let’s start with the dollars

Defining the value of software to a business

The value of protecting business

The value of growing a business

The value of saving labor costs

The A/B testing mindset – the art of change in customer interaction

A/B testing in customer flows

Analyzing the results of A/B testing

Leveraging A/B testing to satisfy quarterly numbers

Dedication to the craft of development – and why some are just here for a job

A quick guide to communicating with your colleagues

Reviewing the merge request – it’s about training, oversight, and reliability

Avoiding the typical rubber stamp mentality

A word on production deployments

Why businesses want us to outright ignore best practices

The truth about the ownership of a developer’s time

Understanding the flaws in how we estimate development cost

Fast, good, cheap – pick one

Why is observability the answer to reliability issues?

The cost of highly available architecture

Mixing good and bad – tricks to wrapping bad code and making it resilient

Alerting that fires actions

Adding additional logging to monitor potential issues

Using try catch to encapsulate exceptions

Retries to the rescue…or not

Summary

Part 2 - Implementing Observability for Site Reliability Engineering

Essential Observability – Metrics, Events, Logs, and Traces (MELT)

Technical requirements

Accomplishing systems monitoring and telemetry

Monitoring targets for infrastructure

Monitoring types and tools

Monitoring golden signals

Monitoring data

Understanding APM

Getting to know topology self-discovery, the blast radius, predictability, and correlation

Alerting – the art of doing it quietly

The user perspective notification trigger principle

Event-to-incident mapping principle

Mixing everything into observability

Outages versus downtime

Observability architecture

Observability effectiveness

In practice – applying what you have learned

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Resolution Path – Master Troubleshooting

Properly defining the problem – and what to ask and not ask

Source of information

The knowledge base of the reporter

Naming conventions

False urgency

Executive summary

Breaking down and testing systems

Breaking down hardware versus the operating system

Breaking down a web API

Understanding the steps

The problems with this method of troubleshooting

Previous and common events – checking for the simple problems

Prior Root Cause Analysis (RCA) documents

Timeline analysis

Comparison

The best approach

Effective research both online and among peers

The art of the Google search

Skimming the content quickly and refining it

Never forget your internal resources

Breaking down source code efficiently

Code you’ve never seen

When that fails

Logging plus code

In practice – applying what you’ve learned

Summary

Operational Framework – Managing Infrastructure and Systems

Technical requirements

Approaching systems administration as a discipline

Design

Installation

Configuration

App deployment

Management

Upgrade

Uninstallation

Understanding IT service management

ITIL

DevOps

Seeing systems administration as multiple layers and multiple towers

Automating systems provisioning and management

Infrastructure as Code

Immutable infrastructure

In practice – applying what you’ve learned

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Data Consumed – Observability Data Science

Technical requirements

Making data-driven decisions

Defining the question and options

Determining which data to use

Identifying which data is already available

Collecting the missing data

Analyzing all datasets together

Presenting the decision as a record

Documenting the lessons learned in the process

Solving problems through a scientific approach

Formulation

Hypothesis

Prediction

Experiment

Analysis

Understanding the most common statistical methods

Percentages

Mean, average, and standard deviation

Quantiles and percentiles

Histograms

Using other mathematical models in observability

Visualizing histograms with Grafana

In practice – applying what you’ve learned

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Part 3 - Applying Architecture for Reliability

Reliable Architecture – Systems Strategy and Design

Technical requirements

Designing for reliability

Architectural aspects

Reliability equations

Design patterns

Modern applications

Splitting and balancing the workload

Splitting

Balancing

Failing over – almost as good

Scaling up and out – horizontal versus vertical

Horizontal

Vertical

Autoscaling

In practice – applying what you’ve learned

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Valued Automation – Toil Discovery and Elimination

Technical requirements

Eliminating toil

Toil redefined

Why toil is bad

Handling toil the right way

Treating automation as a software problem

Document

Algorithm

Code

Automating the (in)famous CI/CD pipeline

Continuous integration

Continuous delivery

Production releases

In practice – applying what you’ve learned

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Exposing Pipelines – GitOps and Testing Essentials

A basic pipeline – building automation to deploy infrastructure as code architecture and code

Pipelines in chronological order

Pipeline templates

Errors or breaks in pipelines

Using containers in pipelines

Pipeline artifacts

Pipeline troubleshooting tips

Automating compliance and security in pipelines

Library age

Application security testing

Dynamic Application Security Testing (DAST)

Static Application Security Testing (SAST)

Secrets scanning

Automated linting for code quality and standards

Compiling with linting feedback

Validating functionality during deployment with automated testing

Why is testing so important to reliability?

Test data

The types of testing

When to test a pipeline

Testing observability

Automated rollbacks

The reduction of developer toil through automated processes

What is the impact of addressing toil?

In practice – applying what you’ve learned

Preparing AWS for the lab

Creating your repository

Adding secrets to your repository

Downloading and committing the lab files

Understanding the pipeline

Adding more steps

Testing but not deploying

Lab final thoughts

Summary

Worker Bees – Orchestrations of Serverless, Containers, and Kubernetes

Technical requirements

The multiple definitions of serverless

Serverless Framework

Serverless computing

Serverless functions

Monitoring serverless functions

Errors

Containers and why we love them

Isolation

Immutability

Promotability

Tagging

Rollbacks

Security

Signable

Monitoring containers

Kubernetes and other ways to orchestrate containers

Health checks

Crashing and force-closing containers

HTTP-based load balancing

Server load balancing

Containers as a Service (CaaS)

Simple container orchestration

Kubernetes

Deployment techniques and workers

Traditional replacement deployment

Rolling deployment

A/B or blue/green deployment

Canary deployment

Automation and rolling back failed deployments

Rollback metrics

When to roll back

How to roll back

In practice – applying what you’ve learned

Leveraging Gitpod – a containerized workspace

The emulation source code

Running the emulation

Summary

Final Exam – Tests and Capacity Planning

Technical requirements

Understanding types of testing

Development tests

Build tests

Delivery tests

Deployment tests

Production tests

Adopting TDD

Unit testing the hard way

Unit testing with a framework

Using test automation frameworks

Staying ahead with capacity planning

Load test data

The capacity curve

The demand curve

In practice – applying what you’ve learned

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Part 4 - Mastering the Outage Moments

First Thing – Runbooks and Low Noise Outage Notifications

Technical requirements

What makes a good runbook – the basics

Runbooks as living documents

Understanding the runbook audience knowledge level

Runbook audience permissions

What do you put into a runbook anyway?

Beyond the runbook – code and comments

Quickly understanding source code

Searching source code for your needle in a haystack

Commenting for understanding

What’s in a good dashboard?

Types of dashboards

NOC-style red and green

Displaying trends

Aggregates and breakdowns

What dashboards are not

The basics of priority levels

Response effort

Engineer retention

Incident response systems and priority

Incident response systems and phone-based alerts

What is a priority one event?

Defining priority based on...

The priority level of observability failures

Forcing the priority – the rockstar way!

Adjusting alerts

Logs and alerting

Pausing alerts

In practice – applying what you’ve learned

Defining priority levels

Custom hat pricing API runbook

Alerting

Summary

Rapid Response – Outage Management Techniques

Where to meet – an effective strategy for communicating good information

Online collaboration

In-person collaboration

The historical data found in outage responses

Participants

Follow-up work

Leveraging the people involved in the response

Tasks

Participants and personalities

Break strategy and stress management

The opportunity to respond at the right time

Training

Runbook and contact list revisions

Team building

Executive messaging bugs in the ear

Opportunities to call out during the RCA

Messaging customers and leadership

Customer versus leadership messaging

Cadence

Email groups

Status sites

Over-messaging

Notes, notes, notes...

In practice – applying what you’ve learned

Outage and alarm

Notification and response

Troubleshooting

The conclusion

Summary

Postmortem Candor – Long-Term Resolution

The content of the postmortem in executive summary style

Executive summary style

Overview

Impact

Timeline

Detailed technical description

Response

Resolution

Future actions

Decisions are not blame

Business is business

Resource and time constraints

Monitoring

The cost of more reliability as a business decision

Active:Active

Manual failover

Cost of time to identify

The cost of time to move a load

Hidden development costs

Training and skill sets – they matter

Identifying gaps

Training and certification targets

Creating future action plans

Immediate follow-up

Who to involve

Timelines and priority

Assigning ownership

Tracking the work

In practice – an example of a postmortem

Writing the overview

Rounding out the postmortem

Custom Hat Company postmortem

Impact

Timeline

Technical details and response

Resolution

Future actions

Summary

Part 5 - Looking into Future Trends and Preparing for SRE Interviews

Chaos Injector – Advanced Systems Stability

Technical requirements

Comprehending the wheel-of-misfortune game

All ends are new beginnings

Lessons to be learned

Role-playing scenarios

A little bit of gamification

Understanding chaos engineering for reliability

Principles of chaos engineering

Chaos system architecture

Chaos experiments

In practice – employing the wheel-of-misfortune game

Lab architecture

Lab contents

Lab instructions

In practice – injecting chaos into systems

Lab architecture

Lab contents

Lab instructions

Summary

Further reading

Interview Advice – Hiring and Being Hired

What we’re looking for in a candidate

Are you qualified?

Entry-level SRE job

Problem-solving

The ability to accept feedback and direction

A broad knowledge base and skill set

Research and learning skill set

The ability to say No

Culture fit

The X factor

Passion

Experience

Personal responsibility

Common interview questions and answers

Technical questions

Non-technical questions

Insightfully odd questions

What should you look for in a career?

Define a good boss

Dotted line reporting

Morals

Researching the company

Business model

Profitability for the next decade

Structure

Large versus small

Public versus private

Online reviews

Are you over- or under-certified?

Certifications that matter

How many are too many certifications?

Relevancy

Tips for landing the job with a great salary

Interview tips

Salary negotiations

Summary

Appendix A – The Site Reliability Engineer Manifesto

The manifesto

How to adopt it

How to contribute to it

Appendix B – The 12-Factor App Questionnaire

The questionnaire

Factor I – Code base

Factor II – Dependencies

Factor III – Config (configuration)

Factor IV – Backing (backend) services

Factor V – Build, release, run

Factor VI – Processes

Factor VII – Port binding

Factor VIII – Concurrency

Factor IX – Disposability

Factor X – Development/production (dev/prod) parity

Factor XI – Logs

Factor XII – Admin processes

How to adopt this questionnaire

How to contribute to this questionnaire

Index

Other Books You May Enjoy

Preface

Site reliability engineering relates to constant improvement, bridging business and product issues as per customer requirements and technology limitations, thereby generating higher revenue. Quantifying and understanding reliability, resource handling, and developer needs can sometimes be overwhelming. Becoming a Rockstar SRE explores reliability from an infrastructure and coding perspective and uses real-world examples to bring forth the site reliability engineer (SRE) persona.

This book will acquaint you with who an SRE is, followed by discussions on the why and how of site reliability engineering. It walks you through the jobs of an SRE, from automation of continuous integration/continuous delivery (CI/CD) pipelines and reducing toil to the details of reliability and the best practices to excel in it. You’ll learn why harmful code is created and how to circumvent that with reliable designs and patterns. You’ll explore how to interact and negotiate with businesses and vendors on various technical matters. You’ll then deep dive into observability, outage, and why and how to craft an excellent runbook. Finally, you’ll learn how to elevate your site reliability engineering career, including certifications, interview tips, and questions.

By the end of this book, you’ll be able to identify and measure reliability, reduce downtime, troubleshoot outages, and enhance productivity to become a true rockstar SRE!

Who this book is for

This book is intended for IT professionals, from developers looking to advance into an SRE role to system administrators mastering technologies and executives experiencing repeated downtime in their organizations. This book will also be helpful to anyone interested in bringing reliability and automation to their organization to drive down customer impact and revenue loss while increasing development throughput. While reading this book, a basic understanding of API and web architecture and some experience with cloud computing and services will be helpful.

What this book covers

Chapter 1, SRE Job Role – Activities and Responsibilities, talks about the site reliability engineer persona, addressing who an SRE is.

Chapter 2, Fundamental Numbers – Reliability Statistics, shows how site reliability engineering work and its business impact are measured.

Chapter 3, Imperfect Habits – Duct Tape Architecture and Spaghetti Code, explains why systems are naturally unreliable.

Chapter 4, Essential Observability – Metrics, Events, Logs, and Traces (MELT), discusses how we go from monitoring to true observability.

Chapter 5, Resolution Path – Master Troubleshooting, lectures on the SRE way of precisely and concisely troubleshooting.

Chapter 6, Operational Framework – Managing Infrastructure and Systems, describes why and how SREs tackle operational work and not just engineering duties.

Chapter 7, Data Consumed – Observability Data Science, teaches the basic mathematical models and statistical methods for SREs.

Chapter 8, Reliable Architecture – Systems Strategy and Design, describes systems thinking applied to reliability and reliable architectural patterns.

Chapter 9, Valued Automation – Toil Discovery and Elimination, familiarizes readers with a critical pillar of site reliability engineering: making operations scalable.

Chapter 10, Exposing Pipelines – GitOps and Testing Essentials, illustrates how to leverage reliability inside DevOps delivery pipelines.

Chapter 11, Worker Bees – Orchestrations of Serverless, Containers, and Kubernetes, presents how workload management affects the reliability of systems.

Chapter 12, Final Exam – Tests and Capacity Planning, demonstrates how good testing and capacity planning keep the performance of systems ahead.

Chapter 13, First Thing – Runbooks and Low Noise Outage Notifications, discusses how well-designed procedures and notifications prepare SREs for problems.

Chapter 14, Rapid Response – Outage Management Techniques, teaches positive SRE behaviors and how to keep interactions focused on resolution during a significant incident.

Chapter 15, Postmortem Candor – Long-Term Resolution, portrays how postmortems should lead to actions that will make systems more reliable.

Chapter 16, Chaos Injector – Advanced Systems Stability, clarifies how SREs inject chaos into systems to learn more and use gamification to hone their skills.

Chapter 17, Interview Advice – Hiring and Being Hired, displays how companies should hire SREs and how SREs should demonstrate their knowledge during an interview.

Appendix A, The Site Reliability Engineer Manifesto, depicts the primary responsibilities of any SRE in the world.

Appendix B, The 12-Factor App Questionnaire, consolidates a series of questions to test whether an application design is reliable according to the twelve-factor app manifesto from Heroku.

To get the most out of this book

We purposefully used SRE as the acronym for site reliability engineer and kept site reliability engineering in its extended form throughout the book. For us, site reliability engineering is only accomplishable if you have an SRE and not the other way around. Although it’s common to see SRE standing for both site reliability engineer and engineering interchangeably, we want to emphasize the persona and the who in this book.

This book contains simulation labs to give its readers practical knowledge. Each has a prerequisite knowledge set, such as Kubernetes, cloud computing, or software development. This book does not set out to teach you specific technologies and products; it teaches the most effective practices and principles, which are technology agnostic. However, we must adopt some technology to demonstrate the site reliability engineering concepts and techniques. For that, we preferred open source software and platforms with free tier accounts in the labs.

Each simulation lab states its learning requirements and points to where the reader can find more information and instructions. We divided each practical exercise into three parts:

Lab architecture

Lab contents

Lab instructions

The lab architecture explains the big picture around the design and connections among its main components. The contents section explains what’s inside the GitHub repository, such as files and folders. And the lab instructions have a procedure for installing, configuring, and using the lab properly.

The following is a list of software covered in this book’s simulation labs and the required execution environment:

You will require a laptop with reasonable access to the internet to work in the book’s labs.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Becoming-a-Rockstar-SRE. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/W6q5Y.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Within this repository, under the Chapter-8 folder, there is just one subfolder called terraform.

A block of code is set as follows:

provider "google" {
  credentials = file("project-service-account-key.json")
  project     = "autoscaling-simulation-lab"
  region      = "southamerica-east1"
  zone        = "southamerica-east1-a"
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

resource "google_compute_autoscaler" "foobar" {
  ...
  autoscaling_policy {
    max_replicas    = 5
    min_replicas    = 1
    cooldown_period = 60
    ...
  }
}

Any command-line input or output is written as follows:

$ terraform init

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: To do this, we navigate to the Settings tab in our GitHub repository. Then select Secrets on the left side, and choose Actions.

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

Join the SRE community: We invite you to join the large Site Reliability Engineers community at the sreterminus.slack.com Slack public workspace.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Becoming a Rockstar SRE, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781803239224

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits to your email directly

Part 1 - Understanding the Basics of Who, What, and Why

In this first part, you will learn about site reliability engineering, its roots, and current usage outside Google. We emphasize how the site reliability engineer (SRE) persona is the center of gravity of everything orbiting systems reliability. When we talk about site reliability engineering, it’s impossible to do so without a discussion about the business of software development, which we tie into not only statistics used for reliability but how those impact what companies are ultimately interested in, customer satisfaction and revenue. Finally, we’ll explore why the lack of reliability persists in organizations and discuss some of the lesser known truths that make site reliability engineering critical and complex.

The following chapters will be covered in this section:

Chapter 1, SRE Job Role – Activities and Responsibilities

Chapter 2, Fundamental Numbers – Reliability Statistics

Chapter 3, Imperfect Habits – Duct Tape Architecture and Spaghetti Code

SRE Job Role – Activities and Responsibilities

A lot has been said about site reliability engineering: what it is, what it is not, and the multiple practices and techniques that we should apply to adopt the site reliability engineering model. Who site reliability engineers (SREs) are is often put aside, even though it is a crucial aspect. The same goes for how people from various parts of information technology (IT) become SREs and how some of them come to be recognized as thought leaders in this domain.

In particular, little has been said about the site reliability engineer persona, as captured by the following questions:

What do they know?

Which skills have they developed?

What do they do daily?

What are their primary responsibilities?

Those characteristics would explain, at a bare minimum, why someone should start the journey to becoming an SRE rockstar. That’s precisely why we decided to start this book by outlining the SRE job role.

In this chapter, we’re going to cover the following main topics:

Making this journey personal

Understanding the mindset and hobbies of an SRE

DevOps engineers versus SRE versus others

Describing an SRE’s main responsibilities

An overview of the daily activities of an SRE

People that inspire

Making this journey personal

Unfortunately, when an enterprise starts to adopt site reliability engineering into its IT governance processes, it often does not use a people-processes-tools (PPT) model, with a clear vision of each pillar, to transform its operations and software development areas. Even more often, it does not emphasize the people element of PPT in such transformations. We want to change that by making this learning journey personal and centered on the individuals rather than the processes or technologies involved.

It’s critical to understand (and learn) what drives typical SREs forward, which fundamental skills they have developed, and how they hone their skills over time to go above and beyond at work. For that purpose, we will divide this subject into three sections:

SRE driving forces

SRE skills

SRE traits

Let’s start this personal journey by understanding why you should become an SRE.

SRE driving forces

We want to explore what motivates or incentivizes site reliability engineers. There’s no journey of any nature if there is no driving force pushing you through. As a word of advice, we should warn you that learning about site reliability engineering is more of an expedition than a tourism trip. In other words, it’s more a marathon than a sprint. Having clarified that, we’ll begin by putting the possible rewards of this journey on the table. Let’s depict each driving force as a mockup code snippet (JavaScript) to make it fun.

Money

If we could represent in the form of an algorithm how money drives people when they don’t earn enough, it would look like the following:

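// Earning less than your personal minimum is demotivating
if (money < MyMinimumSalary) {
  motivated = false;
  excitement--;
}
doMyWork();
// Going the extra mile requires both motivation and job satisfaction
if (motivated && jobSatisfaction) {
  honeSRESkills();
  doExtraWork();
} else {
  lookForAnotherJob();
}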

Site reliability engineers make more money than most other technical professionals. According to a Glassdoor (2022) report, they can earn more than USD 118K per year on average. In similar reports, SREs are even noted to have surpassed DevOps engineers in a salary comparison. Nevertheless, not making enough money can be a key demotivating factor. It is hard for anyone to move forward with their career if they are preoccupied with expenses.

Although SREs earn a notably high income on average, their salaries vary by country, years of experience, and employer. Companies justify SRE salary levels based on the reliability value they bring to the table. Rest assured, the site reliability engineering career path is well compensated.

Job satisfaction

What affects our job satisfaction can be depicted as code logic as follows:

// Any one of the four factors is enough to feel satisfied
if (interestingJob || purposefulWorkActivities || challengingSkillDevelopment || technicalAppreciation) {
  jobSatisfaction = true;
  excitement++;
}

Job satisfaction is another driving force of site reliability engineers, and it has many factors. We usually translate job satisfaction to employee happiness at work. Site reliability engineering leads to job satisfaction when we look at the following profession characteristics: exciting job content, purposeful work activities, challenging skill development, and technical appreciation.
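The two mockups above share the motivated, jobSatisfaction, and excitement variables, so they can be combined into one runnable sketch. The version below is only an illustration: the function name evaluateDrivingForces, the shape of the input object, and the salary figures are assumptions made for this example and do not come from the book.

```javascript
// Hypothetical, runnable combination of the money and job-satisfaction mockups.
// Every name and number here is an illustrative assumption.
function evaluateDrivingForces(sre) {
  const state = { motivated: true, jobSatisfaction: false, excitement: 0, actions: [] };

  // Money: earning less than the personal minimum is demotivating.
  if (sre.money < sre.minimumSalary) {
    state.motivated = false;
    state.excitement--;
  }

  // Job satisfaction: any one of the four factors is enough in the mockup.
  if (sre.interestingJob || sre.purposefulWorkActivities ||
      sre.challengingSkillDevelopment || sre.technicalAppreciation) {
    state.jobSatisfaction = true;
    state.excitement++;
  }

  // The regular work always happens; extra effort needs both conditions.
  state.actions.push("doMyWork");
  if (state.motivated && state.jobSatisfaction) {
    state.actions.push("honeSRESkills", "doExtraWork");
  } else {
    state.actions.push("lookForAnotherJob");
  }
  return state;
}

const result = evaluateDrivingForces({
  money: 120000,            // made-up figure
  minimumSalary: 90000,     // made-up figure
  interestingJob: true,
  purposefulWorkActivities: false,
  challengingSkillDevelopment: false,
  technicalAppreciation: false,
});
console.log(result.actions.join(", "));
```

Returning the list of actions instead of calling functions such as doMyWork() keeps the sketch self-contained, since the book leaves those functions undefined.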

The job content of site reliability engineering spans multiple domains. You can work with developers one day and help systems administrators the next. You may need to assist in redesigning an app to increase its service reliability. As with any generalist model job with technical depth in many subject areas, you will never get bored for sure.

As we will see later in this chapter, SRE work activities have clear business value. They improve not just the service quality, availability, and resiliency, but also the system’s reliability. Reliable services might help with customer loyalty, bringing additional revenue to the service provider. There is a direct relationship between SRE work and business metrics improvement, making their efforts purposeful.

Since site reliability engineering is a cross-technology domain engineering discipline, any skills acquisition is challenging. SREs have knowledge and skills that a systems administrator or software developer doesn’t have. They are required to keep those skills updated and hone them over time. This necessity to keep learning brings the always-moving-forward feeling that may not happen if you only need to master a single product or technology.

The last factor on our list is technical appreciation. According to Boston Consulting Group (BCG) research, appreciation is the number one job happiness factor. Being an SRE, you will aid customers, users, and other technical professionals because of your keen holistic view of the systems. Consequently, technical appreciation for the job you do is common, and who doesn’t like that?

Innovative solutions

The following code gives you an idea of how exciting exploring uncharted terrains is:

// No existing solution means a chance to create one
if (!solutionExists) {
  deviseNewSolution();
  excitement++;
}

Site reliability engineers are natural trailblazers as they explore new technologies and processes to obtain better reliability and eliminate toil (manual and repetitive tasks that are devoid of value). They face many scenarios and situations that are a first of their kind. Moreover, they are responsible for paving the path for others by documenting procedures in runbooks when none exist. There’s nothing more exciting than devising new solutions or improving existing ones. Imagine how you would feel if they named a technical operating procedure after you.

Nevertheless, SREs want to minimize complexity and reduce technical debt. They don’t create a solution just for the sake of doing it unless it adds value and resolves or prevents events that impact customers.

Good relationships

The following code snippet is a representation of how good relationships are a result of an exciting working environment:

// High excitement spreads to colleagues and keeps relationships healthy
if (excitement > HIGH) {
  motivateOthers();
  relationships.healthy = true;
}

Good relationships in the work environment are also one of the top 10 factors contributing to employee happiness. SREs have good relationships in their work environment. The reason is straightforward: they act as integration hubs among different tribes and have the mission to break company silos. SREs need cooperation from both development and operations teams. They are technical diplomats and have strong communication skills. Since they are
