Many don’t understand DevOps. It is more than just development and operations. Different aspects of it need to be understood and respected.

Let’s go through them, as Gene Kim has already demonstrated.

Myths include the following

DevOps is only for startups

Pioneered by internet unicorns like Google, Amazon, Netflix & Etsy, each of which once faced the same problems as more traditional organizations

‍Key problems include:

Hazardous code releases prone to failure
Inability to release features fast enough to beat the competition
High levels of distrust between Development and Operations.

DevOps replaces Agile

DevOps principles and practices are compatible with Agile.
DevOps is the logical continuation of the agile journey that began in 2001.
Agile serves to enable DevOps
DevOps practices emerge when our code is always in a deployable state
Developers check into the trunk daily
Demonstrate features in production-like environments

DevOps is incompatible with information security and compliance

Security and compliance are integrated into every stage of daily work in the software development lifecycle
Security and compliance integration results in better quality, security and compliance outcomes.

DevOps means eliminating IT operations, or “NoOps.”

IT Ops work will naturally change with DevOps, but it remains as important as ever.
IT Ops collaborates far earlier with Development, which continues to work with IT Ops long after code has been deployed into production.
IT Ops enables developer productivity through APIs and self-service platforms that create environments, test & deploy code, monitor and display production telemetry, etc…
IT Ops becomes more like development, where the product is the platform developers use to safely, quickly, and securely test, deploy and run their IT services in production.

DevOps is just “Infrastructure as code” or automation

DevOps requires cultural norms and an architecture that allows shared goals to be achieved throughout the IT value stream.

DevOps is only for Open Source Software

Achieving DevOps outcomes is independent of the technology being used.

‍

Dev & Ops become DevOps

By working together toward a common goal, they enable

The fast flow of planned work into production
World-class stability, reliability, availability and security

Toward a common goal

Cross-functional teams test their hypotheses about which features will delight users and advance the organizational goals.
Cross-functional teams actively ensure their work flows smoothly and frequently throughout the entire value stream without causing chaos and disruption to IT Ops or any other internal or external customer.
QA, IT Ops, and InfoSec work on ways to reduce friction for the team, creating work systems that enable developers to be more productive and get better outcomes.
By building the expertise of QA, IT Ops, and InfoSec into delivery teams and into automated self-service tools and platforms, teams can use that expertise in their daily work without being dependent on other groups.
Organizations create a safe system of work where small teams can independently develop, test and deploy code and value quickly, safely, securely and reliably to customers.

‍Outcomes

Maximize developer productivity;
Enable organizational learning;
Create high employee satisfaction;
Win in the marketplace.

‍How it is today

Dev and IT Ops are adversaries
Testing and InfoSec happen only at the end of a project –> too late to correct errors that are found
Critical activities require too much manual effort and too many handoffs, leaving people always waiting

‍Consequences

Contribute to extremely long lead times
Quality of work, especially production deployments, is also problematic and chaotic
Negative impacts are produced on our customers and our business
We fall short of our goals
The whole organization is dissatisfied with IT performance
Budgets are reduced
Unhappy employees feel powerless to change the process and its outcomes

Manufacturing revolution in the 1980s

Adopted lean principles and practices

Make improvements to the following:

Plant productivity
Customer lead times
Product Quality
Customer satisfaction

Introduction of DevOps in 2010

With commoditized hardware, software and cloud services, features (and even entire startup companies) can be created in just weeks.
Deployment to production in just hours or minutes
Deployments become routine and low-risk
Businesses able to test new ideas and run experiments
Businesses discover which ideas create the most value for customers more efficiently and effectively
Rapid, safe and secure deployment to production

Organizations unable to deploy quickly to the market are destined to lose to more nimble competitors. Regardless of the industry, how we acquire customers and deliver value depends on the technology value stream.

The Problem and Chronic Conflict

Most organizations are unable to deploy production changes in minutes or hours
Production deployments are not routine
Production deployments involve outages, chronic firefighting and heroics
A core conflict exists within these technology organizations

Chronic conflict

The conflict between Dev + Ops creates a downward spiral resulting in:

slower time-to-market
reduced quality
increased outages
increasing technical debt

‍Technical debt describes how decisions we make lead to problems that get difficult to fix over time:

Reduces available options in the future
There are often competing goals between Dev & Ops

Goals of IT Organisations

Respond to the rapidly changing competitive landscape
Provide stable, reliable and secure service to the customer

Development objectives

Development takes responsibility for responding to changes in the market
Development deploys features and changes into production as fast as possible

Operations objectives

Responsible for providing customers with IT service that is stable, reliable and secure –> consequence: makes it virtually impossible for anyone to introduce production changes that could jeopardize production.

Dev + Ops have opposed goals and incentives.
‍
‍Core conflict: when organizational measurements and incentives across different silos prevent the achievement of organizational goals –> prevent achieving desired business outcomes. These chronic conflicts put technology workers into situations that lead to:

poor software and service quality, lousy customer outcomes
the daily need for workarounds
firefighting
heroics‍

The downward spiral in three acts

‍IT Operations

Goal: Keep applications and infrastructure running so that our organization can deliver value to customers

Many problems are due to applications and infrastructure that are:

Complex
Poorly documented
Incredibly fragile

‍Outcome:

Lots of technical debt and workarounds
Systems most prone to failure are also our most important and at the epicentre of our most urgent changes.
When our most urgent changes fail, they jeopardize our most critical organizational promises: availability to customers, revenue goals, security of customer data, accurate financial reporting, etc.

Compensating for the latest broken promise

Cause: product managers promise bigger, bolder features to impress customers

‍Outcome:

Oblivious to what the technology can and can’t do, they commit the technology organization to a promise it can’t keep
Development is tasked with another urgent project that requires solving new technological challenges and cutting corners to meet the promised release date, further adding to technical debt.

Getting busier -> for what?

Outcome: loss of market share

Consequence: When IT fails, the organization fails

How the downward spiral starts

Everybody gets a little busier
Work takes a little more time
Communications become slower
Work queues get a little longer
Work becomes more tightly coupled
Smaller actions cause bigger failures
Become fearful and less tolerant of changes
Work requires more communication, coordination and approvals
Teams must wait longer for their dependent work to get done
Quality keeps getting worse
Production code deployments are taking longer to complete
Deployment outcomes have become problematic
The ever-increasing number of customer outages
More heroics and firefighting in operations
Inability to pay down technical debt
Product delivery cycles slower and slower
Fewer projects are undertaken, and those that are undertaken are less ambitious
Feedback becomes slower and weaker
Feedback from customers slows down
Things seem to get worse
No longer able to respond quickly to the changing competitive landscape
Inability to provide stable, reliable service to our customers

‍
‍Two facts:

Every IT organization has two opposing goals
Every company is a technology company

Benefits of DevOps

‍DevOps enables organisations to improve

Organizational performance
Achievement of the goals of all the functional technology roles: Dev, Ops, InfoSec, QA
Improve the human condition

Core advantages and general checklist to observe

Developers independently implement their features.
Developers validate the correctness of their features in production-like environments.
Developers have their code deployed to production quickly, safely and securely.
Code deployments are routine and predictable.
Deployments occur throughout the business day when everyone is already in the office without customers noticing.
Everyone can see the effects of their actions by creating fast feedback loops at every step of the process.
When changes are committed to version control, fast automated tests are run in production-like environments.
This gives continual assurance that the code and environments operate as designed and are always in a secure and deployable state.
Automated testing helps developers discover their mistakes quickly, enabling faster fixes and genuine learning.
Instead of accruing technical debt, problems are fixed as they are found.
Global goals outweigh local goals.
Pervasive production telemetry in our code and production environments ensures that problems are detected and corrected quickly.
The architecture allows small teams to work safely and decoupled from the work of other teams.
Teams work independently and productively in small batches, quickly and frequently delivering new value to customers.
High-profile product and feature releases become routine by using dark launch techniques.
Instead of firefighting for days, we merely change a feature toggle or configuration setting.
Features can be automatically rolled back if something goes wrong.
Releases are controlled, predictable, reversible, and low-stress.
All sorts of problems are being found and fixed early when they are smaller, cheaper and easier to correct.
With every fix, we generate organizational learnings, allowing us to prevent the problem from recurring.
Everyone is learning, fostering a hypothesis-driven culture where the scientific method is used to ensure nothing is taken for granted.
We treat product development and process improvements as experiments.
We keep long-lived teams intact so they can keep iterating and improving, using those learnings to achieve their goals.
Instead of a culture of fear, we have a high-trust, collaborative culture where people are rewarded for taking risks.
People can fearlessly talk about problems.
Everyone wholly owns the quality of their work.
People use peer reviews to gain confidence that problems are addressed long before they impact the customer.
When something goes wrong, we conduct blameless postmortems to understand what caused the accident and how to prevent it.
We reinforce a culture of learning.
We care about quality so much that we even inject faults into our production environment so we can learn how our system fails in a planned manner.
We conduct planned exercises to rehearse large-scale failures, randomly killing processes and compute servers in production.
We inject network latencies and other nefarious acts to ensure we grow resilient.
We enable organizational learning and improvement.
Everyone owns their work, regardless of their role in the technology organization.
Employees have confidence that their work matters and meaningfully contributes to organizational benefits.
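
The checklist above mentions changing a feature toggle instead of firefighting for days. A minimal sketch of that idea follows, assuming an in-memory flag store; the `FeatureToggles` class, the `new_checkout` flag and the `checkout` function are hypothetical names invented for illustration and do not come from any specific toggle library.

```python
class FeatureToggles:
    """Hypothetical in-memory flag store; real systems usually read flags from config."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        # Unknown features default to off, so new code stays dark until enabled.
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


toggles = FeatureToggles({"new_checkout": True})


def checkout(cart_total):
    # Both code paths are deployed; the toggle decides which one runs.
    if toggles.is_enabled("new_checkout"):
        return f"new checkout flow: {cart_total:.2f}"
    return f"legacy checkout flow: {cart_total:.2f}"


print(checkout(19.99))
toggles.set("new_checkout", False)  # switch the feature off without redeploying
print(checkout(19.99))
```

Because both paths ship together, turning a misbehaving feature off is a configuration change rather than a new deployment, which is what keeps releases reversible and low-stress.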

‍

The Business Value of DevOps

DevOps Practices

Throughput metrics
Code and change deployments (30x more frequent)
Code and change deployment lead times (200x faster)
Reliability metrics
Production deployments (60x higher change success rate)
Mean time to restore services (168x faster)
Organizational performance metrics
Productivity, market share, and profitability goals (2x more likely to succeed)
Market capitalization growth (50% higher over three years)
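
The two reliability metrics above can be computed from a simple deployment log. The sketch below is illustrative only: the `Deployment` record and the sample history are invented, and it assumes every failed deployment records how long service restoration took.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Deployment:
    succeeded: bool
    # Assumption: only failed deployments carry a restore duration.
    time_to_restore: Optional[timedelta] = None


def change_success_rate(deployments):
    """Fraction of deployments that did not cause a failure."""
    return sum(d.succeeded for d in deployments) / len(deployments)


def mean_time_to_restore(deployments):
    """Average restore duration across failed deployments."""
    outages = [d.time_to_restore for d in deployments if d.time_to_restore is not None]
    if not outages:
        return timedelta(0)
    return sum(outages, timedelta(0)) / len(outages)


# Hypothetical history: five deployments, two of which caused an outage.
history = [
    Deployment(True),
    Deployment(True),
    Deployment(False, timedelta(minutes=30)),
    Deployment(True),
    Deployment(False, timedelta(minutes=90)),
]

print(change_success_rate(history))
print(mean_time_to_restore(history))
```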

Value of DevOps

High performers were both more agile and more reliable, empirical evidence that DevOps enables us to break the core, chronic conflict
The time from “code committed” to “successfully running in prod” was 200x faster. High performers measure lead time in minutes or hours rather than weeks or months.
High performers are twice as likely to exceed profitability, market share and productivity goals
Higher employee job satisfaction
Lower rates of employee burnout
Employees 2x more likely to recommend their employer to friends as a great place to work
Better info security outcomes, spending 50% less time remediating security issues by fully integrating it into all stages of development and operations processes.

DevOps helps scale developer productivity

Increasing the number of developers on a project often significantly decreases individual developer productivity due to overhead in communication, integration, and testing.

The following combination enables small teams of developers to act quickly, safely and independently to develop, integrate, test and deploy changes into production.

The right architecture
The right technical practices
The right cultural norms

Problems to overcome:

Catastrophic deployments
Problems with availability
Problems with security
Problems with compliance

DevOps is the result of applying the following:

Flow: accelerate delivery of work from Dev+Ops
Feedback: Create safer systems of work
Continual learning & Experimentation: Foster a high-trust culture and a scientific approach to improvement

‍

History of DevOps

DevOps is the combination of the following knowledge

Knowledge from lean
Theory of constraints
Toyota production system
Resilience engineering
Learning organisations
Safety culture
Human factors

‍Valuable contexts that DevOps draws from

High-trust management cultures
Servant leadership
Organisational change management

‍The outcome of implementing DevOps

World-class quality, reliability and security
Lower cost and effort
Accelerated flow
Reliability throughout the technology value stream, including product management, development, QA, IT operations and InfoSec

DevOps is the logical continuation of the agile software journey that began in 2001

History of Lean

Value Stream Mapping, Kanban Boards and Total Productive Maintenance were codified for the Toyota Production System in the 1980s

3 of Lean’s major tenets are the following

Manufacturing Lead Time

The lead time required to convert raw materials into finished goods was the best predictor of quality, customer satisfaction, and employee satisfaction

Lean Principles

Focus on how to create value for the customer through systems thinking

Creating constancy of purpose
Embracing scientific thinking
Creating flow and pull
Assuring quality at the source
Leading with humility
Respecting every individual

Value streams

The sequence of activities an organisation undertakes to deliver upon a customer request
‍

History of Agile

Created in 2001 by 17 of the leading thinkers in software development

Focus: create a lightweight set of values and principles as an alternative to heavyweight software development processes such as waterfall and methodologies like the Rational Unified Process

The key principle of agile: “deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale”

The desire for small batch sizes
Incremental releases
Need for small, self-motivated teams
Work in high-trust management model

Agile is credited for dramatically increasing the productivity of many development organisations

‍

Agile Infrastructure and Velocity Movement

Patrick Debois and Andrew Shafer applied agile principles to infrastructure rather than application code.

In 2009, John Allspaw and Paul Hammond presented “10 deploys per day”

Creation of shared goals between Dev & Ops
Using continuous integration practices to make deployment part of everyone’s daily work

The term “DevOps” was coined in 2009 at the first DevOpsDays in Ghent, Belgium, which Patrick Debois organized after the Allspaw and Hammond talk
‍

Continuous Delivery Movement

Continuous delivery

Creation of a deployment pipeline
Ensure that code and infrastructure are always in a deployable state
All code checked into the trunk can be safely deployed into production.
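
A deployment pipeline can be pictured as an ordered series of automated checks that every change must pass before it counts as deployable. The sketch below is a toy model under that assumption: the stage names and the string-matching checks are placeholders invented for illustration, not a real CI system.

```python
def run_pipeline(stages, change):
    """Run each stage in order and stop at the first failure.

    Returns (deployable, results), where results lists the stages that ran.
    """
    results = []
    for name, check in stages:
        passed = check(change)
        results.append((name, passed))
        if not passed:
            return False, results
    return True, results


# Hypothetical checks; a real pipeline would compile, test and package the change.
def builds(change):
    return "syntax error" not in change


def unit_tests(change):
    return "failing test" not in change


def acceptance_tests(change):
    return True


STAGES = [
    ("build", builds),
    ("unit tests", unit_tests),
    ("acceptance tests", acceptance_tests),
]

ok, results = run_pipeline(STAGES, "add login button")
bad, bad_results = run_pipeline(STAGES, "refactor with failing test")
print(ok, results)
print(bad, bad_results)
```

Stopping at the first failed stage gives the developer fast feedback and keeps a broken change from ever being treated as deployable.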

Toyota Kata

Codification of the Toyota Production System
Mike Rother helped develop the Lean Toolkit

‍The term “improvement Kata” means:

Every organisation has work routines
Improvement kata requires creating a structure for the daily, habitual practice of improvement work
Daily practice improves outcomes
Setting weekly target outcomes and continual improvement of daily work is what guided the improvement of Toyota
‍

Manufacturing Value Streams

Value Streams

The sequence of activities required to design, produce and deliver a good or service to a customer, including the dual flows of information and material

The customer order is received
Raw materials are released onto the plant floor

Achieve Relentless Focus

Use small batch sizes
Reduce work in process (WIP)
Prevent rework to ensure we don’t pass defects to downstream work centres
Constantly improve and optimise our system toward global goals
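
One way to see why reducing WIP shortens lead times is Little's Law from queueing theory, which is not part of the original notes: in a stable system, average lead time equals average work in process divided by average throughput. The function name and the numbers below are illustrative assumptions.

```python
def average_lead_time(wip_items, throughput_per_day):
    """Little's Law: average lead time (days) = WIP / throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_day


# Hypothetical team finishing 5 items per day.
print(average_lead_time(30, 5))  # 30 items in process
print(average_lead_time(10, 5))  # same throughput, WIP cut to 10 items
```

With throughput unchanged, cutting WIP from 30 items to 10 cuts the average lead time from 6 days to 2.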

Technology Value Streams

In DevOps, the technology value stream is the process required to convert a business hypothesis into a technology-enabled service that delivers value to the customer

Inputs include:

Formulation of business objectives
Formulation of concepts
Formulation of an idea
Formulation of a hypothesis

Outcome: adding inputs to our committed backlog of work

Development teams follow an agile process

Transform idea into a user story
Implement code into an application
Code is checked into the version control
Change is integrated
Testing is conducted with the rest of the software system

Value generation

Value is created only when our services are running in production
We must ensure that we deliver fast flow and that our deployments can also be performed without causing chaos and disruptions such as service outages, service impairments or security or compliance failures.

Deployment Lead Time in Minutes

The following will be a measure of the success of your DevOps lead times

Developers receive fast, constant feedback on their work
Developers are enabled to quickly and independently implement, integrate, and validate their code, and have the code deployed into a production environment
Developers check small code changes into the version control repository, perform automated and exploratory testing against them, and deploy them into production.
This enables a high degree of confidence that our changes will operate as designed in production and that any problems can be quickly detected and corrected.

Deployment lead time is measured in minutes, or in the worst case hours.

Observing “% C/A” as a measure of rework

The key metric in the tech value stream is per cent complete and accurate “% C/A”
Reflects quality output of each step in our value stream
% C/A can be obtained by asking downstream customers what percentage of the time they receive work that is “usable as is”
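
As a rough illustration of the % C/A calculation, the sketch below assumes each handoff is recorded as a boolean saying whether the downstream team could use the work without correcting it. The function name and the sample data are hypothetical.

```python
def percent_complete_accurate(handoffs):
    """handoffs: list of booleans, True when downstream could use the work as is."""
    if not handoffs:
        raise ValueError("need at least one handoff to compute %C/A")
    return 100.0 * sum(handoffs) / len(handoffs)


# Hypothetical survey: QA received 20 builds and 17 needed no rework.
qa_feedback = [True] * 17 + [False] * 3
print(percent_complete_accurate(qa_feedback))  # 85.0
```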

Target ideal deployment lead times

Step 1 – Commit stage (automated), followed by automated approval

Step 2 – Automated testing (10 minutes), followed by manual approval

Step 3 – Exploratory testing (10 minutes)

Step 4 – Production deployment (5 minutes)

‍Focus on deployment lead times – The value stream begins when any engineer (Dev, QA, Ops, InfoSec) checks a change into version control and ends when that change is successfully running in production

‍Phase 1: Design & Development

Design and development is similar to Lean Product Development and is highly variable and uncertain. It requires high degrees of creativity and work that may never be performed again, resulting in high variability of process times.

‍Phase 2: Testing & Operations

Akin to lean manufacturing.
Requires creativity and expertise
Strives to be predictable and mechanistic with the goal of achieving work outputs with minimised variability (i.e. short & predictable lead times, near-zero defects)

‍Phase 3: Remove large batches of work

The goal is to have testing and operations happening simultaneously with design/development, enabling fast flow and high quality
The method succeeds when we work in small batches and build quality into every part of our value stream.

Lead Time vs. Processing Time

Lead Time

Used to measure performance in value streams
The clock starts when the request is made and ends when it is fulfilled
Because lead time is what the customer experiences, we focus our process improvement there instead of process time
Achieving fast flow and short lead times almost always requires reducing the time our work is waiting in queues.

‍Process Time

Starts only when we begin work on the customer’s request, so it omits the time the work spends waiting in the queue

The ratio of process time to lead time serves as an important measure of efficiency.
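
The difference between the two clocks, and the ratio of process time to lead time, can be made concrete with timestamps. This is a simplified sketch: the ticket dates are invented, and it assumes work proceeds without interruption once it has started.

```python
from datetime import datetime


def flow_metrics(requested, started, finished):
    """Return (lead_time, process_time, efficiency) for one work item."""
    lead_time = finished - requested       # clock starts when the request is made
    process_time = finished - started      # clock starts when work actually begins
    efficiency = process_time / lead_time  # share of lead time spent working
    return lead_time, process_time, efficiency


# Hypothetical ticket: requested Monday 09:00, picked up Thursday 09:00, done Thursday 15:00.
lead, process, efficiency = flow_metrics(
    requested=datetime(2024, 1, 1, 9, 0),
    started=datetime(2024, 1, 4, 9, 0),
    finished=datetime(2024, 1, 4, 15, 0),
)
print(lead, process, round(efficiency, 3))
```

Here only 6 of the 78 hours were spent working, so almost all of the lead time was queue time, which is where improvement effort pays off most.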

Common Scenario – Deployment Lead Times Requiring Months

Common in large, complex organisations that are working with:

Tightly coupled monolithic applications
Often, with scarce integration test environments
High reliance on manual testing
Multiple required approval processes
Heroics required
High risks occur after merging all development team changes, resulting in code that no longer builds correctly or passes our tests.
Fixing each problem requires days or weeks
Extensive investigation is needed to determine who broke the code and how to fix it

Result: Poor customer outcomes

Enabling organisational learning and a safety Culture

Complex systems make it impossible to predict all the outcomes of our actions.

The root cause is often declared to be human error, and a “name, blame, shame” cycle begins for the person who caused the problem.
More processes and approvals are created to prevent errors from happening
How management reacts to failures and accidents leads to a culture of fear, making it unlikely that problems and failure signals are ever reported. Problems remain hidden until a catastrophe occurs.
Dr. Westrum defined 3 types of culture: pathological, bureaucratic, and generative.
In the technology value stream, we need to create a “generative culture”

Westrum organisational typology (2004)

‍Pathological organisations

Information is hidden
Messengers are “shot”
Responsibilities are shirked
Bridging between teams is discouraged
Failure is covered up
New ideas are crushed.

Bureaucratic organisations

Information may be ignored
Messengers are tolerated
Responsibilities are compartmented
Bridging between teams is allowed but discouraged
The organisation is just and merciful
New ideas create problems.

Generative organisations

Information is actively sought
Messengers are trained
Responsibilities are shared
Bridging between teams is rewarded
Failure causes inquiry
New ideas are welcomed

The goal of Technology Value Stream

Establish the foundations of a generative culture by striving to create a safe system of work.
We look for how we can redesign the system to prevent the accident from happening again.
We conduct a blameless postmortem after every incident to understand how the accident occurred and agree upon the best countermeasures to improve the system. Ideally, these prevent the problem from occurring again and enable faster detection and recovery.

‍Result:

Create organizational learning
Help customers
Ensure quality
Create competitive advantage
Energised workforce
Committed workforce
We can uncover the truth.
‍

Institutionalise the improvement of daily work

Problems:

In the absence of improvements, processes don’t stay the same. Due to chaos and entropy, processes actually degrade over time.
When we avoid fixing problems and rely on daily workarounds, our problems and technical debt accumulate until all we do is perform workarounds, trying to avoid disaster, with no cycles left for productive work.

Solutions:

Reserve time to pay down technical debt, fixing defects and refactoring
Improve problematic areas for our code and environments
We need to reserve cycles in each deployment interval.
Schedule kaizen blitzes – periods when engineers self-organize into teams to work on fixing any problems they want.

Outcome: As we make our system of work safer, we find problems from even weaker failure signals.

Transform local discoveries into global improvements

When teams or individuals have experiences that create expertise, we aim to convert that tacit knowledge into explicit, codified knowledge, which becomes someone else’s expertise through practice.

Result: When people do similar work, they do so with the cumulative and collective experience of everyone in the organization who has ever done the same work.

We convert individual expertise into artefacts that the rest of the organization can use.

What we must do: We must create global knowledge by making all blameless post-mortem reports searchable by teams trying to solve similar problems.
‍

Greenfield vs. Brownfield Services

‍Greenfield Development

Build on undeveloped land
No existing structures that need demolishing
New software project/initiative
