What is DevOps?

DevOps is the logical continuation of the Agile journey that began in 2001. Agile serves to enable DevOps: DevOps practices emerge when our code is always in a deployable state. DevOps principles and practices are compatible with Agile.

DevOps Myths

Many people misunderstand what DevOps is about. DevOps is more than just development and operations, and its different aspects need to be understood and respected. Let's go through the common myths, as described by Gene Kim.


1) DevOps is only for startups

  • Pioneered by internet "unicorns" like Google, Amazon, Netflix & Etsy

Even these organisations once faced key problems, including:

  1. Hazardous code releases prone to failure
  2. Inability to release features fast enough to beat the competition
  3. High levels of distrust between Development and Operations


2) DevOps replaces Agile

  • DevOps principles and practices are compatible with Agile.
  • DevOps is the logical continuation of the Agile journey that began in 2001.
  • Agile serves to enable DevOps
  • DevOps practices emerge when we have our code always in a deployable state
  • Developers check into the trunk daily
  • Demonstrate features in production-like environments


3) DevOps is incompatible with information security and compliance

  • Security and compliance are integrated into every stage of daily work in the software development lifecycle
  • Security and compliance integration results in better quality, security and compliance outcomes.


4) DevOps means eliminating IT operations, or "NoOps."

  • IT Ops work will naturally change with DevOps, but its work remains as important as ever.
  • IT Ops collaborates with Development far earlier in the lifecycle, and Development keeps working with IT Ops long after code has been deployed into production.
  • IT Ops enables developer productivity through APIs and self-service platforms that create environments, test & deploy code, monitor and display production telemetry, etc.
  • IT Ops becomes more like development, where the product is the platform developers use to safely, quickly, and securely test, deploy and run their IT services in production.


5) DevOps is just "Infrastructure as code" or automation

  • DevOps requires cultural norms and an architecture that allows shared goals to be achieved throughout the IT value stream.


6) DevOps is only for Open Source Software

  • Achieving DevOps outcomes is independent of the technology being used.

Dev & Ops become DevOps


By working together toward a common goal, they enable:

  • The fast flow of planned work into production
  • World-class stability, reliability, availability and security

Toward a common goal

  • Cross-functional teams test their hypotheses about which features will delight users and advance the organizational goals.
  • Cross-functional teams actively ensure their work flows smoothly and frequently throughout the entire value stream without causing chaos and disruption to IT Ops or any other internal or external customer.
  • QA, IT Ops, and InfoSec work on ways to reduce friction for the team, creating work systems that enable developers to be more productive and get better outcomes.
  • By integrating the expertise of QA, IT Ops, and InfoSec into delivery teams and automated self-service tools and platforms, teams can use that expertise in their daily work without being dependent on other groups.
  • Organizations create a safe system of work, where small teams can quickly and independently develop, test and deploy code, delivering value to customers quickly, safely, securely and reliably.


Outcomes

  • Maximize developer productivity;
  • Enable organizational learning;
  • Create high employee satisfaction;
  • Win in the marketplace.


How it is today

  • Dev and IT Ops are adversaries
  • Testing and InfoSec happen only at the end of a project --> too late to correct errors that are found
  • Critical activities require too much manual effort and too many handoffs, leaving people always waiting


Consequences

  • Extremely long lead times
  • Quality of work, especially production deployments, is also problematic and chaotic
  • Our customers and our business suffer negative impacts
  • We fall short of our goals
  • The whole organization is dissatisfied with IT performance
  • Budgets are reduced
  • Unhappy employees feel powerless to change the process and its outcomes

Manufacturing revolution in the 1980s


Manufacturers adopted Lean principles and practices, dramatically improving:

  • Plant productivity
  • Customer lead times
  • Product quality
  • Customer satisfaction

Introduction of DevOps in 2010

  • Faster delivery: hardware, software, cloud infrastructure, features and even whole startup companies can get up and running in just weeks.
  • Deployment to production in just hours or minutes
  • Deployments become routine and low risk
  • Businesses are able to test new ideas and run experiments
  • Businesses discover which ideas create the most value to customers more efficiently and effectively
  • Rapid, safe and secure deployment to production

Organizations unable to deploy changes to market quickly are destined to lose in the marketplace to more nimble competitors. Regardless of the industry, how we acquire customers and deliver value to them depends on the technology value stream.

The Problem and Chronic Conflict

  • Most organizations are unable to deploy production changes in minutes or hours
  • Production deployments are not routine
  • Production deployments involve outages, chronic firefighting and heroics
  • A core conflict exists within these technology organizations

Chronic conflict


The conflict between Dev + Ops creates a downward spiral resulting in:

  • slower time-to-market
  • reduced quality
  • increased outages
  • increasing technical debt


Technical debt describes how the decisions we make lead to problems that become increasingly difficult to fix over time:

  • Reduces available options in the future
  • There are often competing goals between Dev & Ops

Goals of IT Organisations

  • Respond to the rapidly changing competitive landscape
  • Provide stable, reliable and secure service to the customer


Development objectives

  • Development takes responsibility for responding to changes in the market
  • Development deploys features and changes into production as fast as possible


Operations objectives

  • Operations is responsible for providing customers with IT service that is stable, reliable and secure --> consequence: it becomes virtually impossible for anyone to introduce production changes that could jeopardize production.

Dev + Ops have opposed goals and incentives.

Core conflict: organizational measurements and incentives across different silos prevent the achievement of global organizational goals, and therefore of the desired business outcomes. These chronic conflicts put technology workers into situations that lead to:

  • poor software and service quality, lousy customer outcomes
  • the daily need for workarounds
  • firefighting
  • heroics

The downward spiral in three acts


a) IT Operations

Goal: Keep applications and infrastructure running so that our organization can deliver value to customers

Many problems are due to applications and infrastructure that are:

  • Complex
  • Poorly documented
  • Incredibly fragile


Outcome:

  • Lots of technical debt and workarounds
  • The systems most prone to failure are also our most important ones and sit at the epicentre of our most urgent changes.
  • When our most urgent changes fail, they jeopardize our most critical organizational promises: availability to customers, revenue goals, security of customer data, accurate financial reporting, etc.


b) Compensation for the latest broken promise

Cause: product managers promise bigger, bolder features to impress customers


Outcome:

  • Oblivious to the limitations of what the technology can and can't do, they commit the technology organization to a promise it can't keep
  • Development is tasked with another urgent project that requires solving new technical challenges and cutting corners to meet the promised release date, further adding to technical debt.


c) Getting busier, but for what?

Outcome: loss of market share


Consequence:
When IT fails, the organization fails

How the downward spiral starts

  • Everybody gets a little busier
  • Work takes a little more time
  • Communications become slower
  • Work queues get a little longer
  • Work becomes more tightly coupled
  • Ever-smaller actions cause ever-larger failures
  • We become fearful and less tolerant of change
  • Work requires more communication, coordination and approvals
  • Teams must wait longer for their dependent work to get done
  • Quality keeps getting worse
  • Production code deployments are taking longer to complete
  • Deployment outcomes have become problematic
  • An ever-increasing number of customer outages
  • More heroics and firefighting in operations
  • Inability to pay down technical debt
  • Product delivery cycles become slower and slower
  • Fewer projects are undertaken, and those that are become less ambitious
  • Feedback becomes slower and weaker
  • Feedback from customers slows down
  • Things seem to get worse
  • No longer able to respond quickly to changing competitive landscape
  • Inability to provide stable, reliable service to our customers


Two facts:

  • Every IT organization has two opposing goals
  • Every company is a technology company

Benefits of DevOps

DevOps enables organisations to:

  • Improve organizational performance
  • Achieve the goals of all the various functional technology roles: Dev, Ops, InfoSec, QA
  • Improve the human condition

Core advantages and general checklist to observe

  1. Developers independently implement their features.
  2. Developers validate the correctness of their features in production-like environments.
  3. Developers can have their code deployed into production quickly, safely and securely.
  4. Code deployments are routine and predictable.
  5. Deployments occur during normal business hours, when everyone is already in the office, without customers even noticing.
  6. Fast feedback loops at every step of the process let everyone see the effects of their actions.
  7. When changes are committed to version control, fast automated tests are run in production-like environments.
  8. DevOps gives continual assurance that the code and environments operate as designed.
  9. Deployments are always secure.
  10. Automated testing helps developers discover their mistakes quickly, enabling faster fixes and genuine learning.
  11. Code and environments operate as designed and are always in a secure and deployable state.
  12. Automated tests help developers discover their mistakes quickly.
  13. Instead of accruing technical debt, problems are fixed as they are found.
  14. Global goals outweigh local goals.
  15. Pervasive production telemetry in our code and production environments ensures that problems are detected and corrected quickly.
  16. The architecture allows small teams to work safely, decoupled from the work of other teams.
  17. Teams work independently and productively in small batches, quickly and frequently delivering new value to customers.
  18. Even high-profile product and feature releases become routine by using dark launch techniques.
  19. Instead of firefighting for days, we merely change a feature toggle or configuration setting.
  20. Features can be automatically rolled back if something goes wrong.
  21. Releases are controlled, predictable, reversible, and low stress.
  22. All sorts of problems are found and fixed early, when they are smaller, cheaper and easier to correct.
  23. With every fix, we generate organizational learnings, allowing us to prevent the problem from recurring.
  24. Everyone is learning, fostering a hypothesis-driven culture where the scientific method is used to ensure nothing is taken for granted.
  25. We treat both product development and process improvement as experiments.
  26. We keep teams intact over the long term so they can keep iterating and improving, using those learnings to achieve their goals.
  27. Instead of a culture of fear, we have a high-trust, collaborative culture, where people are rewarded for taking risks.
  28. People can fearlessly talk about problems.
  29. Everyone wholly owns the quality of their work.
  30. People use peer reviews to gain confidence that problems are addressed long before they impact the customer.
  31. When something goes wrong, we conduct blameless postmortems to understand what caused the accident and how to prevent it.
  32. We reinforce a culture of learning.
  33. We care about quality so much that we even inject faults into our production environment so we can learn how our system fails in a planned manner.
  34. We conduct planned exercises to practise large-scale failures, randomly killing processes and compute services in production.
  35. We inject network latencies and other nefarious acts to ensure we grow resilient.
  36. We enable organizational learning and improvement.
  37. Everyone owns their work, regardless of their role in the technology organization.
  38. Employees have confidence that their work matters and is meaningfully contributing to organizational benefits.
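Items 19 and 20 rely on feature toggles: a feature is switched on or off through configuration instead of a redeploy, which also makes an automatic rollback cheap. The following is a minimal sketch, assuming the toggles live in an in-memory dictionary (real systems usually read them from a configuration service); FeatureToggles, checkout, run_with_rollback and the new_checkout flag are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class FeatureToggles:
    """Hypothetical in-memory feature toggle store (illustration only)."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so unreleased code stays dark.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        # Flipping a toggle is a configuration change, not a redeploy.
        self._flags[name] = enabled


def checkout(toggles: FeatureToggles) -> str:
    # Both code paths are deployed; the toggle decides which one runs.
    if toggles.is_enabled("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"


def run_with_rollback(toggles: FeatureToggles, name: str,
                      health_check: Callable[[], bool]) -> bool:
    """Enable a feature, then disable it again if the health check fails."""
    toggles.set(name, True)
    if not health_check():
        toggles.set(name, False)  # automatic "rollback" by turning the toggle off
        return False
    return True


toggles = FeatureToggles({"new_checkout": False})
print(checkout(toggles))  # old checkout flow
print(run_with_rollback(toggles, "new_checkout", health_check=lambda: False))  # False
print(checkout(toggles))  # old checkout flow
```

Because the old path is still deployed, "rolling back" here is just setting one value, which is what makes releases reversible and low stress.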

The Business Value of DevOps

DevOps Metrics (high performers vs. low performers)

Throughput metrics
  • Code and change deployments (30x more frequent)
  • Code and change deployment lead times (200x faster)

Reliability metrics
  • Production deployments (60x higher change success rate)
  • Mean time to restore service (168x faster)

Organizational performance metrics
  • Productivity, market share, and profitability goals (2x more likely to exceed)
  • Market capitalization growth (50% higher over three years)

Value of DevOps

  • High performers were both more agile and more reliable, providing empirical evidence that DevOps enables us to break the core, chronic conflict
  • "Code committed" to "successfully running in prod" was 200x faster, with lead times measured in minutes or hours rather than weeks, months or quarters
  • High performers were twice as likely to exceed profitability, market share and productivity goals
  • Higher employee job satisfaction
  • Lower rates of employee burnout
  • Employees 2x more likely to recommend their employer to friends as a great place to work
  • Better information security outcomes: high performers spent 50% less time remediating security issues by fully integrating security into all stages of the development and operations processes

DevOps helps scale developer productivity


Adding developers to a project can significantly decrease developer productivity due to communication, integration and testing overhead.
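One common way to see why communication overhead grows with team size is to count the pairs of people who may need to coordinate: n(n-1)/2 for n developers. This formula is borrowed from Brooks's The Mythical Man-Month as an illustration, not taken from this text, and communication_channels is a hypothetical helper.

```python
def communication_channels(team_size: int) -> int:
    """Hypothetical helper: distinct pairs of people who may need to coordinate."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


for n in (5, 10, 50):
    print(n, communication_channels(n))
# 5 10
# 10 45
# 50 1225

# Ten decoupled teams of five: coordination mostly stays inside each team.
print(10 * communication_channels(5))  # 100
```

The quadratic growth is why an architecture that lets small teams work independently matters: it keeps most coordination inside each team.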

The following combination enables small teams of developers to quickly, safely and independently develop, integrate, test and deploy changes into production:

  1. The right architecture
  2. The right technical practices
  3. The right cultural norms


Problems to overcome:

  • Catastrophic deployments
  • Problems with availability
  • Problems with security
  • Problems with compliance


DevOps is the result of applying the following:

  1. Flow: accelerate the delivery of work from Dev to Ops to customers
  2. Feedback: create safer systems of work
  3. Continual learning & experimentation: foster a high-trust culture and a scientific approach to improvement

History

History of DevOps

DevOps combines knowledge from:

  • Lean
  • Theory of Constraints
  • Toyota Production System
  • Resilience engineering
  • Learning organisations
  • Safety culture
  • Human factors


Valuable contexts that DevOps draws from:

  • High-trust management cultures
  • Servant leadership
  • Organisational change management


Outcome of implementing DevOps:

  • World class quality, reliability and security
  • Lower cost and effort
  • Accelerated flow
  • Reliability throughout the technology value stream, including product management, development, QA, IT operations and InfoSec

DevOps is the logical continuation of the agile software journey that began in 2001

History of Lean

Value Stream Mapping, Kanban Boards and Total Productive Maintenance were codified for the Toyota Production System in the 1980s


3 of Lean's major tenets are the following

1) Manufacturing Lead Time

  • The lead time required to convert raw materials into finished goods was the best indicator of quality, customer satisfaction, and employee satisfaction


2) Lean Principles: Focus on how to create value for the customer through systems thinking

  • Creating constancy of purpose
  • Embracing scientific thinking
  • Creating flow and pull
  • Assuring quality at the source
  • Leading with humility
  • Respecting every individual


3) Value streams

  • The sequence of activities an organisation undertakes to deliver upon a customer request

History of Agile

The Agile Manifesto was created in 2001 by 17 of the leading thinkers in software development

Focus: create a lightweight set of values and principles as an alternative to heavyweight software development processes such as waterfall development and methodologies such as the Rational Unified Process

Key principle of agile: "deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale"

  • Desire for small batch sizes
  • Incremental releases
  • Need for small, self-motivated teams
  • Working in a high-trust management model

Agile is credited for dramatically increasing the productivity of many development organisations

Agile Infrastructure and Velocity Movement


Patrick Debois and Andrew Shafer argued for applying Agile principles to infrastructure, not only to application code.

In 2009, John Allspaw and Paul Hammond presented "10 deploys per day" at the Velocity conference:

  • Creation of shared goals between Dev & Ops
  • Using continuous integration practices to make deployment part of everyone's daily work


The term "DevOps" was coined at the first DevOpsDays, organised by Patrick Debois in Ghent, Belgium, in 2009

Continuous Delivery Movement


Continuous delivery:

  • Creation of a deployment pipeline
  • Ensure that code and infrastructure are always in a deployable state
  • All code checked into trunk can be safely deployed into production.
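A deployment pipeline can be pictured as an ordered list of stages where any failure stops the change from moving on, which is what keeps trunk in a deployable state. The following is a minimal sketch, not a real CI tool's API: Stage, run_pipeline and the stage checks are hypothetical placeholders for real build, test and deploy steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[str], bool]  # takes a commit id, returns True on success


def run_pipeline(commit: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure, so a bad change is never promoted."""
    passed: List[str] = []
    for stage in stages:
        if not stage.run(commit):
            return False, passed
        passed.append(stage.name)
    return True, passed


# Hypothetical stages: real ones would build, run test suites and deploy.
stages = [
    Stage("build", lambda c: True),
    Stage("automated tests", lambda c: not c.startswith("broken")),
    Stage("deploy to production", lambda c: True),
]

print(run_pipeline("abc123", stages))
# (True, ['build', 'automated tests', 'deploy to production'])
print(run_pipeline("broken-456", stages))
# (False, ['build'])
```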

Toyota Kata


Origins

  • Codification of the Toyota Production System
  • Mike Rother helped develop the Lean Toolkit


The term "improvement Kata" means:

  1. Every organisation has work routines
  2. The improvement kata requires creating structure for the daily, habitual practice of improvement work
  3. Daily practice improves outcomes
  4. Setting weekly target outcomes and continually improving daily work is what guided improvement at Toyota

Manufacturing Value Streams

Value Streams

The sequence of activities required to design, produce and deliver a good or service to a customer, including the dual flows of information and material. In manufacturing, the value stream begins when:

  • A customer order is received
  • Raw materials are released onto the plant floor

Achieve Relentless Focus

  • Use small batch sizes
  • Reduce work in process (WIP)
  • Prevent rework to ensure we don't pass defects to downstream work centers
  • Constantly improve and optimise our system toward global goals

Technology Value Streams

In DevOps the technology value stream is the process required to convert a business hypothesis into a technology-enabled service that delivers value to the customer


Inputs include:

  • Formulation of business objectives
  • Formulation of concepts
  • Formulation of an idea
  • Formulation of a hypothesis

Outcome: these inputs are added to our committed backlog of work


Development teams follow an agile process

  1. Transform idea into user story
  2. Implement the user story as code in the application
  3. Code is checked into version control
  4. Change is integrated
  5. Testing is conducted with the rest of the software system


Value generation:

  • Value is created only when our services are running in production
  • We must ensure that we are not only delivering fast flow, but that our deployments can also be performed without causing chaos and disruptions such as service outages, service impairments, or security or compliance failures.

Deployment Lead Time in Minutes


The following indicate success in reducing deployment lead times:

  • Developers receive fast, constant feedback on their work
  • Developers are enabled to quickly and independently implement, integrate, and validate their code, and have the code deployed into the production environment
  • Developers repeatedly check small code changes into the version control repository, perform automated and exploratory testing against them, and deploy them into production
  • This gives a high degree of confidence that our changes will operate as designed in production and that any problems can be quickly detected and corrected
  • Deployment lead time is measured in minutes, or in the worst case, hours

Observing "% C/A" as measure of rework

  • Key metric in the tech value stream is percent complete and accurate "% C/A"
  • Reflects quality output of each step in our value stream
  • % C/A can be obtained by asking downstream customers what percentage of the time they receive work that is "usable as is"
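For a single step, %C/A is simply the share of handed-off work that the downstream customer could use as is. The following is a minimal sketch with hypothetical survey counts; combining steps by multiplying their %C/A values ("rolled %C/A") is a common value stream mapping convention that is included here as an assumption, not something this text prescribes.

```python
from math import prod
from typing import Dict


def percent_ca(usable_as_is: int, total_received: int) -> float:
    """%C/A for one step: share of work the downstream step could use as is."""
    if total_received <= 0:
        raise ValueError("total_received must be positive")
    return 100.0 * usable_as_is / total_received


def rolled_percent_ca(step_pcas: Dict[str, float]) -> float:
    """Assumed convention: chance (in %) that work passes every step without rework."""
    return 100.0 * prod(p / 100.0 for p in step_pcas.values())


# Hypothetical counts reported by each step's downstream customer.
steps = {
    "user story": percent_ca(45, 50),   # 90.0
    "code change": percent_ca(40, 50),  # 80.0
    "test build": percent_ca(25, 50),   # 50.0
}
print(steps)
print(round(rolled_percent_ca(steps), 1))  # 36.0
```

Even when each step looks acceptable on its own, the rolled figure shows how often work needs rework somewhere along the value stream.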

Target ideal deployment lead times

Step 1 - Commit (automated approval)

Step 2 - Automated testing (10 minutes)

Step 3 - Exploratory testing (10 minutes)

Step 4 - Production deployment (5 minutes)


Focus on deployment lead times - The value stream begins when any engineer (Dev, QA, Ops, InfoSec) checks a change into version control and ends when that change is successfully running in production
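Using the target durations listed above (10, 10 and 5 minutes), deployment lead time is the elapsed time from check-in to running in production. The following is a minimal sketch, assuming the commit step takes no measurable time and the stages run back to back with no queue time; STAGE_MINUTES, deployment_lead_time and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical stage durations taken from the target list; commit assumed instant.
STAGE_MINUTES = [
    ("commit", 0),
    ("automated testing", 10),
    ("exploratory testing", 10),
    ("production deployment", 5),
]


def deployment_lead_time(checked_in: datetime, running_in_prod: datetime) -> timedelta:
    """Clock starts at check-in to version control, stops when the change runs in production."""
    if running_in_prod < checked_in:
        raise ValueError("a change cannot reach production before it is checked in")
    return running_in_prod - checked_in


checked_in = datetime(2024, 1, 15, 9, 0)  # hypothetical check-in time
running = checked_in + timedelta(minutes=sum(m for _, m in STAGE_MINUTES))
lead = deployment_lead_time(checked_in, running)
print(lead.total_seconds() / 60)  # 25.0
```

In practice the two timestamps would come from the version control system and the deployment log, and any time spent waiting between stages would show up in the result.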


Phase 1: Design & Development

Design & Development is akin to Lean Product Development: it is highly variable and highly uncertain, often requiring high degrees of creativity and work that may never be performed again, resulting in high variability of process times.


Phase 2: Testing & Operations

  • Akin to lean manufacturing.
  • Requires creativity and expertise
  • Strives to be predictable and mechanistic, with the goal of achieving work outputs with minimised variability (i.e. short & predictable lead times, near zero defects)


Phase 3: Remove large batches of work

  • Goal is to have testing and operations happening simultaneously with design/development, enabling fast flow and high quality
  • The method succeeds when we work in small batches and build quality into every part of our value stream.

Lead Time vs. Processing Time

Lead Time

  • Used to measure performance in value streams
  • The clock starts when the request is made and ends when it is fulfilled
  • Because lead time is what the customer experiences, we focus our process improvement there instead of process time
  • Achieving fast flow and short lead times almost always requires reducing the time our work is waiting in queues.

Process Time

Starts only when we begin work on the customer request

Process Time vs Lead Time
  • The ratio of process time to lead time serves as an important measure of efficiency: achieving fast flow and short lead times almost always requires reducing the time our work spends waiting in queues.
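The distinction can be made concrete with timestamps for a single request: lead time runs from request to fulfilment, process time covers only the periods when someone was actually working on it, and their ratio shows how much of the lead time was spent waiting. The following is a minimal sketch with hypothetical timestamps; lead_time, process_time and process_to_lead_ratio are illustrative helpers.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def lead_time(requested: datetime, fulfilled: datetime) -> timedelta:
    """Customer-visible time: from the request until it is fulfilled."""
    return fulfilled - requested


def process_time(work_periods: List[Interval]) -> timedelta:
    """Time actually spent working on the request, summed over work periods."""
    return sum((end - start for start, end in work_periods), timedelta())


def process_to_lead_ratio(requested: datetime, fulfilled: datetime,
                          work_periods: List[Interval]) -> float:
    """Fraction of lead time spent on actual work; the rest is queue time."""
    return process_time(work_periods) / lead_time(requested, fulfilled)


# Hypothetical request: raised Monday 09:00, fulfilled Wednesday 09:00,
# with two hours of hands-on work in total.
requested = datetime(2024, 1, 15, 9, 0)
fulfilled = datetime(2024, 1, 17, 9, 0)
work = [
    (datetime(2024, 1, 16, 10, 0), datetime(2024, 1, 16, 11, 0)),
    (datetime(2024, 1, 17, 8, 0), datetime(2024, 1, 17, 9, 0)),
]
print(lead_time(requested, fulfilled))  # 2 days, 0:00:00
print(process_time(work))  # 2:00:00
print(round(process_to_lead_ratio(requested, fulfilled, work), 3))  # 0.042
```

Here only about 4% of the lead time is actual work, so cutting queue time, not speeding up the work itself, is where most of the improvement lies.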

Common Scenario - Deployment Lead Times Requiring Months

Common in large, complex organisations that are working with:

  • Tightly coupled monolithic applications
  • Often with scarce integration test environments
  • High reliance on manual testing
  • Multiple required approval processes
  • Heroics required
  • When all the development teams' changes are finally merged, the result is often code that no longer builds correctly or passes any of our tests
  • Fixing each problem requires days or weeks
  • Extensive investigation is needed to determine who broke the code and how to fix it

Result: Poor customer outcomes

Enabling organisational learning and a safety culture

Complex systems make it impossible for us to predict all the outcomes for any action we take.

  • The root cause of errors is often deemed to be human error, and a “name, blame, shame” cycle begins for the person who caused the problem.
  • More processes and approvals are created to prevent errors from happening
  • How management chooses to react to failures and accidents leads to a culture of fear, which makes it unlikely that problems and failure signals are ever reported. Problems remain hidden until a catastrophe occurs.
  • Dr. Westrum defined 3 types of culture: pathological, bureaucratic, generative.
  • In the technology value stream we need to create a “generative culture”

Westrum organisational typology (2004)


Pathological organisations:

  • Information is hidden
  • Messengers are “shot”
  • Responsibilities are shirked
  • Bridging between teams is discouraged
  • Failure is covered up
  • New ideas are crushed.


Bureaucratic organisations:

  • Information may be ignored
  • Messengers are tolerated
  • Responsibilities are compartmented
  • Bridging between teams is allowed but discouraged
  • Organisation is just and merciful
  • New ideas create problems.


Generative organisations:

  • Information is actively sought
  • Messengers are trained
  • Responsibilities are shared
  • Bridging between teams is rewarded
  • Failure causes inquiry
  • New ideas are welcomed

Goal of Technology Value Stream

  1. Establish the foundations of a generative culture by striving to create a safe system of work.
  2. We look for how we can redesign the system to prevent the accident from happening again.
  3. We conduct a blameless postmortem after every incident to understand how the accident occurred and agree upon the best countermeasures to improve the system, so that we not only prevent the problem from recurring but also enable faster detection and recovery.


Result:

  • Create organizational learning
  • Help customers
  • Ensure quality
  • Create competitive advantage
  • Energised workforce
  • Committed workforce
  • We can uncover the truth.

Institutionalise the improvement of daily work


Problems:

  • In the absence of improvements, processes don’t stay the same. Due to chaos and entropy, processes actually degrade over time.
  • When we avoid fixing problems and rely on daily workarounds, our problems and technical debt accumulate until all we are doing is performing workarounds to avoid disaster, with no cycles left for productive work.


Solutions:

  • Reserve time to pay down technical debt, fixing defects and refactoring
  • Improve problematic areas for our code and environments
  • We need to reserve cycles in each deployment interval.
  • Schedule kaizen blitzes: periods when engineers self-organize into teams to work on fixing any problems they want.


Outcome: As we make our system of work safer, we find and fix problems from ever weaker failure signals.

Transform local discoveries into global improvements


When teams or individuals have experiences that create expertise, our goal is to convert that tacit knowledge into explicit, codified knowledge, which becomes someone else’s expertise through practice.

Result: When people do similar work, they do so with the cumulative and collective experience of everyone in the organization who has ever done the same work.


We convert individual expertise into artifacts that the rest of the organization can use.

What we need to do: We need to create global knowledge by making all blameless post-mortem reports searchable by teams trying to solve similar problems.

Greenfield vs. Brownfield Services

Greenfield Development

  • Build on undeveloped land
  • No existing structures that need demolishing
  • New software project / initiative