Accelerate: Building and Scaling High Performing Technology Organizations
Below are my notes from the book Accelerate: Building and Scaling High Performing Technology Organizations.
Opinion
The book is recommended if you are looking for better ways to explain why people should adopt DevOps practices. At Print.com, people don’t need to be convinced of its benefits. It did help me think about the state of our DevOps and where we can still improve. When you’re caught up in the day-to-day you often forget these lessons, and this book makes for a good reminder.
Short summary
- Generative culture
- Work in small batches
- Reduce work in progress
- Eliminate deployment pain
- Kaizen: relentlessly pursue continuous improvement
- Shift left on security and testing
Notes
What is a high performing organization?
The right culture creates high performing organizations
- Before you can improve, you need a culture that is willing to improve
- Culture exists at three levels
  - Assumptions: unconscious, taken-for-granted beliefs, things we just “know”
  - Values: more visible, can be discussed, influence group interactions, which shape the actions
  - Artifacts: mission statements etc.
- Westrum Model (2014) categorizes culture, helps measure it
  - Pathological: power oriented, low cooperation, messengers shot
    - When a problem occurs, people will try to find a “throat to choke”. In reality, “accidents typically emerge from a complex interplay of contributing factors. Failure in complex systems is, like the other types of behavior in such systems, emergent.” (Perrow 2011). Investigations that stop at “human error” are not helpful; human error should be the start of an investigation.
  - Bureaucratic: rule oriented, modest cooperation, messengers neglected
    - Not always bad, rules level the playing field
    - Following the rules is more important than achieving the mission
  - Generative: performance oriented, high cooperation, messengers trained
    - Higher level of trust, emphasizes the mission, personal issues are put aside as the mission is primary
- Culture predicts the way information flows through the organization. Organizations with good information flow perform better.
- Good information
  - Provides an answer
  - Is timely
  - Is easy to digest
- Organizations with good information make better-informed decisions. They are more open and transparent. Problems are easily discovered and addressed.
- “Who is on the team matters less than how the team members interact, structure their work, and view their contributions.” (Google 2015)
- “start by changing how people behave — what they do (Shook 2010)”
- Lean management and other technical practices (continuous delivery) do in fact impact culture. Implement these technical practices and the organizational culture will follow.
Continuous Delivery (CD) creates the right culture
- Continuous Delivery is a set of practices that enable organizations to deliver software more frequently, with higher quality and lower risk (Humble 2010)
- Five key principles of Continuous Delivery
  - Build quality in
    - Eliminate the need for after-the-fact inspection by detecting issues as early as possible
  - Work in small batches
    - Split work up into chunks that deliver measurable business outcomes. Course correct early. Pushing out changes is cheap.
  - Computers perform repetitive tasks, people solve problems
    - Remove repetitive tasks that take a lot of time, such as regression testing. Free people up for high-value work.
  - Relentlessly pursue continuous improvement
    - High performers make improvement a part of their daily work (kaizen)
  - Everyone is responsible
    - Bureaucratic organizations focus on departments, high performing organizations focus on the value stream
    - Make the state of the system visible to everyone
    - Work with the rest of the organization to set measurable, achievable goals for these outcomes
- To implement continuous delivery
  - Comprehensive configuration management
    - All configuration is stored in a version control system (IaC)
    - Leave room for manual approval, but once approved, changes should be applied automatically
  - Continuous integration
    - Short-lived branches
    - Integrated frequently
    - If the process fails, developers fix it as soon as possible
  - Continuous testing
    - Automated tests
    - Tests run on workstations as well
- Capabilities (existence of..)
  - Version control
  - Test automation
    - Have tests that are reliable (no flaky tests).
    - Tests written by QA or a third party did not correlate with IT performance
      - When developers are responsible for writing tests, they care more about them and will invest more time into maintaining and fixing them
      - Code becomes more testable when developers write the tests.
  - Deployment automation
  - Continuous integration
  - Shift left on security
  - Quick and frequent merges
    - Fewer than three active branches at any time
    - Very short lifetimes (less than a day)
    - No code freeze, feature freeze or stabilization periods
    - GitHub Flow is fine for open source projects, as contributors are not working full time on the project
  - Test data
  - Empowered teams
    - Teams can choose their own tools based on what is best for the users of those tools
    - Teams can deploy to production on demand
  - Monitoring
  - A loosely coupled, well encapsulated architecture
    - Fast feedback on the quality and the deployability of the system is available to everyone
- Organizations that scored well on these capabilities also had employees who identified more with the organization. These capabilities improve the culture.
- Improvements to CD made the work “feel” better
- CD helps achieve one of the twelve principles of the agile manifesto. “Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.”
  - Lower levels of deployment pain
  - Reduced team burnout
- Quality is value to some person (Weinberg 1992)
- High performers spend a lot less time on unplanned work and rework. They work more on “new work” (new features, products, services)
- Unplanned work and rework are a good proxy for quality, as they represent a failure to build quality into the product.
- The Visible Ops Handbook describes unplanned work as “the difference between paying attention to the low fuel warning light on an automobile versus running out of gas on the freeway.”
- John Seddon, creator of the Vanguard Method, describes rework as “the difference between doing it right the first time and doing it over.”
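The trunk-based development thresholds from the capabilities list above (fewer than three active branches, branch lifetimes of less than a day) can be turned into a simple check. The sketch below is only an illustration in Python: the function name `check_trunk_based`, the input format (a mapping of branch name to creation time) and the example branch names are assumptions of mine, not something taken from the book.

```python
from datetime import datetime, timedelta

# Thresholds taken from the notes above; everything else is illustrative.
MAX_ACTIVE_BRANCHES = 3               # "fewer than three active branches"
MAX_BRANCH_AGE = timedelta(days=1)    # "very short lifetimes (less than a day)"


def check_trunk_based(branches, now):
    """Return a list of violations; an empty list means the thresholds are met.

    `branches` is a hypothetical mapping of branch name -> creation datetime.
    """
    violations = []
    if len(branches) >= MAX_ACTIVE_BRANCHES:
        violations.append(
            f"{len(branches)} active branches, expected fewer than {MAX_ACTIVE_BRANCHES}"
        )
    for name, created_at in sorted(branches.items()):
        age = now - created_at
        if age >= MAX_BRANCH_AGE:
            violations.append(f"branch {name!r} has been open for {age.days} day(s)")
    return violations


# Made-up example data
now = datetime(2024, 1, 10, 12, 0)
branches = {
    "feature/checkout": datetime(2024, 1, 10, 9, 0),   # 3 hours old
    "feature/search": datetime(2024, 1, 7, 12, 0),     # 3 days old
}
for violation in check_trunk_based(branches, now):
    print(violation)
```

In practice the branch data would come from your version control system; how you collect it is left open here on purpose.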
Architecture
- High performance is possible, provided that systems and the teams that build them are loosely coupled
  - It allows teams to easily test and deploy individual components, even as the organization and the number of systems it operates grow.
- There is barely any correlation between the type of system and delivery performance
- High performers agree with the following statements
  - We can do most of our testing without requiring an integrated environment (testability)
  - We can and do deploy or release our application independently of other applications/services it depends on (deployability)
- Can the team:
  - Make large scale changes to the design of their system without the permission of somebody outside the team?
  - Make large scale changes without depending on other teams to make changes?
  - Complete their work without needing fine grained communication and coordination with people outside the team?
  - Deploy and release their product or service on demand, independently of other services it depends on?
  - Do most of their testing without requiring an integrated test environment?
  - Perform deployments during normal business hours with negligible downtime?
- Teams that scored highly on architectural capabilities didn’t require communication between teams to deliver.
- Managed services always need to be mocked
- Many service-oriented architectures or microservices do not permit testing and deploying services independently
- A loosely coupled architecture enables scaling
- Steve Yegge’s Platform Rant (https://gist.github.com/chitchcock/1281611)
Management
- PRINCE2 is a structured framework for project management. It was previously used in enterprise IT.
- Agile manifesto was a reaction to the waterfall model.
- “Lean” ideas were being applied to software
- Toyota inspired the “relentless improvement” idea
- Lean Software Development (Poppendieck 2003)
  - Limit work in progress and use these limits to drive process improvement and increase throughput
  - Creating and maintaining visual displays showing key quality and productivity metrics and the current status of work (including defects), making these visual displays available to everyone. Align these to operational goals.
  - Using data from application performance and infrastructure monitoring tools to make business decisions on a daily basis
- Lean Management
  - Limit work in progress
  - Visualize work
  - Feedback from Production
  - Lightweight change approvals
- Does a WIP limit make obstacles in the flow visible? And if teams remove these obstacles through process improvement, does it lead to higher throughput?
  - WIP limits are no good if they don’t lead to improvements that increase flow
- Visibility and the high quality communication it enables are key.
- Lean management practices both decrease burnout and lead to a more generative culture.
- Approval by a change advisory board (CAB) simply doesn’t work to increase the stability of production systems. It certainly slows things down.
  - Lightweight change approvals based on peer review, such as pair programming or intrateam code reviews, combined with a deployment pipeline to detect and reject bad changes, are more effective.
  - What are the chances that an external body, not intimately familiar with the system, can review tens of thousands of lines of code changes by potentially hundreds of developers, and accurately determine the impact on a complex production system?
  - There is a place for outside teams to do effective risk management around changes, but this is more of a governance role. Such teams should monitor delivery performance and help teams improve it by implementing practices that are known to increase stability, quality and speed.
Mental Health
- The fear and anxiety a team feels when pushing code into production says a lot about its delivery performance. This is called deployment pain.
- Deployment pain is a good proxy for delivery performance.
- Most deployment problems are caused by a complex and brittle deployment process. Software is often not written with deployability in mind.
  - Software requires a very specific environment
  - Manual changes are a part of the deployment process
  - Requires a lot of departments/handoffs
- In order to reduce deployment pain:
  - Build systems that are easily deployed into multiple environments
  - Ensure that the state of the environment can be easily reproduced
  - Build is a “click of a button” process
- Common causes of burnout:
  - Work overload: job exceeds human limit
  - Lack of control: no influence over decisions that affect the job
  - Insufficient rewards: financial, institutional, social rewards
  - Breakdown of community: isolation, conflict, lack of support
  - Absence of fairness: unfair or disrespectful treatment
  - Conflicting values: job conflicts with personal values
- To reduce burnout
  - Have a blame free environment. Human error is never the root cause.
  - Make sure deployment pains are resolved
  - Reduce WIP and eliminate roadblocks
  - Have a work environment that supports experimentation, failure and learning.
- When organizational and individual values aren’t aligned, you are more likely to see burnout in employees.
Leadership
- Engagement and satisfaction are indicative of employee loyalty and identity. They thus drive profitability, productivity and market share.
- Measured using the Net Promoter Score (NPS)
  - What is NPS
    - 9 to 10 are promoters
    - 7 to 8 are passives
    - 0 to 6 are detractors
    - NPS = % promoters - % detractors
  - Questions
    - Would you recommend this company as a place to work?
    - Would you recommend your team as a place to work?
- When requirements are handed down to development teams, who must then deliver them in large stacks of work, this is called “big design up front” (BDUF). It is a sign of a bureaucratic culture.
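The NPS calculation in the notes above (promoters score 9 to 10, detractors 0 to 6, NPS = % promoters - % detractors) is easy to get wrong by hand, so here is a minimal sketch in Python. The function name `net_promoter_score` and the survey responses are made up for illustration; only the score bands and the formula come from the notes.

```python
def net_promoter_score(scores):
    """Return the NPS (between -100 and 100) for a list of 0-10 ratings."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)    # 9 to 10
    detractors = sum(1 for s in scores if s <= 6)   # 0 to 6
    # Passives (7 to 8) only count towards the total number of responses.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers to "Would you recommend your team as a place to work?"
responses = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(net_promoter_score(responses))  # 4 promoters, 3 detractors, 10 responses -> 10.0
```

Note that passives still lower the score indirectly: they increase the number of responses without adding to either percentage.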
24 Key Capabilities
Continuous Delivery
- Use version control for all production artifacts
- Automate your deployment process
- Implement continuous integration
- Use trunk-based development methods
- Implement test automation
- Support test data management
- Shift left on security
- Implement continuous delivery
Architecture
- Use a loosely coupled architecture
- Architect for empowered teams
Product and Process
- Gather and implement customer feedback
- Make the flow of work visible
- Work in small batches
- Foster and enable team experimentation
Lean management and monitoring
- Have a lightweight change approval process
- Monitor across application and infrastructure to inform business decisions
- Check system health proactively
- Improve process and manage work with work-in-progress limits
- Visualize work to monitor quality and communicate throughout the team
Culture
- Support a generative culture
- Encourage and support learning
- Support and facilitate collaboration among teams
- Provide resources and tools that make work meaningful
- Support or embody transformational leadership