Horizontal metrics in a vertical organization

Craig Blagg

In the book, Good Strategy Bad Strategy, Richard Rumelt outlines a chain-link system as:

a "chain-link system" refers to an organizational structure where each component or department is interconnected, meaning the overall performance is limited by the weakest link

This sentence resonated deeply with me, especially in the context of engineering leadership. The concept of interconnected dependencies mirrors how many engineering organizations and systems function. Visualizing business priorities as chains and understanding the links can be critical to optimizing efficiency and effectiveness.

Links in a Chain

When structuring engineering organizations, companies often strive to optimize for autonomy — creating vertical teams with clearly defined and bounded scopes. These teams operate independently, reducing their cognitive load and increasing focus. For instance, we may have dedicated teams for:

  • Artifact registries and CI/CD tools
  • Media Asset hosting
  • Traffic and Networks
  • Microservice frameworks
  • Authentication
  • Each product or feature area
  • Product analytics

Each of these teams has a well-defined mandate and specific service-level objectives (SLOs), and most of the time, they operate within their goals. However, these same organizations may still encounter persistent complaints such as:

  • Engineers frustrated by how long it takes to release a feature
  • Periodic customer reports of an inability to complete certain tasks

These issues can persist even when teams report their component metrics as being within acceptable bounds. Why? Because teams measure success within their own defined scope but may not account for their role in the broader interconnected system. Measurement happens solely at the link, not across the chain.

Know your chains

As organizations become more specialized, their measurements often follow suit. In our examples above, component-level metrics may all be within acceptable bounds, but that doesn't mean the complaints aren't valid.

The chain has many links

As requirements evolve over time, our systems and processes accrue numerous direct and transitive dependencies, some consciously and some unconsciously. As these chains grow longer, the reliability and performance of each link compound, affecting whether the task as a whole completes.

In a process that requires a chain of 30 synchronous links, each 99.95% reliable, the overall reliability of that chain is only about 98.5% (0.9995^30 ≈ 0.985), because the small per-link failure rates multiply together.
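
The arithmetic behind that figure is easy to check. This is a minimal Python sketch, assuming each link fails independently so that the chain's reliability is the product of its links' reliabilities. The helper `required_link_reliability` and the 99.9% chain-level target in the example are illustrative assumptions, not part of any particular SLO framework.

```python
import math


def chain_reliability(link_reliabilities):
    """Probability that every link in a synchronous chain succeeds.

    Assumes links fail independently, so the chain's reliability is the
    product of the per-link reliabilities.
    """
    return math.prod(link_reliabilities)


def required_link_reliability(target, n_links):
    """Per-link reliability that n identical links need to meet a chain-level target."""
    return target ** (1 / n_links)


# 30 synchronous links, each 99.95% reliable.
print(f"{chain_reliability([0.9995] * 30):.2%}")  # 98.51%

# Illustrative target: what would each link need for the chain to reach 99.9%?
print(f"{required_link_reliability(0.999, 30):.4%}")
```

Inverting the calculation is often the more useful view: a chain-level objective implies a much stricter objective for every link on it.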

The chain has unknowingly accrued unsupported links

In mature organizations, we will discover components that have become ownerless. Sometimes successive reorganizations leave a system unaccounted for; sometimes a team with limited resources consciously orphans a system, channeling its effort into other initiatives it owns and leaving the link siloed in "maintenance mode". Unprioritized links often go unmeasured, either because instrumentation is completely absent or because no one observes the metrics it produces.

It's important to remember that even when a link is unmeasured or unobserved, it continues to have the same impact on the overall system.
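
One way to start surfacing these links is to keep an inventory of the chain and flag the gaps automatically. This is a minimal sketch that assumes a hypothetical `Link` record with three fields: an owner, whether the link emits metrics at all, and when someone last reviewed or alerted on those metrics. The field names, the 30-day threshold, and the example entries are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Link:
    # Hypothetical inventory record for one link in a chain.
    name: str
    owner: Optional[str]                  # None: no team currently owns this link
    emits_metrics: bool                   # False: no instrumentation at all
    last_reviewed_at: Optional[datetime]  # last time anyone looked at or alerted on its metrics


def find_blind_spots(links, now, max_unreviewed=timedelta(days=30)):
    """Return names of unowned, unmeasured, and unobserved links."""
    unowned = [link.name for link in links if link.owner is None]
    unmeasured = [link.name for link in links if not link.emits_metrics]
    # Unobserved: metrics exist, but nobody has looked at them recently (or ever).
    unobserved = [
        link.name
        for link in links
        if link.emits_metrics
        and (link.last_reviewed_at is None
             or now - link.last_reviewed_at > max_unreviewed)
    ]
    return unowned, unmeasured, unobserved


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
links = [
    Link("Artifact registry", "build-platform", True, now - timedelta(days=2)),
    Link("Static asset origin", None, True, None),
    Link("E2E browser & test runner", None, False, None),
]
unowned, unmeasured, unobserved = find_blind_spots(links, now)
print(unowned)     # ['Static asset origin', 'E2E browser & test runner']
print(unmeasured)  # ['E2E browser & test runner']
print(unobserved)  # ['Static asset origin']
```

The split matters: an unmeasured link needs instrumentation, while an unobserved one already produces data that simply has no audience.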

Know your critical path

In our example case of releasing a feature, we may need the simultaneous availability of:

Building

  • CI/CD platform
  • Build agents (x N)
  • Artifact registry
  • Static Asset origin
  • Unit Test success rate
  • Integration Test success rate

Staging Quality Assurance

  • Staging - compute platform
  • All microservices needed to run the application for a successful E2E test run (x N)
  • E2E browser & test running system
  • E2E Test success rate

Production Deployment

  • Production - compute platform

Each of these systems is managed by different teams with independent goals. Without systemic observability, organizations lack insight into how these components interact at a business-impact level.
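
To see how these stages combine, here is a rough Python model of the release critical path. It treats every listed system as a link that must succeed for a release to complete. It assumes failures are independent, and it interprets each "(x N)" entry as N instances that must all be available. The per-link success probabilities and instance counts are hypothetical placeholders, not measurements.

```python
import math

# Hypothetical probability that each link succeeds for a given release.
# Values and counts are illustrative placeholders only.
# Each entry is (component, success_probability, count); count models the
# "(x N)" items, assuming all N instances must be available.
PIPELINE = {
    "Building": [
        ("CI/CD platform", 0.999, 1),
        ("Build agent", 0.9995, 4),
        ("Artifact registry", 0.999, 1),
        ("Static asset origin", 0.9995, 1),
        ("Unit test success rate", 0.99, 1),
        ("Integration test success rate", 0.98, 1),
    ],
    "Staging quality assurance": [
        ("Staging compute platform", 0.999, 1),
        ("Microservice", 0.9995, 12),
        ("E2E browser and test runner", 0.995, 1),
        ("E2E test success rate", 0.97, 1),
    ],
    "Production deployment": [
        ("Production compute platform", 0.9995, 1),
    ],
}


def stage_success(components):
    # Every component (and every one of its instances) must succeed together.
    return math.prod(p ** count for _, p, count in components)


def release_success(pipeline):
    per_stage = {stage: stage_success(c) for stage, c in pipeline.items()}
    return per_stage, math.prod(per_stage.values())


per_stage, overall = release_success(PIPELINE)
for stage, p in per_stage.items():
    print(f"{stage}: {p:.2%}")
print(f"End to end: {overall:.2%}")
```

Each team sees only its own line in this table; the end-to-end figure, which is always lower than any single stage, is the number nobody owns.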

Measuring Systemic Health: User-Centric Metrics

To capture systemic performance, organizations should adopt User-Centric Metrics as high-level indicators of the system's overall effectiveness at delivering a human-digestible process.

User-centric metrics can be just as valuable for internal processes as they are for customer-facing ones. Their benefits include:

  • Digestible metrics that roll up progress reporting and can be understood across disciplines
  • Discovering change in unobserved, unowned, or new links
  • Relatable ...

Try this

Ask a colleague to list all the internal and external systems and processes required for code to go from merge to deployment and become usable by an end user. Chances are:

(1) Some steps will be missed. (2) They won’t be able to generate a query of metrics across all systems to determine success rate and duration.
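
Answering the second point usually means joining records that live in separate systems. The sketch below assumes each system can export, per change, the time it finished its step, keyed by a hypothetical change ID such as a commit SHA. It computes the end-to-end success rate, the median merge-to-usable duration, and the stage where each incomplete change stalled. The stage names and timestamps are made-up sample data.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def chain_report(stages):
    """Success rate, median duration, and stall points for an ordered chain.

    stages: ordered list of (stage_name, {change_id: completed_at}).
    The first stage defines the population (e.g. every merged change);
    a change succeeds only if it appears in every later stage.
    """
    _, started = stages[0]
    _, finished = stages[-1]
    durations = []
    stalled_at = Counter()
    for change_id, start in started.items():
        # The first downstream stage with no record of this change is where it stalled.
        missing = next(
            (name for name, events in stages[1:] if change_id not in events), None
        )
        if missing is None:
            durations.append(finished[change_id] - start)
        else:
            stalled_at[missing] += 1
    success_rate = len(durations) / len(started) if started else 0.0
    return success_rate, (median(durations) if durations else None), stalled_at


# Hypothetical exports from four separate systems.
t = datetime(2024, 6, 3, 9, 0)
stages = [
    ("merged", {"a1": t, "b2": t, "c3": t}),
    ("built", {"a1": t + timedelta(minutes=20), "b2": t + timedelta(minutes=25)}),
    ("passed staging E2E", {"a1": t + timedelta(hours=1), "b2": t + timedelta(hours=1)}),
    ("usable in production", {"a1": t + timedelta(hours=2)}),
]
rate, typical, stalls = chain_report(stages)
print(f"{rate:.0%}", typical, dict(stalls))
```

The stall counter is what component dashboards cannot show on their own: a change that silently never reaches the next system leaves no error behind in any single one of them.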

Know your weakest link

As Rumelt outlines, a well-functioning chain can be a differentiator for a business. Conversely, a poorly functioning chain can leave a system or process stuck in a low-effectiveness state due to quality matching: while one link stays weak, there is little payoff in improving any of the others.

By implementing user-centric measures, we can assess our collective capability in delivering business processes. Are we happy with what the metric informs us? If not, where do we optimize?

Always optimize at the weakest link.

Every chain has a weakest link, and strengthening the links around it does not make the chain stronger. Strengthening adjacent links won't solve systemic inefficiencies; we must prioritize effort on the bottleneck(s) within the critical path. Solutions may include:

  • Bypass a link by trading off relevant countermeasures
  • Change how the link functions
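
A simple throughput model makes the point concrete: if the chain can only move as fast as its slowest stage, extra capacity anywhere else buys nothing. This Python sketch is an illustration of bottleneck reasoning under that assumption, using hypothetical per-stage capacities in releases per day; it is not a measurement of any real pipeline.

```python
def chain_throughput(capacities):
    """Releases per day the whole chain can sustain: limited by its slowest stage."""
    return min(capacities.values())


def weakest_link(capacities):
    """Name of the stage with the lowest capacity."""
    return min(capacities, key=capacities.get)


def gain_from_improving(capacities, stage, factor):
    """Change in end-to-end throughput if one stage's capacity is multiplied by factor."""
    improved = dict(capacities)
    improved[stage] *= factor
    return chain_throughput(improved) - chain_throughput(capacities)


# Hypothetical capacities, in releases per day.
capacities = {"Build": 40, "Staging QA (E2E)": 12, "Production deploy": 30}
print(weakest_link(capacities))  # Staging QA (E2E)
for stage in capacities:
    print(stage, gain_from_improving(capacities, stage, 2))
```

Doubling the build or production stages leaves end-to-end throughput unchanged, while doubling the staging QA stage doubles it, until another stage becomes the bottleneck.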

Fixing the weakest link isn't always glamorous work. You may find it neglected, surrounded by organizational ambivalence, entirely unowned, or undocumented. But if a step remains necessary, we now have a clear business case for investing in it, and having a direct connection to business impact is exciting for any engineer.

Don't be shy of owning a chain

As engineering leaders, it is our job to own outcomes that transcend teams, groups, and organizations, and that impact the business.

Doing so often means owning a chain - measuring success across multiple areas, connecting processes to business value, and identifying opportunities to make those chains more reliable. Embracing this mindset can lead to substantial improvements in engineering efficiency and business success.