<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Strategies on Crafting Engineering Strategy</title><link>https://craftingengstrategy.com/strategies/</link><description>Recent content in Strategies on Crafting Engineering Strategy</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>Will Larson</copyright><lastBuildDate>Thu, 24 Apr 2025 06:00:00 -0700</lastBuildDate><atom:link href="https://craftingengstrategy.com/strategies/index.xml" rel="self" type="application/rss+xml"/><item><title>How should Stripe deprecate APIs? (~2016)</title><link>https://craftingengstrategy.com/api-deprecation-strategy/</link><pubDate>Thu, 24 Apr 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/api-deprecation-strategy/</guid><description>&lt;p&gt;While Stripe is a widely admired company for things like its
creation of the &lt;a href="https://craftingengstrategy.com/stripe-sorbet-strategy/"&gt;Sorbet type checker project&lt;/a&gt;, I personally
think that Stripe&amp;rsquo;s most interesting strategy work is also among its most subtle:
its willingness to significantly prioritize API stability.&lt;/p&gt;
&lt;p&gt;This strategy is almost invisible externally.
Internally, discussions around it were frequent and detailed, but mostly confined to dedicated API design conversations.
API stability isn&amp;rsquo;t just a technical design quirk; it&amp;rsquo;s a foundational decision in an API-driven business,
and I believe it is one of the unsung heroes of Stripe&amp;rsquo;s business success.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operation"&gt;Policy &amp;amp; Operation&lt;/h2&gt;
&lt;p&gt;Our policies for managing API changes are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Design for long API lifetime.&lt;/strong&gt;
APIs are not inherently durable. Instead we have to design thoughtfully
to ensure they can support change.
When designing a new API, build a test application that doesn&amp;rsquo;t use this API,
then migrate to the new API.
Consider how integrations might evolve as applications change. Perform these migrations yourself to understand potential friction with your API.
Then think about the future changes that &lt;em&gt;we&lt;/em&gt; might want to implement on our end.
How would those changes impact the API, and how would they impact the application you&amp;rsquo;ve developed?&lt;/p&gt;
&lt;p&gt;At this point, take your API to API Review for initial approval as described below.
Following that approval, identify a handful of early adopter companies
who can place additional pressure on your API design, and test with them
before releasing the final, stable API.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;All new and modified APIs must be approved by API Review.&lt;/strong&gt;
API changes may not be enabled for customers prior to API Review approval.
Change requests should be sent to the &lt;code&gt;api-review&lt;/code&gt; email group.
For examples of prior art, review the &lt;code&gt;api-review&lt;/code&gt; archive for prior requests
and the feedback they received.&lt;/p&gt;
&lt;p&gt;All requests must include a written proposal.
Most requests will be approved asynchronously by a member of API Review.
Complex or controversial proposals will require live discussions to ensure API Review members
have sufficient context before making a decision.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We never deprecate APIs without an unavoidable requirement to do so.&lt;/strong&gt;
Even if it&amp;rsquo;s technically expensive to maintain support, we incur that support cost.
To be explicit, we define API deprecation as &lt;em&gt;any&lt;/em&gt; change that would require customers to
modify an existing integration.&lt;/p&gt;
&lt;p&gt;Any such change is an exception to this policy,
and must be approved first by API Review and then by our CEO.
One example where we granted an exception was the deprecation of TLS 1.0 support due to PCI compliance obligations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;When significant new functionality is required, we add a new API.&lt;/strong&gt;
For example, we created &lt;a href="https://docs.stripe.com/api/subscriptions"&gt;&lt;code&gt;/v1/subscriptions&lt;/code&gt;&lt;/a&gt; to
support those workflows
rather than extending &lt;a href="https://docs.stripe.com/api/charges"&gt;&lt;code&gt;/v1/charges&lt;/code&gt;&lt;/a&gt; to add subscriptions support.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;With the benefit of hindsight, a good example of this policy in action was the introduction of the Payment Intents APIs to maintain
compliance with &lt;a href="https://support.stripe.com/questions/payment-intents-api-requirement-for-strong-customer-authentication-%28sca%29-compliance"&gt;Europe&amp;rsquo;s Strong Customer Authentication&lt;/a&gt;
requirements. Even in that case the &lt;code&gt;charge&lt;/code&gt; API continued to work as it did previously,
albeit only for non-European Union payments.&lt;/p&gt;
&lt;/div&gt;
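&lt;p&gt;To make the policy&amp;rsquo;s definition of deprecation more concrete, here is a minimal sketch of a check that an API reviewer could run against a proposed change. It is an illustration under our own assumptions, not Stripe&amp;rsquo;s review tooling. Each endpoint is described by a hypothetical schema that maps request parameters to whether they are required and response fields to a type name. The function &lt;code&gt;breaking_changes&lt;/code&gt; and the example schemas are invented for this example. A schema diff cannot catch behavioral changes, so a check like this would complement API Review rather than replace it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"""Flag changes that would force customers to modify an existing integration.

Hypothetical sketch: an endpoint is a dict with "params" (parameter name
mapped to whether it is required) and "response" (field name mapped to a
type name).
"""


def breaking_changes(old, new):
    problems = []
    # Removing or retyping a response field breaks code that reads it.
    for field, old_type in old["response"].items():
        new_type = new["response"].get(field)
        if new_type is None:
            problems.append("removed response field: " + field)
        elif new_type != old_type:
            problems.append("changed type of response field: " + field)
    # Dropping a parameter breaks callers that still send it, assuming
    # unknown parameters are rejected rather than silently ignored.
    for param in old["params"]:
        if param not in new["params"]:
            problems.append("removed request parameter: " + param)
    # A newly required parameter breaks callers that never sent it.
    for param, required in new["params"].items():
        if required and not old["params"].get(param, False):
            problems.append("newly required request parameter: " + param)
    return problems


# Hypothetical, heavily simplified descriptions of a charges endpoint.
charges_v1 = {
    "params": {"amount": True, "currency": True, "description": False},
    "response": {"id": "string", "amount": "integer", "paid": "boolean"},
}
# Additive change: a new optional parameter and a new response field.
charges_additive = {
    "params": {"amount": True, "currency": True, "description": False,
               "metadata": False},
    "response": {"id": "string", "amount": "integer", "paid": "boolean",
                 "receipt_url": "string"},
}
# Breaking change: "paid" is replaced and "description" becomes required.
charges_breaking = {
    "params": {"amount": True, "currency": True, "description": True},
    "response": {"id": "string", "amount": "integer", "status": "string"},
}

print(breaking_changes(charges_v1, charges_additive))
# []
print(breaking_changes(charges_v1, charges_breaking))
# ['removed response field: paid', 'newly required request parameter: description']
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under this definition, the additive change passes. The breaking change would have to go through the exception process, which is why the policy prefers adding new APIs and fields over altering existing ones.&lt;/p&gt;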
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We manage this policy&amp;rsquo;s implied technical debt via an API translation layer.&lt;/strong&gt;
We release changed APIs into versions, tracked in our &lt;a href="https://docs.stripe.com/changelog"&gt;API version changelog&lt;/a&gt;.
However, we only maintain one implementation internally, which is the implementation of the latest
version of the API.
On top of that implementation, a series of version transformations are maintained,
which allow us to support prior versions without maintaining them directly.
While this approach doesn&amp;rsquo;t &lt;em&gt;eliminate&lt;/em&gt; the overhead of supporting multiple API versions,
it significantly reduces complexity by enabling us to maintain just a single, modern implementation internally.&lt;/p&gt;
&lt;p&gt;All API modifications &lt;em&gt;must&lt;/em&gt; also update the version transformation layers to allow the new
version to coexist peacefully with prior versions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In the future, SDKs may allow us to soften this policy.&lt;/strong&gt;
While a significant number of our customers have direct integrations with our APIs,
that number has dropped significantly over time.
Instead, most new integrations are performed via one of our official API SDKs.&lt;/p&gt;
&lt;p&gt;We believe that in the future, it may be possible for us to make more backwards
incompatible changes because we can absorb the complexity of migrations into
the SDKs we provide. That is certainly &lt;em&gt;not&lt;/em&gt; the case yet today.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
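&lt;p&gt;The translation-layer approach in the policy above can be sketched in a few lines. This is a simplified illustration under our own assumptions, not Stripe&amp;rsquo;s implementation, and the &lt;code&gt;VersionChange&lt;/code&gt; class, the version names, and the response fields are all hypothetical. The single real implementation returns the latest response shape. Each version change carries a function that rewrites a response back into the shape expected by clients pinned to an earlier version.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"""Serve older API versions from a single, latest-version implementation.

Hypothetical sketch: each VersionChange records the version that introduced
it and a function that rewrites a response for clients pinned before it.
"""
from dataclasses import dataclass
from itertools import takewhile
from typing import Callable


@dataclass(frozen=True)
class VersionChange:
    version: str                       # version that introduced the change
    downgrade: Callable[[dict], dict]  # undoes the change for older clients


def status_to_paid(resp):
    # Hypothetical: older clients expect a boolean "paid", not "status".
    old = dict(resp)
    old["paid"] = old.pop("status") == "succeeded"
    return old


def drop_receipt_url(resp):
    # Hypothetical: older clients never received "receipt_url".
    old = dict(resp)
    old.pop("receipt_url", None)
    return old


BASE_VERSION = "2016-02-03"
# Newest change first, so walking the list moves backwards through history.
CHANGES = [
    VersionChange("2016-07-06", status_to_paid),
    VersionChange("2016-03-07", drop_receipt_url),
]
KNOWN_VERSIONS = {BASE_VERSION} | {change.version for change in CHANGES}


def latest_charge(charge_id):
    # The single internal implementation only knows the latest shape.
    return {"id": charge_id, "amount": 500, "status": "succeeded",
            "receipt_url": "https://example.com/receipts/" + charge_id}


def render(resp, pinned_version):
    """Rewrite a latest-version response for a caller pinned to a version."""
    if pinned_version not in KNOWN_VERSIONS:
        raise ValueError("unknown API version: " + pinned_version)
    # Undo every change newer than the pinned version, newest first.
    for change in takewhile(lambda c: c.version != pinned_version, CHANGES):
        resp = change.downgrade(resp)
    return resp


charge = latest_charge("ch_1")
print(render(charge, "2016-07-06"))  # latest shape, unchanged
print(render(charge, "2016-03-07"))  # "paid" instead of "status"
print(render(charge, BASE_VERSION))  # additionally drops "receipt_url"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this structure, shipping a modified API means adding one &lt;code&gt;VersionChange&lt;/code&gt; with its downgrade function. This is the sense in which every API modification must also update the version transformation layer. The remaining overhead is that every downgrade must keep working as the latest implementation evolves.&lt;/p&gt;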
&lt;h2 id="diagnosis"&gt;Diagnosis&lt;/h2&gt;
&lt;p&gt;Our diagnosis of the impact of API changes and deprecation on our business is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you are a small startup composed mostly of engineers, integrating a new payments API seems easy.
However, for a small business without dedicated engineers—or a larger
enterprise involving numerous stakeholders—handling external API changes can be particularly challenging.&lt;/p&gt;
&lt;p&gt;Even if this is only marginally true, &lt;a href="https://craftingengstrategy.com/api-deprecation-model/"&gt;we&amp;rsquo;ve modeled the impact of minimizing API changes&lt;/a&gt;
on long-term revenue growth, and it has a significant impact, unlocking our ability to benefit from
other churn reduction work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While we believe API instability directly creates churn, we also believe that API stability
directly retains customers: when no API change forces them to rework their integration, switching providers
means taking on a migration that they would otherwise avoid.
We believe hypergrowth customers are particularly unlikely to change payments API providers absent a concrete
motivation, such as an API change or a payment plan change.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are aware of relatively few companies that provide long-term API stability in general,
and particularly few for complex, dynamic areas like payments APIs.
We can&amp;rsquo;t assume that companies that make API changes are ill-informed.
Rather, it appears that they face a meaningful technical debt tradeoff between the API provider and API consumers,
and aren&amp;rsquo;t willing to consistently absorb that technical debt internally.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Future compliance or security requirements, along the lines of our upgrade from TLS 1.0 to TLS 1.2 for PCI, may necessitate API changes.
There may also be new tradeoffs exposed as we enter new markets with their own compliance regimes.
However, we have limited ability to predict these changes at this point.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Strategies</title><link>https://craftingengstrategy.com/strategies/</link><pubDate>Thu, 24 Apr 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/strategies/</guid><description/></item><item><title>Systems model of API deprecation</title><link>https://craftingengstrategy.com/api-deprecation-model/</link><pubDate>Thu, 24 Apr 2025 05:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/api-deprecation-model/</guid><description>&lt;p&gt;In &lt;a href="https://craftingengstrategy.com/api-deprecation-strategy/"&gt;How should Stripe deprecate APIs?&lt;/a&gt;, the diagnosis
depends on the claim that deprecating APIs is a significant cause of customer churn.
While there is internal data that can be used to correlate deprecation with churn,
it&amp;rsquo;s also valuable to build a model to help us decide if we believe that
correlation and causation are aligned in this case.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;What we learn from modeling API deprecation&amp;rsquo;s impact on user retention&lt;/li&gt;
&lt;li&gt;Developing a system model using the &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; package on GitHub.
That model &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/APIDeprecationModel.ipynb"&gt;is available in the lethain/eng-strategy-models&lt;/a&gt;
repository&lt;/li&gt;
&lt;li&gt;Exercising that model to learn from it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Time to investigate whether it&amp;rsquo;s reasonable to believe that API deprecation
is a major influence on user retention and churn.&lt;/p&gt;
&lt;h2 id="learnings"&gt;Learnings&lt;/h2&gt;
&lt;p&gt;In an initial model with a 10% baseline customer churn rate per round,
reducing customers experiencing API deprecation from 50% to 10% per round only increases
the steady state of integrated customers by about 5%.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-model-2.png" alt="Impact of 10% and 50% API deprecation on integrated customers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of 10% and 50% API deprecation on integrated customers&lt;/p&gt;
&lt;p&gt;However, if we eliminate the baseline for customer churn entirely, then we see a massive
difference between a 10% and 50% rate of API deprecation.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-model-4.png" alt="Impact of rates of API deprecation with zero baseline churn"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of rates of API deprecation with zero baseline churn&lt;/p&gt;
&lt;p&gt;The biggest takeaway from this model is that eliminating API-deprecation churn alone
won&amp;rsquo;t significantly increase the number of integrated customers.
However, we also can&amp;rsquo;t fully benefit from reducing baseline churn without simultaneously reducing API deprecations.
Meaningfully increasing the number of integrated customers requires lowering both sorts of churn in tandem.&lt;/p&gt;
&lt;h2 id="sketch"&gt;Sketch&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll start by sketching the model&amp;rsquo;s happiest path: potential customers flowing into
engaged customers and then becoming integrated customers. This represents a customer
who decides to integrate with Stripe&amp;rsquo;s APIs, and successfully completes that integration
process.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-simple.png" alt="Happiest path for Stripe API integration"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Happiest path for Stripe API integration&lt;/p&gt;
&lt;p&gt;Business would be good if that were the entire problem space.
Unfortunately, customers do occasionally churn.
This churn is represented in two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;baseline churn&lt;/code&gt; where integrated customers leave Stripe for any number of reasons,
including things like dissolution of their company&lt;/li&gt;
&lt;li&gt;&lt;code&gt;experience deprecation&lt;/code&gt; followed by &lt;code&gt;deprecation-influenced churn&lt;/code&gt;, which represent
the scenario where a customer decides to leave after an API they use is deprecated&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There is also a flow for &lt;code&gt;reintegration&lt;/code&gt;, where a customer impacted by API deprecation
can choose to update their integration to comply with the API changes.&lt;/p&gt;
&lt;p&gt;Pulling things together, the final sketch shows five stocks and six flows.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-full.png" alt="Final version of systems model for API deprecation"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Final version of systems model for API deprecation&lt;/p&gt;
&lt;p&gt;You could imagine modeling additional dynamics, such as recovery of churned customers,
but it seems unlikely that would significantly influence our understanding of how API deprecation
impacts churn.&lt;/p&gt;
&lt;h2 id="reason"&gt;Reason&lt;/h2&gt;
&lt;p&gt;In terms of acquiring customers, the most important flows
are customer acquisition and initial integration with the API.
Optimizing those flows will increase the number of existing integrations.&lt;/p&gt;
&lt;p&gt;The flows driving churn are baseline churn and
the combination of API deprecation and deprecation-influenced churn.
It&amp;rsquo;s difficult to move baseline churn for a payments API, as many churning
customers leave due to company dissolution. From a revenue-weighted perspective,
baseline churn is largely driven by non-technical factors, primarily pricing.
In either case, it&amp;rsquo;s challenging to impact this flow without significantly lowering margin.&lt;/p&gt;
&lt;p&gt;Engineering decisions, on the other hand, have a significant impact on both the number of API deprecations
and the ease of reintegration after a migration.
Because the same work to support reintegration also supports the initial integration experience,
that&amp;rsquo;s a promising opportunity for investment.&lt;/p&gt;
&lt;h2 id="model"&gt;Model&lt;/h2&gt;
&lt;p&gt;You can find the &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/APIDeprecationModel.ipynb"&gt;full implementation of this model on GitHub&lt;/a&gt;
if you want to see the full model rather than these emphasized snippets.&lt;/p&gt;
&lt;p&gt;Now that we have identified the most interesting avenues for experimentation,
it&amp;rsquo;s time to develop the model to evaluate which flows are most impactful.&lt;/p&gt;
&lt;p&gt;Our initial model specification is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# User Acquisition Flow
[PotentialCustomers] &amp;gt; EngagedCustomers @ 100
# Initial Integration Flow
EngagedCustomers &amp;gt; IntegratedCustomers @ Leak(0.5)
# Baseline Churn Flow
IntegratedCustomers &amp;gt; ChurnedCustomers @ Leak(0.1)
# Experience Deprecation Flow
IntegratedCustomers &amp;gt; DeprecationImpactedCustomers @ Leak(0.5)
# Reintegrated Flow
DeprecationImpactedCustomers &amp;gt; IntegratedCustomers @ Leak(0.9)
# Deprecation-Influenced Churn
DeprecationImpactedCustomers &amp;gt; ChurnedCustomers @ Leak(0.1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Whether these are &lt;em&gt;reasonable&lt;/em&gt; values depends largely on how we think about the
length of each round. If a round were a month, then assuming half of integrated customers
would experience an API deprecation would be quite extreme. If we assumed it was a year,
then it would still be high, but there are certainly some API providers that routinely deprecate
at that rate. (From my personal experience, I can say with confidence that Facebook&amp;rsquo;s Ads API deprecated
at least one important field on a quarterly basis in the 2012-2014 period.)&lt;/p&gt;
&lt;p&gt;Admittedly, for a payments API this would be a high rate, and is intended primarily as a
contrast with more reasonable values in the exercise section below.&lt;/p&gt;
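&lt;p&gt;To experiment with these flows without installing lethain/systems, you can re-implement them in plain Python, as in the sketch below. It rests on three assumptions of our own. Within each round, the flows run in the order they are declared. Each flow sees the stock values already updated by the flows before it. Values are not rounded to whole customers. That is not necessarily how the systems package evaluates a model, so the absolute levels this sketch reaches need not match the charts in this chapter; it is intended for comparing scenarios with each other.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"""Plain-Python approximation of the API deprecation systems model.

Assumes flows run in declaration order within each round, each seeing the
stocks already updated by earlier flows, with no rounding to whole customers.
"""


def leak(stocks, source, destination, rate):
    # Move a fixed fraction of the source stock into the destination stock.
    moved = stocks[source] * rate
    stocks[source] -= moved
    stocks[destination] += moved


def run_model(baseline_churn=0.1, deprecation=0.5, rounds=300):
    """Return the IntegratedCustomers stock at the end of each round."""
    stocks = {"EngagedCustomers": 0.0, "IntegratedCustomers": 0.0,
              "DeprecationImpactedCustomers": 0.0, "ChurnedCustomers": 0.0}
    history = []
    for _ in range(rounds):
        # User acquisition: [PotentialCustomers] adds 100 per round.
        stocks["EngagedCustomers"] += 100
        # Initial integration flow.
        leak(stocks, "EngagedCustomers", "IntegratedCustomers", 0.5)
        # Baseline churn flow.
        leak(stocks, "IntegratedCustomers", "ChurnedCustomers", baseline_churn)
        # Experience deprecation flow.
        leak(stocks, "IntegratedCustomers", "DeprecationImpactedCustomers",
             deprecation)
        # Reintegrated flow.
        leak(stocks, "DeprecationImpactedCustomers", "IntegratedCustomers", 0.9)
        # Deprecation-influenced churn.
        leak(stocks, "DeprecationImpactedCustomers", "ChurnedCustomers", 0.1)
        history.append(stocks["IntegratedCustomers"])
    return history


for churn in (0.1, 0.0):
    high = run_model(churn, 0.5)[-1]
    low = run_model(churn, 0.1)[-1]
    print(f"baseline churn {churn}: 50% deprecation {high:,.0f}, "
          f"10% deprecation {low:,.0f} integrated customers")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions, with 10% baseline churn, cutting deprecation from 50% to 10% raises the integrated stock by only a few percent. With zero baseline churn, the same cut opens a gap that keeps widening as &lt;code&gt;rounds&lt;/code&gt; increases. That is the same shape as the learnings above, even where the levels differ.&lt;/p&gt;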
&lt;h2 id="exercise"&gt;Exercise&lt;/h2&gt;
&lt;p&gt;Our goal with exercising this model is to understand how much API deprecation impacts customer churn.
We&amp;rsquo;ll start by charting the initial baseline, then move to compare it with a variety of scenarios
until we build an intuition for how the lines move.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-model-1.png" alt="Initial model stabilizing integrated customers around 1,000 customers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Initial model stabilizing integrated customers around 1,000 customers&lt;/p&gt;
&lt;p&gt;The initial chart stabilizes in about forty rounds, maintaining about 1,000 integrated customers
and 400 customers dealing with deprecated APIs.
Now let&amp;rsquo;s change the experience deprecation flow to impact significantly fewer
customers:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Initial setting with 50% experiencing deprecation per round
IntegratedCustomers &amp;gt; DeprecationImpactedCustomers @ Leak(0.5)
# Less deprecation, only 10% experiencing per round
IntegratedCustomers &amp;gt; DeprecationImpactedCustomers @ Leak(0.1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After those changes, we can compare the two scenarios.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-model-2.png" alt="Impact of 10% and 50% API deprecation on integrated customers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of 10% and 50% API deprecation on integrated customers&lt;/p&gt;
&lt;p&gt;Lowering the deprecation rate significantly reduces the number of companies dealing with deprecations
at any given time, but it has a relatively small impact on increasing the steady state for integrated customers.
This must mean that another flow is significantly impacting the size of the integrated customers stock.&lt;/p&gt;
&lt;p&gt;Since there&amp;rsquo;s only one other flow impacting that stock, baseline churn, that&amp;rsquo;s the one to exercise next.
Let&amp;rsquo;s set the baseline churn flow to zero to compare that with the initial model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Initial Baseline Churn Flow
IntegratedCustomers &amp;gt; ChurnedCustomers @ Leak(0.1)
# Zeroed out Baseline Churn Flow
IntegratedCustomers &amp;gt; ChurnedCustomers @ Leak(0.0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These results make a compelling case that baseline churn is
dominating the impact of deprecation. With no baseline churn, the number of integrated customers
stabilizes at around 1,750, as opposed to around 1,000 for the initial model.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-model-3.png" alt="Impact of eliminating baseline churn from model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of eliminating baseline churn from model&lt;/p&gt;
&lt;p&gt;Next, let&amp;rsquo;s compare two scenarios without baseline churn, where one has high API deprecation (50%)
and the other has low API deprecation (10%).&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/api-deprecation-model-4.png" alt="Impact of rates of API deprecation with zero baseline churn"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of rates of API deprecation with zero baseline churn&lt;/p&gt;
&lt;p&gt;In the case of two scenarios without baseline churn, we can see having an API deprecation rate of
10% leads to about 6,000 integrated customers, as opposed to 1,750 for a 50% rate of API deprecation.
More importantly, in the 10% scenario, the integrated customers line shows no sign of flattening, and
continues to grow over time rather than stabilizing.&lt;/p&gt;
&lt;p&gt;The takeaway here is that significantly reducing either baseline churn or API deprecation magnifies the benefits of reducing the other.
These results also reinforce the value of treating churn reduction as a system-level optimization,
not merely a collection of discrete improvements.&lt;/p&gt;</description></item><item><title>Why did Stripe build Sorbet? (~2017)</title><link>https://craftingengstrategy.com/stripe-sorbet-strategy/</link><pubDate>Thu, 17 Apr 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/stripe-sorbet-strategy/</guid><description>&lt;p&gt;Many hypergrowth companies of the 2010s battled increasing complexity in
their codebase by &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;decomposing their monoliths&lt;/a&gt;.
Stripe was somewhat of an exception, largely delaying decomposition until
it had grown beyond three thousand engineers and had accumulated a decade of development in its core Ruby monolith.
Even now, significant portions of their product are
maintained in the monolithic repository, and it&amp;rsquo;s safe to say this was only possible
because of Sorbet&amp;rsquo;s impact.&lt;/p&gt;
&lt;p&gt;Sorbet is a custom static type checker for Ruby that was initially designed and implemented by Stripe engineers
on their Product Infrastructure team.
Stripe&amp;rsquo;s Product Infrastructure had similar goals to other companies&amp;rsquo; Developer Experience or Developer Productivity teams,
but it focused on improving productivity through changes in the internal architecture of the codebase itself,
rather than relying solely on external tooling or processes.&lt;/p&gt;
&lt;p&gt;This strategy explains why Stripe chose to delay decomposition for so long,
and how the Product Infrastructure team invested in developer productivity to deal with the challenges
of a large Ruby codebase managed by a large software engineering team with low average tenure caused by rapid hiring.&lt;/p&gt;
&lt;p&gt;Before wrapping this introduction, I want to explicitly acknowledge that this
strategy was spearheaded by Stripe&amp;rsquo;s Product Infrastructure team, not by me.
Although I ultimately became responsible for that team,
I can&amp;rsquo;t take credit for this strategy&amp;rsquo;s thinking.
Rather, I was initially skeptical, preferring an incremental migration to an existing
strongly-typed programming language, either Java for library coverage or Golang
for Stripe&amp;rsquo;s existing familiarity.
Despite my initial doubts, the Sorbet project eventually won me over with its indisputable results.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operation"&gt;Policy &amp;amp; Operation&lt;/h2&gt;
&lt;p&gt;The Product Infrastructure team is investing in Stripe&amp;rsquo;s
developer experience by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Every six months, Product Infrastructure will select its three highest priority areas to focus on,
and invest a significant majority of its energy into those.
We will provide minimal support for other areas.&lt;/p&gt;
&lt;p&gt;We commit to refreshing our priorities every half after running the developer productivity survey.
We will further share our results, and priorities, in each Quarterly Business Review.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our three highest priority areas for this half are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Add static typing to the highest value portions of our Ruby codebase,
such that we can run the type checker locally and on the test machines to identify
errors more quickly.&lt;/li&gt;
&lt;li&gt;Support selective test execution such that engineers can quickly determine and run
the most appropriate tests on their machine rather than delaying until tests run
on the build server.&lt;/li&gt;
&lt;li&gt;Instrument test failures such that we have better data to prioritize
future efforts.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Static typing is not a typical solution to developer productivity, so it
requires some explanation when we say this is our highest priority area
for investment. Doubly so when we acknowledge that it will take us 12-24 months
of much of the team&amp;rsquo;s time to get our type checker to an effective place.&lt;/p&gt;
&lt;p&gt;Our type checker, which we plan to name Sorbet, will allow us to continue developing
within our existing Ruby codebase. It will further allow our product engineers
to remain focused on developing new functionality rather than migrating existing
functionality to new services or programming languages.
Instead, our Product Infrastructure team will centrally absorb both the development
of the type checker and the initial rollout to our codebase.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s possible for Product Infrastructure to take on both, despite its fixed size.
We&amp;rsquo;ll rely on a hybrid approach of deep-dives to add typing to particularly complex areas,
and scripts to rewrite our code&amp;rsquo;s Abstract Syntax Trees (AST) for less complex portions.
In the relatively unlikely event that this approach fails, the cost to Stripe is of a small, known size:
approximately six months of half the Product Infrastructure team, which is what we anticipate
requiring to determine if this approach is viable.&lt;/p&gt;
&lt;p&gt;Based on our knowledge of Facebook&amp;rsquo;s &lt;a href="https://hacklang.org/"&gt;Hack&lt;/a&gt;
project, we believe we can build a static type checker
that runs locally and significantly faster than our test suite.
It&amp;rsquo;s hard to make a precise guess now, but we think it will take less than 30 seconds to type-check our entire codebase,
despite it being quite large.
This will allow for a highly productive local development experience, even if we are not able to
speed up local testing. Even if we do speed up local testing, typing would help us eliminate
a category of errors that testing has been unable to eliminate: passing
unexpected types along code paths that have been tested for expected scenarios but not
for entirely unexpected ones.&lt;/p&gt;
&lt;p&gt;Once the type checker has been validated, we can incrementally prioritize adding typing
to the highest value places across the codebase. We do not need to wholly type
our codebase before we can start getting meaningful value.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In support of these static typing efforts, we will advocate for product engineers at Stripe to begin development
using the &lt;a href="https://en.wikipedia.org/wiki/Command_Query_Responsibility_Segregation"&gt;Command Query Responsibility Segregation&lt;/a&gt;
(CQRS) design pattern, which we believe
will provide high-leverage interfaces for incrementally introducing static typing into our codebase.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Selective test execution will allow developers to quickly run appropriate tests locally.
This will allow engineers to stay in a tight local development loop, speeding up development
of high quality code.&lt;/p&gt;
&lt;p&gt;Given that our codebase is not currently statically typed, inferring which tests to run is rather challenging.
With our very high test coverage, and the fact that all tests will still be run before deployment to the
production environment,
we believe that we can rely on statistically inferring which tests are likely to fail when a given file is modified.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instrumenting test failures is our third, and lowest priority, project for this half.
Our focus this half is purely on annotating errors for which we have high conviction about their source, whether infrastructure or test issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For escalations and issues, reach out in the #product-infra channel.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
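&lt;p&gt;The selective test execution policy above proposes statistically inferring which tests are likely to fail when a given file is modified. One simple way to do that, sketched below under our own assumptions, is to mine past CI runs for how often each test failed when each file was changed. The selector then picks the tests whose estimated failure rate clears a threshold. The record format, the &lt;code&gt;threshold&lt;/code&gt; and &lt;code&gt;min_runs&lt;/code&gt; parameters, and the file and test names are all hypothetical. This is not a description of any tooling Stripe built.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"""Pick the tests most likely to fail for a set of changed files.

Hypothetical sketch: history is a list of past CI runs, each a pair of
(files changed, tests that failed).
"""
from collections import Counter, defaultdict
from operator import ge


def failure_rates(history):
    """Estimate how often each test failed in runs that changed each file."""
    touched = Counter()              # runs in which each file changed
    failures = defaultdict(Counter)  # file mapped to per-test failure counts
    for changed_files, failed_tests in history:
        for path in set(changed_files):
            touched[path] += 1
            failures[path].update(set(failed_tests))
    rates = {
        path: {test: n / touched[path] for test, n in counts.items()}
        for path, counts in failures.items()
    }
    return rates, touched


def select_tests(changed_files, history, threshold=0.3, min_runs=3):
    """Return tests to run locally, or None to fall back to the full suite."""
    rates, touched = failure_rates(history)
    selected = set()
    for path in changed_files:
        if not ge(touched[path], min_runs):
            # Too little history to trust an estimate for this file.
            return None
        for test, rate in rates.get(path, {}).items():
            if ge(rate, threshold):
                selected.add(test)
    return sorted(selected)


# Hypothetical CI history: (files changed, tests that failed).
HISTORY = [
    (["lib/charges.rb"], ["test/charges_test.rb"]),
    (["lib/charges.rb", "lib/money.rb"],
     ["test/charges_test.rb", "test/money_test.rb"]),
    (["lib/charges.rb"], []),
    (["lib/charges.rb"], ["test/flaky_test.rb"]),
    (["lib/money.rb"], ["test/money_test.rb"]),
    (["lib/money.rb"], []),
]

print(select_tests(["lib/charges.rb"], HISTORY))
# ['test/charges_test.rb']
print(select_tests(["lib/money.rb"], HISTORY))
# ['test/charges_test.rb', 'test/money_test.rb']
print(select_tests(["lib/new_file.rb"], HISTORY))
# None
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because all tests still run before deployment, a selector like this only needs to catch most failures locally rather than all of them. Files with too little history return &lt;code&gt;None&lt;/code&gt;, which a caller could treat as a signal to run the full suite.&lt;/p&gt;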
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;In 2017, Stripe is a company of about 1,000 people, including 400 software engineers.
We aim to grow our organization by about 70% year-over-year to meet increasing demand
for a broader product portfolio and to scale our existing products and infrastructure to accommodate user growth.
As our production stability has improved over the past several years, we have now turned
our focus towards improving developer productivity.&lt;/p&gt;
&lt;p&gt;Our current diagnosis of our developer productivity is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We primarily fund developer productivity for our Ruby-authoring software engineers via our Product Infrastructure team.
The Ruby-focused portion of that team has about ten engineers on it today, and is unlikely to significantly grow in the future.
(If we do expand, we are likely to staff non-Ruby ecosystems like Scala or Golang.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We have two primary mechanisms for understanding our engineers&amp;rsquo; developer experience.
The first is standard productivity metrics around deploy time, deploy stability, test coverage, test time, test flakiness, and so on.
The second is a twice annual developer productivity survey.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Looking at our productivity metrics, our test coverage remains extremely high, with coverage above 99% of lines,
but tests are quite slow to run locally. They run quickly in our infrastructure because they are multiplexed
across a large fleet of test runners.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tests have become slow enough to run locally that an increasing number of developers
run an overly narrow subset of tests, or entirely skip running tests until after pushing their changes.
They instead rely on our test servers to run against their pull request&amp;rsquo;s branch,
which works well enough, but significantly slows down developer iteration time because the
merge, build, and test cycle takes twenty to thirty minutes to complete.&lt;/p&gt;
&lt;p&gt;By the time their build-test cycle completes, they&amp;rsquo;ve lost their focus and may take several hours
to return to addressing the results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There is significant disagreement about whether tests are becoming flakier due to test infrastructure
issues, or due to quality issues of the tests themselves. At this point, there is no trustworthy dataset
that allows us to attribute between those two causes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Feedback from the twice annual developer productivity survey supports the above diagnosis,
and adds some additional nuance.
Most concerning, although long-tenured Stripe engineers find themselves highly productive in our codebase,
we increasingly hear in the survey that newly hired engineers with long tenures at other companies
find themselves unproductive in our codebase.
Specifically, they find it very difficult to determine how to safely make changes in our codebase.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our product codebase is entirely implemented in a single Ruby monolith.
There is one narrow exception, a Golang service handling payment tokenization,
which we consider out of scope for two reasons.
First, it is kept intentionally narrow in order to absorb our SOC1 compliance obligations.
Second, developers in that environment have not raised concerns about their productivity.&lt;/p&gt;
&lt;p&gt;Our data infrastructure is implemented in Scala. While these developers have concerns&amp;ndash;primarily
slow build times&amp;ndash;they manage their build and deployment infrastructure independently, and the group remains relatively small.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ruby is not a highly performant programming language, but we&amp;rsquo;ve found it sufficiently efficient
for our needs. Similarly, other languages are more cost-efficient from a compute resources perspective,
but a significant majority of our spend is on real-time storage and batch computation.
Neither performance nor compute cost alone would lead us to consider replacing Ruby as our core programming language.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our Product Infrastructure team is about ten engineers, supporting about 250 product engineers.
We anticipate this group growing modestly over time, but certainly sublinearly
relative to the overall growth in product engineers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Developers working in Golang and Scala routinely ask for more centralized support,
but it&amp;rsquo;s challenging to prioritize those requests as we&amp;rsquo;re forced to weigh the return
on improving the experience for 240 product engineers working in Ruby versus 10 in Golang
or 40 data engineers in Scala.&lt;/p&gt;
&lt;p&gt;If we introduced more programming languages, this prioritization problem would become
increasingly difficult, and we are already failing to support the additional languages we have today.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>How to get better at strategy?</title><link>https://craftingengstrategy.com/getting-better/</link><pubDate>Thu, 10 Apr 2025 05:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/getting-better/</guid><description>&lt;p&gt;One of the most memorable quotes in Arthur Miller&amp;rsquo;s &lt;em&gt;Death of a Salesman&lt;/em&gt;
comes from Uncle Ben, who describes his path to becoming wealthy as,
&amp;ldquo;When I was seventeen, I walked into the jungle, and when I was twenty-one I walked out. And by God I was rich.&amp;rdquo;
I wish I could describe the path to learning engineering strategy in similar terms,
but by all accounts it&amp;rsquo;s a much slower path. Two decades in, I am still learning
more from each project I work on.
This book has aimed to accelerate your learning, but in my experience there&amp;rsquo;s
still a great deal left to learn once you&amp;rsquo;ve finished it.&lt;/p&gt;
&lt;p&gt;This final chapter collects my remaining advice on
how you can continue to improve at strategy long after reading this
book&amp;rsquo;s final page.
Inescapably, this chapter has become advice on writing your own strategy
for improving at strategy.
You are already familiar with my general suggestions on creating strategy,
so this chapter focuses on applying them to your own plan for getting better at strategy.&lt;/p&gt;
&lt;p&gt;It covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Exploring strategy creation to find strategies you can learn from via public and private resources,
and through creating learning communities&lt;/li&gt;
&lt;li&gt;How to diagnose the strategies you&amp;rsquo;ve found, to ensure you learn the right lessons from each one&lt;/li&gt;
&lt;li&gt;Policies that will help you find ways to perform and practice strategy
within your organization, whether or not you have organizational authority&lt;/li&gt;
&lt;li&gt;Operational mechanisms to hold yourself accountable to developing a strategy practice&lt;/li&gt;
&lt;li&gt;My final benediction to you as a strategy practitioner who has finished reading this book&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With that preamble, let&amp;rsquo;s write this book&amp;rsquo;s final strategy:
your personal strategy for developing your strategy practice.&lt;/p&gt;
&lt;h2 id="exploring-strategy-creation"&gt;Exploring strategy creation&lt;/h2&gt;
&lt;p&gt;Ideally, we&amp;rsquo;d begin improving our engineering strategy skills by broadly reading publicly available examples.
Unfortunately, there simply aren&amp;rsquo;t many easily available works that capture others&amp;rsquo; experience.
Nonetheless, resources do exist, and we&amp;rsquo;ll discuss the three categories
that I&amp;rsquo;ve found most useful:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Public resources on engineering strategy, such as companies&amp;rsquo; engineering blogs&lt;/li&gt;
&lt;li&gt;Private and undocumented strategies available through your professional network&lt;/li&gt;
&lt;li&gt;Learning communities that you build together, including ongoing learning circles&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each of these is explored in its own section below.&lt;/p&gt;
&lt;h3 id="public-resources"&gt;Public resources&lt;/h3&gt;
&lt;p&gt;While there aren&amp;rsquo;t as many public engineering strategy resources as I&amp;rsquo;d like,
I&amp;rsquo;ve found that there are still a reasonable number available.
This book collects a number of such resources in the appendix of &lt;a href="https://craftingengstrategy.com/additional-resources/"&gt;engineering strategy resources&lt;/a&gt;.
That appendix also includes some individuals&amp;rsquo; blog posts that are adjacent to this topic.
You can go a long way by searching and prompting your way into these resources.&lt;/p&gt;
&lt;p&gt;As you read them, it&amp;rsquo;s important to recognize that public strategies are often misleading,
as &lt;a href="https://lethain.com/distinguishing-good-vs-bad-strategy/"&gt;discussed previously in evaluating strategies&lt;/a&gt;.
Everyone writing in public has an agenda, and that agenda often means that they&amp;rsquo;ll omit
important details to make themselves, or their company, come off well.
Make sure you read between the lines rather than taking things too literally.&lt;/p&gt;
&lt;h3 id="private-resources"&gt;Private resources&lt;/h3&gt;
&lt;p&gt;Ironically, where public resources are hard to find, I&amp;rsquo;ve found it
much easier to find privately held strategy resources.
While private recollections are still prone to inaccuracies,
the incentives to massage the truth are less pronounced.&lt;/p&gt;
&lt;p&gt;The most useful sources I&amp;rsquo;ve found are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;peers&amp;rsquo; stories&lt;/em&gt; &amp;ndash;
strategies are often oral histories, and they are shared freely among
peers within and across companies. As you build out your professional network,
you can usually get access to any company&amp;rsquo;s engineering strategy on any topic
by just asking.&lt;/p&gt;
&lt;p&gt;There are brief exceptions. Even a close peer won&amp;rsquo;t share a sensitive strategy before its
existence becomes obvious externally, but they&amp;rsquo;ll be glad to after it does.
People tend to overestimate how much information companies can keep private anyway.
Even reading recent job postings can usually expose a surprising amount about a company.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;internal strategy archaeologists&lt;/em&gt; &amp;ndash;
while surprisingly few companies formally collect their strategies into a repository,
the stories are informally collected by the tenured members of the organization.
These folks are the company&amp;rsquo;s strategy archaeologists, and you can learn a great
deal by explicitly consulting them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;becoming a strategy archaeologist yourself&lt;/em&gt; &amp;ndash;
whether or not you&amp;rsquo;re a tenured member of your company, you can learn a tremendous amount by
starting to build your own strategy repository.
As you start collecting strategies, you&amp;rsquo;ll interest others in contributing theirs as well.&lt;/p&gt;
&lt;p&gt;As discussed in &lt;em&gt;Staff Engineer&lt;/em&gt;&amp;rsquo;s section on the &lt;a href="https://staffeng.com/guides/engineering-strategy/"&gt;Write five then synthesize&lt;/a&gt;
approach to strategy,
over time you can foster a culture of documentation where one didn&amp;rsquo;t exist before.
Even better, building that culture doesn&amp;rsquo;t require any explicit authority, just an ongoing show of excitement.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are other sources as well, ranging from attending the hallway track in conferences
to organizing dinners where stories are shared with a commitment to privacy.&lt;/p&gt;
&lt;h3 id="working-in-community"&gt;Working in community&lt;/h3&gt;
&lt;p&gt;My final suggestion for seeing how others work on strategy is to form
a &lt;a href="https://lethain.com/rough-notes-learning-circles/"&gt;learning circle&lt;/a&gt;.
I formed a &lt;a href="https://lethain.com/crowdsourcing-cto-vpe-learning-circles/"&gt;learning circle when I first moved into an executive role&lt;/a&gt;,
and at this point have been running it for more than five years.
What&amp;rsquo;s surprised me the most is how much I&amp;rsquo;ve learned from it.&lt;/p&gt;
&lt;p&gt;There are a few reasons why ongoing learning circles are exceptional for sharing strategy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Bi-directional discussion allows so much more learning and understanding
than mono-directional communication like conference talks or documents.&lt;/li&gt;
&lt;li&gt;Groups allow you to learn from others&amp;rsquo; experiences and others&amp;rsquo; questions,
rather than having to guide the entire learning yourself.&lt;/li&gt;
&lt;li&gt;Continuity allows you to see the strategy at inception, during the rollout,
and after it&amp;rsquo;s been in practice for some time.&lt;/li&gt;
&lt;li&gt;Trust is built slowly, and you only get the full details about a problem when
you&amp;rsquo;ve already successfully held trust about smaller things.
An ongoing group makes this sort of sharing feasible where a transient group does not.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Although putting one of these communities together requires a commitment,
they are the best mechanism I&amp;rsquo;ve found.
As a final secret, many people get stuck on how they can get invited to an existing
learning circle, but that&amp;rsquo;s almost always the wrong question to be asking.
If you want to join a learning circle, make one. That&amp;rsquo;s how I got invited to mine.&lt;/p&gt;
&lt;h2 id="diagnosing-your-prior-and-current-strategy-work"&gt;Diagnosing your prior and current strategy work&lt;/h2&gt;
&lt;p&gt;Collecting strategies to learn from is a valuable part of improving,
but it&amp;rsquo;s only the first step.
You also have to determine what to take away from each strategy.
For example, you have to determine whether Calm&amp;rsquo;s approach to &lt;a href="https://craftingengstrategy.com/project-resourcing-strategy/"&gt;resourcing Engineering-driven projects&lt;/a&gt;
is something to copy or something to avoid.&lt;/p&gt;
&lt;p&gt;What I&amp;rsquo;ve found effective is to apply &lt;a href="https://craftingengstrategy.com/evaluating-strategy/"&gt;the strategy rubric&lt;/a&gt; we developed in the &amp;ldquo;Is this strategy any good?&amp;rdquo; chapter
to each of the strategies you&amp;rsquo;ve collected.
Simply splitting a strategy into its various phases will teach you a lot.
Applying the rubric to each phase will teach you more.
Each time you do this to another strategy, you&amp;rsquo;ll get a bit faster at applying
the rubric, and you&amp;rsquo;ll start to see interesting, recurring patterns.&lt;/p&gt;
&lt;p&gt;As you dig into a strategy that you&amp;rsquo;ve split into phases and applied the evaluation rubric to,
here are a handful of questions that I&amp;rsquo;ve found interesting to ask myself:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How long did it take to determine that a strategy&amp;rsquo;s initial phase could be improved?
How high was the cost to fund that initial phase&amp;rsquo;s discovery?&lt;/li&gt;
&lt;li&gt;Why did the strategy reach its final stage and get repealed or replaced?
How long did it take to get there?&lt;/li&gt;
&lt;li&gt;If you had to pick only one, did this strategy fail in its approach to
exploration, diagnosis, policy, or operations?&lt;/li&gt;
&lt;li&gt;To what extent did the strategy outlive the tenure of its primary author?
Did it get repealed quickly after their departure, did it endure,
or was it perhaps replaced during their tenure?&lt;/li&gt;
&lt;li&gt;Would you generally repeat this strategy, or would you strive to avoid repeating it?
If you did repeat it, what conditions seem necessary to make it a success?&lt;/li&gt;
&lt;li&gt;How might you apply this strategy to your current opportunities and challenges?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&amp;rsquo;s not necessary to work through all of these questions for every strategy
you&amp;rsquo;re learning from. I often try to pick the two that I think might be most
interesting for a given strategy.&lt;/p&gt;
&lt;h2 id="policy-for-improving-at-strategy"&gt;Policy for improving at strategy&lt;/h2&gt;
&lt;p&gt;At a high level, there are just a few key policies to consider for improving your strategic abilities.
The first is implementing strategy,
and the second is practicing implementing strategy.
While those are indeed the starting points,
there are a few more detailed options worth consideration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If your company has existing strategies that are not working, debug one and work to fix it.
If you lack the authority to work at the company scope, then decrease altitude until
you find an altitude you can work at. Perhaps setting Engineering organizational strategies
is beyond your circumstances, but strategy for your team is entirely accessible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your company has no documented strategies, document one to make it debuggable.
Again, if operating at a high altitude isn&amp;rsquo;t attainable for some reason, operate at a
lower altitude that is within reach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your company&amp;rsquo;s or team&amp;rsquo;s strategies are effective
but have low adoption, see if you can iterate on operational mechanisms
to increase adoption.
Many such mechanisms require no authority at all, such as low-noise nudges
or the &lt;a href="https://lethain.com/model-document-share/"&gt;model-document-share&lt;/a&gt; approach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If existing strategies are effective and have high adoption,
see if you can build excitement for a new strategy.
Start by mining for which problems Staff-plus engineers and senior
managers believe are important. Once you find one, you have a valuable
strategy vein to start mining.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you don&amp;rsquo;t feel comfortable sharing your work internally,
then try writing proposals and sharing them only with a few trusted peers.&lt;/p&gt;
&lt;p&gt;You can go further still and share proposals only with trusted external peers,
perhaps within a learning circle that you create or join.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Trying all of these at once would be overwhelming, so I recommend picking one
in any given phase.
If you aren&amp;rsquo;t able to gain traction, then try another approach.
It&amp;rsquo;s particularly important that your diagnosis recognizes
where things are not working&amp;ndash;perhaps you simply don&amp;rsquo;t have the sponsorship you
need to enforce strategy, so you need to switch towards suggesting strategies instead&amp;ndash;and
if you keep adjusting, you&amp;rsquo;ll find something that works.&lt;/p&gt;
&lt;h3 id="what-if-youre-not-allowed-to-do-strategy"&gt;What if you&amp;rsquo;re not allowed to do strategy?&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;re looking to find one, you&amp;rsquo;ll always unearth a reason
why it&amp;rsquo;s not possible to do strategy in your current environment.&lt;/p&gt;
&lt;p&gt;If you believe your current role prevents you from engaging in strategy work,
I&amp;rsquo;ve found two useful approaches:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Lower your altitude&lt;/em&gt; &amp;ndash; there&amp;rsquo;s always a scale where you can perform strategy,
even if it&amp;rsquo;s just your team or even just yourself.&lt;/p&gt;
&lt;p&gt;Only you can forbid yourself from developing personal strategies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Practice rather than perform&lt;/em&gt; &amp;ndash; organizations can only absorb so much strategy development
at a given time, so sometimes they won&amp;rsquo;t be open to you doing more strategy.
In that case, you should focus on &lt;em&gt;practicing&lt;/em&gt; strategy work rather than directly
performing it.&lt;/p&gt;
&lt;p&gt;Only you can stop yourself from practice.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Don&amp;rsquo;t believe the hype: you can always do strategy work.&lt;/p&gt;
&lt;h2 id="operating-your-strategy-improvement-policies"&gt;Operating your strategy improvement policies&lt;/h2&gt;
&lt;p&gt;As the refrain goes, even the best policies don&amp;rsquo;t accomplish much
if they aren&amp;rsquo;t paired with operational mechanisms to ensure the policies
actually happen, and to debug why when they don&amp;rsquo;t.
It&amp;rsquo;s tempting to overlook operations for personal habits, but that would be a mistake.
These habits profoundly impact us in the long term, yet they&amp;rsquo;re easiest to neglect because others rarely inquire about them.&lt;/p&gt;
&lt;p&gt;The mechanisms I&amp;rsquo;d recommend:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Clearly track the strategies you&amp;rsquo;ve implemented, refined, documented, or read.
Maintain these in a document, spreadsheet, or folder that makes it easy to monitor your progress.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Review your tracked strategies every quarter: are you working on the expected number
and in the expected way? If not, why not?&lt;/p&gt;
&lt;p&gt;Ideally, your review should be done in community with a peer or a learning circle.
It&amp;rsquo;s too easy to deceive yourself; it&amp;rsquo;s much harder to trick someone else.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your periodic review ever discovers that you&amp;rsquo;re simply not doing the work you expected,
sit down for an hour with someone that you trust&amp;ndash;ideally someone equally or more experienced than you&amp;ndash;and
debug what&amp;rsquo;s going wrong. Commit to doing this &lt;em&gt;before&lt;/em&gt; your next periodic review.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tracking your personal habits can feel a bit odd,
but it&amp;rsquo;s something I highly recommend.
I&amp;rsquo;ve been setting and tracking personal goals for some time now&amp;ndash;for example,
in my &lt;a href="https://lethain.com/2024-in-review/"&gt;2024 year in review&lt;/a&gt;&amp;ndash;and have benefited greatly from it.&lt;/p&gt;
&lt;h3 id="too-busy-for-strategy"&gt;Too busy for strategy&lt;/h3&gt;
&lt;p&gt;Many companies convince themselves that they&amp;rsquo;re in too much of a rush to make good decisions.
I&amp;rsquo;ve certainly gotten stuck in this view at times myself,
although at this point in my career I find it increasingly difficult
to ignore that I have a number of tools to create time for strategy,
and an obligation to do strategy rather than inflict poor decisions on
the organizations I work in. Here&amp;rsquo;s my advice for creating time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you&amp;rsquo;re not tracking how often you&amp;rsquo;re creating strategies, then start there.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;ve not worked on a single strategy in the past six months, then start with one.&lt;/li&gt;
&lt;li&gt;If implementing a strategy has been prohibitively time-consuming, then focus on practicing a strategy instead.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do try all those things and still aren&amp;rsquo;t making progress,
then accept your reality: you don&amp;rsquo;t view doing strategy as particularly important.
Spend some time thinking about why that is, and if you&amp;rsquo;re comfortable with your answer,
then maybe this is a practice you should come back to later.&lt;/p&gt;
&lt;h2 id="final-words"&gt;Final words&lt;/h2&gt;
&lt;p&gt;At this point, you&amp;rsquo;ve read everything I have to offer on drafting engineering strategy.
I hope this has refined your view on what strategy can be in your organization,
and has given you the tools to draft a more thoughtful
future for your corner of the software engineering industry.&lt;/p&gt;
&lt;p&gt;What I&amp;rsquo;d never ask is for you to wholly agree with my ideas here. They are my best thinking
on this topic, but strategy is a topic where I&amp;rsquo;m certain Hegel&amp;rsquo;s world view is the correct one:
even the best ideas here are wrong in interesting ways, and will be surpassed by better ones.&lt;/p&gt;</description></item><item><title>Going Forward</title><link>https://craftingengstrategy.com/going-forward/</link><pubDate>Thu, 10 Apr 2025 05:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/going-forward/</guid><description/></item><item><title>Wardley mapping the service orchestration ecosystem (2014).</title><link>https://craftingengstrategy.com/uber-strategy-wardley/</link><pubDate>Thu, 10 Apr 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/uber-strategy-wardley/</guid><description>&lt;p&gt;In &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s 2014 service migration strategy&lt;/a&gt;,
we explore how to navigate the move from a Python monolith to a services-oriented
architecture while also scaling with user traffic that doubled every six months.&lt;/p&gt;
&lt;p&gt;This &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley map&lt;/a&gt; explores how orchestration frameworks were evolving
during that period, as an input into determining the most effective path forward
for Uber&amp;rsquo;s Infrastructure Engineering team.&lt;/p&gt;
&lt;h2 id="reading-this-map"&gt;Reading this map&lt;/h2&gt;
&lt;p&gt;To quickly understand this Wardley Map, read from top to bottom.
If you want to review how this map was &lt;em&gt;written&lt;/em&gt;, then you should
read section by section from the bottom up, starting with Users, then Value Chains, and so on.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Refining strategy with Wardley Mapping&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="how-things-work-today"&gt;How things work today&lt;/h2&gt;
&lt;p&gt;There are three primary internal teams involved in service provisioning.
The Service Provisioning Team abstracts applications developed by Product Engineering from servers managed by the Server Operations Team.
Adding servers to support application scaling is invisible to the
applications themselves, freeing Product Engineers to focus on what the company
values the most: developing more application functionality.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/wardley-compute-v1.png" alt="Wardley map for service orchestration"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Wardley map for service orchestration&lt;/p&gt;
&lt;p&gt;The challenges within the current value chain are cost-efficient scaling, reliable deployment,
and fast deployment. All three of those problems anchor on the same underlying problem of
resource scheduling. We want to make a significant investment in improving our resource
scheduling, and believe that understanding the industry&amp;rsquo;s trend for resource scheduling
underpins making an effective choice.&lt;/p&gt;
&lt;h2 id="transition-to-future-state"&gt;Transition to future state&lt;/h2&gt;
&lt;p&gt;Most interesting cluster orchestration problems are anchored in
cluster metadata and resource scheduling.
Request routing, whether through DNS entries or allocated ports, depends on cluster metadata.
Mapping services to a fleet of servers depends on resource scheduling managing cluster metadata.
Deployment and autoscaling both depend on cluster metadata.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/wardley-compute-v2.png" alt="Pipeline showing progression of service orchestration over time"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Pipeline showing progression of service orchestration over time&lt;/p&gt;
&lt;p&gt;This is also an area where we see significant changes occurring in 2014.&lt;/p&gt;
&lt;p&gt;Uber initially solved this problem using Clusto, an open-source tool released by Digg
with goals similar to HashiCorp&amp;rsquo;s &lt;a href="https://www.consul.io/"&gt;Consul&lt;/a&gt;
but with limited adoption. We also used &lt;a href="https://www.puppet.com/"&gt;Puppet&lt;/a&gt; for configuring servers, alongside custom scripting.
This has worked, but has required custom, ongoing support for scheduling.
The key question we&amp;rsquo;re confronted with is whether to build our own scheduling algorithms (e.g. &lt;a href="https://en.wikipedia.org/wiki/Bin_packing_problem"&gt;bin packing&lt;/a&gt;)
or adopt a different approach.
It seems clear that the industry intends to directly solve this problem via two paths:
relying on cloud providers for orchestration (Amazon Web Services, Google Cloud Platform, etc.)
and through open-source scheduling frameworks such as Mesos and Kubernetes.&lt;/p&gt;
&lt;p&gt;Industry peers with more than five years of infrastructure experience are almost unanimously
adopting open-source scheduling frameworks to better support their physical infrastructure.
This will give them a tool to perform a bridged migration from physical infrastructure to cloud infrastructure.&lt;/p&gt;
&lt;p&gt;Newer companies with less existing infrastructure are moving directly to the cloud, and avoiding the orchestration problem entirely.
The only companies not adopting one of these two approaches are extraordinarily large and complex
(think Google or Microsoft) or allergic to making any technical change at all.&lt;/p&gt;
&lt;p&gt;From this analysis, it&amp;rsquo;s clear that continuing our reliance on Clusto and Puppet is going to
be an expensive investment that&amp;rsquo;s not particularly aligned with the industry&amp;rsquo;s evolution.&lt;/p&gt;
&lt;h2 id="user--value-chains"&gt;User &amp;amp; Value Chains&lt;/h2&gt;
&lt;p&gt;This map covers the orchestration ecosystem within a single company,
with a focus on what did, and did not, stay the same from roughly
2008 to 2014. It considers three users in particular:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Product Engineers&lt;/strong&gt; are focused on provisioning new services,
and then deploying new versions of that service as they make changes.
They are wholly focused on their own service, and entirely unaware of anything beneath the orchestration layer
(including any servers).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Service Provisioning Team&lt;/strong&gt;
focuses on provisioning new services, orchestrating resources for those services,
and routing traffic to those services.
This team acts as the bridge between the Product Engineers and the Server Operations Team.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server Operations Team&lt;/strong&gt; is focused on adding server capacity to be used for orchestration.
They work closely with the Service Provisioning Team, and have no contact with the Product Engineers.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&amp;rsquo;s worth acknowledging that, in practice, these are artificial aggregates of multiple underlying teams.
For example, routing traffic between services and servers is typically handled by a Traffic or Service Networking team.
However, these omissions are intended to clarify the distinctions relevant to the evolution of orchestration tooling.&lt;/p&gt;</description></item><item><title>How to resource Engineering-driven projects at Calm? (2020)</title><link>https://craftingengstrategy.com/project-resourcing-strategy/</link><pubDate>Thu, 03 Apr 2025 05:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/project-resourcing-strategy/</guid><description>&lt;p&gt;One of the recurring challenges in any organization is how to split your attention
across long-term and short-term problems. Your software might be struggling to scale
with ramping user load, while you also know that you have a series of meaningful security vulnerabilities
that need to be closed sooner rather than later. How do you balance across them?&lt;/p&gt;
&lt;p&gt;These sorts of balance questions occur at every level of an organization.
A particularly frequent format is the debate between Product and Engineering
about how much time goes towards developing new functionality versus improving
what&amp;rsquo;s already been implemented.
In 2020, Calm was growing rapidly as we navigated the COVID-19 pandemic,
and the team was struggling to make improvements, as they felt saturated by incoming new requests.
This strategy for resourcing Engineering-driven projects was our attempt to
solve that problem.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operation"&gt;Policy &amp;amp; Operation&lt;/h2&gt;
&lt;p&gt;Our policies for resourcing Engineering-driven projects are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We will protect one Eng-driven project per product engineering team, per quarter.
These projects should represent a maximum of 20% of the team&amp;rsquo;s bandwidth.
Each project must advance a measurable metric,
and execution must be designed to show progress on that metric within 4 weeks.&lt;/li&gt;
&lt;li&gt;These projects must adhere to &lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm&amp;rsquo;s existing Engineering strategies&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;We resource these projects first in the team&amp;rsquo;s planning, rather than last.
However, only concrete projects are resourced.
If there are no concrete proposals, then the team won&amp;rsquo;t have time budgeted for Engineering-driven work.&lt;/li&gt;
&lt;li&gt;Each team&amp;rsquo;s engineering manager is responsible for deciding on the project,
ensuring the project is valuable,
and pushing back on attempts to defund the project.&lt;/li&gt;
&lt;li&gt;Project selection does not require CTO approval, but you should escalate to the CTO if there&amp;rsquo;s friction
or disagreement.&lt;/li&gt;
&lt;li&gt;The CTO will review Engineering-driven projects each quarter
to summarize their impact and provide feedback to teams&amp;rsquo; engineering managers
on project selection and execution.
They will also review teams that did &lt;em&gt;not&lt;/em&gt; perform a project to understand why not.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As we&amp;rsquo;ve communicated this strategy, we&amp;rsquo;ve frequently gotten conceptual alignment
that this sounds reasonable, coupled with uncertainty about what sort of projects
should actually be selected. At some level, this ambiguity is an acknowledgment
that we believe teams will identify the best opportunities bottoms-up.
However, we also wanted to give two concrete examples of projects we&amp;rsquo;re greenlighting in the
first batch:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Code-free media release&lt;/em&gt;: historically, we&amp;rsquo;ve needed to make a number of pull requests
to add, organize, and release new pieces of media. This is high-urgency work,
but Engineering doesn&amp;rsquo;t exercise much judgment while doing it, and
manual steps often create errors. We aim to track and eliminate these pull requests,
while also increasing the number of releases that can be facilitated without
scaling the content release team.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Machine-learning content placement&lt;/em&gt;: developing new pieces of media is often a multi-week or multi-month
process. After content is ready to release, there&amp;rsquo;s generally a debate on where to place the content.
This matters for the company, as this drives engagement with our users,
but it matters even more to the content creator, who is generally evaluated in terms of their content&amp;rsquo;s
performance.&lt;/p&gt;
&lt;p&gt;This often leads to Product and Engineering getting caught up in debates about how to
surface particular pieces of content. This project aims to improve user engagement
by surfacing the content that best matches each user&amp;rsquo;s interests, while also giving the Content team
several explicit positions to highlight content without Product and Engineering involvement.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although these two projects are similar, we don&amp;rsquo;t intend for &lt;em&gt;all&lt;/em&gt;
Engineering-driven projects to be of this variety.
Their similarity is happenstance, reflecting what the teams view as
their biggest opportunities today.&lt;/p&gt;
&lt;h2 id="diagnosis"&gt;Diagnosis&lt;/h2&gt;
&lt;p&gt;Our assessment of the current situation at Calm is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We are spending a high percentage of our time on urgent tasks with low engineering value.
Most significantly, about one-third of our time is going into launching, debugging,
and changing content that we release into our product.
Engineering is involved due to implementation limitations, not because our involvement adds inherent value.
(We mostly just slow releases down and inadvertently introduce bugs of our own.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We have a bunch of fairly clear ideas around improving the platform
to empower the Content team to speed up releases, and to eliminate the Engineering involvement.
However, we&amp;rsquo;ve struggled to find time to implement them, or to validate that these ideas will work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If we don&amp;rsquo;t find a way to prioritize, and succeed at implementing, a project
to reduce Engineering involvement in Content releases, we will struggle to support
our goals to release more content and to develop more product functionality this year.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our Infrastructure team has been able to plan and make these kinds of investments stick.
However, when we attempt these projects within our Product Engineering teams,
things don&amp;rsquo;t go that well.
We are good at getting them onto the initial roadmap, but then
they get deprioritized due to pressure to complete other projects.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our Engineering team of 20 engineers is not very fungible, largely due to
specialization across roles like iOS, Android, Backend, Frontend, Infrastructure, and QA.
We would like to staff these kinds of projects onto the Infrastructure team,
but in practice that team does not have the product development experience to implement
this kind of project.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve discussed spinning up a Platform team, or moving product engineers onto Infrastructure,
but that would either (1) break our goal to maintain joint pairs between Product Managers and Engineering Managers,
or (2) be indistinguishable from prioritizing within the existing team because it would still have
the same Product Manager and Engineering Manager pair.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Company planning is organic, occurring across many discussions with limited structured process.
If we make a decision to invest in one project, it&amp;rsquo;s easy for that project to get
deprioritized in a side discussion missing context on why the project is important.&lt;/p&gt;
&lt;p&gt;These reprioritization discussions happen both in executive forums and in
team-specific forums. There&amp;rsquo;s imperfect awareness across these two sorts of forums.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Prioritization is a deep topic with a wide variety of &lt;a href="https://en.wikipedia.org/wiki/Requirement_prioritization"&gt;popular solutions&lt;/a&gt;.
For example, many software companies rely on &amp;ldquo;RICE&amp;rdquo; scoring, calculating priority as (Reach times Impact times Confidence) divided by Effort.
At the other extreme are complex methodologies like &lt;a href="https://en.wikipedia.org/wiki/Scaled_agile_framework"&gt;Scaled Agile Framework&lt;/a&gt;.&lt;/p&gt;
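&lt;p&gt;As a rough illustration of the RICE formula, here is a minimal Python sketch that scores and ranks a backlog. The field units (reach per quarter, an impact scale from 0.25 to 3, confidence as a fraction, effort in person-months) are assumptions based on common usage rather than anything this strategy specifies, and the proposal names and numbers are hypothetical. This is not how the policy above selects projects; it only shows how the formula behaves.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A candidate project scored with RICE (field units are assumptions)."""
    name: str
    reach: float       # e.g. users affected per quarter
    impact: float      # e.g. 0.25 (minimal) up to 3 (massive)
    confidence: float  # fraction between 0.0 and 1.0
    effort: float      # e.g. person-months; must be positive


def rice_score(p: Proposal) -> float:
    """Priority = (Reach * Impact * Confidence) / Effort."""
    if not p.effort > 0:
        raise ValueError(f"effort must be positive, got {p.effort}")
    return (p.reach * p.impact * p.confidence) / p.effort


def rank(proposals: list[Proposal]) -> list[Proposal]:
    """Order proposals from highest to lowest RICE score."""
    return sorted(proposals, key=rice_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical proposals; the numbers are illustrative only.
    backlog = [
        Proposal("Project A (small, cheap)", reach=2000, impact=1, confidence=0.8, effort=1),
        Proposal("Project B (large, expensive)", reach=10000, impact=2, confidence=0.8, effort=12),
    ]
    for p in rank(backlog):
        print(f"{p.name}: {rice_score(p):.1f}")
```

&lt;p&gt;Because effort is the divisor, a modest but cheap project can outrank a high-impact but expensive one, which is the tradeoff the formula deliberately encodes.&lt;/p&gt;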
&lt;p&gt;In addition to generalized planning solutions, many companies carve out special mechanisms
to solve for particular prioritization gaps.
Google historically offered &lt;a href="https://en.wikipedia.org/wiki/Side_project_time"&gt;20% time&lt;/a&gt; to allow
individuals to work on experimental projects that didn&amp;rsquo;t align directly with top-down priorities.
Stripe&amp;rsquo;s Foundation Engineering organization developed the concept of Foundational Initiatives
to prioritize cross-pillar projects with long-term implications,
which otherwise struggled to get prioritized within the team-led planning process.&lt;/p&gt;
&lt;p&gt;All these methods have clear examples of succeeding, and equally clear examples of struggling.
Where these initiatives have succeeded, they had an engaged executive sponsoring the practice&amp;rsquo;s rollout,
including triaging escalations when the rollout inconvenienced supporters of the prior method.
Where they lacked a sponsor, or were misaligned with the company&amp;rsquo;s culture, these methods
have consistently failed despite the fact that they&amp;rsquo;ve previously succeeded elsewhere.&lt;/p&gt;</description></item><item><title>Is this strategy any good?</title><link>https://craftingengstrategy.com/evaluating-strategy/</link><pubDate>Thu, 27 Mar 2025 05:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/evaluating-strategy/</guid><description>&lt;p&gt;We&amp;rsquo;ve read a lot of strategy at this point in the book.
We can judge a strategy&amp;rsquo;s format, and its construction: both are useful things.
However, format is a predictor of quality, not quality itself.
The remaining question is, how should we assess whether a strategy is any good?&lt;/p&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration strategy&lt;/a&gt; unlocked
the entire organization to make rapid progress.
It also led to a sprawling architecture problem down the line.
Was it a great strategy or a terrible one? Folks can reasonably disagree,
but it&amp;rsquo;s worth developing our point of view on why we should prefer one
interpretation or the other.&lt;/p&gt;
&lt;p&gt;This chapter will focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The various ways that are frequently suggested for evaluating strategies,
such as input-only evaluation, output-only evaluation, and so on&lt;/li&gt;
&lt;li&gt;A rubric for evaluating strategies, and why a useful
rubric has to recognize that strategies have to be evaluated
in phases rather than as a unified construct&lt;/li&gt;
&lt;li&gt;Why ending a strategy is often a sign of a good strategist,
and sometimes the natural reaction to a new phase in a strategy,
rather than a judgment on prior phases&lt;/li&gt;
&lt;li&gt;How missing context is an unpierceable veil for evaluating other companies&amp;rsquo;
strategies with high conviction, and why you&amp;rsquo;ll end up attempting to
evaluate them anyway&lt;/li&gt;
&lt;li&gt;Why you can learn just as much from bad strategies as from good ones,
even in circumstances where you are missing much of the underlying context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Time to refine our judgment about strategy quality a bit.&lt;/p&gt;
&lt;h2 id="how-are-strategies-graded"&gt;How are strategies graded?&lt;/h2&gt;
&lt;p&gt;Before suggesting my own rubric, I want to explore how the industry appears to grade strategies in practice.
That&amp;rsquo;s not because I particularly agree with them (I generally find each approach misses an important nuance),
but because understanding their flaws is a foundation to build on.&lt;/p&gt;
&lt;p&gt;Grading strategy on its outputs is by far the most prevalent approach I&amp;rsquo;ve found in industry.
This is an appealing approach, because it does make sense that a strategy&amp;rsquo;s results are more important
than anything else. However, this line of thinking can go awry.
We saw massive companies like Google move to service
architectures, and we copied them because if it worked for Google, it would likely work for us.
As discussed in the &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;monolith decomposition strategy&lt;/a&gt;,
it did not work particularly well for most adopters.&lt;/p&gt;
&lt;p&gt;The challenge with grading outputs is that it doesn&amp;rsquo;t distinguish between
&amp;ldquo;alpha&amp;rdquo;, how much better your results are because of your strategy, and &amp;ldquo;beta&amp;rdquo;,
the expected outcome if you hadn&amp;rsquo;t used the strategy.
For example, the &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;acquisition of Index&lt;/a&gt;
allowed Stripe to build a point-of-sale business line, but they were also on track to internally
build that business. Looking &lt;em&gt;only&lt;/em&gt; at outputs can&amp;rsquo;t distinguish whether it would have been better
to build the business via acquisition or internally.
But one of those paths must have been the better strategy.&lt;/p&gt;
&lt;p&gt;There are also strategies that succeed, but do so at unreasonably high costs.
&lt;a href="https://craftingengstrategy.com/api-deprecation-strategy/"&gt;Stripe&amp;rsquo;s API deprecation strategy&lt;/a&gt; is a good example of a
strategy that was &lt;em&gt;extremely&lt;/em&gt; well worth the cost for the company&amp;rsquo;s first decade,
but eventually became too expensive to maintain as the evolving regulatory environment created more overhead.
Fortunately, Stripe modified their strategy to allow some deprecations, but you can imagine an
alternate scenario where they attempted to maintain their original strategy, which would have
likely failed due to its accumulating costs.&lt;/p&gt;
&lt;p&gt;Confronting these problems with judging on outputs, it&amp;rsquo;s compelling to switch to the opposite lens and evaluate
strategy purely on its inputs. In that approach, as long as the sum of the strategy&amp;rsquo;s parts makes sense,
it&amp;rsquo;s a good strategy, even if it didn&amp;rsquo;t accomplish its goals.
This approach is very appealing, because it appears to focus &lt;em&gt;purely&lt;/em&gt; on the strategy&amp;rsquo;s alpha.&lt;/p&gt;
&lt;p&gt;Unfortunately, I find this view similarly deficient.
For example, the &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;strategy for adopting LLMs&lt;/a&gt; offers a cautious approach to adopting LLMs.
If that company is outcompeted by competitors in the incorporation of LLMs, to the loss of significant revenue,
I would argue that strategy isn&amp;rsquo;t a great one, even if it&amp;rsquo;s rooted in a proper diagnosis and effective policies.
Doing good strategy requires reconciling the theoretical with the practical,
so we can&amp;rsquo;t argue that inputs alone are enough to evaluate strategy work.
If a strategy is conceptually sound, but struggling to make an impact,
then its authors should continue to &lt;a href="https://craftingengstrategy.com/refine/"&gt;refine it&lt;/a&gt;.
If its authors take a single pass and ignore subsequent information that it&amp;rsquo;s not working,
then it&amp;rsquo;s a failed strategy, regardless of how thoughtful the first pass was.&lt;/p&gt;
&lt;p&gt;While I find these mechanisms to be incomplete, they&amp;rsquo;re still instructive.
By incorporating bits of each of these observations, we&amp;rsquo;re surprisingly
close to a rubric that avoids each of these particular downfalls.&lt;/p&gt;
&lt;h2 id="rubric-for-strategy"&gt;Rubric for strategy&lt;/h2&gt;
&lt;p&gt;Balancing the strengths and flaws of the previous section&amp;rsquo;s ideas,
the rubric I&amp;rsquo;ve found effective for evaluating strategy is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How quickly is the strategy refined?&lt;/strong&gt;
If a strategy starts out bad, but improves quickly, that&amp;rsquo;s a better strategy
than a mostly right strategy that never evolves.
Strategy thrives when its practitioners understand it is a living endeavor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How expensive is the strategy&amp;rsquo;s refinement for implementing and impacted teams?&lt;/strong&gt;
Just as culture eats strategy for breakfast,
good policy loses to poor operational mechanisms every time.
Especially early on, good strategy is validated cheaply.
Expensive strategies are discarded before they can be validated,
let alone improved.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How well does the current iteration solve its diagnosis?&lt;/strong&gt;
Ultimately, strategy does have to address the diagnosis it starts from.
Even if you&amp;rsquo;re learning quickly and at a low cost, at some point you
do have to actually get to impact.
Strategy must eventually be graded on its impact.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With this rubric in hand, we can finally assess
&lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration strategy&lt;/a&gt;.
It refined rapidly as we improved our tooling, minimized costs because
we had to rely on voluntary adoption, and solved its diagnosis extremely well.
So this was a great strategy, but how do we think about the fact that its diagnosis
missed the consequences of a widespread service architecture for developer productivity?&lt;/p&gt;
&lt;p&gt;This brings me to the final component of the strategy quality rubric:
the recognition that strategy exists across multiple phases.
Each phase is defined by new information, whether or not that information is known
by the strategy&amp;rsquo;s authors, that renders the diagnosis incomplete.&lt;/p&gt;
&lt;p&gt;The Uber strategy can be thought of as existing across two phases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Phase 1 used service provisioning to address developer productivity challenges
in the monolith.&lt;/li&gt;
&lt;li&gt;Phase 2 engaged with the consequences of a sprawling service architecture.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All the good grades I gave the strategy are appropriate to the first phase.
However, the second phase was ushered in by the negative impacts to developer
productivity exposed by the initial rollout.
The second phase&amp;rsquo;s grades on the rate of iteration, the cost, and the outcomes
were reasonable, but a bit lower than the first phase&amp;rsquo;s.
In the subsequent years, the second phase was succeeded
by a third phase that aimed to address the second&amp;rsquo;s challenges.&lt;/p&gt;
&lt;h2 id="does-stopping-mean-a-strategys-bad"&gt;Does stopping mean a strategy&amp;rsquo;s bad?&lt;/h2&gt;
&lt;p&gt;Now that we have a rubric, we can use it to evaluate one of the
important questions of strategy: does giving up on a strategy mean
that the strategy is a bad one?&lt;/p&gt;
&lt;p&gt;The vocabulary of strategy phases helps us here, and I think
it&amp;rsquo;s uncontroversial to say that a new phase&amp;rsquo;s evolution
of your prior diagnosis might make it appropriate to abandon a strategy.
For example, at Digg we owned our own servers in 2010, but
we would certainly &lt;em&gt;not&lt;/em&gt; have bought our own servers if we had started
ten years later. Circumstances change.&lt;/p&gt;
&lt;p&gt;I also think that aborting a strategy in its
first phase is sometimes a good sign. That&amp;rsquo;s generally true when
the rate of learning is outpaced by the cost of learning.
I recently sponsored a developer productivity strategy that
had some impact, but less than we&amp;rsquo;d intended.
We immortalized a few of the smaller pieces, and returned
further exploration to a &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;lower altitude strategy&lt;/a&gt;
owned by the teams rather than the high altitude strategy that I owned as an executive.&lt;/p&gt;
&lt;p&gt;Essentially all strategies are competing with strategies at other altitudes,
so I think giving up on a struggling strategy, especially a high altitude one,
is almost always a good idea.&lt;/p&gt;
&lt;h2 id="the-unpierceable-veil"&gt;The unpierceable veil&lt;/h2&gt;
&lt;p&gt;Working within our industry, we are often called upon
to evaluate strategies from afar. As other companies rolled out
LLMs in their products or microservices for their architectures,
our companies pushed us on why we weren&amp;rsquo;t making these changes as well.
The &lt;a href="https://craftingengstrategy.com/explore/"&gt;exploration step&lt;/a&gt; of strategy helps determine
where a strategy might be useful for you, but even that doesn&amp;rsquo;t really help
you evaluate whether the strategy or the strategists were effective.&lt;/p&gt;
&lt;p&gt;There are simply too many dimensions of the rubric that you cannot evaluate
when you&amp;rsquo;re far away. For example, how many phases occurred before the idea that
became the external representation of the strategy came into existence?
How much did those early stages cost to implement?
Is the &lt;em&gt;real&lt;/em&gt; mastery in the operational mechanisms that are never reported on?
Did the external representation of the strategy ever happen at all,
or is it the logical next phase that solves the reality of the internal
implementation?&lt;/p&gt;
&lt;p&gt;With all that in mind, I find that it&amp;rsquo;s generally impossible to accurately evaluate
strategies happening in other companies with much conviction.
Even if you want to, the missing context is an impenetrable veil.
That&amp;rsquo;s not to say that you shouldn&amp;rsquo;t try to evaluate their strategies;
that&amp;rsquo;s something that you&amp;rsquo;ll be forced to do in your own strategy work.
Instead, it&amp;rsquo;s a reminder to keep a low confidence score in those appraisals:
you&amp;rsquo;re guaranteed to be missing something.&lt;/p&gt;
&lt;h2 id="learning-despite-quality-issues"&gt;Learning despite quality issues&lt;/h2&gt;
&lt;p&gt;Although I believe it&amp;rsquo;s quite valuable for us to judge the quality
of strategies, I want to caution against going a step further and
making the conclusion that you can&amp;rsquo;t learn from poor strategies.
As long as you are aware of a strategy&amp;rsquo;s quality, I believe you
can learn just as much from failed strategies as from great ones.&lt;/p&gt;
&lt;p&gt;Part of this is because often even failed strategies have early phases
that work extremely well. Another part is because strategies tend to
fail for interesting reasons. I learned just as much from Stripe&amp;rsquo;s
failed rollout of agile, which struggled due to missing operational mechanisms, as I did from Calm&amp;rsquo;s successful transition to focus primarily on product engineering.
Without a clear point of view on which of these worked,
you&amp;rsquo;d be at risk of learning the wrong lessons, but with forewarning you don&amp;rsquo;t
run that risk.&lt;/p&gt;
&lt;p&gt;Once you&amp;rsquo;ve determined a strategy was unsuccessful, I find it particularly valuable
to determine the strategy&amp;rsquo;s phases and understand which phase and where in the &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;strategy steps&lt;/a&gt;
things went wrong. Was it a lack of operational mechanisms? Was the policy itself a poor match for
the diagnosis? Was the diagnosis willfully ignorant of a truculent executive?
Answering these questions will teach you more about strategy than only studying successful strategies,
because you&amp;rsquo;ll develop an intuition for which parts truly matter.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Finishing this chapter, you now have a structured rubric
for evaluating a strategy, moving beyond &amp;ldquo;good strategy&amp;rdquo; and &amp;ldquo;bad strategy&amp;rdquo; to
a nuanced assessment.
This assessment is not just useful for grading strategy, but makes it possible to
specifically improve your strategy work.&lt;/p&gt;
&lt;p&gt;Maybe your approach is sound, but your operational mechanisms are too costly for
the rate of learning they facilitate.
Maybe you&amp;rsquo;ve treated strategy as a single iteration exercise, rather than
recognizing that even excellent strategy goes stale over time.
Keep those ideas in mind as we head into the final chapter
on &lt;a href="https://craftingengstrategy.com/getting-better/"&gt;how you personally can get better at strategy work&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Steps to build an engineering strategy.</title><link>https://craftingengstrategy.com/strategy-steps/</link><pubDate>Thu, 27 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/strategy-steps/</guid><description>&lt;p&gt;Often you&amp;rsquo;ll see a disorganized collection of ideas labeled as a &amp;ldquo;strategy.&amp;rdquo;
Even when they&amp;rsquo;re dense with ideas, such documents can be hard to parse, and are a major
reason why most engineers will claim their company doesn&amp;rsquo;t have a clear strategy,
even though, in my experience, &lt;em&gt;all&lt;/em&gt; companies follow some strategy, even if it&amp;rsquo;s undocumented.&lt;/p&gt;
&lt;p&gt;This chapter lays out a repeatable, structured approach to drafting strategy.
It introduces each step of that approach, and each step is then detailed further in its own chapter.
Here we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How these five steps fit together to facilitate creating strategy,
especially by preventing practitioners from skipping steps that feel awkward or challenging.&lt;/li&gt;
&lt;li&gt;Step 1: Exploring the wider industry&amp;rsquo;s ideas and practices around the strategy you&amp;rsquo;re working on.
Exploration is understanding what recent research might change your approach,
and how the state of the art has changed since you last tackled a similar problem.&lt;/li&gt;
&lt;li&gt;Step 2: Diagnosing the details of your problem.
It&amp;rsquo;s hard to slow down to understand your problem clearly before attempting
to solve it, but it&amp;rsquo;s even more difficult to solve anything well without
a clear diagnosis.&lt;/li&gt;
&lt;li&gt;Step 3: Refinement is taking a raw, unproven set of ideas and testing them
against reality. Three techniques are introduced to support this validation process:
strategy testing, systems modeling, and Wardley mapping.&lt;/li&gt;
&lt;li&gt;Step 4: Policy makes the tradeoffs and decisions to solve your diagnosis.
These can range from specifying how software is architected, to how pull requests are reviewed,
to how headcount is allocated within an organization.&lt;/li&gt;
&lt;li&gt;Step 5: Operations are the concrete mechanisms that translate policy into an active force
within your organization.
These can be nudges that remind you about code changes without associated tests,
or weekly meetings where you study progress on a migration.&lt;/li&gt;
&lt;li&gt;Whether these steps are sacred or are open to adaptation and experimentation,
including when you personally should persevere in attempting steps that don&amp;rsquo;t
feel effective.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From this chapter&amp;rsquo;s starting point,
you&amp;rsquo;ll have a high-level summary of each step in strategy creation,
and can decide where you want to read further.&lt;/p&gt;
&lt;h2 id="how-the-steps-become-strategy"&gt;How the steps become strategy&lt;/h2&gt;
&lt;p&gt;Creating effective strategy is not the rote incantation of a formula. You can’t merely follow these steps to guarantee that you&amp;rsquo;ll create a great strategy.
However, what I’ve consistently found is that strategies fail more often due to avoidable errors than
from fundamentally unsound thinking.
Busy people skip steps. Especially steps they dislike or have failed at before.&lt;/p&gt;
&lt;p&gt;These steps are the scaffolding to avoid those errors.
By practicing routinely, you&amp;rsquo;ll build
powerful habits and intuition around which approach is most appropriate for the current strategy you&amp;rsquo;re working on.
They also help turn strategy into a community practice that you, your colleagues,
and the wider engineering ecosystem can participate in together.&lt;/p&gt;
&lt;p&gt;Each step is an input that flows into the next step. Your exploration is the foundation
of a solid diagnosis.
Your diagnosis helps you search the infinite space of policy for what you currently need.
Operational mechanisms help you turn policy into an active force supporting your
strategy rather than an abstract treatise.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re skeptical of the steps, you should certainly maintain your skepticism,
but do give them a few tries before discarding them entirely.
You may also appreciate the discussion in the chapter on
&lt;a href="https://craftingengstrategy.com/theory-and-practice/"&gt;bridging between theory and practice when doing strategy&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Exploration is the deliberate practice of searching through a strategy’s problem and solution spaces before allowing yourself to commit to a given approach.
It&amp;rsquo;s understanding how other companies and teams have approached similar questions, and whether their approaches
might also work well for you. It&amp;rsquo;s also learning why what brought you so much success at your former employer
isn&amp;rsquo;t necessarily the best solution for your current organization.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber service migration strategy&lt;/a&gt; used exploration
to understand the service ecosystem by reading industry literature:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As a starting point, we find it valuable to read
&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf"&gt;Large-scale cluster management at Google with Borg&lt;/a&gt;
which informed some elements of the approach to Kubernetes, and
&lt;a href="https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf"&gt;Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center&lt;/a&gt;
which describes the Mesos/Aurora approach.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It also used a &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley map&lt;/a&gt; to explore the cloud compute ecosystem.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/wardley-compute-v2.png" alt="Evolution of service orchestration in 2014"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Evolution of service orchestration in 2014&lt;/p&gt;
&lt;p&gt;For more detail, read the &lt;a href="https://craftingengstrategy.com/explore/"&gt;Exploration chapter&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;Diagnosis is your attempt to correctly recognize the context that the strategy needs to solve before deciding on the policies to address that context.
Starting from your exploration&amp;rsquo;s learnings, and your understanding of your current circumstances,
building a diagnosis forces you to delay thinking about solutions until you fully understand your problem&amp;rsquo;s nuances.&lt;/p&gt;
&lt;p&gt;A diagnosis can be largely data driven, such as
the &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;navigating a Private Equity ownership transition strategy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our Engineering headcount costs have grown by 15% YoY this year, and 18% YoY the prior year.
Headcount grew 7% and 9% respectively, with the difference between headcount and headcount costs explained by salary band adjustments (4%),
a focus on hiring senior roles (3%), and increased hiring in higher cost geographic regions (1%).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It can also be less data driven, instead aiming to summarize a problem,
such as the &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;Index acquisition strategy&lt;/a&gt;&amp;rsquo;s
summary of the known and unknown elements of the technical integration
prior to the acquisition closing:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We will need to rapidly integrate the acquired startup to meet this timeline. We only know a small number of details about what this will entail. We do know that point-of-sale devices directly operate on payment details (e.g. the point-of-sale device knows the credit card details of the card it reads).&lt;/p&gt;
&lt;p&gt;Our compliance obligations restrict such activity to our “tokenization environment”, a highly secured and isolated environment with direct access to payment details. This environment converts payment details into a unique token that other environments can utilize to operate against payment details without the compliance overhead of having direct access to the underlying payment details.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The approach, and challenges, of developing a diagnosis are
detailed in the &lt;a href="https://craftingengstrategy.com/diagnosis/"&gt;Diagnosis chapter&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="refine-test-map--model"&gt;Refine (Test, Map &amp;amp; Model)&lt;/h2&gt;
&lt;p&gt;Strategy refinement is a toolkit of methods to identify which parts of your diagnosis
are most important, and verify that your approach to solving the diagnosis actually works.
This book focuses on three methods in particular:
&lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;,
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;,
and &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/QualityMentalModels.png" alt="Requests succeeding and failing between a user, load balancer, and server"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Requests succeeding and failing between a user, load balancer, and server&lt;/p&gt;
&lt;p class="tc"&gt;&lt;em&gt;An example of a systems modeling diagram.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;These techniques are also demonstrated in the strategy case studies,
such as the &lt;a href="https://craftingengstrategy.com/wardley-llm-ecosystem/"&gt;Wardley map of the LLM ecosystem&lt;/a&gt;,
or the &lt;a href="https://craftingengstrategy.com/private-equity-model/"&gt;systems model of backfilling roles without downleveling them&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For more detail, read the &lt;a href="https://craftingengstrategy.com/refine/"&gt;Refinement chapter&lt;/a&gt;.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;h3 id="why-isnt-refinement-earlier-or-later"&gt;Why isn&amp;rsquo;t refinement earlier (or later)?&lt;/h3&gt;
&lt;p&gt;A frequent objection is that refinement should occur before
the diagnosis. Another is that mapping and modeling are two distinct steps:
mapping should occur before diagnosis, and modeling should occur
after policy.
A third is that refinement ought to be the final step of strategy,
turning the steps into a looping cycle.
These are all reasonable observations, so let me unpack
my rationale for this structure.&lt;/p&gt;
&lt;p&gt;By &lt;em&gt;far&lt;/em&gt; the biggest risk for most strategies is not that you
model too early, or map too late, but instead that you simply skip
both steps entirely. My foremost concern is minimizing the investment
that mapping and modeling require, so that more folks do these steps at all.
Refining after exploring and diagnosing allows you to concentrate your efforts
on a smaller number of load-bearing areas.&lt;/p&gt;
&lt;p&gt;That said, it&amp;rsquo;s common to refine at several points while creating your strategy.
You&amp;rsquo;re just as likely to have three small refinement steps as one bigger one.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="policy"&gt;Policy&lt;/h2&gt;
&lt;p&gt;Policy is translating your diagnosis into a concrete plan.
This plan also needs to work, which requires careful study
of what&amp;rsquo;s worked within your company, and what new ideas you&amp;rsquo;ve
discovered while exploring the current problem.&lt;/p&gt;
&lt;p&gt;Policies can range from providing directional guidance,
such as the &lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;user data controls strategy&lt;/a&gt;&amp;rsquo;s
guidance:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Good security discussions don’t frame decisions as a compromise between security and usability.&lt;/strong&gt; We will pursue multi-dimensional tradeoffs to simultaneously improve security and efficiency. Whenever we frame a discussion on trading off between security and utility, it’s a sign that we are having the wrong discussion, and that we should rethink our approach.&lt;/p&gt;
&lt;p&gt;We will prioritize mechanisms that can both automatically authorize and automatically document the rationale for accesses to customer data. The most obvious example of this is automatically granting access to a customer support agent for users who have an open support ticket assigned to that agent. (And removing that access when that ticket is reassigned or resolved.)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To committing not to make a decision until later,
as practiced in the &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;Index acquisition strategy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Defer making a decision regarding the introduction of Java to a later date: the introduction of Java is incompatible with our existing engineering strategy, but at this point we’ve also been unable to align stakeholders on how to address this decision. Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.&lt;/p&gt;
&lt;p&gt;We will take up this discussion after launching the initial release.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This chapter also covers evaluating policies,
overcoming ambiguous circumstances that make it difficult
to decide on an approach, and developing novel policies.&lt;/p&gt;
&lt;p&gt;For full detail, read the &lt;a href="https://craftingengstrategy.com/policy/"&gt;Policy chapter&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="operations"&gt;Operations&lt;/h2&gt;
&lt;p&gt;Even the best policies have to be interpreted. There will be new circumstances
their authors never imagined, and the policies may be in effect long after their authors have left
the organization. Operational mechanisms are the concrete implementation of your policy.&lt;/p&gt;
&lt;p&gt;The simplest mechanism is an explicit escalation path,
as shown in &lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm&amp;rsquo;s product engineering strategy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Exceptions are granted by the CTO, and must be in writing. The above policies are deliberately restrictive. Sometimes they may be wrong, and we will make exceptions to them. However, each exception should be deliberate and grounded in concrete problems we are aligned both on solving and how we solve them. If we all scatter towards our preferred solution, then we’ll create negative leverage for Calm rather than serving as the engine that advances our product.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From that starting point, the mechanisms can get far more complex.
This chapter works through evaluating mechanisms and composing an operational plan,
then surveys the most common sorts of operational mechanisms that I&amp;rsquo;ve seen across strategies.&lt;/p&gt;
&lt;p&gt;For more detail, read the &lt;a href="https://craftingengstrategy.com/operations/"&gt;Operations chapter&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="is-the-structure-sacrosanct"&gt;Is the structure sacrosanct?&lt;/h2&gt;
&lt;p&gt;When someone&amp;rsquo;s struggling to write a strategy document,
one of the first tools others will recommend is a strategy template.
Templates are great: they turn an already broad, ambiguous project
into something more tractable.
If you&amp;rsquo;re wondering if you should use a template to craft strategy:
sure, go ahead!&lt;/p&gt;
&lt;p&gt;However, I find that well-meaning, thoughtful templates often
turn into lumbering, callous documents that serve no one well.
The secret to good templates is that someone has to own each one,
and that owner has to care first and foremost about the people writing strategies with it,
rather than the various constituencies that want to insert requirements
into the strategy creation process.
The security, compliance and cost of your plans matter a great deal,
but many organizations layer more and more requirements
into these sorts of documents until writing them
becomes prohibitively painful.&lt;/p&gt;
&lt;p&gt;The best advice I can give someone attempting to write
strategy is that you should discard every element of strategy
that gets in your way &lt;em&gt;as long as&lt;/em&gt; you can explain what that element
was intended to accomplish.
For example, if you&amp;rsquo;re drafting a strategy and you don&amp;rsquo;t find any
operational mechanisms that fit, that&amp;rsquo;s fine: discard that section.
Ultimately, the structure is not sacrosanct; it&amp;rsquo;s the thinking
behind the sections that really matters.&lt;/p&gt;
&lt;p&gt;This topic is explored in more detail in the chapter on
&lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;Making engineering strategies more readable&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Now, you know the foundational steps to conducting strategy.
From here, you can dive into the details with the strategy case studies
like &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;How should you adopt LLMs?&lt;/a&gt;
or you can maintain a high altitude starting with
how &lt;a href="https://craftingengstrategy.com/explore/"&gt;exploration creates the foundation for an effective strategy&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Whichever you start with, I encourage you to eventually work through both
to get the full perspective.&lt;/p&gt;</description></item><item><title>Steps</title><link>https://craftingengstrategy.com/steps/</link><pubDate>Thu, 27 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/steps/</guid><description/></item><item><title>Operations</title><link>https://craftingengstrategy.com/operations/</link><pubDate>Thu, 20 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/operations/</guid><description>&lt;p&gt;Even the best policies fail if they aren&amp;rsquo;t adopted by the teams they&amp;rsquo;re intended to serve.
Can we persistently change our company&amp;rsquo;s behaviors with a one-time announcement?
No, probably not.&lt;/p&gt;
&lt;p&gt;I refer to the art of making policies work as &amp;ldquo;operations&amp;rdquo; or &amp;ldquo;strategy operations.&amp;rdquo;
The good news is that effectively operating a policy is two-thirds avoiding
common practices that simply don&amp;rsquo;t work.
The other one-third takes some repetition, but can be practiced in any engineering role:
there&amp;rsquo;s no need to wait until you&amp;rsquo;re an executive to start building mastery.&lt;/p&gt;
&lt;p&gt;This chapter will dig into those mechanisms, with particular focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How policies are supported by operations, and how
operations are composed of mechanisms that ensure they work well&lt;/li&gt;
&lt;li&gt;Evaluating operational mechanisms to select between different options,
and determine which mechanisms are unlikely to be an effective choice&lt;/li&gt;
&lt;li&gt;Composing an operational plan for the specific set of policies that
you are looking to support&lt;/li&gt;
&lt;li&gt;Common varieties of effective mechanisms such as approval forums, inspection mechanisms,
nudges, and so on.
We&amp;rsquo;ll also explore the sorts of mechanisms that tend to work poorly&lt;/li&gt;
&lt;li&gt;How to adjust your approach to operations if you are in an engineering
role rather than an executive role&lt;/li&gt;
&lt;li&gt;How cargo-culting remains the largest threat to effective strategy operations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s unpack the details of turning your &lt;em&gt;potentially&lt;/em&gt;
good policy into an impactful policy.&lt;/p&gt;
&lt;h2 id="what-are-operational-mechanisms"&gt;What are operational mechanisms?&lt;/h2&gt;
&lt;p&gt;Operations are how a policy is implemented and reinforced.
Effective operations ensure that your policies actually accomplish something.
They can range from a recurring weekly meeting, to an alert that notifies the team when a threshold is exceeded,
to a promotion rubric that requires a certain behavior for promotion.&lt;/p&gt;
&lt;p&gt;In the strategy for &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;working with new private equity ownership&lt;/a&gt;,
we introduce a policy to backfill departures at a lower level, and also limit the maximum number
of principal engineers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;We will move to an “N-1” backfill policy&lt;/strong&gt;, where departures are backfilled with a less senior level.
We will also institute a strict maximum of one Principal Engineer per business unit,
with any exceptions approved in writing by the CTO–this applies for both promotions and external hires.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That introduces an explicit operational mechanism of escalations going to the CTO,
but it also introduces an implicit and undefined mechanism: how do we ensure the backfills are actually
downleveled as the policy instructs?
It might be a group chat with engineering recruiting where the CTO approves the level of backfilled roles.
Alternatively, recruiting might be responsible for enforcing that downleveling.
In a third approach, hiring managers might simply be trusted to do the right thing.
Each of those three scenarios is a potential operational solution to implementing this policy.
Operations is picking the right one for your circumstances, and then tweaking it
as you learn from running it.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Operations in government&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For another interesting take on how critical operations are,
&lt;em&gt;&lt;a href="https://www.recodingamerica.us/"&gt;Recoding America&lt;/a&gt;&lt;/em&gt; by Jennifer Pahlka
is well worth the read.
It explores how well-intended government legislation often isn&amp;rsquo;t implementable,
which results in policies that require massive IT investments
but provide little benefit to constituents.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="how-to-evaluate-mechanisms"&gt;How to evaluate mechanisms&lt;/h2&gt;
&lt;p&gt;In order to determine the most effective operational mechanisms
for the problems you&amp;rsquo;re working on, it&amp;rsquo;s useful to have a
standardized rubric for evaluating mechanisms.
While this rubric isn&amp;rsquo;t perfectly universal&amp;ndash;customize it for your needs&amp;ndash;having
any rubric will make it easier to evaluate your options consistently.&lt;/p&gt;
&lt;p&gt;The rubric I use to evaluate whether an operational mechanism will be effective is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Measurability&lt;/strong&gt;:
Can you measure both leading and lagging indicators
to &lt;a href="https://lethain.com/inspection/"&gt;inspect&lt;/a&gt; the mechanism&amp;rsquo;s impact?
If you have to choose between the two, measuring leading indicators allows much quicker
evaluation and iteration on your mechanisms.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adoption cost&lt;/strong&gt;:
How much work will &lt;a href="https://lethain.com/migrations/"&gt;migrating&lt;/a&gt; to this mechanism require?
Can this work be done incrementally or does it require a major, coordinated shift?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User ease (or burden)&lt;/strong&gt;:
After adopting this mechanism, how much easier (or harder) will it be for users to perform their work?
If things will be harder, are those users able to tolerate the additional work?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provider ease (or burden)&lt;/strong&gt;:
How much additional ongoing maintenance will this mechanism require from the centralized or
platform team providing it?
For example, if every new architecture proposal requires a thorough review by your Security team,
does the Security team have the capacity to support those reviews?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reliance on authority&lt;/strong&gt;:
How much does this mechanism depend on a top-down authority&amp;rsquo;s active support?
If the sponsoring executive departs, will this mechanism remain effective?
Is that an effective tradeoff in this case?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cultural alignment&lt;/strong&gt;:
Is this something that your organization is going to do,
or something that they are going to fight against at each step?
Is there a way you can adjust the framing to make it more acceptable
to your organization&amp;rsquo;s culture?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Generally, I find folks are good at evaluating mechanisms against these criteria,
but somewhat worse at accepting the consequences of their evaluation.
For example, someone might fall in love with a particular mechanism
and then try to force the organization to accept it despite an unbearably high adoption cost,
or impose significant user burden on a team that is
already struggling with tight efficiency goals, such as a customer support team.&lt;/p&gt;
&lt;p&gt;Self-awareness helps here, but so does asking others to point out the errors in your reasoning,
which has been a core part of how I&amp;rsquo;ve successfully adopted operational mechanisms.&lt;/p&gt;
&lt;h2 id="composing-an-operational-plan"&gt;Composing an operational plan&lt;/h2&gt;
&lt;p&gt;Your operational plan is the sum of the mechanisms used to support your policies.
While evaluating each individual mechanism in isolation is part of creating an operations plan,
it&amp;rsquo;s also valuable to consider how the mechanisms will work together:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Review the policies you&amp;rsquo;ve developed.&lt;/strong&gt;
What sort of mechanisms seem most likely to support these policies?
How might these mechanisms be pooled together to avoid redundancy?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Review the operational mechanisms that have worked in your organization.&lt;/strong&gt;
What mechanisms have been used to best effect, and which have left a sufficiently bad
taste in the organization&amp;rsquo;s collective memory that they&amp;rsquo;ll be hard to reuse effectively?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Which new mechanisms showed up in your &lt;a href="https://craftingengstrategy.com/explore/"&gt;exploration&lt;/a&gt;?&lt;/strong&gt;
In your exploration phase, you&amp;rsquo;ll frequently encounter mechanisms that your organization
hasn&amp;rsquo;t previously tried. If any of them seem particularly well-suited to the policies
you&amp;rsquo;re considering, and none of your organization&amp;rsquo;s frequently used mechanisms are good fits,
then consider testing a new one.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Evaluate mechanisms against the evaluation rubric.&lt;/strong&gt;
For each of the mechanisms you&amp;rsquo;re considering using,
apply the rubric from &lt;em&gt;How to evaluate mechanisms&lt;/em&gt; above
to validate they&amp;rsquo;re good fits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consolidate into an operational plan.&lt;/strong&gt;
Now that you&amp;rsquo;ve determined the mechanisms you want to consider,
work on fitting the full set of mechanisms into one coherent plan.
Be particularly mindful of the ease, or burden, the integrated plan
creates for both your users and platform providers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Validate plan with users and providers.&lt;/strong&gt;
Many plans make sense from afar, but fail
because they impose an unreasonable burden.
Or the burden might be acceptable, but the actual workflow
simply won&amp;rsquo;t work at all.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consider validating via &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;.&lt;/strong&gt;
If you run the above process, and can&amp;rsquo;t come to an agreement with stakeholders on your proposed plan,
then simply commit to running a strategy testing process including the plan.
This will create space for everyone to build confidence in the approach before
they feel forced to make a commitment to following it long-term.&lt;/p&gt;
&lt;p&gt;Even if you don&amp;rsquo;t use strategy testing for your plan,
at least commit to scheduling a review in three months
to reflect on how things have worked out.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Your operational plan is the vehicle that delivers your policies
to your organization. It&amp;rsquo;s extremely tempting to skip refining
the details here, but it&amp;rsquo;s a relatively quick step and will
completely change your strategy&amp;rsquo;s outcomes.&lt;/p&gt;
&lt;h2 id="common-mechanisms"&gt;Common mechanisms&lt;/h2&gt;
&lt;p&gt;Most companies have a handful of frequently used operational mechanisms.
Some of those mechanisms are company specific, such as &lt;a href="https://forum.commoncog.com/t/the-amazon-weekly-business-review-commoncog/1958"&gt;Amazon&amp;rsquo;s weekly business review&lt;/a&gt;,
and others, like requiring executive approval, repeat across many companies.
Across the many mechanisms you&amp;rsquo;ll encounter, you can generally cluster them into
recurring categories.
This section covers the mechanisms I&amp;rsquo;ve found consistently effective.&lt;/p&gt;
&lt;h3 id="approval-and-advice-forums"&gt;Approval and advice forums&lt;/h3&gt;
&lt;p&gt;At a high level, new policies seem obvious, simple, and cleanly applicable to the problem they are intended to solve.
However, when you apply those policies to detailed, complex circumstances, it&amp;rsquo;s often ambiguous how
to stay loyal to a policy&amp;rsquo;s intentions.
Approval and advice forums are a common solution to that problem.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm&amp;rsquo;s product engineering strategy&lt;/a&gt; shows what the simplest,
and most common, approval forum looks like in practice:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Exceptions are granted by the CTO, and must be in writing.&lt;/strong&gt; The above policies are deliberately restrictive. Sometimes they may be wrong, and we will make exceptions to them. However, each exception should be deliberate and grounded in concrete problems we are aligned both on solving and how we solve them. If we all scatter towards our preferred solution, then we’ll create negative leverage for Calm rather than serving as the engine that advances our product.&lt;/p&gt;
&lt;p&gt;All exceptions must be written. If they are not written, then you should operate as if it has not been granted. Our goal is to avoid ambiguity around whether an exception has, or has not, been approved. If there’s no written record that the CTO approved it, then it’s not approved.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This example also has weaknesses that are common in many approval forums.
Most importantly, it doesn&amp;rsquo;t make it clear how to get approvals.
It would be stronger if it explicitly explained how to request an approval (perhaps by asking in &lt;code&gt;#cto-approvals&lt;/code&gt;),
and where to find prior approvals to help someone considering requesting an exception to
calibrate their request.&lt;/p&gt;
&lt;p&gt;Approvals don&amp;rsquo;t necessarily need to come from senior leadership.
Instead, the senior leadership can loan their authority on a topic to another group.
The &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;LLM adoption strategy&lt;/a&gt; provides a good example of this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Start with Anthropic. We use Anthropic models, which are available through our existing cloud provider via AWS Bedrock. To avoid maintaining multiple implementations, where we view the underlying foundational model quality to be somewhat undifferentiated, we are not looking to adopt a broad set of LLMs at this point. This is anchored in our Wardley map of the LLM ecosystem.&lt;/p&gt;
&lt;p&gt;Exceptions will be reviewed by the Machine Learning Review in #ml-review&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In a more community-minded organization, the approval forums might not
require senior leadership involvement at all. Instead, the culture might
create an environment where the forums&amp;rsquo; feedback is taken seriously on its
own merits.&lt;/p&gt;
&lt;p&gt;Every company does approval forums a bit differently, ranging from
our experiments at &lt;a href="https://lethain.com/navigators/"&gt;Carta with Navigators&lt;/a&gt;, granting executive authority for
technical decisions to named engineers in each area,
to Andrew Harmel-Law&amp;rsquo;s discussion of this topic in
&lt;em&gt;&lt;a href="https://www.amazon.com/Facilitating-Software-Architecture-Empowering-Architectural-ebook/dp/B0DMHGWCPN/"&gt;Facilitating Software Architecture&lt;/a&gt;&lt;/em&gt;.
You can spend a lot of time arguing the details here,
but my experience is that the right participants and a good executive sponsor
matter a lot, and the other pieces matter a lot less.&lt;/p&gt;
&lt;h3 id="inspection"&gt;Inspection&lt;/h3&gt;
&lt;p&gt;While even the best policies can fail, the more common scenario is that
a policy will sort-of work, and need some modest adjustments to make it
more successful. An &lt;a href="https://lethain.com/inspection/"&gt;inspection&lt;/a&gt; mechanism allows you to evaluate
whether your policy is succeeding and whether you need to make adjustments.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;user-data access strategy&lt;/a&gt; provides
an example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Measure progress on percentage of customer data access requests justified by a user-comprehensible, automated rationale.&lt;/strong&gt; This will anchor our approach on simultaneously improving the security of user data and the usability of our colleagues’ internal tools. If we only expand requirements for accessing customer data, we won’t view this as progress because it’s not automated (and consequently is likely to encourage workarounds as teams try to solve problems quickly). Similarly, if we only improve usability, charts won’t represent this as progress, because we won’t have increased the number of supported requests.&lt;/p&gt;
&lt;p&gt;As part of this effort, we will create a private channel where the security and compliance team has visibility into all manual rationales for user-data access, and will directly message the manager of any individual who relies on a manual justification for accessing user data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This example is a good start, but fully realizing an inspection mechanism requires concretely specifying where and how the
data will be tracked. A better version of this would include a link to the dashboard you&amp;rsquo;ll look at,
and a commitment to reviewing the data on a certain frequency.&lt;/p&gt;
&lt;p&gt;For a recent inspection mechanism, I created a recurring invite with a link to the relevant data dashboard,
and a specific chat channel for discussion, and invited
the working group who had agreed to review the data on that cadence.
This wasn&amp;rsquo;t a synchronous meeting, but rather a commitment to independently review, and discuss anything that felt surprising.&lt;/p&gt;
&lt;p&gt;Your particular mechanisms could be threshold-triggered alerts, something you fold into an existing metrics review meeting,
a script you commit to running and reviewing periodically, or something else.
The most important thing is that it cannot silently fail.&lt;/p&gt;
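&lt;p&gt;To make &amp;ldquo;cannot silently fail&amp;rdquo; concrete, here is a minimal Python sketch of an inspection check for the user-data metric quoted above. Everything in it is hypothetical: the &lt;code&gt;AccessRequest&lt;/code&gt; record, the two-day staleness window, and the &lt;code&gt;InspectionError&lt;/code&gt; exception are illustrative assumptions, not part of the original strategy. The key design choice is that missing or stale data raises an error instead of quietly reporting a number.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AccessRequest:
    # Hypothetical record of a single customer-data access request.
    requested_at: datetime
    automated_rationale: bool  # True if access was justified automatically


class InspectionError(Exception):
    """Raised when the inspection data cannot be trusted."""


def automated_rationale_pct(requests, now, max_staleness=timedelta(days=2)):
    """Return the percentage of requests justified by an automated rationale.

    Rather than returning a misleading number, raise InspectionError when
    there is no data at all, or when the newest record is older than
    max_staleness (a likely sign the upstream data pipeline has broken).
    """
    if not requests:
        raise InspectionError("no access requests found; is the pipeline running?")
    newest = max(r.requested_at for r in requests)
    if now - newest > max_staleness:
        raise InspectionError(f"newest record is from {newest.isoformat()}; data looks stale")
    automated = sum(1 for r in requests if r.automated_rationale)
    return 100.0 * automated / len(requests)


# Example: three of four recent requests were automatically justified.
now = datetime(2025, 3, 20, tzinfo=timezone.utc)
sample = [
    AccessRequest(now - timedelta(hours=1), True),
    AccessRequest(now - timedelta(hours=3), True),
    AccessRequest(now - timedelta(hours=5), False),
    AccessRequest(now - timedelta(hours=8), True),
]
print(automated_rationale_pct(sample, now))  # 75.0
```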
&lt;h3 id="nudges"&gt;Nudges&lt;/h3&gt;
&lt;p&gt;While it&amp;rsquo;s common to hear complaints about how a team isn&amp;rsquo;t following a new policy,
as if it were a deliberate choice they&amp;rsquo;d made, I find it more common that people
want to do things the new way, but rarely take time to learn how to do it.
Nudges provide individuals with context about a better way
they might do something, and they are an exceptionally effective mechanism.&lt;/p&gt;
&lt;p&gt;For example, at Stripe we had a policy of allowing teams
to self-authorize introducing new cloud hosting costs. This worked well
almost all the time. However, sometimes teams would accidentally introduce
large cost increases without realizing it, and teams that introduced those
spikes almost never had any awareness that they had caused the problem.
Even if we&amp;rsquo;d told them they must not introduce unapproved spending spikes,
they simply didn&amp;rsquo;t perceive they&amp;rsquo;d done it.&lt;/p&gt;
&lt;p&gt;We could either prevent all teams from introducing new spend,
or try using a nudge. The nudge we added informed teams when
their cloud spend accelerated month over month, directed them to charts that
explained the acceleration, and told them where to go to ask questions.
Nudges pair well with inspections, and there was also a monthly review
by the Efficiency Engineering team to review any spikes and reach out where necessary.&lt;/p&gt;
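&lt;p&gt;The spend nudge can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Stripe&amp;rsquo;s implementation: it treats &amp;ldquo;accelerated&amp;rdquo; as month-over-month growth above a hypothetical 20% threshold, and the dashboard URL and &lt;code&gt;#efficiency-help&lt;/code&gt; channel are placeholders.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class SpendNudge:
    team: str
    previous_spend: float
    current_spend: float
    growth_pct: float
    chart_url: str
    help_channel: str


def spend_nudges(
    monthly_spend,
    threshold_pct=20.0,  # hypothetical threshold
    chart_url_template="https://dashboards.example.com/cost/{team}",  # placeholder
    help_channel="#efficiency-help",  # placeholder
):
    """Return a nudge for each team whose spend grew more than threshold_pct
    between its last two months.

    monthly_spend maps team name to a list of monthly spend, oldest first.
    Teams with fewer than two months of history, zero previous spend, or
    growth at or below the threshold get no nudge at all.
    """
    nudges = []
    for team, history in sorted(monthly_spend.items()):
        if len(history) >= 2 and history[-2] > 0:
            previous, current = history[-2], history[-1]
            growth_pct = 100.0 * (current - previous) / previous
            if growth_pct > threshold_pct:
                nudges.append(
                    SpendNudge(
                        team=team,
                        previous_spend=previous,
                        current_spend=current,
                        growth_pct=round(growth_pct, 1),
                        chart_url=chart_url_template.format(team=team),
                        help_channel=help_channel,
                    )
                )
    return nudges


spend = {
    "payments-api": [10_000.0, 10_500.0, 14_000.0],
    "search": [8_000.0, 8_200.0, 8_300.0],
}
for nudge in spend_nudges(spend):
    print(
        f"{nudge.team}: cloud spend grew {nudge.growth_pct}% month over month. "
        f"See {nudge.chart_url} or ask in {nudge.help_channel}."
    )
```

&lt;p&gt;Teams below the threshold receive nothing, which keeps the nudge from turning into noise; the monthly inspection review then only needs to look at the teams that were nudged.&lt;/p&gt;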
&lt;p&gt;Maybe we could have forced all teams to review new spend,
but this nudge approach didn&amp;rsquo;t require an authoritative mandate to implement.
It also meant we only spent time advising teams that &lt;em&gt;actually&lt;/em&gt; spent too much,
instead of having to discuss with every team that &lt;em&gt;might&lt;/em&gt; spend too much.&lt;/p&gt;
&lt;p&gt;As another example, a working group at Carta added a nudge to inform managers
of untested pull requests merged by their team. Some managers had previously
said they simply didn&amp;rsquo;t know when and why their team had merged untested pull requests,
and this nudge made it easy to detect. The nudge also respected their attention
by not sending a notification at all if there wasn&amp;rsquo;t a new, untested pull request.&lt;/p&gt;
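&lt;p&gt;A minimal Python sketch of a nudge like the Carta example, assuming a hypothetical &lt;code&gt;PullRequest&lt;/code&gt; record whose &lt;code&gt;has_tests&lt;/code&gt; flag is computed elsewhere (how &amp;ldquo;untested&amp;rdquo; was actually determined is not described here). Returning &lt;code&gt;None&lt;/code&gt; when there is nothing new is what lets the sender skip the notification entirely.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record; how has_tests is computed is out of scope here.
    number: int
    team: str
    merged_at: datetime
    has_tests: bool


def untested_pr_digest(prs, team, since) -> Optional[str]:
    """Build a message for the team's manager, or return None if there is
    nothing new to report.

    Only pull requests merged by `team` after `since` (for example, the time
    of the previous digest) and lacking tests are included.
    """
    new_untested = sorted(
        (pr for pr in prs
         if pr.team == team and pr.merged_at > since and not pr.has_tests),
        key=lambda pr: pr.merged_at,
    )
    if not new_untested:
        return None  # No notification at all: respect the manager's attention.
    numbers = ", ".join(f"#{pr.number}" for pr in new_untested)
    return f"{len(new_untested)} untested pull request(s) merged by {team}: {numbers}"


last_digest = datetime(2025, 1, 1)
prs = [
    PullRequest(101, "billing", datetime(2025, 1, 2), has_tests=False),
    PullRequest(102, "billing", datetime(2025, 1, 3), has_tests=True),
]
message = untested_pr_digest(prs, "billing", since=last_digest)
if message is not None:
    print(message)  # in practice, send this to the team's manager
```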
&lt;p&gt;With poor ergonomics, nudges can be an overwhelming assault on your colleagues&amp;rsquo; attention,
but done well, I continue to believe they are the most effective operational mechanism.&lt;/p&gt;
&lt;h3 id="documentation"&gt;Documentation&lt;/h3&gt;
&lt;p&gt;Policies can&amp;rsquo;t be followed by people who don&amp;rsquo;t know they exist,
or by people who don&amp;rsquo;t know how to follow those policies.
In my experience, nudges are the most effective way of solving both of those problems,
because nudges bring information to people at exactly the moment that information would
be useful.
At most companies, well-done nudges are relatively uncommon, and the far more common solution
to lack of information is documentation and training.&lt;/p&gt;
&lt;p&gt;There are so many approaches to both of these topics,
and I&amp;rsquo;ve not found my own approaches here particularly effective.
Consequently, I am hesitant to give much advice on what will work best
for you.
The best I can offer is that following standard practices for your company,
even if the outcomes seem imperfect, is probably your best bet.
Internal knowledge bases tend to rot quickly, and introducing yet another
knowledge base is almost always the illusion of progress rather than real progress,
even when you really don&amp;rsquo;t like the current one.&lt;/p&gt;
&lt;p&gt;Finally, remember that success for documentation and training is not necessarily
that everyone in the company knows how a new policy works.
Instead, as discussed in &lt;a href="https://craftingengstrategy.com/is-useful/"&gt;the chapter on whether strategy is useful&lt;/a&gt;,
a more useful goal is informational herd immunity: as long as someone on each team
understands your policy, the team will generally be capable of following it.&lt;/p&gt;
&lt;h3 id="automation"&gt;Automation&lt;/h3&gt;
&lt;p&gt;Relying on humans to respond is slow, and the quality of human response is highly varied.
In many cases, automation provides the most effective and most scalable mechanism
to support your policies&amp;rsquo; rollout.&lt;/p&gt;
&lt;p&gt;Automation was key in the &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber service migration strategy&lt;/a&gt;,
moving us out of a manual, slow process that was taking up a great deal of user and provider time:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Move to structured requests, and out of tickets. Missing or incorrect information in provisioning requests create significant delays in provisioning. Further, collecting this information is the first step of moving to a self-service process. As such, we can get paid twice by reducing errors in manual provisioning while also creating the interface for self-service workflows.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In that case, better automation allowed us to eliminate a series of back-and-forth negotiations to collect
data, and to instead get the necessary information in a single step. Occasionally we still ran into users who
couldn&amp;rsquo;t fill in the form, but now we could focus on providing a good manual experience for those rare exceptions.&lt;/p&gt;
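&lt;p&gt;Moving from tickets to structured requests mostly means validating the required information up front. The Python sketch below is hypothetical: the fields on &lt;code&gt;ProvisioningRequest&lt;/code&gt; and the allowed environments are assumptions for illustration, not the form Uber actually used. It reports every problem at once, so a requester can fix them in a single pass rather than through repeated back-and-forth.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical set of environments a service can be provisioned into.
ALLOWED_ENVIRONMENTS = {"staging", "production"}


@dataclass
class ProvisioningRequest:
    # Hypothetical fields for illustration only.
    service_name: str
    owning_team: str
    environment: str
    cpu_cores: int
    memory_gb: int


def _is_positive(value):
    # Shared check for the numeric resource fields.
    return value > 0


def validate(request):
    """Return every problem with the request; an empty list means it is complete."""
    problems = []
    if not request.service_name.strip():
        problems.append("service_name is required")
    if not request.owning_team.strip():
        problems.append("owning_team is required")
    if request.environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"environment must be one of {sorted(ALLOWED_ENVIRONMENTS)}")
    if not _is_positive(request.cpu_cores):
        problems.append("cpu_cores must be positive")
    if not _is_positive(request.memory_gb):
        problems.append("memory_gb must be positive")
    return problems


request = ProvisioningRequest("geo-lookup", "maps", "prod", 0, 8)
for problem in validate(request):
    print(problem)
```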
&lt;p&gt;As you use automation as a core strategy mechanism,
it&amp;rsquo;s important to recognize that designing an effective
user experience is a prerequisite to automation having a positive impact.
If you view the user experience of your automation as a secondary concern,
then you are unlikely to make much impact with automation.&lt;/p&gt;
&lt;h3 id="deferment-to-future-work"&gt;Deferment to future work&lt;/h3&gt;
&lt;p&gt;Sometimes there&amp;rsquo;s something you really want a policy to do, but you also
know that you have no reasonable mechanism to do it.
In that case, you may find explicitly deferring action on the topic useful.&lt;/p&gt;
&lt;p&gt;The strategy for &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;integration of the Index acquisition at Stripe&lt;/a&gt;
uses this mechanism:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Defer making a decision regarding the introduction of Java to a later date: the introduction of Java is incompatible with our existing engineering strategy, but at this point we’ve also been unable to align stakeholders on how to address this decision. Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.&lt;/p&gt;
&lt;p&gt;We will take up this discussion after launching the initial release.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The strategy for &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;working with a private equity acquirer&lt;/a&gt; does the same:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We believe there are significant opportunities to reduce R&amp;amp;D maintenance investments, but we don’t have conviction about which particular efforts we should prioritize. We will kick off a working group to identify the features with the highest support load.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&amp;rsquo;s no shame in deferral.
As much as you want to make progress on a certain area,
it&amp;rsquo;s better to explicitly acknowledge that you can&amp;rsquo;t make progress
on it, and clarify when you will be able to, than to allow the organization
to churn on an intractable problem.&lt;/p&gt;
&lt;h3 id="meetings"&gt;Meetings&lt;/h3&gt;
&lt;p&gt;Meetings are the final mechanism, and you can fit any and all of
the above mechanisms into a meeting. They are a universal mechanism,
although frequently overused because they can do an adequate job
of operating almost any policy.&lt;/p&gt;
&lt;p&gt;The most common meeting archetype is a reporting meeting,
such as reporting progress in the Executive Weekly Meeting
as &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;suggested in the LLM adoption strategy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets.&lt;/strong&gt;
Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers.
Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers.&lt;/p&gt;
&lt;p&gt;Report on progress monthly in Exec Weekly Meeting, coordinated in #exec-weekly&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The other common meeting archetype is the &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;weekly working meeting&lt;/a&gt;
introduced in the chapter on strategy testing. Meetings are almost always the
most expensive mechanism you can find to solve a problem,
but they are easy to suggest, run, and iterate on.&lt;/p&gt;
&lt;p&gt;If you can&amp;rsquo;t find any other mechanism you believe in,
then a meeting is a decent starting point.
Just don&amp;rsquo;t get too fond of them, and try to iterate
your way to canceling every meeting that you start.&lt;/p&gt;
&lt;h2 id="anti-patterns"&gt;Anti-patterns&lt;/h2&gt;
&lt;p&gt;In addition to the effective operational methods discussed above,
there are a number of additional mechanisms that are frequently
used, but which I consider anti-patterns.
They can provide some value, but there&amp;rsquo;s almost always a better alternative.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Top-down pronouncements&lt;/strong&gt;:
Sometimes a policy will be operationalized by simply declaring it must be followed.
It&amp;rsquo;s common to see a leader declare that a policy is now in effect,
assuming that the announcement is a useful way to implement the new policy.&lt;/p&gt;
&lt;p&gt;For example, some &amp;ldquo;return to office&amp;rdquo; policies dictate that the team
must work from their office, but driving a real change requires
motivating those individuals to actually return.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Education-as-announcements rollouts&lt;/strong&gt;:
The default way that many companies roll out policies is through one-time &amp;ldquo;education,&amp;rdquo;
often as an all-company announcement for existing employees.
They might follow up by updating the onboarding training for new hires.
Education sounds great, but
a couple of trainings will never change organizational behavior.&lt;/p&gt;
&lt;p&gt;Changing behavior requires ongoing reminders, visible role models,
inspection to understand why some teams are &lt;em&gt;not&lt;/em&gt; adopting the behavior,
and so on. Education can be a good component of operationalizing a policy,
but it cannot stand on its own.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mandatory recurring trainings:&lt;/strong&gt;
These are a staple of compliance-driven policies,
generally because of laws which require providing a certain number
of hours of relevant training each year.&lt;/p&gt;
&lt;p&gt;There are two deep challenges with mandatory trainings.
First, because attendance is &lt;em&gt;required&lt;/em&gt;, the people creating these trainings tend to make little effort
to make the content good.
Second, many folks don&amp;rsquo;t pay attention because they expect the content
to be low quality.
It&amp;rsquo;s not uncommon to hear people say that they&amp;rsquo;ve never heard of a policy that
they&amp;rsquo;ve performed annual training on for multiple years.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s possible to overcome these barriers, but in a situation where you&amp;rsquo;re
accountable for changing outcomes, as opposed to shifting legal obligations away from the company,
these tend to work poorly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Just change the culture.&lt;/strong&gt;
Some leaders frame most problems as cultural problems, which is a reasonable frame: most things can be usefully viewed as a cultural problem.
Unfortunately, it&amp;rsquo;s common for those who rely heavily on the cultural frame to also have a simplistic view about
how culture is changed.&lt;/p&gt;
&lt;p&gt;Changing an organization&amp;rsquo;s culture is tricky, and requires a combination of many
techniques to create visible leaders role modeling the new behavior,
and reinforcement mechanisms to ensure pockets of dissent are weeded out.
Anyone who frames culture change as a simple or instant change is
living in an imaginary world.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you&amp;rsquo;re using one of these approaches, it isn&amp;rsquo;t
necessarily a bad choice. Just make sure you can
explain why you&amp;rsquo;re using it, and that you actually
believe that explanation. If you don&amp;rsquo;t, look for a mechanism from the
earlier sections of this chapter instead.&lt;/p&gt;
&lt;h2 id="what-if-youre-not-an-executive"&gt;What if you&amp;rsquo;re not an executive?&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s easy to get discouraged when you think about which operational mechanisms
are available to you as a non-executive. So many of the frequently seen mechanisms,
such as running mandatory recurring meetings or a binding architecture review process,
are not accessible to you.&lt;/p&gt;
&lt;p&gt;That is true: they&amp;rsquo;re not accessible to you.
However, there&amp;rsquo;s always a related mechanism that
can be implemented with less authority.
The binding architecture process can be replaced with an architectural advice process.
The mandatory review of pull requests can be replaced with a nudge.&lt;/p&gt;
&lt;p&gt;Although it may be more common to see the authoritative mechanisms in the companies
you work in, my experience working as an executive is that these authoritative mechanisms
don&amp;rsquo;t work particularly well. They do a great job of technically shifting accountability
to the wider organization, but they often don&amp;rsquo;t change behavior at all.
So, instead of getting frustrated by what you can&amp;rsquo;t do, focus instead on the mechanisms
that are available to you today. Add nudges, focus on the real dynamics of how colleagues
do work in your organization, and build a real dataset.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s very hard to get an executive to support your initiative before the mechanisms
and data exist to support it, and very easy to get their support once they do.
Once you&amp;rsquo;ve done what you can without authority to build confidence,
if you really do need more authority,
then you&amp;rsquo;re in a good place to escalate to get an executive to support your policies.&lt;/p&gt;
&lt;h2 id="beware-cargo-culting"&gt;Beware cargo-culting&lt;/h2&gt;
&lt;p&gt;The longer that I am in the industry, the more I am surprised
by how few strategists seem to care if their approach actually works.
Instead, they seem focused on doing something that &lt;em&gt;might&lt;/em&gt; work, offloading
accountability to either the organization or some team, and then moving off to
the next problem.&lt;/p&gt;
&lt;p&gt;Perhaps this is driven by an unfortunate reality that leaders are often evaluated
by how they appear, rather than by what they accomplish.
Whether or not that&amp;rsquo;s the underlying reason for why it happens,
it does make it surprisingly difficult to know which patterns to borrow
from strategy rollouts and implementations.&lt;/p&gt;
&lt;p&gt;The best advice, unfortunately, is to remain skeptically optimistic.
Collect ideas widely, but force the ideas to prove their merit.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Now that you&amp;rsquo;ve finished this chapter,
you&amp;rsquo;re significantly more qualified to
write a complete, useful strategy than I was a decade into my career.
Often skipped, the operations behind your strategy are at least as
essential as any other step, and any strategy without them will fade quietly into
your organization&amp;rsquo;s history.&lt;/p&gt;
&lt;p&gt;In addition to being able to roll out a strategy of your own,
this chapter also provides a useful rescue toolkit you can use
to put an existing, floundering strategy back on track.
If you don&amp;rsquo;t see an opportunity to write new strategy within your
organization, then there&amp;rsquo;s still probably room to flex
your operational skill.&lt;/p&gt;</description></item><item><title>AI Companion / Next Steps</title><link>https://craftingengstrategy.com/aic/next-steps/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/next-steps/</guid><description>&lt;p&gt;In order to reach this final chapter of the &lt;em&gt;AI Companion to Crafting Engineering Strategy&lt;/em&gt;,
you&amp;rsquo;ve co-written and revised a strategy with an LLM. We&amp;rsquo;ve also used both in-context learning examples and
Model Context Protocol servers to prime an LLM to work on complex, domain-specific problems such as
creating a systems model or Wardley map.&lt;/p&gt;
&lt;p&gt;The two biggest remaining questions to engage with at this point are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How should you &lt;em&gt;actually&lt;/em&gt; use the LLM-optimized edition of &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;
going forward?&lt;/li&gt;
&lt;li&gt;Is the concept of an LLM-optimized book actually a valuable one?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These two questions are really &lt;em&gt;the&lt;/em&gt; key questions when it comes to evaluating this project,
and will help determine whether this format represents a meaningful advance in how
books are released, or whether it&amp;rsquo;s merely a hacky concept to be forgotten.&lt;/p&gt;
&lt;h2 id="using-the-llm-optimized-book-in-practice"&gt;Using the LLM-optimized book in practice&lt;/h2&gt;
&lt;p&gt;One of the gifts of writing down what I&amp;rsquo;ve learned over the past two decades
is that I can load that writing into the context window of any LLM, and have
that context improve the model&amp;rsquo;s generation.
At this point, I do all my LLM work in a context-rich project to improve the generated responses,
and for strategy-related topics, that means including this book.&lt;/p&gt;
&lt;p&gt;That approach, ambiently including this book in your context window when you do strategy work,
is my best suggestion for getting long-term value out of it.
That&amp;rsquo;s in addition, certainly, to utilizing the techniques and examples from this companion text
when you are actively developing your next engineering strategy.&lt;/p&gt;
&lt;p&gt;A few other experiments that are worth trying in your organization are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use it as a strategy coach or mentor to guide your strategy practice
along the lines described in &lt;a href="https://craftingengstrategy.com/getting-better/"&gt;How to get better at strategy?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Rewrite your existing strategies into consistent, readable formats&lt;/li&gt;
&lt;li&gt;Identify recurring topics and themes in Architecture Decision Records that
could be moved into a durable strategy rather than frequently rehashed&lt;/li&gt;
&lt;li&gt;Summarize your organization&amp;rsquo;s current strategy altitudes, to facilitate a
structured discussion about which kinds of decisions are being made where&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on your current writing culture, some of these will work better than others.
That said, as you test them out, I&amp;rsquo;m confident you&amp;rsquo;ll find more opportunities as well.&lt;/p&gt;
&lt;h2 id="are-llm-optimized-books-a-gimmick"&gt;Are LLM-optimized books a gimmick?&lt;/h2&gt;
&lt;p&gt;Many authors are concerned about LLMs stealing their work and their livelihood.
This is not an abstract concern, but a very real one: already we see search traffic
decreasing for many websites, as folks get answers directly from LLMs.
Those LLMs have been trained on those sites&amp;rsquo; content, but aren&amp;rsquo;t compensating
the authors for that usage.&lt;/p&gt;
&lt;p&gt;As web search became pervasive in the 1990s, there were similar debates about &lt;a href="https://en.wikipedia.org/wiki/Fair_use"&gt;fair use&lt;/a&gt;.
Eventually the courts decided on the exact parameters of what fair use means in the context of web crawlers,
and the debate has faded as websites learned to take advantage of the reach created by search engines.
The status quo was not preserved, leaving the newspaper industry irrevocably changed,
but a new status quo was eventually established.
Today, LLMs are in a similar moment, where authors need to discover the opportunity
that LLMs represent, while also grappling with the reality that they may
significantly change how books are read and sold.&lt;/p&gt;
&lt;p&gt;Working with O&amp;rsquo;Reilly to release the LLM-optimized edition of &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;,
along with this &lt;em&gt;AI Companion&lt;/em&gt;, is a joint experiment in finding what might work.
The market for people buying LLM-optimized books is essentially non-existent today,
but it&amp;rsquo;s a problem that some of the most thoughtful people I know have mentioned
struggling with, and that I think this approach solves well.&lt;/p&gt;
&lt;p&gt;The most extreme version of this future is one where people&amp;rsquo;s entire collection of books is stored in a personal library
that is accessible to LLM agents running on their behalf.
Another vision is that reading behavior doesn&amp;rsquo;t change much, with humans mostly reading on behalf of humans.
Although I&amp;rsquo;m not sure which of those we&amp;rsquo;ll be living in next year or next decade,
my best guess is a bit of both.&lt;/p&gt;
&lt;p&gt;While this approach has its imperfections, I&amp;rsquo;m personally finding it useful today,
and I hope it makes it easier to write effective engineering strategies
for your organization.&lt;/p&gt;</description></item><item><title>AI Companion / Generating Wardley Maps</title><link>https://craftingengstrategy.com/aic/generate-wardley-map-with-llm/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/generate-wardley-map-with-llm/</guid><description>&lt;p&gt;When I was drafting the first chapters of
&lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;, I thought I might
write a chapter on GitLab&amp;rsquo;s strategy. GitLab
is interesting because it has publicly shared so
much of how it operates that I figured it might
be possible to write an interesting strategy document about them
despite never having worked there.&lt;/p&gt;
&lt;p&gt;In the end, I decided not to write that chapter,
but I did create a &lt;a href="https://lethain.com/wardley-gitlab-strategy/"&gt;Wardley map exploring GitLab&amp;rsquo;s strategy&lt;/a&gt;.
That Wardley map was not included in the final edition of the book,
nor in its LLM-optimized edition,
making it a perfect test case for whether an LLM can help us generate
an effective Wardley map.&lt;/p&gt;
&lt;h1 id="naive-approach"&gt;Naive approach&lt;/h1&gt;
&lt;p&gt;A key part of creating a Wardley map is creating the value chains, as discussed in
the chapter on &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;refining strategy with Wardley maps&lt;/a&gt;.
Creating the value chain is absolutely something an LLM can help with,
but value chains are just text, so we&amp;rsquo;ve already covered effective patterns
that apply to creating them.&lt;/p&gt;
&lt;p&gt;However, creating an actual Wardley map image is definitely not something we&amp;rsquo;ve covered.
As such, we&amp;rsquo;ll start from the discarded chapter&amp;rsquo;s user and value chains (&lt;a href="https://gist.github.com/lethain/444cd679c361ae97c9be63383ac34ce1"&gt;available in this Gist&lt;/a&gt;),
with the goal of converting it into a Wardley map.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll start by uploading the Wardley mapping chapter,
along with completed Wardley maps represented as images.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/cas-wardley-files.png" alt="Files uploaded for project"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Files uploaded for project&lt;/p&gt;
&lt;p&gt;This project only has the Wardley mapping chapter, rather than the entire book,
in order to create more space in the project&amp;rsquo;s context window for the images.
If you used &lt;code&gt;llm.py&lt;/code&gt; and an LLM with a large context window, you could include
the entire book as normal.&lt;/p&gt;
&lt;p&gt;Once we have the project set up, we use the user and value chains
(as a reminder, &lt;a href="https://gist.github.com/lethain/444cd679c361ae97c9be63383ac34ce1"&gt;available here&lt;/a&gt;)
along with a short prompt to generate a Wardley map:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Create a wardley map image using the template
in this project these value chains:
{user and value chains text}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using ChatGPT 4o, this generated a Wardley map diagram. Well, sort of.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/cas-wardley-render.png" alt="An aspirational Wardley map that went wrong"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;An aspirational Wardley map that went wrong&lt;/p&gt;
&lt;p&gt;The image has a lot of superficially Wardley map-like aspects,
but it&amp;rsquo;s just not at all a Wardley map. For each correctly spelled label on the x-axis,
there&amp;rsquo;s a nonsensical word in a box elsewhere on the map.
I think we can conclude that direct image generation is not going to
be the path forward for using an LLM to create a Wardley map.&lt;/p&gt;
&lt;h1 id="shifting-complexity-to-a-tool"&gt;Shifting complexity to a tool&lt;/h1&gt;
&lt;p&gt;Ok, that worked surprisingly badly.
Generally, when this happens, the secret is to shift complexity
to a tool. Frequently the best way to do so is creating a domain-specific language (DSL)
that allows LLMs to work in a format they love&amp;ndash;text!&amp;ndash;and pushes the other work
out of the LLM. That is what we did in the &lt;a href="https://craftingengstrategy.com/aic/generate-systems-model-with-llm/"&gt;last chapter&lt;/a&gt;
with the &lt;code&gt;systems&lt;/code&gt; library and the &lt;code&gt;systems-mcp&lt;/code&gt; server.&lt;/p&gt;
&lt;p&gt;Creating a DSL for Wardley maps would be a somewhat painful undertaking.
Fortunately, there&amp;rsquo;s already a DSL for writing Wardley maps, Damon Skelhorn&amp;rsquo;s
&lt;a href="https://github.com/damonsk/onlinewardleymaps"&gt;OnlineWardleyMaps&lt;/a&gt;.
(Many thanks to &lt;a href="https://hiredthought.com/"&gt;Ben Mosior&lt;/a&gt;, whose
&lt;a href="https://learnwardleymapping.com/2024/06/24/top-5-wardley-mapping-tools-for-2024/"&gt;post&lt;/a&gt;
introduced me to OnlineWardleyMaps.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve created a &lt;a href="https://gist.github.com/lethain/c713e9fa378bd7469f7914819a9a7b24"&gt;file with several Wardley map DSL examples&lt;/a&gt;,
which you can load into your project. After doing that, let&amp;rsquo;s redo the above prompt,
this time prefacing it with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Create a wardley map for the below value,
using the OnlineWardleyMaps syntax.
{same value chains as above, omitted for brevity}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The LLM generated &lt;a href="https://gist.github.com/lethain/bbeea37c43ef7a45e916ca9fc866969c"&gt;this output&lt;/a&gt;,
which I added to &lt;a href="https://www.onlinewardleymaps.com/"&gt;OnlineWardleyMaps.com&lt;/a&gt;, and successfully
generated a compelling Wardley map.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/cas-online-wardley.png" alt="OnlineWardleyMap output from generated specification"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;OnlineWardleyMap output from generated specification&lt;/p&gt;
&lt;p&gt;To my eye, this is surprisingly good. There are absolutely pieces I would tweak, but that&amp;rsquo;s the
nature of any Wardley map. Getting a rough draft up and working is exceptionally valuable,
because it&amp;rsquo;s much easier to critique and iterate on something that exists.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this chapter, we experienced the push and pull of working with LLMs
to solve messy problems. Our first attempt to create a Wardley map by working with
images resulted in an unusable mess. We then formulated a new approach, taking advantage
of the existing OnlineWardleyMaps format, to translate the problem into a format that
LLMs are adept at, DSLs.&lt;/p&gt;
&lt;p&gt;You can now write a complex Wardley map rapidly, in a format that you can share quickly,
and without having to learn the messy details of that format yourself.
That&amp;rsquo;s pretty remarkable in my mind.&lt;/p&gt;</description></item><item><title>AI Companion / Generating Systems Models</title><link>https://craftingengstrategy.com/aic/generate-systems-model-with-llm/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/generate-systems-model-with-llm/</guid><description>&lt;p&gt;&lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; has a chapter on &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;
along with a number of examples. Those examples focus on using the &lt;a href="https://github.com/lethain/systems"&gt;&lt;code&gt;lethain/systems&lt;/code&gt; Python library&lt;/a&gt;
to generate models using a &lt;a href="https://jupyter.org/"&gt;Jupyter notebook&lt;/a&gt;,
with examples in &lt;a href="https://github.com/lethain/eng-strategy-models"&gt;&lt;code&gt;lethain/eng-strategy-models&lt;/code&gt;&lt;/a&gt;.
That is a reasonable approach, but it also requires learning the &lt;code&gt;systems&lt;/code&gt; library&amp;rsquo;s
syntax for modeling.&lt;/p&gt;
&lt;p&gt;This chapter looks at how to use an LLM to write the system model syntax for you,
without requiring you to learn that syntax in great detail.
In addition to being specific instructions for working with the &lt;code&gt;systems&lt;/code&gt; library,
this is also a generalizable pattern for using LLMs to work with domain-specific languages.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adding problem-specific resources to your context window to prime LLMs to solve complex problems&lt;/li&gt;
&lt;li&gt;Using an LLM to write a systems model as a valid &lt;code&gt;systems&lt;/code&gt; specification&lt;/li&gt;
&lt;li&gt;Having an LLM walk you through running a script to run a systems model&lt;/li&gt;
&lt;li&gt;Using a model context protocol (MCP) server to simplify the process of running a systems model,
and a &lt;a href="https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them"&gt;Claude Artifact&lt;/a&gt;
to explore the results&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following instructions specifically build on the Anthropic Claude.ai Project setup, but
should be adaptable to other approaches, especially when support for local MCP servers
is more widely adopted by environments other than Claude Desktop.&lt;/p&gt;
&lt;h2 id="adding-systems-modeling-instructions"&gt;Adding systems modeling instructions&lt;/h2&gt;
&lt;p&gt;In the first chapter, &lt;a href="https://craftingengstrategy.com/aic/foundations-of-collaboration/"&gt;Foundation of Collaboration&lt;/a&gt;,
we configured a project to include &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; in the context
window. We&amp;rsquo;re going to continue that technique, adding more detailed instructions for
creating systems models into our context window.&lt;/p&gt;
&lt;p&gt;Start by retrieving a text copy of the
&lt;a href="https://raw.githubusercontent.com/lethain/systems/refs/heads/master/README.md"&gt;&lt;code&gt;README.md&lt;/code&gt; from &lt;code&gt;lethain/systems&lt;/code&gt;&lt;/a&gt;.
That file explains the syntax and usage of the library in detail, and ought to be enough instruction for
the model to specify models.
Once you have the file, upload it into your Anthropic project with your LLM-optimized
version of &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-sm-add-readme.png" alt="Adding README.md to project&amp;rsquo;s input context"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Adding README.md to project's input context&lt;/p&gt;
&lt;p&gt;This will make the README&amp;rsquo;s contents available in the context window, serving as in-context learning (ICL)
examples for generating models.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-sm-setup.png" alt="Project overview with both book and README contents"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Project overview with both book and README contents&lt;/p&gt;
&lt;p&gt;After adding the README, your project will include both its contents and also the book&amp;rsquo;s chapters.
Now it&amp;rsquo;s time to start writing a systems model. This is similar to the hiring model
specified in &lt;a href="https://lethain.com/modeling-hiring-funnel-systems/"&gt;Modeling a hiring funnel with Systems library&lt;/a&gt;,
but that model is not included in &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; or the &lt;code&gt;README.md&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="creating-a-model"&gt;Creating a model&lt;/h2&gt;
&lt;p&gt;Start by writing a general description of the system you want to create.
The prompt I started with is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Help me write a systems model specification for
systems library to model a hiring pipeline system where:
1. First stock is infinite stock of &amp;quot;potential candidates&amp;quot;
2. Second stock is &amp;quot;outreaches&amp;quot;, with an in flow of 10 from
&amp;quot;potential candidates&amp;quot;
3. Third stock is &amp;quot;interested&amp;quot;, with an in flow of 50%
from &amp;quot;outreaches&amp;quot;
4. Fourth stock is &amp;quot;active&amp;quot;, with an in flow of 50%
from &amp;quot;interested&amp;quot;
5. Fifth stock is &amp;quot;offers&amp;quot; with an in flow of 10% from &amp;quot;active&amp;quot;
6. Sixth and final stock is &amp;quot;hires&amp;quot; with an in flow of
70% from &amp;quot;offers&amp;quot;
All of these flows with percentages should be modeled using
&amp;quot;Leak&amp;quot; flows, for example:
Outreaches &amp;gt; Interested @ Leak(0.5)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a detailed description, although only the last few sentences about Leaks require specifically knowing the &lt;code&gt;systems&lt;/code&gt; syntax.
We&amp;rsquo;ll come back to the &amp;ldquo;Leak&amp;rdquo; versus &amp;ldquo;Conversion&amp;rdquo; distinction a bit later.
The biggest thing to take away is that, even if you don&amp;rsquo;t need to know &lt;code&gt;systems&lt;/code&gt; syntax,
you absolutely &lt;em&gt;do&lt;/em&gt; still need to understand how systems models work to write this specification.&lt;/p&gt;
&lt;p&gt;From that prompt, Claude generates a complete, usable model, as well as links to a handful of examples of
other systems models. If you ran into a problem, those examples might be helpful to learn from,
but you could also just convert them into Markdown and include them in your prompt to help the LLM
fix any errors it generated.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-sm-model-1.png" alt="Claude.ai generates working systems syntax from natural language prompt"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai generates working systems syntax from natural language prompt&lt;/p&gt;
&lt;p&gt;Now that we have a working model, you may have recognized that the prior hint to use
&lt;code&gt;Leak&lt;/code&gt; doesn&amp;rsquo;t make much sense.
That&amp;rsquo;s because, although a &lt;code&gt;Leak&lt;/code&gt; models a portion of candidates progressing forward
in the hiring pipeline, it also leaves the candidates who don&amp;rsquo;t progress in the prior
stock.&lt;/p&gt;
&lt;p&gt;That isn&amp;rsquo;t quite right: candidates who don&amp;rsquo;t move forward in a hiring funnel don&amp;rsquo;t stay in the
prior stock, they leave the hiring funnel. For example, a candidate who doesn&amp;rsquo;t get an offer doesn&amp;rsquo;t stay in
&lt;code&gt;Active&lt;/code&gt;, they are removed from the hiring process entirely.&lt;/p&gt;
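&lt;p&gt;To make the difference concrete, here is a minimal, dependency-free Python sketch of the two behaviors as described above. The functions &lt;code&gt;leak_step&lt;/code&gt;, &lt;code&gt;conversion_step&lt;/code&gt;, and &lt;code&gt;simulate&lt;/code&gt; are hypothetical illustrations, not part of the &lt;code&gt;systems&lt;/code&gt; library. They assume fractional stock values and assume new candidates arrive before each round&amp;rsquo;s flow runs; the real library may order and round flows differently.&lt;/p&gt;

```python
# Hypothetical helpers that mimic the Leak and Conversion behavior described
# above; they are not the systems library's implementation.


def leak_step(source, dest, rate):
    """Leak: move `rate` of the source forward; the rest stays in the source."""
    moved = source * rate
    return source - moved, dest + moved


def conversion_step(source, dest, rate):
    """Conversion: drain the whole source; only `rate` of it reaches dest."""
    return 0.0, dest + source * rate


def simulate(step, rounds, inflow, rate):
    """Add `inflow` to the source each round, apply `step`, record both stocks."""
    source, dest = 0.0, 0.0
    history = []
    for _ in range(rounds):
        source += inflow
        source, dest = step(source, dest, rate)
        history.append((round(source, 2), round(dest, 2)))
    return history


print(simulate(leak_step, 3, 10, 0.5))
# [(5.0, 5.0), (7.5, 12.5), (8.75, 21.25)]
print(simulate(conversion_step, 3, 10, 0.5))
# [(0.0, 5.0), (0.0, 10.0), (0.0, 15.0)]
```

&lt;p&gt;With an inflow of 10 and a rate of 0.5, the leak version ends three rounds with 8.75 candidates still sitting in the source stock and 21.25 downstream. The conversion version empties the source every round and ends with 15 downstream. Under a leak, candidates who never progress keep inflating later stages, which is the distortion described above.&lt;/p&gt;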
&lt;p&gt;Let&amp;rsquo;s fix this by asking the LLM to use a &lt;code&gt;Conversion&lt;/code&gt; instead of a &lt;code&gt;Leak&lt;/code&gt;, using a prompt
such as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Ok, this is a good start. Could you change the leaks
to be conversions, since the others are actually getting lost?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see the output of that prompt showing a fixed model.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-sm-model-2.png" alt="Claude.ai switching from Leaks to Conversions"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai switching from Leaks to Conversions&lt;/p&gt;
&lt;p&gt;As we&amp;rsquo;ve shown in this section, if you include enough appropriate documentation, LLMs are surprisingly good at writing
valid syntax. The challenge is understanding your goals well enough to provide that documentation.&lt;/p&gt;
&lt;h2 id="running-the-model"&gt;Running the model&lt;/h2&gt;
&lt;p&gt;Once you have the working syntax for a systems model, the next step is getting
data out of that model.
While I personally run models in a &lt;a href="https://jupyter.org/"&gt;Jupyter Notebook&lt;/a&gt;
as demonstrated in &lt;a href="https://github.com/lethain/eng-strategy-models"&gt;&lt;code&gt;lethain/eng-strategy-models&lt;/code&gt;&lt;/a&gt;,
you can also rely on the LLM to walk you through running the models locally.&lt;/p&gt;
&lt;p&gt;Continuing the above prompt, let&amp;rsquo;s ask for instructions to run our new model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Ok, that seems like a good systems model. Please write
me the python script that uses the &amp;quot;systems&amp;quot; python
library to run that model and generate the output as a table.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generates instructions for installing &lt;code&gt;systems&lt;/code&gt; locally,
along with a script to run the model.
The full script is available in &lt;a href="https://gist.github.com/lethain/34ca7ad6c65c59dbfb0235d49bf55245"&gt;this Gist&lt;/a&gt;.&lt;/p&gt;
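&lt;p&gt;You may want to sanity-check the generated script&amp;rsquo;s output without installing anything. If so, you can approximate the round-by-round mechanics of the funnel described earlier in plain Python. This is a hypothetical sketch, not the &lt;code&gt;systems&lt;/code&gt; library and not the script in the Gist. It assumes every flow is calculated from the stock values at the start of each round, uses fractional rather than whole candidates, and treats the infinite &amp;ldquo;potential candidates&amp;rdquo; stock as a constant source of 10 outreaches per round.&lt;/p&gt;

```python
# Hypothetical, dependency-free sketch of the hiring funnel described above.
# It is NOT the systems library; it only mimics Conversion semantics.

# (source, destination, conversion rate) for each Conversion flow.
FUNNEL = [
    ("outreaches", "interested", 0.5),
    ("interested", "active", 0.5),
    ("active", "offers", 0.1),
    ("offers", "hires", 0.7),
]
OUTREACH_PER_ROUND = 10  # drawn from the infinite "potential candidates" stock
STOCKS = ["outreaches", "interested", "active", "offers", "hires"]


def run_funnel(rounds):
    """Return one dict of stock values per round, starting with round 0."""
    stocks = {name: 0.0 for name in STOCKS}
    history = [dict(stocks)]
    for _ in range(rounds):
        # Every flow uses start-of-round values (an assumption), so a
        # candidate advances at most one stage per round.
        changes = {name: 0.0 for name in STOCKS}
        changes["outreaches"] += OUTREACH_PER_ROUND
        for source, dest, rate in FUNNEL:
            changes[source] -= stocks[source]       # a conversion drains the source
            changes[dest] += stocks[source] * rate  # only `rate` of it arrives
        for name in STOCKS:
            stocks[name] += changes[name]
        history.append(dict(stocks))
    return history


def format_table(history):
    """Render the history as a fixed-width text table."""
    header = ["round"] + STOCKS
    lines = ["  ".join(f"{col:10}" for col in header)]
    for i, row in enumerate(history):
        cells = [str(i)] + [f"{row[name]:.2f}" for name in STOCKS]
        lines.append("  ".join(f"{cell:10}" for cell in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(run_funnel(10)))
```

&lt;p&gt;This sketch may order or round flows differently than the library does. Treat it as a rough cross-check on magnitudes, not a replacement for running the real model.&lt;/p&gt;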
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-sm-code.png" alt="Textual table of output from a systems model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Textual table of output from a systems model&lt;/p&gt;
&lt;p&gt;If you follow the steps to install and run the script, you&amp;rsquo;ll get this table as output.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-sm-output.png" alt="Textual table showing output from systems model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Textual table showing output from systems model&lt;/p&gt;
&lt;p&gt;You can then include that table, along with a prompt asking Claude.ai to create
an &lt;a href=""&gt;artifact&lt;/a&gt; to explore the data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;This is the library output, please produce an artifact
that shows chart of active candidates over time:
{table data omitted for brevity}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generates an Artifact that visualizes the dataset as a chart.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-render-chart.png" alt="Claude.ai Artifact showing a chart of system model output"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai Artifact showing a chart of system model output&lt;/p&gt;
&lt;p&gt;This approach works, but it&amp;rsquo;s rather awkward, requiring you to move back and forth between the chat interface and the terminal.
What we really want is an approach that&amp;rsquo;s better than a Jupyter Notebook, and this is decidedly worse.
Fortunately, we can do better.&lt;/p&gt;
&lt;h2 id="using-a-model-context-protocol-server"&gt;Using a Model Context Protocol server&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve completed the experiment of relying on the LLM
to guide us through every step of the model, run, and render pipeline,
we can admit something important: some of this wasn&amp;rsquo;t that useful.
A better approach would allow us to run every step from within the chat interface.&lt;/p&gt;
&lt;p&gt;Fortunately, we can provide that experience by creating a Model Context Protocol (MCP) server
and exposing it to the LLM. MCP servers provide tools to the LLM, along with instructions on when
and how to use those tools. A simple MCP server might offer a tool that retrieves the current weather,
performs an API call against a service, or queries a search index for results.
In this case, an MCP server can also use the &lt;code&gt;systems&lt;/code&gt; library to run a systems model,
as shown in the &lt;a href="https://github.com/lethain/systems-mcp"&gt;lethain/systems-mcp&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;That repository includes instructions for installing it and configuring it to run with
your Claude Desktop setup, which we configured in the &lt;a href="https://craftingengstrategy.com/aic/foundations-of-collaboration/"&gt;Foundations chapter&lt;/a&gt;.
Once you have it installed, you can run and render a model as described in the following steps.&lt;/p&gt;
&lt;p&gt;This MCP server exposes two tools, &lt;code&gt;load_systems_documentation&lt;/code&gt; and &lt;code&gt;run_systems_model&lt;/code&gt;.
The first, &lt;code&gt;load_systems_documentation&lt;/code&gt;, injects project-specific documentation
and examples into the context window, where they serve as in-context learning that improves the quality
of generated models.
The other tool, &lt;code&gt;run_systems_model&lt;/code&gt;, takes a model specification and runs it,
returning the results in JSON format.&lt;/p&gt;
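&lt;p&gt;Under the hood, an MCP client discovers a server&amp;rsquo;s tools with a &lt;code&gt;tools/list&lt;/code&gt; request and invokes them with &lt;code&gt;tools/call&lt;/code&gt;. Both are JSON-RPC 2.0 messages. The following is a simplified sketch of how a server might dispatch those two requests to handlers named after this server&amp;rsquo;s tools. It is not the &lt;code&gt;lethain/systems-mcp&lt;/code&gt; implementation. The handler bodies, the &lt;code&gt;spec&lt;/code&gt; and &lt;code&gt;rounds&lt;/code&gt; arguments, and the placeholder results are all assumptions. A real server would also handle the initialization handshake and the transport, usually through an official MCP SDK.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch of how an MCP server could dispatch tool requests.
# It is NOT the systems-mcp implementation and uses no MCP SDK; a real
# server also handles initialization and the transport (such as stdio).
import json


def load_systems_documentation(arguments):
    # Placeholder: a real server would return the syntax docs and
    # example models that the LLM uses as in-context examples.
    return 'systems syntax notes and example models would go here'


def run_systems_model(arguments):
    # Placeholder: a real server would parse arguments['spec'] with the
    # systems library, run it, and serialize every round as JSON.
    spec = arguments['spec']
    rounds = arguments.get('rounds', 10)
    return json.dumps({'spec_lines': len(spec.splitlines()), 'rounds': rounds})


# The description and inputSchema tell the LLM when and how to call a tool.
TOOLS = {
    'load_systems_documentation': {
        'handler': load_systems_documentation,
        'description': 'Load systems syntax documentation and examples.',
        'inputSchema': {'type': 'object', 'properties': {}},
    },
    'run_systems_model': {
        'handler': run_systems_model,
        'description': 'Run a systems model spec and return JSON results.',
        'inputSchema': {
            'type': 'object',
            'properties': {
                'spec': {'type': 'string'},
                'rounds': {'type': 'integer'},
            },
            'required': ['spec'],
        },
    },
}


def handle(request):
    # Answer a JSON-RPC 2.0 request for tools/list or tools/call.
    if request['method'] == 'tools/list':
        tools = [
            {'name': name, 'description': entry['description'],
             'inputSchema': entry['inputSchema']}
            for name, entry in TOOLS.items()
        ]
        result = {'tools': tools}
    elif request['method'] == 'tools/call':
        params = request['params']
        # Unknown tool names raise KeyError in this sketch.
        handler = TOOLS[params['name']]['handler']
        text = handler(params.get('arguments', {}))
        result = {'content': [{'type': 'text', 'text': text}]}
    else:
        # -32601 is the JSON-RPC 2.0 error code for an unknown method.
        error = {'code': -32601, 'message': 'Method not found'}
        return {'jsonrpc': '2.0', 'id': request['id'], 'error': error}
    return {'jsonrpc': '2.0', 'id': request['id'], 'result': result}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;description&lt;/code&gt; and &lt;code&gt;inputSchema&lt;/code&gt; fields carry the when-and-how instructions that an MCP server supplies alongside each tool.&lt;/p&gt;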
&lt;p&gt;Here is an example of Claude.ai using both tools to generate
a systems model for a social network. Note that none of the
examples relate to social networks, so this is a genuinely
new creation rather than a recreation of an explicit item
within the loaded context.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/sys-mcp-load-prompt.png" alt="Claude.ai using two tools to generate a systems model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai using two tools to generate a systems model&lt;/p&gt;
&lt;p&gt;After creating the output from a systems model, we still
need to explore that data. In a Jupyter Notebook, we would
render different cuts of the data as static charts. In Claude.ai, we can
ask it to generate an Artifact to explore this dataset via:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Create an artifact to render the output of the last
systems model run in a chart and allow me to select
which columns to include.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will generate an artifact along these lines, allowing you
to select which columns to render.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/sys-mcp-load-artifact.png" alt="Claude.ai Artifact for exploring system model results"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai Artifact for exploring system model results&lt;/p&gt;
&lt;p&gt;At this point, we&amp;rsquo;ve created something that is more powerful than a Jupyter Notebook&amp;ndash;because it
will generate the systems model for you&amp;ndash;and also allows the same sort of exploratory
scenarios. If we apply a more critical eye, we can correctly observe that the Artifact only
included five rounds rather than all fifty. We &lt;em&gt;can&lt;/em&gt; fix it by asking:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Include all 50 rounds in the artifact, not just 5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, we really shouldn&amp;rsquo;t need to ask. So it&amp;rsquo;s not quite perfect, but you can make it work fairly well.
It&amp;rsquo;s also easy to imagine a near future where tools and Artifacts are better integrated, allowing
the LLM to pass tool output directly to an Artifact rather than connecting them indirectly
by replicating the data.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this chapter, we started with a complex problem, and broke it down into components that an LLM can solve.
We used documentation to prime the LLM to write correct syntax. We loaded a Model Context Protocol server
to push math-heavy work into software that knows how to quickly and accurately perform the math.
We used a Claude Artifact to not just chart the resulting data, but to explore that rendered data.&lt;/p&gt;
&lt;p&gt;This combination of approaches not only solved the particular problem of generating and running systems models,
it&amp;rsquo;s also a generalizable approach to making messy problems fit with LLM-driven approaches.
What worked here will work elsewhere, including&amp;ndash;we hope&amp;ndash;for helping us generate Wardley maps
in the next chapter.&lt;/p&gt;</description></item><item><title>AI Companion / Reviewing and Editing</title><link>https://craftingengstrategy.com/aic/reviewing-strategy-with-llm/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/reviewing-strategy-with-llm/</guid><description>&lt;p&gt;In the last chapter, we &lt;a href="https://craftingengstrategy.com/aic/cowriting-with-llm/"&gt;co-wrote a strategy with an LLM&lt;/a&gt;.
Now, we&amp;rsquo;re going to review that strategy, looking for areas that we can improve.
Ideally, every organization would have someone ready to provide feedback on your
documents quickly, but that&amp;rsquo;s often not the case: reviewers may be busy or
simply not exist. In those cases, these techniques are a useful stand-in.&lt;/p&gt;
&lt;p&gt;In this chapter we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identifying weaknesses in a strategy document using an LLM&lt;/li&gt;
&lt;li&gt;Using an LLM to summarize and narrow feedback to provide to the strategy&amp;rsquo;s author&lt;/li&gt;
&lt;li&gt;Advising an LLM on how to address raised feedback
to rewrite an existing strategy based on your evaluation
of the flagged concerns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By the chapter&amp;rsquo;s end, you&amp;rsquo;ll have identified issues in the last chapter&amp;rsquo;s strategy,
communicated those issues concisely, and rewritten the strategy to address that feedback.&lt;/p&gt;
&lt;h1 id="skimming-for-flaws"&gt;Skimming for flaws&lt;/h1&gt;
&lt;p&gt;Taking the &lt;a href="https://gist.github.com/lethain/754244b8825c5d5e34b6bf7d1d019b7b"&gt;text of the strategy&lt;/a&gt;,
we&amp;rsquo;re going to look for gaps using a reasoning model,
such as OpenAI&amp;rsquo;s o3 or Anthropic&amp;rsquo;s Claude Opus 4 with extended thinking, and this prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Look for reasoning errors and unsubstantiated claims in this strategy:
{text of strategy document}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running in the project we set up in &lt;a href="https://craftingengstrategy.com/aic/foundations-of-collaboration/"&gt;Foundations of collaboration&lt;/a&gt;,
this prompt identified a number of reasoning errors, unsubstantiated claims, and missing elements.
Let&amp;rsquo;s review each category of identified issues.&lt;/p&gt;
&lt;p&gt;First, it found a number of reasoning errors. These are errors in the thinking underpinning the strategy,
ranging from lack of causal analysis (for example, do the diagnoses actually tie to the policies?)
to survivorship bias in the exploration (for example, what about examples where the
policies went poorly, in addition to those where they succeeded?).&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-reasoning-errors.png" alt="Reasoning errors identified by Claude Opus 4"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Reasoning errors identified by Claude Opus 4&lt;/p&gt;
&lt;p&gt;Next, it identified a number of unsubstantiated claims. These are a mix of issues in the diagnosis,
such as the lack of evidence that senior engineers are spending significant time in architectural debates,
and arbitrary goals in suggested policies. Both categories are valid issues that ought to be addressed
in an effective strategy document.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-unsubstantiated.png" alt="Unsubstantiated claims flagged by Claude Opus 4"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Unsubstantiated claims flagged by Claude Opus 4&lt;/p&gt;
&lt;p&gt;Most valuably, it also identified a series of entirely missing elements.
Chief among them was the lack of a strategy testing plan,
which I would agree is probably the most important gap in the entire strategy.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-missing-elements.png" alt="Missing elements detected by Claude Opus 4"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Missing elements detected by Claude Opus 4&lt;/p&gt;
&lt;p&gt;The feedback here is genuinely very useful, and identifies a number of places where
we definitely should improve the initial writeup. The biggest weakness in this feedback
is that there&amp;rsquo;s simply too much of it. Fortunately, we can whittle down
the volume to what&amp;rsquo;s most useful to the author.&lt;/p&gt;
&lt;h2 id="summarizing-feedback"&gt;Summarizing feedback&lt;/h2&gt;
&lt;p&gt;Imagine for a moment that someone else wrote this strategy,
and we want to give them feedback on it. We wouldn&amp;rsquo;t want to pass along all of this
feedback; instead, we&amp;rsquo;d want to identify the top 3-5 issues and suggest
how they might be addressed.&lt;/p&gt;
&lt;p&gt;Reading through the above feedback, the three biggest issues from my perspective are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Identify what we can measure to determine if the AAG becomes a bottleneck (e.g. time from requested review to review being performed)&lt;/li&gt;
&lt;li&gt;Add a 4 week pilot phase for the SRE organization, which we&amp;rsquo;ll use to evaluate and refine
expanding this to all of Engineering&lt;/li&gt;
&lt;li&gt;Eliminate all arbitrary success metrics &lt;em&gt;other&lt;/em&gt; than those related to elapsed time
(e.g. 90% ADR compliance is not a great measure unless we&amp;rsquo;re using that exclusively
to understand why the 10% aren&amp;rsquo;t happening)&lt;/li&gt;
&lt;/ol&gt;
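&lt;p&gt;The first suggestion comes down to simple arithmetic over timestamps. Here is a minimal sketch. It assumes that each review request is recorded with hypothetical &lt;code&gt;requested_at&lt;/code&gt; and &lt;code&gt;reviewed_at&lt;/code&gt; ISO 8601 timestamps, and it summarizes the elapsed time from request to review. The sample data is invented.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: is the AAG becoming a bottleneck?
# Each review has ISO 8601 timestamps for when it was requested and
# when the AAG performed it. Field names and sample data are invented.
from datetime import datetime
from statistics import median


def review_hours(requested_at, reviewed_at):
    # Elapsed time between the request and the completed review, in hours.
    start = datetime.fromisoformat(requested_at)
    end = datetime.fromisoformat(reviewed_at)
    return (end - start).total_seconds() / 3600


def latency_summary(reviews):
    # Median and worst-case latency of completed reviews, plus a count of
    # requests still waiting, since silently dropping those would hide
    # exactly the backlog we want to see.
    done = [review_hours(r['requested_at'], r['reviewed_at'])
            for r in reviews if r.get('reviewed_at')]
    pending = sum(1 for r in reviews if not r.get('reviewed_at'))
    return {
        'median_hours': median(done) if done else None,
        'max_hours': max(done) if done else None,
        'pending': pending,
    }


if __name__ == '__main__':
    sample = [
        {'requested_at': '2025-03-03T09:00', 'reviewed_at': '2025-03-03T15:00'},
        {'requested_at': '2025-03-04T10:00', 'reviewed_at': '2025-03-06T10:00'},
        {'requested_at': '2025-03-05T11:00', 'reviewed_at': None},
    ]
    print(latency_summary(sample))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the sample data, this reports a median of 27 hours, a maximum of 48 hours, and one pending request. Tracking the pending count alongside the median matters, because a growing queue of unanswered requests is the earliest sign of a bottleneck.&lt;/p&gt;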
&lt;p&gt;We can then ask the LLM to summarize our feedback concisely, along with our suggestions
on how the authors might address the feedback.
Continuing the conversation that generated the initial feedback from the LLM,
try a prompt like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Rewrite the above feedback to focus on these three areas of feedback,
eliminate most other feedback unless it seems essential, and make it
as concise as possible:
1. Identify what we can measure to determine if the AAG becomes
a bottleneck (e.g. time from requested review to review
being performed)
2. Add a 4 week pilot phase for the SRE organization, which
we’ll use to evaluate and refine expanding this to all
of Engineering
3. Eliminate all arbitrary success metrics other than those
related to elapsed time (e.g. 90% ADR compliance is not
a great measure unless we’re using that exclusively to
understand why the 10% aren’t happening)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The LLM then generates this focused version of the prior feedback,
&lt;a href="https://gist.github.com/lethain/f741cd06fdec1deca59ed5b501035cbe"&gt;available in Gist&lt;/a&gt;.
This is much more along the lines of what I would provide to a strategy&amp;rsquo;s author
than the voluminous original feedback.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-revised-feedback.png" alt="Revised strategy feedback from LLM"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Revised strategy feedback from LLM&lt;/p&gt;
&lt;p&gt;You might want to rework the tone a bit with another pass,
but generally I think the particulars make sense.
This overall pattern, using the LLM to brainstorm broadly
and then narrowing down to the particulars, is an effective one.&lt;/p&gt;
&lt;h2 id="incorporating-feedback"&gt;Incorporating feedback&lt;/h2&gt;
&lt;p&gt;For this section, we&amp;rsquo;re changing perspectives once again,
returning to the perspective of the strategy&amp;rsquo;s author, who has
just received the above feedback.&lt;/p&gt;
&lt;p&gt;Going to our &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; project,
we&amp;rsquo;ll add both the &lt;a href="https://gist.github.com/lethain/754244b8825c5d5e34b6bf7d1d019b7b"&gt;original strategy document&lt;/a&gt;
and the &lt;a href="https://gist.github.com/lethain/f741cd06fdec1deca59ed5b501035cbe"&gt;feedback&lt;/a&gt; to the chat,
and start with this prompt explaining how to specifically address the feedback:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;I want to incorporate the below feedback into our strategy document
on engineering decision making. Follow this advice on how to
incorporate the feedback:
1. &amp;quot;Measure AAG Bottleneck Risk&amp;quot; -- prioritize addressing this,
we should measure using time feedback is requested in a Slack
channel until feedback is provided by the AAG as measured by
replying with a new document in the Slack thread
2. &amp;quot;Add 4-Week SRE Pilot Phase&amp;quot; -- yes, we should do this,
add this as note below the policy/operations section rather
than making a big deal about it
3. &amp;quot;Replace Arbitrary Success Metrics&amp;quot; -- remove any instances
of arbitrary success metrics, don't try to suggest better ones
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generated an explanation of how the feedback was addressed,
and also &lt;a href="https://gist.github.com/lethain/5cf5be5a34fcd9dd827a916ddb15ca39"&gt;the full revised strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-address-feedback.png" alt="Summary of how the feedback was addressed"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Summary of how the feedback was addressed&lt;/p&gt;
&lt;p&gt;This worked well, and is a good reminder to apply the LLM-as-Intern technique:
frontier LLMs are fairly good at applying your explicitly stated direction,
but they&amp;rsquo;re still missing your good judgment. Be clear, and you&amp;rsquo;ll get the best
possible output.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this chapter, we started with a strategy draft, generated feedback on it,
refined that feedback, and then decided how to incorporate that feedback
into a revised draft of the strategy.
While I wouldn&amp;rsquo;t describe the strategy as perfect, it is
significantly improved from the initial version.
The revision was quick, and we didn&amp;rsquo;t have to wait for anyone else
to get around to providing feedback.&lt;/p&gt;
&lt;p&gt;If you work in an organization with significant senior engineering bandwidth,
then I&amp;rsquo;m certain you could already generate a document of this caliber quickly,
but not this fast. More importantly, I am certain your organization could write
a better strategy than this one, but many organizations simply don&amp;rsquo;t have the
bandwidth or the staff to do that. Maybe you&amp;rsquo;re the only Staff-plus engineer at
your company, or maybe you&amp;rsquo;re trying to iterate on a draft before the other knowledgeable
engineer gets off a busy project. In those cases, I find these techniques surprisingly
useful.&lt;/p&gt;</description></item><item><title>AI Companion / Cowriting</title><link>https://craftingengstrategy.com/aic/cowriting-with-llm/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/cowriting-with-llm/</guid><description>&lt;p&gt;With your environment working, the obvious task to start with is
co-writing a strategy document with an LLM.
A good strategy document is not just readable, but has a clear view of
your current challenge and how to address it.&lt;/p&gt;
&lt;p&gt;This chapter will cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using meta prompting to optimize our initial prompt for writing a strategy&lt;/li&gt;
&lt;li&gt;Incrementally building a strategy document by prompting each step with our meta prompted prompt,
and the outputs of prior steps&lt;/li&gt;
&lt;li&gt;Cleaning up our generated strategy into something well-formatted&lt;/li&gt;
&lt;li&gt;Remembering that the quality of a strategy rests in your &lt;em&gt;thinking&lt;/em&gt;,
and that quickly generating bad reasoning won&amp;rsquo;t solve real problems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By the end of the chapter, we&amp;rsquo;ll have written a reasonably good strategy document
in a surprisingly short period of time.&lt;/p&gt;
&lt;h2 id="developing-our-prompt"&gt;Developing our prompt&lt;/h2&gt;
&lt;p&gt;To co-write a strategy with an LLM, the core of the work is stepping through each of
the &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;five steps of building a strategy&lt;/a&gt;:
exploration, diagnosis, refinement, policy, and operations.
Before we can do that, we need to develop the prompt we&amp;rsquo;ll use
for those sections.&lt;/p&gt;
&lt;p&gt;The simplest prompt we could imagine creating is along the lines of:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;How should I explore a new engineering strategy that answers:
how do we make software architecture decisions in our company?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This prompt will provide an answer, but my experience is that it will be very general
about the &lt;em&gt;process&lt;/em&gt; of exploring a strategy theme. That&amp;rsquo;s useful to some extent,
as it reminds us of the strategy process, but it doesn&amp;rsquo;t really help us co-write our strategy.&lt;/p&gt;
&lt;p&gt;From this basic starting point, we can improve this starter prompt in several ways.
First, we can develop a problem statement that summarizes the problem our strategy needs to solve.
A helpful prompt to create that problem statement is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;I want to write a 2-3 sentence problem statement for
a new engineering strategy. Asking me one question
at a time until you can write a complete statement for me.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will lead to a series of back and forth questions and answers
until the LLM has collected enough information to write its problem
statement.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-cowrite-1.png" alt="Question and answers with Claude.ai to develop a problem statement"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Question and answers with Claude.ai to develop a problem statement&lt;/p&gt;
&lt;p&gt;Excluding the initial prompt asking for the problem statement, I needed to answer three
questions to generate this more helpful problem statement:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Our engineering organization lacks a clear, consistent process for
making software architecture decisions, leading to friction between
engineers who feel excluded from decisions versus those who feel
slowed down by lengthy approval processes. This ambiguity around
decision-making authority—particularly when a few highly opinionated
engineers can effectively overrule others' work—is reducing overall
engineering velocity and creating frustration across the team.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can enhance our initial prompt to include the problem statement instead
of our ad hoc one-line problem statement from above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Problem statement
{text of problem statement}
# Request
How should I explore an engineering strategy to solve the
above problem statement?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will tend to generate more helpful output, for example giving concrete
suggestions about which books to read or what questions to ask peers.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-cowrite-2.png" alt="Example of recommendation exploration for strategy"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Example of recommendation exploration for strategy&lt;/p&gt;
&lt;p&gt;That being said, the generated text could still be more helpful.
Let&amp;rsquo;s start by meta prompting as discussed in &lt;a href="https://craftingengstrategy.com/aic/foundations-of-collaboration/"&gt;Foundations of collaboration&lt;/a&gt;
to improve the output.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-cowrite-3.png" alt="Meta prompting to improve strategy co-writing prompt"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Meta prompting to improve strategy co-writing prompt&lt;/p&gt;
&lt;p&gt;The improved prompt is quite long, so I encourage creating your own or retrieving it
&lt;a href="https://gist.github.com/lethain/38b6b2082339a544681055878c54c4a2"&gt;from this Gist&lt;/a&gt;.
(Here&amp;rsquo;s a version with &lt;a href="https://gist.github.com/lethain/e9b5e7f9fa936d38d2170e92b3c8cb96"&gt;the embedded problem statement&lt;/a&gt;.)
Now that we have an improved prompt, we can get started with the first stage of strategy creation: exploration.&lt;/p&gt;
&lt;h2 id="exploration"&gt;Exploration&lt;/h2&gt;
&lt;p&gt;The first step of exploration is to take &lt;a href="https://gist.github.com/lethain/e9b5e7f9fa936d38d2170e92b3c8cb96"&gt;our prompt&lt;/a&gt;
and update it to specify exploration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{... updated prompt ...}
**Your request:** Write an exploration section to
address the above problem statement.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generates a surprisingly comprehensive summary, whose full text you can
&lt;a href="https://gist.github.com/lethain/733d4f40cce62a86adfe960b65d4802e"&gt;read in this Gist&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-cowrite-4.png" alt="Excerpt from generated strategy exploration section"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Excerpt from generated strategy exploration section&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s important to note that, while the exploration is pretty interesting,
there are a number of areas where the output &lt;em&gt;sounds reasonable&lt;/em&gt; but is,
to the best of my knowledge, not particularly accurate.
For example, I&amp;rsquo;m familiar with the general meaning of Amazon&amp;rsquo;s bar raiser
and two-pizza team concepts, but either these are overloaded terms&amp;ndash;which
might well be the case&amp;ndash;or they are only very abstractly relevant to technical decision making.&lt;/p&gt;
&lt;p&gt;As a result, the next step is taking what was written and editing it down to the parts
that you actually agree with and can vouch for. For parts that I couldn&amp;rsquo;t vouch for,
I spent time doing some quick research to prove or disprove them.&lt;/p&gt;
&lt;p&gt;In my edit, I ended up with about &lt;a href="https://gist.github.com/lethain/d61bb97f137e976ebe0de86bbb85b0ff"&gt;half the original content&lt;/a&gt;,
but the remaining portions are useful.
What I particularly appreciate is that it did a fair amount of synthesis of the different approaches,
and created a reasonably good framing of the options.
When working with an LLM, it&amp;rsquo;s easy to fall into a sunk cost fallacy, where you accept the output
as &lt;em&gt;good enough&lt;/em&gt;, even though you don&amp;rsquo;t think it&amp;rsquo;s that good.
Your defense is maintaining the same quality bar you&amp;rsquo;d impose on a peer
rather than degrading your standard to accept what the LLM has generated.&lt;/p&gt;
&lt;h2 id="diagnosis-onward"&gt;Diagnosis onward&lt;/h2&gt;
&lt;p&gt;With the exploration completed, the next step is to return to our optimized prompt,
additionally including the exploration beneath the problem statement.
That results in &lt;a href="https://gist.github.com/lethain/10a6278bde07f5da66b60592b69ac454"&gt;this prompt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-cowrite-5.png" alt="Diagnosis generated exploration and optimized prompt"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Diagnosis generated exploration and optimized prompt&lt;/p&gt;
&lt;p&gt;Altogether, the &lt;a href="https://gist.github.com/lethain/0b31203afbe693ce810ac5ba243df1c3"&gt;full diagnosis&lt;/a&gt;
is a surprisingly good set of factors that would come up when dealing with this problem.
This shows how far in-context learning from relevant examples goes in shaping better content.
That&amp;rsquo;s not to say the diagnosis is perfect; once again, it requires a meaningful editing pass
to make it accurate to your circumstances rather than the more generalized ones generated by
the LLM.&lt;/p&gt;
&lt;p&gt;From diagnosis onward, the steps remain the same for policy and operations.
Copy the edited contents of the prior step into your growing prompt&amp;ndash;including
the original optimized prompt and all the subsequent sections&amp;ndash;and ask it to
complete the next step.
To speed things up a bit, I &lt;a href="https://gist.github.com/lethain/c8e1c2735dfe58d2ae5f08b97ff5da8e"&gt;prompted it to generate both the policy and operations&lt;/a&gt;
in one step, &lt;a href="https://gist.github.com/lethain/5aebe1862c440888c25d40a7d603169d"&gt;which it did a reasonable job at&lt;/a&gt;.&lt;/p&gt;
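&lt;p&gt;If you repeat this loop often, assembling the prompt is easy to script. Here is a minimal sketch with a hypothetical &lt;code&gt;build_prompt&lt;/code&gt; helper. The heading format and request wording are assumptions, not the exact prompts in the linked Gists. The helper keeps the optimized prompt first, appends each completed section in strategy order, and ends with the request for the next step.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper for the growing prompt: the optimized prompt,
# then each completed (and edited) section in strategy order, then the
# request for the next step. Headings and wording are assumptions.
STEPS = ['exploration', 'diagnosis', 'refinement', 'policy', 'operations']


def build_prompt(optimized_prompt, completed, next_step):
    # completed maps a step name to the edited text you can vouch for.
    parts = [optimized_prompt.strip()]
    for step in STEPS:
        if step in completed:
            parts.append('# ' + step.capitalize() + '\n' + completed[step].strip())
    parts.append('**Your request:** Write the ' + next_step
                 + ' section to address the above problem statement.')
    return '\n\n'.join(parts)


if __name__ == '__main__':
    print(build_prompt(
        'Optimized prompt with the problem statement goes here.',
        {'exploration': 'Edited exploration.', 'diagnosis': 'Edited diagnosis.'},
        'policy and operations',
    ))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pass in only the sections you have already edited, not the raw LLM output. Whatever you include becomes context for the next step, so unvetted text compounds into later sections.&lt;/p&gt;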
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-cowrite-6.png" alt="Policies for improving architectural decision making"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Policies for improving architectural decision making&lt;/p&gt;
&lt;p&gt;Overall, the policy and operational mechanisms are pretty reasonable.
They lean heavily on the sorts of approaches featured in &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;,
which is why I believe that including a book like this one&amp;ndash;from an author whose approach you trust&amp;ndash;is
what makes this technique useful.&lt;/p&gt;
&lt;h2 id="cleaning-it-up"&gt;Cleaning it up&lt;/h2&gt;
&lt;p&gt;At this point, we have all the individual sections of our strategy,
which I&amp;rsquo;ve &lt;a href="https://gist.github.com/lethain/8aa85a70d7fa973af150ed4bff78c72c"&gt;collected into one file&lt;/a&gt;.
Now we need to clean it up a bit.&lt;/p&gt;
&lt;p&gt;To do that, I&amp;rsquo;m attaching the above file to this prompt within our working project:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Clean up this strategy document (structure it properly
as a strategy writeup, make it read smoothly, remove duplication).
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generated &lt;a href="https://gist.github.com/lethain/a1d7c578ef9da0c6a2f9f5beb6b9ed3b"&gt;this output&lt;/a&gt;,
which is remarkably good in my opinion for the amount of time this approach has taken.
It&amp;rsquo;s overall &lt;em&gt;too much&lt;/em&gt;, so I took an editing pass to pare things down
into something that I would actually recommend, which is &lt;a href="https://gist.github.com/lethain/754244b8825c5d5e34b6bf7d1d019b7b"&gt;available for reading&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ve cowritten a valuable strategy with an LLM.
To do this, we&amp;rsquo;ve meta prompted an initial prompt into a more useful tool.
We&amp;rsquo;ve then used that optimized prompt to perform each step, supplementing
the initial prompt with each step as we completed it.
In the end, we used the LLM to clean up our strategy document as well.&lt;/p&gt;
&lt;p&gt;The important thing to recognize is that each step of a strategy builds
on the prior steps. A great exploration creates a powerful diagnosis.
A faulty diagnosis ruins the following policy and operations steps.
Just because an LLM can help you write quickly doesn&amp;rsquo;t mean the
quality is worth using. However, it&amp;rsquo;s a powerful brainstorming
tool, and&amp;ndash;in at least this example&amp;ndash;did a surprisingly good job.&lt;/p&gt;</description></item><item><title>AI Companion / Foundations of Collaboration</title><link>https://craftingengstrategy.com/aic/foundations-of-collaboration/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/foundations-of-collaboration/</guid><description>&lt;p&gt;There are many things that LLMs are not particularly good at,
and generating a great engineering strategy is one of them. If you ask an LLM
to generate your company&amp;rsquo;s engineering strategy, I&amp;rsquo;m quite confident you are
going to be disappointed with what it generates.&lt;/p&gt;
&lt;p&gt;However, I am extremely confident that an LLM is already an excellent companion
to help you develop an effective engineering strategy. Used thoughtfully, an
LLM can generate and analyze systems models, help you in each step of writing,
and point out the areas where your strategy needs reinforcement.
This companion will walk you through the techniques to accomplish each of those things,
starting with this chapter which walks through configuring an environment
to collaborate with &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; and an LLM of your choice.&lt;/p&gt;
&lt;p&gt;In this chapter we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Practices for prompting an LLM effectively, including meta-prompting and in-context learning&lt;/li&gt;
&lt;li&gt;Working with large corpuses, like this book, with an LLM, from Claude projects
to the &lt;a href="https://llm.datasette.io/en/stable/"&gt;&lt;code&gt;llm&lt;/code&gt;&lt;/a&gt; library.&lt;/li&gt;
&lt;li&gt;Providing tools to LLMs via Model Context Protocol (MCP) servers.
These tools extend what the LLM can accomplish by offering new capabilities
such as web search or document retrieval&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After working through these foundations, we&amp;rsquo;ll be ready for the following chapters where
we use &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; as a co-pilot for creating strategies.&lt;/p&gt;
&lt;h2 id="prompting-llms-effectively"&gt;Prompting LLMs effectively&lt;/h2&gt;
&lt;p&gt;If you want thorough coverage of working with LLMs, then I&amp;rsquo;d recommend
Chip Huyen&amp;rsquo;s &lt;a href="https://www.amazon.com/AI-Engineering-Building-Applications-Foundation/dp/1098166302"&gt;AI Engineering: Building Applications with Foundation Models&lt;/a&gt;.
However, most folks are getting started with LLMs by building their intuition chatting with them directly,
which I&amp;rsquo;ve found rather effective.
On the assumption that you&amp;rsquo;ve spent some time using LLMs already,
I will limit myself to emphasizing three particularly valuable techniques:
thinking of your LLM as a newly hired intern, improving prompts via meta prompting,
and using in-context learning (such as the example strategies in &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;)
to improve output.&lt;/p&gt;
&lt;h3 id="llm-as-intern"&gt;LLM-as-Intern&lt;/h3&gt;
&lt;p&gt;The closest reference point most folks have for prompting is using a search engine like Google.
Search indexes are powerful creatures, and you can generally get exactly what you want by typing
in a few uncommon words related to your topic. Seasoned searchers often also name the source
they want to retrieve content from, for example &lt;code&gt;reddit best restaurants sf&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;However, LLMs generally don&amp;rsquo;t perform well with these sorts of terse prompts. For example,
consider two prompts for creating a Python function that adds a description beneath each
image in a Markdown document.&lt;/p&gt;
&lt;p&gt;The first prompt uses the LLM-as-Search mental model, providing a very general prompt
under the impression that you are &lt;em&gt;retrieving&lt;/em&gt; the response from an existing data source:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;I need a Python function to add images descriptions to Markdown.
Description should be in a paragraph tag beneath the image.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;ChatGPT 4o&lt;/code&gt;, this generates a Python script, but not the right Python script.
That is evident from the function definition alone:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def add_image_descriptions(markdown_text, description_func=None):
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I wanted a script that reused the description already present in each Markdown image (its alt text),
but because I made a general query&amp;ndash;as if I were retrieving an existing answer&amp;ndash;I got something else
entirely.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s try the second sort of prompt, this one taking a much more verbose approach, which I would describe as
the LLM-as-Intern mental model. Here, you are providing very detailed instructions, as if you were writing a ticket
for a newly hired intern for their first job writing software:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Write a Python3 function with the definition:
def add_image_descriptions(text: str) -&amp;gt; str:
This function should use Python's re library to extract the
description text for Markdown images, and add it as floating
text beneath the image, for example it should replace this:
![Initial sketch of API](/static/api-simple.png)
With this:
![Initial sketch of API](/static/api-simple.png)
&amp;lt;p&amp;gt;Initial sketch of API&amp;lt;/p&amp;gt;
It should do that for every Markdown image in the text.
It should _not_ do that for any Markdown links that start
with [, only images which start with ![.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second prompt took much longer to write, but it consistently generates &lt;em&gt;exactly&lt;/em&gt; the
outcome that I am looking for. Just as inexperience can lead an intern to make the wrong assumptions when they
take on a new task, LLMs do the same. The solution in both cases is being more explicit.
Even for surprisingly complex tasks, my recurring experience is that LLMs &lt;em&gt;can do it&lt;/em&gt;
as long as I invest more into improving the prompt.&lt;/p&gt;
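&lt;p&gt;For reference, here is one minimal sketch of a function that satisfies the second prompt. It is an illustration rather than the model&amp;rsquo;s actual output, and its regular expression assumes that alt text never contains &lt;code&gt;]&lt;/code&gt; and that image URLs never contain &lt;code&gt;)&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

# Matches Markdown images such as ![alt text](/path.png). Plain links like
# [text](/path) do not match because the pattern requires the leading "!".
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)")


def add_image_descriptions(text: str) -> str:
    """Add each image's alt text as a paragraph on the line beneath the image."""

    def with_description(match: re.Match) -> str:
        alt_text = match.group(1)
        return f"{match.group(0)}\n&amp;lt;p&amp;gt;{alt_text}&amp;lt;/p&amp;gt;"

    return IMAGE_PATTERN.sub(with_description, text)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the pattern requires the leading &lt;code&gt;!&lt;/code&gt;, ordinary links such as &lt;code&gt;[the docs](/docs)&lt;/code&gt; pass through unchanged, which is the distinction the prompt&amp;rsquo;s last two lines spell out.&lt;/p&gt;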
&lt;h3 id="meta-prompting"&gt;Meta-prompting&lt;/h3&gt;
&lt;p&gt;While you should work at improving your prompts yourself,
LLMs can also help through meta prompting.
The idea is to ask the LLM to improve your prompt
rather than to directly follow the prompt&amp;rsquo;s directions.&lt;/p&gt;
&lt;p&gt;For example, &lt;a href="https://cookbook.openai.com/examples/enhance_your_prompts_with_meta_prompting"&gt;OpenAI&amp;rsquo;s guide on meta prompting&lt;/a&gt;
suggests this prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Improve the following prompt to generate a more detailed summary.
Adhere to prompt engineering best practices.
Make sure the structure is clear and intuitive and
contains the type of news, tags and sentiment analysis.
{simple_prompt}
Only return the prompt.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That prompt comes from an example about analyzing news stories.
The most effective meta prompt will vary a bit depending on the sort of project
you are working on. For engineering strategy prompts working with &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;,
here is a meta prompt that I have found effective:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Improve the following prompt to generate a better engineering strategy.
Adhere to prompt engineering best practices.
Make sure the structure is clear and intuitive.
Use strategies from _Crafting Engineering Strategy_
to provide multiple examples of the sort of desired output.
{simple_prompt}
Only return the prompt.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once you have a meta prompt, run it against the LLM of your choice,
replacing &lt;code&gt;{simple_prompt}&lt;/code&gt; with the initial prompt that you want to improve.
For example, here&amp;rsquo;s the meta prompt in a Claude.ai project, which we&amp;rsquo;ll configure
later in this chapter, using the contents of &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;
to improve a prompt for diagnosing a strategy.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-meta-1.png" alt="Meta prompting Claude.ai to improve a prompt"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Meta prompting Claude.ai to improve a prompt&lt;/p&gt;
&lt;p&gt;The meta prompt takes that simple initial prompt and turns it into
a significantly more detailed prompt. Writing out all of those steps
by hand would have taken a while, but it took no time at all with the appropriate
meta prompt.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-meta-2.png" alt="The improved prompt from Claude.ai after meta prompting"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;The improved prompt from Claude.ai after meta prompting&lt;/p&gt;
&lt;p&gt;The meta prompt I developed here is an improvement, but still fairly basic.
From this starting point, you should experiment with your own variations that work best for you.&lt;/p&gt;
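&lt;p&gt;If you would rather apply a meta prompt from a script than paste it into a chat window, the substitution step is small. Here is a minimal sketch; the &lt;code&gt;build_meta_prompt&lt;/code&gt; helper and the sample prompt are hypothetical, and the helper uses &lt;code&gt;str.replace&lt;/code&gt; rather than &lt;code&gt;str.format&lt;/code&gt; so that braces inside your own prompt are left untouched:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# The engineering strategy meta prompt from this section, with
# {simple_prompt} marking where your initial prompt goes.
META_PROMPT = """Improve the following prompt to generate a better engineering strategy.
Adhere to prompt engineering best practices.
Make sure the structure is clear and intuitive.
Use strategies from _Crafting Engineering Strategy_
to provide multiple examples of the sort of desired output.
{simple_prompt}
Only return the prompt."""


def build_meta_prompt(simple_prompt: str, template: str = META_PROMPT) -> str:
    """Return the meta prompt with the placeholder replaced by simple_prompt."""
    # Unlike str.format, str.replace ignores any other braces in the text.
    return template.replace("{simple_prompt}", simple_prompt.strip())


print(build_meta_prompt("Diagnose our strategy for adopting a new database."))
&lt;/code&gt;&lt;/pre&gt;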
&lt;h3 id="in-context-learning"&gt;In-Context Learning&lt;/h3&gt;
&lt;p&gt;The final technique I&amp;rsquo;ll discuss in this section is in-context learning (ICL),
which is essentially providing a number of examples of what you want the
LLM to generate for you. These examples help prime the LLM to understand
exactly the sort of response you want.
Up to a point, the more examples you include, the better the prompt typically performs.
As I&amp;rsquo;ve iterated on deploying LLMs into production systems, I&amp;rsquo;ve found that
ICL solves most problems that meta prompting alone cannot solve.&lt;/p&gt;
&lt;p&gt;You can use this book both to generate ICL examples and to supply those
examples in later prompts that generate your desired output.
With the book in your context window, your prompt to generate examples can be
straightforward, for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Provide 10 examples of diagnoses from Crafting Engineering Strategy's
example strategies.
Format them as a bulleted list of examples.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will then generate a list of appropriate examples of diagnoses.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-icl-1.png" alt="Claude.ai generating sample diagnoses from Crafting Engineering Strategy"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai generating sample diagnoses from Crafting Engineering Strategy&lt;/p&gt;
&lt;p&gt;Once you have those examples, include them in your next prompt to improve
the output it generates.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-icl-2.png" alt="Using generated diagnoses as in-context learning in subsequent prompt"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Using generated diagnoses as in-context learning in subsequent prompt&lt;/p&gt;
&lt;p&gt;Note that the prompt shown here is sparse to improve readability.
In practice, you&amp;rsquo;d likely want to apply meta prompting to improve
this prompt as well.&lt;/p&gt;
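&lt;p&gt;If you script this step, assembling the in-context examples into a prompt is mostly string formatting. The sketch below is one way to do it, assuming you have copied the generated diagnoses into a Python list; the helper name, the framing sentence, and the bulleted layout are illustrative choices rather than anything the book prescribes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def build_icl_prompt(task: str, examples: list[str], label: str = "diagnosis") -> str:
    """Prefix a task with a bulleted list of examples of the desired output."""
    # Skip blank entries so stray empty lines don't become empty bullets.
    bullets = "\n".join(f"- {example.strip()}" for example in examples if example.strip())
    return (
        f"Here are examples of a good {label}:\n"
        f"{bullets}\n\n"
        f"Using these examples as a guide, {task}"
    )


# Hypothetical placeholders; substitute the diagnoses your LLM generated.
examples = [
    "Example diagnosis one.",
    "Example diagnosis two.",
]
print(build_icl_prompt("write a diagnosis for our API deprecation strategy.", examples))
&lt;/code&gt;&lt;/pre&gt;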
&lt;h2 id="working-with-a-book-in-an-llm"&gt;Working with a book in an LLM&lt;/h2&gt;
&lt;p&gt;In order to follow along with this book&amp;rsquo;s examples,
you will need to set up an environment to prompt against the book.
While there are countless options,
here we&amp;rsquo;ll look at three setups:
using Anthropic&amp;rsquo;s Claude.ai Projects, using
OpenAI&amp;rsquo;s ChatGPT Projects, and using the &lt;a href="https://llm.datasette.io/"&gt;&lt;code&gt;llm&lt;/code&gt; library&lt;/a&gt;
with a model of your choosing (albeit restricted to models that
accept at least ~180k tokens in their input window, such as
most modern Gemini or Anthropic models, &lt;code&gt;gpt-4.1-mini&lt;/code&gt;, and many others).&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re indifferent to which approach to use, I encourage using
Claude.ai Projects, as they work best with local Model Context Protocol (MCP) servers,
which are discussed in the next section, but outside of MCP support all options
are quite reasonable. It&amp;rsquo;s also reasonable to expect that MCP support will
be widely available in the other environments in a relatively short timeframe.&lt;/p&gt;
&lt;h3 id="claudeai-projects"&gt;Claude.ai Projects&lt;/h3&gt;
&lt;p&gt;In June 2024, Anthropic&amp;rsquo;s Claude.ai &lt;a href="https://www.anthropic.com/news/projects"&gt;introduced the concept of Projects&lt;/a&gt;,
which allow you to associate files with a project. These files are included in the context window of every chat in the Project,
and that 200k-token window is large enough to hold this book&amp;rsquo;s roughly 170k tokens.&lt;/p&gt;
&lt;p&gt;Get started by creating a Claude.ai account and navigating to the &lt;a href="https://claude.ai/projects"&gt;Projects page&lt;/a&gt;.
Once there, use the &lt;code&gt;New Project&lt;/code&gt; button to create a new project.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-anthrop-1.png" alt="Claude.ai&amp;rsquo;s project list page"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Claude.ai's project list page&lt;/p&gt;
&lt;p&gt;After selecting &lt;code&gt;New Project&lt;/code&gt;, you will be shown a short form
to describe the project. Include a memorable name, e.g. &amp;ldquo;Crafting Eng Strat&amp;rdquo;
and optionally a short description.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-anthrop-2.png" alt="Creating a new project in Claude.ai"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Creating a new project in Claude.ai&lt;/p&gt;
&lt;p&gt;After creating the project, you will be redirected to your new project.
Then you&amp;rsquo;ll want to use the plus button to add a new file to the project.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-anthrop-3.png" alt="The file upload section for a Claude.ai project"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;The file upload section for a Claude.ai project&lt;/p&gt;
&lt;p&gt;Select &amp;ldquo;Upload from device&amp;rdquo; and choose the &lt;code&gt;ces_llm.md&lt;/code&gt; file that you&amp;rsquo;ve already
retrieved following the Preface&amp;rsquo;s instructions.
After doing so, you&amp;rsquo;ll see that about 80% of your project&amp;rsquo;s capacity is taken
up by the book.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-anthrop-4.png" alt="The file upload section after uploading Crafting Engineering Strategy to a project"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;The file upload section after uploading Crafting Engineering Strategy to a project&lt;/p&gt;
&lt;p&gt;At this point, any prompt you run within the project will include
the full context of the book in addition to the prompt that you add.
This allows you to query the book directly.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-anthrop-5.png" alt="Prompting against a project with Crafting Engineering Strategy in the context window"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Prompting against a project with Crafting Engineering Strategy in the context window&lt;/p&gt;
&lt;p&gt;You are now ready to use this project
to interact with the book as explored in subsequent chapters.&lt;/p&gt;
&lt;p&gt;As a final note, it&amp;rsquo;s valuable to recognize that these projects are &lt;em&gt;not&lt;/em&gt;
performing Retrieval Augmented Generation (RAG), a technique which often uses
search algorithms to select a subset of a corpus to include with your prompt.
Instead, the project includes the entire book. Anything in the
book ought to be accessible, even sections that might not semantically
relate to your initial prompt.&lt;/p&gt;
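&lt;p&gt;To make the distinction concrete, here is a toy sketch of the retrieval step that RAG adds and that these projects skip. It is deliberately simplistic: real RAG systems typically rank chunks by embedding similarity, whereas this one counts shared words, and the chunk size and &lt;code&gt;top_k&lt;/code&gt; defaults are arbitrary assumptions. Any chunk that does not score well never reaches the model, which is exactly the limitation that including the whole book avoids:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re


def words(text: str) -> set[str]:
    """Lowercased set of words, used as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(corpus: str, query: str, chunk_words: int = 200, top_k: int = 3) -> list[str]:
    """Toy RAG retrieval: split the corpus into chunks, keep the top_k by word overlap."""
    tokens = corpus.split()
    chunks = [" ".join(tokens[i:i + chunk_words]) for i in range(0, len(tokens), chunk_words)]
    query_words = words(query)
    # sorted() is stable, so equally scored chunks keep their original order.
    ranked = sorted(
        chunks,
        key=lambda chunk: len(words(chunk).intersection(query_words)),
        reverse=True,
    )
    return ranked[:top_k]
&lt;/code&gt;&lt;/pre&gt;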
&lt;h3 id="chatgpt-projects"&gt;ChatGPT Projects&lt;/h3&gt;
&lt;p&gt;Much like Claude.ai, ChatGPT has also &lt;a href="https://help.openai.com/en/articles/10169521-using-projects-in-chatgpt"&gt;introduced a Projects&lt;/a&gt;
feature to allow interacting with a corpus of resources. It functions quite similarly to
Claude.ai&amp;rsquo;s projects.&lt;/p&gt;
&lt;p&gt;Get started by going to &lt;a href="https://chatgpt.com/"&gt;ChatGPT&lt;/a&gt; and logging in.
Note that at the time of writing, ChatGPT projects are available with any paid plan, but not on the free plan.
After logging in, click the &lt;code&gt;New project&lt;/code&gt; button.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-openai-1.png" alt="Creating a new project in ChatGPT"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Creating a new project in ChatGPT&lt;/p&gt;
&lt;p&gt;After clicking the &amp;ldquo;New project&amp;rdquo; button, you&amp;rsquo;ll be presented with a form
to provide a name for your project. Although you can use whatever name you
prefer, for this example I used &lt;code&gt;Crafting Eng Strategy&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-openai-2.png" alt="Naming a new ChatGPT Project"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Naming a new ChatGPT Project&lt;/p&gt;
&lt;p&gt;After you click the &lt;code&gt;Create project&lt;/code&gt; button, you&amp;rsquo;ll be automatically redirected
to the new project&amp;rsquo;s page. On that page, click the &lt;code&gt;Add files&lt;/code&gt; button to upload
a copy of the book.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-openai-3.png" alt="Project page for new &amp;ldquo;Crafting Eng Strategy&amp;rdquo; project"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Project page for new "Crafting Eng Strategy" project&lt;/p&gt;
&lt;p&gt;After clicking &lt;code&gt;Add files&lt;/code&gt;, you&amp;rsquo;ll land on the &lt;code&gt;Project files&lt;/code&gt; page which lists all
uploaded files. When there, you can use the &lt;code&gt;Add files&lt;/code&gt; button or simply drag
your copy of &lt;code&gt;ces_llm.md&lt;/code&gt; into the project. (As a reminder, downloading &lt;code&gt;ces_llm.md&lt;/code&gt;
is covered in the &lt;a href="ces-ai-preface"&gt;Preface&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-openai-4.png" alt="Project files with a warning about file size"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Project files with a warning about file size&lt;/p&gt;
&lt;p&gt;After uploading the file, you will get a warning that the large file may impact responses.
This is a real concern. First, OpenAI models have context window limits described in &lt;a href="https://platform.openai.com/docs/models"&gt;their documentation&lt;/a&gt;.
Second, OpenAI has &lt;a href="https://platform.openai.com/settings/organization/limits"&gt;rate limits&lt;/a&gt;
for some models, which may restrict which models work for such a large project.
You&amp;rsquo;ll be fine working with either &lt;a href="https://platform.openai.com/docs/models/gpt-4.1-mini"&gt;GPT-4.1 mini&lt;/a&gt;
or &lt;a href="https://platform.openai.com/docs/models/gpt-4o-mini"&gt;GPT-4o mini&lt;/a&gt;.
Many other models will work as well, but some models might not.&lt;/p&gt;
&lt;p&gt;At this point, you&amp;rsquo;re able to query the project with the entire uploaded book in the
context window. (Note that this prompt was run using &lt;code&gt;GPT-4.1 mini&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-found-openai-5.png" alt="Prompting a ChatGPT Project with Crafting Engineering Strategy in context window"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Prompting a ChatGPT Project with Crafting Engineering Strategy in context window&lt;/p&gt;
&lt;p&gt;Your ChatGPT project is now fully configured and ready to query.&lt;/p&gt;
&lt;h3 id="llmpy"&gt;llm.py&lt;/h3&gt;
&lt;p&gt;Now that we&amp;rsquo;ve used the two UX-driven project interfaces,
we can also configure a project to use the &lt;a href="https://llm.datasette.io/"&gt;&lt;code&gt;llm&lt;/code&gt;&lt;/a&gt;
library via the command line as a third configuration option.
While the examples here specifically use OpenAI&amp;rsquo;s APIs, they would
work equally well with Anthropic, Gemini, or a self-hosted model.
You would just need to install the relevant &lt;code&gt;llm&lt;/code&gt; plugin and specify slightly different
parameters for the model and API keys on the command line.&lt;/p&gt;
&lt;p&gt;Start by downloading your copy of the LLM-optimized
&lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; as discussed in &lt;a href="ces-ai-preface"&gt;the Preface&lt;/a&gt;.
These instructions will assume you&amp;rsquo;ve moved the copy into a file in the current
directory named &lt;code&gt;ces_llm.md&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then follow the &lt;code&gt;llm&lt;/code&gt; library &lt;a href="https://llm.datasette.io/en/stable/setup.html"&gt;setup instructions&lt;/a&gt;.
I&amp;rsquo;ll rely on &lt;a href="https://docs.astral.sh/uv/getting-started/installation/"&gt;uv&lt;/a&gt; to manage installation,
but you&amp;rsquo;re welcome to use &lt;code&gt;pip&lt;/code&gt; or any other Python package manager.
Then set your OpenAI API key:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx llm keys set openai
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then run a short prompt to verify your setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvx llm -m gpt-4.1-mini \
'what are the steps to write an engineering strategy?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Assuming that works, now you&amp;rsquo;re ready to work with the book as well:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat ./ces_llm.md | \
uvx llm -m gpt-4.1-mini \
'what are the steps to write an engineering strategy?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This includes the entire book ahead of your prompt,
and should produce a noticeably different response than the previous
prompt without the book included.&lt;/p&gt;
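&lt;p&gt;The &lt;code&gt;llm&lt;/code&gt; library can also be used from Python, which is handy once you start scripting the prompts from earlier in this chapter. A minimal sketch follows, assuming you have stored your key with &lt;code&gt;llm keys set openai&lt;/code&gt; and installed the library into the environment running the script (for example with &lt;code&gt;pip install llm&lt;/code&gt;, since &lt;code&gt;uvx&lt;/code&gt; runs it in a temporary environment). The helper names are hypothetical, and joining the book and question with a blank line approximates, rather than exactly reproduces, how the command line combines piped input with a prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pathlib import Path


def build_book_prompt(book_text: str, question: str) -> str:
    """Put the whole book ahead of the question, much like piping it into llm."""
    return f"{book_text.rstrip()}\n\n{question}"


def ask_book(question: str, book_path: str = "ces_llm.md", model_id: str = "gpt-4.1-mini") -> str:
    """Send the book plus a question to a model via the llm library."""
    import llm  # imported here so build_book_prompt works without llm installed

    book_text = Path(book_path).read_text(encoding="utf-8")
    model = llm.get_model(model_id)
    response = model.prompt(build_book_prompt(book_text, question))
    return response.text()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With that in place, &lt;code&gt;print(ask_book("what are the steps to write an engineering strategy?"))&lt;/code&gt; should behave much like the shell pipeline above.&lt;/p&gt;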
&lt;h2 id="using-tools-via-mcp"&gt;Using tools via MCP&lt;/h2&gt;
&lt;p&gt;In the rare scenarios where meta prompting and in-context learning are insufficient to
make a prompt work effectively, the final technique is providing purpose-built tools
to your LLM that allow the LLM to do what it does best&amp;ndash;manipulate textual representations&amp;ndash;and
push other sorts of complexity into the tool itself, especially tasks that require complex math or image rendering.&lt;/p&gt;
&lt;p&gt;Two chapters in this companion dig into &lt;a href="https://craftingengstrategy.com/aic/generate-systems-model-with-llm/"&gt;generating systems models with LLMs&lt;/a&gt;
and &lt;a href="https://craftingengstrategy.com/aic/generate-wardley-map-with-llm/"&gt;generating Wardley maps with LLMs&lt;/a&gt;. Both include approaches that
rely on using &lt;a href="https://modelcontextprotocol.io/"&gt;Model Context Protocol&lt;/a&gt; servers to access additional tools.
Although all major LLM providers &lt;em&gt;intend&lt;/em&gt; to support MCP servers in the future, as things stand today
&lt;a href="https://claude.ai/download"&gt;Claude Desktop&lt;/a&gt; is the only tool that provides straightforward support
for allowing your project to interact with custom MCP servers.&lt;/p&gt;
&lt;p&gt;The instructions at &lt;a href="https://modelcontextprotocol.io/quickstart/user"&gt;&lt;code&gt;modelcontextprotocol.io/quickstart/user&lt;/code&gt;&lt;/a&gt;
are the best available for setting up your local Claude Desktop to use custom tools.
You can see similar instructions in the &lt;a href="https://github.com/lethain/systems-mcp"&gt;&lt;code&gt;lethain/systems-mcp&lt;/code&gt;&lt;/a&gt; repository,
which is used in this companion&amp;rsquo;s chapter on generating systems models.&lt;/p&gt;
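&lt;p&gt;To give a sense of what a custom tool involves, here is a minimal sketch of a local MCP server written with the official MCP Python SDK (installed with &lt;code&gt;pip install mcp&lt;/code&gt;). The &lt;code&gt;simulate_stock&lt;/code&gt; tool is a hypothetical example, far simpler than the systems modeling tooling used later in this companion, but it shows the division of labor described at the start of this section: the LLM decides what to ask for, and the tool does the arithmetic. Registering a server with Claude Desktop is covered by the quickstart linked above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def simulate_stock(initial: float, inflow: float, outflow: float, rounds: int) -> list[float]:
    """Simulate a single stock with constant inflow and outflow.

    Returns the stock level at the start and after each round; the stock
    never drops below zero.
    """
    levels = [initial]
    for _ in range(rounds):
        levels.append(max(0.0, levels[-1] + inflow - outflow))
    return levels


def serve() -> None:
    """Expose simulate_stock as an MCP tool over stdio."""
    from mcp.server.fastmcp import FastMCP  # requires the `mcp` package

    server = FastMCP("stock-simulator")
    server.tool()(simulate_stock)  # register the plain function as a tool
    server.run()  # defaults to the stdio transport
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The script Claude Desktop launches would simply call &lt;code&gt;serve()&lt;/code&gt;; keeping &lt;code&gt;simulate_stock&lt;/code&gt; as a plain function also means you can test the tool&amp;rsquo;s logic without running a server.&lt;/p&gt;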
&lt;p&gt;If you are wholly uninterested in running Claude Desktop, but &lt;em&gt;do&lt;/em&gt; want to experiment with MCP servers,
one other option to consider is the &lt;a href="https://openai.github.io/openai-agents-python/mcp/"&gt;OpenAI Agents SDK&lt;/a&gt;.
This will require writing custom code, but it is exactly the sort of code that an LLM writes effectively if
you include the relevant documentation in your prompt.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ve worked through three foundational prompting techniques to
make your prompts more effective: LLM-as-Intern, meta prompting, and in-context learning.
We&amp;rsquo;ve also set up three viable environments for working with large context
windows using the LLM-optimized edition of &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;, and looked at how MCP servers can give your LLM purpose-built tools.&lt;/p&gt;
&lt;p&gt;With a configured environment, you&amp;rsquo;re now ready to move on to improving strategy
with your LLM copilot.&lt;/p&gt;</description></item><item><title>AI Companion / Preface</title><link>https://craftingengstrategy.com/aic/preface/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aic/preface/</guid><description>&lt;p&gt;As someone who writes books, I am continually grateful to the fact that there are
people who buy a book, and then actually read that book.
It&amp;rsquo;s equally clear to me that there&amp;rsquo;s a second, large group who like the &lt;em&gt;idea&lt;/em&gt; of books,
but would much prefer a project-based approach to learning the book&amp;rsquo;s content.&lt;/p&gt;
&lt;p&gt;As the technology sector applies Large Language Models (LLMs) to &lt;em&gt;everything&lt;/em&gt;,
one valid reaction for authors is concern. That concern is merited: it &lt;em&gt;is&lt;/em&gt; possible that fewer
people will be reading books a decade from now than do today.
However, I also think that LLMs are a powerful tool for supporting project-based leaders.&lt;/p&gt;
&lt;p&gt;This companion to &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; aims to do exactly that,
showcasing a number of ways you can actively engage with this book as your guide and
co-pilot in creating engineering strategy.&lt;/p&gt;
&lt;h2 id="accessing-the-llm-optimized-edition"&gt;Accessing the LLM-optimized edition&lt;/h2&gt;
&lt;p&gt;You can download the LLM-optimized edition of this book at &lt;code&gt;only-in-the-final-version&lt;/code&gt;.
That URL is also accessible via this QR code:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/ces-llm.png" alt="Use this QR code to download the LLM-optimized edition"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Use this QR code to download the LLM-optimized edition&lt;/p&gt;
&lt;p&gt;Once you have downloaded your copy, instructions for loading and utilizing
it are covered in the &lt;a href="https://craftingengstrategy.com/aic/foundations-of-collaboration/"&gt;Foundations of Collaboration&lt;/a&gt; chapter.&lt;/p&gt;
&lt;h2 id="what-this-book-is-not"&gt;What This Book is Not&lt;/h2&gt;
&lt;p&gt;This is the companion to &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;, with an LLM-optimized version of the book,
and instructions on how to use that LLM-optimized version. This is not the human-optimized digital
version of &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt;, and it is certainly not a physical copy.
If you want to read the book directly, you should buy one of those instead.&lt;/p&gt;
&lt;h2 id="navigating-this-book"&gt;Navigating This Book&lt;/h2&gt;
&lt;p&gt;While &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; is several hundred pages long and has different
ways you might want to approach it, this companion is much shorter and is intended to be
read from front to back. It covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Part 1: Foundations of Collaboration&lt;/strong&gt; covers concrete approaches to querying
this book with an LLM, and how to use those approaches effectively&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 2: Co-writing Strategy with an LLM&lt;/strong&gt; shows how to write a new engineering strategy
with an LLM as a co-pilot&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 3: Reviewing an existing strategy&lt;/strong&gt; explains approaches to generating feedback on an existing strategy
with an LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 4: Generating Systems Models with an LLM&lt;/strong&gt; includes instructions for generating systems models using an LLM
to absorb much of the complexity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 5: Generating Wardley Maps with an LLM&lt;/strong&gt; provides details on generating a Wardley map with an LLM
to simplify the process&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 6: Next steps to using this book&lt;/strong&gt; suggests additional ways you can experiment
with using the LLM-optimized version of this book&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&amp;rsquo;s absolutely not necessary to read &lt;em&gt;Crafting Engineering Strategy&lt;/em&gt; before reading the
companion, but if you are particularly passionate about strategy, you are likely to get
more value from both books if you eventually read through the human-optimized version as well.&lt;/p&gt;</description></item><item><title>AI Companion for Crafting Engineering Strategy</title><link>https://craftingengstrategy.com/aicompanion/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/aicompanion/</guid><description/></item><item><title>Preface</title><link>https://craftingengstrategy.com/preface/</link><pubDate>Sat, 15 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/preface/</guid><description>&lt;p&gt;In 2015, the Mini Sky City skyscraper, with 57 floors, was built in Changsha, China, in 19 days. Driving to work over the past few years, I’ve watched a nine-story building in San Francisco get built over three years. There’s some argument that Mini Sky City’s record isn’t legitimate because it relied heavily on modular, pre-built architecture, but I can assure you that the three-years-and-counting building in San Francisco is similarly being built from modular components.
Why did one of these projects build three floors per day, and the other three floors per year?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/How-Big-Things-Get-Done-ebook/dp/B0B3HS4C98/"&gt;How Big Things Get Done&lt;/a&gt;&lt;/em&gt; by Bent Flyvbjerg and Dan Gardner explores how strategy impacts the successful creation of complex buildings, and their foundational observation is that you go fast by making most of your mistakes where it’s cheapest–for example, in simulation–and fewer where it’s difficult to fix–for example, after you’ve built most of a physical building.
In my experience, their observation applies equally well to software engineering strategy.&lt;/p&gt;
&lt;p&gt;However, the problem in software engineering goes further.
You&amp;rsquo;ll never meet an architect who hasn&amp;rsquo;t seen a building plan,
but the majority of software engineers and even software executives will tell you that they’ve never seen a clear, written engineering strategy.
There’s a widespread belief that engineering strategy doesn’t exist, but if you ask the right questions, you’ll find that almost every engineer has a strong instinctive understanding of their current company’s engineering strategy.
Even if that strategy isn&amp;rsquo;t particularly good, they&amp;rsquo;ll know what it is.&lt;/p&gt;
&lt;p&gt;This book wants to reshape the conversation around software engineering strategy in two ways. First, I hope to establish a sufficiently clear, shared definition of engineering strategy so that we agree on what we’re talking about.
With that definition, we can start examining how to improve our strategies, rather than debating whether they exist.
My second goal is to make it easier for all of us to take up the pen to write down our companies’ engineering strategies.
If this book is particularly successful, a few years from now the ideas in this book will be obsolete through their own ubiquity.
They’ll be so obvious that they’re not worth discussing&amp;ndash;that would be a triumph.&lt;/p&gt;
&lt;p&gt;Strategy is often viewed as the dominion of Staff-plus engineers and executives. I hope those folks think a lot about strategy, but this book believes that strategy is applicable–and improvable–by everyone within an engineering organization. If you work within an engineering organization, or even adjacent to an engineering organization, then this book wants to help you understand and improve on your company’s engineering strategy. Certainly, different roles require different approaches, but &lt;em&gt;you&lt;/em&gt; can contribute to improvement.&lt;/p&gt;
&lt;p&gt;Finally, I believe a bit of rigor in our thinking can change our lives, our colleagues&amp;rsquo; lives, and the lives of the people who
use the software that we create.
Engineering organizations today routinely waste dozens or hundreds of years of their teams’ lives by refusing to engage
with the reality of their problems.
Far from an abstract and aspirational endeavor, strategy is the bare minimum we owe ourselves, our colleagues and our users
to invest our scarce time wisely.&lt;/p&gt;
&lt;h2 id="what-this-book-is-not"&gt;What This Book is Not&lt;/h2&gt;
&lt;p&gt;This book is intended to be widely accessible, particularly so for
anyone working in, or adjacent to, software engineering.
However, this book certainly won&amp;rsquo;t be everything to everyone,
and I want to acknowledge some of its limitations.&lt;/p&gt;
&lt;p&gt;First, the examples in this book are rooted in my personal experiences.
I&amp;rsquo;ve done many things in my career, from starting a small iOS gaming startup in 2008,
to growing Calm&amp;rsquo;s engineering organization, to contributing to
Stripe and Uber&amp;rsquo;s periods of rapid growth. Of course, my experience has its gaps.
I worked at Yahoo! when it was quite large with a massive codebase, but I was there in a junior role,
and I consequently have an incomplete view of working at a company of that size.
I&amp;rsquo;ve also never worked in government. Indeed, the list of things I haven&amp;rsquo;t done is endless.&lt;/p&gt;
&lt;p&gt;Second, this book is an opinionated introduction to engineering strategy,
and is intended to serve as your first introduction to that hopelessly broad topic.
If you&amp;rsquo;re looking for a more general book on strategy, particularly
if you don&amp;rsquo;t work in a software engineering adjacent field, I&amp;rsquo;d probably suggest you instead start with
Rumelt&amp;rsquo;s &lt;em&gt;&lt;a href="https://www.amazon.com/Good-Strategy-Bad-Difference-Matters/dp/0307886239"&gt;Good Strategy, Bad Strategy&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Finally, this book touches on software architecture a number of times, as software architecture
is a common topic within engineering strategies. However, it is not a book about software architecture.
Where one strategy will focus on software architecture, the next will focus just as heavily on managerial
mechanisms like approving headcount backfills.
For a book on architecture, I might suggest
&lt;em&gt;&lt;a href="https://www.amazon.com/Fundamentals-Software-Architecture-Comprehensive-Characteristics/dp/1492043451"&gt;Fundamentals of Software Architecture: An Engineering Approach&lt;/a&gt;&lt;/em&gt;
by Mark Richards and Neal Ford.&lt;/p&gt;
&lt;h2 id="navigating-this-book"&gt;Navigating This Book&lt;/h2&gt;
&lt;p&gt;The default way to read this book is to start at the beginning, and read through to the end.
If you do that, you&amp;rsquo;ll work through the book&amp;rsquo;s five parts in order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Part 1: Introducing engineering strategy&lt;/strong&gt; introduces this book&amp;rsquo;s overall thesis&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 2: Steps for building engineering strategies&lt;/strong&gt; provides step-by-step instructions
for crafting, implementing, and operating an engineering strategy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 3: Refine strategy: test, model &amp;amp; map&lt;/strong&gt; goes into further detail on the topic of strategy refinement,
which I believe is the most valuable and most neglected element of engineering strategy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 4: Strategy case studies&lt;/strong&gt; provides ten concrete engineering strategies, all of which are
based on real work I&amp;rsquo;ve done in my career (although some are lightly anonymized)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 5: Going forward&lt;/strong&gt; wraps up the book with advice for evaluating strategy, and improving your own strategy work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want a more focused dive into the book, I&amp;rsquo;d encourage you to start by reading the case studies,
and then selectively reading the chapters to understand any parts that you find interesting or surprising.
Reading this way will mean that you&amp;rsquo;re left without some of the relevant definitions,
but realistically most strategy readers are working without those definitions as well,
so hopefully it&amp;rsquo;s still a coherent read.&lt;/p&gt;
&lt;h2 id="acknowledgments"&gt;Acknowledgments&lt;/h2&gt;
&lt;p&gt;Each book is the culmination of the prior writing I have done, the people I have had the chance to collaborate with,
and the work itself that has educated me despite my best efforts.
Thank you to each person who has helped me with this book itself, and the work it stands on.&lt;/p&gt;
&lt;p&gt;Above all else, I owe endless thanks to my wife, Laurel, and my son, Emerson.
Thank you both for being part of this book&amp;rsquo;s journey, and the much
larger journey where this book is only a very small chapter.&lt;/p&gt;</description></item><item><title>Setting policy</title><link>https://craftingengstrategy.com/policy/</link><pubDate>Thu, 13 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/policy/</guid><description>&lt;p&gt;This book&amp;rsquo;s introduction started by defining strategy as &amp;ldquo;making decisions.&amp;rdquo;
Then we dug into &lt;a href="https://craftingengstrategy.com/explore/"&gt;exploration&lt;/a&gt;,
&lt;a href="https://lethain.com/diagnosis-for-strategy"&gt;diagnosis&lt;/a&gt;, and
&lt;a href="https://craftingengstrategy.com/refine/"&gt;refinement&lt;/a&gt;.
Those are three chapters where you could argue that we didn&amp;rsquo;t decide anything at all.
Clarifying the problem to be solved is the prerequisite for effective decision-making, but eventually decisions do have to be made.
Here in this chapter on policy, and the &lt;a href="https://craftingengstrategy.com/operations/"&gt;following chapter on operations&lt;/a&gt;, we finally start making some decisions.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll dig into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How we define policy, and how setting policy differs from operating policy as discussed
in the next chapter&lt;/li&gt;
&lt;li&gt;The structured steps for setting policy&lt;/li&gt;
&lt;li&gt;How many policies should you set? Is it preferable to have one policy, many policies,
or does it not matter much either way?&lt;/li&gt;
&lt;li&gt;Recurring kinds of policies that appear frequently in strategies&lt;/li&gt;
&lt;li&gt;Why it&amp;rsquo;s valuable to be intentional about your strategy&amp;rsquo;s altitude,
and how engineers and executives generally maintain
different altitudes in their strategies&lt;/li&gt;
&lt;li&gt;Criteria to use for evaluating whether your policies are likely to be impactful&lt;/li&gt;
&lt;li&gt;How to develop novel policies, and why doing so is rare&lt;/li&gt;
&lt;li&gt;Why having multiple bundles of alternative policies is generally
a phase in strategy development that
indicates a gap in your diagnosis&lt;/li&gt;
&lt;li&gt;How policies that ignore constraints sound inspirational,
but accomplish little&lt;/li&gt;
&lt;li&gt;Dealing with ambiguity and uncertainty created by missing strategies
from cross-functional stakeholders&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By the end, you&amp;rsquo;ll be ready to evaluate why an existing strategy&amp;rsquo;s policies are struggling
to make an impact, and to start iterating on policies for a strategy of your own.&lt;/p&gt;
&lt;h2 id="what-is-policy"&gt;What is policy?&lt;/h2&gt;
&lt;p&gt;Policy translates your &lt;a href="https://craftingengstrategy.com/diagnosis/"&gt;diagnosis&lt;/a&gt; into a concrete plan.
That plan will be a collection of decisions, tradeoffs, and approaches.
They&amp;rsquo;ll range from coding practices, to hiring mandates, to architectural decisions,
to guidance about how choices are made within your organization.&lt;/p&gt;
&lt;p&gt;An effective policy solves the entirety of the strategy&amp;rsquo;s diagnosis,
although the diagnosis itself may explicitly specify which aspects can be ignored.
For example, the &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;strategy for working with private equity ownership&lt;/a&gt;
acknowledges in its diagnosis that there&amp;rsquo;s no clear guidance on what kind of reduction to expect:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&amp;amp;D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Faced with that uncertainty, the policy simply acknowledges the
ambiguity and commits to reconsider when more information becomes available:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We believe our new ownership will provide a specific target for Research and Development (R&amp;amp;D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are two frequent points of confusion when creating policies
that are worth addressing directly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Policy is a subset of strategy, rather than the entirety of strategy,
because policy is only meaningful in the context of the strategy&amp;rsquo;s diagnosis.
For example, the &lt;a href="https://craftingengstrategy.com/private-equity-model/"&gt;&amp;ldquo;N-1 backfill policy&amp;rdquo;&lt;/a&gt; makes sense in the context of &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;new, private equity ownership&lt;/a&gt;.
The policy wouldn&amp;rsquo;t work well in a rapidly expanding organization.&lt;/p&gt;
&lt;p&gt;Any strategy without a policy is useless, but you&amp;rsquo;ll also find that policies without context
aren&amp;rsquo;t worth much either. This is particularly unfortunate, because
strategies are so often communicated without those critical sections.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Policy describes how tradeoffs should be made, but it doesn&amp;rsquo;t verify how the tradeoffs
are actually being made in practice.
The next chapter on operations covers how to inspect an organization&amp;rsquo;s behavior to ensure policies
are followed.&lt;/p&gt;
&lt;p&gt;When reworking a strategy &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;to be more readable&lt;/a&gt;,
it often makes sense to merge policy and operation sections together.
However, when drafting strategy it&amp;rsquo;s valuable to keep them separate.
Yes, you &lt;em&gt;might&lt;/em&gt; use a weekly meeting to review whether the policy is being followed,
but whether it&amp;rsquo;s an effective policy is independent of having such a meeting,
and what operational mechanisms you use will vary depending on the number of policies
you intend to implement.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With this definition in mind,
now we can move on to the more interesting discussion of how to set policy.&lt;/p&gt;
&lt;h2 id="how-to-set-policy"&gt;How to set policy&lt;/h2&gt;
&lt;p&gt;Every part of writing a strategy feels hard when you&amp;rsquo;re doing it,
but I personally find that writing policy either feels uncomfortably
easy or painfully challenging. It&amp;rsquo;s never a happy medium.
Fortunately, the exploration and diagnosis usually come together
to make writing your policy simple, although sometimes that simple
conclusion may be a difficult one to swallow.&lt;/p&gt;
&lt;p&gt;The steps I follow to write a strategy&amp;rsquo;s policy are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Review diagnosis&lt;/strong&gt; to ensure it captures the most important themes.
It doesn&amp;rsquo;t need to be perfect, but it shouldn&amp;rsquo;t have omissions so obvious
that you can immediately identify them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Select policies&lt;/strong&gt; that address the diagnosis.
Explicitly match each policy to one or more diagnoses that it addresses.
Continue adding policies until every diagnosis is covered.&lt;/p&gt;
&lt;p&gt;This is a broad instruction, but it&amp;rsquo;s simpler than it sounds because you&amp;rsquo;ll
typically select from policies &lt;a href="https://craftingengstrategy.com/explore/"&gt;identified during your exploration phase&lt;/a&gt;.
However, there certainly is space to tweak those policies,
and to reapply familiar policies to new circumstances.&lt;/p&gt;
&lt;p&gt;If you do find yourself developing a novel policy,
there&amp;rsquo;s a later section in this chapter, &lt;em&gt;Developing novel policies&lt;/em&gt;,
that addresses that topic in more detail.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consolidate policies&lt;/strong&gt; in cases where they overlap or adjoin.
For example, two policies about specific teams might be generalized into a policy about all teams
in the engineering organization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Backtest policy&lt;/strong&gt; against recent decisions you&amp;rsquo;ve made.
This is particularly effective if you maintain a &lt;a href="https://infraeng.dev/decision-log/"&gt;decision log&lt;/a&gt;
in your organization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mine for conflict&lt;/strong&gt; once again, much as you did in developing your diagnosis.
Emphasize feedback from teams and individuals with a different perspective from your own,
but don&amp;rsquo;t wholly exclude those who agree with you.
Just as it&amp;rsquo;s easy to crowd out opposing views in diagnosis if you don&amp;rsquo;t solicit their input,
it&amp;rsquo;s possible to accidentally crowd out your own perspective if you anchor too much on others&amp;rsquo; perspectives.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consider refinement&lt;/strong&gt; if you finish writing and still aren&amp;rsquo;t sure your approach works. That&amp;rsquo;s fine!
Return to the refinement phase by deploying &lt;a href="https://craftingengstrategy.com/refine/"&gt;one of the refinement techniques&lt;/a&gt; to increase your conviction.
Remember that we &lt;em&gt;talk&lt;/em&gt; about strategy like it&amp;rsquo;s done in one pass,
but almost all real strategy takes many refinement passes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The steps of writing policy are relatively pedestrian, largely because
you&amp;rsquo;ve done so much of the work already in the exploration, diagnosis, and refinement steps.
If you skip those phases, you&amp;rsquo;d likely follow the above steps for writing policy,
but the expected quality of the policy itself would be far lower.&lt;/p&gt;
&lt;h2 id="how-many-policies"&gt;How many policies?&lt;/h2&gt;
&lt;p&gt;Addressing the entirety of the diagnosis is often complex,
which is why most strategies feature a set of policies rather than just one.
The &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;strategy for decomposing a monolithic application&lt;/a&gt;
is not one policy deciding not to decompose, but a series of four policies:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Business units should always operate in their own code repository and monolith.&lt;/li&gt;
&lt;li&gt;New integrations across business unit monoliths should be done using gRPC.&lt;/li&gt;
&lt;li&gt;Except for new business unit monoliths, we don’t allow new services.&lt;/li&gt;
&lt;li&gt;Merge existing services into business-unit monoliths where you can.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Four isn&amp;rsquo;t universally the right number either.
It&amp;rsquo;s simply the number that was required to solve that strategy&amp;rsquo;s diagnosis.
With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring.
That&amp;rsquo;s great: what makes a policy good is that it&amp;rsquo;s effective, not that it&amp;rsquo;s novel or inspiring.&lt;/p&gt;
&lt;h2 id="kinds-of-policies"&gt;Kinds of policies&lt;/h2&gt;
&lt;p&gt;While there are &lt;em&gt;so many&lt;/em&gt; policies you can write, I&amp;rsquo;ve found they generally fall into one of four major
categories: approvals, allocations, direction, and guidance. This section introduces those categories.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Approvals&lt;/strong&gt; define the process for making a recurring decision.
This might require invoking an architecture advice process,
or it might require involving an authority figure like an executive.&lt;/p&gt;
&lt;p&gt;In the &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;Index post-acquisition integration strategy&lt;/a&gt;,
there were a number of complex decisions to be made, and the approval mechanism was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This allowed the acquired and acquiring teams to start building trust between each other
by ensuring both were consulted before any decision was finalized.
On the other hand, the &lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;user data access strategy&lt;/a&gt;&amp;rsquo;s approval
policy was more focused on managing corporate risk:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Exceptions must be granted in writing by CISO.&lt;/strong&gt; While our overarching Engineering Strategy states
that we follow an advisory architecture process as described in &lt;em&gt;Facilitating Software Architecture&lt;/em&gt;,
the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These two different approval processes had different goals,
so they made tradeoffs differently. There are many ways to tweak approvals,
each allowing for different tradeoffs between safety, productivity, and trust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Allocations&lt;/strong&gt; describe how resources are split across multiple potential investments.
Allocations are the most concrete statement of organizational priority, and also articulate
the organization&amp;rsquo;s belief about how productivity happens in teams.
Some companies believe you go fast by swarming more people onto critical problems.
Other companies believe you go fast by forcing teams to solve problems without additional headcount.
Both can work, and each teaches you something important about the company&amp;rsquo;s beliefs.&lt;/p&gt;
&lt;p&gt;The strategy on &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration&lt;/a&gt; has two concrete examples
of allocation policies. The first describes the Infrastructure engineering team&amp;rsquo;s
allocation between manual provisioning tasks and investment in a self-service provisioning platform:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Constrain manual provisioning allocation to maximize investment in self-service provisioning.&lt;/strong&gt; The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The second allocation policy is implicitly noted in this strategy&amp;rsquo;s diagnosis,
where it describes the allocation policy in the Engineering organization&amp;rsquo;s higher-altitude strategy:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Allocation policies often create a surprising amount of clarity for the team,
and I include them in almost every strategy I write, either explicitly
or implicitly via a higher-altitude strategy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Direction&lt;/strong&gt; provides explicit instruction on how a decision &lt;em&gt;must&lt;/em&gt; be made.
This is the right tool when you know where you want to go, and exactly the way
that you want to get there. Direction is appropriate for problems you understand
clearly, where you value consistency more than empowering individual judgment.&lt;/p&gt;
&lt;p&gt;Direction works well when you need an unambiguous policy that doesn&amp;rsquo;t leave room for interpretation.
For example, &lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm&amp;rsquo;s policy for working in the monolith&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith.&lt;/p&gt;
&lt;p&gt;In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In that case, the team couldn&amp;rsquo;t agree on what should go into the monolith.
Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation.&lt;/p&gt;
&lt;p&gt;Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives.
A good example of this comes in the &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;strategy on working with new Private Equity ownership&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level.
We will also institute a strict maximum of one Principal Engineer per business unit.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s likely that hiring managers would simply ignore this backfill policy if it were stated more
softly, although less forceful policies are sometimes useful.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Guidance&lt;/strong&gt; provides a recommendation about how a decision &lt;em&gt;should&lt;/em&gt; be made.
Guidance is useful when there&amp;rsquo;s enough nuance, &lt;a href="https://lethain.com/navigating-ambiguity/"&gt;ambiguity&lt;/a&gt;, or complexity
that you &lt;em&gt;can&lt;/em&gt; explain the desired destination, but you &lt;em&gt;can&amp;rsquo;t&lt;/em&gt; mandate the path to reaching it.&lt;/p&gt;
&lt;p&gt;One example of guidance comes from the &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;Index acquisition integration strategy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Minimize changes to tokenization environment&lt;/strong&gt;: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored.&lt;/p&gt;
&lt;p&gt;However, any other functionality must not be added to our tokenization environment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This might read like direction, but it&amp;rsquo;s clarifying the desired outcome of avoiding unnecessary complexity
in the tokenization environment. However, it can&amp;rsquo;t articulate which complexity is necessary,
so it&amp;rsquo;s ultimately guidance: it requires significant judgment to interpret.&lt;/p&gt;
&lt;p&gt;A second example of guidance comes in the &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;strategy on decomposing a monolithic codebase&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Merge existing services into business-unit monoliths where you can.&lt;/strong&gt; We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is another case of knowing the desired outcome, but encountering too much uncertainty
to direct the team on how to get there. If you ask five engineers about whether it&amp;rsquo;s possible
to merge a given service back into a monolithic codebase, they&amp;rsquo;ll probably disagree.
That&amp;rsquo;s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas
where more concrete direction would cause confusion.&lt;/p&gt;
&lt;p&gt;When you&amp;rsquo;re working on a strategy&amp;rsquo;s policy section, it&amp;rsquo;s important to consider all of these categories.
Which feel most natural to use will vary depending on your team and role, but they&amp;rsquo;re all usable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you&amp;rsquo;re a developer productivity team, you might have to lean heavily on guidance in your policies
and increased support for that guidance within the details of your platform.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re an executive, you might lean heavily on direction. Indeed, you might lean &lt;em&gt;too&lt;/em&gt; heavily on direction,
even though guidance often works better in areas where you understand the destination but not the path.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re a product engineering organization, you might have to narrow the scope of your direction
to the engineers within that organization to deal with the realities of complex cross-organization dynamics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, if you have a clear approach you want to take that doesn&amp;rsquo;t fit cleanly into any of these
categories, then don&amp;rsquo;t let this framework dissuade you. Give it a try, and adapt if it doesn&amp;rsquo;t initially work out.&lt;/p&gt;
&lt;h2 id="maintaining-strategy-altitude"&gt;Maintaining strategy altitude&lt;/h2&gt;
&lt;p&gt;The chapter on &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;when to write engineering strategy&lt;/a&gt;
introduced the concept of strategy altitude, which is being deliberate about where
certain kinds of policies are created within your organization.&lt;/p&gt;
&lt;p&gt;Without repeating that section in its entirety, when you set policy it&amp;rsquo;s particularly
important to consider how your new policies eliminate flexibility within your
organization. Consider these two somewhat opposing strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/stripe-sorbet-strategy/"&gt;Stripe&amp;rsquo;s Sorbet strategy&lt;/a&gt; only worked in an organization
that enforced the use of a single programming language across
(essentially) all teams&lt;/li&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/project-resourcing-strategy/"&gt;Calm&amp;rsquo;s strategy for resourcing Engineering-driven projects&lt;/a&gt;
recognized that resourcing had to be managed by the team directly.
Attempting to solve the problem at another level would simply result
in someone talking to the team directly to rewrite their priorities
to incorporate a new urgent project.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stripe&amp;rsquo;s organization-altitude policy took away the freedom of individual teams
to select their preferred technology stack. In return, it unlocked the ability
to centralize investment in a powerful way.
Calm went the opposite way, recognizing that only teams were empowered to
manage the contents of their roadmap; executives were more senior, but frequently
overridden by other executives&amp;rsquo; out-of-band instructions.&lt;/p&gt;
&lt;p&gt;Both altitudes make sense. Both have consequences.&lt;/p&gt;
&lt;h2 id="criteria-for-effective-policies"&gt;Criteria for effective policies&lt;/h2&gt;
&lt;p&gt;In &lt;em&gt;&lt;a href="https://www.amazon.com/Engineering-Executives-Primer-Impactful-Leadership/dp/1098149483/"&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/a&gt;&lt;/em&gt;&amp;rsquo;s
chapter on &lt;a href="https://lethain.com/eng-strategies/"&gt;engineering strategy&lt;/a&gt;, I introduced three criteria for evaluating policies.
They ought to be applicable, enforced, and create leverage. Defining those a bit:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Applicable&lt;/strong&gt;: it can be used to navigate complex, real scenarios, particularly when making tradeoffs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enforced&lt;/strong&gt;: teams will be held accountable for following the guiding policy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create Leverage&lt;/strong&gt;: create compounding or multiplicative impact.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The last of these three, create leverage, made sense in the context of a book about
engineering executives, but probably doesn&amp;rsquo;t make as much sense here.
Some policies certainly should create leverage
(e.g. &lt;a href="https://craftingengstrategy.com/api-deprecation-strategy/"&gt;the policy to avoid deprecating APIs&lt;/a&gt; makes other customer retention mechanisms more effective)
but others might not
(e.g. &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;moving to an N-1 backfill policy&lt;/a&gt;).
Outside the executive context, what&amp;rsquo;s important isn&amp;rsquo;t necessarily creating leverage,
but that a policy solves for part of the diagnosis.&lt;/p&gt;
&lt;p&gt;That leaves the other two&amp;ndash;being applicable and enforced&amp;ndash;both of which are necessary
for a policy to actually address the diagnosis.
Any policy which you can&amp;rsquo;t determine how to apply, or aren&amp;rsquo;t willing to enforce,
simply won&amp;rsquo;t be useful.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s apply these criteria to a handful of potential policies.
First let&amp;rsquo;s think about policies we might write to improve the
talent density of our engineering team:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&amp;ldquo;We only hire world-class engineers.&amp;rdquo;&lt;/strong&gt;
This isn&amp;rsquo;t applicable, because it&amp;rsquo;s unclear what &amp;ldquo;world-class&amp;rdquo; means.
Because there&amp;rsquo;s no mutually agreeable definition in this policy, it&amp;rsquo;s also not consistently enforceable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&amp;ldquo;We only hire engineers that get at least one &amp;lsquo;strong yes&amp;rsquo; in scorecards.&amp;rdquo;&lt;/strong&gt;
This is applicable, because there&amp;rsquo;s a clear definition.
This is enforceable, depending on the willingness of the organization to reject seemingly good candidates
who don&amp;rsquo;t happen to get a strong yes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Next, let&amp;rsquo;s think about a policy regarding code reuse within a codebase:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;We follow a strict Don&amp;rsquo;t Repeat Yourself policy in our codebase.&amp;rdquo;&lt;/strong&gt;
There&amp;rsquo;s room for debate within a team about whether two pieces of code are truly
duplicative, but this is generally applicable.
Because there&amp;rsquo;s room for debate, deciding how to enforce it is a very context-specific determination.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Code authors are responsible for determining if their contributions violate Don&amp;rsquo;t Repeat Yourself, and rewriting them if they do.&amp;rdquo;&lt;/strong&gt;
This is much more applicable, because now there&amp;rsquo;s only a single person&amp;rsquo;s judgment to assess the potential repetition.
In some ways, this policy is also more enforceable, because there&amp;rsquo;s no longer any ambiguity around who is deciding whether
a piece of code is a repetition.&lt;/p&gt;
&lt;p&gt;The challenge is that
enforceability now depends on one individual, and making this policy effective will require
holding individuals accountable for the quality of their judgment.
An organization that&amp;rsquo;s unwilling to distinguish between good and bad judgment
won&amp;rsquo;t get any value out of the policy.
This is a good example of how a good policy in one organization might become a poor policy in another.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you ever find yourself wanting to include a policy that for some reason
either can&amp;rsquo;t be applied or can&amp;rsquo;t be enforced, stop to ask yourself what you&amp;rsquo;re
trying to accomplish and ponder if there&amp;rsquo;s a different policy that might be better suited to that goal.&lt;/p&gt;
&lt;h2 id="developing-novel-policies"&gt;Developing novel policies&lt;/h2&gt;
&lt;p&gt;My experience is that there are vanishingly few truly novel policies
to write. There&amp;rsquo;s almost always someone else who has already done something similar to your intended approach.
&lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm&amp;rsquo;s engineering strategy&lt;/a&gt; is such a case:
the details are particular to the company, but the general approach is common across the industry.&lt;/p&gt;
&lt;p&gt;The most likely place to find truly novel policies is during the adoption phase of a new widespread technology,
such as the rise of ubiquitous mobile phones, cloud computing, or large language models.
Even then, as explored in &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;the strategy for adopting large-language models&lt;/a&gt;,
the new technology can be engaged with as a generic technology:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets.&lt;/strong&gt; Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You could simply replace &amp;ldquo;LLM&amp;rdquo; with &amp;ldquo;data-driven&amp;rdquo; and it would be equally readable.
In this way, policy can generally sidestep areas of uncertainty by being a bit abstract.
This avoids being overly specific about topics you simply don&amp;rsquo;t know much about.&lt;/p&gt;
&lt;p&gt;However, even if your policy isn&amp;rsquo;t novel to the industry,
it might still be novel to you or your organization.
The steps that I&amp;rsquo;ve found useful to debug novel policies are the same steps as running a condensed version
of the strategy process, with a focus on exploration and refinement:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Collect a number of &lt;em&gt;similar&lt;/em&gt; policies, with a focus on how
those policies differ from the policy you are creating&lt;/li&gt;
&lt;li&gt;Create a &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems model&lt;/a&gt; to
articulate how this policy will work,
and also how it will differ from the similar policies you&amp;rsquo;re considering&lt;/li&gt;
&lt;li&gt;Run a &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt; cycle
for your proto-policy to discover any unknown-unknowns about how it
works in practice&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Whether you run into this scenario is largely a function of the extent
of your, and your organization&amp;rsquo;s, experience. Early in my career, I found
myself doing novel (for me) strategy work very frequently. These days
I rarely do novel work, and instead focus on adapting
well-known policies to new circumstances.&lt;/p&gt;
&lt;h2 id="are-competing-policy-proposals-an-anti-pattern"&gt;Are competing policy proposals an anti-pattern?&lt;/h2&gt;
&lt;p&gt;When creating policy, you&amp;rsquo;ll often have to engage with the question of
whether you should develop one preferred policy or a series of potential policies to pick from.
Developing competing options is a useful stage of setting policy, but rather than
treating it as a way to refine your policy, I&amp;rsquo;d encourage you to think of it as
exposing gaps in your diagnosis.&lt;/p&gt;
&lt;p&gt;For example, &lt;a href="https://craftingengstrategy.com/stripe-sorbet-strategy/"&gt;when Stripe developed the Sorbet Ruby typing tooling&lt;/a&gt;,
there was debate between two policies:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Should we build a Ruby typing tool to allow a centralized team to gradually
migrate the company to a typed codebase?&lt;/li&gt;
&lt;li&gt;Should we migrate the codebase to a preexisting strongly typed language like Golang
or Java?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These were, initially, equally valid hypotheses. It was only by clarifying our diagnosis
around resourcing that it became clear that incurring the bulk of costs
in a centralized team was preferable to spreading the costs across many teams.
Specifically, we recognized that we wanted to prioritize short-term product engineering velocity,
even if it led to a longer migration overall.&lt;/p&gt;
&lt;p&gt;If you do develop multiple policy options, I encourage you to move the alternatives
into an appendix rather than &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;including them in the core of your strategy document&lt;/a&gt;.
This will make it easier for readers of your final version to understand how to follow your policies,
and they are the most important long-term users of your written strategy.&lt;/p&gt;
&lt;h2 id="recognizing-constraints"&gt;Recognizing constraints&lt;/h2&gt;
&lt;p&gt;A problem similar to competing proposals is developing a policy that you cannot possibly fund.
It&amp;rsquo;s easy to get enamored with policies that you can&amp;rsquo;t meaningfully enforce,
but that&amp;rsquo;s bad policy, even if it would work in an alternate universe where it
was possible to enforce or resource it.&lt;/p&gt;
&lt;p&gt;To consider a few examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;strategy for controlling access to user data&lt;/a&gt; might have proposed
requiring manual approval by a second party of every access
to customer data. However, that would have gone nowhere.&lt;/li&gt;
&lt;li&gt;Our &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;approach to Uber&amp;rsquo;s service migration&lt;/a&gt;
might have required more staffing for the infrastructure engineering team,
but we knew that wasn&amp;rsquo;t going to happen, so it was a meaningless policy proposal to make.&lt;/li&gt;
&lt;li&gt;The strategy for &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;navigating private equity ownership&lt;/a&gt; might
have argued that new ownership should not hold engineering accountable to a new standard
on spending. But they would have just invalidated that strategy in the next financial planning period.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you find a policy that contemplates an impractical approach,
it doesn&amp;rsquo;t &lt;em&gt;only&lt;/em&gt; indicate that the policy is a poor one;
it also suggests your diagnosis is missing an important pillar.
Rather than debating the policy options, the fastest path to
resolution is to align on the diagnosis that would invalidate
potential paths forward.&lt;/p&gt;
&lt;p&gt;In cases where aligning on the diagnosis isn&amp;rsquo;t possible,
for example because you simply don&amp;rsquo;t understand the possibilities
of a new technology as encountered in the &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;strategy for adopting LLMs&lt;/a&gt;,
then you&amp;rsquo;ve typically found a valuable opportunity to use &lt;a href="https://craftingengstrategy.com/refine/"&gt;strategy refinement&lt;/a&gt;
to build alignment.&lt;/p&gt;
&lt;h2 id="dealing-with-missing-strategies"&gt;Dealing with missing strategies&lt;/h2&gt;
&lt;p&gt;At a recent company offsite, we were debating which policies we might adopt to deal
with annual plans that kept getting derailed after less than a month.
Someone remarked that this would be much easier if we could get the executive team to
commit to a clearer, written strategy about which business units we were prioritizing.&lt;/p&gt;
&lt;p&gt;They were, of course, right. It would be much easier. Unfortunately,
it goes back to the problem we discussed in the &lt;a href="https://craftingengstrategy.com/diagnosis/"&gt;diagnosis chapter&lt;/a&gt;
about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing,
the empowering thing to do is to include the absence in your diagnosis and move forward.&lt;/p&gt;
&lt;p&gt;Sometimes, even when you do this, it&amp;rsquo;s easy to fall back into the belief that you cannot set a policy
because a peer function might set a conflicting policy in the future.
Whether you&amp;rsquo;re an executive or an engineer, you&amp;rsquo;ll never have the details you want to make
the ideal policy. Meaningful leadership requires taking meaningful risks,
and that never becomes comfortable.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;After working through this chapter,
you know how to develop policy, how to assemble policies to solve your diagnosis,
and how to avoid a number of the frequent challenges that policy writers encounter.
At this point, there&amp;rsquo;s only one phase of strategy left to dig into,
&lt;a href="https://craftingengstrategy.com/operations/"&gt;operating the policies you&amp;rsquo;ve created&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Who gets to do strategy?</title><link>https://craftingengstrategy.com/who-does-strategy/</link><pubDate>Thu, 06 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/who-does-strategy/</guid><description>&lt;p&gt;If you talk to enough aspiring leaders, you&amp;rsquo;ll become familiar with
the prevalent idea that they need to be promoted before they can
work on strategy.
It&amp;rsquo;s widely accepted as true, but I&amp;rsquo;ve found this idea fundamentally incorrect:
you can work on strategy from anywhere in an organization.
It just requires different tactics to do so.&lt;/p&gt;
&lt;p&gt;Both &lt;em&gt;Staff Engineer&lt;/em&gt; and &lt;em&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/em&gt; have chapters
on strategy. While the chapters&amp;rsquo; contents are quite different, both present
a practical path to advancing your organization&amp;rsquo;s thinking about complex topics.
This chapter explains my belief that
&lt;em&gt;anyone&lt;/em&gt; within an organization can make meaningful progress on strategy,
particularly if you are honest about the tools accessible to you
and thoughtful about how to use them.&lt;/p&gt;
&lt;p&gt;The themes we&amp;rsquo;ll dig into are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How to do strategy as an engineer, particularly an engineer who hasn&amp;rsquo;t
been given explicit authority to do strategy&lt;/li&gt;
&lt;li&gt;Doing strategy as an engineering executive who is responsible for your
organization&amp;rsquo;s decision-making&lt;/li&gt;
&lt;li&gt;How you can develop engineering strategy even in difficult situations,
such as when there&amp;rsquo;s no existing strategy,
when acknowledging certain problems is politically sensitive,
or when misaligned incentives make consensus challenging&lt;/li&gt;
&lt;li&gt;If this book&amp;rsquo;s argument is that everyone should do strategy,
is there anyone who, nonetheless, really should not do strategy?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By the end, you&amp;rsquo;ll hopefully agree that engineering strategy is accessible to
everyone, even though you&amp;rsquo;re always operating within constraints.&lt;/p&gt;
&lt;h2 id="doing-strategy-as-an-engineer"&gt;Doing strategy as an engineer&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s easy to get so distracted by an executive&amp;rsquo;s top-down approach to
strategy that you convince yourself that there aren&amp;rsquo;t other approachable
mechanisms for doing strategy. There are!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Staff Engineer&lt;/em&gt; introduces an approach I call &lt;a href="https://staffeng.com/guides/engineering-strategy/"&gt;Take five, then synthesize&lt;/a&gt;,
which does strategy by:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Documenting how five current and historical related decisions have been made in your organization.
This is an extended exploration phase&lt;/li&gt;
&lt;li&gt;Synthesizing those five documents into a diagnosis and policy.
You are naming the implicit strategy,
so it&amp;rsquo;s impossible for someone to reasonably argue you&amp;rsquo;re not empowered
to do strategy: you&amp;rsquo;re just describing what&amp;rsquo;s already happening&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At that point, either the organization feels comfortable with what you&amp;rsquo;ve written&amp;ndash;which is their current strategy&amp;ndash;or
it doesn&amp;rsquo;t, in which case you&amp;rsquo;ve forced a conversation about how to revise the approach.
Creating awareness is often enough to drive strategic change,
and doing so doesn&amp;rsquo;t require any explicit authorization from an executive.&lt;/p&gt;
&lt;p&gt;When awareness is insufficient, the other pattern I&amp;rsquo;ve found highly effective
in low-authority scenarios is an approach I wrote about in
&lt;em&gt;An Elegant Puzzle&lt;/em&gt;, and call &lt;a href="https://lethain.com/model-document-share/"&gt;model, document, and share&lt;/a&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Model the approach you want others to adopt.
Make it easy for them to observe how you&amp;rsquo;ve changed the way you&amp;rsquo;re doing things.&lt;/li&gt;
&lt;li&gt;Document the approach, the thinking behind it, and how to adopt it.&lt;/li&gt;
&lt;li&gt;Share the document around. If people see you succeeding with the approach,
then they&amp;rsquo;re likely to copy it from you.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You might be skeptical because this is an influence-based approach.
However, as we&amp;rsquo;ll discuss in the next section, even an executive-driven
strategy is highly dependent on influence.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Strategy archaeology&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Vernor Vinge&amp;rsquo;s &lt;em&gt;&lt;a href="https://en.wikipedia.org/wiki/A_Deepness_in_the_Sky"&gt;A Deepness in the Sky&lt;/a&gt;&lt;/em&gt;,
published in 1999, introduced the term software archaeologists,
folks who created functionality by cobbling together millennia
of scraps of existing software.&lt;/p&gt;
&lt;p&gt;Although it&amp;rsquo;s a somewhat different usage, I sometimes think of the &amp;ldquo;take five, then synthesize&amp;rdquo; approach as
performing strategy archaeology. Simply by recording what has happened in the past,
we make it easier to understand the present, and influence the future.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="doing-strategy-as-an-executive"&gt;Doing strategy as an executive&lt;/h2&gt;
&lt;p&gt;The biggest misconception about executive roles, frequently held by
non-executives and new executives who are about to make a series of regrettable
mistakes, is that executives operate without constraints.
That is false: executives have an extremely high number of constraints that
they operate under.
Executives have budgets, CEO visions, peers to satisfy, and a team to motivate.
They can disappoint any of these temporarily, but in the long term they have to satisfy
all of them.&lt;/p&gt;
&lt;p&gt;Nonetheless, it is true that executives have more latitude to mandate
and cajole participation in the strategies that they sponsor.
&lt;em&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/em&gt;&amp;rsquo;s &lt;a href="https://lethain.com/eng-strategies/"&gt;chapter on strategy&lt;/a&gt;
is a brief summary of this entire book, but it doesn&amp;rsquo;t say much about
how executive strategy differs from non-executive strategy.&lt;/p&gt;
&lt;p&gt;How the executive&amp;rsquo;s approach to strategy differs from the engineer&amp;rsquo;s
can be boiled down to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Executives can mandate adherence to their strategy, which empowers their policy options.
An engineer can&amp;rsquo;t prevent the promotion of someone who refused to follow their policy, but an executive can.&lt;/p&gt;
&lt;p&gt;Mandates only matter if there are consequences. If an executive is unwilling to
enforce consequences for non-compliance with a mandate, the ability to issue a
mandate isn&amp;rsquo;t meaningful.&lt;/p&gt;
&lt;p&gt;This is also true if they &lt;em&gt;can&amp;rsquo;t&lt;/em&gt; enforce a mandate because of lack of support from
their peer executives.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even if an executive is unwilling to use mandates, they have significant visibility
and access to their organization to advocate for their preferred strategy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Neither access nor mandates improve an executive&amp;rsquo;s ability to diagnose problems.
However, both often create the appearance of progress.
This is why executive strategies can fail so spectacularly and endure so long despite failure.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As a result, my experience is that executives have an easier time doing strategy,
but a much harder time learning how to do strategy well,
and fewer protections to avoid serious mistakes.
Further, the consequences of an executive&amp;rsquo;s poor strategy tend to be
much further reaching than an engineer&amp;rsquo;s.
Waiting to do strategy until you are an executive is a recipe for disaster,
even if it looks easier from a distance.&lt;/p&gt;
&lt;h2 id="doing-strategy-in-other-roles"&gt;Doing strategy in other roles&lt;/h2&gt;
&lt;p&gt;Even if you&amp;rsquo;re neither an engineer nor an engineering executive,
you can still do engineering strategy.
It&amp;rsquo;ll just require an even more influence-driven approach.&lt;/p&gt;
&lt;p&gt;The engineering organization is generally right to believe that they know
the most about engineering, but that&amp;rsquo;s not always true.
Sometimes a product manager used to be an engineer and has
significant relevant experience.
Other times, such as the &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;early adoption of large language models&lt;/a&gt;,
engineers don&amp;rsquo;t know much either, and benefit from outside perspectives.&lt;/p&gt;
&lt;h2 id="doing-strategy-in-challenging-environments"&gt;Doing strategy in challenging environments&lt;/h2&gt;
&lt;p&gt;Good strategies succeed by accurately diagnosing circumstances and picking policies that address those circumstances.
You are likely to spend time in organizations where both of those are challenging due to internal limitations,
so it&amp;rsquo;s worth acknowledging that and discussing how to navigate those challenges.&lt;/p&gt;
&lt;h3 id="low-trust-environment"&gt;Low-trust environment&lt;/h3&gt;
&lt;p&gt;Sometimes the struggle to diagnose problems is a skill issue.
Being bad at strategy is in some ways the easy problem to solve: just do more strategy work to build expertise.
In other cases, you may see what the problems are fairly clearly, but not know how to acknowledge the problems
because your organization&amp;rsquo;s culture would frown on it.
The latter is a diagnosis problem rooted in low trust, and it does make things more difficult.&lt;/p&gt;
&lt;p&gt;The chapter on &lt;a href="https://craftingengstrategy.com/diagnosis/"&gt;Diagnosis&lt;/a&gt; recognizes this problem,
and admits that sometimes you have to whisper the controversial parts of a strategy:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When you’re writing a strategy, you’ll often find yourself trying to choose between two awkward options:
say something awkward or uncomfortable about your company or someone working within it, or
omit a critical piece of your diagnosis that’s necessary to understand the wider thinking.
Whenever you encounter this sort of debate, my advice is to find a way to include the diagnosis, but to reframe it into a palatable statement that avoids casting blame too narrowly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In short, the solution to low trust is to translate difficult messages into
softer, less direct versions that are acceptable to state.
If your goal is to hold people accountable, this can feel dishonest or
like an ethical compromise, but the goal of strategy is to make better decisions,
which is an entirely different concern than holding folks accountable for the past.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Karpman Drama Triangle&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sometimes when the diagnosis seems particularly obvious,
and people don&amp;rsquo;t agree with you,
it&amp;rsquo;s because you are wrong.
When I&amp;rsquo;ve been obviously wrong about things I understand well,
it&amp;rsquo;s usually because I&amp;rsquo;ve fallen into viewing a situation through the
&lt;a href="https://en.wikipedia.org/wiki/Karpman_drama_triangle"&gt;Karpman Drama Triangle&lt;/a&gt;,
where all parties are mapped onto roles as persecutor, rescuer, or victim.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="poor-judgment-environment"&gt;Poor-judgment environment&lt;/h2&gt;
&lt;p&gt;Even when you do an excellent job diagnosing challenges,
it can be difficult to drive agreement within the organization
about how to address them.
Sometimes this is due to genuinely complex tradeoffs.
For example, in &lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;Stripe&amp;rsquo;s acquisition of Index&lt;/a&gt;,
there was debate about how to deal with Index&amp;rsquo;s Java-based technology stack,
which culminated in a compromise that didn&amp;rsquo;t make anyone particularly happy:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Defer making a decision regarding the introduction of Java to a later date: the introduction of Java is incompatible with our existing engineering strategy, but at this point we’ve also been unable to align stakeholders on how to address this decision. Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.&lt;/p&gt;
&lt;p&gt;We will take up this discussion after launching the initial release.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That compromise is a good example of a difficult tradeoff:
although parties disagreed with the approach, everyone understood
the conflicting priorities that had to be addressed.&lt;/p&gt;
&lt;p&gt;In other cases, though, there are policy choices that simply don&amp;rsquo;t
make much sense, generally driven by poor judgment in your organization.
Sometimes it&amp;rsquo;s not poor technical judgment, but poor judgment in choosing
to prioritize one&amp;rsquo;s personal interests at the expense of the company&amp;rsquo;s needs.
Calm&amp;rsquo;s strategy to &lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;focus on being a product-engineering organization&lt;/a&gt;
dealt with some aspects of that, acknowledged in its diagnosis:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We’re arguing a particularly large amount about adopting new technologies and rewrites. Most of our disagreements stem around adopting new technologies or rewriting existing components into new technology stacks. For example, can we extend this feature or do we have to migrate it to a service before extending it? Can we add this to our database or should we move it into a new Redis cache instead? Is JavaScript a sufficient programming language, or do we need to rewrite this functionality in Go?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In that situation, your strategy is an attempt to educate your colleagues
about the tradeoffs they are making, but ultimately sometimes folks will disagree with your
strategy.
In that case, remember that most interesting problems require iterative solutions.
Writing your strategy and sharing it will start to change the organization&amp;rsquo;s mind.
Don&amp;rsquo;t get discouraged even if that change is initially slow.&lt;/p&gt;
&lt;h3 id="dealing-with-missing-strategies"&gt;Dealing with missing strategies&lt;/h3&gt;
&lt;p&gt;The strategy for &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;dealing with new private equity ownership&lt;/a&gt;
introduces a common problem: lack of clarity about what other parts of your own company
want. In that case, it seems likely there will be a layoff, but it&amp;rsquo;s unclear how large
that layoff will be:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&amp;amp;D headcount costs through a reduction.
However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Many leaders encounter that sort of ambiguity and decide that they cannot move forward with
a strategy of their own until that decision is made.
While it&amp;rsquo;s true that it&amp;rsquo;s inconvenient not to know the details,
getting blocked by ambiguity is &lt;em&gt;always&lt;/em&gt; the wrong decision.&lt;/p&gt;
&lt;p&gt;Instead you should do what the private equity strategy does: accept that ambiguity
as a fact to be worked around. Rather than giving up, it adopts a series of new policies
to start reducing cost growth by changing its &lt;a href="https://craftingengstrategy.com/private-equity-model/"&gt;organization&amp;rsquo;s seniority mix&lt;/a&gt;,
and recognizes that once there is clarity on reduction targets, there will be additional actions to take.&lt;/p&gt;
&lt;p&gt;Whenever you&amp;rsquo;re working on challenging problems, you can always find many reasons to justify not making progress.
Leadership is finding a way to move forward despite those issues. A missing strategy is always part of your
diagnosis, but never a reason that you can&amp;rsquo;t do strategy.&lt;/p&gt;
&lt;h2 id="who-shouldnt-do-strategy"&gt;Who shouldn&amp;rsquo;t do strategy&lt;/h2&gt;
&lt;p&gt;In my experience, there&amp;rsquo;s almost never a reason why &lt;em&gt;you&lt;/em&gt; cannot do strategy,
but there are two particular scenarios where doing strategy probably doesn&amp;rsquo;t make sense.
The first is not a who, but a &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;when problem&lt;/a&gt;:
sometimes there is so much strategy already happening that doing more is a distraction.
If another part of your organization is already working on the same problem,
do your best to work with them directly rather than generating competing work.&lt;/p&gt;
&lt;p&gt;The other time to avoid strategy is when you&amp;rsquo;re trying to satisfy an emotional need to make a direct, immediate impact.
Sharing a thoughtful strategy always drives progress, though it&amp;rsquo;s often the slow, incremental progress of changing your organization&amp;rsquo;s beliefs.
Even definitive, top-down strategies from executives are often ignored in pockets of an organization,
and bottom-up strategies spread slowly as they are modeled, documented, and shared.
Embarking on strategy work requires a tolerance for winning in the long run,
even when there&amp;rsquo;s little progress this week or this quarter.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;As you finish reading this chapter, my hope is that you also believe
that you can work on strategy in your organization, whether you&amp;rsquo;re
an engineer or an executive.
I also hope that you appreciate that the tools you use vary greatly
depending on who you are within your organization and the culture in which you work.
Whether you need to model or can mandate, there&amp;rsquo;s a mechanism
that will work for you.&lt;/p&gt;</description></item><item><title>Introduction</title><link>https://craftingengstrategy.com/introduction/</link><pubDate>Thu, 06 Mar 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/introduction/</guid><description/></item><item><title>How to integrate Stripe's acquisition of Index? (2018)</title><link>https://craftingengstrategy.com/index-acquisition-strategy/</link><pubDate>Thu, 27 Feb 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/index-acquisition-strategy/</guid><description>&lt;p&gt;Discussions around acquisitions often focus on
&lt;a href="https://lethain.com/engineering-in-mergers-and-acquisition/"&gt;technical diligence&lt;/a&gt; and
deciding whether to make the acquisition.
However, the integration that follows afterwards can be even more complex.
There are few irreversible trapdoor decisions in engineering,
but decisions made early in an integration tend to be surprisingly durable.&lt;/p&gt;
&lt;p&gt;This engineering strategy explores Stripe&amp;rsquo;s approach to integrating
&lt;a href="https://www.pymnts.com/news/partnerships-acquisitions/2018/stripe-pos-software-startup-index-acquisition/"&gt;their 2018 acquisition of Index&lt;/a&gt;.
While a business book would focus on the rationale for the acquisition itself,
here that rationale is merely part of the diagnosis that defines
the integration tradeoffs. The integration itself is the area of focus.&lt;/p&gt;
&lt;p&gt;As in most acquisitions, the team responsible for the integration
only learned about the project after the deal closed,
which means early efforts are
a scramble to apply &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;
to distinguish between optimistic dates and technical realities.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy &amp;amp; Operation&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operation"&gt;Policy &amp;amp; Operation&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re starting with little shared context between the acquired and acquiring engineering
teams, and have a six-month timeline to launch a joint product.
So our starting policy is a mix of a commitment to joint refinement and several
provisional architectural policies:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Meet at least weekly until the initial release is complete&lt;/strong&gt;:
the involved leadership from Stripe and Index will hold a weekly sync meeting
to refine our approach until we fulfill our initial release timeline.&lt;/p&gt;
&lt;p&gt;This meeting is jointly owned by Stripe&amp;rsquo;s Head of Traffic Engineering and
Index&amp;rsquo;s Head of Engineering.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Minimize changes to tokenization environment&lt;/strong&gt;: because point-of-sale devices directly work with
customer payment details, the API that directly supports the point-of-sale device
must live within our secured environment where payment details are stored.&lt;/p&gt;
&lt;p&gt;However, any other functionality &lt;em&gt;must not&lt;/em&gt; be added to our tokenization environment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;All other functionality must exist in standard environments&lt;/strong&gt;: except for the minimum necessary
functionality moving into the tokenization environment, everything else must be operated in
our standard, non-tokenization environments.
In particular, any software that requires frequent changes, or introduces complex external dependencies,
should exist in the standard environments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Defer making a decision regarding the introduction of Java to a later date&lt;/strong&gt;: the introduction of Java is incompatible
with our existing engineering strategy, but at this point we&amp;rsquo;ve also been unable to align stakeholders on
how to address this decision. Further, we see attempting to address this issue as a distraction
from our timely goal of launching a joint product within six months.&lt;/p&gt;
&lt;p&gt;We will take up this discussion after launching the initial release.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Escalations come to paired leads&lt;/strong&gt;: given our limited shared context across teams,
all escalations must come to both Stripe&amp;rsquo;s Head of Traffic Engineering and
Index&amp;rsquo;s Head of Engineering.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Security review of changes impacting tokenization environment&lt;/strong&gt;: we need to move quickly to launch
the combined point-of-sale and payments product, but we &lt;em&gt;must not&lt;/em&gt; cut corners on
security to launch faster. Security must be included and explicitly sign off
on any integration decisions that involve our tokenization environment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;There are generally four categories of acquisitions: talent acquisitions to bring on a talented team,
business acquisitions to buy a company&amp;rsquo;s revenue and product, technology acquisitions to add a differentiated
capability that would be challenging to develop internally, and time-to-market acquisitions where you
could develop the capability internally but can develop it meaningfully faster by acquiring a company.&lt;/p&gt;
&lt;p&gt;While most acquisitions have a flavor of several of these dimensions, this acquisition
is primarily a time-to-market acquisition aimed at addressing these constraints:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Several of our largest customers are pushing for us to provide a point-of-sale device integrated
with our API-driven payments ecosystem. At least one has implied that we either provide this
functionality on a committed timeline or they may churn to a competitor.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We currently have no homegrown expertise in developing or integrating with hardware such
as point-of-sale devices.
Based on other zero-to-one efforts internally, we believe it would take about a year to
hire the team, develop and launch a minimum-viable product for a point-of-sale device integrated into our platform.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whereas we&amp;rsquo;ve taken a horizontal approach to supporting web payments via an API,
at least one of our competitors, Square, has taken a vertically integrated approach.
While their API ecosystem is less developed than ours, they are a plausible destination
for customers threatening to churn.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We believe that at least one of our enterprise customers will churn if our best commitment is
launching a point-of-sale solution 12 months from now.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve decided to acquire a small point-of-sale startup, which we will use to commit
to a six-month timeframe for supporting an integrated point-of-sale device with
our API ecosystem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We will need to rapidly integrate the acquired startup to meet this timeline.
We only know a small number of details about what this will entail.
We &lt;em&gt;do&lt;/em&gt; know that point-of-sale devices directly operate on payment details
(e.g. the point-of-sale device knows the credit card details of the card it reads).&lt;/p&gt;
&lt;p&gt;Our compliance obligations restrict such activity to our &amp;ldquo;tokenization environment&amp;rdquo;,
a highly secured and isolated environment with direct access to payment details.
This environment converts payment details into a unique token that other environments
can utilize to operate against payment details without the compliance overhead of
having direct access to the underlying payment details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Going into this technical integration, we have few details about the acquired company&amp;rsquo;s
technology stack. We do know that they are primarily a Java shop running on AWS, whereas
we are primarily a Ruby (with some Go) shop running on AWS.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Prior to this acquisition, we have done several small acquisitions.
None of those acquisitions had a meaningful product to integrate with ours,
so we don&amp;rsquo;t have much of an internal playbook to anchor our approach in.&lt;/p&gt;
&lt;p&gt;We do have limited experience integrating technical acquisitions at
prior companies we&amp;rsquo;ve worked at, and we&amp;rsquo;ve talked to peers at other
companies to mine their experience.
Synthesizing those experiences, some recurring patterns are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Usually deal teams have made certain commitments,
or the acquired team has understood certain commitments,
that will be challenging to honor.
This is doubly true when you are unaware of what those commitments might be.&lt;/p&gt;
&lt;p&gt;If folks seem to be behaving oddly, it might be one such misunderstanding,
and it&amp;rsquo;s worth engaging directly to debug the confusion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There should be an executive sponsor for the acquisition,
and the sponsor is typically the best person to ask about the company&amp;rsquo;s intentions.
If you can&amp;rsquo;t find the executive sponsor, or they are not engaged,
try to recruit a new executive sponsor rather than trying to make
things work without one.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Close the culture gap quickly where there&amp;rsquo;s little friction,
and cautiously where there&amp;rsquo;s little trust.&lt;/p&gt;
&lt;p&gt;We do need to bring the acquired company into our culture,
but we have years to do that. The most successful stories of doing this
leaned on a mix of moving folks into and out of the acquired team rather than
applying force.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The long-term cost of supporting a new technology stack is
high, and in conflict with our technology strategy of consolidating on
as few programming languages as possible.&lt;/p&gt;
&lt;p&gt;This is not the place to be flexible, as each additional feature
in the new stack will
take you further from your desired outcome.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, find a way to derisk key departures. Things can go wrong
quickly. One of the easiest starting points is consolidating
infrastructure immediately, even if the product or software takes
longer.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Altogether, this was not the most reassuring exploration: it was a bit
abstract, and much of our research returned strongly-held, conflicting perspectives.
Perhaps acquisitions, like starting a new company, are among those places
where there&amp;rsquo;s simply no right way to do it well.&lt;/p&gt;</description></item><item><title>Diagnosis</title><link>https://craftingengstrategy.com/diagnosis/</link><pubDate>Sat, 22 Feb 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/diagnosis/</guid><description>&lt;p&gt;Once you&amp;rsquo;ve written your &lt;a href="https://craftingengstrategy.com/explore/"&gt;strategy&amp;rsquo;s exploration&lt;/a&gt;,
the next step is working on its diagnosis.
Diagnosis is understanding the constraints and challenges your strategy needs to address.
In particular, it’s about slowing yourself down from jumping to solutions
before fully understanding the nuances and constraints of the problem.&lt;/p&gt;
&lt;p&gt;If you ever find yourself wanting to skip the diagnosis phase&amp;ndash;let&amp;rsquo;s get to the solution already!&amp;ndash;then
maybe it&amp;rsquo;s worth acknowledging that every strategy I&amp;rsquo;ve seen fail did so due to a lazy or inaccurate diagnosis.
It&amp;rsquo;s very challenging to fail with a proper diagnosis, and almost impossible to succeed without one.&lt;/p&gt;
&lt;p&gt;The topics this chapter will cover are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why diagnosis forms the foundation of an effective strategy, and how effective policies depend upon it.
Conversely, how skipping the diagnosis phase consistently ruins strategies&lt;/li&gt;
&lt;li&gt;A step-by-step approach to diagnosing your strategy&amp;rsquo;s circumstances&lt;/li&gt;
&lt;li&gt;How to incorporate data into your diagnosis effectively,
and where to focus on adding data&lt;/li&gt;
&lt;li&gt;Dealing with controversial elements of your diagnosis,
such as pointing out that your own executive is one
of the challenges to solve&lt;/li&gt;
&lt;li&gt;Why it&amp;rsquo;s more effective to view difficulties as part
of the problem to be solved, rather than a blocking
issue that prevents making forward progress&lt;/li&gt;
&lt;li&gt;The near impossibility of an effective diagnosis
if you don&amp;rsquo;t bring humility and self-awareness
to the process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Into the details we go!&lt;/p&gt;
&lt;h2 id="diagnosis-is-strategys-foundation"&gt;Diagnosis is strategy&amp;rsquo;s foundation&lt;/h2&gt;
&lt;p&gt;One of the challenges in evaluating strategy is that, after the fact,
many effective strategies are so obvious that they&amp;rsquo;re pretty boring.
Similarly, most ineffective strategies are so clearly flawed that their
authors look lazy.
That&amp;rsquo;s because, as a strategy is operated, the reality around it becomes clear.
When you&amp;rsquo;re writing your strategy, you don&amp;rsquo;t know if you can convince your colleagues
to adopt a new approach to specifying APIs, but a year later you know very definitively
whether it&amp;rsquo;s possible.&lt;/p&gt;
&lt;p&gt;Building your strategy&amp;rsquo;s diagnosis is your attempt to correctly recognize the context that
the strategy needs to solve before deciding on the policies to address that context.
Done well, the subsequent steps of writing strategy often feel like an afterthought,
which is why I think of diagnosis as strategy&amp;rsquo;s foundation.&lt;/p&gt;
&lt;p&gt;Where &lt;a href="https://craftingengstrategy.com/explore/"&gt;exploration&lt;/a&gt; was an evaluation-free activity,
diagnosis is all about evaluation. How do teams feel today? Why did that project fail?
Why did the last strategy go poorly? What will be the distractions to overcome to make
this new strategy successful?&lt;/p&gt;
&lt;p&gt;That said, not all evaluation is equal. If you state your judgment directly, it&amp;rsquo;s easy to dispute.
An effective diagnosis is hard to argue against, because it&amp;rsquo;s
a web of interconnected observations, facts, and data.
Even for folks who dislike your conclusions, the weight of evidence
should be hard to shift.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;Strategy testing&lt;/a&gt;, explored in the Refinement section,
takes advantage of the reality that it&amp;rsquo;s easier to diagnose by doing than by speculating.
It proposes repeating the diagnosis process until you have real-world evidence that the strategy is working.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="how-to-develop-your-diagnosis"&gt;How to develop your diagnosis&lt;/h2&gt;
&lt;p&gt;Your strategy is almost certain to fail unless you start from an effective diagnosis,
but how to build a diagnosis is often left unspecified.
That&amp;rsquo;s because, for most folks, building the diagnosis is indeed a dark art: unspecified,
undiscussed, and uncontrollable.
I&amp;rsquo;ve been guilty of this as well; &lt;em&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/em&gt;&amp;rsquo;s
&lt;a href="https://lethain.com/eng-strategies/"&gt;chapter on strategy&lt;/a&gt; is notably silent on how to perform diagnosis.&lt;/p&gt;
&lt;p&gt;So, yes, there is some truth to the idea that forming your diagnosis is an emergent, organic process rather
than a structured, mechanical one.
However, over time I&amp;rsquo;ve come to adopt a fairly structured approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Braindump.&lt;/strong&gt; Starting from a blank sheet of paper, write down your best understanding of the circumstances that
inform your current strategy. Then set that piece of paper aside for the moment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Summarize exploration.&lt;/strong&gt; On a new piece of paper, review the contents of your &lt;a href="https://craftingengstrategy.com/explore/"&gt;exploration&lt;/a&gt;.
Pull in every piece of diagnosis from similar situations that resonates with you.
This applies to both internal and external sources!
For each diagnosis, tag whether it fits perfectly, or needs to be adjusted for your current circumstances.
Then, once again, set the piece of paper aside.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mine for distinct perspectives.&lt;/strong&gt; On yet another blank page, capture the views of different stakeholders
and colleagues who you know are likely to disagree with your early thinking.
Your goal is not to agree with this feedback. Instead, it&amp;rsquo;s to understand their view.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Crux-How-Leaders-Become-Strategists-ebook/dp/B09G2QXXWX"&gt;The Crux&lt;/a&gt;&lt;/em&gt;
by Richard Rumelt anchors diagnosis in this approach, emphasizing the importance of &amp;ldquo;testing, adjusting, and changing the frame, or point of view.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Synthesize views into one internally consistent perspective.&lt;/strong&gt;
Sometimes the different perspectives you&amp;rsquo;ve gathered don&amp;rsquo;t mesh well.
They might well explicitly differ in what they believe the underlying problem
is, as is typical in tension between platform and product engineering teams.
The goal is to competently represent each of these perspectives in the diagnosis,
even the ones you disagree with, so that later on you can evaluate your proposed approach
against each of them.&lt;/p&gt;
&lt;p&gt;When synthesizing feedback goes poorly, it tends to fail in one of two ways.
First, the author&amp;rsquo;s opinion shines through so strongly that it renders the author
suspect.
Your goal isn’t to agree with every perspective, nor should your diagnosis crown one viewpoint as correct.
The reader should see detailed perspectives without clearly sensing the author&amp;rsquo;s biases.&lt;/p&gt;
&lt;p&gt;The second common issue is when a group tries to jointly own the synthesis,
but creates fractured perspectives rather than a unified one.
I generally find that having one author who is accountable for representing all views
works best to address both of these issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Test drafts across perspectives.&lt;/strong&gt;
Once you&amp;rsquo;ve written your initial diagnosis,
you want to sit down with the people who you expect to disagree
most fervently. Iterate with them until they agree that you&amp;rsquo;ve
accurately captured their perspective.&lt;/p&gt;
&lt;p&gt;It might be that they
disagree with some other viewpoints, but they should be able
to agree that others hold those views. They might argue that
the data you&amp;rsquo;ve included doesn&amp;rsquo;t capture their full reality,
in which case you can caveat the data by saying that their
team disagrees that it&amp;rsquo;s a comprehensive lens.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Don&amp;rsquo;t worry about getting the details perfectly right in your initial diagnosis.&lt;/strong&gt;
You&amp;rsquo;re trying to get the right crumbs to feed into the next phase,
&lt;a href="https://craftingengstrategy.com/refine/"&gt;strategy refinement&lt;/a&gt;.
Allowing yourself to be directionally correct, rather than perfectly correct,
makes it possible to cover a broad territory quickly.
Getting caught up in perfecting details is an easy way to anchor yourself into one perspective
prematurely.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At this point, I hope you&amp;rsquo;re starting to predict how I&amp;rsquo;ll conclude any recipe for
strategy creation:
if these steps feel overly mechanical to you, adjust them to something that feels
more natural and authentic. There&amp;rsquo;s no perfect way to understand complex problems.
That said, if you feel uncertain, or are skeptical of your own track record,
I do encourage you to start with the above approach as a launching point.&lt;/p&gt;
&lt;h2 id="incorporating-data-into-your-diagnosis"&gt;Incorporating data into your diagnosis&lt;/h2&gt;
&lt;p&gt;The diagnosis behind &lt;a href="https://craftingengstrategy.com/stripe-sorbet-strategy/"&gt;Stripe&amp;rsquo;s creation of Sorbet&lt;/a&gt;
includes a number of pieces of data to help readers understand its reasoning.
For example, it covers staffing numbers of relevant teams, and the extent of test coverage in the Ruby code base.&lt;/p&gt;
&lt;p&gt;If everyone has the same data, and the same assumptions about how that data is likely to change going forward,
then evaluating the strategy becomes vastly simpler.
Data is also your mechanism for supporting or critiquing the various views that
you&amp;rsquo;ve gathered when drafting your diagnosis; to an impartial reader, data will speak louder than passion.
If you&amp;rsquo;re confident that a perspective is true, then include a data narrative that
supports it. If you believe another perspective is overstated, then include data that the reader
will require to come to the same conclusion.&lt;/p&gt;
&lt;p&gt;Do your best to include data analysis with a link out to the full data,
rather than requiring readers to interpret the data themselves while they are reading.
As your strategy document travels further, there will be inevitable requests for
different cuts of data to help readers understand your thinking, and this is somewhat
preventable by linking to your original sources.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t be discouraged if much of the data you want doesn&amp;rsquo;t exist today.
That&amp;rsquo;s a fairly common scenario for strategy work: if the data to make the decision
easy already existed, you probably would have already made a decision rather than needing to
run a structured thinking process.
The next chapter &lt;a href="https://craftingengstrategy.com/refine/"&gt;on refining strategy&lt;/a&gt; covers a number of tools
that are useful for building confidence in low-data environments.&lt;/p&gt;
&lt;h2 id="whisper-the-controversial-parts"&gt;Whisper the controversial parts&lt;/h2&gt;
&lt;p&gt;At one time, the company I worked at rolled out a bar raiser program styled after Amazon&amp;rsquo;s,
where there was an interviewer from outside the team that had to approve every hire.
I spent some time arguing against adding this additional step as I didn&amp;rsquo;t understand
what we were solving for, and I was surprised at how uninterested management was
in knowing whether the new process actually improved outcomes.&lt;/p&gt;
&lt;p&gt;What I didn&amp;rsquo;t realize until much later was that most of the senior leadership distrusted one
of their peers, and had rolled out the bar raiser program solely to create
a mechanism to control that manager&amp;rsquo;s hiring bar when the CTO was uninterested in holding
that leader accountable.
(I also learned that these leaders didn&amp;rsquo;t care much about implementing this policy,
resulting in bar raiser rejections being frequently ignored,
but that&amp;rsquo;s a discussion for the &lt;a href="https://craftingengstrategy.com/operations/"&gt;Operations for strategy chapter&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;This is a good example of a strategy that &lt;em&gt;does&lt;/em&gt; make sense with the full diagnosis,
but makes little sense without it, and where stating part of the diagnosis out loud is
nearly impossible. Even senior leaders are not generally allowed to write a document
that says, &amp;ldquo;The Director of Product Engineering is a bad hiring manager.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When you&amp;rsquo;re writing a strategy, you&amp;rsquo;ll often find yourself trying to choose between
two awkward options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Say something awkward or uncomfortable about your company or someone working within it&lt;/li&gt;
&lt;li&gt;Omit a critical piece of your diagnosis that&amp;rsquo;s necessary to understand the wider thinking&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Whenever you encounter this sort of debate, my advice is to find a way to include the diagnosis,
but to reframe it into a palatable statement that avoids casting blame too narrowly.
I think it&amp;rsquo;s helpful to discuss a few concrete examples of this,
starting with the strategy for &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;navigating private equity&lt;/a&gt;,
whose diagnosis includes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&amp;amp;D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are many things the authors of this strategy likely feel about their state of reality.
First, they are probably upset about the fact that their new private equity ownership is likely
to eliminate colleagues. Second, they are likely upset that there is no clear plan around what
they need to do, so they are stuck preparing for a wide range of potential outcomes.
However they feel, they stick to precise, factual statements.&lt;/p&gt;
&lt;p&gt;For a second example, we can look to the &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber service migration strategy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The team didn&amp;rsquo;t &lt;em&gt;agree&lt;/em&gt; that their headcount should not be growing,
but it was the reality they were operating in. They acknowledged their reality
as a factual statement, without any additional commentary about that statement.&lt;/p&gt;
&lt;p&gt;In both of these examples, they found a professional, non-judgmental way to acknowledge
the circumstances they were addressing. The authors would have preferred that the leaders
behind those decisions take explicit accountability for them, but it would have undermined
the strategy work had they attempted to do it within their strategy writeup.&lt;/p&gt;
&lt;p&gt;Excluding critical parts of your diagnosis makes your strategies particularly
hard to evaluate, copy or recreate.
Find a way to say things politely to make the strategy effective.
As always, strategies are much more about realities than ideals.&lt;/p&gt;
&lt;h2 id="reframe-blockers-as-part-of-diagnosis"&gt;Reframe blockers as part of diagnosis&lt;/h2&gt;
&lt;p&gt;When I work on strategy with early-career leaders,
an idea that comes up a lot is that an identified problem
means that strategy is not possible. For example, they might
argue that doing strategy work is impossible at their current
company because the executive team changes their mind too often.&lt;/p&gt;
&lt;p&gt;That core insight is almost certainly true, but it&amp;rsquo;s much more
powerful to reframe that as a diagnosis: if we don&amp;rsquo;t find a way
to show concrete progress quickly, and use that to excite the executive team,
our strategy is likely to fail.
This transforms the thing preventing your strategy into a condition
your strategy needs to address.&lt;/p&gt;
&lt;p&gt;Whenever you run into a reason why your strategy seems unlikely to work,
or why strategy overall seems difficult, you&amp;rsquo;ve found an important piece
of your diagnosis to include. There are never reasons why strategy simply
cannot succeed, only diagnoses you&amp;rsquo;ve failed to recognize.&lt;/p&gt;
&lt;p&gt;For example, in &lt;a href="https://craftingengstrategy.com/project-resourcing-strategy/"&gt;Calm&amp;rsquo;s approach to resourcing Engineering-driven projects&lt;/a&gt;,
we knew that the company&amp;rsquo;s informal approach to prioritization wasn&amp;rsquo;t going to change.
Even if we convinced our peers in product management to change how &lt;em&gt;they&lt;/em&gt; planned,
we&amp;rsquo;d still be impacted by the executive team&amp;rsquo;s informal planning, which also wasn&amp;rsquo;t going to change.
Rather than preventing us from implementing a strategy,
those dynamics clarified what sort of approach could actually succeed.&lt;/p&gt;
&lt;h2 id="the-role-of-self-awareness"&gt;The role of self-awareness&lt;/h2&gt;
&lt;p&gt;Every problem of today is partially rooted in the decisions of yesterday.
If you&amp;rsquo;ve been with your organization for any duration at all,
this means that &lt;em&gt;you&lt;/em&gt; are directly or indirectly responsible for
a portion of the problems that your diagnosis ought to recognize.&lt;/p&gt;
&lt;p&gt;This means that recognizing the impact of your prior actions in your diagnosis is a powerful demonstration
of self-awareness. It also suggests that your next strategy&amp;rsquo;s success is rooted
in your self-awareness about your prior choices.
Don&amp;rsquo;t be afraid to recognize the failures in your past work.
While changing your mind &lt;em&gt;without&lt;/em&gt; new data is a sign of chaotic leadership,
changing your mind &lt;em&gt;with&lt;/em&gt; new data is a sign of thoughtful leadership.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Because diagnosis is the foundation of effective strategy,
I&amp;rsquo;ve always found it the most intimidating phase of strategy work.
While I think that&amp;rsquo;s a somewhat unavoidable reality, my hope is
that this chapter has somewhat prepared you for that challenge.&lt;/p&gt;
&lt;p&gt;The four most important things to remember are simply:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;form your diagnosis before deciding how to solve it,&lt;/li&gt;
&lt;li&gt;try especially hard to capture perspectives you initially disagree with,&lt;/li&gt;
&lt;li&gt;supplement intuition with data where you can, and&lt;/li&gt;
&lt;li&gt;accept that sometimes you&amp;rsquo;re missing the data you need to fully understand.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The last piece, in particular, is why many good strategies never get shared,
and the topic we&amp;rsquo;ll address in the next chapter on &lt;a href="https://craftingengstrategy.com/refine/"&gt;strategy refinement&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Exploring</title><link>https://craftingengstrategy.com/explore/</link><pubDate>Thu, 13 Feb 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/explore/</guid><description>&lt;p&gt;A surprising number of strategies are doomed from inception
because their authors get attached to one particular approach
without considering alternatives that would work better for
their current circumstances.
This happens when engineers want to pick tools solely because
they are trending, and when executives insist on adopting the
tech stack from their prior organization where they felt comfortable.&lt;/p&gt;
&lt;p&gt;Exploration is the antidote to early anchoring, forcing you to consider
the problem widely &lt;em&gt;before&lt;/em&gt; evaluating any of the paths forward.
Exploration is about updating your priors rather than assuming the industry
hasn&amp;rsquo;t evolved since you last worked on a given problem.
Exploration is continuing to believe that things can get better
when you&amp;rsquo;re not watching.&lt;/p&gt;
&lt;p&gt;This chapter covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The goals of the exploration phase of strategy creation&lt;/li&gt;
&lt;li&gt;When to explore (always first!) and when it makes sense to stop exploring&lt;/li&gt;
&lt;li&gt;How to explore a topic, including discussion of the most common mechanisms:
mining for internal precedent, reading industry papers and books,
and leveraging your external network&lt;/li&gt;
&lt;li&gt;Why avoiding judgment is an essential part of exploration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By the end of this chapter, you&amp;rsquo;ll be able to conduct an exploration
for the current or next strategy that you work on.&lt;/p&gt;
&lt;h2 id="what-is-exploration"&gt;What is exploration?&lt;/h2&gt;
&lt;p&gt;One of the frequent senior leadership anti-patterns I&amp;rsquo;ve encountered in my career
is &lt;a href="https://lethain.com/grand-migration/"&gt;The Grand Migration&lt;/a&gt;, where a new leader declares that
a massive migration to a new technology stack&amp;ndash;typically the stack used by their
former employer&amp;ndash;will solve every pressing problem.
What&amp;rsquo;s distinctive about the Grand Migration is not the initially bad selection,
but the single-minded ferocity with which the senior leader pushes for their approach,
even when it becomes abundantly clear to others that it doesn&amp;rsquo;t solve the problem at hand.&lt;/p&gt;
&lt;p&gt;These senior leaders are very intelligent, but have allowed themselves to be trapped
by their initial thinking from prior experiences. Accepting those early thoughts as the
foundation of their strategy, they build the entire strategy on top of those ideas,
and eventually there is so much weight standing on those early assumptions that it
becomes impossible to acknowledge the errors.&lt;/p&gt;
&lt;p&gt;Exploration is the deliberate practice of searching through a strategy&amp;rsquo;s problem and solution spaces
before allowing yourself to commit to a given approach.
It&amp;rsquo;s understanding how others have approached the same problem recently and in the past.
It&amp;rsquo;s doing this both in trendy companies you admire, and in practical companies that actually resemble yours.&lt;/p&gt;
&lt;p&gt;Most exploration will be external to your team, but depending on your company,
much of your exploration might be internal to the company.
If you&amp;rsquo;re in a massive engineering organization of 100,000 engineers, there are likely existing internal solutions
to your problem that you&amp;rsquo;ve never heard of.
Conversely, if you&amp;rsquo;re in an organization of 50 engineers, it&amp;rsquo;s likely that much of your exploration will be external.&lt;/p&gt;
&lt;h2 id="when-to-explore"&gt;When to explore&lt;/h2&gt;
&lt;p&gt;Exploration is the first step of good strategy work.
Even when you want to skip it, you will always regret skipping it,
because you&amp;rsquo;ll inadvertently frame yourself into whatever approach you
focus on first.
Especially when it comes to problems that you&amp;rsquo;ve solved
previously, exploration is the only thing preventing you
from over-indexing on your prior experiences.&lt;/p&gt;
&lt;p&gt;Try to continue exploration until you know how three similar teams within your company and
three similar companies have recently solved the same problem.
Further, make sure you are able to explain the thinking behind those decisions.
At that point, you should be ready to stop exploring
and move on to the &lt;a href="https://craftingengstrategy.com/diagnosis/"&gt;diagnosis step&lt;/a&gt; of strategy creation.&lt;/p&gt;
&lt;p&gt;Exploration should always come with a minimum and maximum timeframe:
less than a few hours is very suspicious, and more than a week is
questionable as well.&lt;/p&gt;
&lt;h2 id="how-to-explore"&gt;How to explore&lt;/h2&gt;
&lt;p&gt;While the details of each exploration will differ a bit,
the overarching approach tends to be pretty similar across strategies.
After I open up the draft strategy document I&amp;rsquo;m working on,
my general approach to exploration is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start throwing in every resource I can think of related to that problem.&lt;/p&gt;
&lt;p&gt;For example, in the &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber service migration strategy&lt;/a&gt;,
I started by collecting recent papers on Mesos, Kubernetes, and Aurora to understand the
state of the industry on orchestration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Do some web searching, foundation model prompting, and checking with a few current and prior colleagues
about what topics and resources I might be missing.&lt;/p&gt;
&lt;p&gt;For example, for the &lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm engineering strategy&lt;/a&gt;,
I focused on talking with industry peers on tools they&amp;rsquo;d used to focus
a team with diffuse goals.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Summarize the list of resources I&amp;rsquo;ve gathered, organizing them by which I want to explore,
and which I won&amp;rsquo;t spend time on but are worth mentioning.&lt;/p&gt;
&lt;p&gt;For example, the &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;Large Language Model adoption strategy&lt;/a&gt;&amp;rsquo;s exploration
section documents the variety of resources the team explored before completing it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Work through the list one by one, continuing to collect notes in the strategy document.
When you&amp;rsquo;re done, synthesize those into a concise, readable summary of what you&amp;rsquo;ve learned.&lt;/p&gt;
&lt;p&gt;For example, the &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;monolith decomposition strategy&lt;/a&gt;
synthesizes the exploration of a broad topic into four paragraphs, with links out
to references.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop once I generally understand how a handful of similar internal and external teams
have recently approached this problem.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of all the steps in strategy creation, exploration is the most open-ended,
and you may find a different approach works better for you. If you&amp;rsquo;re not sure
what to do, try following the above steps closely.
If you have a different approach that you&amp;rsquo;re confident in&amp;ndash;as long as it&amp;rsquo;s not skipping exploration!&amp;ndash;then
go ahead and try that instead.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;While not discussed in this chapter, you can also use some techniques like &lt;a href="wardley-mapping/"&gt;Wardley mapping&lt;/a&gt;,
covered in the &lt;a href="https://craftingengstrategy.com/refine/"&gt;Refinement chapter&lt;/a&gt;, to support your exploration phase.
Wardley mapping is a strategy tool designed within a different strategy tradition,
and consequently categorizing it as either solely an exploration tool or a refinement tool
ignores some of its potential uses.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s no perfect way to do strategy: take what works for you and use it.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="mine-internal-precedent"&gt;Mine internal precedent&lt;/h2&gt;
&lt;p&gt;One of the most powerful forms of strategy is simply documenting
how similar decisions have been made internally: often this is enough
to steer how similar future decisions are made within your organization.
This approach, documented in &lt;em&gt;Staff Engineer&lt;/em&gt;&amp;rsquo;s &lt;a href="https://staffeng.com/guides/engineering-strategy/"&gt;Write five, then synthesize&lt;/a&gt;,
is also the most valuable step of exploration for those working in established companies.&lt;/p&gt;
&lt;p&gt;If you are a tenured engineer within your organization,
then it&amp;rsquo;s somewhat safe to assume that you are aware of the
typical internal approaches. Even then, it&amp;rsquo;s worth poking around
to see if there are any related skunkworks projects happening internally.
If you&amp;rsquo;ve joined the organization recently, or are distant from the codebase itself,
investigating what already exists is even more important.&lt;/p&gt;
&lt;p&gt;Sometimes the internal approach isn&amp;rsquo;t ideal, but it&amp;rsquo;s still preferable because it&amp;rsquo;s
already been implemented and someone else is maintaining it.
In the long run, your strategy can ride along as someone else addresses the issues
that aren&amp;rsquo;t a perfect fit.&lt;/p&gt;
&lt;h2 id="using-your-network"&gt;Using your network&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;How should we control access to user data&lt;/a&gt;&amp;rsquo;s
exploration section begins with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Our experience is that best practices around managing internal access to user data
are widely available through our networks,
and otherwise hard to find. The exact rationale for this is hard to determine&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While many topics have significant public writing,
my experience is that for many others there&amp;rsquo;s very little
you can learn without talking directly to practitioners.
This is especially true for security, compliance, operating at truly large scale,
and competitive processes like optimizing advertising spend.&lt;/p&gt;
&lt;p&gt;Further, it&amp;rsquo;s surprisingly common to find that how people publicly describe
solving a problem and how they actually approach the problem are largely divorced.&lt;/p&gt;
&lt;p&gt;This is why having a broad personal network is exceptionally powerful,
and makes it possible to quickly understand the breadth of possible solutions.
It also provides access to the practical downsides of various approaches,
which public proponents often omit.&lt;/p&gt;
&lt;p&gt;In a recent strategy session, a proposal came up that seemed off to me,
and I was able to text industry peers and get their answers before the meeting ended.
Those answers invalidated the room&amp;rsquo;s assumptions about what was
and was not possible.
A disagreement that might have taken weeks to resolve was instead resolved in
a few minutes, and we were able to figure out next steps in that meeting rather
than realizing our mistake later and waiting a week for the next meeting.&lt;/p&gt;
&lt;p&gt;Of course, it&amp;rsquo;s &lt;em&gt;also&lt;/em&gt; important to hold information from your network with skepticism.
I&amp;rsquo;ve certainly had my network be wrong, and your network never knows how your current
circumstances differ from theirs, so blindly accepting guidance from your network
is never the right decision either.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;If you&amp;rsquo;re looking for more detailed coverage of
building your network, this topic has also come up in &lt;em&gt;Staff Engineer&lt;/em&gt;&amp;rsquo;s
chapter on &lt;a href="https://staffeng.com/guides/network-of-peers/"&gt;Build a network of peers&lt;/a&gt;,
and &lt;em&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/em&gt;&amp;rsquo;s chapter on
&lt;a href="https://lethain.com/building-exec-network/"&gt;Building your executive network&lt;/a&gt;.
It feels silly to cover the same topic a third time,
but it&amp;rsquo;s a foundational technique for effective decision making.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="read-widely-read-narrowly"&gt;Read widely; read narrowly&lt;/h2&gt;
&lt;p&gt;Reading has always been an important part of my strategy work.
There are two distinct motions to this approach: read widely on an ongoing basis to broaden your thinking,
and read narrowly on the specific topic you&amp;rsquo;re working on.&lt;/p&gt;
&lt;p&gt;Starting with reading widely, I make an effort each year to read ten to twenty industry-relevant works.
These are not necessarily new releases, but are new releases &lt;em&gt;for me&lt;/em&gt;.
Importantly, I try to read things that I don&amp;rsquo;t know much about or that I initially disagree with.
Some of my recent reads were
&lt;em&gt;&lt;a href="https://www.amazon.com/Chip-War-Worlds-Critical-Technology/dp/1982172002"&gt;Chip War&lt;/a&gt;&lt;/em&gt;,
&lt;em&gt;&lt;a href="https://www.amazon.com/Building-Green-Software-Sustainable-Development/dp/1098150627"&gt;Building Green Software&lt;/a&gt;&lt;/em&gt;,
&lt;em&gt;&lt;a href="https://learning.oreilly.com/library/view/tidy-first/9781098151232/"&gt;Tidy First?&lt;/a&gt;&lt;/em&gt;, and
&lt;em&gt;&lt;a href="https://www.amazon.com/How-Big-Things-Get-Done-ebook/dp/B0B3HS4C98/"&gt;How Big Things Get Done&lt;/a&gt;&lt;/em&gt;.
From each of these books, I learned something, and over time they&amp;rsquo;ve built a series of bookmarks
in my head about ideas that might apply to new problems.&lt;/p&gt;
&lt;p&gt;On the other end of things is reading narrowly.
When I recently started working on an AI agents strategy,
the first thing I did was read through Chip Huyen&amp;rsquo;s &lt;em&gt;&lt;a href="https://www.amazon.com/AI-Engineering-Building-Applications-Foundation/dp/1098166302"&gt;AI Engineering&lt;/a&gt;&lt;/em&gt;,
which was an exceptionally helpful survey.
Similarly, when we started thinking about &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration&lt;/a&gt;,
we read a number of industry papers, including
&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf"&gt;Large-scale cluster management at Google with Borg&lt;/a&gt;
and
&lt;a href="https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf"&gt;Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;None of these readings had all the answers to the problems I was working on,
but they did an excellent job at helping me understand the range of options,
as well as identifying other references to consult in my exploration.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll mention two nuances that will help a lot here.
First, I highly encourage getting comfortable with skimming books.
Even tightly edited books will have a lot of content that isn&amp;rsquo;t
particularly relevant to your current goals, and you should skip
that content liberally.
Second, what you read doesn&amp;rsquo;t have to be books.
It can just as well be blog posts, essays, or interview transcripts.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;In this context, &amp;ldquo;reading&amp;rdquo; doesn&amp;rsquo;t even have to actually be reading.
There are conference talks that contain just as much insight as a blog
post, and conferences that cover as much breadth as a book.
There are also conference talks without a written
equivalent, such as Dan Na&amp;rsquo;s excellent &lt;a href="https://blog.danielna.com/talks/pushing-through-friction"&gt;Pushing Through Friction&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="each-job-is-an-education"&gt;Each job is an education&lt;/h2&gt;
&lt;p&gt;Experience is frequently disregarded in the technology industry,
and it can be misused by too liberally copying solutions
that worked in different circumstances. Even so,
the most effective, if slowest, mechanism for exploring
is continuing to work in the details of meaningful problems.&lt;/p&gt;
&lt;p&gt;You probably won&amp;rsquo;t &lt;a href="https://lethain.com/forty-year-career/"&gt;choose every job to optimize for learning&lt;/a&gt;,
but each job still lets you explore more complex problems over time&amp;ndash;recognizing that some
of your prior knowledge will have gone stale along the way&amp;ndash;which is uniquely valuable.&lt;/p&gt;
&lt;h2 id="save-judgment-for-later"&gt;Save judgment for later&lt;/h2&gt;
&lt;p&gt;As I&amp;rsquo;ve mentioned several times, the point of exploration is to go broad
with the goal of understanding approaches you might not have considered,
and invalidating things you initially think are true.
Both of those things are only possible if you save judgment for later:
if you&amp;rsquo;re passing judgment about whether approaches are &amp;ldquo;good&amp;rdquo; or &amp;ldquo;bad&amp;rdquo;,
then your exploration is probably going astray.&lt;/p&gt;
&lt;p&gt;As a soft rule, I&amp;rsquo;d argue that if no one involved in a strategy
has changed their mind about something they believed when you
started the exploration step, then you&amp;rsquo;re not done exploring.
This is &lt;em&gt;especially&lt;/em&gt; true when it comes to strategy work by
senior leaders. Their beliefs are often well-justified by
years of experience, but it&amp;rsquo;s unclear which parts of their
experience have become stale over time.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;At this point, I hope you feel comfortable exploring as the first step
of your strategy work, and understand the likely consequences of skipping
this step.
It&amp;rsquo;s not an overstatement to say that every one of the worst strategic failures I&amp;rsquo;ve encountered
would have been prevented by its primary author taking a few days to explore the space before
anchoring on a particular approach.&lt;/p&gt;
&lt;p&gt;A few days of feeling slow are always worth it to avoid years of misguided efforts.&lt;/p&gt;</description></item><item><title>How should we control access to user data?</title><link>https://craftingengstrategy.com/user-data-strategy/</link><pubDate>Fri, 07 Feb 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/user-data-strategy/</guid><description>&lt;p&gt;At some point in its lifecycle, a startup decides that it
needs to be ready to go public in 18 months, and a flurry of IPO-readiness
activity kicks off.
This strategy focuses on a company working on IPO readiness,
which has identified a gap in internal controls for managing user data access.
It&amp;rsquo;s a company that &lt;em&gt;wants&lt;/em&gt; to meaningfully
improve its security posture around user data access, but which has
had a number of failed security initiatives over the years.&lt;/p&gt;
&lt;p&gt;Most of those initiatives have failed because they significantly
degraded internal workflows for teams like customer support,
such that the initial progress was reverted and subverted over time,
to little long-term effect.
This strategy represents the Chief Information Security Officer&amp;rsquo;s (CISO) attempt to acknowledge and overcome those historical
challenges while meeting their IPO readiness obligations, and&amp;ndash;most importantly&amp;ndash;doing
right by their users.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;, then &lt;em&gt;Diagnose&lt;/em&gt; and so on.
Relative to the default structure, this document has been refactored in two ways
to improve readability:
first, &lt;em&gt;Operations&lt;/em&gt; has been folded into &lt;em&gt;Policy&lt;/em&gt;;
second, &lt;em&gt;Refine&lt;/em&gt; has been embedded in &lt;em&gt;Diagnose&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operations"&gt;Policy &amp;amp; Operations&lt;/h2&gt;
&lt;p&gt;Our new policies, and the mechanisms to operate them, are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Controls for accessing user data must be significantly stronger prior to our IPO.&lt;/strong&gt;
Senior leadership, legal, compliance, and security have decided that we are not comfortable accepting the status quo
of our user data access controls as a public company.
We must meaningfully improve the quality of resource-level access controls
(e.g. how we determine which rows, rather than tables, a user has permission to access)
as part of our pre-IPO readiness efforts.&lt;/p&gt;
&lt;p&gt;Our Security team is accountable for the exact mechanisms and approach to addressing this risk.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We will continue to prioritize a hybrid solution to resource-level access controls.&lt;/strong&gt;
This has been our approach thus far, and the fastest available option.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Directly expose the log of our resource-level accesses to our users.&lt;/strong&gt;
We will build towards a user-accessible log of all company accesses of user data,
and ensure we are comfortable explaining each and every access.
This also means that each rationale for access must be comprehensible and reasonable
from a user perspective.&lt;/p&gt;
&lt;p&gt;This is important because it aligns our approach with our users&amp;rsquo; perspectives.
They will be able to evaluate how we access their data, and make decisions about
continuing to use our product based on whether they agree with our use.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Good security discussions don&amp;rsquo;t frame decisions as a compromise between security and usability.&lt;/strong&gt;
We will pursue &lt;a href="https://lethain.com/multi-dimensional-tradeoffs/"&gt;multi-dimensional tradeoffs&lt;/a&gt; to simultaneously improve security and efficiency.
Whenever we frame a discussion as a tradeoff between security and usability,
it&amp;rsquo;s a sign that we are having the wrong discussion, and that we should rethink our approach.&lt;/p&gt;
&lt;p&gt;We will prioritize mechanisms that can both automatically authorize &lt;em&gt;and&lt;/em&gt; document the rationale
for accesses to customer data. The most obvious example
of this is automatically granting access to a customer support agent for users who have an open support ticket
assigned to that agent. (And removing that access when that ticket is reassigned or resolved.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Measure progress on percentage of customer data access requests justified by a user-comprehensible, automated rationale.&lt;/strong&gt;
This will anchor our approach on simultaneously improving the security of user data and the usability of our colleagues&amp;rsquo; internal tools.
If we only expand requirements for accessing customer data, we won&amp;rsquo;t view this as progress because it&amp;rsquo;s
not automated (and consequently is likely to encourage workarounds as teams try to solve problems quickly).
Similarly, if we only improve usability, charts won&amp;rsquo;t represent this as progress, because we won&amp;rsquo;t
have increased the number of supported requests.&lt;/p&gt;
&lt;p&gt;As part of this effort, we will create a private channel where the security and compliance
team has visibility into all manual rationales for user data access, and will
notify the manager of anyone who repeatedly uses a manual justification
for accessing user data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expire unused roles to move towards the principle of least privilege.&lt;/strong&gt;
Today we have a number of roles granted in our role-based access control (RBAC) system
to users who do not use the granted permissions.
To address that issue, we will automatically remove roles from colleagues after 90 days of not using the role&amp;rsquo;s permissions.&lt;/p&gt;
&lt;p&gt;Engineers in an active on-call rotation are the exception to this automated permission pruning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Weekly reviews until we see progress; monthly access reviews in perpetuity.&lt;/strong&gt;
Starting now, there will be a weekly sync between the security engineering team,
teams working on customer data access initiatives, and the CISO. This meeting will
focus on rapid iteration and problem solving.&lt;/p&gt;
&lt;p&gt;This is explicitly a forum for ongoing &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;,
with the CISO serving as the meeting&amp;rsquo;s sponsor, and their Principal Security Engineer serving as the
meeting&amp;rsquo;s guide.
It will continue until we have clarity
on the path to 100% coverage of user-comprehensible, automated rationales for access to customer data.&lt;/p&gt;
&lt;p&gt;Separately, we are also starting a monthly review of sampled accesses to customer data to ensure
the proper usage and function of the rationale-creation mechanisms we build.
This meeting&amp;rsquo;s goal is to review access rationales for quality and appropriateness,
both by reviewing sampled rationales in the short term, and by developing more automated mechanisms
for flagging high-risk accesses to review in the future.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Exceptions must be granted in writing by the CISO.&lt;/strong&gt;
While our overarching Engineering Strategy states that we follow
an advisory architecture process as described in &lt;em&gt;&lt;a href="https://www.amazon.com/Facilitating-Software-Architecture-Empowering-Architectural-ebook/dp/B0DMHGWCPN/"&gt;Facilitating Software Architecture&lt;/a&gt;&lt;/em&gt;,
the customer data access policy is an exception and must be explicitly approved, with documentation,
by the CISO. Start that process in the &lt;code&gt;#ciso&lt;/code&gt; channel.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
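&lt;p&gt;The role-expiry policy can be sketched as a small pruning job. This is a hypothetical illustration rather than a description of any real system: the &lt;code&gt;RoleGrant&lt;/code&gt; record, its &lt;code&gt;granted_at&lt;/code&gt; and &lt;code&gt;last_used&lt;/code&gt; fields, and the set of on-call engineers are assumed stand-ins for whatever the RBAC system and on-call scheduler actually expose. A role that has never been used is treated as idle since the day it was granted.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set

# Policy threshold: roles unused for this long are automatically removed.
UNUSED_ROLE_TTL = timedelta(days=90)


@dataclass(frozen=True)
class RoleGrant:
    """Hypothetical record of one RBAC role granted to one person."""
    user: str
    role: str
    granted_at: datetime
    last_used: Optional[datetime] = None  # None if the role was never exercised


def roles_to_expire(
    grants: Iterable[RoleGrant], on_call: Set[str], now: datetime
) -> List[RoleGrant]:
    """Return the grants to revoke under the 90-day unused-role rule."""
    expired = []
    for grant in grants:
        if grant.user in on_call:
            continue  # engineers in an active on-call rotation are exempt
        idle_since = grant.last_used or grant.granted_at
        if now - idle_since >= UNUSED_ROLE_TTL:
            expired.append(grant)
    return expired


# Usage: only bob loses a role; carol is idle too, but she is on call.
now = datetime(2025, 2, 7)
grants = [
    RoleGrant("alice", "support-agent", now - timedelta(days=400), now - timedelta(days=3)),
    RoleGrant("bob", "billing-admin", now - timedelta(days=200)),
    RoleGrant("carol", "db-admin", now - timedelta(days=400), now - timedelta(days=300)),
]
for grant in roles_to_expire(grants, on_call={"carol"}, now=now):
    print(grant.user, grant.role)  # prints: bob billing-admin
```

&lt;p&gt;Keying the on-call exemption on the person rather than the role keeps the rule simple; a stricter variant might exempt only the roles that an on-call rotation actually requires.&lt;/p&gt;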
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We have a strong baseline of role-based access controls (RBAC) and audit logging.
However, we have limited mechanisms for ensuring assigned roles follow
the &lt;a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege"&gt;principle of least privilege&lt;/a&gt;.
This is particularly true in cases where individuals change teams or roles over the course
of their tenure at the company: some individuals have collected numerous unused roles
over five-plus years at the company.&lt;/p&gt;
&lt;p&gt;Similarly, our audit logs are durable and pervasive, but we have limited proactive mechanisms
for identifying anomalous usage. Instead, they are typically used to understand what occurred after
an incident is identified by other mechanisms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For resource-level access controls, we rely on a hybrid approach between a 3rd-party platform
for incoming user requests, and approval mechanisms within our own product.
Providing a rationale for access across these two systems requires manual work,
and those rationales are later manually reviewed for appropriateness in a batch fashion.&lt;/p&gt;
&lt;p&gt;There are two major ongoing problems with our current approach to resource-level access controls.
First, the teams making requests view them as a burdensome obligation without much benefit to
themselves or to the user.
Second, because the rationale review steps are manual, there is no verifiable evidence of the quality
of the review.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve found no evidence of misuse of user data.
When colleagues do access user data, we have uniformly and consistently found that there
is a clear and reasonable rationale for that access. For example, a ticket in the user
support system where the user has raised an issue.&lt;/p&gt;
&lt;p&gt;However, the quality of our documented rationales is consistently low because it depends on
busy people manually copying over significant information many times a day.
Because the rationales are of low quality, the verification of these rationales is somewhat arbitrary.
From a literal compliance perspective, we do provide rationales and auditing of these rationales, but
it&amp;rsquo;s unclear if the majority of these audits increase the security of our users&amp;rsquo; data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Historically, we&amp;rsquo;ve made significant security investments that caused temporary spikes in our security posture.
However, looking at those initiatives a year later, in many cases we see a pattern of increased scrutiny,
followed by a gradual repeal or avoidance of the new mechanisms.&lt;/p&gt;
&lt;p&gt;We have found that most of them involved increased friction for
essential work performed by other internal teams. In the natural order of performing work, those teams
would subtly subvert the improvements because they interfered with their immediate goals
(e.g. supporting customer requests).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As such, we have high conviction from our track record that our historical approach can
create optical wins internally. We have limited conviction that it can create long-term
improvements outside of significant, unlikely internal changes (e.g. colleagues are markedly
less busy a year from now than they are today).
It seems likely we need a new approach to meaningfully shift our stance on these kinds of problems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Our experience is that best practices around managing internal access to user data
are &lt;a href="https://craftingengstrategy.com/explore/"&gt;widely available through our networks&lt;/a&gt;,
and otherwise hard to find. The exact rationale for this is hard to determine,
but it seems possible that it&amp;rsquo;s a topic that folks are generally uncomfortable
discussing in public on account of potential future liability and compliance
issues.&lt;/p&gt;
&lt;p&gt;In our exploration, we found two standardized dimensions (role-based access controls, audit logs),
and one highly divergent dimension (resource-specific access controls):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Role-based access controls&lt;/strong&gt; (RBAC) are a highly standardized approach at this point.
The core premise is that users are mapped to one or more roles, and each role
is granted a certain set of permissions. For example, a role representing the customer support agent
might be granted permission to deactivate an account, whereas a role representing the sales engineer might be able
to configure a new account.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Audit logs&lt;/strong&gt; are similarly standardized. All access and mutation of resources should be
tied in a durable log to the human who performed the action. These logs should be
accumulated in a centralized, queryable solution.&lt;/p&gt;
&lt;p&gt;One of the core challenges is determining how to utilize these logs proactively
to detect issues rather than reactively when an issue has already been flagged.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resource-level access controls&lt;/strong&gt; are significantly less standardized than RBAC
or audit logs. We found three distinct patterns adopted by companies, with
little consistency across companies on which is adopted.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
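&lt;p&gt;To make the two standardized dimensions concrete, the sketch below pairs a role-to-permission check with an audit log entry for every attempt. It is illustrative only: the role names and permission strings are hypothetical, borrowed from the customer support and sales engineer examples above, and a real audit log would live in a durable, centralized store rather than an in-memory list.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Set

# The core RBAC premise: each role is granted a set of permissions.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "customer_support_agent": {"account:deactivate"},
    "sales_engineer": {"account:configure"},
}


@dataclass(frozen=True)
class AuditEvent:
    actor: str  # the human who attempted the action
    action: str
    resource: str
    allowed: bool
    at: datetime


class AccessController:
    def __init__(self, user_roles: Dict[str, Set[str]]):
        self.user_roles = user_roles  # users map to one or more roles
        # In-memory stand-in for a durable, centralized, queryable log.
        self.audit_log: List[AuditEvent] = []

    def permissions_for(self, user: str) -> Set[str]:
        perms: Set[str] = set()
        for role in self.user_roles.get(user, set()):
            perms |= ROLE_PERMISSIONS.get(role, set())
        return perms

    def check(self, user: str, action: str, resource: str) -> bool:
        allowed = action in self.permissions_for(user)
        # Record denials as well as grants so the log can be queried later.
        self.audit_log.append(
            AuditEvent(user, action, resource, allowed, datetime.now(timezone.utc))
        )
        return allowed


ac = AccessController({"dana": {"customer_support_agent"}, "eli": {"sales_engineer"}})
print(ac.check("dana", "account:deactivate", "acct_123"))  # True
print(ac.check("eli", "account:deactivate", "acct_123"))   # False
# A simple proactive query: who was denied, and for what?
print([(e.actor, e.action) for e in ac.audit_log if not e.allowed])
```

&lt;p&gt;Logging denied attempts alongside successful ones is one small step toward using audit logs proactively, since unusual denial patterns can be queried directly rather than reconstructed after an incident. Resource-level access controls, covered next, would add a per-object check on top of this role check.&lt;/p&gt;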
&lt;p&gt;Those three patterns for resource-level access control were:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;3rd-party enrichment&lt;/strong&gt; where access to resources is managed in a 3rd-party system
such as Zendesk.
This requires enriching objects within those systems with data and metadata from
the product(s) where those objects live.
It also requires implementing actions on the platform, such as archiving or configuration,
allowing them to live entirely in that platform&amp;rsquo;s permission structure.&lt;/p&gt;
&lt;p&gt;The downside of this approach is tight coupling with the platform vendor,
any limitations inherent to that platform, and the overhead of maintaining
engineering teams familiar with both your internal technology stack and the
platform vendor&amp;rsquo;s technology stack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1st-party tool implementation&lt;/strong&gt; where all activity, including creation and management of user issues,
is managed within the core product itself. This pattern is most common in earlier stage companies
or companies whose customer support leadership &amp;ldquo;grew up&amp;rdquo; within the organization without much
exposure to the approach taken by peer companies.&lt;/p&gt;
&lt;p&gt;The advantage of this approach is that there is a single, tightly integrated
and infinitely extensible platform for managing interactions.
The downside is that you have to build and maintain all of that work internally
rather than pushing it to a vendor that ought to be able to invest more heavily
into their tooling.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hybrid solutions&lt;/strong&gt; where a 3rd-party platform is used for most actions,
and is also used to permit resource-level access within the 1st-party system.
For example, you might be able to access a user&amp;rsquo;s data only while there is an open
ticket created by that user, and assigned to you, in the 3rd-party platform.&lt;/p&gt;
&lt;p&gt;The advantage of this approach is that it allows supporting complex workflows
that don&amp;rsquo;t fit within the platform&amp;rsquo;s limitations, and allows you to avoid complex coupling
between your product and the vendor platform.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
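&lt;p&gt;The hybrid pattern pairs naturally with the policy of measuring the percentage of accesses justified by an automated, user-comprehensible rationale. The following is a hypothetical sketch, not a vendor integration: &lt;code&gt;Ticket&lt;/code&gt;, &lt;code&gt;authorize&lt;/code&gt;, and &lt;code&gt;automated_coverage&lt;/code&gt; are invented names, and the ticket list stands in for whatever the 3rd-party support platform returns. Access is granted automatically only while the user has an open ticket assigned to the requesting agent; anything else requires a manual justification, which counts against coverage.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical view of a ticket in the 3rd-party support platform."""
    ticket_id: str
    user_id: str   # the user who opened the ticket
    assignee: str  # the agent currently assigned
    is_open: bool


@dataclass(frozen=True)
class AccessDecision:
    allowed: bool
    rationale: str
    automated: bool  # True if the rationale was generated, not typed by hand


def authorize(
    agent: str, user_id: str, tickets: List[Ticket], manual_reason: Optional[str] = None
) -> AccessDecision:
    """Grant access while an open ticket from user_id is assigned to agent."""
    for t in tickets:
        if t.is_open and t.user_id == user_id and t.assignee == agent:
            return AccessDecision(
                True, f"Your support ticket {t.ticket_id} is assigned to {agent}", True
            )
    if manual_reason:
        # Allowed, but manual rationales are what the security team reviews.
        return AccessDecision(True, manual_reason, False)
    return AccessDecision(False, "No open ticket and no justification given", False)


def automated_coverage(decisions: List[AccessDecision]) -> float:
    """Percentage of granted accesses backed by an automated rationale."""
    granted = [d for d in decisions if d.allowed]
    if not granted:
        return 0.0
    return 100.0 * sum(d.automated for d in granted) / len(granted)


tickets = [Ticket("T-1", "user_42", "dana", True), Ticket("T-2", "user_7", "dana", False)]
decisions = [
    authorize("dana", "user_42", tickets),                         # automated grant
    authorize("dana", "user_7", tickets, "Chargeback follow-up"),  # manual grant
    authorize("eli", "user_42", tickets),                          # denied
]
print(automated_coverage(decisions))  # 50.0
```

&lt;p&gt;Because access is re-derived from the ticket on every request, reassigning or resolving the ticket removes access without a separate revocation step, and each grant carries a rationale that could be shown directly in a user-facing access log.&lt;/p&gt;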
&lt;p&gt;Generally, our experience is that all companies implement RBAC, audit logs, and one of the resource-level access control mechanisms.
Most companies pursue either 3rd-party enrichment with a sizable, long-standing team owning the platform implementation,
or rely on a hybrid solution where they are able to avoid a long-standing dedicated team by lumping that work into existing teams.&lt;/p&gt;</description></item><item><title>Is engineering strategy useful?</title><link>https://craftingengstrategy.com/is-useful/</link><pubDate>Thu, 30 Jan 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/is-useful/</guid><description>&lt;p&gt;While I frequently hear engineers bemoan a missing strategy,
their complaints rarely articulate why the missing strategy matters.
Instead, it serves as more of a truism: the economy used to be better,
children used to respect their parents,
and engineering organizations used to have an engineering strategy.&lt;/p&gt;
&lt;p&gt;This chapter starts by exploring something I believe quite strongly:
there&amp;rsquo;s &lt;em&gt;always&lt;/em&gt; an engineering strategy, even if there&amp;rsquo;s nothing written down.
From there, we&amp;rsquo;ll discuss why strategy, especially written strategy, is such
a valuable opportunity for organizations that take it seriously.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll dig into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why there&amp;rsquo;s always a strategy, even when people say there isn&amp;rsquo;t&lt;/li&gt;
&lt;li&gt;How strategies have been impactful across my career&lt;/li&gt;
&lt;li&gt;How inappropriate strategies create significant organizational pain without much compensating impact&lt;/li&gt;
&lt;li&gt;How written strategy drives organizational learning&lt;/li&gt;
&lt;li&gt;The costs of not writing strategy down&lt;/li&gt;
&lt;li&gt;How strategy supports personal learning and development,
even in cases where you&amp;rsquo;re not empowered to &amp;ldquo;do strategy&amp;rdquo; yourself&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By this chapter&amp;rsquo;s end, hopefully you will agree with me
that strategy is an undertaking worth investing your&amp;ndash;and your organization&amp;rsquo;s&amp;ndash;time in.&lt;/p&gt;
&lt;h2 id="theres-always-a-strategy"&gt;There&amp;rsquo;s always a strategy&lt;/h2&gt;
&lt;p&gt;Everywhere I&amp;rsquo;ve worked, people have claimed there was no strategy.
In many of those companies, they&amp;rsquo;d say there was no engineering strategy.
Once I became an executive and was able to document and distribute an engineering strategy,
accusations of missing strategy didn&amp;rsquo;t go away; they just shifted to focus on a missing
product or company strategy.&lt;/p&gt;
&lt;p&gt;This even happened at companies that definitively had engineering strategies, like Stripe
in 2016, which had numerous pillars to a clear engineering strategy, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintain backwards API compatibility, at almost any cost
(e.g. force an upgrade to TLS 1.2 to retain PCI compliance,
but don&amp;rsquo;t force upgrades from the &lt;a href="https://docs.stripe.com/api/charges/create"&gt;/v1/charges&lt;/a&gt; endpoint
to the &lt;a href="https://docs.stripe.com/api/payment_intents"&gt;/v1/payment_intents&lt;/a&gt; endpoint)&lt;/li&gt;
&lt;li&gt;Work in Ruby within a monorepo, unless it&amp;rsquo;s the PCI environment, data processing, or data science work&lt;/li&gt;
&lt;li&gt;Engineers are fully responsible for the usability of their work, even when there are
product or engineering managers involved&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Working there, it was generally clear what the company&amp;rsquo;s engineering strategy was on any given topic.
That said, it sometimes required asking around, and over time certain decisions became sufficiently
contentious that it became hard to definitively answer what the strategy was.
For example, the adoption of Ruby versus Java became contentious enough that I
distributed a strategy attempting to mediate the disagreement, &lt;a href="https://lethain.com/magnitudes-of-exploration/"&gt;Magnitudes of exploration&lt;/a&gt;,
although it wasn&amp;rsquo;t a particularly successful effort
(for reasons that are obvious in hindsight, particularly the lack of any enforcement mechanism).&lt;/p&gt;
&lt;p&gt;In the same sense that William Gibson said
&amp;ldquo;The future is already here – it’s just not very evenly distributed,&amp;rdquo;
there is always a strategy embedded into an organization&amp;rsquo;s decisions,
although in many organizations that strategy is only visible to a small
group, and may be quickly forgotten.&lt;/p&gt;
&lt;p&gt;If you ever find yourself thinking that a strategy doesn&amp;rsquo;t exist,
I&amp;rsquo;d encourage you to instead ask yourself where it lives.
Once you do find it, you may also find that the strategy is quite ineffective,
but I&amp;rsquo;ve simply never found that it doesn&amp;rsquo;t exist.&lt;/p&gt;
&lt;h2 id="strategy-is-impactful"&gt;Strategy &lt;em&gt;is&lt;/em&gt; impactful&lt;/h2&gt;
&lt;p&gt;In &lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;&amp;ldquo;We are a product engineering company!&amp;rdquo;&lt;/a&gt;, we discuss Calm&amp;rsquo;s engineering strategy
to address pervasive friction within the engineering team. The core of that strategy is clarifying how Calm
makes major technology decisions, along with documenting the motivating goal steering those decisions:
maximizing time and energy spent on creating their product.&lt;/p&gt;
&lt;p&gt;That strategy reduced friction by eliminating the cause of ongoing debate. It was successful in resetting
the team&amp;rsquo;s focus. It also caused several engineers to leave the company, because it was
incompatible with their priorities.
It&amp;rsquo;s easy to view that as a downside, but I don&amp;rsquo;t think it was.
A clear, documented strategy showed everyone involved what sort of game we were playing
and the rules for that game, and for the first time let them accurately decide if they wanted to be
part of that game with the wider team.&lt;/p&gt;
&lt;p&gt;Creating alignment is one of the ways that strategy makes an impact, but
it&amp;rsquo;s certainly not the only way. Some of the ways that strategies
support the organization are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Concentrating company investment into a smaller space.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example, &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;deciding not to decompose a monolith&lt;/a&gt;
allows you to invest the majority of your tooling efforts on one language,
one test suite, and one deployment mechanism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unlocking properties that are only available through universal adoption.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example, moving to an &lt;a href="https://craftingengstrategy.com/private-equity-model/"&gt;&amp;ldquo;N-1 policy&amp;rdquo; on backfilled roles&lt;/a&gt;
is a significant opportunity for managing costs, but only works if consistently adopted.
As another example, many strategies for disaster recovery or multi-region deployment are only viable
if all infrastructure has a common configuration mechanism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Focusing execution on what truly matters.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example, &lt;a href="https://craftingengstrategy.com/stripe-sorbet-strategy/"&gt;Stripe&amp;rsquo;s Sorbet strategy&lt;/a&gt; allowed a team of ten engineers
to incrementally move the company&amp;rsquo;s Ruby monolith towards static typing, without distracting
the larger organization with the push.
This was a difficult project that could have consumed the entire organization for many months, but
focus allowed a small team to accomplish the majority of early work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Creating a knowledge repository of how your organization thinks.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Onboarding new hires, particularly senior new hires, is much more effective with documented strategy.&lt;/p&gt;
&lt;p&gt;For example, the &lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;strategy for accessing user data&lt;/a&gt; requires that all
access to user data is supported by a clear, user-understandable rationale for that access.
While this might be obvious to new hires from larger companies,
folks with only small company experience are likely to be completely unaware this is necessary.
If it isn&amp;rsquo;t documented, compliance with the policy will quickly decline.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are some things that a strategy, even a cleverly written one, cannot do.
However, it&amp;rsquo;s always been my experience that developing a strategy creates progress,
even if the progress is understanding the inherent disagreement.&lt;/p&gt;
&lt;h2 id="inappropriate-strategy-is-especially-impactful"&gt;Inappropriate strategy is especially impactful&lt;/h2&gt;
&lt;p&gt;While good strategy can accomplish many things, it sometimes feels that inappropriate strategy is far more impactful.
Of course, impactful in all the wrong ways.
&lt;a href="https://lethain.com/digg-v4/"&gt;Digg V4&lt;/a&gt; remains the worst considered strategy I&amp;rsquo;ve personally participated in.
It was a complete rewrite of the Digg V3.5 codebase from a PHP monolith to a PHP frontend and backend of a dozen Python services.
It also moved the database from sharded MySQL to an early version of Cassandra.
Perhaps worst of all, it replaced the nuanced algorithms developed over many years with a hack implemented a few days before
launch.&lt;/p&gt;
&lt;p&gt;Although it&amp;rsquo;s likely Digg would have struggled to become profitable due to its reliance on search engine optimization for traffic,
and Google&amp;rsquo;s frequently changing search algorithm of that era, the engineering strategy ensured we died fast
rather than having an opportunity to dig our way out.&lt;/p&gt;
&lt;p&gt;Importantly, it&amp;rsquo;s not just Digg. Almost every engineering organization you drill into will have its share of
unused platform projects that consumed decades of engineering time at the expense of an important opportunity.
A shocking number of senior leaders join new companies and initiate a &lt;a href="https://lethain.com/grand-migration/"&gt;grand migration&lt;/a&gt;
that attempts to entirely rewrite the architecture, switch programming languages, or otherwise shift their
new organization to resemble a prior organization where they understood things better.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Inappropriate versus bad&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When I first wrote this section, I just labeled this sort of strategy as &amp;ldquo;bad.&amp;rdquo;
The challenge with that term is that the same strategy might well be very effective
in a different set of circumstances. For example, if Digg had been a three person
company with no revenue, rewriting from scratch could have been the right decision!&lt;/p&gt;
&lt;p&gt;As a result, I&amp;rsquo;ve tried to prefer the term &amp;ldquo;inappropriate&amp;rdquo; rather than &amp;ldquo;bad&amp;rdquo;
to avoid getting caught up on whether a given approach &lt;em&gt;might&lt;/em&gt; work in other
circumstances. Every approach undoubtedly works in &lt;em&gt;some&lt;/em&gt; organization.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="written-strategy-drives-organizational-learning"&gt;Written strategy drives organizational learning&lt;/h2&gt;
&lt;p&gt;When I joined Carta, I noticed we had an inconsistent approach
to a number of important problems. Teams had distinct standard kits
for how they approached new projects. Adoption of existing internal platforms
was inconsistent, as was decision making around funding new internal platforms.
There was widespread agreement that we were in the process of decomposing our monolith,
but no agreement on how we were doing it.&lt;/p&gt;
&lt;p&gt;Coming into such a &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;permissive strategy&lt;/a&gt; environment,
with strong, differing perspectives on the ideal path forward, one of my first projects
was writing down an explicit engineering strategy along with our newly formed Navigators
team, itself a part of our new engineering strategy.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Navigators at Carta&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As discussed in &lt;a href="https://lethain.com/navigators/"&gt;Navigators&lt;/a&gt;, we developed a program at Carta to have
explicitly named individuals who are technical leaders to represent key parts
of the engineering organization. This representative leadership group made it possible
to iterate on strategy with a small team of about ten engineers who represented the entire organization,
rather than take on the impossible task of negotiating with 400 engineers directly.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This written strategy made it possible to explicitly describe the problems we saw,
and how we wanted to navigate those problems. Further, it was an artifact that we
were able to iterate on in a small group, but then share widely for feedback from
teams we might have missed.&lt;/p&gt;
&lt;p&gt;After initial publishing, we shared it widely and talked about it frequently in engineering
all-hands meetings. Then we came back to it each year, or when things stopped making much sense,
and revised it. As an example, our initial strategy didn&amp;rsquo;t talk about artificial intelligence at all.
A few months later, we extended it to mention a very conservative approach to using Large Language Models.
Most recently, we&amp;rsquo;ve revised the artificial intelligence portion again, as we dive
deeply into &lt;a href="https://huyenchip.com//2025/01/07/agents.html"&gt;agentic workflows&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A lot of people have disagreed with parts of the strategy, which is great: one
of the key benefits of a written strategy is that it&amp;rsquo;s possible to disagree precisely.
From that disagreement, we&amp;rsquo;ve been able to evolve our strategy. Sometimes that&amp;rsquo;s because there&amp;rsquo;s
new information, like the current rapid evolution of artificial intelligence practices,
and other times it&amp;rsquo;s because our initial approach could be improved, as with how we gated
membership of the initial Navigators team.&lt;/p&gt;
&lt;p&gt;New hires are able to disagree too, and do it from an informed place rather than coming
across as attached to their prior company&amp;rsquo;s practices.
In particular, they&amp;rsquo;re able to understand the historical thinking
that motivated our decisions, even when that context is no longer obvious.
At the time we paused decomposition of our monolith, there was significant friction
in service provisioning, but that&amp;rsquo;s far less true today, which can make the decision
seem a bit arbitrary. Only the written document can consistently communicate that
context across a growing and changing organization.&lt;/p&gt;
&lt;p&gt;With oral history, what you believe is highly dependent on who you talk with,
which shapes your view of history and the present.
With written history, it&amp;rsquo;s far more possible to agree at scale,
which is the prerequisite to growing at scale rather than isolating
growth to small pockets of senior leadership.&lt;/p&gt;
&lt;h2 id="the-cost-of-implicit-strategy"&gt;The cost of implicit strategy&lt;/h2&gt;
&lt;p&gt;We just finished talking about written strategy, and this book spends a lot of time on this topic,
including &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;a chapter on how to structure strategies to maximize readability&lt;/a&gt;.
It&amp;rsquo;s not &lt;em&gt;just&lt;/em&gt; because of the positives created by written strategy, but also because of the damage unwritten strategy
creates.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Vulnerable to misinterpretation.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Information flow in verbal organizations depends on an individual being in a given room for a decision,
and then accurately repeating that information to the others who need it.
However, it&amp;rsquo;s common to see those individuals fail to repeat that information elsewhere.
Sometimes their interpretation is also faulty to some degree. Both of these create significant
problems in operating strategy.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Two-headed organizations&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some years ago, I started moving towards a model where most engineering organizations I worked with
had two leaders: one who&amp;rsquo;s a manager, and another who is a senior engineer. This was partially to
ensure engineering context was included in senior decision making, but it was also to reduce communication errors.&lt;/p&gt;
&lt;p&gt;Errors in one-to-one communication are so prevalent that the only solution I could
find for folks who weren&amp;rsquo;t reading-oriented communicators was ensuring I had communicated strategy (and other updates) to
at least two people.&lt;/p&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inconsistency across teams.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At one company I worked in, promotions to Staff-plus roles happened at a much higher rate
in the infrastructure engineering organization than in the product engineering organization.
This created a constant drain out of product engineering to work on infrastructure-shaped
problems, even if those problems weren&amp;rsquo;t particularly valuable to the business.&lt;/p&gt;
&lt;p&gt;New leaders had no idea this informal policy existed, and they would routinely
run into trouble in &lt;a href="https://lethain.com/perf-management-system/"&gt;calibration discussions&lt;/a&gt;.
They &lt;em&gt;also&lt;/em&gt; weren&amp;rsquo;t aware they needed to go argue for a better policy.
Worse, no one was sure if this was a real policy or not, so it was ultimately random
whether this perspective was represented for any given promotion: sometimes
good promotions would be blocked, sometimes borderline cases would be approved.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inconsistency over time.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Implementing a new policy tends to be a mix of persistent and one-time actions.
For example, let&amp;rsquo;s say you wanted to standardize all HTTP operations to use the same
library across your codebase. You might add a linter check to reject known alternatives,
and you&amp;rsquo;ll probably do a one-time pass across your codebase standardizing on that library.&lt;/p&gt;
&lt;p&gt;However, two years later there are another three random HTTP libraries in your codebase,
creeping into the cracks surrounding your linting. If the policy is written down, and a few
people read it, then there are a number of ways this could nonetheless be prevented.
If it&amp;rsquo;s not written down, it&amp;rsquo;s much less likely anyone will remember the policy,
and even those who do are unlikely to remember the rationale well enough to argue for it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hazard to new leadership.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When a new Staff-plus engineer or executive joins a company,
it&amp;rsquo;s common to blame them for failing to understand the existing context behind
decisions. That&amp;rsquo;s fair: a big part of senior leadership is uncovering and understanding context.
It&amp;rsquo;s also unfair: explicit documentation of prior thinking would have made this much easier for them.&lt;/p&gt;
&lt;p&gt;Every particularly bad new-leader onboarding that I&amp;rsquo;ve seen has involved a new leader coming into an unfilled role
that the new leader&amp;rsquo;s manager didn&amp;rsquo;t know how to do. In those cases, success is entirely dependent on that new leader&amp;rsquo;s
ability and interest in learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In most ways, the practice of documenting strategy has a lot in common
with &lt;a href="https://lethain.com/succession-planning/"&gt;succession planning&lt;/a&gt;, where the full benefits
accrue to the organization rather than to the individual doing it.
It&amp;rsquo;s possible to maintain things while the original authors are present, but
appreciating the worth of documentation requires stepping outside yourself for a moment
to consider what will matter most to the organization once you&amp;rsquo;re no
longer a member.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Information herd immunity&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A frequent objection to written strategy is that no one reads anything.
There&amp;rsquo;s some truth to this: it&amp;rsquo;s extremely hard to get everyone in an organization
to know something. However, I&amp;rsquo;ve never found that goal to be particularly important.&lt;/p&gt;
&lt;p&gt;My view of information dispersal in an organization is the same as
&lt;a href="https://en.wikipedia.org/wiki/Herd_immunity"&gt;herd immunity&lt;/a&gt;:
you don&amp;rsquo;t need everyone to know something, just enough people who know
it that confusion doesn&amp;rsquo;t propagate too far.&lt;/p&gt;
&lt;p&gt;So, it may be impossible for all engineers to know strategy details,
but you certainly can have every Staff-plus engineer and engineering manager
know those details.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="strategy-supports-personal-learning"&gt;Strategy supports personal learning&lt;/h2&gt;
&lt;p&gt;While I believe that the largest benefits of strategy accrue to the
organization, rather than the individual creating it, I also believe that
strategy is an underrated avenue for self-development.&lt;/p&gt;
&lt;p&gt;The ways that I&amp;rsquo;ve seen strategy support personal development are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Creating strategy builds self-awareness.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Starting with a concrete example, I&amp;rsquo;ve worked with several engineers who viewed themselves
as extremely senior, but frequently demanded that projects be implemented using new programming
languages or technologies because they personally wanted to learn about the technology.
Their internal strategy was clear&amp;ndash;they wanted to work on something fun&amp;ndash;but following
&lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;the steps to build an engineering strategy&lt;/a&gt; would have
created a strategy that even they agreed didn&amp;rsquo;t make sense.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strategy supports situational awareness in new environments.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt; talks a lot about situational awareness
as a prerequisite to good strategy.
Situational awareness means understanding the realities of your circumstances,
and failing to develop it is the most destructive failure of new senior engineering leaders.
Explicitly stating the diagnosis under which a strategy applied
makes it easier for you to debug why reusing a prior strategy
in a new team or company might not work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strategy as your personal archive.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just as documented strategy is institutional memory,
it also serves as personal memory to understand the
impact of your prior approaches.
Each of us is an archivist of our prior work, pulling out the
most valuable pieces to address the problem at hand.
Over a long career, memory fades&amp;ndash;and motivated reasoning creeps in&amp;ndash;but explicit documentation doesn&amp;rsquo;t.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Indeed, part of the reason I started working on this book &lt;em&gt;now&lt;/em&gt; rather than later
is that I realized I was starting to forget the details of the strategy work I did
earlier in my career. If I wanted to preserve the wisdom of that era, and ensure I
didn&amp;rsquo;t have to relearn the same lessons in the future, I had to write it now.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve covered why strategy can be a valuable learning mechanism for both your engineering organization
and for you. We&amp;rsquo;ve shown how strategies have helped organizations deal with service migrations,
monolith decomposition, and right-sizing backfilling. We&amp;rsquo;ve also discussed how inappropriate strategy
contributed to Digg&amp;rsquo;s demise.&lt;/p&gt;
&lt;p&gt;However, if I had to pick two things to emphasize as this chapter ends, it wouldn&amp;rsquo;t be any of those
things. Rather, it would be two themes that I find are the most frequently ignored:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;There&amp;rsquo;s always a strategy, even if it isn&amp;rsquo;t written down.&lt;/li&gt;
&lt;li&gt;The single biggest act you can take to further strategy in your organization
is to write down strategy so it can be debated, agreed upon, and explicitly evolved.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Discussions around topics like strategy often get caught up in high prestige activities
like making controversial decisions, but the most effective strategists I&amp;rsquo;ve seen make
more progress by actually performing the basics: writing things down, exploring widely
to see how other companies solve the same problem, accepting feedback into their draft
from folks who disagree with them. Strategy &lt;em&gt;is&lt;/em&gt; useful, and doing strategy can be simple, too.&lt;/p&gt;</description></item><item><title>"We're a product engineering company!" — Engineering strategy at Calm.</title><link>https://craftingengstrategy.com/product-eng-strategy/</link><pubDate>Thu, 23 Jan 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/product-eng-strategy/</guid><description>&lt;p&gt;In my career, the majority of the strategy work I&amp;rsquo;ve done has been in non-executive roles,
things like &lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration&lt;/a&gt;.
Joining Calm was my first executive role, where I was able to not only propose but also mandate strategy.&lt;/p&gt;
&lt;p&gt;As at almost all startups, the engineering team was scattered when I joined.
Was our most important work creating more scalable infrastructure?
Was our greatest risk the failure to adopt leading programming languages?
How could we rescue the stuck &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;service decomposition initiative&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;This strategy is where the engineering team and I aligned after numerous rounds of iteration,
debate, and inevitably some disagreement. As a strategy, it&amp;rsquo;s both basic and also unambiguous
about what we valued, and I believe it&amp;rsquo;s a reasonably good starting point for any &lt;a href="https://lethain.com/quality/"&gt;low scalability-complexity&lt;/a&gt;
consumer product.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;, then &lt;em&gt;Diagnose&lt;/em&gt; and so on.
Relative to the default structure, this document has one tweak, folding the &lt;em&gt;Operation&lt;/em&gt; section in with &lt;em&gt;Policy&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operation"&gt;Policy &amp;amp; Operation&lt;/h2&gt;
&lt;p&gt;Our new policies, and the mechanisms to operate them, are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We are a product engineering company.&lt;/strong&gt;
Users write in every day to tell us that our product has changed their lives for the better.
Our technical infrastructure doesn&amp;rsquo;t get many user letters&amp;ndash;and this is unlikely to change going forward
as our infrastructure is relatively low-scale and low-complexity.
Rather than attempting to change that, we want to devote the absolute maximum possible attention to product engineering.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We exclusively adopt new technologies to create valuable product capabilities.&lt;/strong&gt;
We believe our technology stack as it exists today can solve the majority of our current and future product roadmaps.
In the rare case where we adopt a new technology, we do so because a product capability is inherently impossible
without adopting a new technology.&lt;/p&gt;
&lt;p&gt;We do not adopt new technologies for other reasons. For example, we would not adopt a new technology because
someone is interested in learning about it. Nor would we adopt a technology because it is 30% &lt;em&gt;better suited&lt;/em&gt;
to a task.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We write all code in the monolith.&lt;/strong&gt;
It has been ambiguous whether new code (especially new application code) should be written in our JavaScript monolith,
or if all new code &lt;em&gt;must&lt;/em&gt; be written in a new service outside of the monolith.
This is no longer ambiguous: all new code must be written in the monolith.&lt;/p&gt;
&lt;p&gt;In the rare case that there is a functional requirement that makes writing in the monolith implausible,
then you should request an exception as described below.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Exceptions are granted by the CTO, and must be in writing.&lt;/strong&gt;
The above policies are deliberately restrictive. Sometimes they may be wrong, and we will
make exceptions to them. However, each exception should be deliberate and grounded in concrete
problems we are aligned both on solving and how we solve them.
If we all scatter towards our preferred solutions, then we&amp;rsquo;ll create negative leverage for Calm
rather than serving as the engine that advances our product.&lt;/p&gt;
&lt;p&gt;All exceptions must be written. If an exception is not written, then you should operate as if it has not been granted.
Our goal is to avoid ambiguity around whether an exception has, or has not, been approved.
If there&amp;rsquo;s no written record that the CTO approved it, then it&amp;rsquo;s not approved.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Proving the point about exceptions, there are two confirmed exceptions to the above strategy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We are incrementally migrating to TypeScript.&lt;/strong&gt;
We have found that static typing can prevent a number of our user-facing bugs.
TypeScript provides a clean, incremental migration path for our JavaScript codebase,
and we aim to migrate the entirety over the next six months.&lt;/p&gt;
&lt;p&gt;Our Web engineering team is leading this migration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We are evaluating Postgres Aurora as our primary database.&lt;/strong&gt;
Many of our recent production incidents are caused by index scans
for tables with high write velocity such as tracking customer logins.
We believe Aurora will perform better under these workloads.&lt;/p&gt;
&lt;p&gt;Our Infrastructure engineering team is leading this initiative.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;The current state of our engineering organization:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Our product is not limited by missing infrastructure capabilities.&lt;/strong&gt;
Reviewing our roadmap, there&amp;rsquo;s nothing that we are trying to build today
or over the next year that is constrained by our technical infrastructure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Our uptime, stability and latency are OK but not great.&lt;/strong&gt;
We have semi-frequent stability and latency issues in our application,
all of which are caused by one of two issues.
First, new code is deployed without a needed index because it performed well enough in a test environment.
Second, writes to a small number of extremely large, skinny tables have become expensive in combination
with scans over those tables&amp;rsquo; indexes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Our infrastructure team is split between supporting monolith and service workflows.&lt;/strong&gt;
One way to measure technical debt is to understand how much time the team is spending
maintaining the current infrastructure. Today, that is meaningful but not overwhelming
work for our team of three infrastructure engineers supporting 30 product engineers.&lt;/p&gt;
&lt;p&gt;However, we &lt;em&gt;are&lt;/em&gt; finding infrastructure engineers increasingly pulled into debugging
incidents for components moved out of the central monolith into our service architecture.
This is partially due to increased inherent complexity, but it&amp;rsquo;s more due to the migration exposing
a lack of monitoring and ambiguous accountability for services&amp;rsquo; production incidents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Our product and executive stakeholders experience us as competing factions.&lt;/strong&gt;
Engineering exists to build and operate software in the company.
Part of that is being easy to work with. We should not necessarily support every ask from Product
if we believe they are misaligned with Engineering&amp;rsquo;s goals (e.g. maintaining security),
but we should generally provide a consistent perspective across our team.&lt;/p&gt;
&lt;p&gt;Today, our stakeholders believe they will get radically different answers to basic
questions of capabilities and approach depending on who they ask. If they try to
get a group of engineers to agree on an approach, they often find we derail into
debate about approach rather than articulating a clear point of view that allows
the conversation to move forward.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We&amp;rsquo;re spending an outsized amount of time debating technology adoptions and rewrites.&lt;/strong&gt;
Most of our disagreements stem from adopting new technologies or rewriting existing
components into new technology stacks. For example, can we extend this feature or
do we have to migrate it to a service before extending it?
Can we add this to our database or should we move it into a new Redis cache instead?
Is JavaScript a sufficient programming language, or do we need to rewrite this functionality in Go?&lt;/p&gt;
&lt;p&gt;This is particularly relevant to next steps around the ongoing services migration,
which has been in-flight for over a year, but has yet to move any core production code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We are spending more time on infrastructure and platform work than product work.&lt;/strong&gt;
This is the combination of all the above issues, from the stability issues we are
encountering in our database design, to the lack of engineering alignment on execution.
This places us at odds with stakeholders&amp;rsquo; expectations that we are predominantly focused
on new product development.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Calm is a mobile application that guides users to build and maintain either a meditation or sleep habit.
Recommendations and guidance across content are individual to the user, but the content is shared across
all customers and is amenable to caching on a content delivery network (CDN).
As long as the CDN is available, the mobile application can operate despite the inability to access servers
(i.e. the application remains usable from a user&amp;rsquo;s perspective, even if the non-CDN production infrastructure
is unreachable).&lt;/p&gt;
&lt;p&gt;In 2010, enabling a product of this complexity would have required significant bespoke infrastructure,
along with likely maintaining a physical presence in a series of datacenters to run your software.
In 2020, comparable applications are generally moving towards maintaining as little internal infrastructure as possible.
This perspective is summarized effectively in Intercom&amp;rsquo;s &lt;a href="https://www.intercom.com/blog/run-less-software/"&gt;Run Less Software&lt;/a&gt;
and Dan McKinley&amp;rsquo;s &lt;a href="https://mcfunley.com/choose-boring-technology"&gt;Choose Boring Technology&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;New companies founded in this space view essentially all infrastructure as a commodity bought from their cloud provider.
This even extends to areas of innovation, such as machine learning, where the training infrastructure is typically
run on an offering like AWS Bedrock, and the model infrastructure is provided by Anthropic or OpenAI.&lt;/p&gt;</description></item><item><title>Bridging theory and practice in engineering strategy.</title><link>https://craftingengstrategy.com/theory-and-practice/</link><pubDate>Thu, 16 Jan 2025 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/theory-and-practice/</guid><description>&lt;p&gt;Some people I&amp;rsquo;ve worked with have lost hope that engineering strategy
actually exists within &lt;em&gt;any&lt;/em&gt; engineering organizations.
I imagine that they, reading through the
&lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;steps to build engineering strategy&lt;/a&gt;,
or the &lt;a href="https://craftingengstrategy.com/project-resourcing-strategy/"&gt;strategy for resourcing Engineering-driven projects&lt;/a&gt;,
are not impressed. Instead, these ideas probably come across as theoretical at best.
In less polite company, they might describe these ideas as fake constructs.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s talk about it! Because they&amp;rsquo;re right. In fact, they&amp;rsquo;re right in two different ways.
First, this book is focused on explaining how to create clean, refined, and definitive strategy documents,
whereas most real strategy artifacts start out looking rather messy.
Second, applying these techniques in practice can require a fair amount of creativity.
It might sound easy, but it&amp;rsquo;s quite difficult to do well.&lt;/p&gt;
&lt;p&gt;This chapter will cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why strategy documents need to be clear and definitive,
especially when strategy development has been messy&lt;/li&gt;
&lt;li&gt;How to iterate on strategy when there are demands for unrealistic timelines&lt;/li&gt;
&lt;li&gt;Using strategy as non-executives, where others might override your strategy&lt;/li&gt;
&lt;li&gt;Handling dynamic, quickly changing environments where diagnosis can change frequently&lt;/li&gt;
&lt;li&gt;Working with indecisive stakeholders who don&amp;rsquo;t provide clarity on approach&lt;/li&gt;
&lt;li&gt;Surviving other people&amp;rsquo;s bad strategy work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Alright, let&amp;rsquo;s dive into the many ways that praxis doesn&amp;rsquo;t quite line up with theory.&lt;/p&gt;
&lt;h2 id="clear-and-definitive-documents"&gt;Clear and definitive documents&lt;/h2&gt;
&lt;p&gt;As explored in &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;Making engineering strategies more readable&lt;/a&gt;,
documents that feel intuitive to write are often fairly difficult to read. That&amp;rsquo;s because thinking tends
to be a linear-ish journey from a problem to a solution. Most readers, on the other hand, usually just
want to know the solution and then move on, because good strategies are read for direction
(e.g. when a team wants to understand how they&amp;rsquo;re supposed to solve a specific issue at hand)
far more frequently than they&amp;rsquo;re read to build agreement (e.g. building stakeholder alignment
during the initial development of the strategy).&lt;/p&gt;
&lt;p&gt;However, many organizations only produce writer-oriented strategy documents,
and may not have any reader-oriented documents at all.
If you&amp;rsquo;ve predominantly worked in those sorts of organizations,
then the first reader-oriented documents you encounter will seem artificial.&lt;/p&gt;
&lt;p&gt;There are also organizations that have many reader-oriented documents,
but omit the rationale behind those documents. Those documents feel prescriptive
and heavy-handed, because the infrequent reader who &lt;em&gt;does&lt;/em&gt; want to understand
the thinking can&amp;rsquo;t find it. Further, when they want to propose an alternative,
they have to do so without the rationale behind the current policies:
the absence of that context often transforms what was a collaborative problem-solving
opportunity into a political conflict.&lt;/p&gt;
&lt;p&gt;With that in mind, I&amp;rsquo;d encourage you to see the frequent absence of these documents
as a major opportunity to drive strategy within your organization, rather than
evidence that these documents don&amp;rsquo;t work. My experience is that they do.&lt;/p&gt;
&lt;h2 id="doing-strategy-despite-unrealistic-timelines"&gt;Doing strategy despite unrealistic timelines&lt;/h2&gt;
&lt;p&gt;The most frequent failure mode I see for strategy is when it&amp;rsquo;s rushed,
and its authors accept that thinking must stop when the artificial deadline is reached.
Taking annual planning at Stripe as an example,
&lt;a href="https://www.amazon.com/Scaling-People-Tactics-Management-Building/dp/1953953212/"&gt;Claire Hughes Johnson&lt;/a&gt;
argued that planning expands to fit any timeline, and consequently set a short planning timeline of
several weeks. Some teams accepted that as a fixed timeline and &lt;em&gt;stopped planning&lt;/em&gt; when the timeline
ended, whereas effective teams kept planning before, during, and after the planning window.&lt;/p&gt;
&lt;p&gt;When strategy work is given an artificial or unrealistic timeline,
you should deliver the best draft you can.
Afterwards, rather than being finished, you should view yourself as
&lt;a href="https://craftingengstrategy.com/refine/"&gt;starting the refinement process&lt;/a&gt;.
An open strategy secret is that many strategies never leave
the refinement phase, and continue to be tweaked throughout their
lifespan. Why should a strategy with an early deadline be any different?&lt;/p&gt;
&lt;p&gt;Well, there is one important problem to acknowledge:
I&amp;rsquo;ve often found that the executive who initially provided the
unrealistic timeline intended it as a forcing function to inspire action and quick thinking.
If you have a discussion with them directly, they&amp;rsquo;re usually quite open to adjusting the approach.
However, the intermediate layers of leadership between that executive and you often calcify
on a particular approach, claiming that the executive insists it be followed precisely.&lt;/p&gt;
&lt;p&gt;Sometimes having the conversation with the responsible executive is quite difficult.
In that case, you do have to work with individuals who treat the strategy as literal and unalterable
until either you can have the conversation or something goes wrong enough that the executive
starts paying attention again. Usually, though, you can find someone who has a communication path,
as long as you can articulate the issue clearly.&lt;/p&gt;
&lt;h2 id="using-strategy-as-non-executives"&gt;Using strategy as non-executives&lt;/h2&gt;
&lt;p&gt;Some engineers will argue that the only valid &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;strategy altitude&lt;/a&gt;
is the highest one defined by executives, because any other strategy can be invalidated by
a new, higher altitude strategy.
They would claim that teams simply &lt;em&gt;cannot&lt;/em&gt; do strategy, because executives might invalidate it.
Some engineering executives make the same argument one level up, claiming that they can&amp;rsquo;t work on an engineering strategy
because the missing product strategy or business strategy might introduce new constraints.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t agree with this line of thinking at all.
To do strategy at any altitude, you have to come to terms with the certainty that
new information will show up, and you&amp;rsquo;ll need to revise your strategy to deal with that.&lt;/p&gt;
&lt;p&gt;The strategy for &lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;controlling access to user data&lt;/a&gt; is a good counterexample
to the premise that effective strategy requires executive support.
The lack of progress had been framed as the result of limited executive engagement on the topic,
which had led to a disengaged team. However, as we started to work on the ergonomics of the problem,
we came to realize that we could significantly reduce unnecessary access to user data without
any top-down support at all.&lt;/p&gt;
&lt;p&gt;When it comes to using strategy, effective diagnosis trumps authority.
At least as many executives&amp;rsquo; strategies are ravaged by reality&amp;rsquo;s pervasive details as are overridden by higher altitude strategies.
The only way to be certain your strategy will fail is to wait until you&amp;rsquo;re certain that
no new information will show up and require changing it.&lt;/p&gt;
&lt;h2 id="doing-strategy-in-chaotic-environments"&gt;Doing strategy in chaotic environments&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;How should you adopt LLMs?&lt;/a&gt; discusses how a company should plot a path
through the rapidly evolving LLM ecosystem.
Periods of rapid technology evolution are one reason why your strategy might encounter a pocket of chaos,
but there are many others. Pockets of rapid hiring, as well as layoffs, create chaos.
The departure of load-bearing senior leaders can change a company quickly.
Slowing revenue in a company&amp;rsquo;s core business can also initiate chaotic actions
in pursuit of a new business.&lt;/p&gt;
&lt;p&gt;Strategies don&amp;rsquo;t require stable environments. Instead, strategies require awareness of
the environment that they&amp;rsquo;re operating in. In a stable period, a strategy might expect
to run for several years and expect relatively little deviation from the initial approach.
In a dynamic period, the strategy might recognize that you can only protect capacity in two-week chunks
before a new critical initiative pops up.
It&amp;rsquo;s possible to execute good strategy in either scenario, but it&amp;rsquo;s impossible to execute good strategy
if you don&amp;rsquo;t diagnose the context effectively.&lt;/p&gt;
&lt;h2 id="unreliable-information"&gt;Unreliable information&lt;/h2&gt;
&lt;p&gt;Oftentimes, the way forward would be very obvious if a few key decisions were made.
You know who is supposed to make those decisions, but you simply cannot get them
to decide.
My most visceral experience of this was conducting a layoff where the CEO wouldn&amp;rsquo;t
define a target cost reduction or a thesis of how much various functions (e.g. engineering, marketing, sales)
should contribute to those reductions.
With those two decisions, engineering&amp;rsquo;s approach would have been obvious, and without that clarity
things felt impossible.&lt;/p&gt;
&lt;p&gt;Although I was frustrated at the time,
I&amp;rsquo;ve since come to appreciate that missing decisions are the norm rather than the exception.
The strategy on &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;Navigating Private Equity ownership&lt;/a&gt; deals with
this problem by acknowledging a missing decision, and expressly blocking one part of its execution
on that decision being made.
Other parts of its plan, like changing how roles are backfilled, went ahead to address
the broader cost problem.&lt;/p&gt;
&lt;p&gt;Rather than blocking on missing information, your strategy should acknowledge what&amp;rsquo;s
missing and move forward where you can. Sometimes that&amp;rsquo;s moving forward by taking risk,
sometimes that&amp;rsquo;s delaying for clarity, but it&amp;rsquo;s never accepting that you&amp;rsquo;re stuck
without options other than pointing a finger.&lt;/p&gt;
&lt;h2 id="surviving-other-peoples-bad-strategy-work"&gt;Surviving other people&amp;rsquo;s bad strategy work&lt;/h2&gt;
&lt;p&gt;Sometimes you will be told to follow something which is described as a strategy,
but is really just a policy without any strategic thinking behind it.
This is an unavoidable element of working in organizations and happens for all sorts of reasons.
Sometimes, your organization&amp;rsquo;s leader doesn&amp;rsquo;t believe it&amp;rsquo;s valuable to explain their thinking to others,
because they see themselves as the one important decision maker.&lt;/p&gt;
&lt;p&gt;Other times, your leader doesn&amp;rsquo;t agree with a policy they&amp;rsquo;ve been instructed to roll out.
Adoption of &amp;ldquo;high hype&amp;rdquo; technologies like blockchain during the crypto boom
was often top-down direction from company leadership that engineering disagreed with,
but was obligated to align with. In this case, your leader is finding that it&amp;rsquo;s hard
to explain a strategy that they themselves don&amp;rsquo;t understand either.&lt;/p&gt;
&lt;p&gt;This is a frustrating situation. What I&amp;rsquo;ve found most effective is writing a strategy of my own,
one that acknowledges the broader strategy I disagree with in its diagnosis as a static, unavoidable truth.
From there, I&amp;rsquo;ve been able to make practical decisions that recognize the context, even if it&amp;rsquo;s not
a context I&amp;rsquo;d have selected for myself.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;I started this chapter by acknowledging that the &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;steps to building engineering strategy&lt;/a&gt;
are a theory of strategy, and one that can get quite messy in practice.
Now you know why strategy documents often come across as overly pristine&amp;ndash;because they&amp;rsquo;re trying to communicate clearly
about a complex topic.&lt;/p&gt;
&lt;p&gt;You also know how to navigate the many ways reality pulls you away from perfect strategy,
such as unrealistic timelines, higher altitude strategies invalidating your own strategy work,
working in a chaotic environment, and dealing with stakeholders who won&amp;rsquo;t make the decisions your strategy depends on.
Finally, we acknowledged that sometimes strategy work done by others is not what we&amp;rsquo;d consider strategy.
It&amp;rsquo;s often unsupported policy with neither a diagnosis nor an approach to operating the policy.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s all stuff you&amp;rsquo;re going to run into, and it&amp;rsquo;s all stuff you&amp;rsquo;re going to overcome
on the path to doing good strategy work.&lt;/p&gt;</description></item><item><title>Uber's service migration strategy circa 2014.</title><link>https://craftingengstrategy.com/uber-strategy/</link><pubDate>Thu, 09 Jan 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/uber-strategy/</guid><description>&lt;p&gt;In early 2014, I joined as an engineering manager for Uber&amp;rsquo;s Infrastructure team.
We were responsible for a wide range of things, including provisioning new services.
While the overall team I led grew significantly over time,
the subset working on service provisioning never
grew beyond four engineers.&lt;/p&gt;
&lt;p&gt;Those four engineers successfully migrated 1,000+ services onto a new, future-proofed service platform.
More importantly, they did it while absorbing the majority, although certainly not the entirety, of the migration workload
onto that small team rather than spreading it across the 2,000+ engineers working at Uber at the time.
Their strategy serves as an interesting case study of how a team can drive strategy, even without any executive sponsor,
by focusing on solving a pressing user problem, and providing effective ergonomics while doing so.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;, then &lt;em&gt;Diagnose&lt;/em&gt; and so on.
Relative to the default structure, this document makes one tweak, folding the &lt;em&gt;Operation&lt;/em&gt; section in with &lt;em&gt;Policy&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy--operation"&gt;Policy &amp;amp; Operation&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve adopted these guiding principles for
extending Uber&amp;rsquo;s service platform:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Constrain manual provisioning allocation to maximize investment in self-service provisioning.&lt;/strong&gt;
The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks.
We will move the remaining engineers to work on automation to speed up future service provisioning.
This will degrade manual provisioning in the short term, but the alternative is
permanently degraded provisioning as newly hired product engineers flood the team with new service requests.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Self-service must be safely usable by a new hire without Uber context.&lt;/strong&gt;
It is possible today to make a Puppet or Clusto change while provisioning a new service that
negatively impacts the production environment. This must not be true in any self-service solution.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Move to structured requests, and out of tickets.&lt;/strong&gt;
Missing or incorrect information in provisioning requests creates significant delays in provisioning.
Further, collecting this information is the first step of moving to a self-service process.
As such, we can get paid twice by reducing errors in manual provisioning while also
creating the interface for self-service workflows.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prefer initializing new services with good defaults rather than requiring user input.&lt;/strong&gt;
Most new services are provisioned for new projects with strong timeline pressure
but little certainty on their long-term requirements.
These users cannot accurately predict their future needs, and expecting them to do so
creates significant friction.&lt;/p&gt;
&lt;p&gt;Instead, the provisioning framework should suggest good defaults,
and make it easy to change the settings later when users have more clarity.
The gate from development environment to production environment is a particularly
effective one for ensuring settings are refreshed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are materializing those principles into this
sequenced set of tasks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create an internal tool that coordinates service provisioning,
replacing the process where teams request new services via Phabricator tickets.
This new tool will maintain a schema of required fields that must be supplied,
with the aim of eliminating the majority of back and forth between teams during service provisioning.&lt;/p&gt;
&lt;p&gt;In addition to capturing necessary data, this will also serve as our interface
for automating various steps in provisioning without requiring future changes in
the workflow to request service provisioning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Extend the internal tool to generate Puppet scaffolding for new services,
reducing the potential for errors in two ways. First, the data supplied in the service provisioning
request can be directly included in the rendered template.
Second, this will eliminate most human tweaking of templates where typos can create issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Port allocation is particularly high-risk, as reusing a port can
break routing to an existing production service.
As such, this will be the first area we fully automate, with the provisioning service
supplying the allocated port rather than requiring requesting teams to find and supply an unused port themselves.&lt;/p&gt;
&lt;p&gt;Doing this will require moving the port registry out of a Phabricator wiki page
and into a database, which will allow us to guard access with a variety of checks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Manual assignment of new services to servers often leads to new services being allocated
to already heavily utilized servers. We will replace the manual assignment with an automated system, and do so
with the intention of migrating to the Mesos/Aurora cluster once it is available for production workloads.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
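&lt;p&gt;The mechanics behind tasks one, three, and four can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool the team built. The required field names, the port range, the use of SQLite as the port database, and the per-server utilization figures are all hypothetical. The sketch shows three things: a structured request checked against a schema of required fields, a database constraint (rather than a wiki page) guarding against port reuse, and new services placed on the least-utilized server instead of being assigned by hand.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import sqlite3

# Hypothetical schema: fields every provisioning request must supply.
REQUIRED_FIELDS = ("service_name", "owning_team", "oncall_rotation", "repository")


def missing_fields(request):
    """Return required fields that are absent or empty in a request dict."""
    return [field for field in REQUIRED_FIELDS if not request.get(field)]


class PortRegistry:
    """Hypothetical port registry stored in a database rather than a wiki page.

    PRIMARY KEY on port makes the database reject any attempt to hand out
    a port twice; UNIQUE on service prevents one service getting two ports.
    """

    def __init__(self, first_port, last_port):
        self.first_port = first_port
        self.last_port = last_port
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE ports (port INTEGER PRIMARY KEY, service TEXT UNIQUE NOT NULL)"
        )

    def allocate(self, service):
        """Assign the lowest free port in the range to the given service."""
        used = {row[0] for row in self.db.execute("SELECT port FROM ports")}
        for port in range(self.first_port, self.last_port + 1):
            if port not in used:
                with self.db:  # commits on success, rolls back on error
                    self.db.execute(
                        "INSERT INTO ports (port, service) VALUES (?, ?)",
                        (port, service),
                    )
                return port
        raise RuntimeError("no free ports left in the range")


def assign_server(utilization):
    """Pick the least-utilized server from a {server: utilization} mapping."""
    return min(utilization, key=utilization.get)


def provision(request, registry, utilization):
    """Validate a structured request, then let the tool choose port and server."""
    missing = missing_fields(request)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    port = registry.allocate(request["service_name"])
    return {"service": request["service_name"], "port": port,
            "server": assign_server(utilization)}


# Illustrative usage with made-up service, team, and host names.
registry = PortRegistry(9000, 9999)
request = {"service_name": "geofence", "owning_team": "maps",
           "oncall_rotation": "maps-oncall", "repository": "geofence"}
print(provision(request, registry, {"host-a": 0.9, "host-b": 0.3}))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the uniqueness checks live in the database, a second request for the same service, or an insert that tries to reuse a port, fails loudly with an integrity error instead of silently breaking routing for an existing service.&lt;/p&gt;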
&lt;p&gt;Each week, we&amp;rsquo;ll review the size of the service provisioning queue,
along with the service provisioning time to assess whether the
strategy is working or needs to be revised.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;Prolonged strategy testing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Although I didn&amp;rsquo;t have a name for this practice
in 2014 when we created and implemented this strategy,
the preceding section captures an important reality of team-led bottom-up strategy:
when you don&amp;rsquo;t have authority to mandate compliance, you have to get the details right.
The best way to do that is a prolonged &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt; phase.
Indeed, because compliance is rooted in effectiveness,
my experience is that non-executives developing strategy can never stop refining their approach.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="refine"&gt;Refine&lt;/h2&gt;
&lt;p&gt;In order to refine our diagnosis, we&amp;rsquo;ve &lt;a href="https://craftingengstrategy.com/uber-strategy-model/"&gt;created a systems model for service onboarding&lt;/a&gt;.
This will allow us to simulate a variety of different approaches to our problem,
and determine which approach, or combination of approaches, will be most effective.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-provis-model-errors.png" alt="Systems model of provisioning services at Uber circa 2014"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Systems model of provisioning services at Uber circa 2014&lt;/p&gt;
&lt;p&gt;As we exercised the model, it became clear that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;we are increasingly falling behind,&lt;/li&gt;
&lt;li&gt;hiring onto the service provisioning team is not a viable solution, and&lt;/li&gt;
&lt;li&gt;moving to a self-service approach is our only option.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While the model writeup justifies each of those statements in more detail,
we&amp;rsquo;ll include two charts here. The first chart shows the status quo,
where new service provisioning requests, labeled as &lt;code&gt;Initial RequestedServices&lt;/code&gt;, quickly accumulate into a backlog.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-diag-1.png" alt="Service provisioning model without error states"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Service provisioning model without error states&lt;/p&gt;
&lt;p&gt;Second, we have a chart comparing the outcomes between the current status quo and a self-service approach.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-chart-self-service.png" alt="Impact of self-service provisioning on provisioning rate"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of self-service provisioning on provisioning rate&lt;/p&gt;
&lt;p&gt;In that chart, you can see that the service provisioning backlog in the self-service model remains steady,
as represented by the &lt;code&gt;SelfService RequestedServices&lt;/code&gt; line. Of the various attempts to find a solution,
none of the others showed promise, including eliminating all errors in provisioning and increasing the team&amp;rsquo;s
capacity by 500%.&lt;/p&gt;
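&lt;p&gt;The central dynamic in the model can be illustrated with a toy simulation: a fixed provisioning capacity faces a request rate that grows with the engineering organization. This Python sketch is a simplification, not the lethain/systems model itself. It makes four assumptions. Requests start at two and a half per week, the midpoint of two to three. Arrivals double every 26 weeks. A manual team has a hypothetical capacity of five services per week. In self-service mode, each requesting team provisions its own service. These parameter values are illustrative, not figures from the model.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def simulate_backlog(weeks, initial_rate, doubling_weeks, capacity, self_service=False):
    """Toy stock-and-flow model of the service provisioning backlog.

    Each week new requests arrive at a rate that doubles every
    doubling_weeks. A manual team provisions at most `capacity`
    services per week; under self-service, requesting teams provision
    their own services, so capacity grows with demand.
    Returns the backlog at the end of each week.
    """
    backlog = 0.0
    history = []
    for week in range(weeks):
        arrivals = initial_rate * 2 ** (week / doubling_weeks)
        backlog += arrivals
        if self_service:
            provisioned = backlog  # every requester provisions their own service
        else:
            provisioned = min(backlog, capacity)
        backlog -= provisioned
        history.append(backlog)
    return history


# Assumed baseline: 2.5 requests per week, doubling every 26 weeks, over two years.
status_quo = simulate_backlog(104, 2.5, 26, capacity=5)
more_staff = simulate_backlog(104, 2.5, 26, capacity=30)  # 500% more than baseline
self_serve = simulate_backlog(104, 2.5, 26, capacity=5, self_service=True)

for label, series in [("status quo", status_quo), ("+500% staff", more_staff),
                      ("self-service", self_serve)]:
    print(label, round(series[-1], 1))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this sketch, exponentially growing demand eventually overtakes any fixed capacity. Extra staff only delays the week at which the backlog starts to grow, while self-service capacity scales with the organization making the requests.&lt;/p&gt;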
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve diagnosed the current state of service provisioning at Uber as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Many product engineering teams are aiming to leave the centralized monolith,
and this is generating two to three service provisioning requests each week.
We expect this rate to increase roughly linearly with the size of the product engineering
organization.&lt;/p&gt;
&lt;p&gt;Even if we disagree with this shift to additional services,
there&amp;rsquo;s no team responsible for maintaining the extensibility of the monolith,
and working in the monolith is the number one source of developer frustration,
so we don&amp;rsquo;t have a practical counter proposal to offer engineers other than
provisioning a new service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The engineering organization is doubling every six months.
Consequently, a year from now, we expect eight to twelve service provisioning requests every week.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today.
While our organization is growing at a similar rate to product engineering,
none of that additional headcount is being allocated directly to the team working on service provisioning.
We do not anticipate this changing.&lt;/p&gt;
&lt;p&gt;Some additional headcount is being allocated to Service Reliability Engineers (SREs) who
can take on the most nuanced, complicated service provisioning work.
However, their bandwidth is already heavily constrained across many tasks,
so relying on SREs is an insufficient solution.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The queue for service provisioning is already increasing in size as things are today.
Barring some change, many services will not be provisioned in a timely fashion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Today, provisioning a new service takes about a week, with numerous round trips
between the requesting team and the provisioning team.
Missing and incorrect information between teams is the largest source of
delay in provisioning services.&lt;/p&gt;
&lt;p&gt;If the provisioning team has all the necessary information and it&amp;rsquo;s accurate,
then a new service can be provisioned in about three to four hours of work across
configuration in Puppet, metadata in Clusto, allocating ports, assigning the service to servers, and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There are few safeguards on port allocation, server assignment and so on.
It is easy to inadvertently cause a production outage during service provisioning
unless done with attention to detail.&lt;/p&gt;
&lt;p&gt;Given our rate of hiring, training the engineering organization to use this unsafe toolchain
is an impractical solution: even if we train the entire organization perfectly today,
there will be just as many untrained individuals in six months.
Further, product engineering leadership has no interest in their teams
being diverted to service provisioning training.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s widely agreed across the infrastructure engineering team that
essentially every component of service provisioning should be replaced as soon as possible,
but there is no concrete plan to replace any of the core components.
Further, there is no team accountable for replacing these components,
which means the service provisioning team will either need to
work around the current tooling or replace that tooling ourselves.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s urgent to unblock development of new services, but moving those new services
to production is rarely urgent, and occurs after a long internal development period.
Evidence of this is that requests to provision a new service generally come with significant urgency
and internal escalations to management. After the service is provisioned for development,
there are relatively few urgent escalations other than one-off requests for increased
production capacity during incidents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Another team within infrastructure is actively exploring adoption of Mesos and Aurora,
but there&amp;rsquo;s no concrete timeline for when this might be available for our usage.
Until they commit to supporting our workloads, we&amp;rsquo;ll need to find an alternative
solution.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Uber&amp;rsquo;s server and service infrastructure today is composed of a few pieces.
First, we run servers on-prem within a handful of colocations.
Second, we describe each server in Puppet manifests to support repeatable provisioning of servers.
Finally, we manage fleet and server metadata in a tool named Clusto, originally created by Digg,
which allows us to populate Puppet manifests with server and cluster appropriate metadata during provisioning.
In general, we agree that our current infrastructure is nearing its end of lifespan,
but it&amp;rsquo;s less obvious what the appropriate replacements are for each piece.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s significant internal opposition to running in the cloud,
up to and including our CEO, so we don&amp;rsquo;t believe that will change
in the foreseeable future. We do however believe there&amp;rsquo;s opportunity to
change our service definitions from Puppet to something along the lines of Docker,
and to change our metadata mechanism towards a more purpose-built solution
like Mesos/Aurora or Kubernetes.&lt;/p&gt;
&lt;p&gt;As a starting point, we find it valuable to read
&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf"&gt;Large-scale cluster management at Google with Borg&lt;/a&gt;
which informed some elements of the approach to Kubernetes,
and
&lt;a href="https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf"&gt;Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center&lt;/a&gt;
which describes the Mesos/Aurora approach.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;If you&amp;rsquo;re wondering why there&amp;rsquo;s no mention of
&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf"&gt;Borg, Omega, and Kubernetes&lt;/a&gt;,
it&amp;rsquo;s because it wasn&amp;rsquo;t published until 2016, after this strategy was developed.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Within Uber, we have a number of ex-Twitter engineers who can speak with confidence to their experience
operating with Mesos/Aurora at Twitter. We have been unable to find anyone to speak with that has
production Kubernetes experience operating a comparably large fleet of 10,000+ servers, although
presumably someone is operating&amp;ndash;or close to operating&amp;ndash;Kubernetes at that scale.&lt;/p&gt;
&lt;p&gt;Our general view of how the ecosystem was evolving at the time
is &lt;a href="https://craftingengstrategy.com/uber-strategy-wardley/"&gt;described in this Wardley mapping exercise on service orchestration (2014)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/wardley-compute-v2.png" alt="Wardley map of service orchestration"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Wardley map of service orchestration&lt;/p&gt;
&lt;p&gt;One of the unknowns today is how the evolution of Mesos/Aurora and Kubernetes will look in the future.
Kubernetes seems promising with Google&amp;rsquo;s backing, but there are few if any meaningful production deployments today.
Mesos/Aurora has more community support and more production deployments, but the absolute number of deployments
remains quite small, and there is no large-scale industry backer outside of Twitter.&lt;/p&gt;
&lt;p&gt;Even further out, there&amp;rsquo;s considerable excitement around &amp;ldquo;serverless&amp;rdquo; frameworks,
which seem like a likely future evolution, but canvassing the industry and our networks
we&amp;rsquo;ve been unable to find enough real-world usage to make an active push towards
this destination today.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt; is introduced as one of the techniques
for &lt;a href="https://craftingengstrategy.com/refine/"&gt;strategy refinement&lt;/a&gt;, but it can also be a useful
technique for exploring a dynamic ecosystem like service orchestration in 2014.&lt;/p&gt;
&lt;p&gt;Assembling each strategy requires exercising judgment on how to compile the pieces
together most usefully, and in this case I found that the map fits most naturally
with the rest of exploration rather than in the more operationally-focused refinement
section.&lt;/p&gt;
&lt;/div&gt;</description></item><item><title>Service onboarding model for Uber (2014).</title><link>https://craftingengstrategy.com/uber-strategy-model/</link><pubDate>Thu, 09 Jan 2025 05:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/uber-strategy-model/</guid><description>&lt;p&gt;At the core of
&lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration strategy (2014)&lt;/a&gt;
is understanding the service onboarding process, and identifying the levers
to speed up that process. Here we&amp;rsquo;ll develop a
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;system model&lt;/a&gt;
representing that onboarding process, and exercise the model to test a number
of hypotheses about how to best speed up provisioning.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Where the model of service onboarding suggested we focus our efforts&lt;/li&gt;
&lt;li&gt;Developing a system model using the &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; package on GitHub.
That model &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/UberServiceOnboarding.ipynb"&gt;is available in the lethain/eng-strategy-models&lt;/a&gt;
repository&lt;/li&gt;
&lt;li&gt;Exercising that model to learn from it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;rsquo;s figure out what this model can teach us.&lt;/p&gt;
&lt;h2 id="learnings"&gt;Learnings&lt;/h2&gt;
&lt;p&gt;Even if we model this problem with a 100% success rate (i.e. no errors at all),
the backlog of requested new services continues to increase over time.
This clarifies that the problem to be solved is not the service provisioning team&amp;rsquo;s efficiency
in running their current process,
but rather that the fundamental approach is not working.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-diag-1.png" alt="Service provisioning model without error states"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Service provisioning model without error states&lt;/p&gt;
&lt;p&gt;Although hiring is tempting as a solution, our model suggests it is not a particularly valuable approach in this scenario.
Even a fivefold increase in the Service Provisioning team&amp;rsquo;s staff allocated to manually provisioning services
doesn&amp;rsquo;t solve the backlog of incoming requests.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-chart-infra-hiring.png" alt="Impact of infrastructure engineering hiring on service provisioning"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of infrastructure engineering hiring on service provisioning&lt;/p&gt;
&lt;p&gt;If reducing errors doesn&amp;rsquo;t solve the problem, and increased hiring for the team doesn&amp;rsquo;t solve the problem,
then we have to find a way to eliminate manual service provisioning entirely.
The most promising candidate is moving to a self-service provisioning model,
which our model shows solves the backlog problem effectively.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-chart-self-service.png" alt="Impact of self-service provisioning on provisioning rate"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of self-service provisioning on provisioning rate&lt;/p&gt;
&lt;p&gt;Refining our earlier statement, additional hiring may benefit the team if we are able to focus those
hires on building self-service provisioning, and we&amp;rsquo;re able to
&lt;a href="https://lethain.com/productivity-in-the-age-of-hypergrowth/"&gt;ramp their productivity&lt;/a&gt;
faster than the volume of incoming service provisioning requests grows.&lt;/p&gt;
&lt;h2 id="sketch"&gt;Sketch&lt;/h2&gt;
&lt;p&gt;Our initial sketch of service provisioning is a simple pipeline starting with
&lt;code&gt;requested services&lt;/code&gt; and moving step by step through to &lt;code&gt;server capacity allocated&lt;/code&gt;.
Some of these steps are likely much slower than others, but the sketch gives a sense of the
stages and where things might go wrong. It also gives us a sense of what we can measure
to evaluate if our approach to provisioning is working well.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-provis-model.png" alt="Systems model of provisioning services"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Systems model of provisioning services&lt;/p&gt;
&lt;p&gt;One element worth mentioning is the dotted lines from &lt;code&gt;hiring rate&lt;/code&gt; to &lt;code&gt;product engineers&lt;/code&gt; and
from &lt;code&gt;product engineers&lt;/code&gt; to &lt;code&gt;requested services&lt;/code&gt;. These are called &lt;em&gt;links&lt;/em&gt;: they show one stock
influencing another stock without flowing directly into it.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;A purist would correctly note that links should connect to flows rather than stocks.
That is true! However, as we&amp;rsquo;ll encounter when we convert this sketch into a model,
there are actually several counterintuitive elements here that are necessary to model
this system but make the sketch less readable.
As a modeler, you&amp;rsquo;ll frequently encounter these sorts of tradeoffs,
and you&amp;rsquo;ll have to decide what choices serve your needs best in the moment.&lt;/p&gt;
&lt;/div&gt;
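&lt;p&gt;In code, the difference between a flow and a link is whether anything is taken out of the source. The sketch below is plain Python, not the lethain/systems package; the &lt;code&gt;run_round&lt;/code&gt; function and the dictionary keys are hypothetical, and the starting values mirror the Model section. Hiring is a flow that adds to product engineers, while product engineers act as a link that only sets the rate of the request flow, which is the purist&amp;rsquo;s point about links attaching to flows.&lt;/p&gt;

```python
HIRING_RATE = 10  # new product engineers per round, as in HiringRate(10)


def run_round(stocks):
    # Flow: hiring moves people from an unlimited pool into product_engineers.
    stocks["product_engineers"] += HIRING_RATE
    # Link: product_engineers only sets the rate of the request flow (one new
    # service per ten engineers); nothing is subtracted from product_engineers.
    stocks["requested_services"] += stocks["product_engineers"] // 10
    return stocks


stocks = {"product_engineers": 1000, "requested_services": 10}
for _ in range(3):
    run_round(stocks)
print(stocks)  # {'product_engineers': 1030, 'requested_services': 316}
```

&lt;p&gt;After three rounds, hiring has grown product engineers to 1,030, and none of them were consumed by the 306 requests they generated.&lt;/p&gt;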
&lt;p&gt;The biggest element missing from the initial model is error flows,
where things can sometimes go wrong in addition to sometimes going right.
There are many ways things can go wrong, but we&amp;rsquo;re going to focus on modeling
three error flows in particular:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Missing/incorrect information&lt;/code&gt; occurs twice in this model, and throws
a provisioning request back into the initial provisioning phase where information is collected.&lt;/p&gt;
&lt;p&gt;When this occurs during port assignment, this is a relatively small trip backwards.
However, when it occurs in Puppet configuration, this is a significantly larger
step backwards.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Puppet error&lt;/code&gt; occurs in the second to final stock, &lt;code&gt;Puppet configuration tested &amp;amp; merged&lt;/code&gt;.
This sends requests back one step in the provisioning flow.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Updating our sketch to reflect these flows, we get a fairly complete, somewhat nuanced
view of the service provisioning flow.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-provis-model-errors.png" alt="Model of provisioning services with error transitions"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Model of provisioning services with error transitions&lt;/p&gt;
&lt;p&gt;Note that the combination of these two flows introduces the possibility of a service
being almost fully provisioned, but then traveling from Puppet testing back to Puppet configuration
due to &lt;code&gt;Puppet error&lt;/code&gt;, and then backwards again to the initial step due to &lt;code&gt;Missing/incorrect information&lt;/code&gt;.
This means nearly all provisioning progress can be lost if things go wrong.&lt;/p&gt;
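&lt;p&gt;The cost of these loops can be estimated before building the full model. As a rough sketch, assuming each check fails independently at the 20% rate the Model section uses for every error flow, the hypothetical &lt;code&gt;trip_success&lt;/code&gt; function below computes the chance that one trip through port and name assignment ends in allocated capacity rather than a return to an earlier stage. Puppet errors only send requests back one step, so that inner loop can be solved in closed form.&lt;/p&gt;

```python
def trip_success(port_err, gen_err, merge_err):
    """Chance that one trip through port/name assignment reaches
    ServerCapacityAllocated instead of being sent back for missing
    information. Puppet errors only loop back to PuppetGenerated, so that
    inner loop is solved in closed form from
        q = (1 - gen_err) * ((1 - merge_err) + merge_err * q)
    """
    q = (1 - gen_err) * (1 - merge_err) / (1 - (1 - gen_err) * merge_err)
    return (1 - port_err) * q


p = trip_success(0.2, 0.2, 0.2)
print(round(p, 4))      # 0.6095
print(round(1 / p, 2))  # 1.64 expected trips per provisioned service
```

&lt;p&gt;Under those assumptions, each provisioned service needs about 1.64 trips through port and name assignment on average, so a team able to start ten provisioning requests per round would complete only about six.&lt;/p&gt;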
&lt;p&gt;There are more nuances we could introduce here, but there&amp;rsquo;s already enough complexity for us
to learn quite a bit from this model.&lt;/p&gt;
&lt;h2 id="reason"&gt;Reason&lt;/h2&gt;
&lt;p&gt;Studying our sketches, a few things stand out:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The hiring of product engineers is going to drive up service provisioning requests
over time, but there&amp;rsquo;s no counterbalancing hiring of infrastructure engineers to
work on service provisioning. This means there&amp;rsquo;s an implicit, but very real,
deadline to scale this process independently of the size of the infrastructure
engineering team.&lt;/p&gt;
&lt;p&gt;Even without building the full model, it&amp;rsquo;s clear that we have to either stop hiring product engineers,
turn this into a self-service solution, or find a new mechanism to discourage
service provisioning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The error rates are going to influence results a great deal,
particularly those for &lt;code&gt;Missing/incorrect information&lt;/code&gt;.
This is probably the most valuable place to start looking for efficiency improvements.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Missing information errors are more expensive than the model implies,
because they require coordination across teams to resolve.
Conversely, Puppet testing errors are probably cheaper than the model
implies, because they should be solvable within the same team and
consequently benefit from a quick iteration loop.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now we need to build a model that helps guide our inquiry into those questions.&lt;/p&gt;
&lt;h2 id="model"&gt;Model&lt;/h2&gt;
&lt;p&gt;You can find the &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/UberServiceOnboarding.ipynb"&gt;full implementation of this model on GitHub&lt;/a&gt;
if you want to see the entirety rather than these emphasized snippets.&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s get the success states working:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;HiringRate(10)
ProductEngineers(1000)
[PotentialHires] &amp;gt; ProductEngineers @ HiringRate
[PotentialServices] &amp;gt; RequestedServices(10) @ ProductEngineers / 10
RequestedServices &amp;gt; InflightServices(0, 10) @ Leak(1.0)
InflightServices &amp;gt; PortNameAssigned @ Leak(1.0)
PortNameAssigned &amp;gt; PuppetGenerated @ Leak(1.0)
PuppetGenerated &amp;gt; PuppetConfigMerged @ Leak(1.0)
PuppetConfigMerged &amp;gt; ServerCapacityAllocated @ Leak(1.0)
&lt;/code&gt;&lt;/pre&gt;
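&lt;p&gt;As we run this model, we can see that the number of requested services grows significantly over
time. This makes sense: 1,000 product engineers generate roughly 100 new requests per round (one per ten engineers,
rising as hiring continues), but we&amp;rsquo;re only able to provision a maximum of ten services per round.&lt;/p&gt;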
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-diag-1.png" alt="Service provisioning model without error states"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Service provisioning model without error states&lt;/p&gt;
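&lt;p&gt;The growth is easy to check by hand. A minimal sketch of the error-free case, assuming hiring is applied before new requests are counted (the order the flows appear in the specification) and that every service admitted into provisioning finishes within the round; &lt;code&gt;error_free_backlog&lt;/code&gt; and its parameters are hypothetical names:&lt;/p&gt;

```python
def error_free_backlog(rounds, engineers=1000, hiring_rate=10,
                       capacity=10, backlog=10):
    """Size of the RequestedServices backlog after each round, with no errors."""
    history = []
    for _ in range(rounds):
        engineers += hiring_rate
        backlog += engineers / 10          # new requests: one per ten engineers
        backlog -= min(backlog, capacity)  # at most `capacity` provisioned per round
        history.append(backlog)
    return history


print(error_free_backlog(3))  # [101.0, 193.0, 286.0]
```

&lt;p&gt;The backlog grows by 91, then 92, then 93 requests per round, and the shortfall widens by one request every round because hiring keeps adding ten product engineers.&lt;/p&gt;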
&lt;p&gt;However, it&amp;rsquo;s also the best case, because we&amp;rsquo;re not capturing the three error states:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Unique port and name assignment can fail because of missing or incorrect information.&lt;/li&gt;
&lt;li&gt;Puppet configuration can also fail due to missing or incorrect information.&lt;/li&gt;
&lt;li&gt;Puppet configurations can have errors in them, requiring rework.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;rsquo;s update the model to include these failure modes, starting with unique port and name assignment.
The error-free version looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;InflightServices &amp;gt; PortNameAssigned @ Leak(1.0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&amp;rsquo;s add in an error rate, where 20% of requests are missing information
and are sent back to the requested services stock.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PortNameAssigned &amp;gt; PuppetGenerated @ Leak(0.8)
PortNameAssigned &amp;gt; RequestedServices @ Leak(0.2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then let&amp;rsquo;s do the same thing for missing information during Puppet configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# original version
PuppetGenerated &amp;gt; PuppetConfigMerged @ Leak(1.0)
# updated version with errors
PuppetGenerated &amp;gt; PuppetConfigMerged @ Leak(0.8)
PuppetGenerated &amp;gt; InflightServices @ Leak(0.2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, we&amp;rsquo;ll make a similar change to represent errors
made in the Puppet templates themselves:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# original version
PuppetConfigMerged &amp;gt; ServerCapacityAllocated @ Leak(1.0)
# updated version with errors
PuppetConfigMerged &amp;gt; ServerCapacityAllocated @ Leak(0.8)
PuppetConfigMerged &amp;gt; PuppetGenerated @ Leak(0.2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even with relatively low error rates, we can see that the throughput of the system
overall has been meaningfully impacted by introducing these errors.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-diag-2.png" alt="Service provisioning model with error states"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Service provisioning model with error states&lt;/p&gt;
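&lt;p&gt;To make the round-by-round mechanics concrete, here is a rough Python re-implementation of the specification with its three error loops. It is not the lethain/systems engine, and its semantics are assumptions: flows run once per round in the order listed, each one updates the stocks immediately, and flows into &lt;code&gt;InflightServices&lt;/code&gt; only move what fits under its cap. The &lt;code&gt;simulate&lt;/code&gt; function and its parameters are hypothetical, so its numbers will not exactly match the notebook&amp;rsquo;s charts.&lt;/p&gt;

```python
def simulate(rounds=30, capacity=10.0, port_err=0.2, gen_err=0.2,
             merge_err=0.2):
    """Toy stock-and-flow run of service provisioning with error loops.

    Assumed semantics (not lethain/systems): flows run once per round in
    specification order, update stocks immediately, and inflows into
    InflightServices only move what fits under `capacity`.
    """
    s = {"engineers": 1000.0, "requested": 10.0, "inflight": 0.0,
         "port": 0.0, "generated": 0.0, "merged": 0.0, "allocated": 0.0}

    def move(src, dst, amount, cap=float("inf")):
        # Only move what the destination stock has room for.
        amount = min(amount, max(cap - s[dst], 0.0))
        s[src] -= amount
        s[dst] += amount

    def split(src, ok_dst, err_dst, err_rate, err_cap=float("inf")):
        # Both shares come from the same starting value of the stock:
        # (1 - err_rate) advances, err_rate is sent back for rework.
        value = s[src]
        move(src, ok_dst, value * (1 - err_rate))
        move(src, err_dst, value * err_rate, err_cap)

    for _ in range(rounds):
        s["engineers"] += 10                   # hiring
        s["requested"] += s["engineers"] / 10  # link: new service requests
        move("requested", "inflight", s["requested"], capacity)
        move("inflight", "port", s["inflight"])
        split("port", "generated", "requested", port_err)            # missing info
        split("generated", "merged", "inflight", gen_err, capacity)  # missing info
        split("merged", "allocated", "generated", merge_err)         # Puppet error
    return s


clean = simulate(port_err=0.0, gen_err=0.0, merge_err=0.0)
noisy = simulate()
print(round(clean["allocated"]), round(noisy["allocated"]))  # 300 182
```

&lt;p&gt;Under these assumptions, the error-free run provisions 300 services over 30 rounds while the run with 20% error rates provisions about 182, because roughly four in ten of the scarce provisioning slots go to rework.&lt;/p&gt;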
&lt;p&gt;Now that we have the foundation of the model built, it&amp;rsquo;s time to start
exercising the model to understand the problem space a bit better.&lt;/p&gt;
&lt;h2 id="exercise"&gt;Exercise&lt;/h2&gt;
&lt;p&gt;We already know the errors are impacting throughput, but let&amp;rsquo;s start by
narrowing down which of the errors matters most by increasing the error rate
for each of them independently and comparing the impact.&lt;/p&gt;
&lt;p&gt;To model this, we&amp;rsquo;ll create three new specifications, each of which increases
one error rate from 20% to 50%, and see how the overall
throughput of the system is affected:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# test 1: port assignment errors increased
PortNameAssigned &amp;gt; PuppetGenerated @ Leak(0.5)
PortNameAssigned &amp;gt; RequestedServices @ Leak(0.5)
# test 2: puppet generated errors increased
PuppetGenerated &amp;gt; PuppetConfigMerged @ Leak(0.5)
PuppetGenerated &amp;gt; InflightServices @ Leak(0.5)
# test 3: puppet merged errors increased
PuppetConfigMerged &amp;gt; ServerCapacityAllocated @ Leak(0.5)
PuppetConfigMerged &amp;gt; PuppetGenerated @ Leak(0.5)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Comparing the impact of increasing the error rates from 20% to 50% in each
of the three error loops, we can get a sense of the model&amp;rsquo;s sensitivity to each error.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-chart-diff-errors.png" alt="Impact of error rates across stages of provisioning"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of error rates across stages of provisioning&lt;/p&gt;
&lt;p&gt;This chart captures why exercising is so impactful: we&amp;rsquo;d assumed during sketching that
errors in Puppet generation would matter the most because they caused a long trip backwards,
but it turns out a very high error rate early in the process matters even more, because its effect
compounds with the later error loops that every returned request still has to pass through.&lt;/p&gt;
&lt;p&gt;Next we can get a sense of the impact of hiring more people onto the service
provisioning team to manually provision more services, which we can model by
increasing the maximum size of the inflight services stock from &lt;code&gt;10&lt;/code&gt; to &lt;code&gt;50&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# initial model
RequestedServices &amp;gt; InflightServices(0, 10) @ Leak(1.0)
# with 5x capacity!
RequestedServices &amp;gt; InflightServices(0, 50) @ Leak(1.0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unfortunately, we can see that even increasing the team&amp;rsquo;s capacity fivefold doesn&amp;rsquo;t solve the backlog of requested services.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-chart-infra-hiring.png" alt="Impact of infrastructure engineering hiring on service provisioning"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of infrastructure engineering hiring on service provisioning&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s some impact, but not much, and the backlog of requested services remains extremely high.
We can conclude that more infrastructure hiring isn&amp;rsquo;t the solution we need, but let&amp;rsquo;s
see if moving to self-service is a plausible solution.&lt;/p&gt;
&lt;p&gt;We can simulate the impact of moving to self-service by removing the maximum size from
inflight services entirely:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# initial model
RequestedServices &amp;gt; InflightServices(0, 10) @ Leak(1.0)
# simulating self-service
RequestedServices &amp;gt; InflightServices(0) @ Leak(1.0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see this finally solves the backlog.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/uber-model-chart-self-service.png" alt="Impact of self-service provisioning on provisioning rate"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Impact of self-service provisioning on provisioning rate&lt;/p&gt;
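&lt;p&gt;The capacity comparison can also be sketched outside the notebook. The hypothetical &lt;code&gt;final_backlog&lt;/code&gt; function below is a rough re-implementation rather than the lethain/systems engine: it assumes flows run once per round in specification order, update the stocks immediately, and respect the cap on &lt;code&gt;InflightServices&lt;/code&gt;. It uses a 20% rate for every error loop and treats &lt;code&gt;capacity=None&lt;/code&gt; as self-service.&lt;/p&gt;

```python
def final_backlog(capacity, rounds=30, err=0.2):
    """Size of RequestedServices after `rounds` rounds.

    capacity=None stands in for self-service: no cap on InflightServices.
    Assumed semantics, not lethain/systems: flows run once per round in
    specification order, update stocks immediately, and inflows into
    InflightServices only move what fits under the cap.
    """
    cap = float("inf") if capacity is None else float(capacity)
    s = {"engineers": 1000.0, "requested": 10.0, "inflight": 0.0,
         "port": 0.0, "generated": 0.0, "merged": 0.0, "allocated": 0.0}

    def move(src, dst, amount, limit=float("inf")):
        # Only move what the destination stock has room for.
        amount = min(amount, max(limit - s[dst], 0.0))
        s[src] -= amount
        s[dst] += amount

    def split(src, ok_dst, err_dst, limit=float("inf")):
        # Both shares come from the same starting value of the stock.
        value = s[src]
        move(src, ok_dst, value * (1 - err))
        move(src, err_dst, value * err, limit)

    for _ in range(rounds):
        s["engineers"] += 10                           # hiring
        s["requested"] += s["engineers"] / 10          # link: new requests
        move("requested", "inflight", s["requested"], cap)
        move("inflight", "port", s["inflight"])
        split("port", "generated", "requested")        # missing information
        split("generated", "merged", "inflight", cap)  # missing information
        split("merged", "allocated", "generated")      # Puppet error
    return s["requested"]


for capacity in (10, 50, None):
    print(capacity, round(final_backlog(capacity)))
```

&lt;p&gt;Under these assumptions, after 30 rounds the backlog is still in the thousands with a cap of 10 or 50, but stays in the tens with no cap, because every request is picked up in the round it arrives and only rework returns to the queue.&lt;/p&gt;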
&lt;p&gt;At this point, we&amp;rsquo;ve exercised the model a fair amount and have a good sense of what it wants to tell us.
We know which errors matter the most to invest in early, and we also know that we need to make the
move to a self-service platform sometime soon.&lt;/p&gt;</description></item><item><title>Refining strategy with Wardley Mapping.</title><link>https://craftingengstrategy.com/wardley-mapping/</link><pubDate>Thu, 02 Jan 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/wardley-mapping/</guid><description>&lt;p&gt;The first time I heard about Wardley Mapping was
from Charity Majors discussing it on Twitter.
Of the three core &lt;a href="https://craftingengstrategy.com/refine/"&gt;strategy refinement techniques&lt;/a&gt;,
this is the technique that I&amp;rsquo;ve personally used the least.
Despite that, I decided to include it in this book because it
highlights how many different techniques can be used for refining strategy,
and also because it&amp;rsquo;s particularly effective at looking at the broader ecosystems
your organization exists in.&lt;/p&gt;
&lt;p&gt;Where the other techniques like &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems thinking&lt;/a&gt;
and &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt; often zoom in,
Wardley mapping is remarkably effective at zooming out.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A ten-minute primer on Wardley mapping&lt;/li&gt;
&lt;li&gt;Recommendations for tools to create Wardley maps&lt;/li&gt;
&lt;li&gt;When Wardley maps are an ideal strategy refinement tool,
and when they&amp;rsquo;re not&lt;/li&gt;
&lt;li&gt;The process I use to map, as well as integrate
a Wardley map into strategy creation&lt;/li&gt;
&lt;li&gt;Breadcrumbs to specific Wardley maps that provide examples&lt;/li&gt;
&lt;li&gt;Documenting a Wardley map in the context of a strategy writeup&lt;/li&gt;
&lt;li&gt;Why I put limited focus on two elements of Wardley&amp;rsquo;s work: doctrines and gameplay&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After working through this chapter, and digging into some
of this book&amp;rsquo;s examples of Wardley Maps, you&amp;rsquo;ll have a good
background to start your own mapping practice.&lt;/p&gt;
&lt;h2 id="ten-minute-primer"&gt;Ten-minute primer&lt;/h2&gt;
&lt;p&gt;Wardley maps are a technique created by Simon Wardley to ensure your strategy is grounded in reality.
Or, as mapping practitioners would say, it&amp;rsquo;s a tool for creating situational awareness.
If you have a few days, you might want to start your dive into Wardley mapping
by reading Simon Wardley&amp;rsquo;s book on the topic, &lt;em&gt;&lt;a href="https://medium.com/wardleymaps/on-being-lost-2ef5f05eb1ec"&gt;Wardley Maps&lt;/a&gt;&lt;/em&gt;.
If you only have ten minutes, then this section should be enough to get you up to speed
on reading Wardley maps.&lt;/p&gt;
&lt;p&gt;Picking an example to work through,
we&amp;rsquo;re going to create a Wardley map that aims to understand a knowledge base management
product, along the lines of a wiki like Confluence or Notion.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/intro-wardley-init.png" alt="Wardley map for a knowledge base management application"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Wardley map for a knowledge base management application&lt;/p&gt;
&lt;p&gt;You need to know three foundational concepts to read a Wardley map:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Maps are populated with three kinds of components: users, needs, and capabilities.
Users exist at the top, and represent a cohort of users who will use your product.
Each kind of user has a specific set of needs, generally tasks that they need to accomplish.
Each need depends on the capabilities required to fulfill it.&lt;/p&gt;
&lt;p&gt;Any box connecting directly to a user is a need. Any box connecting to a need is a capability.
A capability can be connected to any number of needs, but can never connect directly to a user;
they connect to users only indirectly via a need.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The x-axis is divided into four segments, representing how commoditized a capability is.
On the far left is genesis, which represents a brand-new capability that hasn&amp;rsquo;t existed before.
On the far right is commoditized, something so standard and expected that it&amp;rsquo;s unremarkable,
like turning on a switch causing electricity to flow.
In between are custom and product, the two categories where most items fall on the map.
Custom represents something that requires specialized expertise and operation to function,
such as a web application that requires software engineers to build and maintain.
Product represents something that can generally be bought.&lt;/p&gt;
&lt;p&gt;In this map, document reading is commoditized: it&amp;rsquo;s unremarkable if your application
allows its users to read content. On the other hand, document editing is somewhat on the
border of product and custom. You might integrate an existing vendor for document editing
needs, or you might build it yourself, but in either case document editing is less commoditized
than document reading.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The y-axis represents visibility to the user. In this map, reading documents is something that
is extremely visible to the user. On the other hand, users depend on something indexing new
documents for search, but your users will generally have no visibility into the indexing process
or even that you have a search index to begin with.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Although maps can get quite complex, those three concepts are generally sufficient to allow
you to decode an arbitrarily complex map.&lt;/p&gt;
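&lt;p&gt;Those three concepts map naturally onto a small data structure, which can be handy for checking a map before drawing it. The sketch below is hypothetical: the &lt;code&gt;Component&lt;/code&gt; fields, the 0.0 to 1.0 coordinates, and the equal-width stage boundaries are assumptions rather than part of Wardley&amp;rsquo;s notation. It encodes the linking rules above, and additionally assumes a capability may depend on further capabilities so that longer value chains are allowed.&lt;/p&gt;

```python
import bisect
from dataclasses import dataclass, field

# Hypothetical convention: the x-axis split into four equal-width stages.
STAGES = ("genesis", "custom", "product", "commodity")
STAGE_BOUNDARIES = (0.25, 0.5, 0.75)


@dataclass
class Component:
    name: str
    kind: str          # "user", "need", or "capability"
    visibility: float  # y-axis: 1.0 is most visible to the user
    evolution: float   # x-axis: 0.0 is genesis, 1.0 is commodity
    depends_on: list = field(default_factory=list)


def stage(component):
    """Name the evolution stage that a component's x position falls into."""
    return STAGES[bisect.bisect_right(STAGE_BOUNDARIES, component.evolution)]


def check_links(components):
    """List links that break the primer's rules: users connect only to
    needs, while needs (and, by assumption, capabilities) connect only
    to capabilities."""
    kinds = {c.name: c.kind for c in components}
    problems = []
    for c in components:
        allowed = "need" if c.kind == "user" else "capability"
        for dep in c.depends_on:
            if kinds[dep] != allowed:
                problems.append(f"{c.name} ({c.kind}) links to {dep} ({kinds[dep]})")
    return problems


knowledge_base = [
    # Users are anchors at the top of the map; their x position is ignored.
    Component("Reader", "user", 1.0, 0.0, ["Read documents", "Find documents"]),
    Component("Read documents", "need", 0.9, 0.9),
    Component("Find documents", "need", 0.7, 0.6, ["Search indexing"]),
    Component("Search indexing", "capability", 0.2, 0.4),
]
print(check_links(knowledge_base))                         # []
print(stage(knowledge_base[1]), stage(knowledge_base[3]))  # commodity custom
```

&lt;p&gt;Adding a user linked straight to &lt;code&gt;Search indexing&lt;/code&gt; would be reported as a violation, since capabilities reach users only through a need.&lt;/p&gt;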
&lt;p&gt;In addition to mapping the current state, Wardley maps are also excellent at
exploring how circumstances might change over time.
To illustrate that, let&amp;rsquo;s look at a second iteration of our map,
paying particular attention to the red arrows indicating capabilities
that we expect to change in the future.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/intro-wardley-future.png" alt="AI-enhanced document editing as future state of document editing"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;AI-enhanced document editing as future state of document editing&lt;/p&gt;
&lt;p&gt;In particular, the map now indicates that the current document creation experience will be superseded
by an AI-enhanced editing process. Critically, the map also predicts that the AI-enhanced process will be more commoditized than
the current authoring experience, perhaps because the AI enhancement will be driven by commoditized foundational
models from providers like Anthropic and OpenAI.
Building on that, the only place left in the map for meaningful differentiation is in search indexing.
Either the knowledge base company needs to accept the implication that they will increasingly be
a search company, or they need to expand the user needs they serve to find a new avenue for differentiation.&lt;/p&gt;
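&lt;p&gt;The reasoning in that paragraph can be sketched as a comparison of two versions of the map. In the hypothetical example below, positions are guesses on a 0.0 to 1.0 evolution axis split into four equal stages, and treating anything still in genesis or custom as a candidate for differentiation is a rule of thumb, not a formal part of Wardley mapping.&lt;/p&gt;

```python
import bisect

# Hypothetical convention: the x-axis split into four equal-width stages.
STAGES = ("genesis", "custom", "product", "commodity")
STAGE_BOUNDARIES = (0.25, 0.5, 0.75)


def stage(x):
    return STAGES[bisect.bisect_right(STAGE_BOUNDARIES, x)]


def evolve(positions, predictions):
    """Build the future map: each prediction maps a component to the
    (name, evolution) of whatever supersedes it."""
    future = {}
    for name, x in positions.items():
        new_name, new_x = predictions.get(name, (name, x))
        future[new_name] = new_x
    return future


def differentiators(positions):
    """Rule of thumb (an assumption): components still in genesis or
    custom are where building it yourself can set you apart."""
    return sorted(name for name, x in positions.items()
                  if stage(x) in ("genesis", "custom"))


# Hypothetical positions for the knowledge base map.
current = {"Document reading": 0.9, "Document editing": 0.45,
           "Search indexing": 0.35}
predictions = {"Document editing": ("AI-enhanced editing", 0.8)}

print(differentiators(current))
# ['Document editing', 'Search indexing']
print(differentiators(evolve(current, predictions)))
# ['Search indexing']
```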
&lt;p&gt;Some maps will show evolution of a given capability using a &amp;ldquo;pipeline&amp;rdquo;,
a box that describes a series of expected improvements in a capability over time.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/intro-wardley-future-pipeline.png" alt="Pipeline showing evolution of document editing"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Pipeline showing evolution of document editing&lt;/p&gt;
&lt;p&gt;Now instead of simply indicating that the authoring experience may be replaced by
an AI-enhanced capability over time, we&amp;rsquo;re able to express a sequence of steps.
From the starting place of a typical editing experience, the next expected step is AI-assisted
creation, and then finally we expect AI-led creation where the author only provides high-level
direction to a machine learning-powered agent.&lt;/p&gt;
&lt;p&gt;For completeness, it&amp;rsquo;s also worth mentioning that some Wardley maps will
have an overlay, which is a box to group capabilities or requirements together by some common denominator.
This happens most frequently to indicate the responsible team for various capabilities,
but it&amp;rsquo;s a technique that can be used to emphasize any interesting element of a map&amp;rsquo;s topology.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/intro-wardley-team-overlay.png" alt="Overlay showing which teams own which capabilities"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Overlay showing which teams own which capabilities&lt;/p&gt;
&lt;p&gt;At this point, you have the foundation to read a Wardley map, or get started creating your own.
Maps you encounter in the wild might appear significantly more complex than these initial examples,
but they&amp;rsquo;ll be composed of the same fundamental elements.&lt;/p&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;More Wardley Mapping resources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://itrevolution.com/product/the-value-flywheel-effect/"&gt;The Value Flywheel Effect&lt;/a&gt;&lt;/em&gt; by David Anderson&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://medium.com/wardleymaps/on-being-lost-2ef5f05eb1ec"&gt;Wardley Maps&lt;/a&gt;&lt;/em&gt; by Simon Wardley on Medium,
also &lt;a href="https://learnwardleymapping.com/book/"&gt;available as PDF&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://learnwardleymapping.com/"&gt;Learn Wardley Mapping&lt;/a&gt; by Ben Mosior&lt;/p&gt;
&lt;p&gt;&lt;a href="https://list.wardleymaps.com/"&gt;wardleymaps.com&amp;rsquo;s resources&lt;/a&gt; and &lt;a href="https://www.youtube.com/wardleymaps"&gt;@WardleyMaps on YouTube&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="tools-for-wardley-mapping"&gt;Tools for Wardley Mapping&lt;/h2&gt;
&lt;p&gt;Systems modeling has a serious tooling problem, which often prevents would-be adopters from
developing their systems modeling practice. Fortunately, Wardley Mapping doesn&amp;rsquo;t suffer from
that problem. You can simply print out a Wardley Map and draw on it by hand.
You can also use OmniGraffle, Miro, Figma or whatever diagramming tool you&amp;rsquo;re already familiar with.&lt;/p&gt;
&lt;p&gt;There are more focused tools as well, with
Ben Mosior pulling together an excellent writeup on
&lt;a href="https://learnwardleymapping.com/2024/06/24/top-5-wardley-mapping-tools-for-2024/"&gt;Wardley Mapping Tools as of 2024&lt;/a&gt;.
Of those tools, I&amp;rsquo;d strongly encourage starting with &lt;a href="https://mapkeep.com/"&gt;Mapkeep&lt;/a&gt; as a simple, free, and intuitive
tool for your initial mapping needs.&lt;/p&gt;
&lt;p&gt;After you&amp;rsquo;ve gotten some practice, you may well want to move back into your most familiar diagramming tool
to make it easier to collaborate with colleagues, but initially prioritize the simplest tool you can to avoid
losing learning momentum on configuration, setup and so on.&lt;/p&gt;
&lt;h2 id="when-are-wardley-maps-useful"&gt;When are Wardley Maps useful?&lt;/h2&gt;
&lt;p&gt;All successful strategy begins with understanding the constraints and circumstances that the strategy needs to work within.
Wardley mapping labels that understanding as situational awareness,
and creating situational awareness is the foremost goal of mapping.&lt;/p&gt;
&lt;p&gt;Situational awareness is always useful, but it&amp;rsquo;s particularly essential in highly dynamic environments where the industry around you,
competitors you&amp;rsquo;re selling against, or the capabilities powering your product are shifting rapidly.
In the past several decades, there have been a number of these dynamic contexts,
including the rise of web applications, the proliferation of mobile devices,
and the expansion of machine learning techniques.&lt;/p&gt;
&lt;p&gt;When you&amp;rsquo;re in those environments, it&amp;rsquo;s obvious that the world is changing rapidly.
What&amp;rsquo;s sometimes easy to miss is that any strategy that needs to last longer than a
year or two is built on an evolving foundation, even if things seem very stable at the time.
For example, in the early 2010s, startups like Facebook, Uber and Digg were all operating in physical
datacenters with their owned hardware. Over a five year period, having a presence in a
physical datacenter went from the default approach for startups to a relatively unconventional solution,
as cloud based infrastructure rapidly expanded.
Any strategy written in 2010 that imagined the world of hosting was static was destined
to be invalidated.&lt;/p&gt;
&lt;p&gt;No tool is universally effective, and that&amp;rsquo;s true here as well.
While Wardley maps are extremely helpful at understanding broad change,
my experience is that they&amp;rsquo;re less helpful in the details.
If you&amp;rsquo;re looking to optimize your onboarding funnel, then something like
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;
or
&lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;
is likely going to serve you better.&lt;/p&gt;
&lt;h2 id="how-to-wardley-map"&gt;How to Wardley Map&lt;/h2&gt;
&lt;p&gt;Learning Wardley mapping is a mix of reading others&amp;rsquo; maps and writing your own.
A variety of maps for reading are collected in the following breadcrumbs section,
and I&amp;rsquo;d recommend skimming all of them.
In this section are the concrete steps I&amp;rsquo;d encourage you to follow
for creating the first map of your own:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Commit to starting small and iterating.&lt;/strong&gt;
Simple maps are the foundation of complex maps.
Even the smallest Wardley map will
have enough detail to reveal something interesting about the environment
you&amp;rsquo;re operating in.&lt;/p&gt;
&lt;p&gt;Conversely, if you start complex, it&amp;rsquo;s easy to get caught up in all of your
early map&amp;rsquo;s imperfections. At worst, this will cause you to lose momentum in
creating the map. At best, it will accidentally steer your attention rather
than facilitating discovery of which details are important to focus on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;List users, needs and capabilities.&lt;/strong&gt;
Identify the first one or two users for your product.
Going back to the knowledge management example from the primer,
your two initial users might be an author
and a reader. From there, identify those users&amp;rsquo; needs, such as authoring content,
finding content, and providing feedback on which content is helpful.
Finally, write down the underlying technical capabilities necessary
to support those needs, which might range from indexing content in a search index
to a customer support process to deal with frustrated users.&lt;/p&gt;
&lt;p&gt;Remember to start small!
On your first pass, it&amp;rsquo;s fine to focus on a single user.
As you iterate on your map, bring in more users, needs and capabilities
until the map conveys something useful.&lt;/p&gt;
&lt;p&gt;Tooling for this can be a piece of paper or wherever you keep notes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Establish value chains.&lt;/strong&gt;
Take your list and then connect each of the components into chains.
For example, the reader in the above knowledge base example would then
be connected to the need to discover content. Discovering content would
be linked to indexing in the search index. That sequence from reader
to discovering content to search index represents one value chain.&lt;/p&gt;
&lt;p&gt;Convergence across chains is a good thing.
As your chains get more comprehensive, you should expect a given capability
to be referenced by multiple needs. Similarly, you should expect
multiple users to share a need.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Plot value chains&lt;/strong&gt; on a Wardley Map.
You can do this using any of the tools discussed in the Tools for Wardley mapping section,
including a piece of paper.&lt;/p&gt;
&lt;p&gt;Because you already have the value chains created, what you&amp;rsquo;re focused on in this step
is placing each component according to its visibility to users (higher up is more visible to the user,
lower down is less visible), and how mature the solutions are (leftward represents more custom solutions,
rightward represents more commoditized solutions).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Study current state&lt;/strong&gt; of the map.
Once the value chains are plotted,
the map will begin to reveal where your organization&amp;rsquo;s attention should be focused,
and what complexity you can delegate to vendors.
Jot down any realizations you have from this topology.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Predict&lt;/strong&gt; evolution of the map, and create a second version of your
map that includes these changes. (Keep the previous version so you can
better see the evolution of your thinking!)&lt;/p&gt;
&lt;p&gt;It can be helpful to create multiple maps that contemplate different scenarios.
Thinking about the running knowledge base example, you might contemplate a future where AI-powered tools become
the dominant mechanism for authors creating content.
Then you could explore another future where such tools are regulated out of most products,
and imagine how that would shape your approach differently.&lt;/p&gt;
&lt;p&gt;The right timeframe for these changes will depend on the environment
you&amp;rsquo;re mapping. Always prefer a timeframe that makes it easy to believe the changes will happen;
maybe that&amp;rsquo;s five years, or maybe it&amp;rsquo;s 12 months.
If you&amp;rsquo;re caught up wondering whether change might take longer than a certain timeframe,
then simply extend your timeframe to sidestep that issue.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Study future state&lt;/strong&gt; of the map, now that you&amp;rsquo;ve predicted the future.
Once again, write down any unexpected implications of this evolution,
and how you may need to adjust your approach as a result.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Share with others&lt;/strong&gt; for feedback.
It&amp;rsquo;s impossible for anyone to know everything, which is why the best maps tend
to be a communal creation. That&amp;rsquo;s not to suggest that you should perform every
step in a broad community, or that your map should be the consensus of a working group.
Instead, you should test your map against others, see what they find insightful
and what they find artificial in the map, and incorporate that feedback into your map&amp;rsquo;s topology.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Document&lt;/strong&gt; what you&amp;rsquo;ve learned as discussed below in the section on documentation.
You should also connect that Wardley map writeup with your overall strategy document,
typically in the &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;Refine or Explore sections&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
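&lt;p&gt;The list, chain, plot, and predict steps above can also be sketched as a small data model. The following is a minimal, hypothetical Python sketch rather than one of the Wardley mapping tools discussed earlier: it assumes each component gets hand-assigned coordinates, with visibility running from 0.0 (hidden from users) to 1.0 (directly visible) and evolution from 0.0 (custom) to 1.0 (commodity). The components follow the knowledge base example, except &lt;code&gt;content storage&lt;/code&gt; and &lt;code&gt;authoring assistant&lt;/code&gt;, which are invented for illustration, and every position is a placeholder.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A user, need, or capability on a Wardley map (hypothetical model)."""
    name: str
    visibility: float  # vertical axis: 1.0 = visible to the user, 0.0 = hidden
    evolution: float   # horizontal axis: 0.0 = custom, 1.0 = commodity
    depends_on: list[str] = field(default_factory=list)


def value_chains(components: dict[str, Component], start: str) -> list[list[str]]:
    """List every chain from `start` down to components with no dependencies."""
    node = components[start]
    if not node.depends_on:
        return [[start]]
    chains = []
    for dep in node.depends_on:
        for chain in value_chains(components, dep):
            chains.append([start] + chain)
    return chains


def shared_components(components: dict[str, Component]) -> list[str]:
    """Components referenced by more than one other component (convergence)."""
    counts: dict[str, int] = {}
    for component in components.values():
        for dep in component.depends_on:
            counts[dep] = counts.get(dep, 0) + 1
    return sorted(name for name, n in counts.items() if n > 1)


def evolve(components: dict[str, Component], shifts: dict[str, float]) -> dict[str, Component]:
    """Return a second map with selected components moved toward commodity.

    The original map is left untouched so both versions can be compared.
    Evolution values are clamped to the 0.0-1.0 range.
    """
    future = {}
    for name, c in components.items():
        moved = min(1.0, max(0.0, c.evolution + shifts.get(name, 0.0)))
        future[name] = Component(c.name, c.visibility, moved, list(c.depends_on))
    return future


# Hypothetical knowledge base map; all coordinates are placeholders.
kb = {
    "reader": Component("reader", 1.0, 0.5, ["discover content"]),
    "author": Component("author", 1.0, 0.5, ["author content"]),
    "discover content": Component("discover content", 0.8, 0.5, ["search index"]),
    "author content": Component(
        "author content", 0.8, 0.5,
        ["content storage", "search index", "authoring assistant"],
    ),
    "search index": Component("search index", 0.3, 0.75),
    "content storage": Component("content storage", 0.2, 0.9),
    "authoring assistant": Component("authoring assistant", 0.4, 0.25),
}

for chain in value_chains(kb, "author"):
    print(" -> ".join(chain))
print("shared:", shared_components(kb))

# Scenario: AI authoring tools commoditize over the chosen timeframe.
future = evolve(kb, {"authoring assistant": 0.5})
print("authoring assistant:", kb["authoring assistant"].evolution, "->",
      future["authoring assistant"].evolution)
```

&lt;p&gt;Running the sketch prints the author&amp;rsquo;s three value chains, flags the search index as the capability where chains converge, and shows the authoring assistant moving from 0.25 to 0.75 under the AI-authoring scenario. Keeping &lt;code&gt;evolve&lt;/code&gt; non-destructive mirrors the advice to keep the previous version of your map; the coordinates remain judgment calls, and the code only makes the structure explicit.&lt;/p&gt;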
&lt;p&gt;One downside of presenting steps to do something is that the sequence
can become a fixed recipe. These are the steps that I&amp;rsquo;ve found most useful,
and I&amp;rsquo;d encourage you to try them if mapping is a new tool in your toolkit,
but this is far from the only way.
Start here, then experiment with other methods until you find what works
best for you and the strategies that you&amp;rsquo;re working on.&lt;/p&gt;
&lt;h2 id="breadcrumbs-for-wardley-map-examples"&gt;Breadcrumbs for Wardley Map examples&lt;/h2&gt;
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;em&gt;I&amp;rsquo;ll update these examples as I continue writing more strategies for this book.&lt;/em&gt;
&lt;em&gt;Until then, I admit that some of these examples are &amp;ldquo;what I have lying around&amp;rdquo; more so than the &amp;ldquo;ideal forms of Wardley maps.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;With the foundation in place, the best way to build on Wardley mapping is writing your
own maps. The second best way is to read existing maps that others have made,
a number of which exist within this book:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/wardley-llm-ecosystem/"&gt;LLM evolution&lt;/a&gt; studies the evolution of the Large Language Model ecosystem,
and how that will impact product engineering organizations attempting to validate and deploy
new paradigms like agentic workflows and retrieval augmented generation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lethain.com/measuring-developer-experience-benchmarks-theory-of-improvement/"&gt;Evolution of developer experience tooling space&lt;/a&gt;
explores how Wardley mapping has helped me refine my understanding of how the developer experience ecosystem
will evolve over time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to the maps within this book, I also label maps that I create on my blog
using the &lt;a href="https://lethain.com/tags/wardley/"&gt;wardley category&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="how-to-document-a-wardley-map"&gt;How to document a Wardley Map&lt;/h2&gt;
&lt;p&gt;As explored in &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;how to create readable strategy documents&lt;/a&gt;,
the default temptation is to structure documents around the creation process.
However, it&amp;rsquo;s essentially always better to write in two steps:
develop a writing-optimized version that&amp;rsquo;s focused on facilitating thinking, and then rework it into
a reading-optimized version that supports both readers who are, and are not, interested in the details.&lt;/p&gt;
&lt;p&gt;The writing-optimized version is what we discussed in &amp;ldquo;How to Wardley Map&amp;rdquo; above.
For a reading-optimized version, I recommend:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How things work today&lt;/strong&gt; shares a map of the current environment,
explains any interesting rationales or controversies behind placements on the map,
and highlights the most interesting parts of the map.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transition to future state&lt;/strong&gt; starts with a second map, this one showing the
transition from the current state to a projected future state.
It&amp;rsquo;s very reasonable to have multiple distinct maps, each of which considers
one potential evolution, or one step of a longer evolution.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Users and Value chains&lt;/strong&gt; are the first place you start creating a Wardley map,
but generally the least interesting part of explaining a map&amp;rsquo;s implications.
This isn&amp;rsquo;t because the value chains are unimportant; rather, it&amp;rsquo;s because the map
itself tends to implicitly explain the value chain enough that you can move directly to
focusing on the map&amp;rsquo;s most interesting implications.&lt;/p&gt;
&lt;p&gt;In a sufficiently complex map, it&amp;rsquo;s very reasonable to split this into two sections,
but generally I find it eliminates redundancy to cover users and value chains in one
joint section rather than separately. This is a good example of the difference between
reading and writing: splitting these two topics helps clarify thinking, but muddles reading.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This ordering may seem too brief or a bit counter-intuitive for you, as the person who has the full set of details,
but my experience is that it will be simpler to read for most readers. That&amp;rsquo;s because most readers
read until they agree with the conclusion, then stop reading, and are only interested in the details if they
disagree with the conclusion.&lt;/p&gt;
&lt;p&gt;This format is also fairly different from the format I recommend for documenting systems models.
That is because systems model diagrams exclude much of the relevant detail, showing the relationship
between stocks but not showing the magnitude of the flows. You can only fully understand a system model
by seeing both the diagram and a chart showing the model&amp;rsquo;s output.
Wardley maps, on the other hand, tend to be more self-explanatory, and often can stand on their
own with relatively less written description.&lt;/p&gt;
&lt;h2 id="what-about-doctrines-and-gameplay"&gt;What about doctrines and gameplay?&lt;/h2&gt;
&lt;p&gt;This book&amp;rsquo;s &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;components of strategy&lt;/a&gt;
are most heavily influenced by Richard Rumelt&amp;rsquo;s approach.
Simon Wardley&amp;rsquo;s approach to strategy built around Wardley Mapping could be viewed as a competing lens.
For each problem that Rumelt&amp;rsquo;s system solves, there is a Wardley solution as well,
so it&amp;rsquo;s worth mentioning some of the components I&amp;rsquo;ve not included, and why.&lt;/p&gt;
&lt;p&gt;The two most important components I&amp;rsquo;ve not discussed thus far are
Wardley&amp;rsquo;s ideas of &lt;a href="https://learnwardleymapping.com/2020/08/17/principles-first/"&gt;doctrine&lt;/a&gt;
and &lt;a href="https://www.wardleymaps.com/gameplay"&gt;gameplay&lt;/a&gt;. Wardley&amp;rsquo;s doctrines are universally applicable
practices like knowing your users, biasing towards data, and designing for constant evolution.
Gameplay is similar to doctrine, but is context-dependent rather than universal.
Some examples of gameplay are talent raid (hiring from knowledgeable competitors), bundling (selling products together rather than separately),
and exploiting network effects.&lt;/p&gt;
&lt;p&gt;I decided not to spend much time on doctrine and gameplay because I find them somewhat specialized
toward the needs of business strategy, and consequently a bit messy to apply to the sorts of problems
that this book is most interested in solving: the problems of engineering strategy.&lt;/p&gt;
&lt;p&gt;To be explicit, I don&amp;rsquo;t personally view Rumelt&amp;rsquo;s and Wardley&amp;rsquo;s approaches as competing efforts.
What&amp;rsquo;s most valuable is to have a broad toolkit, and pull in the pieces of that toolkit that feel most
applicable to the problems at hand. I find Wardley Maps exceptionally valuable at enhancing exploration,
diagnosis, and refinement in some problems. In other problems, typically shorter duration or more internally-oriented,
I find the Rumelt playbook more applicable. In all problems, I find the combination more valuable than anchoring
in one camp&amp;rsquo;s perspective.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;No refinement technique will let you reliably predict the future,
but Wardley mapping is very effective at helping you plot out
the various potential futures your strategy might need to operate in.
With those futures in mind, you can tune your strategy to excel in
the most likely, and to weather the less desirable.&lt;/p&gt;
&lt;p&gt;It took me years to dive into Wardley mapping.
Once I finally did, it was simpler than I&amp;rsquo;d feared,
and now I find myself creating Wardley maps somewhat frequently.
When you&amp;rsquo;re working on your next strategy that&amp;rsquo;s impacted by
the ecosystem&amp;rsquo;s evolution around it, try your hand at mapping,
and soon you&amp;rsquo;ll &lt;a href="https://lethain.com/tags/wardley/"&gt;start to build your own collection of maps&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Refinement</title><link>https://craftingengstrategy.com/refinement/</link><pubDate>Thu, 02 Jan 2025 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/refinement/</guid><description/></item><item><title>Refining</title><link>https://craftingengstrategy.com/refine/</link><pubDate>Sat, 28 Dec 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/refine/</guid><description>&lt;p&gt;In Jim Collins&amp;rsquo; &lt;em&gt;&lt;a href="https://www.amazon.com/Great-Choice-Uncertainty-Thrive-Despite/dp/1847940889/"&gt;Great by Choice&lt;/a&gt;&lt;/em&gt;,
he develops the concept of &lt;a href="https://www.jimcollins.com/concepts/fire-bullets-then-cannonballs.html"&gt;Fire Bullets, Then Cannonballs&lt;/a&gt;.
His premise is that you should cheaply test new ideas before fully committing to them.
Your organization can only afford to fire a small number of cannonballs, but it can bankroll far more bullets.
Why not use bullets to derisk your cannonballs&amp;rsquo; trajectories?&lt;/p&gt;
&lt;p&gt;This chapter presents a series of concrete techniques that I have personally
used to effectively refine strategies before reaching the cannonball stage.
We&amp;rsquo;ll work through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An introduction to the practice of strategy refinement&lt;/li&gt;
&lt;li&gt;Why strategy refinement is the highest impact step of strategy creation&lt;/li&gt;
&lt;li&gt;How mixed incentives often cause refinement to be skipped, even though
skipping leads to worse organizational outcomes&lt;/li&gt;
&lt;li&gt;Building your personal toolkit for refining strategy by picking from various
refinement techniques like &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;,
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;,
and &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Brief introductions to each of those refinement techniques. These provide enough context
to pick which ones might be useful for the strategy that you&amp;rsquo;re working on&lt;/li&gt;
&lt;li&gt;Survey of anti-patterns that skip refinement or manufacture consent to create the illusion
of refinement without providing the benefits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of the refinement techniques, such as systems modeling, is covered in greater
detail&amp;ndash;including concrete applications to specific engineering strategies&amp;ndash;in the
refinement section of this book.&lt;/p&gt;
&lt;h2 id="what-is-strategy-refinement"&gt;What is strategy refinement?&lt;/h2&gt;
&lt;p&gt;Most strategies succeed because they properly address narrow problems within a broader strategy.
While fully implementing a strategy to validate it is possible, this approach is typically inefficient and slow.
Worse, it&amp;rsquo;s easy to get so distracted by miscellaneous details that you lose sight
of the levers that will make your strategy impactful.&lt;/p&gt;
&lt;p&gt;Strategy refinement is a toolkit of methods to identify those narrow problems that matter most,
and validate that your solutions to those problems will be effective.
The right tool within the toolkit will vary depending on the strategy you&amp;rsquo;re working on.
It might be using Wardley mapping to understand how the ecosystem&amp;rsquo;s evolution will impact your approach.
Or it might be systems modeling to determine which part of a migration is the most valuable lever.
In other cases, it&amp;rsquo;s delaying commitment to your strategy until you&amp;rsquo;ve done a narrow test drive
to derisk the pieces you don&amp;rsquo;t quite have conviction in yet.&lt;/p&gt;
&lt;p&gt;Whatever tools you&amp;rsquo;ve relied on to refine strategy thus far in your work, there
are always new refinement tools to pick up. This book presents a workable introduction to several tools that I find reliably useful,
while providing a broader foundation for deploying other techniques that you
develop towards strategy refinement.&lt;/p&gt;
&lt;h2 id="does-refinement-matter"&gt;Does refinement matter?&lt;/h2&gt;
&lt;p&gt;At Stripe, the head of engineering rolled out agile techniques in one meeting.
This change was aimed at our difficulties with planning in periods longer than a month,
which was becoming an increasing challenge as we started working with enterprise businesses who
wanted us to commit to specific functionality as part of signing their contracts.
However, the approach worked poorly, because it assumed that the issue was engineering managers
being generally unfamiliar with agile techniques.
The challenge of adoption wasn&amp;rsquo;t awareness, but rather the difficulty of prioritizing asks from numerous stakeholders
in an environment where saying no was frowned upon.&lt;/p&gt;
&lt;p&gt;In this agile rollout, the lack of a shared planning paradigm was a real, apt problem.
However, the solution solved the easiest part of the problem, without addressing the
messier parts, and consequently failed to make meaningful progress.
This happens a surprising amount, and can be largely avoided with a small dose of refinement.&lt;/p&gt;
&lt;p&gt;On the opposite end, we created Uber&amp;rsquo;s service adoption strategy exclusively through refinement,
because the infrastructure engineering team didn&amp;rsquo;t have any authority to mandate wider changes.
Instead, we relied on two different kinds of refinement to focus our iterative efforts.
First, we used systems modeling to understand what parts of adoption we needed to focus on.
Second, we used strategy testing to learn by migrating individual product engineering teams
over to the new platform.&lt;/p&gt;
&lt;p&gt;In the agile adoption example, failure to refine turned a moderately challenging problem
into a strategy failure.
In the service migration example, focus on refinement translated an extremely difficult problem
into a success.
Refinement is, in my experience, the kernel of effective strategy.&lt;/p&gt;
&lt;h2 id="if-it-matters-why-is-it-skipped"&gt;If it matters, why is it skipped?&lt;/h2&gt;
&lt;p&gt;When a small team creates a strategy,
a so-called &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;low-altitude strategy&lt;/a&gt;,
they almost always spend a great deal of time refining their strategy.
This isn&amp;rsquo;t because most teams believe in refinement.
Rather it&amp;rsquo;s because most teams lack the authority to force others to align with their strategy.
This lack of authority means they must incrementally prove out their approach until other teams or executives believe it&amp;rsquo;s worth aligning with.&lt;/p&gt;
&lt;p&gt;High-altitude strategy is typically the domain of executives, who generally have the ability to mandate adoption,
and routinely skip the refinement stage, even when it&amp;rsquo;s inexpensive and is almost guaranteed to make them more successful.
Why is that?
When &lt;a href="https://lethain.com/first-ninety-days-cto-vpe/"&gt;executives start a new role&lt;/a&gt;, they know making an early impression matters.
They also, unfortunately, know that sounding ambitious often resonates more loudly
than doing good work. So, while they do hope to eventually be effective, early on they kick off
a few aspirational initiatives &lt;a href="https://lethain.com/grand-migration/"&gt;like a massive overhaul of the codebase&lt;/a&gt;,
believing it&amp;rsquo;ll establish their reputation as an effective leader at the company.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t uniquely an executive failure; it also happens frequently in
&lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;permissive strategy organizations&lt;/a&gt;
that require &lt;a href="https://staffeng.com/guides/staff-projects/"&gt;an ambitious, high-leverage project to get promoted into senior engineering roles&lt;/a&gt;.
For example, you might see a company implement a novel approach to networking or authorization whose adoption fails after solving some easier proof points,
and trace its heritage back to promotion criteria.
In many cases, the promotion will come before the rollout stalls out, disincentivizing the would-be promoted engineer
from worrying too deeply about whether this was net-positive for the organization.
The executive responsible for the promotion rubric will eventually recognize the flaw,
but it&amp;rsquo;s not an easy tradeoff for them to choose between an organization that innovates too much while empowering individuals
and an organization with little waste but restricted room for creativity.&lt;/p&gt;
&lt;p&gt;Another reason refinement can get skipped is that sometimes you&amp;rsquo;re forced to urgently create and commit to a strategy,
usually because your boss tells you to. This doesn&amp;rsquo;t actually prevent refinement&amp;ndash;just say you&amp;rsquo;re committed and refine anyway&amp;ndash;but
often this interaction turns off the strategist&amp;rsquo;s mind, tricking them into believing
they can&amp;rsquo;t change their approach because they&amp;rsquo;ve already committed to it.
This is never true: all decisions are up for review with proper evidence,
but it takes a certain courage to refine when those around you are asking
for weekly updates on completing the project.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s one other important reason that strategy refinement gets skipped:
many people haven&amp;rsquo;t built out a toolkit to perform strategy refinement,
and haven&amp;rsquo;t worked with someone who has a toolkit.&lt;/p&gt;
&lt;h2 id="building-your-toolkit"&gt;Building your toolkit&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m eternally grateful to my father, a professor of economics,
who brought me to a systems modeling workshop in Boston one summer
when I was in high school. This opened my eyes to the wide world of
techniques for reasoning about problems, and systems modeling became
the first tool in my toolkit for strategy refinement.&lt;/p&gt;
&lt;p&gt;The section on refinement will go into three refinement techniques in significant detail:
&lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;,
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;, and
&lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt;,
as well as surveying a handful of other techniques more common to strategy consultants.
Systems modeling I adopted early, whereas Wardley mapping I only learned while
working on this book.
Few individuals are proficient users of many refinement tools, but it&amp;rsquo;s extraordinarily
powerful to unlock your first tool, and worthwhile to slowly expand your experience
with other tools over time. All tools are flawed, and each is best at illuminating
certain types of problems.&lt;/p&gt;
&lt;p&gt;If all of these are unfamiliar, then skim over all of them and pick one that seems
most applicable to a current problem you&amp;rsquo;re working on.
You&amp;rsquo;ll build expertise by trying a tool against many different problems,
and talking through the results with engaged peers.&lt;/p&gt;
&lt;p&gt;As you practice, remember that the important thing to share is the learning
from these techniques, and try to avoid getting too caught up in sharing the techniques themselves.
I&amp;rsquo;ve seen these techniques meaningfully change strategies,
but I&amp;rsquo;ve never seen those changes successfully justified through
the inherent insight of the refinement techniques themselves.&lt;/p&gt;
&lt;h2 id="strategy-testing"&gt;Strategy testing&lt;/h2&gt;
&lt;p&gt;Sometimes you&amp;rsquo;ll need a strategy to solve an ambiguous problem,
or a problem where the issues blocking progress are poorly understood.
At Carta, one strategy problem we worked on was improving code quality,
which is a good example of both of those. It&amp;rsquo;s difficult to agree on
what code quality is, and it&amp;rsquo;s equally difficult to agree on appropriate,
concrete steps to improve it.&lt;/p&gt;
&lt;p&gt;To navigate that ambiguity, we spent relatively little time thinking
about the right initial solution, and a great deal of our time
deploying the &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt; technique:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Identify the narrowest, deepest available slice of your strategy.
Iterate on applying that slice until you see some evidence it&amp;rsquo;s working.&lt;/li&gt;
&lt;li&gt;As you iterate, identify metrics that help you verify the approach is working.&lt;/li&gt;
&lt;li&gt;Operate from the belief that people are well-meaning,
and strategy failures are due to excess friction and poor ergonomics.&lt;/li&gt;
&lt;li&gt;Keep refining until you have conviction that your strategy’s details work in practice,
or that the strategy needs to be approached from a new direction.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this case, we achieved some small wins, funded a handful of specific bets that we believed would improve
the problem long-term, and ended the initiative early without making a large organizational commitment.
You could argue that&amp;rsquo;s a failure, but my experience is quite different: having a problem doesn&amp;rsquo;t mean you
have an elegant solution, and strategy testing helps you validate if the solution&amp;rsquo;s efficiency and ergonomics
are viable.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re dealing with a deeply ambiguous problem and there&amp;rsquo;s no agreement on
the nature of the reality you&amp;rsquo;re operating in, strategy testing is a great technique to start with.&lt;/p&gt;
&lt;h2 id="systems-modeling"&gt;Systems modeling&lt;/h2&gt;
&lt;p&gt;When you&amp;rsquo;re unsure where leverage points might be in a complex system,
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;
is an effective technique to cheaply determine which levers might be
effective. For example, the systems model for &lt;a href="https://craftingengstrategy.com/llm-onboarding-model/"&gt;onboarding drivers in a ride-share app&lt;/a&gt;
shows that reengaging drivers who&amp;rsquo;ve left the platform matters more than bringing on new drivers
in a mature market.&lt;/p&gt;
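&lt;p&gt;To show the kind of comparison such a model enables, here is a minimal stock-and-flow sketch in Python. It is not the model linked above: the three stocks (potential, active, and departed drivers), the monthly rates, and the starting values are all hypothetical, chosen to represent a mature market where most people who might drive have already tried it.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class DriverModel:
    """Hypothetical monthly stock-and-flow model of ride-share drivers."""
    potential: float   # people who could still become drivers
    active: float      # drivers currently on the platform
    departed: float    # drivers who have left and might return
    acquisition_rate: float = 0.02    # share of `potential` onboarded per month
    churn_rate: float = 0.05          # share of `active` leaving per month
    reactivation_rate: float = 0.01   # share of `departed` returning per month

    def step(self) -> None:
        """Advance one month, computing every flow from the current stocks."""
        acquired = self.potential * self.acquisition_rate
        churned = self.active * self.churn_rate
        reactivated = self.departed * self.reactivation_rate
        self.potential -= acquired
        self.active += acquired - churned + reactivated
        self.departed += churned - reactivated


def active_after(months: int, **overrides: float) -> float:
    """Run a mature-market scenario and return the final active driver count."""
    # Mature market (assumed): the departed pool dwarfs the untapped pool.
    model = DriverModel(potential=1_000, active=5_000, departed=20_000, **overrides)
    for _ in range(months):
        model.step()
    return model.active


baseline = active_after(12)
double_acquisition = active_after(12, acquisition_rate=0.04)
double_reactivation = active_after(12, reactivation_rate=0.02)
print(f"baseline:            {baseline:,.0f}")
print(f"double acquisition:  {double_acquisition:,.0f}")
print(f"double reactivation: {double_reactivation:,.0f}")
```

&lt;p&gt;Doubling each rate in turn shows the shape of the argument: when the pool of departed drivers dwarfs the untapped pool, the same relative improvement moves far more drivers through reactivation than through acquisition. Starting instead from a new market, with a large potential pool and few departed drivers, can reverse that comparison, which is exactly the kind of assumption a model makes visible.&lt;/p&gt;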
&lt;p&gt;Similarly, in the Uber service migration example,
systems modeling helped us focus on eliminating upfront steps during service
onboarding, shifting to reasonable defaults and away from forcing teams
to learn the new service platform before it had shown any usefulness to them.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/QualityMentalModels.png" alt="Systems model of errors in a load balancer"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Systems model of errors in a load balancer&lt;/p&gt;
&lt;p&gt;While you can certainly reach these insights without modeling, modeling tends to make
the insights immediately visible.
In cases where your model doesn&amp;rsquo;t immediately illuminate what matters most,
studying how your model&amp;rsquo;s projections conflict with real-world data will guide you
to understand where your assumptions are contorting your understanding of the problem.&lt;/p&gt;
&lt;p&gt;If you generally understand a problem, but need to determine where to focus
efforts to make the largest impact, then systems modeling is a valuable technique
to deploy.&lt;/p&gt;
&lt;h2 id="wardley-mapping"&gt;Wardley mapping&lt;/h2&gt;
&lt;p&gt;Many engineering strategies implicitly assume that the ecosystem we&amp;rsquo;re operating
within is static. However, that&amp;rsquo;s certainly false. Many experienced engineers and engineering leaders
have great judgment, and great intuition, but nonetheless deploy a flawed strategy because they&amp;rsquo;ve
anchored on their memory of how things work rather than noticing how things have changed over time.&lt;/p&gt;
&lt;p&gt;If you want to incorporate these changes into your strategy,
rather than being hit over the head by them,
&lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt; is a great tool to add to your kit.&lt;/p&gt;
&lt;p&gt;Wardley maps allow you to plot users, their needs, and then study how
the solutions to those needs will shift over time.
For example, today there is a proliferation of narrow platforms built on
recent advances in large language models, but &lt;a href="https://craftingengstrategy.com/wardley-llm-ecosystem/"&gt;studying a Wardley map of the LLM ecosystem&lt;/a&gt;
suggests that it&amp;rsquo;s likely that this ecosystem will consolidate to fewer, broader platforms
rather than remaining so widely scattered across distinct vendors.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-wardley-1.png" alt="Wardley map of Large Language Model ecosystem"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Wardley map of Large Language Model ecosystem&lt;/p&gt;
&lt;p&gt;If your strategy involves adopting a highly dynamic technology
such as observability in the 2010s, or if your strategy is intended
to span five-plus years, then Wardley mapping will help
surface how industry evolution will impact your approach.&lt;/p&gt;
&lt;h2 id="anti-patterns-in-refinement"&gt;Anti-patterns in refinement&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve already discussed why &lt;strong&gt;refinement is often skipped&lt;/strong&gt;, which is
the most frequent and most damning refinement anti-pattern.
At Calm, we cargo-culted the decomposition of our monolithic codebase
into microservices; we had no reason to believe this was improving developer productivity,
but we continued to pursue this strategy for a year before recognizing that we were suffering
from skipping refinement.&lt;/p&gt;
&lt;p&gt;The second most common anti-pattern is creating the impression of strategy refinement
through &lt;strong&gt;manufactured consent&lt;/strong&gt;. A new senior leader joined Uber and mandated a complete technical re-architecture,
justifying this in part through the evidence that a number of internal leaders had adopted the same techniques
successfully on their teams. When I spoke with those internal leaders, they were themselves skeptical that the
proposal made sense, despite the fact that their surface-level agreement was being used to convince the wider
organization that they believed in the new approach.&lt;/p&gt;
&lt;p&gt;Finally, refinement often occurs, but counter-evidence is discarded because the refining team
is &lt;strong&gt;optimizing for a side-goal&lt;/strong&gt; of some sort.
My first team at Yahoo adopted Erlang for a key component of &lt;a href="https://lethain.com/datahub/"&gt;Yahoo! Build Your Own Search Service&lt;/a&gt;,
which proved to be an excellent solution to our problem of wanting to use Erlang,
but a questionable solution to the core problem at hand.
Only three of the engineers on our fifteen-person team were willing to touch the Erlang codebase,
but that counter-evidence was ignored because it was in conflict with the side-goal.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;This chapter has introduced the concept of strategy refinement, surveyed three common
refinement techniques&amp;ndash;strategy testing, systems modeling, and Wardley mapping&amp;ndash;and provided
a framework for building your personal toolkit for refinement.
When you&amp;rsquo;re ready to get into more detail,
further in the book there&amp;rsquo;s a section dedicated to the details of applying
these techniques, starting with &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Wardley mapping the LLM ecosystem.</title><link>https://craftingengstrategy.com/wardley-llm-ecosystem/</link><pubDate>Tue, 24 Dec 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/wardley-llm-ecosystem/</guid><description>&lt;p&gt;In &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;How should you adopt LLMs?&lt;/a&gt;, we explore how a theoretical ride sharing company,
Theoretical Ride Sharing, should adopt Large Language Models (LLMs).
Part of that strategy&amp;rsquo;s diagnosis depends on understanding the expected evolution of
the LLM ecosystem, which we&amp;rsquo;ve built a &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley map&lt;/a&gt; to better explore.&lt;/p&gt;
&lt;p&gt;This map of the LLM space focuses on how product companies should address the
proliferation of model providers such as Anthropic, Google and OpenAI,
as well as the proliferation of LLM product patterns like agentic workflows, Retrieval Augmented Generation (RAG),
and running &lt;a href="https://github.com/openai/evals"&gt;evals to maintain performance as models change&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is an exploratory, draft chapter for a book on engineering strategy that I&amp;rsquo;m brainstorming in &lt;a href="https://lethain.com/tags/eng-strategy-book/"&gt;#eng-strategy-book&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To quickly understand the analysis within this Wardley Map,
read from top to bottom.
If you want to understand how this map was &lt;em&gt;written&lt;/em&gt;, then you should
read section by section from the bottom up, starting with Users, then Value Chains, and so on.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Refining strategy with Wardley Mapping&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="how-things-work-today"&gt;How things work today&lt;/h2&gt;
&lt;p&gt;If Retrieval Augmented Generation (RAG) was the trending LLM pattern of 2023,
you could reasonably argue that agents&amp;ndash;or agentic workflows&amp;ndash;are the pattern of 2024.
It&amp;rsquo;s hard to guess what the patterns of tomorrow will be, but it&amp;rsquo;s likely
that more new patterns are coming our way.
LLMs are a proven platform today, and are now being applied widely to discover new patterns.
It&amp;rsquo;s a safe bet that validating these patterns will continue to drive product companies to support additional
infrastructure components (e.g. search indexes to support RAG).&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-wardley-1.png" alt="Current state of LLM ecosystem"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Current state of LLM ecosystem&lt;/p&gt;
&lt;p&gt;This proliferation of patterns has created a significant cost for these product companies,
a problem which market forces are likely to address as offerings evolve.&lt;/p&gt;
&lt;h2 id="transition-to-future-state"&gt;Transition to future state&lt;/h2&gt;
&lt;p&gt;Looking at the LLM ecosystem, there are two questions
that I believe will define its evolution:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Will LLM framework platforms for agents, RAG, and so on, remain bundled with
model providers such as OpenAI and Anthropic?
Or will they, instead, split with models and platforms being offered separately?&lt;/li&gt;
&lt;li&gt;Which elements of LLM frameworks will be productizable in the short-term?
For example, running evals seems like a straightforward opportunity for bundling,
as would providing &lt;em&gt;some&lt;/em&gt; degree of agent support.
Conversely, bundling RAG might seem straightforward but most production use cases would
require real-time updates, incurring the full complexity of operating scaled search clusters.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Depending on the answers to those questions, you might draw a very different map.
This map answers the first question by imagining that LLM platforms will decouple from model providers, while
also allowing you to license model access through that platform rather than needing
to individually negotiate with each model provider.
It answers the second question by imagining that most non-RAG functionality will move into a bundled
platform provider. Given the richness of investment in the current space, it
seems safe to believe that every plausible combination will exist to some degree
until the ecosystem eventually stabilizes in one dominant configuration.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-wardley-2.png" alt="Pipeline of LLM platform bundling"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Pipeline of LLM platform bundling&lt;/p&gt;
&lt;p&gt;The key drivers of this configuration are that the LLM ecosystem is inventing
new patterns every year, and companies are spinning up haphazard internal solutions
to validate those patterns, but ultimately few product companies are able to effectively fund these
sorts of internal solutions in the long run.&lt;/p&gt;
&lt;p&gt;If this map is correct, then both model providers (who are inherently
limited to providing their own subset of models) and narrow LLM platform providers (who
can only service a subset of LLM patterns) will eventually face headwinds. The likely best bet for a product company in this future
is to adopt the broadest LLM pattern platforms today, and to explicitly decouple pattern platform from model provider.&lt;/p&gt;
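&lt;p&gt;To make &amp;ldquo;decouple pattern platform from model provider&amp;rdquo; concrete, here is a minimal Python sketch. Every name in it (&lt;code&gt;ModelProvider&lt;/code&gt;, &lt;code&gt;PatternPlatform&lt;/code&gt;, &lt;code&gt;EchoProvider&lt;/code&gt;, &lt;code&gt;run_eval&lt;/code&gt;) is hypothetical rather than any vendor&amp;rsquo;s API, and the echo provider stands in for a real SDK call. The point is only that patterns like evals depend on a narrow interface, so switching providers changes a registration rather than every pattern.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class ModelProvider(Protocol):
    """Hypothetical minimal interface that every model provider adapter satisfies."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in provider for illustration; a real adapter would wrap a vendor SDK."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class PatternPlatform:
    """Hypothetical pattern platform (evals, agents, and so on) that owns no model.

    Providers are registered by name, so swapping vendors changes one
    registration rather than every pattern built on top of the platform.
    """

    providers: Dict[str, ModelProvider] = field(default_factory=dict)

    def register(self, name: str, provider: ModelProvider) -> None:
        self.providers[name] = provider

    def run_eval(self, provider_name: str, cases: Dict[str, Callable[[str], bool]]) -> float:
        """Send each prompt to the chosen provider and return the fraction of checks that pass."""
        provider = self.providers[provider_name]
        passed = sum(1 for prompt, check in cases.items() if check(provider.complete(prompt)))
        return passed / len(cases)


platform = PatternPlatform()
platform.register("vendor_a", EchoProvider("vendor_a"))
platform.register("vendor_b", EchoProvider("vendor_b"))
cases = {
    "hello": lambda out: "hello" in out,
    "bye": lambda out: out.startswith("[vendor_a]"),
}
for name in ("vendor_a", "vendor_b"):
    print(name, platform.run_eval(name, cases))
```

&lt;p&gt;Run as written, this prints a pass rate of 1.0 for &lt;code&gt;vendor_a&lt;/code&gt; and 0.5 for &lt;code&gt;vendor_b&lt;/code&gt;, the kind of regression an eval suite exists to catch when changing models.&lt;/p&gt;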
&lt;h2 id="user--value-chains"&gt;User &amp;amp; Value Chains&lt;/h2&gt;
&lt;p&gt;The LLM landscape is evolving rapidly, with some techniques getting introduced and reaching widespread adoption
within a single calendar year.
Sometimes those widely adopted techniques are &lt;em&gt;actually&lt;/em&gt; being adopted, and other times it&amp;rsquo;s closer to &amp;ldquo;conference-talk driven development&amp;rdquo;
where folks with broad platforms inflate the maturity of industry adoption.&lt;/p&gt;
&lt;p&gt;The three primary users attempting to navigate that dynamism are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Product Engineers&lt;/strong&gt; are looking for faster, easier solutions to deploying LLMs across
the many, evolving parameters: new models, support for agents, solutions to offload the search
dimensions of retrieval-augmented-generation (RAG), and so on.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Machine Learning Infrastructure&lt;/strong&gt; is responsible for the effective usage of these mechanisms,
and for steering product developers towards effective adoption of these tools.
They are also, in tandem with other infrastructure engineering teams, responsible for supporting
common elements for LLM solutions, such as search indexes to power RAG implementations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security and Compliance&lt;/strong&gt; need to ensure models are hosted safely and securely,
that we&amp;rsquo;re only sending approved information,
and that we stay in alignment with rapidly evolving AI risks and requirements.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To keep the map focused on evolution rather than organizational dynamics,
I&amp;rsquo;ve consolidated a number of teams in slightly artificial ways,
and omitted a few teams that are certainly worth considering.
Finance needs to understand the cost and volume
of LLM usage. Security and Compliance are really different teams, with both overlapping and distinct requirements between them.
Machine Learning Infrastructure could be split into two distinct teams with somewhat conflicting perspectives
on who should own things like search infrastructure.&lt;/p&gt;
&lt;p&gt;Depending on what &lt;em&gt;you&lt;/em&gt; want to learn from the map, you might prefer to combine, split, or introduce teams to arrive at
a different set of combinations than I&amp;rsquo;ve selected here.&lt;/p&gt;</description></item><item><title>Navigating Private Equity ownership.</title><link>https://craftingengstrategy.com/private-equity-strategy/</link><pubDate>Mon, 11 Nov 2024 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/private-equity-strategy/</guid><description>&lt;p&gt;In 2020, you could credibly argue that &lt;a href="https://www.readmargins.com/p/zirp-explains-the-world"&gt;ZIRP explains the world&lt;/a&gt;,
but that&amp;rsquo;s an impossible argument to make in 2024 when zero-interest rate policy is only a fond memory.
Instead, we&amp;rsquo;re seeing a number of companies designed for rapid expansion, learning to adapt
to a world that expects immediate free cash flow rather than accepting the sweet promise of discounted future cash flow.&lt;/p&gt;
&lt;p&gt;This chapter aims to tackle that problem head-on, taking the role of an engineering organization attempting to navigate
new ownership by a private equity group. It&amp;rsquo;s an increasingly frequent scenario: after many years of learning to operate under the direction of its original founders,
and the brief excitement of going public, now there&amp;rsquo;s a short runway to change operating models.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s call this company Fungible Ecommerce Company. It&amp;rsquo;s a platform for supporting online commerce,
and this is their Engineering Leadership team&amp;rsquo;s attempt to think through
their options while waiting for new ownership to provide concrete guideposts.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is an exploratory, draft chapter for a book on engineering strategy that I&amp;rsquo;m brainstorming in &lt;a href="https://lethain.com/tags/eng-strategy-book/"&gt;#eng-strategy-book&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;, then &lt;em&gt;Diagnose&lt;/em&gt; and so on.
Relative to the default structure, this document has been refactored in two ways
to improve readability:
first, &lt;em&gt;Operation&lt;/em&gt; has been folded into &lt;em&gt;Policy&lt;/em&gt;;
second, &lt;em&gt;Refine&lt;/em&gt; has been embedded in &lt;em&gt;Diagnose&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy"&gt;Policy&lt;/h2&gt;
&lt;p&gt;Our policy for managing our new ownership structure is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We believe our new ownership will provide a specific target for Research and Development (R&amp;amp;D) operating expenses
during the upcoming financial year planning. &lt;strong&gt;We will revise these policies again once we have explicit targets&lt;/strong&gt;,
and will delay planning around reductions until we have those numbers to avoid running two overlapping processes.&lt;/p&gt;
&lt;p&gt;That said, looking at our R&amp;amp;D investment relative to a peer set of comparably growing companies,
we believe that we&amp;rsquo;ll get pressure to moderately reduce our spend. We aim to accomplish
that reduction through a series of policies and one-off infrastructure projects, without requiring a major
reduction in headcount spend.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We will move to an &amp;ldquo;N-1&amp;rdquo; backfill policy&lt;/strong&gt;, where departures are backfilled with a less senior level.
&lt;strong&gt;We will also institute a strict maximum of one Principal Engineer per business unit&lt;/strong&gt;, with any exceptions approved
in writing by the CTO&amp;ndash;this applies to both promotions and external hires.
These policies are effective immediately, and are based on our &lt;a href="https://craftingengstrategy.com/private-equity-model/"&gt;model of engineering-org seniority-mix&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We commit to this policy reducing headcount costs by approximately 5% YoY for the foreseeable future.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We evaluated a number of potential changes to our geographical hiring strategy,
but we believe that staffing engineers alongside their cross-functional partners (Product, Marketing, Sales, and so on)
is a priority.
We have not been able to reach an agreement cross-functionally, and as such
&lt;strong&gt;we are not changing our geographical hiring strategy at this time&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If we can agree on a policy here, we could accomplish a 10-20% reduction in cost over 2-3 years,
but the details matter a great deal, so we cannot commit to a specific outcome until we get
more cross-functional alignment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our infrastructure spend has grown significantly more slowly than revenue for the past two years,
meaning that we&amp;rsquo;ve successfully implemented our infrastructure spend strategy of
&lt;a href="https://infraeng.dev/efficiency/"&gt;growing infrastructure costs more slowly than revenue&lt;/a&gt;.
&lt;strong&gt;We will continue our current infrastructure efficiency strategy&lt;/strong&gt;, and believe there are relatively few high impact efficiency opportunities at this point.&lt;/p&gt;
&lt;p&gt;We commit to growing infrastructure spend at no more than 5% YoY, significantly lower than our projected
revenue increase of 25% YoY.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There are two narrow infrastructure spend opportunities, both related to the integration of prior acquisitions
into our shared infrastructure and away from one-off approaches.
&lt;strong&gt;We will prioritize the post-acquisition integration work next quarter&lt;/strong&gt;, with the goal of fully standardizing all infrastructure
across the company into the stack maintained by our centralized Infrastructure Engineering team.&lt;/p&gt;
&lt;p&gt;We commit to a one-time 3% reduction in infrastructure spend.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We believe there are significant opportunities to reduce R&amp;amp;D maintenance investments,
but we don&amp;rsquo;t have conviction about which particular efforts we should prioritize.
&lt;strong&gt;We will kick off a working group to identify the features with the highest support load.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
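&lt;p&gt;The mechanism behind the N-1 backfill policy can be sketched in a few lines of Python. This is not the linked seniority-mix model: the level ladder, per-level costs, and 10% attrition rate below are hypothetical, &amp;ldquo;N-1&amp;rdquo; is interpreted as backfilling exactly one level down, and promotions and the Principal Engineer cap are ignored.&lt;/p&gt;

```python
# Hypothetical fully loaded annual cost per engineer at each level (illustrative only).
LEVEL_COST = {1: 150_000, 2: 190_000, 3: 240_000, 4: 300_000, 5: 380_000}


def backfill_n_minus_1(headcount, attrition):
    """Return next year's headcount by level when each departure is backfilled one level down.

    headcount: dict of level -> engineers, containing every level in LEVEL_COST.
    attrition: fraction of each level that departs during the year.
    Level 1 departures are backfilled at level 1, since there is no lower level.
    Promotions are deliberately ignored, so this overstates the long-run drift.
    """
    next_headcount = dict(headcount)
    for level, count in headcount.items():
        departures = count * attrition
        next_headcount[level] -= departures
        next_headcount[max(level - 1, 1)] += departures
    return next_headcount


def annual_cost(headcount):
    """Total annual headcount cost for a dict of level -> engineers."""
    return sum(LEVEL_COST[level] * count for level, count in headcount.items())


current = {1: 20, 2: 30, 3: 30, 4: 15, 5: 5}
after_one_year = backfill_n_minus_1(current, attrition=0.10)
before, after = annual_cost(current), annual_cost(after_one_year)
print(f"headcount: {sum(after_one_year.values()):.0f}, cost change: {after / before - 1:.1%}")
```

&lt;p&gt;With these made-up inputs, headcount stays at 100 while cost falls by roughly 1.8% in a single year, which illustrates the direction of the effect rather than validating the 5% commitment.&lt;/p&gt;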
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve diagnosed Fungible Ecommerce Company&amp;rsquo;s current state as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Fungible Ecommerce Company&amp;rsquo;s revenue has grown 20-25% YoY for the past two years,
and our target for next year is 25% YoY revenue growth.
While this is not guaranteed (we grew slower than 25% last year), it&amp;rsquo;s a defensible
goal that we have a good chance of achieving.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our Engineering headcount costs have grown by 15% YoY this year, and 18% YoY the prior year.
Headcount grew 7% and 9% respectively, with the difference between headcount and headcount costs
explained by salary band adjustments (4%), a focus on hiring senior roles (3%), and increased hiring in higher cost geographic regions (1%).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Based on general practice, it seems likely that our new Private Equity ownership will expect us
to reduce R&amp;amp;D headcount costs through a reduction in force.
However, without concrete details, we cannot yet make structured decisions.
Our strategy will depend significantly on the scale of any proposed reductions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Infrastructure engineering spend (including vendors) has grown by 4-5% YoY for the past three years.
We made a significant push on reducing costs three years ago, and have grown slower
than revenue since then.&lt;/p&gt;
&lt;p&gt;There are few remaining opportunities to significantly reduce infrastructure costs,
but we&amp;rsquo;ve made several acquisitions since our prior infrastructure consolidation,
and integrating them represents significant potential savings: roughly a one-time 1.5% reduction for each of the two largest opportunities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A significant portion of our current R&amp;amp;D spend goes into maintaining our existing functionality,
particularly functionality related to earlier geo-expansion efforts that only apply narrowly to some small markets.
We suspect there&amp;rsquo;s an opportunity to reduce maintenance overhead here.&lt;/p&gt;
&lt;p&gt;However, we lack believable metrics on both (1) time spent maintaining the software
and (2) time that would be saved by these cleanup efforts. As a result, it&amp;rsquo;s hard
to pitch projects of this sort as cost saving with much conviction.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
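&lt;p&gt;The arithmetic in this diagnosis is simple enough to check directly. The Python sketch below attributes this year&amp;rsquo;s gap between headcount-cost growth (15%) and headcount growth (7%) to the three stated drivers, then projects infrastructure spend as a share of revenue under the policy commitments (25% revenue growth, at most 5% infrastructure growth, and a one-time 3% cut from the two integrations). The 10% starting share is a hypothetical placeholder, and treating the growth drivers as additive percentage points ignores compounding.&lt;/p&gt;

```python
def unexplained_cost_growth(cost_growth, headcount_growth, drivers):
    """Additively attribute the gap between headcount-cost growth and headcount growth.

    All inputs are percentage points. This first-order approximation ignores
    compounding between drivers.
    """
    gap = cost_growth - headcount_growth
    return gap, gap - sum(drivers.values())


def project_infra_share(start_share, years, revenue_growth=0.25, infra_growth=0.05, one_time_cut=0.03):
    """Project infrastructure spend as a share of revenue, year by year.

    start_share is a hypothetical starting ratio of infrastructure spend to revenue.
    The one-time cut is applied once, in the first projected year.
    """
    revenue, infra, shares = 1.0, start_share, []
    for year in range(1, years + 1):
        revenue *= 1 + revenue_growth
        infra *= 1 + infra_growth
        if year == 1:
            infra *= 1 - one_time_cut
        shares.append(infra / revenue)
    return shares


gap, unexplained = unexplained_cost_growth(15, 7, {"salary bands": 4, "senior hiring": 3, "higher-cost regions": 1})
print(f"gap: {gap} points, unexplained: {unexplained} points")
print([f"{share:.1%}" for share in project_infra_share(start_share=0.10, years=3)])
```

&lt;p&gt;The three drivers fully account for the 8-point gap, and the projected infrastructure share falls to roughly 8.1%, 6.8%, and 5.7% over three years.&lt;/p&gt;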
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Financial markets evaluate companies in comparison to their peers.
This is most obvious in public markets, where there&amp;rsquo;s significant information transparency
about business performance, and sufficient liquidity to allow markets to revalue companies
in something approaching real-time. While private equity firms generally take a controlling interest
in private businesses, or take public businesses private,
they value businesses in the same way.&lt;/p&gt;
&lt;p&gt;In this exploration, we&amp;rsquo;re going to dig into two particular questions.
First, we&amp;rsquo;re going to dig into a dataset on the performance of public technology companies,
and second, we&amp;rsquo;re going to look into the concrete example of Zendesk,
&lt;a href="https://www.reuters.com/markets/deals/zendesk-goes-private-10-bln-deal-2022-11-22/"&gt;who were taken private in 2022&lt;/a&gt;
after being bought by two private equity firms.&lt;/p&gt;
&lt;h3 id="comparable-companies"&gt;Comparable companies&lt;/h3&gt;
&lt;p&gt;Exploring the benchmarking question first, most investors evaluate engineering within the context
of the overall Research &amp;amp; Development (R&amp;amp;D) investment. They generally judge that spend by constructing
a scatterplot of R&amp;amp;D spend versus year-over-year revenue growth for a cohort of similar companies.
Perfectly similar companies don&amp;rsquo;t exist, so this cohort is generally constructed from companies in similar
industries, with similar revenue, and operating in the same regions.&lt;/p&gt;
&lt;p&gt;We have reached out to our investors to see if they can provide the internal datasets they use for
this analysis, but in the meantime we&amp;rsquo;ve developed a directionally useful dataset using the
&lt;a href="https://iri.jrc.ec.europa.eu/scoreboard/2023-eu-industrial-rd-investment-scoreboard"&gt;2023 R&amp;amp;D Investment Scoreboard&lt;/a&gt;,
with some &lt;a href="https://docs.google.com/spreadsheets/d/1IwO3XWDd1inVXLBw4FhkaQh5OuUlYQf0NsX95nPiOtA/edit?gid=943277176#gid=943277176"&gt;rough cutting of the data&lt;/a&gt;
to remove outliers.
(If we repeat this process, we will use the &lt;a href="https://www.sec.gov/search-filings"&gt;SEC&amp;rsquo;s EDGAR database&lt;/a&gt; to pull
a more specifically helpful dataset, but this has been a useful starting point.)&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/rd-opincome-2022.png" alt="R&amp;amp;D investment versus operating profit growth at public companies"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;R&amp;D investment versus operating profit growth at public companies&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a perfect dataset (we prefer revenue growth over growth in operating profit), but
it&amp;rsquo;s the best option within the dataset that we were able to quickly pull down.
Nonetheless, there&amp;rsquo;s a clear strong-performer quadrant in the top-left that we can plot ourselves into
to understand our general performance, which is discussed further in the diagnosis section above.&lt;/p&gt;
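&lt;p&gt;Placing ourselves on that scatterplot amounts to comparing each company against the cohort on both axes. Below is a minimal Python sketch. It assumes R&amp;amp;D spend as a share of revenue on the x-axis, growth on the y-axis, and cohort medians as the quadrant boundaries. The company names and values are hypothetical.&lt;/p&gt;

```python
from statistics import median


def classify(cohort):
    """Label each company's quadrant relative to the cohort medians.

    cohort: dict of name -> (rd_share_of_revenue, yoy_growth). Assumes research
    and development spend is plotted on the x-axis and growth on the y-axis, so
    "top-left" means below-median spend with above-median growth.
    Ties at the median count as left or bottom.
    """
    x_mid = median(rd for rd, _ in cohort.values())
    y_mid = median(growth for _, growth in cohort.values())
    labels = {}
    for name, (rd, growth) in cohort.items():
        vertical = "top" if growth > y_mid else "bottom"
        horizontal = "right" if rd > x_mid else "left"
        labels[name] = f"{vertical}-{horizontal}"
    return labels


# Hypothetical cohort; real inputs would come from filings such as those in EDGAR.
cohort = {
    "peer_a": (0.30, 0.10),
    "peer_b": (0.15, 0.30),
    "peer_c": (0.25, 0.20),
    "peer_d": (0.20, 0.05),
    "us": (0.18, 0.25),
}
for name, label in classify(cohort).items():
    print(name, label)
```

&lt;p&gt;Here &lt;code&gt;us&lt;/code&gt; lands in the top-left quadrant. Medians are one reasonable boundary choice; an investor might instead use the cohort mean or a fitted trend line.&lt;/p&gt;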
&lt;h3 id="zendesk"&gt;Zendesk&lt;/h3&gt;
&lt;p&gt;The second topic of exploration we dug into is understanding the general sequence of steps taken by private
equity ownership after acquiring a company.
For an example with available public documentation, we focused on &lt;a href="https://www.reuters.com/markets/deals/zendesk-goes-private-10-bln-deal-2022-11-22/"&gt;the purchase of Zendesk in 2022&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To start, we pulled Zendesk&amp;rsquo;s &lt;a href="https://www.sec.gov/ix?doc=/Archives/edgar/data/0001463172/000146317222000236/zen-20220630.htm"&gt;final 10-Q before going private&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/zendesk-pl-2022.png" alt="Zendesk&amp;rsquo;s P&amp;amp;L from their 2022 10-Q"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Zendesk's P&amp;L from their 2022 10-Q&lt;/p&gt;
&lt;p&gt;Taking those values, we can reformat them into a chart focusing on the year-over-year
changes in the six-month period ending in 2022 versus the same period in 2021.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/zendesk-yoy-6m-2022.png" alt="Zendesk&amp;rsquo;s P&amp;amp;L from their 2022 10-Q, reformatted to show year-over-year changes"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Zendesk's P&amp;L from their 2022 10-Q, reformatted to show year-over-year changes&lt;/p&gt;
&lt;p&gt;The changes are a bit concerning. Sales and Marketing costs have grown more slowly than revenue, which is positive,
but Research and Development (R&amp;amp;D) expenses have grown about 50% faster than revenue, and General and Administration (G&amp;amp;A) charges
have grown more than twice as quickly as revenue.&lt;/p&gt;
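&lt;p&gt;The reformatting step above reduces to computing each line item&amp;rsquo;s year-over-year growth and comparing it with revenue growth. Below is a minimal Python sketch. The figures are illustrative placeholders that show the shape of the comparison. They are not Zendesk&amp;rsquo;s reported numbers.&lt;/p&gt;

```python
def growth_vs_revenue(prior, current):
    """Return each line item's YoY growth and that growth as a multiple of revenue growth.

    prior and current map line item -> amount for the same period one year apart,
    and must both include a "revenue" key.
    """
    growth = {item: current[item] / prior[item] - 1 for item in prior}
    revenue_growth = growth["revenue"]
    return {item: (g, g / revenue_growth) for item, g in growth.items()}


# Illustrative numbers only; these are not Zendesk's reported figures.
prior = {"revenue": 100.0, "sales_and_marketing": 50.0, "research_and_development": 30.0, "general_and_admin": 15.0}
current = {"revenue": 120.0, "sales_and_marketing": 58.0, "research_and_development": 39.0, "general_and_admin": 21.0}
for item, (g, multiple) in growth_vs_revenue(prior, current).items():
    print(f"{item}: {g:.0%} YoY, {multiple:.1f}x revenue growth")
```

&lt;p&gt;A multiple above 1.0x means a cost line is growing faster than revenue. That is the pattern that flagged R&amp;amp;D and G&amp;amp;A here.&lt;/p&gt;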
&lt;p&gt;From those growth rates, we would assume that the new ownership might push to aggressively reduce spend in those two
areas, which is indeed what history suggests happened, with a
&lt;a href="https://www.zendesk.com/newsroom/articles/company-announcement/"&gt;November, 2022 reduction&lt;/a&gt;,
followed some months later by a
&lt;a href="https://www.zendesk.com/newsroom/articles/zendesk-workforce-reduction/"&gt;May, 2023 reduction&lt;/a&gt;.
It&amp;rsquo;s hard to get precise data here, but it&amp;rsquo;s our impression that these reductions focused on
areas where expenses were growing quickly, with particular focus on G&amp;amp;A functions.&lt;/p&gt;</description></item><item><title>Using systems modeling to refine strategy.</title><link>https://craftingengstrategy.com/systems-modeling/</link><pubDate>Mon, 04 Nov 2024 07:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/systems-modeling/</guid><description>&lt;p&gt;While I was probably late to learn the concept
of &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;,
I might have learned about systems modeling too early in my career,
stumbling on Donella Meadows&amp;rsquo; &lt;em&gt;&lt;a href="https://www.amazon.com/Thinking-Systems-Donella-H-Meadows-ebook/dp/B005VSRFEA/"&gt;Thinking in Systems: A Primer&lt;/a&gt;&lt;/em&gt;
before I began my career in software.
Over the years, I&amp;rsquo;ve discovered a number of ways to misuse systems modeling,
but it remains the most effective, flexible tool I&amp;rsquo;ve found for debugging complex problems.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll work through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a two-minute primer on the basics of systems modeling, along with resources for those looking for a deeper exploration
of the foundational topics&lt;/li&gt;
&lt;li&gt;when systems modeling is a useful technique, and when it&amp;rsquo;s better to
rely on other refinement techniques like Wardley mapping or strategy testing instead&lt;/li&gt;
&lt;li&gt;a discussion on systems modeling tooling, why there&amp;rsquo;s no perfect systems modeling tool out there,
and how I recommend picking the tool that you build proficiency with&lt;/li&gt;
&lt;li&gt;the steps to build a systems model for a problem you&amp;rsquo;re engaging with&lt;/li&gt;
&lt;li&gt;how to document your learnings from a systems model to maximize the
chance that others will pay attention to it rather than ignoring
it due to the unfamiliarity or complexity of the tooling&lt;/li&gt;
&lt;li&gt;what systems modeling can&amp;rsquo;t do, even if you really want to believe it can&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After working through this chapter&amp;rsquo;s overview of systems modeling,
you can see the approaches implemented in a number of system models created
to refine the strategies throughout this book.
The theory of systems modeling is certainly interesting, but hopefully
seeing real models in support of concrete engineering strategies will
be even more useful.&lt;/p&gt;
&lt;h2 id="two-minute-primer"&gt;Two-minute primer&lt;/h2&gt;
&lt;p&gt;If you want an exceptional introduction to systems thinking, there&amp;rsquo;s no better place to go than
Donella Meadows&amp;rsquo; &lt;a href="https://www.amazon.com/dp/1603580557"&gt;Thinking in Systems&lt;/a&gt;.
If you want a worse, but shorter, introduction, I wrote a short &lt;a href="https://lethain.com/systems-thinking/"&gt;Introduction to systems thinking&lt;/a&gt;
available online and in &lt;em&gt;An Elegant Puzzle&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;If you want something &lt;em&gt;even shorter&lt;/em&gt;, then here&amp;rsquo;s the briefest that I can manage.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/QualityMentalModels.png" alt="Requests succeeding and failing between a user, load balancer, and server"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Requests succeeding and failing between a user, load balancer, and server&lt;/p&gt;
&lt;p&gt;Accumulations are called &lt;em&gt;stocks&lt;/em&gt;. For example, each of the boxes (&lt;code&gt;Requests&lt;/code&gt;, &lt;code&gt;Server&lt;/code&gt;, etc.)
in the above diagram is a stock. Changes to stocks are called &lt;em&gt;flows&lt;/em&gt;. Every arrow (&lt;code&gt;OK&lt;/code&gt;, &lt;code&gt;Error in server&lt;/code&gt;, etc.)
between stocks in the diagram is a flow.&lt;/p&gt;
&lt;p&gt;Systems modeling is the practice of using various configurations of stocks and flows
to understand circumstances that might otherwise have surprising behavior or are too slow
to understand from measurement.&lt;/p&gt;
&lt;p&gt;For example, we can use the above model to explore the tradeoffs between a load balancer that does and does not cap throughput
to a load-sensitive service behind it.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/two-min-primer-chart.png" alt="Successful and errored requests in two different scenarios"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Successful and errored requests in two different scenarios&lt;/p&gt;
&lt;p&gt;Without a model, you might get into a philosophical debate about how ridiculous it is that the downstream server
is load-sensitive. With the model, it&amp;rsquo;s immediately obvious that it&amp;rsquo;s worthwhile protecting it, even if it certainly
is concerning that it is so sensitive. This is what models do: they create a cheap way to understand reality when
fully understanding reality is cumbersome.&lt;/p&gt;
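&lt;p&gt;To show how little machinery a stocks-and-flows model needs, here is a minimal Python sketch of the load balancer scenario. It is not the model behind the chart above. The capacity, the arrival pattern, and the rule that each overflow request costs the server one unit of capacity are all assumptions, chosen to make the load-sensitivity visible.&lt;/p&gt;

```python
def simulate(arrivals, capacity, cap_at_load_balancer):
    """Accumulate the OK and Error stocks over a sequence of ticks.

    arrivals: requests flowing into the Requests stock on each tick.
    capacity: requests the server can serve per tick when not overloaded.
    cap_at_load_balancer: if True, the load balancer forwards at most `capacity`
    requests per tick and rejects the rest; if False, it forwards everything.
    Assumption: the server is load-sensitive, so each request beyond capacity
    also removes one unit of what it can successfully serve on that tick.
    Unserved requests are not retried, so nothing carries over between ticks.
    """
    ok = errors = 0
    for demand in arrivals:
        forwarded = min(demand, capacity) if cap_at_load_balancer else demand
        errors += demand - forwarded  # flow: rejected at the load balancer
        overload = max(0, forwarded - capacity)
        served = max(0, min(forwarded, capacity - overload))
        ok += served  # flow: OK
        errors += forwarded - served  # flow: error in server
    return ok, errors


arrivals = [8, 12, 15, 9]
print("capped:  ", simulate(arrivals, capacity=10, cap_at_load_balancer=True))
print("uncapped:", simulate(arrivals, capacity=10, cap_at_load_balancer=False))
```

&lt;p&gt;Both scenarios handle the same 44 requests. Capping at the load balancer yields 37 successes and 7 errors, versus 30 and 14 without the cap, because the cap keeps the server out of its degraded regime.&lt;/p&gt;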
&lt;div class="bg-light-gray br4 ph3 pv1"&gt;
&lt;p&gt;&lt;strong&gt;More systems thinking resources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Thinking-Systems-Donella-H-Meadows-ebook/dp/B005VSRFEA/"&gt;Thinking in Systems: A Primer&lt;/a&gt;&lt;/em&gt; by Donella Meadows&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Business-Dynamics-Systems-Thinking-Modeling/dp/007238915X"&gt;Business Dynamics: Systems Thinking and Modeling for a Complex World&lt;/a&gt;&lt;/em&gt; by John D. Sterman&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Introduction-Systems-Thinking-Richmond-2004-11-15/dp/B01FGPA45Y/"&gt;An Introduction to Systems Thinking&lt;/a&gt;&lt;/em&gt; by Barry Richmond&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id="when-is-systems-modeling-useful"&gt;When is systems modeling useful?&lt;/h2&gt;
&lt;p&gt;Although &lt;a href="https://craftingengstrategy.com/refine/"&gt;refinement&lt;/a&gt; is an important step of developing any strategy,
some refinement techniques work better than others for any given strategy.
Systems modeling is extremely useful in three distinct scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When you&amp;rsquo;re unsure where leverage points might be in a complex system,
modeling allows you to cheaply test which levers might be meaningful.
For example, &lt;a href="https://craftingengstrategy.com/llm-onboarding-model/"&gt;modeling onboarding drivers in a ride-sharing app&lt;/a&gt;
showed that improving onboarding was less important than reengaging departed drivers.&lt;/li&gt;
&lt;li&gt;When you have significant data to compare against,
which allows you to focus in on the places where the real data and your model are in tension.
For example, I was able to &lt;a href="https://lethain.com/productivity-in-the-age-of-hypergrowth/"&gt;model the impact of hiring on Uber&amp;rsquo;s engineering productivity&lt;/a&gt;,
and then compare that with internal data.&lt;/li&gt;
&lt;li&gt;When stakeholder disagreements are based in their unstated intuitions,
models can turn those intuitions into something structured that can be debated more effectively.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In all three categories, modeling makes it possible to iterate your thinking much faster than running a live process or technology experiment
with your team. I sometimes hear concerns that modeling slows things down, but this is just an issue of familiarity.
With practice, modeling can be faster than asking for advice from industry peers.
The models I&amp;rsquo;ve developed for this book took less than an hour. (With one notable exception: &lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;modeling Large Language Models (LLMs) impacts on developer experience&lt;/a&gt;,
which took much longer because I deliberately used an impractical tool to reveal the importance of good tooling).&lt;/p&gt;
&lt;p&gt;Additionally, systems modeling will often expose counter-intuitive dimensions to the problem you&amp;rsquo;re working on.
For example, the model I mentioned above on LLMs&amp;rsquo; impact on developer experience suggests that effective LLMs
might cause us to spend &lt;em&gt;more&lt;/em&gt; time writing and testing code (but less fixing issues discovered post-production).
This is a bit unexpected, as you might imagine they&amp;rsquo;d reduce testing time, but reducing testing time is only valuable
to the extent that issues identified in production remain&amp;ndash;at worst&amp;ndash;constant; if issues found in production increase,
then reduced testing time does not contribute to increased productivity.&lt;/p&gt;
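&lt;p&gt;The reasoning above can be checked with back-of-the-envelope arithmetic. This Python sketch uses invented hours, not the spreadsheet model linked earlier. Total time is writing plus testing plus fixing production issues, so cutting testing only helps if production issues do not rise to offset it.&lt;/p&gt;

```python
def hours_per_feature(writing, testing, production_issues, hours_per_issue):
    """Total engineering hours: writing + testing + fixing issues found in production."""
    return writing + testing + production_issues * hours_per_issue


# Invented scenarios for illustration; none of these numbers come from the model.
baseline = hours_per_feature(writing=10, testing=6, production_issues=2, hours_per_issue=5)
less_testing = hours_per_feature(writing=10, testing=3, production_issues=3, hours_per_issue=5)
more_testing = hours_per_feature(writing=11, testing=7, production_issues=1, hours_per_issue=5)
print(baseline, less_testing, more_testing)
```

&lt;p&gt;Cutting testing while production issues rise takes 28 hours against a baseline of 26. Spending more time writing and testing to avoid production issues takes 23.&lt;/p&gt;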
&lt;p&gt;Modeling without praxis creates unsubstantiated conviction.
However, in combination with praxis, I&amp;rsquo;ve encountered few other techniques that can similarly accelerate learning.&lt;/p&gt;
&lt;p&gt;That doesn&amp;rsquo;t mean that it&amp;rsquo;s always the ideal refinement technique.
If you already have conviction on the general approach, and want to refine the narrow details,
then &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt; is a better option.
If you&amp;rsquo;re trying to understand the evolution of a wider ecosystem, then you may prefer
Wardley mapping.&lt;/p&gt;
&lt;h2 id="tooling"&gt;Tooling&lt;/h2&gt;
&lt;p&gt;For an idea that&amp;rsquo;s quite intuitive, the tools of systems modeling are a real obstacle to wider adoption.
Perhaps a downstream consequence of many early, popular systems modeling tools being quite expensive,
the tooling ecosystem for systems modeling has remained fragmented for some time.
There also appears to be a mix of complex requirements, patent consolidation, and perceived small market size
that&amp;rsquo;s discouraged a modern solution from consolidating the tooling market.&lt;/p&gt;
&lt;p&gt;Earlier, I mentioned that system modeling is extremely quick, but that many folks find it a slow, laborious process.
Part of that is an issue of practice, but I suspect that the quality of modeling tooling is at least as big a part of the challenge.
In the &lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;LLMs impact on developer experience model&lt;/a&gt;, I walk through the steps of building the model in an increasingly messy spreadsheet.
This was slow, challenging, and extremely brittle. Even after finishing the model, I couldn&amp;rsquo;t extend it effectively to test new ideas,
and I inadvertently introduced a number of bugs into the implementation.&lt;/p&gt;
&lt;p&gt;Going in the opposite direction, I explored using a handful of tools, such as &lt;a href="https://sagemodeler.concord.org/"&gt;Sagemodeler&lt;/a&gt;
or &lt;a href="https://insightmaker.com/"&gt;InsightMaker&lt;/a&gt;, which seemed like a potentially simpler toolchain
than the one I typically rely on. There are so many of these introductory toolchains for systems modeling,
but I generally find that they&amp;rsquo;re either constrained in their capabilities, have a fairly high learning curve,
or make it difficult to share your model with others.&lt;/p&gt;
&lt;p&gt;In the end, I wound up back at the toolchain that I usually use,
which happens to be one that I wrote some years ago, &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt;.
This is far from a perfect toolchain, but I think it&amp;rsquo;s a relatively effective mechanism for demonstrating
systems modeling for a few reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;quick to create models and iterate on those models&lt;/li&gt;
&lt;li&gt;easy to share those models with others for inspection and their own exploration&lt;/li&gt;
&lt;li&gt;relatively low surface area for bugs in your models&lt;/li&gt;
&lt;li&gt;free, open-source, self-hosted toolchain that integrates well with the Jupyter ecosystem
for diagramming, modeling and so on&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You should absolutely pick &lt;em&gt;any&lt;/em&gt; tool that feels right to you, and practice with it until you feel confident
quickly modeling scenarios. Afterwards, I wouldn&amp;rsquo;t recommend spending too much time thinking about tools at all:
the most important thing is to build models and learn from them quickly, and almost any tool will be sufficient
to that goal with some deliberate practice.&lt;/p&gt;
&lt;h2 id="how-to-model"&gt;How to model&lt;/h2&gt;
&lt;p&gt;Learning to model systems takes some practice, so we&amp;rsquo;ll approach the details of learning to
model from two directions.
First, by documenting a general structure for approaching modeling,
and second by providing breadcrumbs to the models
developed in this book for deeper exploration of particular modeling ideas.&lt;/p&gt;
&lt;p&gt;The structure for systems modeling that I find effective is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sketch&lt;/strong&gt; the stocks and flows on paper or a diagramming application (e.g.
&lt;a href="https://excalidraw.com/"&gt;Excalidraw&lt;/a&gt;, Figma, Whimsical, etc).
Use whatever you&amp;rsquo;re comfortable with.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt; about how you would expect a potential change to shift the flows through the diagram.
Which flows do you expect to go up, and which down, and how would that movement help you
evaluate whether your strategy is working?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt; the stocks and flows in your spreadsheet tool of choice.
Start by modeling the flows from left to right (i.e. the happy path flows). Once you have that fully working,
then start modeling the right to left flows (i.e. the exception path flows).&lt;/p&gt;
&lt;p&gt;See the &lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;Modeling impact of LLMs on Developer Experience&lt;/a&gt; model
for a deep dive into the particulars of creating a model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Exercise&lt;/strong&gt; the model by experimenting with a number of different starting values
and determining how the rates influence the model&amp;rsquo;s values.
This is essentially performing &lt;a href="https://www.investopedia.com/terms/s/sensitivityanalysis.asp"&gt;sensitivity analysis&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Document&lt;/strong&gt; the work done in the above sections into a standalone writeup.
You can then link to that writeup from strategies that benefit from a given model&amp;rsquo;s insights.
You might link to any &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;section of your strategy&lt;/a&gt;, depending on what
topic the particular model explores.
I recommend decoupling models from specific strategies: &lt;em&gt;generally&lt;/em&gt; the details of any given
model distract from understanding a strategy, so it&amp;rsquo;s best to keep them out of the way unless
a reader is surprised by the conclusion, in which case the link out supports drilling into the details.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
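&lt;p&gt;As a rough illustration of the Model and Exercise steps, here is a minimal, hand-rolled Python sketch; it does not use any particular tool&amp;rsquo;s API. It assumes two hypothetical stocks, &lt;code&gt;active&lt;/code&gt; and &lt;code&gt;departed&lt;/code&gt;, connected by a left-to-right departure flow and a right-to-left return flow. It also assumes each round&amp;rsquo;s flows are computed from the stock values at the start of the round. Re-running it across several return rates while holding everything else fixed is a simple form of sensitivity analysis.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;
def simulate(active, departed, new_per_round, departure_rate, return_rate, rounds):
    """Simulate two stocks and return the Active stock after each round.

    Both flows are computed from the stock values at the start of a round
    and then applied together (an assumption of this sketch).
    """
    history = []
    for _ in range(rounds):
        leaving = active * departure_rate   # left-to-right flow out of Active
        returning = departed * return_rate  # right-to-left flow back into Active
        active += new_per_round - leaving + returning
        departed += leaving - returning
        history.append(active)
    return history


# Sensitivity analysis: vary one rate while holding everything else fixed.
for return_rate in (0.0, 0.1, 0.2, 0.4):
    final = simulate(active=100, departed=0, new_per_round=5,
                     departure_rate=0.1, return_rate=return_rate, rounds=40)[-1]
    print(f"return_rate={return_rate:.1f} final_active={final:.1f}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With no return flow, &lt;code&gt;active&lt;/code&gt; settles toward &lt;code&gt;new_per_round / departure_rate&lt;/code&gt; (50 here). Each increase in the return rate raises the final value, which shows how sensitive the outcome is to that one rate.&lt;/p&gt;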
&lt;p&gt;As always, this is the sequence of steps that I&amp;rsquo;d encourage you to follow,
and the sequence that I generally follow, but you should adapt them to solve
the particular problems at hand.
In my experience, most of these steps, excluding documentation, turn into a single
iterative process over time, and I document everything after several iterations.&lt;/p&gt;
&lt;h2 id="breadcrumbs-for-deeper-exploration"&gt;Breadcrumbs for deeper exploration&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve covered the overarching approach to system modeling,
here are the breadcrumbs to specific models that go deeper on particular elements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/llm-onboarding-model/"&gt;Modeling driver onboarding&lt;/a&gt;
explores how the driver lifecycle at Theoretical Ride Sharing might be improved
with LLMs,
and introduces using the &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; library
for modeling&lt;/li&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;Modeling impact of LLMs on Developer Experience&lt;/a&gt;
looks at how LLMs might impact developer experience at Theoretical Ride Sharing,
and demonstrates (the downsides of) modeling with a spreadsheet&lt;/li&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/private-equity-model/"&gt;Modeling engineering backfill strategy&lt;/a&gt;
studies the financial consequences of various policies for how we backfill departed
engineers in an engineering organization, and introduces further &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; features&lt;/li&gt;
&lt;li&gt;&lt;a href="https://craftingengstrategy.com/uber-strategy-model/"&gt;Modeling service provisioning at Uber&lt;/a&gt;
determines whether it&amp;rsquo;s possible to optimize an existing service provisioning
workflow or if it instead needs to be replaced with a self-service workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Beyond these models, you can find other systems models that I&amp;rsquo;ve written
on my blog&amp;rsquo;s &lt;a href="https://lethain.com/tags/systems-thinking/"&gt;systems-thinking category&lt;/a&gt;, and there
are numerous, great examples in the materials referenced in the systems modeling primer
section above.&lt;/p&gt;
&lt;h2 id="how-to-document-a-model"&gt;How to document a model&lt;/h2&gt;
&lt;p&gt;Much like &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;documenting strategy is challenging&lt;/a&gt;,
communicating with models in a professional setting is challenging.
The core problem is that there are many distinct groups of model readers.
Some will lack familiarity with the tooling you use to develop models.
Others will try to refine, or invalidate, your model by digging into the details.&lt;/p&gt;
&lt;p&gt;I navigate those mismatches by focusing first on the audience who
is least likely to dig into the model. I still want to keep all the details
handy, ideally in the rawest form possible to allow others to manipulate the model
themselves, but it&amp;rsquo;s very much my second goal when documenting a model.&lt;/p&gt;
&lt;p&gt;From experience, I recommend this order (it&amp;rsquo;s also the order used in the models
in this book, so you&amp;rsquo;ll see it in practice a number of times):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;start with a learnings section, with charts showing what the model has taught you&lt;/li&gt;
&lt;li&gt;sketch and explain the stocks and flows&lt;/li&gt;
&lt;li&gt;reason about what the sketch itself teaches you&lt;/li&gt;
&lt;li&gt;explain how you developed the model, with an emphasis on any particularly complex portions&lt;/li&gt;
&lt;li&gt;exercise the model by testing how changing the flows and stocks leads to different outcomes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you remember nothing else, your document should reflect the reality that
most people don&amp;rsquo;t care how you built the model, and just want the insights.
Give them the insights early, and assume no one will trust your model nearly as much as you do.
Models are an input into the strategy, never its sole justification.&lt;/p&gt;
&lt;h2 id="what-systems-modeling-isnt"&gt;What systems modeling isn&amp;rsquo;t&lt;/h2&gt;
&lt;p&gt;Although I find systems modeling a uniquely powerful way to accelerate learning,
I&amp;rsquo;ve also encountered many practitioners who believe that their models &lt;em&gt;are&lt;/em&gt; reality
rather than &lt;em&gt;reflecting&lt;/em&gt; reality.
Over time, I&amp;rsquo;ve developed a short list of cautions to help
would-be modelers avoid overcommitting to their model&amp;rsquo;s insights:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;When your model and reality conflict, reality is always right.&lt;/strong&gt;
At Stripe, we developed &lt;a href="https://lethain.com/modeling-reliability/"&gt;a model to guide our reliability strategy&lt;/a&gt;.
The model was intuitively quite good, but its real-world results were mixed.
Attachment to our early model distracted us (too much time on collecting and classifying data)
and we were slow to engage with the most important problems (maximizing impact of scarce mitigation bandwidth, and growing mitigation bandwidth).
We’d have been more impactful if we engaged directly with what reality was teaching us rather than looking for reasons to disregard reality’s lessons.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Models are immutable, but reality isn’t.&lt;/strong&gt;
I once joined an organization investing tremendous energy into hiring but nonetheless struggling to hire.
Their intuitive model pushed them to spend years investing into top of funnel optimization,
and later steered them to improving the closing process.
What they weren’t able to detect was that &lt;a href="https://lethain.com/getting-to-yes/"&gt;misalignment in interviewer expectations&lt;/a&gt; was the largest hurdle in hiring.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Every model omits information; some omit critical information.&lt;/strong&gt;
The service migration at Uber is a great example: modeling clarified that we &lt;em&gt;had&lt;/em&gt; to adopt a more aggressive
approach to our service migration in order to succeed. Subsequently, we did succeed at the migration,
but the model didn&amp;rsquo;t study the consequences of completing the migration, namely a very challenging development environment.
The model captured everything my team cared about, as the team responsible for running the migration,
but did nothing to evaluate whether the migration was a good idea overall.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In each of those situations, two things are true: the model was extremely valuable, and the model subtly led us astray.
We would have been led astray even without a model, so the key lesson isn&amp;rsquo;t that models are inherently misleading;
instead, the risk is being overly confident in your model. Models are a powerful tool to use in tandem with judgment, not a replacement for it.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Systems modeling isn&amp;rsquo;t perfect.
If you&amp;rsquo;ve already determined your strategy and want to refine the details,
then strategy testing is probably a better choice.
If you&amp;rsquo;re trying to understand the dynamics of an evolving ecosystem,
then Wardley mapping is a more appropriate tool.&lt;/p&gt;
&lt;p&gt;However, if you have the general shape, but lack conviction on how
the pieces fit together, systems modeling is a remarkable tool.
After this chapter, you know how to select appropriate tooling,
and how to use that tooling to model your problem at hand.
Next, we&amp;rsquo;ll apply systems modeling to &lt;a href="https://lethain.com/tags/systems-thinking/"&gt;a handful of detailed problems&lt;/a&gt;
to provide concrete examples of applying this technique.&lt;/p&gt;</description></item><item><title>Eng org seniority-mix model.</title><link>https://craftingengstrategy.com/private-equity-model/</link><pubDate>Sun, 27 Oct 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/private-equity-model/</guid><description>&lt;p&gt;One of the trademarks of private equity ownership is the expectation that either the company maintains their current margin
and grows revenue at 25-30%, or instead grows more slowly and increases their free cash flow year over year.
In many organizations, engineering costs have a major impact on their free cash flow.
There are many costs to reduce, such as cloud hosting, but inevitably part of the discussion is
addressing engineering headcount costs directly.&lt;/p&gt;
&lt;p&gt;One of the largest contributors to engineering headcount costs is your organization&amp;rsquo;s seniority mix:
more senior engineers are paid quite a bit more than earlier career engineers.
This model looks at how various policies impact an organization&amp;rsquo;s seniority mix.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll work to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Summarize this model&amp;rsquo;s learnings about policy impact on seniority mix&lt;/li&gt;
&lt;li&gt;Sketch the model&amp;rsquo;s stocks and flows&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; to iteratively build and exercise the full model&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Time to start modeling.&lt;/p&gt;
&lt;h2 id="learnings"&gt;Learnings&lt;/h2&gt;
&lt;p&gt;An organization without a &amp;ldquo;backfill at N-1&amp;rdquo; hiring policy, i.e. an organization that hires a SWE2 to replace a departed SWE2,
will become increasingly top-heavy over time.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-2.png" alt="Ratio of engineers at senior-most level becomes increasingly heavy over time"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Ratio of engineers at senior-most level becomes increasingly heavy over time&lt;/p&gt;
&lt;p&gt;However, even introducing the &amp;ldquo;backfill at N-1&amp;rdquo; hiring policy is insufficient, as the share of engineers
at senior levels will remain far too high, even if we stop hiring externally into our senior-most levels.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-4.png" alt="Implementing an N-1 backfill policy prevents unbounded increase of rate of senior-most engineers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Implementing an N-1 backfill policy prevents unbounded increase of rate of senior-most engineers&lt;/p&gt;
&lt;p&gt;To fully accomplish our goal of a healthy seniority mix, we must stop hiring at senior-most levels,
implement a &amp;ldquo;backfill at N-1&amp;rdquo; policy, and cap the maximum number of individuals at the senior-most level.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-5.png" alt="N-1 backfill policy and capping number of engineers at senior-most level"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;N-1 backfill policy and capping number of engineers at senior-most level&lt;/p&gt;
&lt;p&gt;Any collection of lower-powered policies simply will not impact the model&amp;rsquo;s outcome.&lt;/p&gt;
&lt;h2 id="sketch"&gt;Sketch&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll start by sketching this system in &lt;a href="https://excalidraw.com/"&gt;Excalidraw&lt;/a&gt;.
It&amp;rsquo;s always fine to use whatever tool you prefer, but simpler sketching tools
generally help you focus on iterating on the stocks and flows, without getting distracted
by tuning settings, much like a designer starting with messy wireframes rather than pixel-perfect designs.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll start with sketching the junior-most level: SWE1.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-sketch-1.png" alt="Hiring, departures and promotions for SWE1 engineers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Hiring, departures and promotions for SWE1 engineers&lt;/p&gt;
&lt;p&gt;We hire external candidates to become SWE1s. Some SWE1s get promoted to SWE2, some depart, and we backfill those
departures with new SWE1s.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-sketch-2.png" alt="Hiring and promotion lifecycle for SWE1 and SWE2"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Hiring and promotion lifecycle for SWE1 and SWE2&lt;/p&gt;
&lt;p&gt;As we start sketching the full stocks and flows for SWE2, we also introduce the idea of backfilling
at the prior level. As we replicate this pattern for two more career levels (SWE3 and SWE4), we get the
complete model.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-sketch-4.png" alt="Hiring and promotion lifecycle for four levels of career ladder"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Hiring and promotion lifecycle for four levels of career ladder&lt;/p&gt;
&lt;p&gt;The final level, SWE4, is simplified relative to the prior levels, as it&amp;rsquo;s no longer possible to get promoted to
a further level.
We could go further than this, but the model will simply get increasingly burdensome to work with,
so let&amp;rsquo;s stop with four levels.&lt;/p&gt;
&lt;h2 id="reason"&gt;Reason&lt;/h2&gt;
&lt;p&gt;When reviewing the sketched system, a few interesting conclusions emerge:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If promotion rates at any level exceed the rate of hiring at that level plus rate of N-1 backfill at that level,
then the proportion of engineers at that level will grow over time&lt;/li&gt;
&lt;li&gt;If you are not hiring much, then this problem simplifies to promotion rate versus departure rate.
A company that does little hiring and has high retention cannot afford to promote frequently.
Promotion into senior roles will become financially constrained, even if the policy is explained
by some other mechanism&lt;/li&gt;
&lt;li&gt;Many companies use the &amp;ldquo;&lt;a href="https://lethain.com/career-levels-and-more/"&gt;career level&lt;/a&gt;&amp;rdquo; policy as the mechanism
to identify a level where promotions &lt;em&gt;generally&lt;/em&gt; stop happening.
The rationale is often not explicitly described, but we can infer that a financial constraint
likely incentivizes this policy&lt;/li&gt;
&lt;/ol&gt;
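&lt;p&gt;The second observation follows from a simple per-round balance for a single level: headcount grows whenever inflows (external hires, promotions in, and backfills) exceed outflows (departures and promotions out). The sketch below applies that balance to a hypothetical senior-most level with no hiring and an N-1 backfill policy. In that case the only inflow is promotions and the only outflow is departures. All counts and rates are illustrative, and &lt;code&gt;net_change&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;
def net_change(promotions_in, departures, hires=0.0, backfills_in=0.0, promotions_out=0.0):
    """Per-round headcount change at one level: inflows minus outflows."""
    inflow = hires + promotions_in + backfills_in
    outflow = departures + promotions_out
    return inflow - outflow


# Hypothetical senior-most level under an N-1 backfill policy with no hiring:
# nobody is hired or backfilled into it, and nobody is promoted out of it.
swe3, swe4 = 100.0, 20.0
promotion_rate = 0.10  # share of SWE3s promoted each round (illustrative)
departure_rate = 0.10  # share of SWE4s departing each round (illustrative)

change = net_change(promotions_in=swe3 * promotion_rate,
                    departures=swe4 * departure_rate)
print(f"SWE4 changes by {change:+.1f} this round")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the senior-most level gains 8 engineers in the round (10 promotions in, 2 departures out). It only stops growing once departures catch up with promotions. With high retention that point is far away, which is why promotions into senior roles end up constrained.&lt;/p&gt;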
&lt;p&gt;With those starter insights, now we can get into modeling the details.&lt;/p&gt;
&lt;h2 id="model--exercise"&gt;Model &amp;amp; Exercise&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re going to build this model using &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt;.
The first version will be relatively simple, albeit with many stocks given the size
of the model, and then we&amp;rsquo;ll layer on additional features as we iteratively test
out different scenarios.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve chosen to combine the Model and Exercise steps to showcase how each version of the model
can inspire new learnings that prompt new questions, which in turn require a new model to answer.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;d rather view the full model and visualizations, each iteration
is &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/BackfillPolicy.ipynb"&gt;available on github&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="backfill-at-level"&gt;Backfill-at-level&lt;/h2&gt;
&lt;p&gt;The first policy we&amp;rsquo;re going to explore is backfilling a departure at the same level.
For example, if a SWE2 departs, then you go ahead and backfill them at SWE2. This intuitively
makes sense, because you needed a SWE2 before to perform the work, so why would you hire someone
less senior?&lt;/p&gt;
&lt;p&gt;There are two new &lt;code&gt;systems&lt;/code&gt; concepts introduced in this model:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;For easier iteration, we&amp;rsquo;re going to use the systems modeling concept
of an &amp;ldquo;information link&amp;rdquo;, which is basically using a stock as a variable to define a flow.
Specifically, we&amp;rsquo;ll create a stock named &lt;code&gt;HiringRate&lt;/code&gt; with a size of two.
Then we&amp;rsquo;ll use that stock&amp;rsquo;s size to define hiring flows at each career level.
In programming terms, you can think of this as defining a reusable variable,
but you can use any stock&amp;rsquo;s size to define flows.&lt;/li&gt;
&lt;li&gt;There is effectively an infinite pool of potential candidates for your company,
so we&amp;rsquo;re going to use an infinite stock, represented by initializing a new stock
surrounded by &lt;code&gt;[&lt;/code&gt; and &lt;code&gt;]&lt;/code&gt;. Specifically, in this case that is &lt;code&gt;[Candidates]&lt;/code&gt;;
if we wanted a fixed-size stock with 100 people in it, we could have initialized it as &lt;code&gt;Candidates(100)&lt;/code&gt;.
Depending on what you&amp;rsquo;re modeling, both options are useful.&lt;/li&gt;
&lt;/ol&gt;
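&lt;p&gt;If these concepts are new, the hand-rolled Python sketch below mimics them under stated assumptions. It is not the &lt;code&gt;lethain/systems&lt;/code&gt; implementation or API, and the helper names are hypothetical. The sketch makes four assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a fixed-rate flow such as &lt;code&gt;@ HiringRate&lt;/code&gt; moves a constant amount each round;&lt;/li&gt;
&lt;li&gt;a &lt;code&gt;Leak(0.1)&lt;/code&gt; moves 10% of its source stock each round;&lt;/li&gt;
&lt;li&gt;the information link is just reading another stock&amp;rsquo;s value;&lt;/li&gt;
&lt;li&gt;an infinite stock can be represented with &lt;code&gt;math.inf&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;
import math


def rate_flow(source, amount):
    """Fixed-rate flow: move a constant amount, limited by what the source holds."""
    return min(source, amount)


def leak_flow(source, fraction):
    """Leak: move a fraction of the source stock; the rest stays put."""
    return source * fraction


stocks = {
    "HiringRate": 2.0,        # only read, never changed: an information link
    "Candidates": math.inf,   # infinite stock, like [Candidates]
    "SWE1": 10.0,
    "DepartedSWE1": 0.0,
}

for _ in range(3):
    hired = rate_flow(stocks["Candidates"], stocks["HiringRate"])
    departed = leak_flow(stocks["SWE1"], 0.1)
    stocks["Candidates"] -= hired  # infinity minus 2 is still infinity
    stocks["SWE1"] += hired - departed
    stocks["DepartedSWE1"] += departed

print(stocks)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After three rounds, &lt;code&gt;SWE1&lt;/code&gt; has grown from 10 to about 12.7 while &lt;code&gt;Candidates&lt;/code&gt; is still infinite. Swapping &lt;code&gt;math.inf&lt;/code&gt; for 100 would model a fixed-size pool instead.&lt;/p&gt;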
&lt;p&gt;With those in mind, our initial model is defined as:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;HiringRate(2)
[Candidates] &amp;gt; SWE1(10) @ HiringRate
SWE1 &amp;gt; DepartedSWE1 @ Leak(0.1)
DepartedSWE1 &amp;gt; SWE1 @ Leak(0.5)
Candidates &amp;gt; SWE2(10) @ HiringRate
SWE1 &amp;gt; SWE2 @ Leak(0.1)
SWE2 &amp;gt; DepartedSWE2 @ Leak(0.1)
DepartedSWE2 &amp;gt; SWE2 @ Leak(0.5)
Candidates &amp;gt; SWE3(10) @ HiringRate
SWE2 &amp;gt; SWE3 @ Leak(0.1)
SWE3 &amp;gt; DepartedSWE3 @ Leak(0.1)
DepartedSWE3 &amp;gt; SWE3 @ Leak(0.5)
Candidates &amp;gt; SWE4(0) @ HiringRate
SWE3 &amp;gt; SWE4 @ Leak(0.1)
SWE4 &amp;gt; DepartedSWE4 @ Leak(0.1)
DepartedSWE4 &amp;gt; SWE4 @ Leak(0.5)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To confirm that we&amp;rsquo;ve done something reasonable, we can visualize this model using Graphviz.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-1.png" alt="GraphViz representation of systems model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;GraphViz representation of systems model&lt;/p&gt;
&lt;p&gt;That looks like the same model we sketched before, without the downlevel backfill flows
that we haven&amp;rsquo;t yet added to the model, so we&amp;rsquo;re in a good spot.&lt;/p&gt;
&lt;p&gt;With that confirmed, let&amp;rsquo;s inspect the five distinct flows happening for the SWE2 stock.
In order they are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;External candidates being hired at the SWE2 level, at the fixed &lt;code&gt;HiringRate&lt;/code&gt;
defined here as 2 hires per round&lt;/li&gt;
&lt;li&gt;SWE1s being promoted to SWE2 at a 10% rate. This is a leak because someone being promoted
to SWE2 doesn&amp;rsquo;t mean the other SWE1s disappear&lt;/li&gt;
&lt;li&gt;SWE2s who are leaving the company at a 10% rate&lt;/li&gt;
&lt;li&gt;Backfill hires replacing departed SWE2s at the same level, with 50% of the departed SWE2 stock backfilled each round&lt;/li&gt;
&lt;li&gt;SWE2s being promoted to SWE3 at a 10% rate&lt;/li&gt;
&lt;/ol&gt;
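&lt;p&gt;To see those flows as arithmetic, here is a hypothetical one-round calculation for the SWE2 stock in plain Python, using the initial values and rates from the model above. It is a sketch, not how &lt;code&gt;lethain/systems&lt;/code&gt; evaluates the model, and it assumes every flow is computed from start-of-round values. It includes the promotion flow from SWE2 into SWE3 (&lt;code&gt;SWE2 &amp;gt; SWE3 @ Leak(0.1)&lt;/code&gt;), which also draws down the stock.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;
# Initial values and rates taken from the backfill-at-level model above.
HIRING_RATE = 2.0
swe1, swe2, departed_swe2 = 10.0, 10.0, 0.0


def swe2_flows(swe1, swe2, departed_swe2):
    """Return each flow into (positive) or out of (negative) SWE2 for one round."""
    return {
        "external hires": HIRING_RATE,
        "promoted from SWE1": swe1 * 0.1,
        "departures": -swe2 * 0.1,
        "same-level backfills": departed_swe2 * 0.5,
        "promoted to SWE3": -swe2 * 0.1,
    }


flows = swe2_flows(swe1, swe2, departed_swe2)
for name, amount in flows.items():
    print(f"{name:22} {amount:+.1f}")
print("SWE2 after one round:", swe2 + sum(flows.values()))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the first round nothing has departed yet, so the backfill flow is zero and SWE2 grows from 10 to 11. Backfills only start contributing once &lt;code&gt;DepartedSWE2&lt;/code&gt; fills up.&lt;/p&gt;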
&lt;p&gt;Running that model, we can see how the populations of the various levels grow over time.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-2.png" alt="Ratio of engineers at senior-most level becomes increasingly heavy over time"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Ratio of engineers at senior-most level becomes increasingly heavy over time&lt;/p&gt;
&lt;p&gt;Alright, so we can tell that this backfill at level policy is pretty inefficient, because our organization
just becomes more and more top-heavy with SWE4s over time. Something needs to change.&lt;/p&gt;
&lt;h2 id="backfill-at-n-1"&gt;Backfill at N-1&lt;/h2&gt;
&lt;p&gt;To reduce the number of SWE4s in our company, let&amp;rsquo;s update the model to backfill all departures at the level
below the departed employee. For example, a departing SWE2 would be backfilled by hiring a SWE1.
This specifically means replacing each same-level backfill line, such as:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;DepartedSWE2 &amp;gt; SWE2 @ Leak(0.5)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With lines that instead hire into the prior level:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;DepartedSWE2 &amp;gt; SWE1 @ Leak(0.5)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The one exception is that SWE1s are still backfilled as SWE1s: as it&amp;rsquo;s the junior-most level,
there&amp;rsquo;s no lower level to backfill into.&lt;/p&gt;
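&lt;p&gt;For readers who want to experiment outside a notebook, here is a rough, self-contained Python approximation of the four-level model with a switch between same-level and N-1 backfill. It is not the &lt;code&gt;lethain/systems&lt;/code&gt; engine. It assumes flows are computed from start-of-round values and applied together, so its numbers won&amp;rsquo;t match the charts exactly. &lt;code&gt;simulate&lt;/code&gt; and its parameters are hypothetical names.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;
LEVELS = ["SWE1", "SWE2", "SWE3", "SWE4"]


def simulate(rounds, hiring_rate, initial, n_minus_one):
    """Approximate the four-level seniority model; return staff per level.

    Each round, every level hires hiring_rate engineers externally, loses 10%
    to departures, and promotes 10% to the next level (except the top level).
    Half of each departed pool is backfilled, either at the same level or,
    when n_minus_one is True, one level down (SWE1 always backfills at SWE1).
    """
    staff = {lvl: float(initial[lvl]) for lvl in LEVELS}
    departed = {lvl: 0.0 for lvl in LEVELS}
    for _ in range(rounds):
        # Compute every flow from start-of-round values, then apply them all.
        leaving = {lvl: staff[lvl] * 0.1 for lvl in LEVELS}
        backfill = {lvl: departed[lvl] * 0.5 for lvl in LEVELS}
        promoted = {lvl: staff[lvl] * 0.1 for lvl in LEVELS[:-1]}
        for i, lvl in enumerate(LEVELS):
            staff[lvl] += hiring_rate - leaving[lvl] - promoted.get(lvl, 0.0)
            departed[lvl] += leaving[lvl] - backfill[lvl]
            if i:  # promotions arrive from the level below
                staff[lvl] += promoted[LEVELS[i - 1]]
            target = LEVELS[max(i - 1, 0)] if n_minus_one else lvl
            staff[target] += backfill[lvl]
    return staff


initial = {"SWE1": 10, "SWE2": 10, "SWE3": 10, "SWE4": 0}
for policy in (False, True):
    staff = simulate(rounds=100, hiring_rate=2, initial=initial, n_minus_one=policy)
    share = staff["SWE4"] / sum(staff.values())
    print("N-1 backfill" if policy else "backfill at level", f"SWE4 share: {share:.0%}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Comparing the two runs shows the same direction described above: routing backfills one level down sharply reduces the share of SWE4s, even though total headcount is the same under both policies.&lt;/p&gt;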
&lt;p&gt;Running this updated model, we get a better looking organization.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-3.png" alt="N-1 backfill policy without overall hiring cap"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;N-1 backfill policy without overall hiring cap&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re still top-heavy, but we&amp;rsquo;ve turned an exponential growth problem into a linear growth problem,
so that&amp;rsquo;s an improvement. However, this is still a very expensive engineering organization to run,
and certainly &lt;em&gt;not&lt;/em&gt; an organization that&amp;rsquo;s reducing costs.&lt;/p&gt;
&lt;h2 id="no-hiring"&gt;No hiring&lt;/h2&gt;
&lt;p&gt;One reason our model shows so many SWE4s is because we&amp;rsquo;re hiring at an even rate across all levels,
which isn&amp;rsquo;t particularly realistic. Also, if we&amp;rsquo;re aiming to reduce our engineering costs over time,
we&amp;rsquo;re unlikely to be growing headcount at all.&lt;/p&gt;
&lt;p&gt;We can model this by setting a &lt;code&gt;HiringRate&lt;/code&gt; of zero, and then setting more representative initial
values for each cohort of engineers (note that I&amp;rsquo;m only showing the changed lines,
check &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/BackfillPolicy.ipynb"&gt;on github&lt;/a&gt;
for the full model):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;HiringRate(0)
[Candidates] &amp;gt; SWE1(100) @ HiringRate
Candidates &amp;gt; SWE2(100) @ HiringRate
Candidates &amp;gt; SWE3(100) @ HiringRate
Candidates &amp;gt; SWE4(10) @ HiringRate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now we&amp;rsquo;re starting out with 100 each of SWE1s, SWE2s, and SWE3s.
We have a smaller cohort of SWE4s, with just ten initially.
Running the model gives us an updated perspective.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-4.png" alt="Implementing an N-1 backfill policy prevents unbounded increase of rate of senior-most engineers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Implementing an N-1 backfill policy prevents unbounded increase of rate of senior-most engineers&lt;/p&gt;
&lt;p&gt;We can see that eliminating hiring &lt;em&gt;improves&lt;/em&gt; the ratio of SWE4s to the other levels, but it&amp;rsquo;s still just too
high. We&amp;rsquo;re ending up with roughly 1.25 SWE1s for each SWE4, when the ratio should be closer to five to one.&lt;/p&gt;
&lt;h2 id="capped-size-of-swe4s"&gt;Capped size of SWE4s&lt;/h2&gt;
&lt;p&gt;Finally, we&amp;rsquo;re going to introduce a stock with a maximum size. No matter what flows &lt;em&gt;want&lt;/em&gt; to accomplish,
they cannot grow the stock beyond that maximum. In this case, we&amp;rsquo;re defining &lt;code&gt;SWE4&lt;/code&gt; as a stock with an initial
size of 10, and a maximum size of 20.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;SWE4(10, 20)
Candidates &amp;gt; SWE4 @ HiringRate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This could also be combined into a one-liner, although it&amp;rsquo;s potentially easy to miss in that case:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Candidates &amp;gt; SWE4(10, 20) @ HiringRate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With that one change, we&amp;rsquo;re getting close to an engineering organization that works how we want.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/eng-costs-model-5.png" alt="N-1 backfill policy and capping number of engineers at senior-most level"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;N-1 backfill policy and capping number of engineers at senior-most level&lt;/p&gt;
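&lt;p&gt;The cap behaves like backpressure: flow that would push &lt;code&gt;SWE4&lt;/code&gt; past its maximum has nowhere to go. The minimal Python sketch below assumes that any excess simply stays in the source stock. That matches the surplus of SWE3s described below, but it is an assumption rather than a description of &lt;code&gt;lethain/systems&lt;/code&gt; internals. &lt;code&gt;capped_transfer&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;
def capped_transfer(source, destination, requested, maximum):
    """Move up to requested from source into destination without exceeding maximum.

    Returns the new (source, destination) pair; anything that does not fit
    stays in the source stock.
    """
    room = max(maximum - destination, 0.0)
    moved = min(requested, room, source)
    return source - moved, destination + moved


# SWE4 starts at 10 with a maximum of 20, as in SWE4(10, 20).
swe3, swe4 = 100.0, 10.0
for _ in range(5):
    # 10% of SWE3s would like to be promoted each round.
    swe3, swe4 = capped_transfer(swe3, swe4, requested=swe3 * 0.1, maximum=20.0)
    print(f"SWE3={swe3:.1f} SWE4={swe4:.1f}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;SWE4 fills to 20 in the first round, and every later promotion stays in SWE3. In the full model, SWE4 departures would free up room, so promotions would trickle through only as fast as senior engineers leave.&lt;/p&gt;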
&lt;p&gt;The ratio of SWE4s to other levels is right, although we can see that the backpressure means that we have
a surplus of SWE3s in this organization. You could imagine other policy work that might improve that as well,
e.g. presumably more SWE3s depart than SWE2s, because the SWE3s see their ability to be promoted is capped by
the departure rate of existing SWE4s. However, I think we&amp;rsquo;ve already learned quite a bit from this model,
so I&amp;rsquo;m going to end modeling here.&lt;/p&gt;</description></item><item><title>Modeling driving onboarding.</title><link>https://craftingengstrategy.com/llm-onboarding-model/</link><pubDate>Sat, 19 Oct 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/llm-onboarding-model/</guid><description>&lt;p&gt;The &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;How should you adopt LLMs?&lt;/a&gt; strategy explores how Theoretical Ride Sharing
might adopt LLMs. It builds on several models; the first is about &lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;LLMs impact on Developer Experience&lt;/a&gt;.
The second model, documented here, looks at whether LLMs might improve a core product and business problem: maximizing
active drivers on their ridesharing platform.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Where the model of ridesharing drivers identifies opportunities for LLMs&lt;/li&gt;
&lt;li&gt;How the model was sketched and developed using the &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; package on GitHub&lt;/li&gt;
&lt;li&gt;Exercising this model to learn from it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;rsquo;s get started.&lt;/p&gt;
&lt;h2 id="learnings"&gt;Learnings&lt;/h2&gt;
&lt;p&gt;An obvious assumption is that making driver onboarding faster would increase the long-term number
of drivers in a market. However, this model shows that even doubling the rate at which we qualify applicant drivers
as eligible has little impact on active drivers over time.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-2.png" alt="Speeding up onboarding doesn&amp;rsquo;t impact active drivers in the long-term"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Speeding up onboarding doesn't impact active drivers in the long-term&lt;/p&gt;
&lt;p&gt;Conversely, it&amp;rsquo;s clear that efforts to reengage departed drivers have a significant impact
on active drivers. We believe that there are potential LLM applications that could encourage
departed drivers to return to active driving; for example, mapping their rationale for departing
against our recent product changes and driver retention promotions could generate high quality,
personalized emails.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-4.png" alt="Improving driver reengagement does increase active drivers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Improving driver reengagement does increase active drivers&lt;/p&gt;
&lt;p&gt;Finally, the model shows that increasing the reactivation rate of either departed or suspended drivers alone is significantly
less impactful than increasing both. If either rate is low, we lose an increasingly large number of drivers over time.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-5.png" alt="Increasing reactivation rate of suspend drivers has highest impact"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Increasing reactivation rate of suspend drivers has highest impact&lt;/p&gt;
&lt;p&gt;The only meaningful opportunities for us to increase active drivers with LLMs are improving those two reactivation rates.&lt;/p&gt;
&lt;h2 id="sketch"&gt;Sketch&lt;/h2&gt;
&lt;p&gt;The first step in modeling a system is sketching it (using &lt;a href="https://excalidraw.com/"&gt;Excalidraw&lt;/a&gt; here).
Here we&amp;rsquo;re developing a model for onboarding and retaining drivers for a ridesharing application in one city.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-model-1.png" alt="Sketch of onboarding drivers onto a ride-sharing application"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Sketch of onboarding drivers onto a ride-sharing application&lt;/p&gt;
&lt;p&gt;The stocks are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;City Population&lt;/code&gt; is the total population of a city&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Applied Drivers&lt;/code&gt; are the number of people who&amp;rsquo;ve applied to be drivers&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Eligible Drivers&lt;/code&gt; are the number of applied drivers who meet eligibility criteria (e.g., provided a current driver&amp;rsquo;s license)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Onboarded Drivers&lt;/code&gt; are eligible drivers who have successfully gone through an onboarding program&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Active Drivers&lt;/code&gt; are onboarded drivers who are actually performing trips on a weekly basis&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Departed Drivers&lt;/code&gt; were active drivers, but voluntarily stopped performing trips (e.g. took a different job)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Suspended Drivers&lt;/code&gt; were active drivers, but involuntarily stopped performing trips (e.g. are no longer allowed to drive on platform)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Looking at the left-to-right flows, there is a flow from each of those stocks to the following stock in the pipeline.
These are all simple one-to-one flows, with the exception of &lt;code&gt;Active Drivers&lt;/code&gt;, which flows into
two distinct stocks: &lt;code&gt;Departed Drivers&lt;/code&gt; and &lt;code&gt;Suspended Drivers&lt;/code&gt;.
These represent voluntary and involuntary departures.&lt;/p&gt;
&lt;p&gt;There are a handful of right-to-left, exception-path flows to consider as well:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;Request missing information&lt;/code&gt; represents a driver who can&amp;rsquo;t be moved from &lt;code&gt;Applied Drivers&lt;/code&gt; to &lt;code&gt;Eligible Drivers&lt;/code&gt;
because their provided information proved insufficient in a review process&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Re-engage&lt;/code&gt; tracks &lt;code&gt;Departed Drivers&lt;/code&gt; who have decided to start driving again, perhaps
because of a bonus program for drivers who start driving again&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Remove suspension&lt;/code&gt; refers to drivers who were involuntarily removed, but who are now
allowed to return to driving&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is a fairly basic model, but let&amp;rsquo;s see what we can learn from it.&lt;/p&gt;
&lt;h2 id="reason"&gt;Reason&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve sketched the system, we can start thinking about which flows are going to have the largest impact,
and where an LLM might increase those flows. Some observations from reasoning about it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If a city&amp;rsquo;s population is infinite, then what really matters in this model
is how many new drivers we can encourage to join the system.
On the other hand, if a city&amp;rsquo;s population is finite,
then onboarding new drivers will be essential in the early stages of coming
online in any particular city, but long-term reengaging departed drivers
is probably at least as important.&lt;/li&gt;
&lt;li&gt;LLM tooling could speed up validating whether applicants are eligible. If we speed that process up enough,
we could greatly reduce the rate of the &lt;code&gt;Request missing information&lt;/code&gt; flow by identifying
missing information in real time rather than requiring a human to review it later.&lt;/li&gt;
&lt;li&gt;We could potentially develop LLM tooling to craft personalized messaging to &lt;code&gt;Departed Drivers&lt;/code&gt;
that explains which of our changes since their departure are most relevant to their reasons
for stopping. This could increase the rate of the &lt;code&gt;Re-engage&lt;/code&gt; flow.&lt;/li&gt;
&lt;li&gt;While we likely wouldn&amp;rsquo;t want an LLM approving the removal of suspensions, we could have it review
requests to be revalidated and identify the most promising ones, focusing human attention on
those with the highest potential for approval.&lt;/li&gt;
&lt;li&gt;We could build LLM-powered tooling that helps a city resident decide whether they should apply
to become a driver by answering questions they might have.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When we exercise the model later, our assumption about whether this city has already exhausted its
potential drivers will quickly steer us toward a specific subset of these options.
If all potential drivers are already tapped, only work to reactivate prior drivers will matter.
If there are more potential drivers, then activating them is likely the better focus.&lt;/p&gt;
&lt;h2 id="model"&gt;Model&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll build this model with the &lt;a href="https://github.com/lethain/systems"&gt;lethain/systems&lt;/a&gt; library, which I wrote.
For a more detailed introduction, I recommend working through &lt;a href="https://github.com/lethain/systems/blob/master/README.md"&gt;the tutorial in the repository&lt;/a&gt;,
but I&amp;rsquo;ll introduce the basics here as well.
While &lt;code&gt;systems&lt;/code&gt; is far from a perfect tool, compared with other modeling techniques
like &lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;spreadsheet-based modeling&lt;/a&gt; and &lt;a href="https://sagemodeler.concord.org/"&gt;SageModeler&lt;/a&gt;,
I think its emphasis on rapid development and reproducible, sharable models is somewhat unique.&lt;/p&gt;
&lt;p&gt;If you want to see the finished model,
you can find the model and visualizations in &lt;a href="https://github.com/lethain/eng-strategy-models/blob/main/DriverOnboarding.ipynb"&gt;the Jupyter notebook in lethain/eng-strategy-models&lt;/a&gt;.
Here we&amp;rsquo;ll work through the steps behind implementing that model.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll start by creating a stock for the city&amp;rsquo;s population,
with an initial size of 10,000.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# City population is 10,000
CityPop(10000)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Next, we want to initialize the applied drivers stock,
and specify a constant rate of 100 people in the city applying
to become drivers each round. This will only happen until
the 10,000 potential drivers in the city are exhausted,
at which point there will be no one left to apply.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 100 folks apply to become drivers per round
# the @ 100 format is called a &amp;#34;rate&amp;#34; flow
CityPop &amp;gt; AppliedDrivers @ 100
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now we want to initialize the eligible drivers stock,
and specify that 25% of the folks in applied drivers
will advance to become eligible each round.&lt;/p&gt;
&lt;p&gt;Earlier, we used &lt;code&gt;@ 100&lt;/code&gt; to specify a fixed rate.
Here we&amp;rsquo;re using &lt;code&gt;@ Leak(0.25)&lt;/code&gt; to specify that
25% of the folks in applied drivers advance into eligible drivers each round.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 25% of applied drivers become eligible each round
AppliedDrivers &amp;gt; EligibleDrivers @ Leak(0.25)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You could write this as &lt;code&gt;@ 0.25&lt;/code&gt;, but you&amp;rsquo;d get different behavior.
That&amp;rsquo;s because &lt;code&gt;@ 0.25&lt;/code&gt; is shorthand for &lt;code&gt;@ Conversion(0.25)&lt;/code&gt;,
which is similar to a leak but destroys the unconverted portion.&lt;/p&gt;
&lt;p&gt;To illustrate the difference, imagine that
we have 100 applied drivers and 100 eligible drivers, and
compare the consequences of applying a leak versus a conversion:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Leak(0.25)&lt;/code&gt; would end with 75 applied drivers and 125 eligible drivers&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Conversion(0.25)&lt;/code&gt; would end with 0 applied drivers and 125 eligible drivers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Depending on what you are modeling, you might need leaks, conversions or both.&lt;/p&gt;
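&lt;p&gt;To make the three flow types concrete, here is a minimal Python sketch of the arithmetic described above. The &lt;code&gt;rate&lt;/code&gt;, &lt;code&gt;leak&lt;/code&gt;, and &lt;code&gt;conversion&lt;/code&gt; functions are hypothetical stand-ins written for this example, not the &lt;code&gt;lethain/systems&lt;/code&gt; implementation, which may round or order flows differently. The sketch assumes a rate never moves more than its source holds, which matches the description of applications stopping once the city&amp;rsquo;s population is exhausted.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# Hypothetical helpers illustrating each flow type for a single round.
# Each takes the current source and destination sizes and returns
# the new (source, destination) sizes.

def rate(source, dest, amount):
    """Fixed-size flow, like `@ 100`: moves `amount`, capped by the source."""
    moved = min(amount, source)
    return source - moved, dest + moved


def leak(source, dest, fraction):
    """Like `@ Leak(0.25)`: moves a fraction; the rest stays in the source."""
    moved = source * fraction
    return source - moved, dest + moved


def conversion(source, dest, fraction):
    """Like `@ Conversion(0.25)`: converts a fraction, discards the remainder."""
    return 0, dest + source * fraction


# The example above: 100 applied drivers and 100 eligible drivers.
print(leak(100, 100, 0.25))        # (75.0, 125.0)
print(conversion(100, 100, 0.25))  # (0, 125.0)
# A rate of 100 against a source holding only 30 moves just 30.
print(rate(30, 0, 100))            # (0, 30)
&lt;/code&gt;&lt;/pre&gt;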
&lt;p&gt;Next, we model our first right-to-left flow:
the &lt;code&gt;Request missing information&lt;/code&gt; flow, where some eligible drivers end up
back in applied drivers because they need to provide more information.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# This is &amp;#34;Request missing information&amp;#34;, with 10%
# of folks moving backwards each round
EligibleDrivers &amp;gt; AppliedDrivers @ Leak(0.1)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Note that the syntax for left-to-right and right-to-left flows
is identical; &lt;code&gt;systems&lt;/code&gt; doesn&amp;rsquo;t distinguish between them.&lt;/p&gt;
&lt;p&gt;Now, 25% of eligible drivers become onboarded drivers each round.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 25% of eligible drivers onboard each round
EligibleDrivers &amp;gt; OnboardedDrivers @ Leak(0.25)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then 50% of onboarded drivers become active drivers,
actually providing rides.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 50% of onboarded drivers become active
OnboardedDrivers &amp;gt; ActiveDrivers @ Leak(0.50)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The active drivers stock is drained by two flows:
drivers who voluntarily depart become departed drivers,
and drivers who are suspended become suspended drivers.
Both flows take 10% of active drivers each round.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 10% of active drivers depart voluntarily and involuntarily
ActiveDrivers &amp;gt; DepartedDrivers @ Leak(0.10)
ActiveDrivers &amp;gt; SuspendedDrivers @ Leak(0.10)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Finally, we also see 5% of departed drivers returning to driving
each round. Similarly, we unsuspend 1% of suspended drivers.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 5% of DepartedDrivers become active
DepartedDrivers &amp;gt; ActiveDrivers @ Leak(0.05)
# 1% of SuspendedDrivers are reactivated
SuspendedDrivers &amp;gt; ActiveDrivers @ Leak(0.01)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;We already sketched this model out earlier, but it&amp;rsquo;s worth noting
that &lt;code&gt;systems&lt;/code&gt; will allow you to export models via &lt;a href="https://graphviz.org/"&gt;Graphviz&lt;/a&gt;.
These diagrams are generally harder to read than custom-drawn ones,
but it&amp;rsquo;s certainly possible to use this toolchain to combine sketching
and modeling into a single step.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-model-2.png" alt="GraphViz representation of systems model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;GraphViz representation of systems model&lt;/p&gt;
&lt;p&gt;Now that we have the model, we can exercise it to learn its secrets.&lt;/p&gt;
&lt;h2 id="exercise"&gt;Exercise&lt;/h2&gt;
&lt;p&gt;Our base model acquires initial drivers quickly, then slows as city population is exhausted.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-1.png" alt="Base onboarding model stabilizing around 800 active drivers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Base onboarding model stabilizing around 800 active drivers&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s imagine that our LLM-powered tool can speed up eligibility checks, doubling the rate at which we move
applied drivers to eligible drivers. Instead of 25% of applied drivers becoming eligible each round,
we&amp;rsquo;ll see 50%.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# old
AppliedDrivers &amp;gt; EligibleDrivers @ Leak(0.25)
# new
AppliedDrivers &amp;gt; EligibleDrivers @ Leak(0.50)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Unfortunately, we can see that even doubling the rate at which we move applied drivers to eligible
has minimal long-term impact.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-2.png" alt="Speeding up onboarding doesn&amp;rsquo;t impact active drivers in the long-term"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Speeding up onboarding doesn't impact active drivers in the long-term&lt;/p&gt;
&lt;p&gt;To finish testing this hypothesis, we can eliminate the &lt;code&gt;Request missing information&lt;/code&gt; flow entirely
by commenting out its line, and see whether this changes things meaningfully.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-3.png" alt="Eliminating request missing information error doesn&amp;rsquo;t impact active drivers long-term"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Eliminating request missing information error doesn't impact active drivers long-term&lt;/p&gt;
&lt;p&gt;Unfortunately, even eliminating the missing information flow has little impact on the number of active drivers.
So it seems that any opportunity for our LLM solutions to increase active drivers will need to focus
on reactivating existing drivers.&lt;/p&gt;
&lt;p&gt;Specifically, let&amp;rsquo;s go from 5% of departed drivers reactivating to 20%.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 20% of DepartedDrivers become active
# DepartedDrivers &amp;gt; ActiveDrivers @ Leak(0.05)
DepartedDrivers &amp;gt; ActiveDrivers @ Leak(0.2)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For the first time, we&amp;rsquo;re seeing a significant shift in impact.
We reach a much higher peak of active drivers, and even after we
exhaust all potential drivers in the city, the number of active drivers settles at a higher
equilibrium.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-4.png" alt="Improving driver reengagement does increase active drivers"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Improving driver reengagement does increase active drivers&lt;/p&gt;
&lt;p&gt;Presumably increasing the rate that we reactivate suspended drivers from 1% to 2.5% would
have a similar, meaningful but smaller impact on active drivers over time.
So let&amp;rsquo;s model that change.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 2.5% of SuspendedDrivers are reactivated
#SuspendedDrivers &amp;gt; ActiveDrivers @ Leak(0.01)
SuspendedDrivers &amp;gt; ActiveDrivers @ Leak(0.025)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Surprisingly, the impact of increasing the reactivation rate of suspended drivers is
much higher than that of reengaging departed drivers.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-ride-results-5.png" alt="Increasing reactivation rate of suspend drivers has highest impact"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Increasing reactivation rate of suspend drivers has highest impact&lt;/p&gt;
&lt;p&gt;This is an interesting and somewhat counter-intuitive result.
Increasing both the suspended and departed reactivation rates is more impactful
than increasing either alone, because over time drivers accumulate
in whichever stock drains more slowly.
This means that a tool that helps us quickly determine which drivers
could be unsuspended might matter more than the small size of that flow suggests.&lt;/p&gt;
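&lt;p&gt;A back-of-the-envelope calculation helps explain why. Once the city&amp;rsquo;s population is exhausted and the onboarding pipeline has drained, every driver sits in &lt;code&gt;ActiveDrivers&lt;/code&gt;, &lt;code&gt;DepartedDrivers&lt;/code&gt;, or &lt;code&gt;SuspendedDrivers&lt;/code&gt;. At equilibrium, the flow from active drivers into each of the other two stocks equals the flow back, so each of those stocks holds (exit rate divided by return rate) drivers per active driver. The Python sketch below is a simplified closed form derived for this example, not output from &lt;code&gt;systems&lt;/code&gt;, so its numbers may differ somewhat from the charts. The function name is hypothetical.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;def steady_state_active(total, depart_rate, suspend_rate, reengage_rate, unsuspend_rate):
    """Long-run active drivers once the city and onboarding pipeline are exhausted.

    At equilibrium:
        depart_rate * active == reengage_rate * departed
        suspend_rate * active == unsuspend_rate * suspended
        active + departed + suspended == total
    """
    departed_per_active = depart_rate / reengage_rate
    suspended_per_active = suspend_rate / unsuspend_rate
    return total / (1 + departed_per_active + suspended_per_active)


# (reengage_rate, unsuspend_rate) pairs, including the changes tried above
scenarios = {
    "baseline": (0.05, 0.01),
    "reengage 20%": (0.20, 0.01),
    "unsuspend 2.5%": (0.05, 0.025),
    "both": (0.20, 0.025),
}
for name, (reengage, unsuspend) in scenarios.items():
    active = steady_state_active(10_000, 0.10, 0.10, reengage, unsuspend)
    print(f"{name}: {active:.0f}")
# baseline: 769
# reengage 20%: 870
# unsuspend 2.5%: 1429
# both: 1818
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this sketch the suspended stock dominates. With a 10% suspension rate and a 1% unsuspension rate, there are about ten suspended drivers for every active one, so even a small increase in the unsuspension rate returns many drivers. None of the onboarding rates appear in the formula, which is consistent with the finding that faster eligibility checks barely change the long-term number of active drivers.&lt;/p&gt;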
&lt;p&gt;At this point, we&amp;rsquo;ve probably found the primary story that this model wants to tell us:
we should focus efforts on reactivating departed and suspended drivers.
Changes elsewhere might reduce operational costs of our business, but they won&amp;rsquo;t
solve the problem of increasing active drivers.&lt;/p&gt;</description></item><item><title>Modeling impact of LLMs on Developer Experience.</title><link>https://craftingengstrategy.com/llm-adoption-model/</link><pubDate>Sun, 06 Oct 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/llm-adoption-model/</guid><description>&lt;p&gt;In &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;How should you adopt Large Language Models?&lt;/a&gt; (LLMs), we considered how
LLMs might impact a company&amp;rsquo;s developer experience. To support that exploration, I&amp;rsquo;ve developed a &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;system model&lt;/a&gt;
of the software development process at the company.&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll work through:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Summary results from this model&lt;/li&gt;
&lt;li&gt;How the model was developed, both sketching and building the model in a spreadsheet.
(As discussed in &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;the overview of systems modeling&lt;/a&gt;,
I generally would recommend against using spreadsheets to develop most models,
but it&amp;rsquo;s educational to attempt doing so once or twice.)&lt;/li&gt;
&lt;li&gt;Exercising the model to see what it has to teach us&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let&amp;rsquo;s get into it.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in&lt;/em&gt;
&lt;em&gt;&lt;a href="https://lethain.com/tags/eng-strategy-book/"&gt;#eng-strategy-book&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="learnings"&gt;Learnings&lt;/h2&gt;
&lt;p&gt;This model&amp;rsquo;s insights can be summarized in three charts.
First, the baseline chart, which shows an eventual equilibrium between errors discovered
in production and tickets that we&amp;rsquo;ve closed by shipping to production.
This equilibrium is visible because tickets continue to get opened, but the total number of
closed tickets stops increasing.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-1.png" alt="Baseline chart where closed tickets quickly stops increasing"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Baseline chart where closed tickets quickly stops increasing&lt;/p&gt;
&lt;p&gt;Second, we show that we can shift that equilibrium by reducing the error rate in production.
Specifically, the first chart models 25% of closed tickets in production experiencing an error,
whereas the second chart models only a 10% error rate.
The equilibrium returns, but at a higher value of shipped tickets.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-2.png" alt="Reduced error rates delay, but don&amp;rsquo;t prevent, reaching equilibrium of closed tickets"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Reduced error rates delay, but don't prevent, reaching equilibrium of closed tickets&lt;/p&gt;
&lt;p&gt;Finally, we can see that even tripling the rate that we start and test tickets
doesn&amp;rsquo;t meaningfully change the total number of completed tickets,
as modeled in this third chart.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-4.png" alt="Starting or testing tickets faster creates noise but not progress"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Starting or testing tickets faster creates noise but not progress&lt;/p&gt;
&lt;p&gt;The constraint on this system is errors discovered in production,
and any technique that changes something else doesn&amp;rsquo;t make much of an impact.
Of course, this is just &lt;em&gt;a model&lt;/em&gt;, not reality. There are many nuances
that models miss, but this helps us focus on what probably matters the most,
and in particular highlights that any approach that increases development velocity
while also increasing production error rate is likely net-negative.&lt;/p&gt;
&lt;p&gt;With that summary out of the way, now we can get into developing the model itself.&lt;/p&gt;
&lt;h2 id="sketch"&gt;Sketch&lt;/h2&gt;
&lt;p&gt;Modeling in a spreadsheet is labor-intensive, so we want to iterate as much as possible
in the sketching phase, before we move to the spreadsheet.
In this case, we&amp;rsquo;re working with &lt;a href="https://excalidraw.com/"&gt;Excalidraw&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/llm-dx-model-1.png" alt="Five stages of development, with errors causing backwards movement"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Five stages of development, with errors causing backwards movement&lt;/p&gt;
&lt;p&gt;I sketched five stocks to represent a developer&amp;rsquo;s workflow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;Open Tickets&lt;/code&gt; holds tickets opened for an engineer to work on&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Start Coding&lt;/code&gt; holds tickets that an engineer is working on&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Tested Code&lt;/code&gt; holds tickets that have been tested&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Deployed Code&lt;/code&gt; holds tickets that have been deployed&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Closed Ticket&lt;/code&gt; holds tickets that are closed after reaching production&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are four flows representing tickets progressing through this development process from left to right.
Additionally, there are three exception flows that move from right to left:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;Testing found error&lt;/code&gt; represents a ticket where testing finds an error, moving the ticket backwards to &lt;code&gt;Start Coding&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Deployment exposed error&lt;/code&gt; represents a ticket encountering an error during deployment, where it&amp;rsquo;s moved backwards to &lt;code&gt;Start Coding&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Error found in production&lt;/code&gt; represents a ticket encountering a production error, which causes it to move all the way back to the beginning as a new ticket&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One of your first concerns seeing this model might be that it&amp;rsquo;s embarrassingly simple.
To be honest, that was my reaction when I first looked at it, too.
However, it&amp;rsquo;s important to recognize that feeling and then dig into whether it matters.&lt;/p&gt;
&lt;p&gt;This model is quite simple, but in the next section we&amp;rsquo;ll find that it reveals several
counter-intuitive insights into the problem that will help us avoid erroneously viewing the
tooling as a failure if time spent testing increases.
The value of a model is in refining our thinking, and simple models are usually better than complex ones
at refining a group&amp;rsquo;s thinking, simply because it&amp;rsquo;s hard to align a group around
a complex model.&lt;/p&gt;
&lt;h2 id="reason"&gt;Reason&lt;/h2&gt;
&lt;p&gt;As we start to look at this sketch, the first question to ask is how might LLM-based tooling show an improvement?
The most obvious options are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Increasing the rate that tasks flow from &lt;code&gt;Start Coding&lt;/code&gt; to &lt;code&gt;Tested Code&lt;/code&gt;.
Presumably these tools might reduce the amount of time spent on implementation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Increasing the rate at which &lt;code&gt;Tested Code&lt;/code&gt; follows the &lt;code&gt;Testing found error&lt;/code&gt; flow back to &lt;code&gt;Start Coding&lt;/code&gt;,
because more comprehensive tests are more likely to detect errors.
This is probably the first interesting learning from this model: if the adopted tool works well, it&amp;rsquo;s likely that we&amp;rsquo;ll spend
&lt;em&gt;more&lt;/em&gt; time in the testing loop, with a long-term payoff of spending less time solving problems in production, where they&amp;rsquo;re more expensive.
This means that slower testing might be a successful outcome rather than the failure it might first appear to be.&lt;/p&gt;
&lt;p&gt;A skeptic of these tools might argue the opposite, that LLM-based tooling will cause more issues to be identified &amp;ldquo;late&amp;rdquo;
after deployment rather than early in the testing phase. In either case, we now have a clear goal to measure to evaluate
the effectiveness of the tool: reducing the &lt;code&gt;Error found in production&lt;/code&gt; flow. We also know &lt;em&gt;not&lt;/em&gt; to focus on the
&lt;code&gt;Testing found error&lt;/code&gt; flow, which should probably increase.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, we can also zoom out and measure the overall time from &lt;code&gt;Start Coding&lt;/code&gt; to &lt;code&gt;Closed Ticket&lt;/code&gt; for tasks
that don&amp;rsquo;t experience the &lt;code&gt;Error found in production&lt;/code&gt; flow for at least the first 90 days after being completed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
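&lt;p&gt;The third measure is straightforward to compute from ticket history. Here is a minimal Python sketch, assuming hypothetical ticket records that carry the date a ticket entered &lt;code&gt;Start Coding&lt;/code&gt;, the date it reached &lt;code&gt;Closed Ticket&lt;/code&gt;, and the date of any first &lt;code&gt;Error found in production&lt;/code&gt;. The &lt;code&gt;Ticket&lt;/code&gt; fields and the &lt;code&gt;clean_cycle_times&lt;/code&gt; helper are illustrative and not from any particular ticketing system.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median
from typing import Optional


@dataclass
class Ticket:
    started: date                            # entered Start Coding
    closed: date                             # reached Closed Ticket
    production_error: Optional[date] = None  # first Error found in production, if any


def clean_cycle_times(tickets, today, window=timedelta(days=90)):
    """Days from Start Coding to Closed Ticket, for tickets error-free for `window` after closing."""
    days = []
    for t in tickets:
        if t.closed + window > today:
            continue  # closed too recently to know whether it stays error-free
        if t.production_error is not None and window >= t.production_error - t.closed:
            continue  # an error was found in production within the window
        days.append((t.closed - t.started).days)
    return days


tickets = [
    Ticket(date(2024, 5, 1), date(2024, 5, 4)),
    Ticket(date(2024, 5, 2), date(2024, 5, 9), production_error=date(2024, 5, 20)),
    Ticket(date(2024, 6, 1), date(2024, 6, 3), production_error=date(2024, 9, 15)),
    Ticket(date(2024, 9, 1), date(2024, 9, 5)),
]
times = clean_cycle_times(tickets, today=date(2024, 10, 1))
print(times, median(times))  # [3, 2] 2.5
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second ticket is excluded because its production error arrived within 90 days of closing, and the fourth because 90 days haven&amp;rsquo;t yet passed, so only clean, fully observed tickets count toward the measure.&lt;/p&gt;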
&lt;p&gt;These observations capture what I find remarkable about systems modeling: even a very simple model
can expose counter-intuitive insights, in particular the sort of insights that build conviction to push
back where intuition might lead you astray.&lt;/p&gt;
&lt;h2 id="model"&gt;Model&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ll build this model directly in a spreadsheet, specifically Google Sheets.
The completed spreadsheet model &lt;a href="https://docs.google.com/spreadsheets/d/1YAego3JiNCUE15GeL_3GQfYmrE1jG9dVF6yzu-mAxLw/edit?gid=1325089804#gid=1325089804"&gt;is available here&lt;/a&gt;.
As discussed in &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;Systems modeling to refine strategy&lt;/a&gt;, spreadsheet modeling
is brittle, slow and hard to iterate on. I generally recommend that folks attempt to model something in a spreadsheet
to get an intuitive sense of the math happening in their models, but I would almost always choose any tool other
than a spreadsheet for a complex model.&lt;/p&gt;
&lt;p&gt;This example is fairly tedious to follow, and you&amp;rsquo;re entirely excused if you decide to pull open the sheet itself,
look around a bit, and then skip the remainder of this section.
If you are hanging around, it&amp;rsquo;s time to get started.&lt;/p&gt;
&lt;p&gt;The spreadsheet we&amp;rsquo;re creating has three important worksheets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Model&lt;/em&gt; represents the model itself&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Charts&lt;/em&gt; holds charts of the model&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Config&lt;/em&gt; holds configuration values separately from the model to ease exercising the model after we&amp;rsquo;ve built it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the model worksheet, we start by initializing each column to its starting value.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-0.png" alt="Initial values of a systems model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Initial values of a systems model&lt;/p&gt;
&lt;p&gt;While we&amp;rsquo;ll use formulae for subsequent rows, the first row should contain literal values. I often start with
a positive value in the first column and zeros in the other columns, but that isn&amp;rsquo;t required.
You can start with whatever values are most useful for studying the model that you&amp;rsquo;re building.&lt;/p&gt;
&lt;p&gt;With the initial values set, we&amp;rsquo;re now going to implement the model in two passes.
First, we&amp;rsquo;ll model the left-to-right flows, which represent the standard development process.
Second, we&amp;rsquo;ll model the right-to-left flows, which represent exceptions in the process.&lt;/p&gt;
&lt;h3 id="modeling-left-to-right"&gt;Modeling left-to-right&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;ll start by modeling the interaction between the first two nodes: &lt;code&gt;Open Tickets&lt;/code&gt; and &lt;code&gt;Started Coding&lt;/code&gt;.
We want open tickets to increase over time at a fixed rate, so let&amp;rsquo;s add a value in the config worksheet
for &lt;code&gt;TicketOpenRate&lt;/code&gt;, starting with &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Moving to the second stock, we want to start work on open tickets as long as fewer than &lt;code&gt;MaxConcurrentCodingNum&lt;/code&gt; tickets are in progress.
If we already have &lt;code&gt;MaxConcurrentCodingNum&lt;/code&gt; or more tickets in progress, then we don&amp;rsquo;t start working on any new tickets.
To do this, we need to create an intermediate value (represented using an italics column name) that determines how
many tickets to start this round: zero if the number of started tickets is at the maximum (another value in the config sheet),
otherwise the configured rate for starting tickets.&lt;/p&gt;
&lt;p&gt;That looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Config!$B$3 is max started tickets
// Config!$B$2 is rate to increment started tickets
// $ before a row or column, e.g. $B$3 means that the row or column
// always stays the same -- not incrementing -- even when filled
// to other cells
= IF(C2 &amp;gt;= Config!$B$3, 0, Config!$B$2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This also means that our first column, &lt;code&gt;Open Tickets&lt;/code&gt;, is decremented by the number of tickets that
we&amp;rsquo;ve started coding:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// This is the definition of `Open Tickets`
=A2 + Config!$B$1 - B2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This leaves us with these values.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-1.png" alt="Open tickets, StartCodingMore?, and Started Coding columns in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Open tickets, StartCodingMore?, and Started Coding columns in spreadsheet model&lt;/p&gt;
&lt;p&gt;Now we want to determine the number of tickets being tested at each step in the model.
To do this, we create a calculation column, &lt;code&gt;NumToTest?&lt;/code&gt;, which is defined as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Config!$B$4 is the rate we can start testing tickets
// Note that we can only start testing tickets if there are tickets
// in `Started Coding` that we're able to start testing
=MIN(Config!$B$4, C3)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We then add that value to the previous size of the &lt;code&gt;Tested Code&lt;/code&gt; stock, subtracting the tickets that moved on to deployment.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// E2 is prior size of the Tested Code stock
// D3 is the value of `NumToTest?`
// F2 is the number of tested tickets to deploy
=E2 + D3 - F2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-2.png" alt="Started Coding, NumToTest?, and Tested Code columns in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Started Coding, NumToTest?, and Tested Code columns in spreadsheet model&lt;/p&gt;
&lt;p&gt;Moving on to deploying code, let&amp;rsquo;s keep things simple and start out by assuming that every tested change
is going to get deployed. That means the calculation for &lt;code&gt;NumToDeploy?&lt;/code&gt; is quite simple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// E3 is the number of tested changes
=E3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then the value for the &lt;code&gt;Deployed Code&lt;/code&gt; stock is simple as well:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// G2 is the prior size of Deployed Code
// F3 is NumToDeploy?
// H2 is the number of deployed changes closed in the prior round
=G2 + F3 - H2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-3.png" alt="NumToDeploy? and Deployed Code columns in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;NumToDeploy? and Deployed Code columns in spreadsheet model&lt;/p&gt;
&lt;p&gt;Now we&amp;rsquo;re on to the final stock.
We add the &lt;code&gt;NumToClose?&lt;/code&gt; calculation, which assumes that all deployed changes are now closed.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// G3 is the number of deployed changes
=G3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This makes the calculation for the &lt;code&gt;Closed Tickets&lt;/code&gt; stock:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// I2 is the prior value of Closed Tickets
// H3 is the NumToClose?
=I2 + H3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With that, we&amp;rsquo;ve now modeled all of the left-to-right flows.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-4.png" alt="Entirety of left-to-right flows in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Entirety of left-to-right flows in spreadsheet model&lt;/p&gt;
&lt;p&gt;The left-to-right flows are simple, with a few constrained flows and some very scalable ones, but overall we see things
progressing through the pipeline evenly.
All that is about to change!&lt;/p&gt;
&lt;h3 id="modeling-right-to-left"&gt;Modeling right-to-left&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;ve now finished modeling the happy path from left to right.
Next we need to model all the exception paths where things flow right to left.
For example, an issue found in production would cause a flow from &lt;code&gt;Closed Tickets&lt;/code&gt;
back to &lt;code&gt;Open Tickets&lt;/code&gt;.
This tends to be where models get interesting.&lt;/p&gt;
&lt;p&gt;There are three right-to-left flows that we need to model:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;Closed Tickets&lt;/code&gt; to &lt;code&gt;Open Tickets&lt;/code&gt; represents a bug discovered in production.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Deployed Code&lt;/code&gt; to &lt;code&gt;Started Coding&lt;/code&gt; represents a bug discovered during deployment.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Tested Code&lt;/code&gt; to &lt;code&gt;Started Coding&lt;/code&gt; represents a bug discovered in testing.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To start, we&amp;rsquo;re going to add configurations defining the rates of those flows.
These are going to be percentage flows, with a certain percentage of the stock each flow draws from
triggering the error condition rather than proceeding. For example, perhaps 25% of the
&lt;code&gt;Closed Tickets&lt;/code&gt; are discovered to have a bug each round.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-5.png" alt="Introducing three additional values in model configuration"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Introducing three additional values in model configuration&lt;/p&gt;
&lt;p&gt;These are fine starter values, and we&amp;rsquo;ll experiment with how adjusting them changes the model in the &lt;em&gt;Exercise&lt;/em&gt; section below.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll start with errors discovered in production, adding an &lt;code&gt;ErrorsFoundInProd?&lt;/code&gt; column
to model the flow from &lt;code&gt;Closed Tickets&lt;/code&gt; to &lt;code&gt;Open Tickets&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// I3 is the number of Closed Tickets
// Config!$B$5 is the rate of errors
=FLOOR(I3 * Config!$B$5)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note the usage of &lt;code&gt;FLOOR&lt;/code&gt; to avoid moving partial tickets.
Feel free to skip that entirely if you&amp;rsquo;re comfortable with the concept of fractional tickets, fractional deploys, and so on.
This is an aesthetic consideration, and generally only impacts your model if you choose overly small starting values.&lt;/p&gt;
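&lt;p&gt;A quick way to see when rounding matters is to compare the two options directly. This minimal Python sketch uses &lt;code&gt;math.floor&lt;/code&gt; as a stand-in for the spreadsheet&amp;rsquo;s &lt;code&gt;FLOOR&lt;/code&gt;, the 25% production error rate from the example above, and a few hypothetical stock sizes; &lt;code&gt;errors_found&lt;/code&gt; is an illustrative name.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math


def errors_found(stock, error_rate, whole_tickets=True):
    # With whole_tickets=True this mirrors =FLOOR(I3 * Config!$B$5);
    # with whole_tickets=False, fractional tickets are allowed to flow.
    moved = stock * error_rate
    return math.floor(moved) if whole_tickets else moved


for closed_tickets in [1, 3, 4, 10, 100]:
    print(closed_tickets,
          errors_found(closed_tickets, 0.25, whole_tickets=False),
          errors_found(closed_tickets, 0.25))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With one or three closed tickets, flooring means no errors ever flow back, while for larger stocks the rounding changes each round&amp;rsquo;s flow by less than one ticket.&lt;/p&gt;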
&lt;p&gt;This means that our calculation for &lt;code&gt;Closed Tickets&lt;/code&gt; needs to be
updated as well to reduce by the prior row&amp;rsquo;s result for &lt;code&gt;ErrorsFoundInProd?&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// I2 is the prior value of Closed Tickets
// H3 is the current value of NumToClose?
// J2 is the prior value of ErrorsFoundInProd?
=I2 + H3 - J2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&amp;rsquo;re not quite done, because we &lt;em&gt;also&lt;/em&gt; need to add the prior row&amp;rsquo;s value of &lt;code&gt;ErrorsFoundInProd?&lt;/code&gt;
into &lt;code&gt;Open Tickets&lt;/code&gt;, which represents the errors flowing from closed back to open tickets.
Based on this change, the calculation for &lt;code&gt;Open Tickets&lt;/code&gt; becomes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// A2 is the prior value of Open Tickets
// Config!$B$1 is the base rate of ticket opening
// B2 is prior row's StartCodingMore?
// J2 is prior row's ErrorsFoundInProd?
=A2 + Config!$B$1 - B2 + J2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we have the full errors in production flow represented in our model.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-6.png" alt="Modeling ErrorsFoundInProd? in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Modeling ErrorsFoundInProd? in spreadsheet model&lt;/p&gt;
&lt;p&gt;Next, it&amp;rsquo;s time to add the &lt;code&gt;Deployed Code&lt;/code&gt; to &lt;code&gt;Started Coding&lt;/code&gt; flow.
Start by adding the &lt;code&gt;ErrorsFoundInDeploy?&lt;/code&gt; calculation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// G3 is deployed code
// Config!$B$6 is deployed error rate
=FLOOR(G3 * Config!$B$6)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we need to update the calculation for &lt;code&gt;Deployed Code&lt;/code&gt; to decrease by the
calculated value in &lt;code&gt;ErrorsFoundInDeploy?&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// G2 is the prior value of Deployed Code
// F3 is NumToDeploy?
// H2 is prior row's NumToClose?
// I2 is prior row's ErrorsFoundInDeploy?
=G2 + F3 - H2 - I2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, we need to increase the size of &lt;code&gt;Started Coding&lt;/code&gt; by the same value,
representing the flow of errors discovered in deployment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// C2 is the prior value of Started Coding
// B3 is StartCodingMore?
// D2 is prior value of NumToTest?
// I2 is prior value of ErrorsFoundInDeploy?
=C2 + B3 - D2 + I2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have the working flow representing errors found during deployment.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-7.png" alt="DeployedCode, NumToClose?, and ErrorsFoundInDeploy? columns in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;DeployedCode, NumToClose?, and ErrorsFoundInDeploy? columns in spreadsheet model&lt;/p&gt;
&lt;p&gt;Finally, we can add the &lt;code&gt;Tested Code&lt;/code&gt; to &lt;code&gt;Started Coding&lt;/code&gt; flow.
This is pretty much the same as the prior flow we added,
starting with an &lt;code&gt;ErrorsFoundInTest?&lt;/code&gt; calculation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// E3 is tested code
// Config!$B$7 is the testing error rate
=FLOOR(E3 * Config!$B$7)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we update &lt;code&gt;Tested Code&lt;/code&gt; to reduce by this value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// E2 is prior value of Tested Code
// D3 is NumToTest?
// G2 is prior value of NumToDeploy?
// F2 is prior value of ErrorsFoundInTest?
=E2 + D3 - G2 - F2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And update &lt;code&gt;Started Coding&lt;/code&gt; to increase by this value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// C2 is prior value of Started Coding
// B3 is StartCodingMore?
// D2 is prior value of NumToTest?
// J2 is prior value of ErrorsFoundInDeploy?
// F2 is prior value of ErrorsFoundInTest?
=C2 + B3 - D2 + J2 + F2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now this last flow is instrumented.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-model-screeshot-8.png" alt="Modeling ErrorsFoundInTest? in spreadsheet model"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Modeling ErrorsFoundInTest? in spreadsheet model&lt;/p&gt;
&lt;p&gt;With that, we now have a complete model that we can start exercising!
This exercise demonstrated both that it&amp;rsquo;s &lt;em&gt;quite possible&lt;/em&gt; to represent
a meaningful model in a spreadsheet and that doing so is challenging.&lt;/p&gt;
&lt;p&gt;While developing this model, a number of errors became evident. Some of them
I was able to fix relatively easily, but I left even more unfixed because fixing
them would make the model &lt;em&gt;even harder&lt;/em&gt; to reason about. This is a good example of why
I encourage developing one or two models in a spreadsheet, but I ultimately don&amp;rsquo;t
believe it&amp;rsquo;s the right mechanism to work in for most people:
even very smart people make errors in their spreadsheets, and catching those errors
is exceptionally challenging.&lt;/p&gt;
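&lt;p&gt;One inexpensive way to catch some of these errors is to check an invariant: tickets should only enter the system through the opening rate, so the total across all stocks should grow by exactly that much each round. The Python sketch below reimplements the model with simplifying assumptions (every flow is computed from the prior round&amp;rsquo;s stocks, only changes that pass testing or deployment move forward, and all parameter values are hypothetical), then asserts that invariant. It is not a cell-for-cell copy of the spreadsheet.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

# Hypothetical configuration, loosely following the Config sheet.
CONFIG = {
    "open_rate": 1,
    "start_coding_rate": 1,
    "test_rate": 1,
    "test_error_rate": 0.1,
    "deploy_error_rate": 0.1,
    "prod_error_rate": 0.25,
}


def step(s, cfg):
    # Compute every flow from the prior round's stocks...
    start = min(cfg["start_coding_rate"], s["open"])
    test = min(cfg["test_rate"], s["started"])
    test_err = math.floor(s["tested"] * cfg["test_error_rate"])
    deploy = s["tested"] - test_err        # only changes that pass testing
    deploy_err = math.floor(s["deployed"] * cfg["deploy_error_rate"])
    close = s["deployed"] - deploy_err     # only changes that deploy cleanly
    prod_err = math.floor(s["closed"] * cfg["prod_error_rate"])
    # ...then apply them, so each flow leaves one stock and enters another.
    return {
        "open": s["open"] + cfg["open_rate"] - start + prod_err,
        "started": s["started"] + start - test + test_err + deploy_err,
        "tested": s["tested"] + test - deploy - test_err,
        "deployed": s["deployed"] + deploy - close - deploy_err,
        "closed": s["closed"] + close - prod_err,
    }


def run(cfg, rounds, initial_open=5):
    stocks = {"open": initial_open, "started": 0, "tested": 0,
              "deployed": 0, "closed": 0}
    history = [stocks]
    for _ in range(rounds):
        stocks = step(stocks, cfg)
        history.append(stocks)
    return history


def check_conservation(history, open_rate):
    # Total tickets should grow by exactly open_rate per round; any
    # double count or dropped flow breaks this.
    initial_total = sum(history[0].values())
    for i, row in enumerate(history):
        expected = initial_total + open_rate * i
        assert sum(row.values()) == expected, f"round {i}: {row}"


history = run(CONFIG, 20)
check_conservation(history, CONFIG["open_rate"])
print([row["closed"] for row in history])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you delete &lt;code&gt;+ prod_err&lt;/code&gt; from the &lt;code&gt;open&lt;/code&gt; update, for example, &lt;code&gt;check_conservation&lt;/code&gt; fails as soon as any production error is found.&lt;/p&gt;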
&lt;h2 id="exercise"&gt;Exercise&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;re done building this model, we can finally start the fun part: exercising it.
We&amp;rsquo;ll start by creating a simple bar chart showing the size of each stock at each step.
We are going to expressly &lt;em&gt;not&lt;/em&gt; show the intermediate calculation columns such as &lt;code&gt;NumToTest?&lt;/code&gt;,
because they are implementation details rather than anything particularly interesting.&lt;/p&gt;
&lt;p&gt;Before we start tweaking the values, let&amp;rsquo;s look at the baseline chart.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-1.png" alt="Baseline chart where closed tickets quickly stops increasing"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Baseline chart where closed tickets quickly stops increasing&lt;/p&gt;
&lt;p&gt;The most interesting thing to notice is that our current model doesn&amp;rsquo;t actually increase the number of closed
tickets over time. Instead, we fall further and further behind, which isn&amp;rsquo;t too exciting.&lt;/p&gt;
&lt;p&gt;So let&amp;rsquo;s start by modeling the first way that LLMs might help: reducing the error rate in production.
Let&amp;rsquo;s shift &lt;code&gt;ErrorsInProd&lt;/code&gt; from &lt;code&gt;0.25&lt;/code&gt; down to &lt;code&gt;0.1&lt;/code&gt;, and see how that impacts the chart.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-2.png" alt="Reduced error rates delays, but doesn&amp;rsquo;t prevent, reaching equilibrium of closed tickets"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Reduced error rates delays, but doesn't prevent, reaching equilibrium of closed tickets&lt;/p&gt;
&lt;p&gt;We can see that this allows us to make more progress on closing tickets, although
at some point equilibrium is established between closed tickets and the error rate in production,
preventing further progress. This does validate that reducing error rate in production matters.
It also suggests that as long as error rate is a function of everything we&amp;rsquo;ve previously shipped,
we are eventually in trouble.&lt;/p&gt;
&lt;p&gt;Next let&amp;rsquo;s experiment with the idea that LLMs allow us to test more quickly,
tripling &lt;code&gt;TicketTestRate&lt;/code&gt; from &lt;code&gt;1&lt;/code&gt; to &lt;code&gt;3&lt;/code&gt;. It turns out that increasing the testing rate doesn&amp;rsquo;t change anything at all,
because the current constraint is in starting tickets.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-3.png" alt="Changing testing rate doesn&amp;rsquo;t model&amp;rsquo;s behavior"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Changing testing rate doesn't model's behavior&lt;/p&gt;
&lt;p&gt;So, let&amp;rsquo;s test that. Maybe LLMs make us faster at starting tickets because the &lt;em&gt;overall&lt;/em&gt; speed of development goes up.
Let&amp;rsquo;s model that by increasing &lt;code&gt;StartCodingRate&lt;/code&gt; from &lt;code&gt;1&lt;/code&gt; to &lt;code&gt;3&lt;/code&gt; as well.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://craftingengstrategy.com/static/blog/strategy/dx-chart-4.png" alt="Starting or testing tickets faster creates noise but not progress"&gt;&lt;/p&gt;
&lt;p class="img-desc i tc f6"&gt;Starting or testing tickets faster creates noise but not progress&lt;/p&gt;
&lt;p&gt;This is a fascinating result, because tripling development and testing velocity has changed how much work we start,
but ultimately the real constraint in our system is the error discovery rate in production.&lt;/p&gt;
&lt;p&gt;By exercising this model, we find an interesting result. To the extent that our error rate is a function of the volume
of things we&amp;rsquo;ve shipped in production, shipping faster doesn&amp;rsquo;t increase our velocity at all.
The only meaningful way to increase productivity in this model is to reduce the error rate in production.&lt;/p&gt;
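&lt;p&gt;To rerun these experiments outside a spreadsheet, the sketch below wraps a simplified version of the model in a function and compares closed tickets across the same parameter changes. It assumes hypothetical baseline values, computes every flow from the prior round&amp;rsquo;s stocks, and only moves changes forward once they pass testing or deployment, so its numbers won&amp;rsquo;t match the charts above exactly; treat it as a starting point for exploring the model rather than a reproduction of it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

# Hypothetical baseline values; the names loosely follow the Config sheet.
BASELINE = {
    "open_rate": 1,
    "start_coding_rate": 1,
    "test_rate": 1,
    "test_error_rate": 0.1,
    "deploy_error_rate": 0.1,
    "prod_error_rate": 0.25,
}

# The same parameter changes explored above.
SCENARIOS = {
    "baseline": {},
    "fewer production errors": {"prod_error_rate": 0.1},
    "faster testing": {"test_rate": 3},
    "faster starting and testing": {"test_rate": 3, "start_coding_rate": 3},
}


def closed_after(overrides, rounds=30, initial_open=5):
    # Simplified model: every flow is computed from the prior round's
    # stocks, and only changes that pass testing or deployment move forward.
    cfg = dict(BASELINE, **overrides)
    s = {"open": initial_open, "started": 0, "tested": 0, "deployed": 0, "closed": 0}
    for _ in range(rounds):
        start = min(cfg["start_coding_rate"], s["open"])
        test = min(cfg["test_rate"], s["started"])
        test_err = math.floor(s["tested"] * cfg["test_error_rate"])
        deploy_err = math.floor(s["deployed"] * cfg["deploy_error_rate"])
        prod_err = math.floor(s["closed"] * cfg["prod_error_rate"])
        deploy = s["tested"] - test_err
        close = s["deployed"] - deploy_err
        s = {
            "open": s["open"] + cfg["open_rate"] - start + prod_err,
            "started": s["started"] + start - test + test_err + deploy_err,
            "tested": s["tested"] + test - deploy - test_err,
            "deployed": s["deployed"] + deploy - close - deploy_err,
            "closed": s["closed"] + close - prod_err,
        }
    return s["closed"]


for name, overrides in SCENARIOS.items():
    print(name, closed_after(overrides))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding a scenario is a one-line change to &lt;code&gt;SCENARIOS&lt;/code&gt;, which makes it cheap to try ideas that are awkward to express as spreadsheet edits.&lt;/p&gt;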
&lt;p&gt;Models are imperfect representations of reality, but this one gives us a clear sense of what matters the most:
if we want to increase our velocity, we have to reduce the rate that we discover errors in production.
That might mean reducing the error rate as implied in this model, or it might involve ideas that exist outside of this model.
For example, the model doesn&amp;rsquo;t represent this well, but perhaps we&amp;rsquo;d be better off iterating more on fewer things to avoid this scenario.
If we make multiple changes to one area, it still just represents one implemented feature, not many implemented features, and the overall error
rate wouldn&amp;rsquo;t increase.&lt;/p&gt;</description></item><item><title>Strategy testing: avoid the waterfall strategy trap with iterative refinement.</title><link>https://craftingengstrategy.com/strategy-testing/</link><pubDate>Wed, 25 Sep 2024 17:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/strategy-testing/</guid><description>&lt;p&gt;If I could only popularize one idea about technical strategy, it would be that
prematurely applying pressure to a strategy&amp;rsquo;s rollout prevents evaluating whether the strategy is effective.
Pressure changes behavior in profound ways, and many of those changes are intended to make you believe your
strategy is working while minimizing change to the status quo (if you&amp;rsquo;re an executive)
or get your strategy repealed (if you&amp;rsquo;re not an executive). Neither is particularly helpful.&lt;/p&gt;
&lt;p&gt;While some strategies are obviously wrong from the beginning, it&amp;rsquo;s much more common to see reasonable strategies that
fail because they didn&amp;rsquo;t get the small details right.
Premature pressure is one common cause of a more general phenomenon:
most strategies are developed in a waterfall model,
finalizing their approach before incorporating the lessons that reality teaches
when you attempt the strategy in practice.&lt;/p&gt;
&lt;p&gt;One effective mechanism to avoid the waterfall strategy trap is explicitly testing your strategy to refine the details.
This chapter describes the mechanics of strategy testing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;when it&amp;rsquo;s important to test strategy (and when it isn&amp;rsquo;t)&lt;/li&gt;
&lt;li&gt;how to test strategy&lt;/li&gt;
&lt;li&gt;when you should stop testing&lt;/li&gt;
&lt;li&gt;roles in strategy testing: sponsor vs guide&lt;/li&gt;
&lt;li&gt;metrics and meetings to run strategy testing&lt;/li&gt;
&lt;li&gt;how to identify a strategy that skipped testing&lt;/li&gt;
&lt;li&gt;what to do when a strategy has progressed too far without testing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s get into the details.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is an exploratory, draft chapter for a book on engineering strategy that I&amp;rsquo;m brainstorming in &lt;a href="https://lethain.com/tags/eng-strategy-book/"&gt;#eng-strategy-book&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Many of the ideas here came together while working with &lt;a href="https://www.linkedin.com/in/shawnamartell/"&gt;Shawna Martell&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/danfike/"&gt;Dan Fike&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/madhurisarma/"&gt;Madhuri Sarma&lt;/a&gt;, and many others in Carta Engineering.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="when-to-test-strategy"&gt;When to test strategy&lt;/h2&gt;
&lt;p&gt;Strategy testing is ensuring that a strategy will accomplish its intended goal at a cost that you&amp;rsquo;re willing to pay.
This means it needs to happen prior to implementing a strategy, usually in a strategy&amp;rsquo;s early development stages.&lt;/p&gt;
&lt;p&gt;A few examples of when to test common strategy topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integrating a recent acquisition might focus on getting a single API integration working
before finalizing the overall approach.&lt;/li&gt;
&lt;li&gt;A developer productivity strategy focused on requiring typing in a Python codebase might
start by having an experienced team member type an important module.&lt;/li&gt;
&lt;li&gt;A service migration might attempt migrating both a simple component (to test migration tooling)
and a highly complex component (to test integration complexity) before moving to a broader rollout.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In every case, the two most important pieces are testing before finalizing the strategy, and testing narrowly with a focus
on the underlying mechanics of the approach.
Avoid getting caught up in solving broad problems like motivating adoption and addressing conflicting incentives.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s not to say that you need to test every strategy. A few of the common cases where you might not want to test a strategy are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you&amp;rsquo;re dealing with a &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;permissive strategy&lt;/a&gt; that&amp;rsquo;s very cheap to apply,
testing is often not too important; indeed, you can consider most highly-permissive strategies as a test
of whether it&amp;rsquo;s effective to implement a similar, but less permissive, strategy in the future.&lt;/li&gt;
&lt;li&gt;Where testing isn&amp;rsquo;t viable for some reason. For example, a hiring strategy where you shift hiring into
certain regions isn&amp;rsquo;t something you can test in most cases; it&amp;rsquo;s something you might need to run for several years
to get meaningful signal on results.&lt;/li&gt;
&lt;li&gt;There are also cases where you have such high conviction in a given strategy that it&amp;rsquo;s not worth testing, perhaps
because you&amp;rsquo;ve done something nearly identical at the same company before.
Hubris comes before the fall, so I&amp;rsquo;m generally skeptical of this category.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That said, my experience is that you should try very hard to find a way to test every strategy.
You certainly should not try hard to convince yourself testing a strategy isn&amp;rsquo;t worthwhile.
Testing is so, so much cheaper than implementing a bad strategy, that it&amp;rsquo;s almost always a good investment of time and energy.&lt;/p&gt;
&lt;h2 id="how-to-test-strategy"&gt;How to test strategy&lt;/h2&gt;
&lt;p&gt;For a valuable step that&amp;rsquo;s so often skipped, strategy testing is relatively straightforward.
The approach I&amp;rsquo;ve found effective is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Identify the narrowest, deepest available slice of your strategy, and iterate on applying your strategy to that slice until you&amp;rsquo;re confident the approach works well.&lt;/p&gt;
&lt;p&gt;For example, if you&amp;rsquo;re testing a new release strategy for your Product Engineering organization,
decide to ship exactly one important release following the new approach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As you iterate, identify metrics that help you verify the approach is working; note that these aren&amp;rsquo;t metrics to measure adoption, instead measure impact of the change.&lt;/p&gt;
&lt;p&gt;For example, metrics that show the new release process reduces customer impact, or drives more top-of-funnel visitors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Operate from the belief that people are well-meaning, and strategy failures are due to excess friction and poor ergonomics.&lt;/p&gt;
&lt;p&gt;For example, assume the release tooling is too complex if people aren&amp;rsquo;t using it. (Definitely don&amp;rsquo;t assume that people are too resistant to change.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Keep refining until you have conviction that your strategy&amp;rsquo;s details work in practice, or that the strategy needs to be approached from a new direction.&lt;/p&gt;
&lt;p&gt;For example, you might reach that conviction once the metrics you identified earlier show that the new release process has significantly
reduced the customer impact of the new release.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The most important details are the things you should &lt;em&gt;not&lt;/em&gt; do.
Don&amp;rsquo;t go broad where impact &lt;em&gt;feels&lt;/em&gt; higher but iteration cycles are slower.
Don&amp;rsquo;t get caught up on &lt;em&gt;forcing&lt;/em&gt; adoption such that you&amp;rsquo;re distracted from improving the underlying mechanics.
Finally, don&amp;rsquo;t get so attached to your current approach that you can&amp;rsquo;t accept that it might not be working.
Strategy testing is only valuable because many strategies don&amp;rsquo;t work as intended, and it&amp;rsquo;s much cheaper
to learn that early.&lt;/p&gt;
&lt;h2 id="testing-roles-sponsors-and-guides"&gt;Testing roles: sponsors and guides&lt;/h2&gt;
&lt;p&gt;Sometimes the strategy testing process is led by one individual who is able to sponsor the work
(a principal engineer at a smaller company, an executive, etc) and also coordinate the day-to-day work of validating
the approach (a principal engineer at a larger company, an engineering manager, a technical program manager, etc).
It&amp;rsquo;s even more common for these responsibilities to split between two roles: a &lt;strong&gt;sponsor&lt;/strong&gt; and a &lt;strong&gt;guide&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;sponsor&lt;/strong&gt; is responsible for:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;serving as an escalation point to make quick decisions to avoid getting stuck in development stages&lt;/li&gt;
&lt;li&gt;pushing past historical decisions and beliefs that prevent meaningful testing&lt;/li&gt;
&lt;li&gt;marshalling cross-organizational support&lt;/li&gt;
&lt;li&gt;telling the story to stakeholders, especially the executive team to avoid getting defunded&lt;/li&gt;
&lt;li&gt;preventing overloading of strategy (where people want to make the strategy solve &lt;em&gt;their&lt;/em&gt; semi-related problem)&lt;/li&gt;
&lt;li&gt;setting pace to avoid stalling out&lt;/li&gt;
&lt;li&gt;identifying when energy is dropping and when to change the phase of strategy (from development to implementation)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The &lt;strong&gt;guide&lt;/strong&gt; is responsible for:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;translating the strategy into particulars where testing gets stuck&lt;/li&gt;
&lt;li&gt;identifying slowdowns and blockers&lt;/li&gt;
&lt;li&gt;escalating frequently to sponsor&lt;/li&gt;
&lt;li&gt;tracking goals and workstreams&lt;/li&gt;
&lt;li&gt;maintaining the pace set by the sponsor&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In terms of filling these roles, there are a few lessons that I&amp;rsquo;ve learned over time.
For sponsors, what matters the most is that they&amp;rsquo;re genuinely authorized by the
company to make the decision they&amp;rsquo;re making, and that they care enough about the impact
that they&amp;rsquo;re willing to make difficult decisions quickly. A sponsor is only meaningful
to the extent that the guide can escalate to the sponsor, who must rapidly resolve those escalations.
If they aren&amp;rsquo;t available for escalations or don&amp;rsquo;t resolve them quickly, they&amp;rsquo;re a poor sponsor.&lt;/p&gt;
&lt;p&gt;For guides, you need someone who can execute at pace without getting derailed by various organizational
messes, and has good, nuanced judgment relevant to the strategy being tested.
The worst guides are ideological (they reject the very feedback created by testing)
or easily derailed (you&amp;rsquo;re likely testing &lt;em&gt;because&lt;/em&gt; there&amp;rsquo;s friction somewhere, so
someone who can&amp;rsquo;t navigate friction is going to fail by default).&lt;/p&gt;
&lt;h2 id="meetings--metrics"&gt;Meetings &amp;amp; Metrics&lt;/h2&gt;
&lt;p&gt;The only absolute requirement for the strategy testing phase is that
the sponsor, guide, and any key folks working on the strategy &lt;strong&gt;must meet together every single week&lt;/strong&gt;.
Within that meeting, you&amp;rsquo;ll iterate on which metrics capture the current areas you&amp;rsquo;re trying to refine,
discuss what you&amp;rsquo;ve learned from prior metrics or data, and schedule one-off followups to ensure you&amp;rsquo;re making progress.&lt;/p&gt;
&lt;p&gt;The best version of this meeting is debugging heavy and presentation light.
Any week that you&amp;rsquo;re not learning something that informs subsequent testing,
or making a decision that modifies approach to testing, should be viewed with some suspicion.
It might mean that you&amp;rsquo;ve under-resourced the testing effort, or that your testing approach is too
ambitious; either way, it&amp;rsquo;s a meaningful signal that testing is converging too slowly to maintain attention.&lt;/p&gt;
&lt;p&gt;If all of this seems like an overly large commitment, I&amp;rsquo;d push you to consider
your &lt;a href="https://craftingengstrategy.com/when-write-strategy/"&gt;strategy altitude&lt;/a&gt; to adjust the volume
or permissiveness of the strategy you&amp;rsquo;re working on.
If a strategy isn&amp;rsquo;t worth testing, then it&amp;rsquo;s either already quite good (which should be widely evident beyond its authors)
or it&amp;rsquo;s probably only worth rolling out in a highly permissive format.&lt;/p&gt;
&lt;h2 id="identifying-strategies-that-skipped-testing"&gt;Identifying strategies that skipped testing&lt;/h2&gt;
&lt;p&gt;While not all strategies &lt;em&gt;must&lt;/em&gt; be refined by a testing phase, essentially all failing strategies skip
the testing phase to move directly into implementation.
Strategies that skip testing &lt;em&gt;sound right&lt;/em&gt;, but don&amp;rsquo;t accomplish much.
Fully standardizing authorization and authentication across your company on one implementation &lt;em&gt;sounds right&lt;/em&gt;,
but can still fail if e.g. each team is responsible for its own approach to determining the standard.&lt;/p&gt;
&lt;p&gt;One particularly obvious pattern is something I describe as &amp;ldquo;pressure without a plan.&amp;rdquo;
This is a strategy that has the &amp;ldquo;sounds right&amp;rdquo; aspect but lacks concrete details.
Service migrations are particularly prone to this, perhaps due to apocryphal descriptions of
Amazon&amp;rsquo;s service migration in the 2000s, which is often summarized as a top-down zero-details mandate to switch away from the monolith.&lt;/p&gt;
&lt;p&gt;Identification comes down to understanding two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Are there numbers that show the strategy is driving the desired impact?
For example, API requests made into the new authentication service as a percentage of all authentication requests
is more meaningful than a spreadsheet tracking teams&amp;rsquo; commitments to move to the new service.&lt;/p&gt;
&lt;p&gt;Avoid proxy metrics when possible; instead, look at the actual thing that matters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the numbers aren&amp;rsquo;t moving, is there a clear mechanism for debugging and solving those issues,
and is this team actually making progress?
For example, a team that helps other teams integrate with the new authentication service to understand
where limitations are preventing effective adoption, and that is shipping working code.&lt;/p&gt;
&lt;p&gt;Because the numbers aren&amp;rsquo;t moving, you need to find a different source of meaningful evidence to validate that progress is happening.
Generally, the best bet is new software running in a meaningful environment (e.g. production for product code).
It&amp;rsquo;s also useful to talk with skeptics or failed integrations, but be cautious of debugging exclusively with skeptics.
They&amp;rsquo;re almost always right, but often out of date: right about past problems, but not describing
current ones.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Unless one of these two conditions is &lt;em&gt;obviously true&lt;/em&gt;, it&amp;rsquo;s very likely that you&amp;rsquo;ve
found a strategy that skipped testing.&lt;/p&gt;
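&lt;p&gt;For the first of those questions, the authentication example reduces to a simple ratio. This minimal Python sketch computes it from hypothetical weekly request counts (not real data); in practice the counts would come from whatever request metrics your services already record, and &lt;code&gt;adoption_share&lt;/code&gt; is an illustrative name.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def adoption_share(new_service_requests, total_auth_requests):
    # Percentage of all authentication requests served by the new service.
    if total_auth_requests == 0:
        return 0.0
    return 100.0 * new_service_requests / total_auth_requests


# Hypothetical weekly counts: (requests to the new service, all auth requests).
weekly_counts = [(1_200, 48_000), (6_000, 50_000), (15_300, 51_000)]
for new_requests, total_requests in weekly_counts:
    print(f"{adoption_share(new_requests, total_requests):.1f}%")
# prints 2.5%, then 12.0%, then 30.0%
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike a tally of teams&amp;rsquo; commitments, this number only moves when traffic actually shifts to the new service.&lt;/p&gt;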
&lt;h2 id="recovering-from-skipped-testing"&gt;Recovering from skipped testing&lt;/h2&gt;
&lt;p&gt;Once you&amp;rsquo;ve recognized a strategy that skipped testing and is now struggling,
the next question is what to do about it.
&lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;Should we decompose our monolith?&lt;/a&gt; looks at recovering from a failing service migration,
and is lightly based on my experience dealing with similar stuck service migrations at both Calm and Carta.
The answer to a stuck strategy is always: write a new strategy, and make sure &lt;em&gt;not&lt;/em&gt; to skip testing this time.&lt;/p&gt;
&lt;p&gt;Typically, the first step of this new strategy is explicitly pausing the struggling strategy
while a new testing phase occurs. This is painful to do, because the folks invested in the current
strategy will be upset with you, but there&amp;rsquo;s always going to be people who disagree with any change.
Long-term, the only thing that makes most people happy is a successful strategy, and anything that
delays progress towards that is a poor investment.&lt;/p&gt;
&lt;p&gt;Sometimes it is difficult to officially pause a struggling strategy,
in which case you have to look for an indirect mechanism to implicitly pause without
acknowledging it. For example, delaying new services while you take a month to invest into improving service provisioning
might give you enough breathing room to test the missing mechanisms from your strategy,
without requiring anyone to lose face over a failing migration.
It would be nice to always be able to say these things out loud,
but managing personalities is an enduring leadership challenge;
even when you&amp;rsquo;re an executive, you just have a different set of messy stakeholders.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Testing doesn&amp;rsquo;t determine whether a strategy might be good. It exposes the missing details required to translate
a directionally accurate strategy into a strategy that works.
After reading this chapter, you know how to lead that translation process as both a sponsor
and a guide. You can set up and run the necessary meetings to test a strategy, and also put together
the bank of metrics to determine if the strategy is ready to leave refinement and move to a broader rollout.&lt;/p&gt;</description></item><item><title>Should we decompose our monolith?</title><link>https://craftingengstrategy.com/monolith-decomposition-strategy/</link><pubDate>Sun, 15 Sep 2024 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/monolith-decomposition-strategy/</guid><description>&lt;p&gt;Since microservices were &lt;a href="https://en.wikipedia.org/wiki/Microservices"&gt;first introduced in 2005&lt;/a&gt;, the choice between adopting
a microservices architecture, a monolithic service architecture, or a hybrid of the two has become one of the
least-reversible decisions that most engineering organizations make.
Even migrating to a different database technology is &lt;em&gt;generally&lt;/em&gt; a less expensive change than moving from monolith
to microservices or from microservices to monolith.&lt;/p&gt;
&lt;p&gt;The industry has in many ways gone full circle on that debate, from most hyperscalers in the 2010s partaking in
a multi-year monolith to microservices migration, to
&lt;a href="https://x.com/kelseyhightower/status/940259898331238402"&gt;Kelsey Hightower&amp;rsquo;s iconic tweet on the perils of distributed monoliths&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;2020 prediction: Monolithic applications will be back in style after people discover the drawbacks of distributed monolithic applications.
- @KelseyHightower&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Even as popular sentiment has generally turned away from microservices, many engineering organizations have a bit of both,
often the remnants of one or more earlier but incomplete migration efforts. This strategy looks at a theoretical
organization stuck with a bit of both approaches, which we&amp;rsquo;ll call Theoretical Compliance Company, as it determines its path forward.&lt;/p&gt;
&lt;p&gt;Here is Theoretical Compliance Company&amp;rsquo;s service architecture strategy.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is an exploratory, draft chapter for a book on engineering strategy that I&amp;rsquo;m brainstorming in &lt;a href="https://lethain.com/tags/eng-strategy-book/"&gt;#eng-strategy-book&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;, then &lt;em&gt;Diagnose&lt;/em&gt; and so on.
Relative to the default structure, this document has been refactored in two ways
to improve readability:
first, &lt;em&gt;Operation&lt;/em&gt; has been folded into &lt;em&gt;Policy&lt;/em&gt;;
second, &lt;em&gt;Refine&lt;/em&gt; has been embedded in &lt;em&gt;Diagnose&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy"&gt;Policy&lt;/h2&gt;
&lt;p&gt;Our policy for service architecture is documented here.
All exceptions to this policy &lt;strong&gt;must&lt;/strong&gt; escalate &lt;em&gt;to&lt;/em&gt; a local Staff-plus engineer for their
approval, and then escalate &lt;em&gt;with&lt;/em&gt; that Staff-plus engineer to the CTO.
If you have questions about the policies, ask in &lt;code&gt;#eng-strategy&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Our policy is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Business units should always operate in their own code repository and monolith.&lt;/strong&gt;
They should not provision many different services. They should rarely work in other business units&amp;rsquo; monoliths.
There will be nuanced cases; in these cases, prefer decisions that move us closer to this policy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New integrations across business unit monoliths should be done using gRPC.&lt;/strong&gt;
The emphasis here is on &lt;em&gt;new&lt;/em&gt; integrations;
it&amp;rsquo;s desirable but not urgent to migrate existing integrations that use other implementations (HTTP/JSON, etc.).&lt;/p&gt;
&lt;p&gt;When the decision is subtle (e.g. changes to an existing endpoint), optimize for
business velocity rather than technical purity. When the decision is far from subtle (e.g. brand new endpoint),
comply with the policy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Except for new business unit monoliths, we don&amp;rsquo;t allow new services.&lt;/strong&gt;
You should work within the most appropriate business unit monolith or within the existing infrastructure repositories.
Provisioning a new service, unless it corresponds with a new business unit, always
requires approval from the CTO in &lt;code&gt;#eng-strategy&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That approval generally will &lt;em&gt;not&lt;/em&gt; be granted, unless
the new service requires significantly different non-functional requirements than an existing monolith.
For example, the service might require significantly more compliance review before changes, as our existing
payments service does, or it might need to handle radically higher requests per second.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Merge existing services into business-unit monoliths where you can.&lt;/strong&gt;
We believe that each choice to move existing services back into a monolith should
be made &amp;ldquo;in the details&amp;rdquo; rather than from a top-down strategy perspective. Consequently,
we generally encourage teams to wind down their existing services outside of their business unit&amp;rsquo;s monolith,
but defer to teams to make the right decision for their local context.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;Theoretical Compliance Company has a complex history with decomposing our monolith.
We are also increasing our number of business units, while limiting our investment into
our core business unit. These are complex times, with a lot of constraints to juggle.
To improve readability, we&amp;rsquo;ve split the diagnosis into
two sections: &amp;ldquo;business constraints&amp;rdquo; and &amp;ldquo;engineering constraints.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Our business constraints are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We sell business-to-business compliance solutions to other companies on an annual subscription.
There is one major, established business line, and two smaller partially validated business lines
that are intended to attach to the established business line to increase average contract value.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There are 2,000 people at the company. About 500 of those are in the engineering organization.
Within that 500, about 150 work on the broadest definition of &amp;ldquo;infrastructure engineering,&amp;rdquo;
things like developer tools, compute and orchestration, networking, security engineering, and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The business is profitable, but revenue growth has been 10-20% YoY, creating persistent pressure on spend
from our board, based on mild underperformance relative to public market comparables.
&lt;strong&gt;Unless we can increase YoY growth by 5-10%, they expect us to improve free cash flow by 5-10% each year&lt;/strong&gt;,
which jeopardizes our ability to maintain long-term infrastructure investments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Growth in the primary business line is shrinking.&lt;/strong&gt;
The company&amp;rsquo;s strategy includes spinning up more adjacent business units to increase average contract value with new products.
&lt;strong&gt;We need to fund these business units without increasing our overall budget&lt;/strong&gt;,
which means budget for the new business units must be pulled away from either our core business or our platform teams.&lt;/p&gt;
&lt;p&gt;In addition to needing to fund our new business units, &lt;strong&gt;there&amp;rsquo;s ongoing pressure to make our core business more efficient&lt;/strong&gt;,
which means either accelerating growth or reducing investment. It&amp;rsquo;s challenging to accelerate growth while reducing investment,
which suggests that most improvement will come from reducing our investment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our methodology to allocate platform costs against business units does so proportionately to
the revenue created by each business unit. &lt;strong&gt;Our core business generates the majority of our revenue, which means it is accountable for the majority of our platform costs&lt;/strong&gt;,
even if those costs are motivated by new business lines.&lt;/p&gt;
&lt;p&gt;This means that, even as the burden placed on platform teams increases due to spinning up multiple business units,
there&amp;rsquo;s significant financial pressure to reduce our platform spend because it&amp;rsquo;s primarily represented as a cost to the core business
whose efficiency we have to improve.
This means we have little tolerance for anything that increases infrastructure overhead.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
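&lt;p&gt;To make the allocation effect concrete, here is a minimal Python sketch of revenue-proportional cost allocation. The revenue and cost figures, the unit names, and the &lt;code&gt;allocate_by_revenue&lt;/code&gt; helper are all hypothetical and chosen only for illustration. They are not Theoretical Compliance Company&amp;rsquo;s actual numbers or tooling.&lt;/p&gt;

```python
# Hypothetical figures for illustration only ($M per year); not from the strategy.
REVENUE_BY_UNIT = {
    "core": 90.0,        # established business line
    "new_unit_a": 6.0,   # partially validated business line
    "new_unit_b": 4.0,   # partially validated business line
}


def allocate_by_revenue(revenue_by_unit, shared_cost):
    """Split a shared cost across units in proportion to each unit's revenue."""
    total_revenue = sum(revenue_by_unit.values())
    return {
        unit: shared_cost * revenue / total_revenue
        for unit, revenue in revenue_by_unit.items()
    }


# Baseline platform spend allocated across units (assumed $30M).
baseline = allocate_by_revenue(REVENUE_BY_UNIT, 30.0)

# Extra platform spend (assumed $3M) needed only to support the new units;
# proportional allocation still charges most of it to the core business.
incremental = allocate_by_revenue(REVENUE_BY_UNIT, 3.0)

for unit in REVENUE_BY_UNIT:
    print(f"{unit}: baseline ${baseline[unit]:.1f}M, incremental ${incremental[unit]:.2f}M")
```

&lt;p&gt;Under these assumed numbers, a unit generating 90% of revenue absorbs 90% of every platform dollar. That includes incremental spend that exists only to support the new business units.&lt;/p&gt;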
&lt;p&gt;Our engineering constraints are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Our infrastructure engineering team is 150 engineers supporting 350 product engineers,
and it&amp;rsquo;s certain that &lt;strong&gt;infrastructure will not grow significantly in the foreseeable future&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We spun up two new business units in the past six months, and &lt;strong&gt;plan to spin up an additional two new business units&lt;/strong&gt;
in the next year. Each business unit is led by a general manager, with engineering and product within that business unit
principally accountable to that general manager. Our CTO and CPO still set practice standards, but it&amp;rsquo;s situation-specific
whether these practice standards or direction from the general manager has the last word on any given debate.&lt;/p&gt;
&lt;p&gt;For example, one business unit has been unwilling to support an on-call rotation for their product, because their general
manager insists it is a wasteful practice. Consequently, that team often doesn&amp;rsquo;t respond to pages, even when their service
is responsible for impacting the stability of shared functionality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve modeled &lt;a href="https://lethain.com/services-overhead-model/"&gt;how services and monoliths create overhead for both product and infrastructure organizations over time&lt;/a&gt;,
and have conviction that, in general, &lt;strong&gt;it&amp;rsquo;s more overhead for infrastructure to support more services&lt;/strong&gt;.
We also found that in our organization, the rate of service ownership changing due to team reorganizations counteracts much of the initial productivity
gains from leaving the monolith.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There is some tension between the two preceding observations: it&amp;rsquo;s generally more overhead to have more services,
but it&amp;rsquo;s &lt;em&gt;even more&lt;/em&gt; overhead to have irresponsible business units breaking a shared monolithic service.
For example, we can much more easily rate limit usage from a misbehaving service than fix a misbehaving codepath
within a shared service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We also have a payments service that moves money from customers to us.
&lt;strong&gt;Our compliance and security requirements for changes to this service are significantly higher&lt;/strong&gt;
than for the majority of our software, because the blast radius is essentially infinite.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our primary programming language is Ruby, which generally relies on blocking IO,
and service-oriented architectures generally spend more time on blocking IO than monoliths.
Similarly, Ruby is &lt;em&gt;relatively&lt;/em&gt; inefficient at serializing and deserializing JSON payloads,
which our service architecture requires as part of cross-service communication.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We&amp;rsquo;ve previously attempted to decompose, and have &lt;strong&gt;a number of lingering partial migrations that don&amp;rsquo;t align cleanly with our current business unit ownership structure&lt;/strong&gt;.
The number of these new services continues to grow over time,
creating more burden on both infrastructure today and product teams in the future as they try to
maintain these services through various team reorganizations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;In the late 2010s, most large or scaling companies adopted services to some extent.
Few adopted microservices, with the majority of adopters opting for a &lt;a href="https://aws.amazon.com/compare/the-difference-between-soa-microservices/"&gt;service-oriented architecture&lt;/a&gt; instead.
&lt;a href="https://x.com/kelseyhightower/status/940259898331238402"&gt;Kelsey Hightower&amp;rsquo;s iconic tweet on the perils of distributed monoliths&lt;/a&gt; in 2020
captured the beginning of a reversal, with more companies recognizing the burden of operating service-oriented architectures.&lt;/p&gt;
&lt;p&gt;In addition to the wider recognition of those burdens, many of the cloud infrastructure challenges that originally motivated service architectures began
to mellow. Most infrastructure engineers today &lt;em&gt;only&lt;/em&gt; know how to operate with cloud-native patterns, rather than starting from machine-oriented approaches.
Standard database technologies like PostgreSQL have significantly improved capabilities. Cloud providers have fast local caches for quickly retrieving verified upstream packages.
Cloud compute is plentiful and affordable. Slow programming languages are faster than they were a decade ago. Untyped languages have reasonable incremental paths
to typed codebases.&lt;/p&gt;
&lt;p&gt;As a result of this shift, if you look at a new, emerging company, it&amp;rsquo;s particularly likely to have a monolith in one backend and one frontend programming language.
However, if you look at a five-plus-year-old company, you might find almost anything. One particularly common case is a monolith with most functionality,
and an inconsistent constellation of team-scoped macroservices scattered across the organization.&lt;/p&gt;
&lt;p&gt;The shift away from &lt;a href="https://en.wikipedia.org/wiki/Zero_interest-rate_policy"&gt;zero interest-rate policy&lt;/a&gt; has also impacted trends,
as service-oriented architectures tend to require more infrastructure to operate efficiently, such as service meshes, service provisioning and deprovisioning, etc.
Properly tuned, service-oriented architectures ought to be cost competitive, and potentially superior in complex workloads, but it&amp;rsquo;s
hard to maintain the required investment in infrastructure teams when in a cost-cutting environment.
This has encouraged new companies to restrict themselves to monolithic approaches, and pushed existing companies to
&lt;em&gt;attempt&lt;/em&gt; to reverse their efforts to decompose their prior monoliths, with mixed results.&lt;/p&gt;</description></item><item><title>When to write strategy, and how much?</title><link>https://craftingengstrategy.com/when-write-strategy/</link><pubDate>Sun, 25 Aug 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/when-write-strategy/</guid><description>&lt;p&gt;Even if you believe that strategy is generally useful,
it is difficult to decide that today is the day to start writing engineering strategy.
When you do start writing strategy, it&amp;rsquo;s easy to write so much strategy that
your organization is overwhelmed and ignores your strategy rather than
investing time into understanding it.&lt;/p&gt;
&lt;p&gt;Fortunately, these are universal problems, and there are a handful of
useful mental models to avoid both extremes.
This chapter covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;when to write strategy, in particular the pain points (like cross-team friction)
and opportunities (like senior hires) that are good moments to start writing&lt;/li&gt;
&lt;li&gt;how much strategy your organization can tolerate, avoiding the traps of writing so much that it&amp;rsquo;s ignored
or so little that there&amp;rsquo;s not much impact&lt;/li&gt;
&lt;li&gt;using strategy altitude&amp;ndash;how permissive a given strategy is and where it&amp;rsquo;s implemented&amp;ndash;to manage the overhead
that strategies create&lt;/li&gt;
&lt;li&gt;mechanisms to debug whether you&amp;rsquo;re doing too much or too little strategy work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When you&amp;rsquo;re done reading it, you should have a clear perspective on
when to start writing strategy, determining how many strategies to write,
and using strategy altitude to reduce overhead when you do decide to write a
high volume of strategies.&lt;/p&gt;
&lt;h2 id="when-to-write-strategy"&gt;When to write strategy&lt;/h2&gt;
&lt;p&gt;Shortly after becoming Calm&amp;rsquo;s CTO, I opened a document, titled it “Engineering Strategy”, and then stared into that blank abyss before putting it away for a year.
A year later, I came back and documented three guiding principles: &lt;a href="https://mcfunley.com/choose-boring-technology"&gt;choose boring technology&lt;/a&gt;, resolve conflict with curiosity, and prefer vendors for commoditized functionality.
These simple statements greatly reduced conflict in decision making, and allowed us to focus more energy on improving our product.
When I started, I&amp;rsquo;d felt like we needed a clearer strategy, but I just didn&amp;rsquo;t know what to write at that point, so I wrote nothing.&lt;/p&gt;
&lt;p&gt;Often writing nothing is the best available choice. Indeed, a common slur against leaders is that they &amp;ldquo;want to be strategic,&amp;rdquo; implying that they&amp;rsquo;re too focused
on abstract ideas rather than on the concrete needs of today. Behind that allegation is an important
truth: strategy work isn&amp;rsquo;t always the most valuable thing you can spend your time on. Sometimes working on
strategy is &lt;a href="https://staffeng.com/guides/work-on-what-matters/"&gt;just snacking&lt;/a&gt; to avoid something more important.&lt;/p&gt;
&lt;p&gt;Before you start working on strategy, you have to decide whether now is the correct time, which depends
on your organization&amp;rsquo;s current strategic state, the trend of your strategic state over time,
and whether you have enough context to be effective.&lt;/p&gt;
&lt;p&gt;The first of those three criteria is the idea of strategic state.
Using the example of &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;service architecture strategy&lt;/a&gt;,
your engineering organization is going to be in one of three strategy states:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Globally consistent&lt;/strong&gt;: there is a clearly agreed upon strategy, even if it&amp;rsquo;s not written down. When you ask different members of the team
how to approach a given problem, you get similar answers.&lt;/p&gt;
&lt;p&gt;For example, everyone agrees to write new product functionality in the existing monolithic codebase.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consistent within teams&lt;/strong&gt;: there is a clear strategy within pockets of the organization, but there&amp;rsquo;s some inconsistency
across pockets.&lt;/p&gt;
&lt;p&gt;For example, product engineering believes all new functionality should be in a new service within a shared monorepo,
but all platform engineering believes new functionality should be implemented in a monolith.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Highly varied&lt;/strong&gt;: there&amp;rsquo;s little agreement across individuals within engineering on how to approach problems.&lt;/p&gt;
&lt;p&gt;For example, some engineers want to do work in new services in a monorepo, others in new services in polyrepos,
and some believe in implementing new functionality in an existing monolithic service.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If your organization is globally consistent, then it&amp;rsquo;s unlikely that doing more strategy work will be useful unless
your organization is consistently deciding upon an undesirable approach.
If you&amp;rsquo;re in one of the latter two states, then it&amp;rsquo;s likely a useful time to write some strategy.&lt;/p&gt;
&lt;p&gt;Even if the current state is good, if the organization is trending towards a worse state, it&amp;rsquo;s a valuable time to start doing strategy work.
Conversely, if the current state is decent, and trending towards something better, it&amp;rsquo;s likely not a valuable opportunity.
There are a handful of recurring causes that can lead to abrupt, sometimes unexpected, shifts in state:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How much you are, or aren&amp;rsquo;t, hiring. Uber doubled engineering headcount every six months for four years,
along with opening many distributed engineering offices, which led to highly varied approaches.
This also meant that the tenure of most engineers was quite low, driving up inconsistency even more.&lt;/li&gt;
&lt;li&gt;Whether your newly hired external leaders are more playbook-driven or more responsive to the organization&amp;rsquo;s current context.
Although it&amp;rsquo;s a known anti-pattern in &lt;a href="https://lethain.com/first-ninety-days-cto-vpe/"&gt;executive onboarding&lt;/a&gt;,
many leaders are so desperate to make an early impact that they forget to diagnose their new environment
before making sweeping changes. This creates a strategy rift between teams aligning with the new direction and teams
maintaining the existing software and infrastructure.&lt;/li&gt;
&lt;li&gt;How frequently you have significant organizational changes such as reorganizations or layoffs.
These events can break the mechanisms that propagate culture, which are the sort of subtle &lt;a href="https://noidea.dog/glue"&gt;glue work&lt;/a&gt;
which often gets ignored in spreadsheet-driven exercises.&lt;/li&gt;
&lt;li&gt;How effectively you&amp;rsquo;ve documented historical decisions, and how well you communicate them during onboarding.
Some companies drill new hires on how decisions are made, and others expect teams to do the training locally.
Both approaches can work well. Both can work poorly.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Finally, even if the current state is poor and getting worse, you have to assess whether &lt;em&gt;you&lt;/em&gt; understand
the organization well enough to start doing useful strategy work. Many new leaders jump in, make assumptions without testing them,
and &lt;a href="https://lethain.com/grand-migration/"&gt;attempt a massive migration&lt;/a&gt;. That might &lt;em&gt;feel&lt;/em&gt; like an audacious example of driving strategy,
but it&amp;rsquo;s mostly just anxiety or ego wrapped in a Gantt chart.&lt;/p&gt;
&lt;p&gt;The question to ask yourself is whether you understand the history around the areas you want to change,
the individuals who made the decisions, and the context that made them good decisions at that time.
If you know those things for the areas you&amp;rsquo;re focused on, then you&amp;rsquo;re ready to step into strategy.
If not, it&amp;rsquo;s worth slowing down to build the relationships and context necessary to make your subsequent work useful.&lt;/p&gt;
&lt;p&gt;If things could be better, or are trending down, and you know enough about the company to get started, then it&amp;rsquo;s
time to start working on strategy.&lt;/p&gt;
&lt;h2 id="how-many-strategies"&gt;How many strategies?&lt;/h2&gt;
&lt;p&gt;The next question you&amp;rsquo;ll run into after starting work on strategy is: &lt;em&gt;how much&lt;/em&gt; strategy should you undertake?
Is it something about programming language choice? Or service decomposition? Or how you prioritize bugs?
Or is it about data warehouses? What about doing all four at once?&lt;/p&gt;
&lt;p&gt;With genuinely infinite potential strategies you could work on, it&amp;rsquo;s hard to decide where to start.
By far the most valuable decision you can make is to &lt;a href="https://lethain.com/limiting-wip/"&gt;limit work in progress&lt;/a&gt;, even if it means starting smaller than you want.
Generally what I&amp;rsquo;ve found effective is to start small, iterate on small pieces until you get them working, and only then move on to something larger.
Limit yourself to developing one or two strategies at a time. This gives you bandwidth to ensure the strategies actually work.&lt;/p&gt;
&lt;p&gt;To remain effective while limiting concurrent strategy development, it&amp;rsquo;s important to have a clear, but lightly held, point of view
about where you want to get over time. Having a strategy destination makes it possible for each of these smaller chunks to ladder
up into something larger.&lt;/p&gt;
&lt;p&gt;Grounding that in a concrete example: at Uber, we were having reliability and productivity issues
related to the monolithic Python codebase. My team didn&amp;rsquo;t have the ability to forbid commits there, but we did have the ability to
make service provisioning really, really easy. So we created a strategy around making service provisioning and operation as painless
as possible. The strategy aimed to solve a later problem of decomposing and departing the monolith, but we didn&amp;rsquo;t address that directly.
We focused on the first step, believing that it was a necessary prerequisite for the subsequent steps. After we proved out the first step,
it then became possible to work on strategy for the subsequent steps.&lt;/p&gt;
&lt;p&gt;If we had started with the broader strategy, we might have gotten stuck having an intellectual debate about what should happen in the future,
and required many different teams to buy into our future vision without having any concrete step for them to take today. By narrowing down, we were
able to iterate on the prerequisites, and delay building consensus until there was a concrete step we needed folks to take.
At that point, there was no intellectual debate about whether it was possible because most people were already operating as we intended.&lt;/p&gt;
&lt;p&gt;One of the challenges with reducing the volume of concurrent strategies is that it appears unambitious.
In the Uber example, we &lt;em&gt;needed&lt;/em&gt; to solve development in the monolithic codebase, but instead we were talking about
service provisioning, which from a distance seemed like we&amp;rsquo;d lost the plot. This is a recurring challenge with effective
strategy development: it can appear overly conservative.
Even though in practice it&amp;rsquo;s usually the fastest solution to the underlying problem, it often comes across as slow or indirect.
Solving the appearance of unambition requires proactive storytelling to your stakeholders to explain
both the incremental initiative and the broader vision it will expand to fill over time.&lt;/p&gt;
&lt;p&gt;Sometimes this isn&amp;rsquo;t just a stakeholder problem: it can feel slow to you as well.
In those moments, I try to remember that &lt;a href="https://lethain.com/friction-vs-velocity/"&gt;friction isn&amp;rsquo;t velocity&lt;/a&gt; and
think about Digg&amp;rsquo;s engineering strategy when I joined.
We had an extremely clear and consistent architecture (a PHP frontend, Python services, Cassandra for all storage),
but the company still collapsed around us.
A few strategies that work are more valuable than a bunch of strategies, even good ones, in a burning building.&lt;/p&gt;
&lt;h2 id="strategy-altitude"&gt;Strategy altitude&lt;/h2&gt;
&lt;p&gt;Sometimes you &lt;em&gt;do&lt;/em&gt; want to lay out a broad, comprehensive strategy,
and you want to do it quickly. That violates the general rule of developing
one strategy at a time, but there&amp;rsquo;s one helpful idea that can often
make this possible: strategy altitude.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s easiest to explain this idea by starting with a few examples of
operating at different altitudes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A developer experience team wants to increase code quality.
They create a mechanism that allows teams to define linting rules
for their own builds. The developer experience team creates opinionated defaults for teams to adopt,
but each team is empowered to override those defaults locally.&lt;/p&gt;
&lt;p&gt;This is a permissive strategy at the engineering organization altitude.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A CTO wants to increase code quality.
They mandate that every pull request must include a test and that CI/CD should block merging pull requests
that reduce code coverage.&lt;/p&gt;
&lt;p&gt;This is a prescriptive strategy at the engineering organization altitude.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A product engineering team wants to decrease security vulnerabilities in their software.
They tell engineers that it&amp;rsquo;s important to consider a number of security issues when
implementing software, and share resources for engineers to educate themselves.&lt;/p&gt;
&lt;p&gt;This is a permissive strategy at the team altitude.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A product engineering team wants to reduce user-impacting bugs.
They decide that their planning sprints will schedule bug fixes first,
and only schedule features after draining the bug backlog.&lt;/p&gt;
&lt;p&gt;This is a prescriptive strategy at the team altitude.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Permissive strategies are less expensive than prescriptive strategies, because they require little-to-no enforcement.
Lower-altitude strategies (e.g. team strategies) are less expensive than higher-altitude strategies (e.g. org or company strategies),
because they can rely on local mechanisms for rollout and maintenance rather than oversaturated and lossy mechanisms for wider communication
(e.g. communicating in engineering-wide chat channels is, at best, ineffective).&lt;/p&gt;
&lt;p&gt;Pulling these ideas together, the formula to increase
strategy volume is to reduce altitude, increase permissiveness, or both.&lt;/p&gt;
&lt;p&gt;As a concrete example, when I joined Carta, I worked across engineering to roll out quite a bit of strategy work in the first six months.
Some of this was documenting existing strategy, so it didn&amp;rsquo;t require much adoption overhead.
Other parts were a shift in approach, so we focused on developing permissive strategies.
Every strategy included an escape hatch to support local customization, generally asking
each team&amp;rsquo;s &lt;a href="https://lethain.com/navigators/"&gt;Navigator&lt;/a&gt; (a Staff-plus engineer responsible for that area) to override
the strategy as appropriate. There was only one place where I was highly prescriptive, which was around
provisioning new services&amp;ndash;there, the escape hatch was more restrictive, requiring escalating to the CTO.&lt;/p&gt;
&lt;p&gt;Because we focused on permissive strategies, we were able to cover a broad range of topics at high altitude.
If I&amp;rsquo;d been more prescriptive, the approach would have certainly failed, even though I might have looked like
a more courageous leader. Annoyingly, looking effective and being effective tend to be only lightly correlated.&lt;/p&gt;
&lt;h2 id="are-you-doing-too-much"&gt;Are you doing too much?&lt;/h2&gt;
&lt;p&gt;Although many engineers feel that their company doesn&amp;rsquo;t have a clear engineering strategy,
it&amp;rsquo;s my experience that significantly more leaders fail by attempting too much strategy work
than by attempting to do too little.&lt;/p&gt;
&lt;p&gt;To debug whether you&amp;rsquo;re doing too much, the most valuable question you can ask
is whether your prior strategy work has impacted the subsequent decisions being made.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve shared out a bunch of strategy work, but it doesn&amp;rsquo;t seem to be impacting
how your software is written, then you should scale back.
Instead, focus on getting just a single strategy working well, and deeply understand
what&amp;rsquo;s gone wrong in your prior efforts. Then, and only then, return to your prior
work and fix it. Finally, and only after completing the prior steps, expand further.&lt;/p&gt;
&lt;p&gt;You may also be doing good work, but simply overwhelming the organization with too much.
Adopting new approaches is hard, and changing everything at once is overwhelming.
Adjust your strategy altitude to make strategies easier to adopt, and slow down on
adding more until the existing ones have been fully adopted.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;After reading this chapter, you know when it&amp;rsquo;s effective to write strategy, and how to pace yourself to write
a reasonable volume of strategies. You can use strategy altitude to make strategies easier to
adopt, and can debug whether you&amp;rsquo;re overwhelming your organization with too much strategy work.&lt;/p&gt;
&lt;p&gt;If you take nothing else away from this chapter, try to always be working on exactly one strategy.
Doing more feels like progress, but usually fails. Doing less is always a missed opportunity.&lt;/p&gt;</description></item><item><title>Making engineering strategies more readable</title><link>https://craftingengstrategy.com/readable-strategy/</link><pubDate>Sat, 18 May 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/readable-strategy/</guid><description>&lt;p&gt;As discussed in &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;Components of engineering strategy&lt;/a&gt;,
a complete engineering strategy has five components: explore, diagnose, refine (map &amp;amp; model), policy, and operation.
However, it&amp;rsquo;s actually quite challenging to read a strategy document written that way.
That&amp;rsquo;s an effective sequence for &lt;em&gt;creating&lt;/em&gt; a strategy, but a frustrating one
for those trying to quickly &lt;em&gt;read and apply&lt;/em&gt; a strategy without necessarily wanting to understand
the complete thinking behind each decision.&lt;/p&gt;
&lt;p&gt;This post covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why the order for writing a strategy is hard to read&lt;/li&gt;
&lt;li&gt;How to organize a strategy document for reading&lt;/li&gt;
&lt;li&gt;How to refactor and merge components for improved readability&lt;/li&gt;
&lt;li&gt;Additional tips for effective strategy documents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After reading it, you should be able to take a written strategy and
rework it into a version that&amp;rsquo;s much easier for others to read.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This is an exploratory, draft chapter for a book on engineering strategy that I&amp;rsquo;m brainstorming in &lt;a href="https://lethain.com/tags/eng-strategy-book/"&gt;#eng-strategy-book&lt;/a&gt;.&lt;/em&gt;
&lt;em&gt;As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="why-writing-structure-inhibits-reading"&gt;Why writing structure inhibits reading&lt;/h2&gt;
&lt;p&gt;Most software engineers learn to structure documents early in their lives as
students writing academic essays. Academic essays are focused on presenting evidence
to support a clear thesis, and generally build forward towards their conclusion.
Some business consultancies explicitly train their new hires in business writing,
such as McKinsey teaching
Barbara Minto&amp;rsquo;s &lt;em&gt;&lt;a href="https://www.amazon.com/Pyramid-Principle-Logic-Writing-Thinking/dp/0273710516"&gt;The Pyramid Principle&lt;/a&gt;&lt;/em&gt;,
but that&amp;rsquo;s the exception.&lt;/p&gt;
&lt;p&gt;While academic essays want to develop an argument, professional writing is a bit different.
Professional writing typically has one of three distinct goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Refining thinking about a given approach&lt;/strong&gt; (&amp;ldquo;how do we select databases for our new products?&amp;rdquo;) &amp;ndash; this is an area where the academic structure can be useful,
because it focuses on the thinking behind the proposal rather than the proposal itself&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seeking approval from stakeholders or executives&lt;/strong&gt; (&amp;ldquo;what database have we selected for our new analytics product?&amp;rdquo;) &amp;ndash; this is an area where the academic structure
creates a great deal of confusion, because it focuses on the thinking rather than the specific proposal,
but stakeholders view the specific proposal as the primary topic to review&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Communicating a policy to your organization&lt;/strong&gt; (&amp;ldquo;databases allowed for new products&amp;rdquo;) &amp;ndash; helping engineers at your company
understand the permitted options for a given problem, and also explaining the rationale behind the decision for
the subset who may want to understand or challenge the current policy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The ideal format for the first case is generally at odds with the other two, which is a frequent reason
that strategy documents struggle to graduate from brainstorm to policy.
I find that most strategy writers are resistant to the idea that it&amp;rsquo;s worth their time to restructure their
initial documents, so let me expand on challenges I&amp;rsquo;ve encountered when I&amp;rsquo;ve personally tried to make
progress without restructuring:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Too long, didn&amp;rsquo;t read.&lt;/strong&gt; Thinking-oriented structures leave policy recommendations at the very bottom,
but the vast majority of strategy readers are simply trying to understand that policy so they can
apply it to their specific problem at hand. Many of those readers, in my experience a majority of them,
will simply give up before reading the sections that answer their questions and assume that the
document doesn&amp;rsquo;t provide clear direction because finding that direction took too long.&lt;/p&gt;
&lt;p&gt;This is very much akin to the core lesson of Steve Krug&amp;rsquo;s &lt;a href="https://www.amazon.com/Dont-Make-Think-Revisited-Usability/dp/0321965515"&gt;Don&amp;rsquo;t Make Me Think&lt;/a&gt;:
users (and readers) don&amp;rsquo;t understand, they muddle through.
Assuming readers will invest significant time to deeply understand your document is an act of hubris.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Approval meeting to nowhere.&lt;/strong&gt; There are roughly three types of approval meetings. In the first, you go in and no one has any feedback.
Maybe someone gripes that it could have been done asynchronously instead of a meeting, but your document is approved.
In the second, there are two sets of stakeholders with incompatible goals, and you need a senior decision-maker to
mediate between them. This is a very useful meeting, because you generally can&amp;rsquo;t make progress without that
senior decision-maker breaking the tie.&lt;/p&gt;
&lt;p&gt;The third sort of meeting is when you get derailed early with questions about the research, whether you&amp;rsquo;d considered
another option, and whether this is even relevant. You might think this is because your strategy is wrong, but
in my experience it&amp;rsquo;s usually because you failed to structure the document to present the policy upfront.
Stakeholders might disagree with many elements of your thinking yet still agree with your ultimate policy,
and it&amp;rsquo;s only useful to dig into your rationale if they actually disagree with the policy itself.
Avoid getting stuck debating details when you agree on the overarching approach by presenting the
policy &lt;em&gt;first&lt;/em&gt;, and only digging into those details when there&amp;rsquo;s disagreement.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transient alignment.&lt;/strong&gt; Sometimes you&amp;rsquo;ll see two distinct strategy documents, with the first covering
the full thinking, and the second only including the policy and operations sections.
This tends to work quite well initially, but over time existing members of the team depart
and new members are hired. At some point, a new member will challenge the thinking behind
the strategy as obviously wrong, generally because it&amp;rsquo;s a different set of policies than
they used at their previous employer. If you omit the diagnosis and exploration sections entirely,
then they can&amp;rsquo;t trace through the decision making to understand the reasoning,
which will often cause them to leap to simplistic conclusions like the
ever-popular, &amp;ldquo;I guess the previous engineers here were just dumb.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As annoying as each of these challenges is, the solution is simple:
use the writing structure for writing, and invert that structure for reading.&lt;/p&gt;
&lt;h2 id="invert-structure-for-reading"&gt;Invert structure for reading&lt;/h2&gt;
&lt;p&gt;Reiterating a point from &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;Components of engineering strategy&lt;/a&gt;:
it&amp;rsquo;s always appropriate to change the structure that you use to develop or present a strategy,
as long as you are making a deliberate, informed decision.&lt;/p&gt;
&lt;p&gt;While I&amp;rsquo;ve generally found explore, diagnose, refine, policy, and operation to work well for
writing strategy, I&amp;rsquo;ve consistently found it a poor format for presenting strategy.
Whether I&amp;rsquo;m presenting a strategy for review or rolling the strategy out to be followed by
the wider organization, I recommend an inverted structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Policy&lt;/strong&gt;: what does the strategy require or allow?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operation&lt;/strong&gt;: how is the strategy enforced and carried out, how do I get exceptions for the policy?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Refine&lt;/strong&gt;: what were the load-bearing details that informed the strategy?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diagnose&lt;/strong&gt;: what are the more generalized trends and observations that steered the thinking?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explore&lt;/strong&gt;: what is the high-level, wide-ranging context that we brought into creating this strategy?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When seeking approval, you&amp;rsquo;ll probably focus on the &lt;strong&gt;Policy&lt;/strong&gt; section.
When rolling it out to your organization, you&amp;rsquo;ll probably focus on the &lt;strong&gt;Operation&lt;/strong&gt; section more.
In both cases, those are the critical components and you want them upfront.
Very few strategy readers want to understand the full thinking behind your strategy; instead, they just
want to understand how it impacts the specific decision they are trying to make.&lt;/p&gt;
&lt;p&gt;The vast majority of strategy readers want the answer, not to understand the thinking behind the answer,
and these are your least motivated readers. Someone who wants to really understand the thinking will
invest time reading through the document, even if it isn&amp;rsquo;t perfectly structured for them.
Someone who just wants an answer will frequently give up and make up an answer rather than reading all
the way through to where the document does in fact answer their question.&lt;/p&gt;
&lt;p&gt;Zooming out a bit, this is a classic &amp;ldquo;lack of user empathy&amp;rdquo; problem. Folks authoring the document
are so deep in the details that they can&amp;rsquo;t put themselves in the readers&amp;rsquo; mindset to think about
how overwhelming the document would be if you were simply trying to pop in, get an answer, and then pop out.
This lack of empathy also means that most strategy writers refuse to structure their documents to
support the large population of answer seekers over the tiny population of strategy authors,
but just try it a few times and I think you&amp;rsquo;ll see it helps a great deal.
Even faster, go read someone else&amp;rsquo;s strategy document that you aren&amp;rsquo;t familiar with, and
you&amp;rsquo;ll quickly appreciate how challenging it can be to identify the actual proposal
if they follow the academic structure.&lt;/p&gt;
&lt;h2 id="strategy-refactoring"&gt;Strategy refactoring&lt;/h2&gt;
&lt;p&gt;Inverting the structure is the first step of optimizing a document
for readability, but you don&amp;rsquo;t have to stop there. Often you&amp;rsquo;ll
find that even the inverted strategy structure is somewhat confusing to read
for a given document. I think of this process as &amp;ldquo;strategy refactoring.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;For example, &lt;a href="https://craftingengstrategy.com/llm-adoption-strategy/"&gt;How should you adopt LLMs?&lt;/a&gt; makes two refactors
to the inverted format. First, it merges &lt;em&gt;Refine&lt;/em&gt; into &lt;em&gt;Diagnose&lt;/em&gt;, which keeps
the map and models closer to the specific topics that are explored.
Second, it discards the &lt;em&gt;Operation&lt;/em&gt; section entirely, and includes the relevant
details with the policies they apply to in the &lt;em&gt;Policy&lt;/em&gt; section.&lt;/p&gt;
&lt;p&gt;Strategy refactoring is about discarding structure where it interferes with usability.
The strategy structure is very effective at separating concerns while reasoning through
decision making, but most readers benefit more from engaging with the full implications at once.
Once you&amp;rsquo;re done thinking, refactor away the thinking tools: don&amp;rsquo;t let the best tools for
one workflow mislead you into thinking they&amp;rsquo;re the best for an entirely different one.&lt;/p&gt;
&lt;h2 id="additional-tips-for-effective-strategy-docs"&gt;Additional tips for effective strategy docs&lt;/h2&gt;
&lt;p&gt;In addition to the above advice, there are a handful of smaller tips
that I&amp;rsquo;ve found helpful for creating readable strategy documents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Before releasing a document widely, find someone entirely uninvolved with the
strategy thus far and have them point out areas that are difficult to understand.
Anyone who&amp;rsquo;s been thinking about the strategy is going to gloss over areas that might
be inscrutable to those who are approaching it with fresh eyes.&lt;/li&gt;
&lt;li&gt;Every strategy document should be rolled out with an explicit commenting period where you
invite discussion, as well as office hours where you are available to explain how to apply
the strategy correctly. These steps help with adoption, but even more importantly they
help you identify dissenters who disagree with the strategy such that you can follow up
to better understand their concerns.&lt;/li&gt;
&lt;li&gt;Every company should maintain its own internal engineering strategy
template, incorporating the notes in &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;making readable engineering strategies&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Your template should include consistent metadata, particularly when it was created,
the current approval status, and where to ask questions.
Of these, a clear, durable place to ask questions is the most important,
as it slows the rate at which these documents rot.&lt;/li&gt;
&lt;li&gt;After you release your strategy, disable in-document commenting. This isn&amp;rsquo;t intended to prevent further discussion,
but rather to move the discussion outside of the document.
Nothing creates the impression of an unapproved, unfinished strategy document
faster than a long string of open comments. Open comments also make it difficult
to read the strategy document, as often the reader will get distracted from reading
the document to read the comments.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;After reading this chapter, you know how to escape the rigid structures imposed during the
creation of a strategy to create a readable document that is easier for others to both approve and apply.
Beyond initially inverting the structure for easier reading, you also understand how to refactor away sections
that may have been essential for creation but that interfere with understanding how to apply the strategy,
which is by far the most common task for strategy readers.&lt;/p&gt;
&lt;p&gt;Most importantly, I hope you finish this chapter agreeing that it&amp;rsquo;s worth your time to
rework your thinking-optimized draft rather than leaving it as is. The deliberate refusal
to structure documents for readers is the root cause of a surprising number of good strategies
that utterly fail to have their intended impact.&lt;/p&gt;</description></item><item><title>How should you adopt LLMs?</title><link>https://craftingengstrategy.com/llm-adoption-strategy/</link><pubDate>Tue, 14 May 2024 06:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/llm-adoption-strategy/</guid><description>&lt;p&gt;Whether you’re a product engineer, a product manager, or an engineering executive, you’ve probably been pushed to consider using Large Language Models (LLMs) to extend your product or enhance your processes. 2023-2024 is an interesting era for LLM adoption, where these capabilities have transitioned into the mainstream, with many companies worrying that they’re falling behind despite the fact that most integrations appear superficial.&lt;/p&gt;
&lt;p&gt;That context makes LLM adoption a great topic for a strategy case study. This document is an engineering strategy document determining how a hypothetical company, Theoretical Ride Sharing, could adopt LLMs.&lt;/p&gt;
&lt;p&gt;Building out the scenario a bit before diving into the strategy: Theoretical has 2,000 employees, 300 of whom are software engineers. They’ve raised $400m, are doing $50m in annual revenue, and are operating in 200 cities across North America and Europe.
They are a ride sharing business, similar to Uber or Lyft. However, they’ve innovated by using larger vehicles—essentially reinventing public transit.&lt;/p&gt;
&lt;h2 id="reading-this-document"&gt;Reading this document&lt;/h2&gt;
&lt;p&gt;To apply this strategy, start at the top with &lt;em&gt;Policy&lt;/em&gt;. To understand the thinking behind this strategy, read sections in reverse order, starting with &lt;em&gt;Explore&lt;/em&gt;, then &lt;em&gt;Diagnose&lt;/em&gt; and so on.
Relative to the default structure, this document has been refactored in two ways
to improve readability:
first, &lt;em&gt;Operation&lt;/em&gt; has been folded into &lt;em&gt;Policy&lt;/em&gt;;
second, &lt;em&gt;Refine&lt;/em&gt; has been embedded in &lt;em&gt;Diagnose&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;More detail on this structure in &lt;a href="https://lethain.com/readable-engineering-strategy-documents"&gt;Making a readable Engineering Strategy document&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="policy"&gt;Policy&lt;/h2&gt;
&lt;p&gt;Our combined policies for using LLMs at Theoretical Ride Sharing are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets.&lt;/strong&gt;
Through modeling &lt;a href="https://craftingengstrategy.com/llm-onboarding-model/"&gt;our driver lifecycle&lt;/a&gt;, we determined that improving onboarding time
will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate
departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers.&lt;/p&gt;
&lt;p&gt;Report on progress monthly in &lt;em&gt;Exec Weekly Meeting&lt;/em&gt;, coordinated in #exec-weekly&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Start with Anthropic.&lt;/strong&gt;
We use Anthropic models, which are available through our existing cloud provider via &lt;a href="https://aws.amazon.com/bedrock/"&gt;AWS Bedrock&lt;/a&gt;. Because we view the quality of the underlying foundational models as somewhat undifferentiated, and we want to avoid maintaining multiple implementations, we are not looking to adopt a broad set of LLMs at this point.
This is anchored in our &lt;a href="https://craftingengstrategy.com/wardley-llm-ecosystem/"&gt;Wardley map of the LLM ecosystem&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Exceptions will be reviewed by the &lt;em&gt;Machine Learning Review&lt;/em&gt; in #ml-review&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Developer experience team (DX) must offer at least one LLM-backed developer productivity tool.&lt;/strong&gt;
This tool should enhance the experience, speed, or quality of writing software in TypeScript. This tool should help us develop our thinking for next year, such that we have conviction about increasing (or decreasing!) our investment. This tool should be available to all engineers. Adopting one tool is the required baseline; if DX identifies further interesting tools, e.g. GitHub Copilot, they are empowered to bring the request to the &lt;em&gt;Engineering Exec&lt;/em&gt; team for review. Review will focus on balancing our rate of learning, vendor cost, and data security.
We&amp;rsquo;ve &lt;a href="https://craftingengstrategy.com/llm-adoption-model/"&gt;modeled options for measuring LLMs&amp;rsquo; impact on developer experience&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Vendor approvals to be reviewed in #cto&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Internal Tools team (INT) must offer at least one LLM-backed ad-hoc prompting tool.&lt;/strong&gt;
This tool should support arbitrary non-engineering use cases for LLMs, such as text extraction, rewriting notes, and so on. It must be usable with customer data while also honoring our existing data processing commitments. This tool should be available to all employees.&lt;/p&gt;
&lt;p&gt;Vendor approvals to be reviewed in #coo&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Refresh policy in six months.&lt;/strong&gt;
Our primary goal is to quickly learn about this unfamiliar domain where we have limited internal expertise,
then review whether we should increase our investment afterwards.&lt;/p&gt;
&lt;p&gt;Flag questions and suggestions in #cto&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="diagnose"&gt;Diagnose&lt;/h2&gt;
&lt;p&gt;Here’s a summary of the challenges we face in adopting LLMs at Theoretical Ride Sharing:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;There are, at minimum, &lt;strong&gt;three distinct needs&lt;/strong&gt; that folks internally are asking us to solve (either separately or with a shared solution):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;productivity tools for non-engineers&lt;/em&gt;, e.g. ad-hoc document rewriting, document summarization&lt;/li&gt;
&lt;li&gt;&lt;em&gt;productivity tools for engineers&lt;/em&gt;, e.g. advanced autocomplete tooling like Github Copilot&lt;/li&gt;
&lt;li&gt;&lt;em&gt;product extensions&lt;/em&gt;, e.g. high-quality document extraction in driver onboarding workflows&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Of the above, &lt;strong&gt;we see product extensions as potential strategic differentiation&lt;/strong&gt;, and the other two as workflow optimizations that improve our productivity but don’t necessarily differentiate us from the broader industry. Some of the opportunities for strategic differentiation we see are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Reactivating the departed and suspended drivers&lt;/em&gt; is our largest lever for increasing active drivers,
as explored in our &lt;a href="https://craftingengstrategy.com/llm-onboarding-model/"&gt;model of the driver lifecycle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Faster driver onboarding&lt;/em&gt; with less human involvement will not increase active drivers, but we see a clear opportunity for LLMs to reduce operating
costs, which may be worthwhile even if it doesn&amp;rsquo;t address the core problem of active drivers&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Improved customer support&lt;/em&gt; by increasing the response speed and quality of our responses to customer inquiries&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Experience and expertise in using LLMs is currently limited, both within the company and across the industry.&lt;/strong&gt; Despite prolific thought leadership to the contrary, there are very few companies or products using LLMs in scaled, differentiated ways. That’s currently true for us as well.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;We want to develop our expertise without making an irreversible commitment.&lt;/strong&gt; We think that our internal expertise is a limiter for effective problem selection and utilization of LLMs, and that developing our expertise will help us become more effective in iterative future decisions on this topic. Conversely, we believe that making a major investment now, prior to developing our in-house expertise, would be relatively high risk and low reward given no other industry players appear to have identified a meaningful advantage at this point&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Switching across foundational models and foundational model providers is cheap&lt;/strong&gt;. This is true both economically (low financial commitment) and from an integration cost perspective (APIs and usage are largely consistent across providers).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Foundational models and providers are evolving rapidly, and it’s unclear how the space will evolve.&lt;/strong&gt; It’s likely that current foundational model providers will train one or two additional generations of foundational models with larger datasets, but at some point they will become cost prohibitive to train (e.g. the next major version of OpenAI or Anthropic models seem likely to cost $500m+ to train). Differentiation might move into developer-experience at that point. Open source models like LLaMa might become significantly cost-advantaged. Or something else entirely. The future is wide open.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve built a Wardley map to understand the &lt;a href="https://craftingengstrategy.com/wardley-llm-ecosystem/"&gt;possible evolution of the foundational model ecosystem&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Training a foundational model is prohibitively expensive for our needs.&lt;/strong&gt; We’ve raised $400m, and training a competitive foundational model would cost somewhere between $3m and $100m to match the general models provided by Anthropic or OpenAI.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="explore"&gt;Explore&lt;/h2&gt;
&lt;p&gt;Large Language Models operate on top of a foundational model. Training these foundational models is exceptionally expensive, and growing more expensive over time as competition for more sophisticated models accelerates. &lt;a href="https://www.cnbc.com/2023/10/16/metas-open-source-approach-to-ai-puzzles-wall-street-techies-love-it.html"&gt;Meta allegedly spent $20-30m training LLaMa 2&lt;/a&gt;, up from about $3m training costs for LLaMa 1. OpenAI’s GPT-4 &lt;a href="https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/"&gt;allegedly cost $100m to train&lt;/a&gt;. With some nuance related to the quality of corpus and its relevance to the task at hand, &lt;a href="https://arxiv.org/abs/2309.16583"&gt;larger models outperform smaller models&lt;/a&gt;, so there’s not much incentive to train a smaller foundational model unless you have a large, unique dataset to train against, and even in that case you might be better off fine-tuning or in-context learning (ICL).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.anthropic.com/api"&gt;Anthropic charges&lt;/a&gt; between $0.25 and $15 per million tokens of input, and a bit more for output tokens. &lt;a href="https://openai.com/api/pricing"&gt;OpenAI charges&lt;/a&gt; between $0.50 and $60 per million tokens of input, and a bit more for output tokens. The average English word is about 1.3 tokens, which means you can do a significant amount of LLM work while spending less than most venture funded startups spend on snacks.&lt;/p&gt;
&lt;p&gt;There’s &lt;a href="https://garymarcus.substack.com/p/evidence-that-llms-are-reaching-a"&gt;significant debate on whether LLMs have reached a point where their performance improvements will slow&lt;/a&gt;. Much like the ongoing debate around whether Moore’s Law has died, it’s unclear how much LLM performance will improve going forward. From a cost-to-train perspective, it’s unlikely that companies can continue to improve foundational models merely by spending more money on compute. Few companies can tolerate a $1B training cost, and fewer will tolerate a $10B training cost, but it’s hard to imagine a world where any companies are building $100B models. However, algorithmic improvements and investment in datasets may well drive improvements without driving up compute costs. The only high-confidence prediction you can make in this space is that it’s likely model improvement will double one or two more times over the next 3 years, after which it &lt;em&gt;might&lt;/em&gt; continue doubling at that rate or it &lt;em&gt;might&lt;/em&gt; plateau at that level of performance: either outcome is plausible.&lt;/p&gt;
&lt;p&gt;For some decisions, there’s a strategic imperative to get it right from the beginning. For example, migrating from AWS to Azure is very expensive due to the degree of customization and lock-in. However, LLMs don’t appear to be in this category. Based on conversations with industry peers, the majority of companies are experimenting with a variety of models from Anthropic, OpenAI and elsewhere (e.g. &lt;a href="https://mistral.ai/"&gt;Mistral&lt;/a&gt;). Behaviors do vary across models, but it’s also true that the behavior of existing models varies over time (e.g. &lt;a href="https://arstechnica.com/information-technology/2023/12/is-chatgpt-becoming-lazier-because-its-december-people-run-tests-to-find-out/"&gt;GPT 3.5 allegedly got “lazier” over time&lt;/a&gt;), which means the overhead of dealing with model differences is unavoidable even if you only adopt one.
Vendor lock-in for models is low from a technical perspective.
However, regulatory requirements&amp;ndash;like updating Data Processing Agreements&amp;ndash;introduce some friction when switching providers.&lt;/p&gt;
&lt;p&gt;Although there’s an ongoing investment boom in artificial intelligence, most scaled technology companies are still looking for ways to leverage these capabilities beyond the obvious, widespread practices like adopting &lt;a href="https://github.com/features/copilot"&gt;Github Copilot&lt;/a&gt;. For example, &lt;a href="https://podcasts.apple.com/us/podcast/build-ai-products-at-on-ai-companies-with-emily/id1668002688?i=1000644619725"&gt;Stripe is investing heavily in LLMs for internal productivity&lt;/a&gt;, including presumably relying on them to perform some internal tasks that would have previously been performed by an employee such as verifying a company’s website matches details the company supplied in their onboarding application, but it’s less clear that they have yet found an approach to meaningfully shift their product, or their product’s user experience, using LLMs.&lt;/p&gt;
&lt;p&gt;Looking at ridesharing companies more specifically, there don’t appear to be any breakout industry-specific approaches either. Uber is similarly adopting LLMs for internal productivity, and some operational efficiency improvements as documented in their &lt;a href="https://www.uber.com/blog/the-transformative-power-of-generative-ai/"&gt;August, 2023 post describing their internal developer and operations productivity investments using LLMs&lt;/a&gt; and &lt;a href="https://www.uber.com/blog/from-predictive-to-generative-ai/"&gt;May, 2024 post describing those efforts in more detail&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Introduction to case studies</title><link>https://craftingengstrategy.com/strategies-intro/</link><pubDate>Sat, 04 May 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/strategies-intro/</guid><description>&lt;p&gt;This book&amp;rsquo;s &lt;a href="https://craftingengstrategy.com/intro/"&gt;Introduction&lt;/a&gt; started with a commitment to grounding its approach in concrete case studies.
In this section, we&amp;rsquo;re living up to that commitment by presenting ten real-world strategies I&amp;rsquo;ve directly worked on or observed.
These strategies take the somewhat abstract concepts we&amp;rsquo;ve covered thus far and materialize them into concrete ideas,
hopefully making them easier to grasp and easier for you to apply.&lt;/p&gt;
&lt;p&gt;The first five strategies are selected to show a varied mix of &lt;a href="https://craftingengstrategy.com/refine/"&gt;refinement techniques&lt;/a&gt;
and &lt;a href="https://craftingengstrategy.com/operations/"&gt;operational mechanisms&lt;/a&gt;.
The next five strategies are organized by the companies in which they were implemented.
If you work through these case studies and find yourself wanting more,
the &lt;a href="https://craftingengstrategy.com/additional-resources/"&gt;Strategy Resources Appendix&lt;/a&gt; includes suggestions
for further study.&lt;/p&gt;</description></item><item><title>Introduction to refinement tools</title><link>https://craftingengstrategy.com/refinement-intro/</link><pubDate>Sat, 04 May 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/refinement-intro/</guid><description>&lt;p&gt;Perhaps the most important piece of the &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;Steps to build an engineering strategy&lt;/a&gt;
is strategy refinement. Because we already worked through the &lt;a href="https://craftingengstrategy.com/refine/"&gt;overview of strategy refinement&lt;/a&gt;
in the &amp;ldquo;Steps&amp;rdquo; section of this book,
the &amp;ldquo;Refinement&amp;rdquo; section goes into much greater detail about the three
core mapping techniques:
&lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;,
&lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;systems modeling&lt;/a&gt;,
and &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Wardley mapping&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As we work through them, keep in mind that there are many other techniques out there,
such as the many covered in Eben Hewitt&amp;rsquo;s &lt;em&gt;&lt;a href="https://lethain.com/notes-on-the-technology-strategy-patterns/"&gt;Technology Strategy Patterns&lt;/a&gt;&lt;/em&gt;.
This section covers those that I&amp;rsquo;ve found most useful, and you can find breadcrumbs to
those preferred by others in the &lt;a href="https://craftingengstrategy.com/additional-resources/"&gt;appendix on strategy resources&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With that said, it&amp;rsquo;s time to start drilling into &lt;a href="https://craftingengstrategy.com/strategy-testing/"&gt;strategy testing&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Introduction</title><link>https://craftingengstrategy.com/intro/</link><pubDate>Sat, 04 May 2024 04:00:00 -0700</pubDate><guid>https://craftingengstrategy.com/intro/</guid><description>&lt;p&gt;I&amp;rsquo;ve worked alongside many talented people who spent years waiting for a chance to finally &amp;ldquo;do strategy.&amp;rdquo;
My hope is this book convinces you—and maybe them—that waiting is optional.
Strategy isn’t reserved for executives.
It&amp;rsquo;s the art of making thoughtful decisions, and is accessible to everyone&amp;ndash;including you.&lt;/p&gt;
&lt;p&gt;Even if you&amp;rsquo;d prefer to avoid strategy, it&amp;rsquo;s still happening all around you.
My first big dose of strategy came while managing the team responsible for
&lt;a href="https://craftingengstrategy.com/uber-strategy/"&gt;Uber&amp;rsquo;s service migration&lt;/a&gt;,
where we desperately tried to survive accelerating inbound requests for support.
Since then, I&amp;rsquo;ve seen strategy everywhere I worked, from
&lt;a href="https://craftingengstrategy.com/index-acquisition-strategy/"&gt;Stripe&amp;rsquo;s acquisition of Index&lt;/a&gt; to
&lt;a href="https://craftingengstrategy.com/product-eng-strategy/"&gt;Calm&amp;rsquo;s focusing on being a product engineering company&lt;/a&gt;.
There are even some strategy problems that I&amp;rsquo;ve encountered again and again at every company I&amp;rsquo;ve joined,
such as &lt;a href="https://craftingengstrategy.com/monolith-decomposition-strategy/"&gt;deciding how to decompose monolithic codebases&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This book is focused on engineering strategy.
In other words, making thoughtful decisions about engineering.
&amp;ldquo;Engineering&amp;rdquo; is defined as both the discipline of writing software
and the concerns of the Engineering organization within your company.
If this seems like a hopelessly broad topic, then we agree on the scope of my definition.
However, I would never agree it&amp;rsquo;s hopeless.&lt;/p&gt;
&lt;p&gt;My decision making has significantly improved over the course of my career.
I believe very strongly that my improvement had very little to do
with &lt;em&gt;me&lt;/em&gt;, and a lot to do with learning to engage in structured thinking.
I also believe the lessons that I learned slowly are eminently teachable
in the next couple hundred pages.&lt;/p&gt;
&lt;h2 id="grounded-in-my-direct-experience"&gt;Grounded in my direct experience&lt;/h2&gt;
&lt;p&gt;Strategy is a broad topic, and many strategy books become awkwardly abstract.
To avoid falling into that trap, I&amp;rsquo;ve anchored this book in my personal experiences doing
strategy and the strategy work of colleagues that I had the opportunity to witness directly.&lt;/p&gt;
&lt;p&gt;As much as possible, I&amp;rsquo;ve used examples that I worked on in real companies,
and I&amp;rsquo;ve mentioned those companies by name. That&amp;rsquo;s true for more than half the strategies included in this book,
which describe strategies I collaborated on during my time at Stripe, Uber, and Calm.
For the remaining strategies, I have abstracted away from specific companies because they cover sensitive topics,
such as &lt;a href="https://craftingengstrategy.com/private-equity-strategy/"&gt;how to work with private equity ownership&lt;/a&gt;,
or would expose internal information better kept private, as in
&lt;a href="https://craftingengstrategy.com/user-data-strategy/"&gt;how to manage access to customer data&lt;/a&gt;.
In both sorts of examples, I&amp;rsquo;ve worked hard to remain honest, even when I&amp;rsquo;ve had to omit some details,
out of respect for the companies and individuals involved.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ll also notice that I&amp;rsquo;ve tried to be positive about all of these strategies.
If it seems that I&amp;rsquo;ve been too positive, it&amp;rsquo;s because all strategies age.
Even the best strategies eventually turn sour.
It&amp;rsquo;s most interesting to understand strategies in the context in which they were originally conceived.
Of course, evaluation matters too, which we&amp;rsquo;ll cover in the chapter on &lt;a href="https://craftingengstrategy.com/evaluating-strategy/"&gt;evaluating strategy quality&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="adapting-rumelt-for-engineering"&gt;Adapting Rumelt for engineering&lt;/h2&gt;
&lt;p&gt;In addition to my own experience, the second largest influence on this book is
Richard Rumelt’s &lt;em&gt;&lt;a href="https://www.amazon.com/dp/B004J4WKEC"&gt;Good Strategy, Bad Strategy&lt;/a&gt;&lt;/em&gt;.
It&amp;rsquo;s a quick read, and was a life-changing discovery for me.
Rumelt describes three pillars of effective strategy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Diagnosis&lt;/em&gt; - a theory describing the nature of the challenge. This involves identifying the root cause(s) at play; for example, “high work-in-progress is preventing us from finishing any tasks, so we are increasingly behind each sprint” might be a good diagnosis&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Guiding policy&lt;/em&gt; - a series of general policies which will be applied to grapple with the challenge. Guiding policies typically take the form of implicit or explicit tradeoffs. For example, a guiding policy might be “only hire for most urgent team, do not spread hires across all teams.” If a guiding policy doesn’t imply a tradeoff, you should be suspicious of it. “Working harder to get it done” isn’t really a guiding policy; the relevant guiding policy there might be “work folks hard and expect high attrition”&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Coherent actions&lt;/em&gt; - a set of specific actions directed by guiding policy to address the challenge. This is the most important part, and I think the most exciting part, because it clarifies that a strategy is only meaningful if it leads to aligned action&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first time I read this definition was eye-opening: it answered so many strategy questions I&amp;rsquo;d had for such a long time.
However, although I was grateful to Rumelt for giving me my first framework for thinking about strategy,
I continued noticing how little deliberate strategy existed in the engineering organizations I&amp;rsquo;d joined.&lt;/p&gt;
&lt;p&gt;Eventually, I recognized that if Rumelt&amp;rsquo;s work were trivial to apply to engineering,
we&amp;rsquo;d see a lot more disciplined engineering strategy in practice.
We&amp;rsquo;d also, one hopes, see fewer obviously flawed engineering strategies.
This book is the culmination of my past decade spent understanding how to
adapt Rumelt&amp;rsquo;s approach to something that not only &lt;em&gt;could&lt;/em&gt; work,
but concretely &lt;em&gt;has&lt;/em&gt; worked in the organizations that I&amp;rsquo;ve joined.&lt;/p&gt;
&lt;h2 id="iterative-intellectual-and-mechanical"&gt;Iterative, intellectual &lt;em&gt;and&lt;/em&gt; mechanical&lt;/h2&gt;
&lt;p&gt;In addition to anchoring in my personal experience and building on Richard Rumelt&amp;rsquo;s approach,
there are three characteristics that underpin this book&amp;rsquo;s approach:
being iterative, and embracing both the intellectual and the mechanical aspects of strategy.&lt;/p&gt;
&lt;p&gt;Even my proudest strategy work has eventually become obsolete.
For some time, I was embarrassed by this realization.
Eventually, I came to recognize that entropy is natural in strategy work;
good strategy embraces change rather than fighting it.
This solidified into the concept of &lt;a href="https://craftingengstrategy.com/refine/"&gt;strategy refinement&lt;/a&gt;,
where ideas are deliberately validated and improved rather than treated as immutable.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve ever participated in an executive hiring loop, you&amp;rsquo;ve probably interviewed
someone who described strategic thinking as a personal strength.
Those candidates often draw a distinction between directing how work should be done,
and being in the weeds of doing the work itself.
It happens enough that you start to appreciate that
many people view strategy as a fundamentally intellectual endeavor about how things ought to work,
rather than a mechanical endeavor that studies how things actually do work in practice.&lt;/p&gt;
&lt;p&gt;While strategy does indeed have intellectual elements,
effective strategy depends at least as much on the mechanical nuances of reality as on intellectual frameworks.
Even the best &lt;a href="https://craftingengstrategy.com/policy/"&gt;policies&lt;/a&gt; will fail without attention to whether the team is actually adopting the policy&amp;rsquo;s guidance.
Similarly, very effective &lt;a href="https://craftingengstrategy.com/operations/"&gt;operational mechanisms&lt;/a&gt; to roll out a strategy
won&amp;rsquo;t help your company if the policy being rolled out is a bad one.&lt;/p&gt;
&lt;p&gt;As obvious as these ideas seem, many organizations expect strategies to manifest
perfectly into existence from the very beginning.
This book discusses how to bridge the gap between that pressing expectation of perfection
and the reality that effective strategy
development is grounded in iterative work that is both intellectual and mechanical.&lt;/p&gt;
&lt;h2 id="this-books-ambition"&gt;This book&amp;rsquo;s ambition&lt;/h2&gt;
&lt;p&gt;As I&amp;rsquo;ve worked on this book, one of my lingering concerns
is that the ideas in it are perhaps simply too obvious to write down.
Each time I&amp;rsquo;ve been tempted to set the project aside, I see a new example,
or am reminded of an old experience, where some of the smartest people I&amp;rsquo;ve
ever known have struggled unsuccessfully with a strategy problem that some people
would describe as quite simple.&lt;/p&gt;
&lt;p&gt;The belief that strategy is complex often gets people in trouble.
It&amp;rsquo;s appealing to believe that strategies fail due to detailed
errors in decision-making, or the unanticipated move of an adversary.
Maybe that is common when it comes to grand strategy.
However, my experience is that engineering strategies fail for very mundane reasons.
The most common of these mundane reasons is that executives assume their
strategy will roll itself out. The second is forgetting to spend time
validating the details. Both are avoidable with a bit of structure.&lt;/p&gt;
&lt;p&gt;This book&amp;rsquo;s framework is not an attempt to discredit all other approaches. Rather, it&amp;rsquo;s a synthesis
of the various approaches I&amp;rsquo;ve encountered, along with a few dimensions that
I&amp;rsquo;ve not seen addressed in much detail elsewhere.
Even if you don&amp;rsquo;t agree with my framework, I hope it helps you refine your own framework.
Either way, our industry will be much better for it.&lt;/p&gt;</description></item><item><title>Strategy resources</title><link>https://craftingengstrategy.com/additional-resources/</link><pubDate>Tue, 21 Nov 2023 05:00:00 -0600</pubDate><guid>https://craftingengstrategy.com/additional-resources/</guid><description>&lt;p&gt;One of the hardest parts of learning about engineering strategy
is finding useful resources on a topic where so much is kept
private. This appendix highlights some of the public resources
that I&amp;rsquo;ve found valuable during my learning experience.&lt;/p&gt;
&lt;h2 id="my-prior-writing"&gt;My prior writing&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lethain.com/eng-strategies/"&gt;Writing an engineering strategy&lt;/a&gt; is a chapter from
&lt;em&gt;&lt;a href="https://lethain.com/eng-execs-primer/"&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/a&gt;&lt;/em&gt; on setting engineering strategy as an executive&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lethain.com/good-engineering-strategy-is-boring/"&gt;Write five, then synthesize&lt;/a&gt; is a chapter from
&lt;em&gt;&lt;a href="https://staffeng.com/"&gt;Staff Engineer&lt;/a&gt;&lt;/em&gt; on driving engineering strategy without executive authority
(primarily through documentation)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="books"&gt;Books&lt;/h2&gt;
&lt;p&gt;In addition to my own &lt;em&gt;&lt;a href="https://staffeng.com/"&gt;Staff Engineer&lt;/a&gt;&lt;/em&gt;
and &lt;em&gt;&lt;a href="https://lethain.com/eng-execs-primer/"&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/a&gt;&lt;/em&gt;,
both of which have chapters on engineering strategy, related books that I would encourage reading are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://www.manning.com/books/architecture-modernization"&gt;Architecture Modernization&lt;/a&gt;&lt;/em&gt;
by Nick Tune with Jean-Georges Perrin &amp;ndash; covers many of the same topics as &lt;em&gt;Technology Strategy Patterns&lt;/em&gt;
and &lt;em&gt;The Value Flywheel Effect&lt;/em&gt;, but often with more recent examples and references given its later publication date&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://lethain.com/notes-on-enterprise-architecture-as-strategy/"&gt;Enterprise Architecture as Strategy&lt;/a&gt;&lt;/em&gt;
by Jeanne Ross, Peter Weill, and David Robertson &amp;ndash; an interesting read from 2006 on
the evolution of software maturity (i.e. IT, in that era&amp;rsquo;s vernacular) within businesses,
and deciding among strategies for coupling and integration across business units&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://lethain.com/notes-on-the-technology-strategy-patterns/"&gt;Technology Strategy Patterns&lt;/a&gt;&lt;/em&gt; by Eben Hewitt &amp;ndash; a method-focused book
on creating and communicating engineering strategy&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business-ebook/dp/B078Y98RG8/"&gt;The Phoenix Project&lt;/a&gt;&lt;/em&gt; by Kim, Behr, and Spafford &amp;ndash; a
modern retelling of Goldratt&amp;rsquo;s &lt;em&gt;&lt;a href="https://www.amazon.com/Goal-Process-Ongoing-Improvement/dp/0884271951"&gt;The Goal&lt;/a&gt;&lt;/em&gt;,
which shows how to model and resolve problems using constraint optimization.
Previously, I would not have considered this a strategy book, but as my view of what strategy is
has evolved (mapping plus guiding policies), I&amp;rsquo;ve come to see it as demonstrating a useful mapping strategy&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://lethain.com/notes-on-the-value-flywheel-effect"&gt;The Value Flywheel Effect&lt;/a&gt;&lt;/em&gt; by Anderson, McCann, and O&amp;rsquo;Reilly &amp;ndash; an introduction
to Wardley maps via an exploration of Liberty Mutual&amp;rsquo;s rationale for serverless&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://medium.com/wardleymaps/on-being-lost-2ef5f05eb1ec"&gt;Wardley Maps&lt;/a&gt;&lt;/em&gt; by Simon Wardley explains
how to use Wardley Maps to understand and improve strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, I&amp;rsquo;d recommend these general books. They don&amp;rsquo;t focus on engineering strategy,
but I&amp;rsquo;ve found them quite useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Good-Strategy-Bad-Difference-Matters-ebook/dp/B004J4WKEC/"&gt;Good Strategy, Bad Strategy&lt;/a&gt;&lt;/em&gt; by
Richard Rumelt &amp;ndash; the most helpful strategy book that I have ever read, because it actually provides a usable
definition of strategy&lt;/li&gt;
&lt;li&gt;*&lt;a href="https://www.amazon.com/How-Big-Things-Get-Done-ebook/dp/B0B3HS4C98/"&gt;How Big Things Get Done&lt;/a&gt; by Bent Flyvbjerg and Dan Gardner &amp;ndash; a
fascinating look at why some &lt;a href="https://en.wikipedia.org/wiki/Megaproject"&gt;megaprojects&lt;/a&gt; fail so resoundingly and why others succeed under budget and under schedule.
Connects to many related topics, such as how &lt;a href="https://lethain.com/benchmarking/"&gt;benchmarking&lt;/a&gt; can help evaluate guiding policies within a strategy&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://lethain.com/notes-on-the-crux/"&gt;The Crux&lt;/a&gt;&lt;/em&gt; by Richard Rumelt &amp;ndash; another good book by Richard Rumelt,
this one more oriented on how to create strategies and why strategy creation often fails.
(And less structurally focused on documenting strategies than &lt;em&gt;Good Strategy, Bad Strategy&lt;/em&gt;.)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://www.amazon.com/Thinking-Systems-Donella-H-Meadows-ebook/dp/B005VSRFEA/"&gt;Thinking in Systems: A Primer&lt;/a&gt;&lt;/em&gt; by
Donella Meadows &amp;ndash; a book on systems thinking, which for a long time was my sole tool for mapping things around me.
This is not a software engineering book, but provides a lens into a useful mapping mechanism that you can apply to
software and software development (&lt;a href="https://lethain.com/limiting-wip/"&gt;Why limiting work-in-progress works&lt;/a&gt; is one example of
me using systems thinking to model a software system)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="case-studies"&gt;Case studies&lt;/h2&gt;
&lt;p&gt;Every discussion of engineering strategy includes a weary remark about how few strategies are
publicly documented. Acknowledging that concern, some case studies that I&amp;rsquo;ve found helpful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lethain.com/magnitudes-of-exploration/"&gt;Magnitudes of exploration&lt;/a&gt; documented a public version of Stripe&amp;rsquo;s Engineering strategy&lt;/li&gt;
&lt;li&gt;&lt;em&gt;The Value Flywheel Effect&lt;/em&gt; (linked under &amp;ldquo;Books&amp;rdquo; header above) is a good case study of Liberty Mutual&amp;rsquo;s engineering strategy, and additionally includes case studies for A Cloud Guru, Workgrid, and BBC&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.intercom.com/blog/run-less-software/"&gt;Run less software&lt;/a&gt; by Rich Archbold &amp;ndash; a fantastic writeup of
a cornerstone of Intercom&amp;rsquo;s engineering strategy&lt;/li&gt;
&lt;li&gt;&lt;a href="https://slack.engineering/how-big-technical-changes-happen-at-slack/"&gt;How Big Technical Changes Happen at Slack&lt;/a&gt; by Adams and Rodgers &amp;ndash; this is not
quite Slack&amp;rsquo;s engineering strategy, but it has many components of their engineering strategy within it&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/ft-product-technology/the-difficult-teenage-years-setting-tech-strategy-after-a-launch-7f42eb94a424"&gt;The difficult teenage years: Setting tech strategy after a launch&lt;/a&gt; by Anna Shipman &amp;ndash; a look at
FT&amp;rsquo;s engineering strategy, particularly one that wasn&amp;rsquo;t &lt;em&gt;really&lt;/em&gt; defined until somewhat late in the lifecycle
(which is an extremely common occurrence, even if we don&amp;rsquo;t admit it)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A few more resources that are either case study-ish or engineering-ish, so they don&amp;rsquo;t
quite fit in the above list, but are nonetheless relevant reads:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://boringtechnology.club/"&gt;BoringTechnology.club&lt;/a&gt; by Dan McKinley &amp;ndash; a guiding principle that many engineering strategies include&lt;/li&gt;
&lt;li&gt;&lt;a href="https://handbook.gitlab.com/handbook/company/strategy/"&gt;GitLab Strategy&lt;/a&gt; &amp;ndash; ok, it&amp;rsquo;s actually the GitLab company strategy.
but given they&amp;rsquo;re a technology company that builds technology for technologists, it&amp;rsquo;s an interesting read despite being
at a slightly higher altitude than an engineering strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The internet is an unruly place, and I&amp;rsquo;m sure by the time that you&amp;rsquo;re reading this,
many more excellent writeups will exist as well.&lt;/p&gt;</description></item><item><title>Appendix</title><link>https://craftingengstrategy.com/appendix/</link><pubDate>Tue, 21 Nov 2023 05:00:00 -0600</pubDate><guid>https://craftingengstrategy.com/appendix/</guid><description/></item><item><title>Engineering Strategy: Frequently Asked Questions</title><link>https://craftingengstrategy.com/faq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://craftingengstrategy.com/faq/</guid><description>&lt;h2 id="is-engineering-strategy-a-real-thing"&gt;Is engineering strategy a real thing?&lt;/h2&gt;
&lt;p&gt;Yes, engineering strategy absolutely exists in every engineering organization. One of the key insights from the book is that there&amp;rsquo;s &lt;em&gt;always&lt;/em&gt; an engineering strategy, even if there&amp;rsquo;s nothing written down. The strategy may be implicit rather than explicit—embedded in decisions, behaviors, and organizational structures rather than documented in a comprehensive strategy document.&lt;/p&gt;
&lt;p&gt;The challenge isn&amp;rsquo;t that engineering organizations lack strategies; it&amp;rsquo;s that those strategies are often unspoken, inconsistently understood across the organization, or not deliberately created. This leads to situations where engineers claim their company has no strategy while simultaneously understanding &amp;ldquo;how things work around here.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;By documenting your strategy, you make it possible to deliberately improve it rather than having it evolve haphazardly.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/is-useful/"&gt;Is engineering strategy useful?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-is-engineering-strategy-different-from-strategy-in-general"&gt;How is engineering strategy different from strategy in general?&lt;/h2&gt;
&lt;p&gt;Engineering strategy applies general strategic principles to the specific domain of software engineering. The book adapts Richard Rumelt&amp;rsquo;s framework from &lt;em&gt;Good Strategy, Bad Strategy&lt;/em&gt; (diagnosis, guiding policy, coherent actions) to focus on engineering contexts, challenges, and decisions.&lt;/p&gt;
&lt;p&gt;What makes engineering strategy distinct is its focus on technical decisions like architecture choices, technology adoption, quality standards, and engineering processes—all within the context of delivering value through software. Engineering strategy deals with topics like monolith decomposition, managing technical debt, adopting new technologies, API design and lifecycle, and optimizing developer experience.&lt;/p&gt;
&lt;p&gt;While general business strategy might focus on market positioning, competitive advantage, and business models, engineering strategy focuses on how to organize and execute technical work effectively in service of those broader business goals.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/intro/"&gt;Introduction&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-are-examples-of-engineering-strategy"&gt;What are examples of engineering strategy?&lt;/h2&gt;
&lt;p&gt;The book provides numerous real-world engineering strategy examples including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Uber&amp;rsquo;s service migration strategy&lt;/strong&gt; (2014): How a small infrastructure team successfully migrated 1,000+ services to a new service platform by focusing on self-service tooling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stripe&amp;rsquo;s Sorbet strategy&lt;/strong&gt; (2017): The decision to build a custom static type checker for Ruby rather than decomposing their monolith&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calm&amp;rsquo;s product engineering strategy&lt;/strong&gt;: Focusing on product development over infrastructure innovation with policies like writing all code in the monolith&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stripe&amp;rsquo;s API deprecation strategy&lt;/strong&gt;: Never deprecating APIs without an unavoidable requirement to do so&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Private equity ownership strategy&lt;/strong&gt;: How to manage engineering costs and team structure after acquisition&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monolith decomposition strategy&lt;/strong&gt;: Deciding whether and how to break up a monolithic codebase&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large Language Model adoption strategy&lt;/strong&gt;: A structured approach to introducing LLMs into a company&amp;rsquo;s products and processes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These case studies demonstrate how engineering strategies address specific technical and organizational challenges within different contexts.&lt;/p&gt;
&lt;p&gt;Learn more in the &lt;a href="https://craftingengstrategy.com/strategies-intro/"&gt;Case Studies&lt;/a&gt; section&lt;/p&gt;
&lt;h2 id="what-template-should-i-use-to-create-an-engineering-strategy"&gt;What template should I use to create an engineering strategy?&lt;/h2&gt;
&lt;p&gt;The book recommends a template with five key components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Explore&lt;/strong&gt;: Research how others have approached similar problems&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diagnose&lt;/strong&gt;: Define the specific challenges you&amp;rsquo;re trying to solve&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Refine&lt;/strong&gt;: Test and validate your understanding with tools like systems modeling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy&lt;/strong&gt;: Make concrete decisions and tradeoffs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operations&lt;/strong&gt;: Establish mechanisms to implement and enforce your strategy&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, this template is meant for &lt;em&gt;creating&lt;/em&gt; a strategy. When &lt;em&gt;presenting&lt;/em&gt; a strategy, it&amp;rsquo;s often better to invert the structure (Policy first, then Operations, Refine, Diagnose, Explore) to make it more readable.&lt;/p&gt;
&lt;p&gt;The book emphasizes that this template isn&amp;rsquo;t sacrosanct—you should adapt it to your needs. What matters is that you&amp;rsquo;re deliberately covering each component, even if you merge or reorder sections.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;Steps to build an engineering strategy&lt;/a&gt; and &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;Making engineering strategies more readable&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="can-engineers-do-engineering-strategy-is-engineering-strategy-only-for-executives"&gt;Can engineers do engineering strategy? Is engineering strategy only for executives?&lt;/h2&gt;
&lt;p&gt;Engineering strategy is absolutely not limited to executives. The book explicitly states that strategy is accessible to everyone in an engineering organization, though the approaches differ based on role and authority.&lt;/p&gt;
&lt;p&gt;Individual engineers can use techniques like &amp;ldquo;Write five, then synthesize&amp;rdquo; (documenting how similar decisions have been made, then synthesizing a policy) or &amp;ldquo;model, document, and share&amp;rdquo; (demonstrating a better approach through example). These bottom-up methods can be highly effective even without executive authority.&lt;/p&gt;
&lt;p&gt;While executives can mandate adoption of strategies, they often lack the detailed context that engineers have. The most effective strategies typically come from collaboration across multiple levels of the organization, combining executive support with deep technical understanding.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/who-does-strategy/"&gt;Who gets to do strategy?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-to-get-better-at-engineering-strategy"&gt;How to get better at engineering strategy?&lt;/h2&gt;
&lt;p&gt;To improve your engineering strategy skills, the book recommends:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Study existing strategies&lt;/strong&gt;: Read public resources on engineering blogs, ask peers about their experiences, and build a collection of strategies to learn from&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Join or create a learning circle&lt;/strong&gt;: Form a community with peers to discuss strategy challenges and approaches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evaluate strategies using a structured rubric&lt;/strong&gt;: Analyze how quickly strategies are refined, how expensive refinement is, and how well they solve their diagnosed problems&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Practice at appropriate altitude&lt;/strong&gt;: If you can&amp;rsquo;t work on organization-wide strategies, focus on team-level or personal strategies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Track your progress&lt;/strong&gt;: Maintain a record of strategies you&amp;rsquo;ve implemented, refined, or studied, and review it quarterly&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seek feedback&lt;/strong&gt;: Review your strategy work with trusted peers who can provide honest assessment&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The key is consistent practice—even if it&amp;rsquo;s just one strategy every six months—coupled with deliberate learning from each attempt.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/getting-better/"&gt;How to get better at strategy?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="are-there-jobs-in-engineering-strategy"&gt;Are there jobs in engineering strategy?&lt;/h2&gt;
&lt;p&gt;While &amp;ldquo;Strategy Engineer&amp;rdquo; isn&amp;rsquo;t typically a standalone job title, engineering strategy work is embedded in many roles throughout engineering organizations. Senior and Staff-plus engineers, engineering managers, technical program managers, and engineering executives all regularly engage in strategy work as part of their responsibilities.&lt;/p&gt;
&lt;p&gt;Some larger organizations do have dedicated teams focused on engineering strategy, often under names like &amp;ldquo;Engineering Effectiveness,&amp;rdquo; &amp;ldquo;Developer Experience,&amp;rdquo; or &amp;ldquo;Technical Strategy.&amp;rdquo; These teams help shape and implement strategies that affect the broader engineering organization.&lt;/p&gt;
&lt;p&gt;Even without a dedicated strategy role, you can incorporate strategy work into your current position. The book emphasizes that everyone in engineering can contribute to strategy, and developing this skill can be valuable for career advancement toward senior technical leadership roles.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/who-does-strategy/"&gt;Who gets to do strategy?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-are-other-engineering-strategy-resources"&gt;What are other engineering strategy resources?&lt;/h2&gt;
&lt;p&gt;The book recommends several resources for learning more about engineering strategy:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Books:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Good Strategy, Bad Strategy&lt;/em&gt; by Richard Rumelt&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Technology Strategy Patterns&lt;/em&gt; by Eben Hewitt&lt;/li&gt;
&lt;li&gt;&lt;em&gt;The Value Flywheel Effect&lt;/em&gt; by Anderson, McCann, and O&amp;rsquo;Reilly&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Architecture Modernization&lt;/em&gt; by Nick Tune with Jean-Georges Perrin&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Thinking in Systems: A Primer&lt;/em&gt; by Donella Meadows&lt;/li&gt;
&lt;li&gt;&lt;em&gt;How Big Things Get Done&lt;/em&gt; by Bent Flyvbjerg and Dan Gardner&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Articles and case studies:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Magnitudes of exploration&amp;rdquo; (Stripe&amp;rsquo;s Engineering strategy)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Run less software&amp;rdquo; by Rich Archbold (Intercom)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;How Big Technical Changes Happen at Slack&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The difficult teenage years: Setting tech strategy after a launch&amp;rdquo; (Financial Times)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Other resources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BoringTechnology.club by Dan McKinley&lt;/li&gt;
&lt;li&gt;Wardley Maps by Simon Wardley&lt;/li&gt;
&lt;li&gt;Public engineering blogs from companies like Slack, Stripe, and Netflix&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/additional-resources/"&gt;Strategy resources&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-are-the-five-steps-to-build-an-engineering-strategy"&gt;What are the five steps to build an engineering strategy?&lt;/h2&gt;
&lt;p&gt;The book outlines five essential steps for building an effective engineering strategy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explore&lt;/strong&gt;: Research how other organizations have approached similar challenges. This includes reading industry literature, speaking with peers at other companies, and investigating existing solutions within your own organization. This step helps prevent early anchoring on one approach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Diagnose&lt;/strong&gt;: Clearly define the problem you&amp;rsquo;re trying to solve before jumping to solutions. This includes understanding the technical, social, and business constraints you&amp;rsquo;re operating within. A good diagnosis forms the foundation for effective policy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Refine&lt;/strong&gt;: Use techniques like systems modeling, Wardley mapping, or strategy testing to validate and improve your understanding of the problem. This step helps identify which elements of your strategy are most important.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt;: Make concrete decisions and tradeoffs that address the diagnosed challenges. These can range from specific technical approaches to organizational processes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Operations&lt;/strong&gt;: Create mechanisms to implement and enforce your strategy, ensuring it doesn&amp;rsquo;t just remain a document but becomes an active force in your organization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/strategy-steps/"&gt;Steps to build an engineering strategy&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-do-i-evaluate-if-a-strategy-is-any-good"&gt;How do I evaluate if a strategy is any good?&lt;/h2&gt;
&lt;p&gt;The book provides a three-part rubric for evaluating strategy quality:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How quickly is the strategy refined?&lt;/strong&gt; Good strategies evolve based on new information and changing circumstances. A strategy that improves quickly is better than one that remains static despite evidence it&amp;rsquo;s not working.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How expensive is the strategy&amp;rsquo;s refinement?&lt;/strong&gt; Effective strategies can be validated and improved cheaply. If testing a strategy requires massive investment before you know if it works, that&amp;rsquo;s a red flag.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How well does the current iteration solve its diagnosis?&lt;/strong&gt; Ultimately, a strategy must address the problems it set out to solve.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book also emphasizes the concept of strategy phases. As a strategy is implemented, new information emerges that may invalidate parts of the original diagnosis. Good strategists recognize these phase transitions and adjust accordingly.&lt;/p&gt;
&lt;p&gt;Evaluating other companies&amp;rsquo; strategies from the outside is very difficult, because you&amp;rsquo;re missing critical context about their circumstances and constraints.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/evaluating-strategy/"&gt;Is this strategy any good?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-is-strategy-refinement-and-why-does-it-matter"&gt;What is strategy refinement and why does it matter?&lt;/h2&gt;
&lt;p&gt;Strategy refinement is the process of validating and improving your strategy through deliberate testing and feedback. It&amp;rsquo;s perhaps the most critical aspect of engineering strategy, yet it is often neglected.&lt;/p&gt;
&lt;p&gt;The book describes refinement as &amp;ldquo;a toolkit of methods to identify which parts of your diagnosis are most important, and verify that your approach to solving the diagnosis actually works.&amp;rdquo; Three key refinement techniques covered are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Strategy testing&lt;/strong&gt;: Identifying the narrowest slice of your strategy to implement, then iterating until you have evidence it works&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Systems modeling&lt;/strong&gt;: Creating models to understand complex systems and identify leverage points&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Wardley mapping&lt;/strong&gt;: Plotting the evolution of capabilities to understand how the ecosystem is changing&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Refinement matters because even the best initial strategy is based on incomplete information and untested assumptions. Without refinement, strategies frequently fail due to unexpected obstacles or changing circumstances.&lt;/p&gt;
&lt;p&gt;Effective strategy isn&amp;rsquo;t a one-time exercise but an iterative process. Organizations that skip refinement often push forward with flawed approaches, leading to expensive failures that could have been avoided with early, cheap tests.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/refine/"&gt;Refining strategy&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-do-i-navigate-ambiguity-and-uncertainty-in-strategy"&gt;How do I navigate ambiguity and uncertainty in strategy?&lt;/h2&gt;
&lt;p&gt;Ambiguity and uncertainty are inherent in strategy work, and the book provides several approaches to navigate them effectively:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accept ambiguity as part of the diagnosis&lt;/strong&gt;: Rather than getting blocked by missing information, acknowledge it explicitly in your strategy. For example, the private equity strategy acknowledges uncertainty about reduction targets but still provides guidance on what to do in the meantime.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use refinement techniques&lt;/strong&gt;: Systems modeling, Wardley mapping, and strategy testing can help reduce uncertainty by providing structured ways to explore complex problems and test assumptions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Don&amp;rsquo;t wait for missing strategies&lt;/strong&gt;: If you&amp;rsquo;re waiting for strategies from other teams or executives before creating your own, you&amp;rsquo;ll never make progress. Instead, incorporate the absence of those strategies into your diagnosis and move forward.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adopt provisional policies&lt;/strong&gt;: When perfect information isn&amp;rsquo;t available, create policies that work with what you know and explicitly plan to revisit them when more information becomes available.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Focus on operational mechanisms&lt;/strong&gt;: In highly ambiguous situations, focus on creating mechanisms that help you learn faster, rather than trying to get the perfect policy immediately.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book emphasizes that strategy is iterative, and uncertainty isn&amp;rsquo;t a reason to avoid strategy—it&amp;rsquo;s precisely why strategy is valuable.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/policy/"&gt;Setting policy&lt;/a&gt; and &lt;a href="https://craftingengstrategy.com/theory-and-practice/"&gt;Bridging theory and practice&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="whats-the-difference-between-policy-and-operations-in-strategy"&gt;What&amp;rsquo;s the difference between policy and operations in strategy?&lt;/h2&gt;
&lt;p&gt;Policy and operations are two distinct but complementary components of strategy:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt; is the set of decisions, tradeoffs, and approaches that address your diagnosed challenges. Policies can take several forms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Approvals: Defining processes for making recurring decisions&lt;/li&gt;
&lt;li&gt;Allocations: Determining how to split resources across investments&lt;/li&gt;
&lt;li&gt;Direction: Providing explicit instructions on how decisions must be made&lt;/li&gt;
&lt;li&gt;Guidance: Offering recommendations on how decisions should be made&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Operations&lt;/strong&gt; are the concrete mechanisms that implement and enforce your policies. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Approval forums: Committees or individuals who review exceptions&lt;/li&gt;
&lt;li&gt;Inspection mechanisms: Ways to check if policies are being followed&lt;/li&gt;
&lt;li&gt;Nudges: Context-aware reminders that guide better decisions&lt;/li&gt;
&lt;li&gt;Documentation: Clear guidance on how to follow policies&lt;/li&gt;
&lt;li&gt;Automation: Technical systems that enforce policies&lt;/li&gt;
&lt;li&gt;Meetings: Regular forums to review progress and address issues&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key distinction is that policy describes what should happen, while operations ensure it actually happens. Even the best policy will fail without effective operational mechanisms to support it.&lt;/p&gt;
&lt;p&gt;In practice, when writing a strategy for readers, these sections are often merged to improve readability, but when creating a strategy, it&amp;rsquo;s valuable to think about them separately.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/policy/"&gt;Setting policy&lt;/a&gt; and &lt;a href="https://craftingengstrategy.com/operations/"&gt;Operations for strategy&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-is-wardley-mapping-and-when-should-i-use-it"&gt;What is Wardley mapping and when should I use it?&lt;/h2&gt;
&lt;p&gt;Wardley mapping, created by Simon Wardley, is a technique for visualizing the evolution of components in a value chain. A Wardley map plots components along two axes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Visibility to user&lt;/strong&gt; (vertical axis): How aware users are of a component&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evolution&lt;/strong&gt; (horizontal axis): How mature a component is, from genesis to commodity&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book identifies several scenarios where Wardley mapping is particularly valuable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you need to understand how a technology ecosystem is evolving (like the LLM ecosystem example)&lt;/li&gt;
&lt;li&gt;When your strategy needs to account for industry-wide changes&lt;/li&gt;
&lt;li&gt;When you want to identify opportunities for strategic advantage based on component evolution&lt;/li&gt;
&lt;li&gt;When your strategy needs to span multiple years and accommodate changing circumstances&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Wardley mapping helps you zoom out to see broader patterns and anticipate changes, which makes it complementary to techniques like systems modeling (which zooms in on specific dynamics).&lt;/p&gt;
&lt;p&gt;The book provides a step-by-step approach to creating Wardley maps, starting with identifying users and needs, establishing value chains, plotting them on the map, and then predicting evolution.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/wardley-mapping/"&gt;Refining strategy with Wardley Mapping&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-is-systems-modeling-and-how-does-it-help-with-strategy"&gt;What is systems modeling and how does it help with strategy?&lt;/h2&gt;
&lt;p&gt;Systems modeling represents interconnected components as stocks (accumulations) and flows (changes to stocks) in order to understand complex behaviors and identify leverage points. The book describes it as &amp;ldquo;an effective, flexible tool for debugging complex problems.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Systems modeling helps with strategy in several key ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Identifying leverage points&lt;/strong&gt;: Models reveal which interventions will have the most impact. For example, the driver onboarding model showed that reengaging departed drivers was more valuable than improving onboarding speed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Testing ideas cheaply&lt;/strong&gt;: Models let you experiment with different approaches without real-world risks. The API deprecation model demonstrated that reducing baseline churn and deprecation-related churn together had the biggest impact.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Revealing counter-intuitive insights&lt;/strong&gt;: Models often show unexpected behaviors. The developer experience model revealed that adopting LLMs might increase the time spent writing and testing code rather than decrease it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mediating stakeholder disagreements&lt;/strong&gt;: When stakeholders have conflicting intuitions about what will work, models provide a structured way to explore those intuitions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book provides multiple examples of applying systems modeling to engineering strategies, including service provisioning, driver onboarding, and API deprecation.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/systems-modeling/"&gt;Using systems modeling to refine strategy&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="when-should-i-write-strategy-and-how-much-is-too-much"&gt;When should I write strategy, and how much is too much?&lt;/h2&gt;
&lt;p&gt;The book provides clear guidance on timing and volume for strategy work:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When to write strategy:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When your organization is not globally consistent in approach (different teams give different answers)&lt;/li&gt;
&lt;li&gt;When you&amp;rsquo;re experiencing significant hiring growth (which can dilute cultural knowledge)&lt;/li&gt;
&lt;li&gt;When you have new external leadership who may drive inconsistent approaches&lt;/li&gt;
&lt;li&gt;When you have significant organizational changes like reorganizations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How much strategy to write:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Limit yourself to one or two strategies at a time to ensure quality and focus&lt;/li&gt;
&lt;li&gt;Use strategy altitude to manage volume:
&lt;ul&gt;
&lt;li&gt;Higher altitude (organization-wide) strategies should be more permissive&lt;/li&gt;
&lt;li&gt;Lower altitude (team-level) strategies can be more prescriptive&lt;/li&gt;
&lt;li&gt;Permissive strategies create less overhead than prescriptive ones&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Signs you&amp;rsquo;re doing too much:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your prior strategy work isn&amp;rsquo;t impacting subsequent decisions&lt;/li&gt;
&lt;li&gt;Teams feel overwhelmed by changes in approach&lt;/li&gt;
&lt;li&gt;You&amp;rsquo;re spending more time on strategy than on implementation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The book recommends focusing on one strategy until it works before moving to the next one. This approach may seem unambitious, but it&amp;rsquo;s typically more effective than attempting many strategies simultaneously.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/when-write-stratefy/"&gt;When to write strategy, and how much?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-should-i-format-my-strategy-document-to-maximize-readability"&gt;How should I format my strategy document to maximize readability?&lt;/h2&gt;
&lt;p&gt;The book emphasizes that the order for writing a strategy is different from the order for reading it. For maximum readability:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Invert the structure&lt;/strong&gt;: Put the most immediately useful information first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Policy: What does the strategy require or allow?&lt;/li&gt;
&lt;li&gt;Operations: How is the strategy enforced?&lt;/li&gt;
&lt;li&gt;Refine: What were the key insights that informed the strategy?&lt;/li&gt;
&lt;li&gt;Diagnose: What challenges is the strategy addressing?&lt;/li&gt;
&lt;li&gt;Explore: What broader context informed the thinking?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consider refactoring&lt;/strong&gt;: Feel free to merge sections (like combining Policy and Operations) or embed one section in another when it makes the document more readable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Additional readability tips&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Test with uninvolved readers before wide release&lt;/li&gt;
&lt;li&gt;Include a clear commenting period and office hours for questions&lt;/li&gt;
&lt;li&gt;Disable in-document commenting after release to avoid distraction&lt;/li&gt;
&lt;li&gt;Include consistent metadata (creation date, approval status, where to ask questions)&lt;/li&gt;
&lt;li&gt;Create a template specific to your organization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Remember that most readers just want to know what to do, not the full thinking behind it. The inverted structure serves those readers first while still providing context for those who want to understand the rationale.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/readable-strategy/"&gt;Making engineering strategies more readable&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-is-strategy-altitude-and-why-does-it-matter"&gt;What is strategy altitude and why does it matter?&lt;/h2&gt;
&lt;p&gt;Strategy altitude refers to the organizational level at which a strategy operates and how permissive or prescriptive it is. The book identifies this concept as a key tool for managing the volume and impact of strategies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Altitude levels include:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Company-wide strategies&lt;/li&gt;
&lt;li&gt;Engineering organization strategies&lt;/li&gt;
&lt;li&gt;Team-level strategies&lt;/li&gt;
&lt;li&gt;Individual strategies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Permissiveness spectrum:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prescriptive strategies mandate specific approaches with little flexibility&lt;/li&gt;
&lt;li&gt;Permissive strategies provide guidance but allow for local customization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Strategy altitude matters because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;It affects adoption costs&lt;/strong&gt;: Higher-altitude, prescriptive strategies create more organizational overhead&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It impacts effectiveness&lt;/strong&gt;: The right altitude ensures the strategy can be properly enforced&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;It enables managing multiple strategies&lt;/strong&gt;: Using different altitudes allows you to address more challenges without overwhelming the organization&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book recommends deliberately choosing your strategy&amp;rsquo;s altitude based on your context. If you lack authority for a high-altitude strategy, operate at a lower altitude. If you need to cover many topics quickly, use more permissive approaches at higher altitudes.&lt;/p&gt;
&lt;p&gt;For example, compare Calm&amp;rsquo;s product engineering strategy (high-altitude, relatively prescriptive) with Uber&amp;rsquo;s service migration strategy (lower-altitude, more permissive).&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/when-write-stratefy/"&gt;When to write strategy, and how much?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-do-operational-mechanisms-make-or-break-a-strategy"&gt;How do operational mechanisms make or break a strategy?&lt;/h2&gt;
&lt;p&gt;Operational mechanisms are the concrete ways a strategy is implemented and enforced. The book emphasizes that even the best policies fail without effective operational mechanisms, and that getting operations right is &amp;ldquo;two-thirds avoiding common practices that simply don&amp;rsquo;t work.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Effective operational mechanisms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Make policies real&lt;/strong&gt;: They translate abstract decisions into concrete actions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provide feedback&lt;/strong&gt;: They help identify when policies aren&amp;rsquo;t working&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create accountability&lt;/strong&gt;: They ensure teams actually follow the policy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduce friction&lt;/strong&gt;: They make following the policy easier than ignoring it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book identifies common effective mechanisms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Approval and advice forums&lt;/li&gt;
&lt;li&gt;Inspection mechanisms to evaluate compliance&lt;/li&gt;
&lt;li&gt;Nudges that provide timely guidance&lt;/li&gt;
&lt;li&gt;Documentation of how to follow the policy&lt;/li&gt;
&lt;li&gt;Automation that enforces the policy&lt;/li&gt;
&lt;li&gt;Meetings to review progress and address issues&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It also highlights mechanisms that typically fail:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Top-down pronouncements without enforcement&lt;/li&gt;
&lt;li&gt;One-time educational announcements&lt;/li&gt;
&lt;li&gt;Mandatory recurring trainings that no one pays attention to&lt;/li&gt;
&lt;li&gt;Vague cultural expectations without concrete steps&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Effective mechanisms differ based on your role—executives have more leverage through mandates, while individual engineers may rely more on nudges and documentation. However, the book notes that mandates often don&amp;rsquo;t work as well as expected, while lightweight mechanisms can be surprisingly effective.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/operations/"&gt;Operations for strategy&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-do-i-handle-strategy-when-my-organization-is-undergoing-rapid-change"&gt;How do I handle strategy when my organization is undergoing rapid change?&lt;/h2&gt;
&lt;p&gt;The book specifically addresses doing strategy in chaotic environments, emphasizing that strategy doesn&amp;rsquo;t require stable environments—it requires awareness of the environment you&amp;rsquo;re operating in.&lt;/p&gt;
&lt;p&gt;Key approaches for handling rapid change include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explicitly acknowledge uncertainty in your diagnosis&lt;/strong&gt;: Rather than waiting for perfect information, incorporate the uncertainty into your strategy. For example, the private equity strategy acknowledged unknown reduction targets but still provided guidance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Focus on shorter timeframes&lt;/strong&gt;: In dynamic environments, your strategy might only project forward a few weeks or months instead of years.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use Wardley mapping&lt;/strong&gt; to anticipate evolution and build adaptability into your strategy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Emphasize operational mechanisms&lt;/strong&gt; that provide frequent feedback, allowing you to detect when circumstances have changed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Focus on policies that create options&lt;/strong&gt; rather than those that commit to specific long-term approaches.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Layer permissive, high-altitude strategies&lt;/strong&gt; that provide guidance without constraining teams from adapting to changing circumstances.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book emphasizes that strategies don&amp;rsquo;t fail because of chaotic environments—they fail because they don&amp;rsquo;t diagnose those environments accurately. A good strategy in a rapidly changing context will explicitly account for that change.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/theory-and-practice/"&gt;Bridging theory and practice&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="what-are-the-most-common-ways-strategies-fail"&gt;What are the most common ways strategies fail?&lt;/h2&gt;
&lt;p&gt;The book identifies several recurring patterns of strategy failure:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Skipping refinement&lt;/strong&gt;: Many strategies fail because they aren&amp;rsquo;t tested before being fully implemented. Strategy testing is particularly valuable for avoiding this failure mode.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Poor diagnosis&lt;/strong&gt;: Strategies that misunderstand the core problem or ignore critical constraints are doomed from the start. Often this happens when strategies anchor too quickly on one solution.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Neglecting operational mechanisms&lt;/strong&gt;: Even sound policies fail without effective mechanisms to ensure adoption. This is especially common with executive-driven strategies that assume announcement equals adoption.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Overly ambitious scope&lt;/strong&gt;: Attempting to change too much at once often leads to failure. The book recommends working on one strategy at a time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mismatched altitude&lt;/strong&gt;: Strategies that are too prescriptive for their altitude create too much overhead and resistance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Manufactured consent&lt;/strong&gt;: Creating the illusion of agreement without actual alignment leads to surface-level compliance but eventual failure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimizing for side goals&lt;/strong&gt;: When strategies prioritize secondary objectives (like learning a new technology) over the primary goal, they often fail to solve the core problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Disregarding constraints&lt;/strong&gt;: Ignoring real limitations (like available resources) leads to strategies that cannot be implemented.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book emphasizes that many of these failure modes can be detected early through proper refinement techniques and then corrected before significant resources are wasted.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/refine/"&gt;Refining strategy&lt;/a&gt; and &lt;a href="https://craftingengstrategy.com/evaluating-strategy/"&gt;Is this strategy any good?&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="how-do-i-get-buy-in-for-my-strategy"&gt;How do I get buy-in for my strategy?&lt;/h2&gt;
&lt;p&gt;While the book doesn&amp;rsquo;t dedicate a specific chapter to getting buy-in, it provides several approaches across different chapters:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Start with solid exploration and diagnosis&lt;/strong&gt;: When your strategy is grounded in a thorough understanding of the problem and existing approaches, it&amp;rsquo;s more compelling to stakeholders.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use refinement techniques to build evidence&lt;/strong&gt;: Systems modeling, Wardley mapping, and strategy testing provide concrete evidence that builds confidence in your approach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Match your strategy&amp;rsquo;s altitude to your influence&lt;/strong&gt;: If you&amp;rsquo;re not an executive, focus on team-level strategies or use influence-based approaches like &amp;ldquo;model, document, and share.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Focus on solving urgent, recognized problems&lt;/strong&gt;: Strategies addressing widely acknowledged pain points get more traction than those solving theoretical issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Format for readability&lt;/strong&gt;: Structure your strategy document with policy first so stakeholders immediately understand what you&amp;rsquo;re proposing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Acknowledge constraints&lt;/strong&gt;: Strategies that work within recognized constraints rather than fighting them are more likely to be adopted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use a hybrid rollout&lt;/strong&gt;: Start with a permissive approach that allows teams to customize implementation, then gradually move to more prescriptive policies as you demonstrate value.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Target &amp;ldquo;strategy archaeologists&amp;rdquo;&lt;/strong&gt;: Identify long-tenured employees who understand the organization&amp;rsquo;s history and get their support first.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The book notes that different approaches work better in different organizations, so understanding your company&amp;rsquo;s culture is essential to getting buy-in.&lt;/p&gt;
&lt;p&gt;Learn more in &lt;a href="https://craftingengstrategy.com/who-does-strategy/"&gt;Who gets to do strategy?&lt;/a&gt; and &lt;a href="https://craftingengstrategy.com/theory-and-practice/"&gt;Bridging theory and practice&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Drafting Strategy</title><link>https://craftingengstrategy.com/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://craftingengstrategy.com/about/</guid><description>&lt;p&gt;Good decision making is the core of successful engineering organizations, and strategy is the art of reproducibly making good decisions. This book will explain how you can improve your organization’s decision making as an engineer or as an executive, and provide a number of concrete examples from real, meaningful companies.&lt;/p&gt;
&lt;p&gt;Start reading from the &lt;a href="https://craftingengstrategy.com/"&gt;table of contents&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Written by Will Larson, who writes frequently at &lt;a href="https://lethain.com/"&gt;Irrational Exuberance&lt;/a&gt;.
He has previously written
&lt;em&gt;&lt;a href="https://www.amazon.com/Elegant-Puzzle-Systems-Engineering-Management/dp/1732265186"&gt;An Elegant Puzzle&lt;/a&gt;&lt;/em&gt;,
&lt;em&gt;&lt;a href="https://staffeng.com/book"&gt;Staff Engineer&lt;/a&gt;&lt;/em&gt;, and
&lt;em&gt;&lt;a href="https://www.amazon.com/Engineering-Executives-Primer-Impactful-Leadership/dp/1098149483/"&gt;The Engineering Executive&amp;rsquo;s Primer&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description></item></channel></rss>