We Have To Support Every Line of Production Code Forever

Dec 18, 2023

In the heat of an enterprise deal moment, it’s easy to think very short-term about the long-term costs of one-off specials and “small requirements.” There’s tremendous pressure to maximize the importance of a feature tweak to close this quarter’s big deal, and similar pressure to minimize both the initial effort to make that tweak and the ongoing effort to keep it working.

The upside (for the sales team) is huge, and the cost (for product/engineering/support) is diffuse. Far away. Hard to see. So we often pooh-pooh these discussions as whinging/lack of commercial understanding/technical grumpiness/inattention.

What Have We Committed To?

Regardless of whether a big customer pays us for a one-off enhancement or we give it to them for free, our responsibility is clear: this bit of code needs to work as promised, and continue working as promised, for as long as that customer has it in production. If it breaks three years from now — or has to change to work with changes in their other various systems — we’ll be expected to fix/change/adapt/improve it to meet purpose. Most enterprise systems last for 7–10 years: that means 7–10 years of having someone on the product staff who knows it exists and someone on the technical staff who understands it enough to make repairs/improvements.

(Think we can ignore it once it ships? Hope it will work forever? Role-play what happens when MegaCorp’s CIO calls your CEO on a weekend because they have an outage apparently caused by our unsupported doodad.)

So when an important deal needs a one-off enhancement, we rarely ask the fundamental product questions:

What’s the ongoing effort to keep this working? We’ll need to assign some fraction of our staff forever (until this one customer retires it).
How many other one-off bits are we fractionally maintaining, and how big a bite is that taking out of our strategic product work?

These inevitably lead to some mental gymnastics about why this particular item won’t need ongoing support or improvements…

Won’t It Just Keep Working?

It’s easy to believe that a small piece of software that works today is done — that it will keep working into the far future. This is a fundamental but entirely wrong assumption of project-based software staffing: that once it’s tested and delivered and installed, we can wash our hands of this thing and move everyone onto the next thing.
It’s rather the opposite: every piece of our software that’s in production will (at the most inconvenient time) need some improvement or repair or adjustment. We need to plan for that: some fractional staff that understands this widget, has some customer context, and knows they’re responsible when s**t hits fan.

Imagine that a major prospect for our investment banking application needs a custom connector for their legacy stock market analysis app: their trading strategy team uses an old database with obsolete formats for trade type, date/time, currency symbol, etc. They demand that we write a utility to convert our modern dates, times, buy/sell/market/limit orders, and international currencies ($US, $CA, $NZ, $AU…) to their schema. Specs seem straightforward. Closing this $480k/year license deal is highly visible to the exec team and Board. First guesstimate is that our best engineer could whip this up in two weeks (if we take her off our #1 top priority commitment that’s already been announced to the world, don’t interrupt her with anything else, have perfect specs, and don’t run into any surprises).
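To make the moving parts concrete, here is a minimal Python sketch of what such a connector might look like. Everything specific in it is an assumption for illustration: the `Trade` fields, the single-letter legacy codes, and the `YYYYMMDD`/`HHMMSS` legacy date and time layout are all invented, since the customer's real schema isn't described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical lookup tables: the legacy codes below are invented.
TRADE_TYPE_TO_LEGACY = {"buy": "B", "sell": "S", "market": "M", "limit": "L"}
CURRENCY_TO_LEGACY = {"$US": "U", "$CA": "C", "$NZ": "N", "$AU": "A"}


@dataclass(frozen=True)
class Trade:
    trade_type: str        # "buy", "sell", "market", or "limit"
    currency: str          # e.g. "$US"
    amount: Decimal
    executed_at: datetime  # must be timezone-aware


def to_legacy_record(trade: Trade) -> dict:
    """Convert one modern trade into the (assumed) legacy schema."""
    if trade.trade_type not in TRADE_TYPE_TO_LEGACY:
        raise ValueError(f"unmapped trade type: {trade.trade_type!r}")
    if trade.currency not in CURRENCY_TO_LEGACY:
        raise ValueError(f"unmapped currency: {trade.currency!r}")
    if trade.executed_at.tzinfo is None:
        raise ValueError("executed_at must carry a time zone")

    # Assumption: the legacy database stores UTC as separate date and time strings.
    utc = trade.executed_at.astimezone(timezone.utc)
    return {
        "TYPE": TRADE_TYPE_TO_LEGACY[trade.trade_type],
        "CCY": CURRENCY_TO_LEGACY[trade.currency],
        "AMT": f"{trade.amount:.2f}",
        "DATE": utc.strftime("%Y%m%d"),
        "TIME": utc.strftime("%H%M%S"),
    }


if __name__ == "__main__":
    sample = Trade("limit", "$US", Decimal("1250.5"),
                   datetime(2023, 12, 18, 14, 30, 5, tzinfo=timezone.utc))
    print(to_legacy_record(sample))
```

Even in a toy version, most of the logic lives in hard-coded lookup tables and format strings. Each one is a place where a change on either side of the connector sends us back into this code.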

What events down the road might force us to “fix” this connector?

NASDAQ might introduce a new transaction type such as “post-only.” We then need to map this to one of the customer’s legacy transaction types.
Their market strategy team realizes they need one more field mapped. (Repeat every quarter.)
We upgrade our core product, adding or changing how currencies are stored. The customer is waiting for unrelated improvements in this upgrade, but can’t move until we modify the connector.
Their legacy database vendor announces that all users must upgrade to the latest (supported) version, which stores dates differently.
New financial regulations require additional reports using data items not currently in their market analysis app.
Azure “improves” its app-to-app authentication protocols, so we need to tweak our multi-factor authentication.
Some unrelated app of theirs starts spewing trades with dates from the 19th century. Their system is too old to fix, so they need us to filter out anything more than a year old.
A country adopts the euro (€), devalues its currency, opens a new stock exchange, or shifts its daylight saving time dates.
The bank’s auditors decide that we need to log every transaction for future security or privacy audits.

And so on. No piece of software is ever done, ever perfect, or meets all possible future needs. “Future-proof” is a fiction. Users are wonderfully creative about what else our app should do.

If we have a few dozen of these one-off, special, bespoke, too-simple-to-fail, set-and-forget, customer-twisted-our-arms, no-one-is-assigned widgets in our installed base, we’re likely to have an escalation on something almost every week. We’re on our back foot, scrambling to find someone who remembers something about what’s broken. Unfortunately, our brilliant engineer has left the company and we forgot to do a handoff or knowledge transfer. Again.

Productivity and morale on our technical team suffer. We lose focus. Escalations stack up. And it’s very hard to assign costs (or blame) to any single bespoke item, since each individually seems small. But they add up.

Let’s Run the Numbers

But we closed the deal, right? And we’ll be earning (commissionable) SaaS license fees for years. This should pay for itself. Let’s run the numbers.

Quarter after quarter, this custom connector will probably consume 15%-20% of a person, spread across a senior developer (including hours of account background, initial technical discussion, specs, sample data, and clarifications); a junior developer (for holidays/off-hours coverage and cross-training); a test engineer (building, running and inspecting test suites); a Level 3 support analyst (is it a bug? can we reproduce it?); a product manager (escalation management, context-setting); and DevOps (build automation). And that’s ongoing, for as long as this connector is in production. Remember: everything eventually breaks, usually on a holiday weekend. BTW: if we think we can measure and optimize 15% of a person, we’re lying to ourselves.

What might it cost?

Hard cost: 15% * $200k average (fully loaded) staff salary * 7 years = $210k as a starting SWAG. Unlikely that we ever track or account for this.
Opportunity cost: our R&D team is never big enough to do everything we want, so some other feature or bug fix will get pushed out for this. Inevitably. Even though we delude ourselves that “there’s slack in our development process” or “we can work just this one weekend” or “this one isn’t so difficult” or “lots of other customers will eventually want it” or “we have the smartest people” or “this is the only time we’ll interrupt the team.”
Our formal product roadmap is focused on improvements that a lot of our customers have been asking for, and which we believe will drive new customer acquisition or major upgrade revenue. And all of the money in the software product business is selling in the nth identical copy of bits we’ve already built. Which of our ten top #1 priorities takes the hit, and how much revenue does it leave on the floor?
Investor cost: R&D spending at product companies should earn 6x-15x. A dollar per year spent on products should bring in $6-$15/year in incremental revenue. This one-off will surely reduce our effective margin on the specific deal, and also reduce our perceived R&D effectiveness. Could we claim that this work delivered an incremental $1M-$3M in revenue?
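The arithmetic above is simple enough to write down in a few lines of Python, which makes it easy to rerun with our own numbers. This is only a sketch: the function names are made up, and it treats the 6x-15x investor expectation as a multiple applied to the total hard cost over the connector's life, which is one reading of the text.

```python
def hard_cost(percent_of_person: int, loaded_salary: int, years: int) -> int:
    """Staff cost of keeping one bespoke item alive, in whole dollars."""
    # Integer math avoids floating-point noise in a back-of-envelope number.
    return percent_of_person * loaded_salary * years // 100


def revenue_to_justify(cost: int, low_multiple: int = 6, high_multiple: int = 15) -> tuple:
    """Incremental revenue range implied by the expected R&D multiple."""
    return cost * low_multiple, cost * high_multiple


if __name__ == "__main__":
    for pct in (15, 20):
        cost = hard_cost(pct, 200_000, 7)
        low, high = revenue_to_justify(cost)
        print(f"{pct}% of a person: ${cost:,} hard cost, "
              f"needs ${low:,}-${high:,} in incremental revenue")
```

At 15% this reproduces the $210k SWAG and implies roughly $1.3M-$3.2M of revenue to justify it, in line with the $1M-$3M question above. At 20% the hard cost rises to $280k.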

Eventually, this software hardening-of-the-arteries slows down our ability to get new products built and innovations explored. We’re spending our time doing CPR on one-off connectors built between 2017 and 2022.

This Is Much Harder With AI

Since every discussion this year eventually shifts to AI…

Most software is deterministic: we know exactly how it will behave each time it runs, and we can test that behavior over time to confirm it’s still working. We may have made mistakes (implemented it wrong), in which case it makes the same mistake each time we recreate the exact inputs/situation. For our hypothetical data connector, we could make an automated test suite that feeds it sample data nightly, then automatically checks if the wrong answers pop out. It either does what we want (every time) or it doesn’t.
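For a deterministic connector, that nightly check can be very small. The sketch below is illustrative only: `convert_trade_type` is a hypothetical stand-in for the real connector, and the legacy codes and golden cases are invented sample data.

```python
# Hypothetical stand-in for the connector: a fixed, deterministic mapping.
LEGACY_TYPE = {"buy": "B", "sell": "S", "market": "M", "limit": "L"}


def convert_trade_type(trade_type: str) -> str:
    return LEGACY_TYPE[trade_type]


# Known inputs paired with the answers we expect every single night.
GOLDEN_CASES = [("buy", "B"), ("sell", "S"), ("market", "M"), ("limit", "L")]


def run_regression(convert, cases):
    """Return (input, expected, actual) for every case that no longer matches."""
    failures = []
    for given, expected in cases:
        try:
            actual = convert(given)
        except Exception as exc:  # a crash counts as a wrong answer
            actual = f"raised {type(exc).__name__}"
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


if __name__ == "__main__":
    failures = run_regression(convert_trade_type, GOLDEN_CASES)
    print("OK" if not failures else failures)
```

Because the same input always produces the same output, an empty failure list means the connector still does what it did yesterday. The day someone adds a ("post-only", "P") case, the suite fails until the mapping is updated.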

Debugging is a bitch, but a well-documented use case is probably enough to (eventually) track down precisely what the app is doing. (That might take 10 minutes or a month.) And when the customer’s connected systems change, we can usually get them to tell us what’s different.

Most AI applications work statistically, so their outputs change as new data arrives. (Not just new code.) And they are black boxes: we can’t precisely describe their decision logic or exactly define what will come out.

Corner cases keep appearing. Our autonomous cars were trained on 10M videos of pedestrians and 2M videos of bike riders, but we didn’t think about (train on) pedestrians walking their bicycles. People died.
Our LLM training data ages. Any current events bot trained on news and history before October 2023 will give an unsatisfactory response to prompts about the Middle East. Likewise, brand-new HR policies about unpaid leave and extended holidays aren’t reflected in our training data, or responses alternately dredge up old and new policies.
A neural network trained on millions of x-rays to identify colon cancer will miss new diseases and emerging symptoms, and may fail on new x-ray formats or resolutions. We have to keep reviewing our model and our test data.
Our AI platform vendor improves their algorithms. What looked right yesterday doesn’t today. They can’t clearly explain what happened.
Bad information arrives. The disinformation-industrial-complex starts spewing thousands of posts and articles claiming that NASA is hiding proof of climate change on Jupiter. Those get absorbed into the shared public corpus of our LLM vendor, and start popping up as facts. Or some of our super-top-secret proprietary corporate strategy docs get mixed in with our trouble tickets. The SEC wants to know why users of our support chatbot start seeing unreleased financials.

And so on. Highly complex systems that we don’t completely understand and aren’t deterministic will be even harder to support, maintain, and enhance over time.

Important to note: AI systems (especially LLMs) don’t actually have intelligence. They process what we give them, and return plausible responses based on statistics. They are guaranteed to get some things wrong: working from human inputs that are guaranteed to have errors/omissions/old data/ambiguous language. We’ve been mistakenly assigning intelligence to machines since the 1960s.

Especially with LLMs, we cherry-pick interesting responses that give us emotional validation. I keep talking with folks who’ve used generative apps to create images. On close questioning, they admit to inspecting 120 images and tuning their prompt 6 times before getting what they wanted. The intelligence is in our heads. But when we see the final selected image or poem out of context, we forget that a human plucked out the very best one for us and discarded the other 119.

So in addition to all of the code-driven/data-driven events which might force product changes, any “one-off” AI-assisted features will have serious content-driven testing challenges over time.

How do we know it’s still working? Do we have an experienced human to periodically generate outputs and evaluate/compare/verify? What if the SME who knows this subject is no longer with our company?
Do we have clear statistical goals? If this AI is supposed to filter security alerts better than humans, does it need to be 20% better or 10x better? Is a 1% error rate OK? Free LLM poem generators and autonomous taxis are different.
How do we accept complaints or support tickets? We may need the exact prompt or x-ray or rejected mortgage application to run through the system. Or a button reading “I THINK THIS ANSWER IS WRONG.” Plus someone who definitively knows what the right answers should be…
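A clear statistical goal can be checked mechanically once a human expert has labeled a sample of outputs as right or wrong. The sketch below is one possible approach, not a prescription: it uses the Wilson score interval (a standard choice, swapped in here for illustration) to ask whether the true error rate could plausibly exceed our target. The function names, the 95% confidence level (z = 1.96), and the sample counts in the usage note are all assumptions.

```python
import math


def wilson_upper_bound(errors: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for an observed error rate."""
    if n <= 0:
        raise ValueError("need at least one reviewed sample")
    if not 0 <= errors <= n:
        raise ValueError("errors must be between 0 and n")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center + half_width


def meets_error_target(errors: int, n: int, max_error_rate: float) -> bool:
    """True only if even the pessimistic estimate stays within the target."""
    return wilson_upper_bound(errors, n) <= max_error_rate


if __name__ == "__main__":
    for errors, n in ((3, 1000), (8, 1000), (0, 10)):
        print(errors, n, meets_error_target(errors, n, 0.01))
```

With a 1% target, 3 errors in 1,000 reviewed outputs passes, but 8 in 1,000 does not, even though the observed rate is 0.8%. Zero errors in 10 samples tells us almost nothing. Someone still has to supply the labels, which is the SME problem from the first question.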

My working assumption is that maintaining AI-based systems will be even harder than our deterministic applications. “I’m sorry, Dave. I’m afraid I can’t do that.”

Sound Bite

We have a heap of justifications and explanations for why this tiny customer-demanded item won’t cost us much and won’t need ongoing support/maintenance/debugging/enhancement/product management/investment. But we’re lying to ourselves.
Product and Engineering should keep a running list of what we’re actually supporting — and how frequently we’re interrupting “real work” to fix them — for that unavoidable executive conversation about R&D efficiency.
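That running list doesn't need tooling to get started. As a rough sketch (the `Interruption` record, its fields, and the `summarize` helper are hypothetical, and any entries we feed it would be our own data), a tally per one-off item might look like this:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interruption:
    item: str     # which one-off widget pulled us away
    day: date     # when it happened
    hours: float  # rough time lost, across everyone involved


def summarize(log):
    """Return (item, interruption_count, total_hours), biggest time sink first."""
    totals = defaultdict(lambda: [0, 0.0])
    for entry in log:
        totals[entry.item][0] += 1
        totals[entry.item][1] += entry.hours
    rows = [(item, count, hours) for item, (count, hours) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Invented sample entries, for illustration only.
    log = [
        Interruption("MegaCorp trade connector", date(2023, 11, 24), 6.0),
        Interruption("Acme report export", date(2023, 12, 1), 3.5),
        Interruption("MegaCorp trade connector", date(2023, 12, 9), 10.0),
    ]
    for item, count, hours in summarize(log):
        print(f"{item}: {count} interruptions, {hours} hours")
```

A spreadsheet works just as well. What matters is recording each interruption when it happens, so the totals exist before the executive conversation does.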

Originally published at https://www.mironov.com on December 18, 2023.