The Risks of Replatforming
(Or How Major Platform Migrations Really Happen)
Many companies have replatforming efforts underway. Architectures get old, new kinds of partners or integrations emerge, hard-to-maintain monolithic code gets broken into microservices, acquisitions force integration of dissimilar systems, etc. This is an essential part of the software product business, but fraught with poor assumptions and lack of experience/understanding. And the majority of replatforming and reimplementation efforts I’ve seen have failed.
IMHO the #1 source of failure is setting entirely wrong expectations. I typically hear this: “We’ll shift a majority of development effort onto our NextGen system, build a 100% plug-compatible replacement for LegacyTech by February 15th, and immediately migrate all customers to NextGen. That lets us minimize work on LegacyTech for the next few quarters and deploy all of Engineering onto NextGen right after launch.”
Magical thinking in almost every word. Here’s what probably happens:
- While we intend to move most of the development team to NextGen, we rediscover that 100% of our current revenue comes from customers on our current platform. There’s a huge list of commitments, urgently needed improvements, compliance issues and current-quarter revenue drivers that can’t be dropped. Engineering time-slices everyone (inefficient and demotivating). Or we assign a much smaller-than-planned team to NextGen…
Or we assign the “best” developers to NextGen, massively demotivating the developers “left behind” on revenue-producing LegacyTech… Or we hire an entirely new team to work on NextGen which not only demotivates everyone already on board, but requires them to teach the fresh hires everything we already know about our platform/processes/tools/ company/strategy/culture/market.
- We don’t have a complete spec for the current system. It’s 10 years old, the original architects are long gone, we’ve grown and grafted lots of obscure capabilities, and portions have not been touched for years. No one person actually knows how the entire system works. Some modules are too scary or delicate to touch.
But we nonetheless tell ourselves that reverse engineering requirements from the existing product should be easy. (“How hard could it be to just make the new product work just like the old one?”) So our project estimates are wildly optimistic. As we work, we keep unearthing surprising functionality and technical ugliness that pushes out a final delivery date.
- We should eliminate assorted features that aren’t widely used, don’t work well, address outdated needs, or otherwise weigh down the current version. This is a strategic product decision, since it will inevitably p*ss off a handful of long-time customers even though it’s the right thing to do. But we don’t think through these hard trade-offs in advance, since NextGen is imagined as a perfect substitute.
- Major systems projects never deliver everything all at once, magically, on a single day. In reality, we’ll get a v1.0 that covers 60% of use cases and existing features in February; then v1.1 that covers 70% in April; then 78% coverage in v1.35 in July; then 82% in November… eventually we creep up on 90%. (We never reach 100% compatibility.) And over-optimism means that February doesn’t happen until August — or possibly the following January — as schedules slip and we add more features to both versions.
- Current customers — the people paying our salaries while we replatform — hear about NextGen and immediately want all of the details. “I need a document right now that walks me through every step of the migration and every change in feature/function so that I can do detailed planning now for Q4 2022.” “We just licensed LegacyTech and expect 4+ years of support on that architecture. Our auditors take a full year to review major upgrades, so quick replacements are not an option.” “What about the third-party integration we put on top of your data extraction API? Can you promise 100% parameter-level compatibility and zero recoding?” “I have 8 urgent P1 tickets waiting to be fixed. You can’t reduce engineering support for production systems.” Executives panic about these entirely expected responses.
- Sales and Support take us at our word, and tell customers that NextGen will be 100% plug-compatible with LegacyTech: no feature changes, no data migrations, nothing new to learn, no incremental work for them, and includes unlimited free assistance. Promised for the original February delivery date. After all, that was our stated goal for the project. They eventually look foolish in front of customers, and product/engineering are the scapegoats.
- As replatforming takes much longer than planned (as it always does), there’s increasing pressure to add new features to the current product. Customers demand continuous improvement, and our competitors are not asleep. So we back-fit just one new feature into LegacyTech, then a second, then a third. Each of these has to be implemented in NextGen to keep 100% compatibility, further delaying its arrival.
- We discover that our six-month project has taken two years and isn’t yet complete. We’re always one last feature away from perfection, and the shiny tech choices we made two years ago are outdated. Internal pressure builds to cancel it (again) and start fresh (again). We fire some architects and product managers, hoping that it goes better next time. Replatforming becomes our codeword for catastrophe.
BTW, decades ago Bob Epstein (Sybase’s technical co-founder) taught me that a legacy system is any system that works. So let’s avoid labeling the software that earns 100% of our revenue as legacy tech. Instead, we’ll go with CurrentPlatform and UpgradedPlatform.
Coming back to the very beginning, we’ve chosen unrealistic objectives and set ourselves on the wrong path. Consider this alternative:
- Migrating our entire customer base of 1000 companies will take 2–3 years starting once we deliver UpgradedPlatform v1.0. We’ll be running two development efforts in parallel for at least 18 months for our six-month project (or 3 years for our 1-year project). Sorry, them’s the facts.
- We’ll migrate customers in well-defined cohorts. First, we set a date for all new customers to start on UpgradedPlatform v1; then a segment of current customers using only core/vanilla features; then those who need some handholding and prodding; then those with complex migrations, etc.
- Customer Success will create a dedicated migration assistance squad. We will also recruit and train some external development partners to take on bespoke integrations.
- We will undoubtedly have 40 or 50 long-standing customers who can’t migrate or refuse to move forward. Development’s savings from completely retiring CurrentPlatform may be $10M+/year, but only when we terminate support for the very last customer. So let’s agree now (instead of 3 years from now) that we’re willing to lose a few big customers and their ARR. (If not, let’s cancel this project before we start.)
- The KPI/success metric is a stepwise migration of customers. We hope to have 100% of new customers starting on UpgradedPlatform by September, and the easiest 30% of our installed base by December. Each new version will unlock another segment, so we’ll move another 10% per quarter as each improved release arrives.
- We’ll push back hard on any enhancements to CurrentPlatform but know that a few will be required. (And customers who need those will be unwilling to migrate until UpgradedPlatform has those exact improvements.) So we’ll escalate requests to the Exec Team and implement those later into UpgradedPlatform. These customers go to the end of the migration queue even though it costs us current-quarter revenue.
- Initially, we don’t make a big public deal about UpgradedPlatform — since it’s a subset of CurrentPlatform. Once we hit 35%, we’ll make a formal End-of-Support announcement for CurrentPlatform with 2 years’ notice. Hard stop on that date with no exceptions, no extensions, no excuses even when HugeCorp calls our CEO and complains. (Otherwise, we’ll be maintaining it for another decade and lose all of the intended efficiencies.)
- CurrentPlatform goes into maintenance-only mode when UpgradedPlatform hits 75% adoption: that’s when we stop all work except truly critical P1/system down issues. We redeploy the whole CurrentPlatform team to UpgradedPlatform at 95% and have a ceremonial burial for that codeline.
My long-time friend and collaborator Ron Lichty has a rule of thumb: start with the worst possible estimate for how long replatforming might take and then triple it. (Exception: if you have a monopoly position in the market, you can push off urgent feature requests for most of a year.)
To illustrate the cohort approach, let’s imagine we are market leaders in the highly competitive livestock management system (LMS) market. Our Hoofbeats application suite helps 5000 ranchers and farmers track their herds and submit assorted regulatory paperwork. But it’s 10 years old; runs on a second-tier cloud vendor; doesn’t scale well; is built in technologies that developers ran screaming years ago; and has mountains of tech debt resulting in brittleness that makes every new feature a time sink. We have to re-architect and re-platform even though most of the engineering team is working on expanding feature/function.
We’ve analyzed customer usage patterns and mapped out a series of releases that can handle increasing portions of the installed base. Since this already lives in the cloud, we can see what users are actually doing — and group them accordingly.
- 40% of customers are single-location American ranchers raising only cattle (cows), submitting paperwork via email/PDF, and using the basic version of our RFID animal tracking system. They use roughly one third of Hoofbeats’ total capabilities. If we ruthlessly prioritize, this could be v1.0 of HoofbeatsTNG. Early migrations will include some manual data transfers, so upgrading 2000 smaller customers will take Q3 and Q4.
- Another 15% are similar but raise cows and horses and bison. That requires report summaries by species, additional regulatory forms, and breed-specific medication recommendations. Let’s add those to v1.1 and migrate those ranchers next.
- At this point, we stop selling Hoofbeats-Current to new customers. (“As of November 1, all fresh proposals will be for HoofbeatsTNG. As of January 1, no new signed deals for Hoofbeats-Current. Sales teams get zero commission on any exceptions.”) And we formally announce our end-of-support plan for Hoofbeats-Current. (“Last day of product support for Current will be 30 October 2023. No extensions, no special side maintenance contracts.”)
- 20% of customers are outside the US, and each country has its own regulatory filings: 16% in Canada, Australia, UK, Brazil and Argentina — so we’ll limit v1.2 to those five countries. In addition to reporting, we must support local languages and currency, date format, and wider phone number/postal code fields.
- 5% of our customers (but 25% of total revenue) are large agricultural corporations with many locations and management hierarchies. So v1.3 will need multi-site reporting, complex forecasting tools, online form submissions, delegated user permissions, and integration with the top two agricultural accounting systems. Etc.
- The last 4% are customers we will actively drop or refuse to accommodate. A few raise llamas, some have integrations with obsolete financial software, and we’ll limit ourselves to 6 languages. We have clear executive agreement not to serve these groups going forward — no matter how much they complain. Our account teams will notify them at the appropriate time and recommend alternatives.
And so on. Thinking through the various cohorts and releases lets us build a realistic plan, sequence feature delivery, show stepwise progress, and respectfully notify customers.
The alternative is the worst of all possible worlds: 4800 customers on HoofbeatsTNG, 200 on Hoofbeats-Current, and maintaining both product streams for the next decade.
BTW, twenty demerits and a trip to the woodshed for anyone referring to any release as MVP.
Replatforming or re-architecting production software is a complex, expensive, risky proposition. Most of the challenges are around dealing with internal expectations and paying external customers, not the underlying technical details. So let’s not think of these as development-only exercises and assign sole responsibility to Engineering.
Originally posted at https://www.mironov.com/replatforming/