Last updated on 26.12.2020
Hello everyone!
It’s been a long time since the last post, so I’m really excited to brush the dust off my blog.
Today I’d like to talk about technical debt in backend systems, take a look at it from different perspectives and go through some examples of how one can reason about it.
This post is in no way a full reference of any sort, nor is it backed by academic research. It’s rather a collection of practical experiences from real-world Scala projects I have worked on. The main goal of the post is to provide a baseline for reasoning about technical debt to growing developers who already have some experience and are starting to take on more responsibilities, including management-related ones.
There will be a lot of “it depends” answers to the questions raised. Tech debt by its nature is a game of tradeoffs. And the value of being prepared is not to know exact answers, but to know how to go about selecting the best approach to the problem.
Some ideas presented here may sound obvious to seasoned developers. Still, I’m sure many people will either find valuable ideas here or be able to share their experience from similar cases.
With this out of the way, let’s begin!
Planning the principal payments
I really like comparing technical debt to a regular money loan. The business deliberately (or, with technical debt, sometimes inadvertently) takes on an obligation to “repay” it later.
As with a money loan, there’s interest the business has to pay. It comes in many forms: reduced system performance, a tax on developers’ productivity, an increased number of bugs, etc. Similar to perpetual bonds, there are tech debts that the business is fine to service indefinitely.
Another similarity is that you can go bankrupt on your tech debt. It happens when you’re only taking on new debt without repaying any principal. When your team can’t produce any new features and spends all of its time mitigating issues caused by tech debt — you’re bankrupt.
Other similarities can be named, but there’s also a very important difference. Financial debt is always very explicit, formalized and trivial to deal with, provided you have the resources. There’s some kind of contract you can always refer to, and multiple payment methods you can use to repay the principal.
On the other hand, technical debt requires deliberate and non-trivial work for it to be identified, tracked, prioritized, estimated and, eventually, paid off.
I’m pretty sure you have seen something like this: the technical team begs the business to allocate some resources to deal with technical debt. Finally, the business agrees to give developers some time to work on it. Very often the technical team is totally unprepared for such a gift and spends the resources inefficiently. I myself remember that feeling when, after a week of work, you are sure that something really important wasn’t done.
It happens for many reasons. To name a few: lack of tracking, miscategorization, wrong priorities. In my experience, you never have enough time to repay all technical debt, so properly planning principal payments is very important.
Don’t feel sorry for tech debt
Some managers have a really dangerous attitude towards tech debt. Their view is that each and every case of tech debt is the development team’s fault. This view can be expressed in different ways, but you will notice it. A couple of examples:
- “Why didn’t you do it properly last time?”
- “How much time do you need to rewrite this app so that there’s no more bugs and tech debt ever?”
I hope you get the point. Of course, there are cases of tech debt where the dev team is to blame, and we’ll see them below. But the vast majority of tech debt appears as a direct consequence of software evolving under tight time constraints.
The tradeoff between time-to-market and the amount of tech debt incurred is unavoidable. Great development teams just produce less debt than mediocre ones. But it’s never gonna be zero.
So, don’t tolerate being blamed for tech debt: such a toxic environment will just gradually destroy the relationships in your company.
Now let’s look at how you can repay your tech debt efficiently.
Prioritization
Prioritization sounds easy in theory: the higher the interest, the quicker you should deal with the loan. It’s much more complicated in practice though, because:
- interest on technical debt is sometimes hard to measure. When in doubt, speak with the business to clarify which issues are more of a drag for the company;
- unlike financial debt, interest on technical debt can drastically change over time. It depends on many things: feature roadmap, team composition, major customers and so on.
Still, it’s critical to get prioritization right. Because of the dynamic nature of tech debt interest, make sure to reevaluate the priorities when you pick up some old tasks. Some of them may have dropped below the threshold of what you should really care about.
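To make the “higher interest first” rule a bit more tangible, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DebtTicket` fields, the idea of expressing interest as hours lost per month, and the sample backlog are hypothetical, and the estimates still have to come from your team and the business.

```python
from dataclasses import dataclass


@dataclass
class DebtTicket:
    title: str
    interest_hours_per_month: float  # hypothetical estimate: time lost to this debt each month
    fix_cost_hours: float            # hypothetical estimate: time needed to repay the principal


def payback_months(ticket: DebtTicket) -> float:
    """Months until the fix pays for itself; infinite if the debt costs nothing."""
    if ticket.interest_hours_per_month <= 0:
        return float("inf")
    return ticket.fix_cost_hours / ticket.interest_hours_per_month


def prioritize(tickets):
    """Highest interest first; ties go to the fix that pays back sooner."""
    return sorted(
        tickets,
        key=lambda t: (-t.interest_hours_per_month, payback_months(t)),
    )


# Illustrative backlog with made-up numbers.
backlog = [
    DebtTicket("Flaky integration tests", 12, 40),
    DebtTicket("Migrate JSON library", 0, 80),
    DebtTicket("Leaky order model", 30, 120),
]

for ticket in prioritize(backlog):
    print(ticket.title, payback_months(ticket))
```

Since the interest changes over time, the numbers in such a list are only as good as the last time you reevaluated them.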
Tracking
The way you track tech debt really depends on the development process you have in your team.
Some teams have the luxury of regular time slots dedicated to technical debt. For example, one person-week a month. In this case, it’s beneficial to keep an up-to-date technical debt backlog, which is groomed, let’s say, every month.
There’s probably no need to go over the whole backlog every time, since there’s always a lot of low-priority “eternal” tickets. Having a properly prioritized “Top 5” is gonna be enough to ensure your tech debt fixing train is running at max speed.
Other teams have to periodically negotiate for time to be allocated to tech debt. You can sometimes run for half a year before you have time to address some of those piled up problems.
You still need to have a ticket for every tech debt entry that’s worth it (see “Categorization” section below). But your approach to grooming and prioritization should probably change. Having regular sessions sounds like a waste, since you don’t really know when you can work on these issues. And when the time comes, priorities can be totally different.
Therefore, you should probably evaluate your tech debt tickets only when you manage to secure a pool of resources to handle some of them. Also, with this kind of process it becomes much more important to use the opportunities for “on the fly” repayments.
“On the fly” principal repayments
The concept is nothing new, and probably almost every developer has done it in their career. Small refactoring within a feature ticket is a good example. It’s just too much overhead to make such issues go through the full life cycle — creating a ticket, assigning priority, etc. Therefore, it’s optimal to just do the change on the spot within a related feature ticket.
For this to go smoothly, be sure to discuss the mechanics with the team, like are there gonna be two separate pull requests or a single one? On-the-fly refactoring should be an explicit part of the process.
This is a good way to work with small technical debt, especially for teams that are strapped for resources to deal with it. But there are many more opportunities here.
Another valid way to push forward some tech debt is when the team starts developing a feature that is affected by tech debt in a major way. Let’s say you can currently deliver this feature in two weeks, because development and QA are slowed down by some legacy architectural weakness.
Before kicking off the feature development, it’s very important for the team to estimate how much time it would take to address the debt and develop the feature afterwards. If, for example, it’s 1.5 weeks for the debt and only 1 week for the feature, then it’s a reasonable plan you can offer to the business. It’s realistic to expect approval here, but only if repaying this debt would speed up delivery of some other features in the future.
The business likes numbers, but doesn’t like abstract talk. Showing a specific planned feature as an example of how much interest the company pays on the tech debt is a great tactic.
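The numbers from the example can be turned into a tiny calculation to bring to that conversation. This is only a sketch in Python: the function names are made up, and it assumes (which is rarely exactly true) that every future feature touching this area is slowed down by the same amount as the current one.

```python
import math


def upfront_delay_weeks(debt_fix_weeks, with_debt_weeks, after_fix_weeks):
    """Extra time the current feature takes if the debt is repaid first."""
    return debt_fix_weeks + after_fix_weeks - with_debt_weeks


def features_to_break_even(debt_fix_weeks, with_debt_weeks, after_fix_weeks):
    """Number of affected features after which the repayment has paid for itself.

    Assumes each affected feature saves the same amount of time.
    Returns None when repaying the debt saves nothing per feature.
    """
    saving_per_feature = with_debt_weeks - after_fix_weeks
    if saving_per_feature <= 0:
        return None
    return math.ceil(debt_fix_weeks / saving_per_feature)


# Numbers from the example: 2 weeks with the debt, 1.5 weeks to fix it, 1 week afterwards.
print(upfront_delay_weeks(1.5, 2, 1))     # 0.5
print(features_to_break_even(1.5, 2, 1))  # 2
```

In other words, the current feature ships half a week later, and under the stated assumption the investment is recovered by the second affected feature. That is exactly why approval depends on other features benefiting from the fix.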
Maintaining the on the fly threshold
There has to be some kind of reasonable threshold for the delivery time increase that is eligible for on-the-fly refactorings. Your team can agree that if addressing tech debt won’t increase delivery time by more than 25% (for example), this refactoring can be done on the fly.
Also be wary of how refactoring can affect the time the issue spends in QA. It probably won’t have a huge effect if your test suite is solid and covers the refactored functionality well. But if your tests are weak, discuss the additional regression testing amount with your QA team to get the full picture.
It’s really important not to let a seemingly simple refactoring get out of control. Backend systems are complex and problems are not always obvious before you jump into coding. If, after starting your work, you realize that you’re definitely exceeding the threshold by a lot, immediately discuss it with the team, including the person representing the business.
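Such a rule is simple enough to write down explicitly. A minimal Python sketch follows; the function name and the choice of days as the unit are assumptions, and 25% is just the example value from above.

```python
def fits_on_the_fly(feature_estimate_days, refactoring_estimate_days, max_increase=0.25):
    """True if the refactoring keeps the delivery time increase within the agreed threshold."""
    if feature_estimate_days <= 0:
        raise ValueError("feature estimate must be positive")
    increase = refactoring_estimate_days / feature_estimate_days
    return increase <= max_increase


print(fits_on_the_fly(8, 2))  # True: exactly 25% on top of the feature
print(fits_on_the_fly(8, 3))  # False: 37.5%, better to create a separate ticket
```

The same check is worth repeating with updated estimates once the work has started, since the refactoring part is usually the one that grows.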
Sometimes you can afford additional delay and, depending on how far you already are, it could be reasonable to push to the finish line. But most often you should take the disciplined approach: postpone the refactoring for better times and just deliver the feature. Unplanned refactoring that goes out of control does not only piss off the business — your teammates might also be unhappy with unexpected merge conflicts.
Categorization
Let’s take a look at different kinds of technical debt and how it can be handled. As I said — this list of categories is probably not exhaustive.
I also tried to group the categories by how they interact with the development process. This led to some of the most well-known categories of tech debt falling into the same group.
Critical race conditions or consistency errors
We’ll start with issues that sometimes get categorized as tech debt, but should actually be treated as bugs and have much higher priority. An example could be payment processing that doesn’t properly use the source queue and can lead to skipped payments or payments processed twice.
While it’s hard to imagine this classified as technical debt, I have seen it happen. If a project is under development and is yet to go live, such issues are hard to notice: due to the nature of concurrency they happen randomly and usually under at least some load. An inexperienced team without proper supervision can easily classify this as technical debt.
Such issues are almost never technical debt. The business usually expects critical parts of the system to work reliably, so if this sort of bug appears, there should be no problem getting resources to fix it as soon as possible.
It still can be viewed as an example of inadvertent tech debt in the sense that mistakes were made during system design, which now have to be fixed. But in my mind, such tech loans should never be taken in the first place, because the interest on them is extremely high. What I mean by that is that once the system is big enough, such issues cost many times more to fix than it would have cost to properly design the solution in the beginning.
How do you know it’s critical? You should ask the business. Even if you’re sure this is a critical thing, talking to the business may give you a better understanding of why. Also, you’ll have a much easier time getting it very high in the backlog.
Summary: such issues should be highlighted, tracked and fixed with highest priority. These are dangerous bugs and not tech debt, so they have to surface in the team’s issue tracker along with other features and bugs.
Non-critical bugs
There are also issues that are similar by nature but can indeed be classified as technical debt. If, for example, due to imperfect code 1 out of 1000 users gets their payment confirmation email duplicated, most businesses won’t be very concerned.
Some other examples:
- accidental duplicate/missing push notifications about non-essential events in the system;
- complex calculation algorithm sometimes charging client 1 cent less than it should;
- non-critical background service occasionally restarting due to hard-to-diagnose OOM exception;
- unhandled exception, which leads to HTTP endpoint sometimes answering 500 while it should have been a 4xx code.
Frankly, these are still bugs. But because the interest rate on this debt is very low, the business might not care enough to allocate daily work resources for it. Of course, you should first discuss the issue with business people, so as not to miss a critical bug. If the company doesn’t care, be prepared to put it in your tech debt backlog with the respective priority. Its day will come.
It shouldn’t be at the very bottom though. If this part of functionality is evolving/changing from time to time, these kinds of issues might be quite painful. It’s really hard to add something reliable on top of unreliable code.
So, take notice if you find yourself spending too much time adding new features there or introducing new tech debt to work around the existing one. This may mean that the interest you pay isn’t actually that low and the issue requires attention sooner.
Summary: while still being bugs, these issues can sometimes be categorized as tech debt and follow the respective lifecycle. Make sure you make correct assumptions about the cost of not fixing them sooner.
Outdated library/tool
There are several reasons to consider this kind of case as technical debt:
- some old library can lack essential features, which leads to the team re-implementing some of them. Often with bugs 😉
- more up to date alternatives can have much better performance. It sometimes matters;
- if it’s really old, over time it will be harder to find developers that want to work with it or have enough experience.
The Scala community evolves very rapidly, so examples are numerous. Switching from Future to one of the functional effect systems, replacing abandoned spray-json with a modern JSON library like circe, or dropping runtime DI in favor of a compile-time one: these are only a few of the real decision points I personally faced in my career.
Whether to count these as tech debt or not is often a very controversial topic. Sometimes the real driver for such migrations is that developers just want to work with something fancier and more modern. It’s hard for me to blame them, because most of us have such aspirations. But you have to be honest when categorizing such upgrades as technical debt.
If there’s no interest, there’s no debt either. If using spray-json doesn’t cost anything for the company, then upgrading to circe is not a tech debt issue. It can go to other categories, like “developer happiness”, “tech radar” or “engineering PR”, but not tech debt.
Is there maybe a way to objectively classify the most common upgrades here? I don’t think so, because in most cases “it depends”.
Some teams can really leverage a modern effect system like ZIO or Monix Task, so upgrading from Future can actually pay off long term. Performance of other teams will only suffer, because the new semantics are foreign to them.
Yes, spray-json is old and lacks a lot of features. But it works, so if your team doesn’t need a convenient JSON AST, investments in the upgrade might not have a desirable rate of return.
There’s another interesting aspect. If a company is on a mission to hire a lot of quality technical talent, it cares a lot about how the technical community perceives its software. Even if some old library doesn’t slow down the development, upgrading may still be a valid choice, because a modern and fancy tech stack will attract great developers. From this point of view, keeping that library around is actually technical debt that potentially reduces the number and quality of new hires.
Implementing upgrades
Another reason to carefully estimate potential benefit is that such upgrades are often very hard to execute:
- big refactoring will probably affect the ongoing work of any other developer in the team, leading to merge conflicts;
- libraries may have differences in semantics, undocumented in worst cases. Be prepared to thoroughly test the patch and quickly deliver hotfixes after deployment;
- such refactorings are very sensitive to any kind of pauses/delays, since every PR merged during such a pause has to be refactored afterwards.
Because of this nature, every option to split the upgrade into several consecutive steps should be considered. Being able to upgrade in several smaller bites is truly invaluable. Don’t just rush to refactor everything in one go.
Tools like Scala Steward follow the same principle and can reduce the pain of doing upgrades. They nudge you with pull requests for every new dependency release, and if you’re not super far behind, this often makes your upgrades small enough to even do on-the-fly.
Summary: Pending upgrades definitely have to be represented in the bug tracker. Some of them will rot there for a long time, because the company doesn’t pay interest on this “debt”.
Make sure to properly plan upgrade execution, aim to split into several steps.
Problems with repo/service/module structure or dependency graph
A lot of things mentioned in the previous section apply here as well:
- such refactorings are often massive and hard to execute;
- benefit is often too small to go through this kind of change.
Still, sometimes it just has to be done. Each case is unique, so it’s hard to come up with specific guidelines. Instead, I’ll just talk about a recent thing I experienced, which is one of the most painful examples of technical debt I have seen in my career: a small team developing several services in separate repositories.
This can sometimes happen for non-technical reasons, for example an internal conflict between parts of the team. Driven by this real background motivation, a team can easily come up with a “reasonable” justification for splitting the app into several services and, god forbid, repositories.
The resulting tech debt burden is huge, and the worst part is that there’s no actual benefit, except for maybe a delay in solving the underlying conflict. Interest paid by the company includes:
- increased operational complexity;
- multiple CI/CD pipelines, storage instances, etc.;
- code duplication;
- frequent bugs in communication protocol (unless there’s a shared protocol dependency that both repositories use);
- amount of work to deliver some features just explodes. It’s hard to get service boundaries right when the real driver behind the split is not the desire to have the best architecture possible.
So, if your team has fewer than, say, 7 developers, you should almost never have several source code repositories.
Also do not rush to have “microservices” ASAP. Keeping your monolith around has several advantages:
- your team will be as productive as possible, because there will be no overhead;
- you give your product some time to shape itself, so that you have more data to properly choose service boundaries when the time comes.
Summary: In many ways similar to library/tool upgrades. Unjustified “microservices” with split repositories are one of the worst kinds of technical debt a small team can have.
Leaky abstractions
Another example of a very painful debt that can form during natural evolution of a product. Or rather (and more often) unhealthily fast growth, overclocked by management.
Sometimes there’s just no time to properly design and implement a standalone model or business process. Development teams just have to build on top of existing model or reuse existing processes in this case. This inevitably leads to leaky domain abstractions.
If business doesn’t know that the feature was built this way, they might want to immediately evolve the functionality without paying respect to the enormous amount of tech debt introduced by such leaky abstractions.
So, the first thing you should do in this case is to make sure management knows how bad the hack is and how much pain it will bring in future. Hopefully with some numbers. If you’re persistent enough, who knows — you may be lucky to secure a resource window to deal with the debt in the nearest future.
As with other most critical cases of tech debt, the longer you go without repaying leaky abstraction debt, the higher the interest. Keep tickets for the most painful cases with high enough priority to annoy your managers. Otherwise, it can become a mess that’s too hard to untangle.
As for the tactics to implement the fix, they are pretty similar to other big refactorings (see two previous sections). It bears repeating: the best thing you can have is the ability to implement it in several consecutive steps.
Duplicated code
Duplicated code is not necessarily a bad thing.
Sometimes it just happens that two components run identical pieces of logic. They might work in totally different contexts, so the logic can diverge in the future. Hence, it may be correct to not DRY this spot, even if the code is identical at this point in time.
Often duplicated code indicates problems with module and/or dependency graph (see the section above). You might have to address those first, before you’re able to DRY.
Duplicated pieces of code can only be considered technical debt if this code has to change. When fixing bugs or adding features, duplicates incur the additional risk of not being updated consistently. From a “tech finance” standpoint, it only makes sense to DRY relatively “hot” code blocks, that is, blocks that have been changing recently.
Here’s an interesting take on how you can automate tech debt interest rate estimations with your VCS:
Of course, if the code hasn’t changed for a long time, but you have ambitious plans to evolve this part of the system, getting rid of duplicates can still be a good investment.
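As a rough illustration of the VCS idea (one possible approach, sketched in Python), you can count how many recent commits touched each file and only look at duplicates living in frequently changed files. The function names, the six-month window, the `min_changes` cutoff and the sample paths are all assumptions; the only external thing used is the standard `git log --since=... --name-only --pretty=format:` output, which lists changed paths per commit separated by blank lines.

```python
import subprocess
from collections import Counter


def count_changes(git_log_output: str) -> Counter:
    """Count how many commits touched each path in `git log --name-only` output."""
    return Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )


def recent_change_counts(repo_path=".", since="6 months ago") -> Counter:
    """Run git in the given repository and count changes per file (requires git)."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        check=True, capture_output=True, text=True,
    ).stdout
    return count_changes(output)


def hot_duplicates(duplicated_paths, change_counts, min_changes=3):
    """Keep only the duplicated files that changed often enough to be worth DRYing."""
    return [p for p in duplicated_paths if change_counts[p] >= min_changes]


# Made-up log output: three commits, one path per line, blank line between commits.
sample_log = """src/billing/Invoice.scala
src/billing/Refund.scala

src/billing/Invoice.scala

src/billing/Invoice.scala
src/reports/Export.scala
"""

counts = count_changes(sample_log)
print(hot_duplicates(["src/billing/Invoice.scala", "src/reports/Export.scala"], counts))
```

On a real repository you would call `recent_change_counts()` instead of feeding in `sample_log`. The change count is only a proxy for the interest rate, so planned work on a currently “cold” area still has to be factored in by hand.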
Tracking duplicates
In my experience, code duplicate issues are usually too small to have a dedicated task in the bug tracker. So, I prefer to leave a “todo” comment and refactor it “on-the-fly” while working on some related feature.
There are limits and tradeoffs here, of course. If some complex logic being duplicated causes regular pain, bugs and angry customers — it’s worth its own ticket and priority for sure.
Make sure to leverage modern tooling to detect and DRY small duplicates on the spot. For example, IntelliJ IDEA offers a fantastic duplicate detection tool, do not sleep on that one!
Summary: Not all duplicates are bad, and even the bad ones often aren’t worth DRYing. Rate of change and future evolution plans are key. Do not go crazy with tickets: depending on the process, most of the duplicates can just be marked with a “todo” comment.
Scary/ugly/fragile/legacy code
From evaluation standpoint these kinds of issues are usually very similar to code duplicates. Similarity is well summarized by this picture:
So, treat it accordingly. Same things can be said about tracking — only serious and painful cases deserve an explicit backlog slot.
One important note: once you decide to refactor a bunch of fragile or unclear code, make sure you have an extensive test suite first. This way you can feel relatively safe about your refactoring.
Lack of tests and/or documentation
These are the usual veterans of any tech debt backlog, rotting there for years. And there are reasons:
- if something works fine and doesn’t need to change — there’s zero value in adding new tests;
- unless you’re an API provider, documentation hardly brings any value. It can be useful for onboarding, but it also gets outdated very quickly.
There are exceptions of course. Some critical calculations might deserve a wiki page for non-technical people to refer to. But overall, these should have the lowest priority, compared to all the other ones mentioned.
Hopefully it’s clear that I’m not advocating delivering features without tests. Having tests is essential. Here I’m talking specifically about already delivered and stable features that don’t have good test coverage for some reason.
Other categories
There are other categories, which we’re not gonna look at that closely. Some of them are very similar to what we already discussed, like problems with build or performance bottlenecks. You should now be able to derive your approach to them. Some are too complex and deserve a topic of their own, like organizational tech debt, when company structure is what drags the progress.
As a last word: sometimes people just need to have some fun in their work. So even when some tech debt issue doesn’t look worth fixing on paper, if developers are very passionate about it, the resulting morale boost can pay for it. If you’re not in a desperate crunch and can afford it, make such exceptions from time to time.
Conclusion
That’s gonna do it for this post. I hope it makes some impact.
If you’re an aspiring developer, aiming to grow and take on more responsibilities, you should now be armed to effectively approach tech debt issues. And if you’re a seasoned team lead, maybe you had a nice trip down the memory lane of learning all of this through your own experience.
Maybe you have something to add or even disagree with! Feel free to spark a discussion in the comments or on twitter.
Keep your tech debt in check and have a great day!
P. S. Thanks to Artsiom and Daniel who reviewed this post and provided valuable feedback!