Are you happy with crappy estimates? — Patterns and practices for better estimates

Estimates are a tricky thing. There are a lot of good estimation techniques, but no technique is perfect. There is also a lot of polarized debate about software estimation. I’ll try to avoid that and offer a systemic view of the topic.

Here are my top 12 patterns and principles for better estimates in software projects and organizations:

💡1💡 Start with why and address stakeholders' needs

Don’t estimate if you don’t understand why it is useful. And don’t get stuck with the idea that all estimates are useless. I know that it’s trendy nowadays, but such dogmatism hinders your ability to see and sense what kind of signals, feedback loops, and heuristics the other stakeholders need for prudent decision-making and better collaboration.

When building software, the most common motivations to estimate things relate to pricing, feasibility of investment, expectation management, coordination of work, building and maintaining trust, and allocation of resources (including but not limited to funding and budgeting). Once you know the reasons to estimate, the next step is to figure out the most efficient and reliable ways to fulfill the needs within a reasonable amount of time and effort.

💡2💡The realistic estimate of the value is more important than the estimate of costs (including but not limited to work estimates)

An estimate of the work amount is just one part of all costs. In the end, you are interested in the expected net value, and in that calculation, work amount estimates are one of the least important components.

Estimating the value is hard and tricky. I won’t dive into it in this post. Nonetheless, when evaluating what is a good enough estimate case by case, the main criterion for this evaluation is the estimated value. If the value of the whole is estimated in millions, it probably does not make sense to estimate the work amount in days, let alone hours.

An estimation error in value tends to cause more damage and tends to be harder to manage and quantify than an error in work estimates. If it turns out that your work estimates were overly optimistic, you can always try to cut the scope or reduce complexity in another way. In addition, there are usually early warning signs about this. If it turns out that your estimate of the size of the customer base was overly optimistic, there is less you can do, and the possible corrective actions tend to be more expensive and uncertain. Finally, the warning signs are harder to accept and believe, and you get them later than the warning signs of overly optimistic work estimates. It’s just too easy and pleasing to deceive oneself about the value and repeat, “They will come once we have built all these features”.
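
To make the asymmetry concrete, here is a minimal sketch of the net value calculation. All figures are hypothetical and chosen only for illustration, and the `net_value` helper is my own naming, not an established formula from any tool.

```python
def net_value(value: float, work_cost: float, other_costs: float) -> float:
    """Expected net value: estimated value minus all estimated costs."""
    return value - work_cost - other_costs


# Hypothetical baseline figures, chosen only for illustration.
baseline = net_value(value=2_000_000, work_cost=200_000, other_costs=300_000)

# Scenario 1: the work turns out to cost 50% more than estimated.
work_overrun = net_value(value=2_000_000, work_cost=300_000, other_costs=300_000)

# Scenario 2: the value turns out to be 50% lower than estimated.
value_shortfall = net_value(value=1_000_000, work_cost=200_000, other_costs=300_000)

print(baseline - work_overrun)     # loss caused by the work estimation error
print(baseline - value_shortfall)  # loss caused by the value estimation error
```

With these assumed numbers, the same relative error costs ten times more when it hits the value estimate than when it hits the work estimate.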

Coping with estimation errors

💡3💡All estimates have errors and they are always somewhat unreliable, and that’s fine

It does not matter how you estimate; the estimates are unreliable.

It sucks to take risks and make decisions based on unreliable data, but so it goes. Nonetheless, it’s wise to figure out ways to minimize the damage caused by unavoidable estimation errors.

For sure, you can criticize estimates in various ways, but that criticism does not address, say, a salesperson’s pains with finding a price that a traditional and conservative customer might accept and understand.

I keep telling stakeholders and customers that in the very early stage, you MUST prepare for an error on the order of +300% (i.e., the actual work amount is 4 times the estimate). It can be bigger, but that +300% is the bare minimum from a risk management point of view in the early stage.

If you cannot tolerate that big error and risk, you have two prudent and good options:

Don’t do it at all, because you have a high chance of failure.
Cut the scope of the initial version to ensure that you can deliver something useful within the available budget.
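
The arithmetic behind this rule of thumb can be sketched as follows. The function names and the example figures (a 40-day estimate, a 100-day budget) are hypothetical; only the +300% margin comes from the text above.

```python
def early_stage_range(estimate_days: float, error_pct: float = 300.0) -> tuple[float, float]:
    """Return (estimate, worst case), where worst case = estimate * (1 + error_pct / 100)."""
    worst_case = estimate_days * (1 + error_pct / 100)
    return estimate_days, worst_case


def fits_budget(estimate_days: float, budget_days: float, error_pct: float = 300.0) -> bool:
    """True if even the worst case stays within the available budget."""
    _, worst_case = early_stage_range(estimate_days, error_pct)
    return worst_case <= budget_days


# Hypothetical example: a 40-day ballpark estimate and a 100-day budget.
low, high = early_stage_range(40)
print(low, high)             # 40 160.0
print(fits_budget(40, 100))  # False: the worst case (160 days) exceeds the budget
print(fits_budget(25, 100))  # True: after cutting scope, the worst case is 100 days
```

In this sketch, cutting the initial scope from 40 to 25 estimated days is what brings the worst case back within the budget.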

💡4💡A risk of being wrong is a bad reason not to estimate and predict the results.

The risk of significant estimation errors can be a reason to seek more evidence. The primary goal of estimations and predictions is not to be right but to make as prudent decisions as possible within available evidence and time.

I repeat: The only reason to estimate anything is to help decision-makers make more prudent decisions and collaborate more efficiently. It’s possible that some stakeholders ask for things that won’t help them reach their goals. It’s also possible that you have not understood their goals and needs when their estimation requests seem to make no sense.

Sometimes, a large error margin is a problem. Sometimes, it is not. And sometimes, the problem is that the error margin is not stated explicitly at all. It is surprisingly hard for some to accept that forecasting and estimating are not exact sciences and that the magnitude of the error is as relevant as the estimate itself. Estimates that prove to be wrong early can work as useful warning signs.

Then again, don’t overdo the estimation part of planning. Sometimes, increased certainty is not worth the time and money. Sometimes, you just don’t have enough evidence for more precise and accurate estimates, and all attempts to make them better, in fact, just make them more delusive. When estimating, it’s important to be aware of what you know and what you are guessing; otherwise, you cannot tell well-founded estimates from delusive ones.

Finally, the least expensive way to improve the chances of success and of prudent decisions is often experimentation with minimal investment. In that way, you get more evidence, and you can estimate a bit better with a slightly smaller expected estimation error.

💡5💡Compensate for estimation errors by using many techniques

There are two main approaches to estimates and estimation: subjective, expert’s opinion-based and objective, evidence-based heuristics and metrics. This applies to both costs and value. Both are useful when applied properly, and neither should be relied on blindly.

There are techniques that you could put in either of these classes, and some techniques fit neither one. E.g., “community voting as a way to set the value of an open source library feature” fits both categories. Asking for an estimate from ChatGPT fits neither of the categories (and should not be used alone).

Evidence-based heuristics and metrics (about lead time, cycle time, etc.) give more accurate estimates and forecasts than an expert’s opinion-based approach if there is a good amount of relevant data available. However, the evidence-based metrics and heuristics won’t work if your dataset is inadequate.

A small but diverse set of heuristics, metrics, and methods for estimation is the most efficient way to maximize the reliability of the estimate. Using many diverse methods compensates for the errors and biases in the individual methods but also requires more work and time.
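
One simple way to combine several methods is to look at the median and the spread of their results. This is only a sketch under my own assumptions: the method names and the figures are hypothetical, and taking the median is just one of many possible ways to combine estimates.

```python
from statistics import median


def combine_estimates(estimates: dict[str, float]) -> dict[str, float]:
    """Summarize estimates from several methods by their median and spread."""
    values = sorted(estimates.values())
    return {
        "low": values[0],
        "median": median(values),
        "high": values[-1],
        # A large ratio between high and low signals that the methods disagree.
        "spread_ratio": values[-1] / values[0],
    }


# Hypothetical estimates (in working days) from three different methods.
summary = combine_estimates({
    "expert_opinion": 30,
    "historical_cycle_time": 45,
    "comparison_to_similar_project": 60,
})
print(summary)
```

A large spread between the methods is useful information in itself: it suggests that you know less than any single number implies.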

In practice, you need to consider the trade-off between the required effort and error tolerance on a case-by-case basis. This is especially true when estimating an idea's value.

When navigating toward the goal

💡6💡 “How much work it requires” and “how much work is still left” are often sub-optimal questions.

They’re bad if you don’t have a good understanding of what is needed and what is not, and even worse if you think that you know but you don’t.

Usually, you have correctly identified most of the things that are needed and important, but you also have a lot of assumptions that just make the solution needlessly complex and laborious.

Instead of these questions, consider opening with “When might we see some results?” and maybe adding “And what kind of results are needed first to ensure that the approach is working?” You may also need to explain the intention behind the question. If (or better, once) you learn that the first results weren’t yet sufficient, ask “When might we see more results?”, and so forth. Rinse and repeat until you have found the simplest, good enough solution.

The exact wording of the questions is of secondary importance. The goal behind this opening is to find the simplest possible thing that could work and that the team is willing to commit to, in order to find out if it is good enough.

If you ask “how much” and the estimated thing is vague and big, the imagined scope is the “safe one”, and there is no commitment to deliver it within that scope. Quite the contrary, the estimators often think carefully about how likely it is that the scope changes and expands. If things go wrong, there is an increased risk of blaming, distrust, and micromanagement.

Most agile development teams I’ve worked with are still stuck in the age-old idea that “well-known and validated requirements must precede work estimates”. In practice, that doesn’t make sense. If you don’t know what is needed, you need to figure out the shortest path to find out and deliver what is needed. If you cannot estimate what the shortest path is just because the requirements aren’t well-known, clear, and validated, the only way to find it is luck.

💡7💡Don’t waste much time estimating the work amounts of exceptionally valuable ideas.

It’s probably wise to do them anyway. Instead, use your time to estimate what is good enough and slice the ideas so that the first value-adding results are delivered early (see also 💡4💡).

There are some exceptions to this rule of thumb. For instance, if you suspect that even the simplest possible solution might be too expensive or not doable within a reasonable time frame, a careful work estimate as a sanity check is wise.

Also, if everything seems to be highly valuable, you probably just don’t truly know what is valuable, and you should seek more evidence about the value of the different ideas. Using work amount as a tiebreaker is rarely the optimal solution.

Improving metrics

💡8💡Classify and model data to make evidence-based metrics more useful for forecasting and easier to grasp for all

If the items in the backlog are too different, you cannot use the average and median values of metrics such as lead time and cycle time for forecasting because the variance and expected error are too big. In this case, finding a good way to classify work is helpful. Often, I start with three main classes: “new features”, “bugs and ad hocs”, and “complicated special cases”. A good enough classification scheme can be (and often is) more complex, though. For example, the following classes have been useful for me: small bug, tricky bug, small change, tricky change, requires collaboration with another team, and integration. As an example, the cycle time for “new features” might be 1–3 weeks, for “bugs and ad hocs” a day or so, and for “complicated special cases” a few months.
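
A classification like this takes only a few lines to compute from the history of finished work items. The sketch below uses a hypothetical, hand-made history and my own function name; in practice, the data would come from your issue tracker.

```python
from collections import defaultdict
from statistics import median

# Hypothetical history of finished work items: (class, cycle time in days).
history = [
    ("new feature", 6), ("new feature", 9), ("new feature", 14), ("new feature", 11),
    ("bug or ad hoc", 1), ("bug or ad hoc", 1), ("bug or ad hoc", 2), ("bug or ad hoc", 3),
    ("complicated special case", 45), ("complicated special case", 80),
    ("complicated special case", 60),
]


def cycle_time_by_class(items):
    """Group cycle times by work class and return median, min, and max per class."""
    grouped = defaultdict(list)
    for work_class, days in items:
        grouped[work_class].append(days)
    return {
        work_class: {"median": median(days), "min": min(days), "max": max(days)}
        for work_class, days in grouped.items()
    }


# The pooled median hides the differences between the classes.
overall_median = median(days for _, days in history)
per_class = cycle_time_by_class(history)

print("all items:", overall_median)
for work_class, stats in per_class.items():
    print(work_class, stats)
```

With this made-up data, the pooled median is 9 days, which describes neither the bugs (a day or two) nor the special cases (months). The per-class figures are what you can actually use as a rule of thumb.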

Of course, the numbers and heuristics vary from team to team and from project to project, but it is relatively easy to collect the data needed for rules of thumb and ensure that they remain valid. Once you have found good enough heuristics for the needs of different stakeholders, you’ll need to spend only a minimal amount of scarce shared time and cognitive capacity on estimates.

These kinds of classifications are just an example of how you might make the data more useful for forecasting. Generally speaking, you should consider ways to make forecasts and predictions easier to grasp for those who are not software professionals.

When things go well, a burndown chart is an excellent visualization tool. And there are many others. However, every visualization has its limits. A burndown chart is most likely not the right tool to visualize what is slowing you down, and it may give a false impression about the last mile of a long and complex software project. Whatever approach you use to visualize the big picture, you need to balance ease of understanding against oversimplification.

💡9💡Start with something simple and intuitive. Improve if it sucks or just doesn’t work.

For instance, if it’s natural to give a ballpark estimate in person-workdays and everyone is satisfied and happy with them, that’s fine. If planning poker is a natural and value-adding part of sprint planning (that’s rare but possible), that’s fine. These estimates are probably rather crappy and inaccurate, but so what?

You are not seeking the truth, are you? Rather, you seek (or at least you should seek) good enough means to fulfill other stakeholders' needs relating to prudent decision-making and efficient collaboration (see 💡1💡 and 💡4💡). If you cannot share their ideals or ideas for reaching the goal, try to offer something better and convince them that another way would work better. If you cannot, well, then minimize the damage and adapt. That’s realism. Your chances of succeeding with a change are better if you propose something simple and intuitive that is easy to grasp and effortless. If you need to start with an hour-long lecture, well, it is probably not that easy, and it will take time.

Estimates and forecasts based on hard evidence and statistics may feel like “the right way”, but they also require substantial investment in data collection, number crunching, and modelling the work. In fact, collecting the data needed for all kinds of metrics and dashboards easily becomes a big burden and a source of waste. If you know why you estimate and forecast things, all this should be common sense. Otherwise, return to the basics: what kind of predictions and estimates improve the chances of success, and based on what? E.g., do we even share an idea of what it means to succeed in terms of work and proof of progress?

Improving systems conditions

💡10💡To reduce the impact of estimation errors, minimize dependencies between teams and avoid unnecessary constraints (e.g., hard deadlines without justification).

For instance, in a multi-team setup with many dependencies between the teams, the delays tend to cascade in unpredictable ways. In my experience, it’s unrealistic to think that you could handle cascading delays just by “better coordination and upfront planning”. If there can be delays in the critical part of delivery flow, you’ll have critical delays sooner or later. Just accept the realities, and minimize the damage due to the delays instead.

Say, two teams need to build an integration between two services. To increase resilience against delays, consider ways to mitigate implementation order-related constraints (e.g., the service part of an integration must be implemented before the client part). You might be able to do so, for instance, with a combination of feature flags, automated tests, transferable schemas, faked interfaces, and some planning together.

💡11💡A high amount of work in progress (WIP) decreases the accuracy of predictions and forecasts.

It does not matter what kind of estimation techniques you use. High WIP has a disastrous impact on subjective, expert’s opinion-based estimates, mainly because it is hard to take into account the impact of task switching and the increased need for coordination.

The impact is slightly smaller on estimates based on historical data if the amount of WIP is stable. On the other hand, an unpredictably oscillating amount of WIP tends to make evidence-based, statistical metrics next to useless for forecasting purposes.

A low level of WIP is a prerequisite if you wish to have a predictable software delivery flow.

💡12💡A low level of trust and psychological safety leads to fabricated numbers and false optimism

Subjective, expert opinion-based estimates don’t work at all if the experts are afraid to say what they think. Evidence-based metrics are no better. People are really good at making numbers look good if that is what it takes to save their ass.

I don’t know how much trust and psychological safety you need for reliable estimates. When I notice that the way progress is expressed does not match reality, and that it is needed as a facade that hides problems, I know that in this organization estimates are just tokens in the political game. Even if you refuse to play that game, you need to consider it to get anything done. Corporate politics is not necessarily bad, nor is it good. It is just a feature of big organizations.

🎇 All images are generated with Generative AI (DALLE-4) using Microsoft Bing image creator.

📝 Originally published on Medium.com on 25.3.2024 by the author.

Ari-Pekka Lappi