How a lack of pre-analysis doubles or triples the cost of business rule extraction from legacy systems

Iceberg decorative

Legacy platform modernization is one of the top IT priorities for most business organizations and a key strategy for staying relevant in the modern market. However, legacy modernization is typically a large and complex initiative. One reason for that complexity is that it usually involves business rule extraction from legacy systems.

Business rule extraction from legacy mainframe systems has always been a complex task: it involves many unknowns, and the accuracy achieved is traditionally low. This makes it difficult to estimate, with reasonable confidence, what it takes to perform business rule extraction.

This potential for inaccurate estimates makes it difficult for organizations to budget the overall legacy modernization work effectively.

The estimates are frequently wrong, causing the actual cost of execution to deviate from the budgeted cost by significant margins. Such errors hit both the budget and the bottom line in big ways.

Why is estimating business rule extraction from legacy systems difficult, and how can we resolve it?

We will talk about that in this paper.

Why is estimating business rule extraction from legacy mainframe systems difficult?

Legacy systems (especially mainframe systems) are complex by nature. These platforms generally evolve over time into large “monoliths” that contain thousands of business rules buried in “spaghetti” code.

What’s more, many of these business rules are not industry-standard but unique to how the organization runs its business.

While this uniqueness makes the rules a critical business asset for the organization, its “secret sauce”, the way the rules are architected in legacy systems creates the biggest hindrance to modernizing those systems.

Put simply, it is extremely difficult to extract these rules from existing legacy mainframe systems in order to re-platform, re-write, or re-architect them for modernization.

What adds to the pain is the lack of documentation and knowledge retention for these legacy systems. They have been running, and repeatedly patched, for decades. The documentation has gone out of date, the SMEs have retired or passed away, and the new workforce has no idea how the system works.

What option do we have for estimating business rule extraction?

But then what are we left with if we have to refactor such an application?

What are we supposed to do if we want to transform such an application into something new that is technologically more relevant in today’s world, without disrupting the running business?

And, how do we estimate the work involved?

The answer to the above questions is “Bottom-up technical code analysis to extract hidden business rules” or in other words, “business rule extraction through source code analysis”!

No, that does not sound like fun, does it? But that is what we are left with, since other options are practically non-existent in any meaningful way.

What are the challenges in that option?

So what are the challenges in the approach to business rule extraction through source code analysis? Let’s talk through those!

Business rule extraction through code analysis is not a single task but consists of several layers of work that have to happen in sequence, as follows (a minimal sketch of the pipeline appears after the list) —

  • Understanding and slicing the whole monolithic system into contextual business-process flows (often called “Domain-Driven Decomposition”)
  • Inventorying the technical components in each of those contextual flows
  • Technical rules extraction from those components
  • Rationalization of the extracted technical rules
  • Translation of rationalized technical rules into business rules in plain English language and in the right functional context
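
To make that sequence concrete, below is a minimal Python sketch of the pipeline. The stage names, inputs, and outputs are illustrative assumptions, not a prescribed implementation; real extraction work plugs tooling and manual analysis into each stage.

```python
# Minimal sketch of the extraction pipeline as sequential stages.
# Stage names and data shapes are illustrative assumptions only.

def decompose_into_flows(system_inventory):
    """Slice the monolith into contextual business-process flows."""
    return []  # placeholder: domain-driven decomposition happens here

def inventory_components(flow):
    """List the technical components participating in one flow."""
    return []  # placeholder: programs, copybooks, jobs, etc.

def extract_technical_rules(component):
    """Pull raw technical rules (conditions, branches) out of a component."""
    return []  # placeholder: static analysis plus manual reading

def rationalize(technical_rules):
    """De-duplicate and consolidate the extracted technical rules."""
    return technical_rules  # placeholder: merge duplicates, drop dead rules

def translate_to_business(rationalized_rules, functional_context):
    """Express rationalized rules as plain-English business rules in context."""
    return [(functional_context, rule) for rule in rationalized_rules]

def run_extraction(system_inventory):
    business_rules = []
    for flow in decompose_into_flows(system_inventory):
        for component in inventory_components(flow):
            rules = rationalize(extract_technical_rules(component))
            business_rules.extend(translate_to_business(rules, flow))
    return business_rules
```

Note that the stages are strictly ordered: rules extracted from a component only become useful once they are rationalized and placed back into the functional context of the flow they came from.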

Extracting business rules from a legacy system by following the above steps is mostly a manual and tedious exercise, for a number of reasons —

There are no industry-standard tools that can do the job automatically in a meaningful way (even though many available commercial solutions claim to do that).

What’s more, skills in most of the legacy technologies are disappearing fast, so the technical knowledge risk keeps rising.

What makes it even worse is the fact that developers and technical teams mostly work in silos and do not possess the full big picture of these large monolithic legacy systems. This makes it challenging to connect the silos and draw the appropriate functional context from the atomic rules extracted locally across the system.

Thus, for a legacy mainframe system with the above drawbacks, how do we build the scope and estimate what it takes to extract all relevant business rules to help modernize or transform the system without risking any business processes?

What is the traditional approach to mitigating those challenges, and what is the impact of that approach?

The traditional approach is to combine the three steps below (a small worked example follows the list) —

  • Build up an inventory of modules (or, program components) to arrive at the ‘Component count’ in the scope of analysis
  • Derive the ‘Average lines of code’ (LOC) per component across the inventory
  • Then combine the ‘Component count’ and ‘Average lines of code (LOC) per component’ to arrive at the necessary estimates.
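
For illustration, here is what that traditional calculation boils down to. The component count, average LOC, and analyst productivity figures below are hypothetical.

```python
# Traditional component-count / LOC based estimate (hypothetical numbers).
component_count = 1200             # modules found in the inventory
avg_loc_per_component = 800        # average lines of code per component
loc_analyzed_per_person_day = 500  # assumed analyst productivity

total_loc = component_count * avg_loc_per_component           # 960,000 LOC
effort_person_days = total_loc / loc_analyzed_per_person_day  # 1,920 person-days

print(f"LOC in scope: {total_loc:,}; estimated effort: {effort_person_days:,.0f} person-days")
```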

However, this approach is far too linear and considers only the tip of the iceberg. A host of other complex variables and unknowns quickly expose its futility.

Those complexities can easily throw the component-count/LOC based estimates into disarray by introducing astronomical margins of error, potentially leading to severe underestimation of the work at hand.

What’s more, such astronomical margins of error often get realized at the eleventh hour or at the point of no return.

This can play havoc with the fate of the overall program. Based on estimates committed through the LOC based methodology, the execution team simply would not have a clue about what they are jumping into.

In effect, the actual cost of business rule extraction can end up double or triple what the traditional LOC based estimation approach predicts.

What is the right approach to business rule extraction, and how do we get there?

That is why pre-analysis of the legacy system is so relevant and important for producing a more accurate estimate. While this pre-analysis may seem like additional overhead that adds cost and time, it can be the savior from a much bigger disaster.

But how much pre-analysis is needed to give sufficient insight to reasonably estimate business rule extraction from a legacy system? Are there any frameworks or guidelines that can help?

That is what this paper addresses. It discusses four key dimensions on which the pre-analysis should be based to provide the insights needed to estimate with reasonable accuracy.

It is written from first-hand experience. My hope is that it will serve as guardrails for the pre-analysis strategy when estimating business rule extraction from legacy mainframe systems.

What are the dimensions for pre-analysis?

While the list is not exhaustive, the four areas below are the ones you should keep in mind during pre-analysis; they will effectively help you understand and estimate the business rule extraction exercise for a given legacy system.

1. Long nested Call Chains (Long chain of Calling/Called dependencies):

Long call chains decorative

Mainframe legacy applications traditionally employ heavy program call chains that can go more than 10 components deep.  

Any or all of these “called” components can host business rules.

Also, there would definitely be multiple call chains.

What’s more, the highly intertwined “spaghetti” nature of Cobol code keeps many of these ‘CALL’ statements under the covers, encapsulated in procedural copybooks or expressed through other linkage mechanisms such as ENTRY statements.

Additionally, the called modules may reside in multiple source code libraries.
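
Pre-analysis can surface much of this before estimates are committed. The sketch below, which assumes source members are exported as plain-text files in one or more library directories, scans for static CALL statements to build a rough call graph and measure chain depth. The pattern is a simplification: dynamic calls through data names, ENTRY aliases, and calls buried in procedural copybooks need additional passes.

```python
import os
import re
from collections import defaultdict

# Static CALL 'PROGRAM' pattern; a deliberate simplification of real Cobol syntax.
CALL_PATTERN = re.compile(r"\bCALL\s+'([A-Z0-9-]+)'", re.IGNORECASE)

def build_call_graph(library_dirs):
    """Scan exported source libraries; return {caller: set of called modules}."""
    graph = defaultdict(set)
    for library in library_dirs:
        for name in os.listdir(library):
            path = os.path.join(library, name)
            if not os.path.isfile(path):
                continue
            caller = os.path.splitext(name)[0].upper()
            with open(path, errors="ignore") as src:
                text = src.read()
            graph[caller].update(m.upper() for m in CALL_PATTERN.findall(text))
    return graph

def max_chain_depth(graph, module, seen=frozenset()):
    """Length of the longest static call chain starting at `module`."""
    if module in seen:                     # guard against cyclic call chains
        return 0
    callees = graph.get(module, set())
    if not callees:
        return 1
    return 1 + max(max_chain_depth(graph, c, seen | {module}) for c in callees)
```

Even this crude graph shows how deep the chains go and how many called components a naive top-level inventory would miss.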

With so many complexities to consider, pre-analysis becomes critical. Without it, components can be missed during inventorying, which can lead to severe underestimation of the work.

Imagine discovering a large block of additional components during execution! You will have no additional budget or time to bring them into scope, simply because you never planned or estimated for them. This can potentially derail your program.

2. Rules are written in multiple technologies in a legacy application:

Multiple technology stack decorative

A common aspect of old legacy application ecosystems is the lack of IT governance.

As a result, these legacy systems do not follow a consistent technology stack.

Thus, not every component in a “call chain” will necessarily be built in Cobol. Some components may be written in Cobol, others in Assembler, Easytrieve, or a number of other legacy technologies.

The choice of technology was driven mainly by the skillsets available at the time a component was built, or by the specific job function the component was meant to perform, such as data processing or report creation.
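
A lightweight pre-analysis step is simply to tally which technologies are actually present before estimating. The sketch below classifies exported source members using a few content markers; the marker strings and the flat-directory layout are illustrative assumptions and would need tuning for a real codebase.

```python
import os
from collections import Counter

# Very rough content markers per technology (illustrative, not exhaustive).
TECH_MARKERS = {
    "COBOL": ("IDENTIFICATION DIVISION", "PROCEDURE DIVISION"),
    "ASSEMBLER": (" CSECT", " DSECT"),
    "EASYTRIEVE": ("JOB INPUT",),
}

def classify_member(text):
    """Guess the technology of one source member from simple markers."""
    upper = text.upper()
    for tech, markers in TECH_MARKERS.items():
        if any(marker in upper for marker in markers):
            return tech
    return "UNKNOWN"          # flag for manual inspection during pre-analysis

def technology_landscape(source_dir):
    """Count source members per detected technology in one exported library."""
    counts = Counter()
    for name in os.listdir(source_dir):
        path = os.path.join(source_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="ignore") as src:
            counts[classify_member(src.read())] += 1
    return counts
```

A sizeable “UNKNOWN” bucket is itself a finding: it tells you, before the estimate is signed off, that there are technologies in play you have not yet accounted for.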

Unknown technology issue

With no pre-analysis done, the full technology landscape of the system may never be discovered at estimation time. Many technical complexities then remain hidden and are not factored into the resource planning or estimates for the work. That is a real risk.

It may lead to the sudden discovery, during execution, of components in technologies that were never planned for. This can cause a huge spike in actuals that is completely out of line with the original estimates.

This would be a recipe for failure.

Skillset issue

What’s more, the skillsets for these newly discovered technology components may not be available during execution, because you never planned for or procured them. Such unknowns can easily sink your ship.

That is where pre-analysis becomes so important.

3. It’s a jungle out there – Rules are everywhere

Messy rules architecture decorative

Large and monolithic legacy mainframe systems are a jungle of business rules. Business rules exist everywhere: in program components, copybooks, stored procedures, and even database tables.

It can be visualized as a big mesh of wires like this picture.

Untangling the business rules from this mess can be overwhelming. It is anything but easy.

Such a messy persistence of rules can lead to leakages during rule discovery unless you understand, at least at a high level, how the rules persist and span across the system.

The result is that you end up with a broken business process if you do not carry forward all the rules that work together to make up that process. That is a business risk.

That is where pre-analysis helps. It guides you to all the places in the system you should look, so you don’t miss critical business rules.
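
As a concrete aid, that guidance can be turned into a simple checklist of artifact types to scan, with a rough count of rule-bearing constructs in each. The artifact types come straight from the list above; the patterns are crude, hypothetical proxies for “a business rule probably lives here”.

```python
import re

# Rough proxies for rule-bearing constructs per artifact type (illustrative).
RULE_PATTERNS = {
    "program": re.compile(r"\b(?:IF|EVALUATE|WHEN)\b", re.IGNORECASE),
    "copybook": re.compile(r"^\s*88\s+[A-Z0-9-]+", re.IGNORECASE | re.MULTILINE),
    "stored_procedure": re.compile(r"\b(?:CASE|WHEN|WHERE)\b", re.IGNORECASE),
    "database_table": re.compile(r"\bCHECK\s*\(", re.IGNORECASE),  # DDL check constraints
}

def rule_density(artifacts):
    """artifacts: iterable of (artifact_type, source_text). Returns rough counts per type."""
    counts = {}
    for artifact_type, text in artifacts:
        pattern = RULE_PATTERNS.get(artifact_type)
        if pattern is None:
            continue                      # unknown artifact type: flag for review
        counts[artifact_type] = counts.get(artifact_type, 0) + len(pattern.findall(text))
    return counts
```

The point is not precision; it is knowing, before the estimate is committed, which places hold rules and roughly how many.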

4. Rationalization of extracted rules into meaningful information

Rule rationalization decorative
What does this mean?

Remember, we said that one of the key stages in business rule extraction from legacy systems is the rationalization of technical rules, and then the translation of those rationalized technical rules into business rules in plain English language and in the right functional context.

The functional context is “golden” here. It is this context that makes the rules meaningful and usable from a modernization perspective.

As an example, from an insurance carrier’s perspective, a meaningful extraction would rationalize all client-specific rules under the Client Management business function, separate from the billing rules, which belong under the Billing function.

Also, the rules should be organized in the sequence of business events under each of these business functions.
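
To make the insurance example concrete, a minimal sketch of that target structure could look like the following. The business functions mirror the example above; the event names and rule texts are purely hypothetical.

```python
# Hypothetical target structure: business function -> business events in
# sequence -> plain-English rules in their functional context.
rationalized_rules = {
    "Client Management": {
        "events": ["Client onboarding", "Client update", "Client termination"],
        "rules": {
            "Client onboarding": [
                "A client record must have a verified tax ID before activation.",
            ],
        },
    },
    "Billing": {
        "events": ["Invoice generation", "Payment posting"],
        "rules": {
            "Invoice generation": [
                "Invoices for multi-policy clients are consolidated monthly.",
            ],
        },
    },
}

def rules_in_event_order(function_name, catalog=rationalized_rules):
    """Yield (event, rule) pairs for one business function, in event sequence."""
    entry = catalog[function_name]
    for event in entry["events"]:
        for rule in entry["rules"].get(event, []):
            yield event, rule
```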

How does pre-analysis help?

Pre-analysis helps shape a strategy for domain-driven decomposition, which is the prerequisite for business rule extraction. Such a strategy yields the hierarchy of processes and sub-processes into which the rules should be grouped, providing meaningful context and perspective.

Without doing this in the pre-analysis phase, the team will grapple with presenting the rules in a meaningful way. Remember, rules make sense when put in context, and it is not easy to derive that context without a functional understanding of the ecosystem. Pre-analysis gives you the opportunity to create a context-driven framework that helps.

Summing it up

Large transformation programs have a low success rate, mainly because of the unpredictability caused by many unknowns. The more insight we gain into those unknowns, the better we can manage the risks and the more control we have over what we do and how we do it. That increases our chances of success.

Pre-analysis is a strategy that can help us get those insights. It is a risk mitigation strategy that drives better control and more predictable outcomes in terms of cost and quality.

Thus, while business rule extraction is the foundational step for platform modernization, pre-analysis of the legacy system is foundational to business rule extraction. Share your thoughts.

Read here for some additional and interesting perspectives on how to extract business rules from legacy systems!

Disclaimer

The diagrams used in this paper are for illustration only and are taken from external sources/third-party websites.

Suvo Dutta

I have over 22 years of IT experience in strategy, advisory, innovations, and cloud-based solutions in the Insurance domain. I advise clients in transforming their IT ecosystems to future-ready architectures that can provide exemplary customer experience, improve operating efficiency, enable faster product development and unlock the power of data.
