Sunday, 22 July 2012

Debuggers are for Losers



Code defects are not planned; they are accidents that result from an inability to understand execution flow under different input conditions.
But just as airbags are the last line of defense in a car, a debugger should be the last line of defense for a programmer. Defensive driving reduces or eliminates reliance on airbags. Defensive driving is proactive; it uses all available data to predict and avoid problems.

The alternative to defensive driving is to drive inattentively and treat driving like you are in a demolition derby.

Situations exist where hardware and software debuggers are essential and have no substitutes.  But most of the time the debugger is not the best tool to find and remove defects.

Debuggers will not fix the defect for you. 

Debuggers are not like washing machines where you can throw in your code and some soap, go get some coffee and watch TV, and then 30 minutes later the defect is removed.

You can only fix a defect when you understand the root cause of the problem.

Remember, removing a defect does not advance your project; your project will be on hold until you resolve the problem.  You are better served by making sure the defect never gets into your code rather than learning to become efficient at getting rid of them.

Inspiration, not Perspiration

A defect is an observation of program behavior that deviates from a specific requirement (no requirements = no defect :-) ).  A developer familiar with the code pathways can find defects by simply reasoning about how the observation could arise.

At McGill, my year was the last one that had to take mainframe programming.  Debuggers did not exist on the mainframe; you would simply submit your code for execution in the batch system, get a code dump if there was a defect, and send the listing and the dump to the printer.

The nature of mainframes in university was that you would queue up at the high speed printer with the 10 to 15 other people all waiting for their print-outs.

Not having anything else to do, I would think about how my program could have failed – and in the vast majority of cases I was able to figure out the source of the defect before my listing would print.
The lesson learned was that the ability to think is critical to finding defects.

There is no substitute for the ability to reason through a defect; however, you will accelerate the process of finding defects if you lay down code pathways as simply as possible.  In addition, if there are convoluted sections of code you will benefit by rewriting them, especially if they were written by someone else.

Not There When You Want Them

Even if you think the debugger is the best invention since sliced bread, most of us can't use a debugger in a customer's production environment.

There are so many production environments that are closed off behind firewalls on computers with minimal O/S infrastructure.  Most customers are not going to let you mess around with their production environments and it would take an act of god to get the debugger symbolic information on those computers.

In those environments the typical solution is to use a good logging system (e.g. Log4J).  The logging calls belong at the same locations where you would throw exceptions, but since this is a discussion of debuggers and not debugging, we will discuss logging at another time.
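Without getting into logging proper, the placement idea can be sketched briefly. This is a minimal illustration, not a recommended logging design: it uses the JDK's built-in java.util.logging as a stand-in for Log4J, and the OrderLoader class, parseQuantity method, and order identifiers are all hypothetical names invented for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class OrderLoader {
    private static final Logger LOG = Logger.getLogger(OrderLoader.class.getName());

    // Hypothetical parser: the log call sits at the same place where the
    // exception is raised, so a production log records the failing input.
    static int parseQuantity(String orderId, String rawQuantity) {
        try {
            int quantity = Integer.parseInt(rawQuantity.trim());
            if (quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
            return quantity;
        } catch (IllegalArgumentException e) {
            // NumberFormatException is a subclass, so bad digits land here too.
            String message = "Order " + orderId + ": invalid quantity '" + rawQuantity + "'";
            LOG.log(Level.SEVERE, message, e);
            throw new IllegalArgumentException(message, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuantity("A-100", " 3 "));
    }
}
```

The log entry and the exception carry the same context, so whichever one reaches you from the customer site tells you which order and which value caused the problem.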

Debuggers and Phantom Problems

I worked for one client where enabling the compiler for debugging information actually caused intermittent errors to pop up.  The program would crash randomly for no apparent reason and we were all baffled as to the source of the problem. Interestingly enough, the problem was only happening in our development environment and not in the production environment.

Our performance in development was quite slow and I suspected that our code files with symbolic information were potentially the source of the problem.  I started compiling the program without the symbolic information to see if we could get a performance boost.

To my surprise, not only did we get a performance boost but the random crashes vanished.  With further investigation we discovered that the symbolic information for the debugger was taking up so much RAM that it was causing memory overflows.

Debuggers and Productivity

Research in developer productivity has shown that the most productive developers are at least 10x faster than everyone else.

If you work with a hyper-productive individual, ask them how much time they spend in the debugger. Odds are they don't...



Do you know who Charles Proteus Steinmetz was?  He was a pioneer in electrical engineering and retired from GE in 1902.  After retirement, GE called him because one of the very complex systems that he had built was broken and GE’s best technicians could not fix it. 

Charlie came back as a consultant; he walked around the system for a few minutes then took out some chalk and made an X on a part (he correctly identified the malfunctioning part).

Charlie then submitted a $10,000 bill to GE, which was taken aback by the size of the bill (about $300,000 today).  GE asked for an itemized invoice for the consultation and Charles sent back the following invoice:
Making the chalk mark          $1
Knowing where to place it      $9999

Debuggers are Distracting

As Brian W. Kernighan and Rob Pike put it in their excellent book “The Practice of Programming”:
As personal choice, we tend not to use debuggers beyond getting a stack trace or the value of a variable or two. One reason is that it is easy to get lost in details of complicated data structures and control flow; we find stepping through a program less productive than thinking harder and adding output statements and self-checking code at critical places. Clicking over statements takes longer than scanning the output of judiciously-placed displays. It takes less time to decide where to put print statements than to single-step to the critical section of code, even assuming we know where that is. More important, debugging statements stay with the program; debugging sessions are transient.

A debugger is like the Sirens of Greek mythology: it is very easy to get mesmerized and distracted by all the data being displayed by the debugger.

Despite a modern debugger’s ability to show an impressive amount of context at the point of execution, it is just a snapshot of a very small portion of the code.  It is like trying to get an image of a painting where you can only see a small window at a time.  Above we show 6 snapshots of a picture over time and it is hard to see the big picture.

The above snapshots are small windows on the Mona Lisa, but it is hard to see the bigger picture from small snapshots.   To debug a problem efficiently you need to have a higher level view of the picture to assess what is going on (effectiveness).
If you have to find defects in code that you have not written then you will still be better off analyzing the overall structure of the code (modules, objects, etc) than simply single-stepping through code in a debugger hoping to get lucky.

Defects occur when we have not accounted for all the behavioral pathways that will be taken through the code relative to all the possible data combinations of the variables.

Some common defect categories include:
  • Uninitialized data or missing resources (e.g. file not found)
  • Improper design of code pathways
  • Calculation errors
  • Improper persistent data adjustment (e.g. missing data)
The point is that there are really only a few ways that defects behave.  If you understand those common behaviors then you can structure your code to minimize the chances of them happening.

Defensive Programming

The same way that we have defensive driving there is the concept of defensive programming, i.e. “The best defense is a good offense”.

My definition of defensive programming is to use techniques to construct the code so that the pathways are simple and easy to understand. Put your effort into writing defect free code, not on debugging code with defects.

In addition, if I can make the program aware of its context when a problem occurs then it should halt and prompt me with an error message so that I know exactly what happened.

There are far too many defensive programming techniques to cover them all here, so let's hit just a few that I like (send me your favorite techniques by email):
  • Proper use of exceptions
  • Decision Tables
  • Design by contract
  • Automated tests

Proper Use of Exceptions

One poor way to design a code pathway is the following comment:

// Program should not get here

Any developer who writes a comment like this should be shot.  After all, what if you are wrong and the program really gets there?  This situation exists in most programs especially for the default case of a case/switch statement.

Your comment is not going to stop the program and it will probably go on to create a defect or side effect somewhere else. 

It could take you a long time to figure out the root cause of the eventual defect, especially since you may be thrown off by your own comment and not consider the possibility of the code getting to that point.

Of course you should at least log the event, but this article is about debuggers and not debugging, so let’s leave the discussion about logging for another time.

At a minimum you should replace the comment with the following:

throw new Exception("Program should not get here!");

This assumes that you have access to exceptions.  But even this is not going far enough; after all, you have information about the conditions that must be true to get to that location.  The exception should reflect those conditions so that you understand immediately what happened if you see it.

For example, if the only way to get to that place in the code is because the Customer object is inconsistent then the following is better:

throw new InconsistentCustomerException("New event date precedes creation date for Customer " + Customer.getIdentity());

If the InconsistentCustomerException is ever thrown then you will probably have enough information to fix the defect as soon as you see it.
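To make the idea concrete, here is a minimal sketch of both cases: a switch default that throws instead of carrying a comment, and a consistency check that reports the conditions it found. Only InconsistentCustomerException comes from the text above; the CustomerChecks class, the status codes, and the method names are hypothetical, and the customer identity is passed in as a plain string for simplicity.

```java
import java.time.LocalDate;

final class InconsistentCustomerException extends RuntimeException {
    InconsistentCustomerException(String message) {
        super(message);
    }
}

final class CustomerChecks {
    // Hypothetical status codes: 1 = active, 2 = suspended, 3 = closed.
    static String statusName(String customerId, int statusCode) {
        switch (statusCode) {
            case 1: return "active";
            case 2: return "suspended";
            case 3: return "closed";
            default:
                // Replaces the "Program should not get here" comment.
                throw new InconsistentCustomerException(
                    "Unknown status code " + statusCode + " for Customer " + customerId);
        }
    }

    // Throws with full context when an event is dated before the customer existed.
    static void checkEventDate(String customerId, LocalDate created, LocalDate eventDate) {
        if (eventDate.isBefore(created)) {
            throw new InconsistentCustomerException(
                "New event date " + eventDate + " precedes creation date " + created
                    + " for Customer " + customerId);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusName("C-42", 2));
        checkEventDate("C-42", LocalDate.of(2012, 7, 1), LocalDate.of(2012, 7, 22));
    }
}
```

Including both dates and the customer identity in the message means the exception text alone is usually enough to start the fix.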

Decision Tables

Improper design of code pathways has other incarnations, such as not planning enough code pathways in complex logic.  For example, you build code with 7 code pathways but you really need 8 pathways to handle all inputs.  Any data that requires the missing 8th pathway will turn into a calculation error or improper persistent data adjustment and cause a problem somewhere else in the code.

When you have complex sections of code with multiple pathways then create a decision table to help you to plan the code. 

Decision tables have been around since the 1960s; they are one of the best methods for proactively designing sections of code that are based on complex decisions.

However, don't include the decision table in a comment unless you plan to keep it up to date (See Comments are for Losers).
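One way to keep a decision table from going stale is to make the table itself the code. The sketch below is an illustration only: the three conditions (member, coupon, large order) and the discount values are invented for the example. With three yes/no conditions there are 2^3 = 8 rows, so a missing 8th pathway shows up as a missing row rather than as a silent defect.

```java
final class DiscountTable {
    // One row per combination of (member, coupon, largeOrder).
    // Row index bits: member = 4, coupon = 2, largeOrder = 1.
    private static final int[] DISCOUNT_PERCENT = {
        0,   // 000: not a member, no coupon, small order
        5,   // 001: large order only
        10,  // 010: coupon only
        10,  // 011: coupon and large order
        5,   // 100: member only
        10,  // 101: member and large order
        15,  // 110: member and coupon
        20,  // 111: member, coupon, and large order
    };

    static {
        // Fail at class load time if a pathway was left out of the table.
        if (DISCOUNT_PERCENT.length != 8) {
            throw new IllegalStateException(
                "Decision table needs 8 rows, found " + DISCOUNT_PERCENT.length);
        }
    }

    static int discountPercent(boolean member, boolean coupon, boolean largeOrder) {
        int row = (member ? 4 : 0) | (coupon ? 2 : 0) | (largeOrder ? 1 : 0);
        return DISCOUNT_PERCENT[row];
    }

    public static void main(String[] args) {
        System.out.println(discountPercent(true, true, false));
    }
}
```

Because every combination of inputs maps to exactly one row, there is no nested if/else chain in which a pathway can be forgotten.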


Design by Contract

The concept of design by contract (DbC) was introduced in 1986 by the Eiffel programming language.  I realize that few people program in Eiffel, but the concept can be applied to any language.  The basic idea is that every class function should have pre-conditions and post-conditions that are tested on every execution. 

The pre-conditions are checked on entry to the function and the post-conditions are checked on exit from the function.  If any of the conditions are violated then you would throw an exception.
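In a language without built-in contracts, the checks can be written by hand. The following is a minimal sketch, assuming a hypothetical Account class with a withdraw method; it is not how Eiffel expresses contracts, only an imitation using ordinary exceptions.

```java
final class Account {
    private long balanceCents;

    Account(long openingBalanceCents) {
        if (openingBalanceCents < 0) {
            throw new IllegalArgumentException(
                "Pre-condition failed: opening balance is negative: " + openingBalanceCents);
        }
        this.balanceCents = openingBalanceCents;
    }

    long balanceCents() {
        return balanceCents;
    }

    void withdraw(long amountCents) {
        // Pre-conditions: checked on entry.
        if (amountCents <= 0) {
            throw new IllegalArgumentException(
                "Pre-condition failed: amount must be positive, got " + amountCents);
        }
        if (amountCents > balanceCents) {
            throw new IllegalArgumentException(
                "Pre-condition failed: amount " + amountCents
                    + " exceeds balance " + balanceCents);
        }

        long before = balanceCents;
        balanceCents -= amountCents;

        // Post-conditions: checked on exit.
        if (balanceCents != before - amountCents || balanceCents < 0) {
            throw new IllegalStateException(
                "Post-condition failed: balance " + balanceCents
                    + " after withdrawing " + amountCents + " from " + before);
        }
    }

    public static void main(String[] args) {
        Account account = new Account(1000);
        account.withdraw(400);
        System.out.println(account.balanceCents());
    }
}
```

The post-condition looks redundant while the method is three lines long; it earns its keep later, when someone changes the method body and breaks the guarantee.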

DbC is invasive and adds overhead to every call in your program.  Let's see what can make this overhead worth it.  When we write programs, the call stack can be quite deep.

Let's assume we have the following call structure; as you can see H can be reached through different call paths (i.e. through anyone calling F or G).

Let's assume that the call flow from A has a defect that needs to be fixed in H; clearly that will affect the call flow from B.

If the fix to H will cause a side effect to occur in F, then odds are the DbC post conditions of F (or B) will catch the problem and throw an exception.

Contrast this with fixing a problem with no checks and balances.  The side effect could manifest pretty far away from where the problem was created and cause intermittent problems that are very hard to track down.

DbC via Aspect Oriented Programming

Clearly very few of us program in Eiffel.  If you have access to Aspect Oriented Programming (AOP) then you can implement DbC via AOP.  Today there are AOP implementations as a language extension or as a library for many current languages (Java, .NET, C++, PHP, Perl, Python, Ruby, etc).
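As a rough illustration of the idea, the sketch below uses the JDK's dynamic proxy (java.lang.reflect.Proxy) as a stand-in for a real AOP framework; it is not AOP proper, but it shows the same principle of wrapping contract checks around calls without touching the implementation. The PriceService interface, the Contracts class, and the specific conditions (no null arguments, no negative prices) are assumptions made up for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

interface PriceService {
    int priceInCents(String sku);
}

final class Contracts {
    // Returns a wrapper that checks the contract around every PriceService call.
    static PriceService checked(PriceService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean contracted = method.getDeclaringClass() == PriceService.class;

            // Pre-condition: no null arguments.
            if (contracted && args != null) {
                for (Object arg : args) {
                    if (arg == null) {
                        throw new IllegalArgumentException(
                            "Pre-condition failed in " + method.getName() + ": null argument");
                    }
                }
            }

            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the real method threw
            }

            // Post-condition: a price is never negative.
            if (contracted && result instanceof Integer && ((Integer) result) < 0) {
                throw new IllegalStateException(
                    "Post-condition failed in " + method.getName()
                        + ": negative result " + result);
            }
            return result;
        };
        return (PriceService) Proxy.newProxyInstance(
            PriceService.class.getClassLoader(),
            new Class<?>[] { PriceService.class },
            handler);
    }

    public static void main(String[] args) {
        PriceService service = checked(sku -> 1999);
        System.out.println(service.priceInCents("SKU-1"));
    }
}
```

The implementation passed to checked contains no contract code at all, which is the property that makes the aspect-style approach attractive for retrofitting DbC onto existing classes.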

Automated Tests

Day to day application use will exercise only a few code pathways.  For every normal code pathway there will be several alternative pathways to handle exceptional processing, and this is where most defects will be found.

Some of these exceptions do not happen very often because they involve less than perfect input data – do you really want those exceptions to happen for the first time on a remote customer system for which you have few troubleshooting tools?

If you are using Test Driven Development (TDD) then you already have a series of unit tests for all your code.  The idea is good, but does not go far enough in my opinion.

Automated tests should not only perform unit tests but also perform tests at the integration level.  This will not only test your code with unusual conditions at the class level but also at the integration level between classes and modules.  Ultimately these tests should be augmented with the use of a canonical database where you can test data side effects as well.
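As a small illustration of testing the exceptional pathways and not just the normal one, here is a sketch that uses plain Java rather than any particular test framework. The PercentParser class and its rules are hypothetical; in practice you would express the same checks in whatever test framework your project already uses.

```java
final class PercentParser {
    // Hypothetical function under test: accepts whole numbers from 0 to 100.
    static int parsePercent(String text) {
        if (text == null || text.trim().isEmpty()) {
            throw new IllegalArgumentException("percent is missing");
        }
        int value;
        try {
            value = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("percent is not a number: '" + text + "'", e);
        }
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("percent out of range: " + value);
        }
        return value;
    }
}

final class PercentParserTest {
    static boolean rejects(String input) {
        try {
            PercentParser.parsePercent(input);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    // Returns the number of failed checks.
    static int runAll() {
        int failures = 0;
        // Normal pathways.
        if (PercentParser.parsePercent("42") != 42) failures++;
        if (PercentParser.parsePercent(" 100 ") != 100) failures++;
        // Exceptional pathways: less than perfect input data.
        String[] badInputs = { null, "", "abc", "-1", "101" };
        for (String input : badInputs) {
            if (!rejects(input)) failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = runAll();
        System.out.println(failures == 0 ? "all checks passed" : failures + " checks failed");
    }
}
```

Note that five of the seven checks exercise bad input; that ratio reflects where the alternative pathways are.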

The idea is to build up your automated tests until the code coverage gets in the 80%+ category.  Getting to 80%+ means that your knowledge of the system pathways will be very good and the time required for defect fixing should fall dramatically.

Why 80%+? (See Minimum Acceptable Code Coverage)

If you can get to 80%+ it also means that you have a pretty good understanding of your code pathways.  The advantage to code coverage testing is that it does not affect the runtime performance of your code and it improves your understanding of the code.

All your defects are hiding in the code that you don't test.

Reducing Dependence on Debuggers

If you have to resort to a debugger to find a defect then take a few minutes after you solve the problem and ask, "Was there a faster way?".  What did you learn from using the debugger?  Is there something that you can do to prevent you (or someone else) having to use the debugger down this same path again?  Would refactoring the code or putting in a better exception help?

The only way to resolve a defect is by understanding what is happening in your code.  So the next time you have a defect to chase down, start by thinking about the problem.  Spend a few minutes and see if you can't resolve the issue before loading up the debugger.

Another technique is to discuss the source of a defect with another developer.  Even if they are not familiar with your code they will prompt you with questions that will quite often help you to locate the source of the defect without resorting to the debugger.

In several companies I convinced the developers to forgo using their debuggers for a month.  We actually deleted the code for the debuggers on everyone's system.  This drastic action had the effect of forcing the developers to think about how defects were caused.  Developers complained for about 2 weeks, but within the month their average time to find a defect had fallen over 50%.

Try to go a week without your debugger. What have you got to lose?


After all, the next graph shows how much effort we spend trying to remove defects; the average project ends up devoting 30% of its total time to defect removal. Imagine what a 50% boost in efficiency would do for your project. The yellow zone sketches out where most projects fall.

Function Points    Average Effort
10                 1 man month
100                7 man months
1000               84 man months
10000              225 man years

Conclusion

There will definitely be times where a debugger will be the best solution to finding a problem.  But while you are spending time in the debugger your project is on hold; it will not progress.

Some programs have complex and convoluted sections that generate many defects.  Instead of running through the sections with the debugger you should perform a higher level analysis of the code and rewrite the problematic section.

If you find yourself going down the same code pathways multiple times in the debugger then you might want to stop and ask yourself, "Is there a better way to do this?"

As MasterCard might say, "Not using the debugger... priceless".




Thursday, 5 July 2012

Shift Happens

Scope shift (creep) is inevitable. Risk involves uncertainty, so scope shift is not a risk: it is certain to happen. The only uncertainty is how much shift will occur and whether it will cause your project to fail.

Understanding the reasons for scope shift is the first step towards removing the causes of scope shift.  Once you understand the reasons you can dramatically improve your chances of project success.

There is a rhythm to software development that matches the length of your software release cycle. The longer the cycle, the more scope creep or shift happens.

Your effectiveness at capturing requirements will dictate the ideal length of that cycle. When your project length exceeds your ability to capture requirements the following nightmare happens.


In the beginning was the idea, and the idea was good. The idea was good and so senior management blessed it and turned it into a software project. You began to gather the requirements and assemble unto you a project team. Senior management declared the deadline and IT management and the team accepted it in silence. 

It was at this point that it all went wrong…  

Initially, there will be no sense of urgency and the team will calmly begin to assemble the software. The team will then produce the first build that can be shown to management. Someone will recognize that the project is off course and this will lead to a set of meetings to resolve the issues. Intensity will pick up and corrections to the requirements will be made. Team requests to fix the architecture will be made and turned down because it would delay the project.  

IT and project managers will omit details from their reports so that senior management will think everything is on track. Many tasks will get to 95% complete and yet the software will not seem to be complete. The subsequent builds will slow down as you approach the deadline, the number of bugs will increase, stability will decrease, and discussions will devolve into fire-fighting.  

Now the team will have a sense of urgency and intensity will increase until the team is working extremely long hours. With luck and heroic efforts you can declare “mission accomplished”.


If you are unlucky the entire mess descends into finger pointing and damaged resumes.  



For some reason, few organizations do post-mortem analysis after the nightmare to figure out what went wrong.

Probably because between all the finger pointing and politics everyone wants to move on as quickly as possible and pretend that nothing happened.  Move along, nothing to see here.

Most organizations dust themselves off and rinse and repeat.

If you have never experienced parts of the above nightmare then either:
  1. You live a charmed life 
  2. You have never worked on any project of significant size 
  3. You have always worked with Agile development 
Organizations don't know how to capture effective requirements that will lead to quality software systems. Let's borrow Zeno's Paradox to talk about the effects of scope shift on a project and then let's figure out how to fix it.

Zeno, in the 5th century BC, philosophized that Achilles should not be able to catch up with a tortoise in a foot race; every time Achilles catches up with the last location of the tortoise, the tortoise has already moved.
Of course, Achilles does catch the tortoise and thus we have Zeno’s paradox.

Applying Zeno's paradox to software, Achilles represents our software team, which moves as the team builds out functionality; the tortoise represents the requirements, which move because of scope shift.

Unfortunately, Achilles does not always catch the tortoise in the software world. Today's statistics show that Achilles only catches the tortoise about 30% of the time before the race ends.

My claim is that if a project fails, it fails because of poor requirements gathered before the first line of code was written.

Unfortunately, the probability of a project being canceled is identical to the amount of scope shift.

Root Cause of Shift Happening
What causes the tortoise (requirements) to move is scope shift, which is the result of not having a complete set of requirements when the project starts.  There are two kinds of scope shift:
  1. The shift that comes from your industry changing
  2. The shift that comes from not scoping a project correctly
All industries shift at different rates.  At this time social media is moving very quickly and people are experimenting and discovering different ideas about what works and what doesn't -- it is normal for newer domains to shift strongly.  I consider this to be normal shift and par for the course.

Then there is the shift that comes from not scoping out your project correctly.  Capers Jones does quite a bit of work in software litigation and one of the most common issues in software litigation is inadequate change control and requirements changing more than 2% per month.

If the requirements are complete and consistent, the team's progress would resemble the animation to the right. The circle is the scope of the requirements and the blue area represents the functionality of the software being built out at time tn. In a perfect world, we would see increasing functionality being delivered until we match our scope by the project end-date.
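To get a feel for what 2% per month means over the life of a project, the arithmetic can be sketched as below. The assumption that the change compounds monthly is mine, made for illustration; the source figure only gives the monthly rate. The ScopeGrowth class and its method name are hypothetical.

```java
final class ScopeGrowth {
    // Scope multiplier after `months` of change at `monthlyRate`
    // (0.02 means 2%), assuming the change compounds each month.
    static double multiplier(double monthlyRate, int months) {
        if (monthlyRate < 0 || months < 0) {
            throw new IllegalArgumentException(
                "rate and months must be non-negative: " + monthlyRate + ", " + months);
        }
        return Math.pow(1.0 + monthlyRate, months);
    }

    public static void main(String[] args) {
        for (int months : new int[] { 6, 12, 24 }) {
            System.out.printf("%d months: scope x %.2f%n", months, multiplier(0.02, months));
        }
    }
}
```

Under that assumption, 2% per month adds up to roughly 27% more scope after 12 months and roughly 61% more after 24 months, which is why a rate that sounds small becomes grounds for a dispute on a long project.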

However, shift happens...

The reality is that requirements will change and/or be discovered to be inadequate as soon as development starts. Missing requirements will be discovered, inconsistent requirements will create fire fighting, and technically challenging requirements will have you scrambling for technical work-arounds. The scope will begin to shift and the tortoise will be off to the races.

Requirements are incomplete because:
  • insufficient time was taken to gather requirements
  • analysts are unable to synthesize missing requirements  
  • subject matter experts were not available  
  • key stakeholders were not interviewed  
  • of inexperience with the subject matter domain, i.e. new software  
Requirements are inconsistent because multiple viewpoints are not vetted and turned into consistent requirements. Let's observe what the effects of incomplete and inconsistent requirements are as the project progresses.

A Day at the Races
Let's diagram the process of Achilles (team) chasing the tortoise (requirements).

As the team begins at time zero (T0) they perceive the requirements at time zero (R0) and will aim to build architecture to get there. If R0 is close to the actual requirements then you will have a successful project. Odds are you don't have good business analysts (product managers) and your requirements are incomplete and inconsistent.

So after a certain amount of time the requirements will shift as you discover inconsistencies and missing requirements. T1 represents the functionality (code) that the team has built after the 1st time interval. They would have built out T1 with R0 as the target. However, they will notice that the requirements are now at R1 and that the architecture needs to be altered.
The team will try to make architecture adjustments to hit the new target (R1), but they are reacting to the scope change after the fact. By T2 the requirements have shifted to R2.

The diagram shows the requirements (Rn) moving perpendicularly to the time axis; however, depending on your industry and requirements capability, the requirements might be converging with the code that has been built out or they might be moving further away.

If you have a relatively complete and consistent set of requirements then there can not be much scope shift. With competent architects to design infrastructure and make estimates you should have a successful software project on your hands.

Competent architects can not make up for incomplete and inconsistent requirements.  Poor requirements translate to poor estimates.

Another Brick in the Wall
Organizations typically produce some form of business requirements document (BRD) to describe the target system. The BRD generally ends up being a brick of paper a few inches thick. We assume that if the brick is big enough then all requirements must be covered; unfortunately the thicker the brick the more complacent we tend to be. Instead of a BRD we should probably call it a BRicK.

The bad news is that the team doesn't read the brick.

One colleague told me that many teams don’t even know where to find the brick! These documents are poorly written, inconsistent, and not in the language of the developers. Developers will simply pick and choose the pages of the brick that seem to apply to them and rely on business analysts and QA to describe the system.

Don't believe me?

If you look through your email server then you will find chains of requirement discussions that are not in your BRicK.

In some cases the volume of requirements littered through your email server will exceed the size of the BRicK.

The most accurate requirements never make their way back to the BRicK and this will lead to fire-fighting later on.

Since the main deliverable of a software project is source code, and teams believe that they will not have enough time to write all the code, it seems the sooner that you start coding the better off you will be, correct?

Good requirements can be turned into code much faster than poor requirements.

Odds are that if you don't have any quality control processes integrated with your requirements gathering process then you are probably producing poor requirements.

OK, Shift Happens, So What?
There are a couple of common ways that organizations have tried to address scope shift:
  1. Don't create code until the requirements are complete and consistent 
  2. Expect your architects and team to handle scope shift
There are very few good business analysts (product managers) out there who can synthesize stakeholder requirements and turn them into effective requirements that architects can use to build high quality systems. Writing good quality requirements is hard and as much attention should be spent on them as on development.

Only the top 5% of business analysts are competent; the problem is that the other 95% think they are in the top 5%.

The process of writing good requirements is too involved to explain here, but the rule of thumb is that good analysts will be able to work out the requirements of a system in about 30% of the time it takes to develop the entire system. The time needed must be modified by familiarity with the subject matter domain, i.e. re-engineering projects should take less time to gather requirements.

There are some projects that spend too much time writing requirements and they either never get started or start late. Rather than determining if the failure was due to poor analysts (product managers) or poor processes, these organizations conclude that gathering good requirements is either not possible or not worth the time.

The quality of your software system can't exceed the quality of your initial requirements.

Your Architect Can't Save You
Another way to handle scope shift is to assume that your architects are good enough to build a flexible system. Experienced architects are able to anticipate some scope shift and accommodate shifting requirements. However, technical people that really understand your industry are rare; they are the exception and not the rule.

Do you really want your technical people guessing what your business requirements should be?

A successful project depends on your architects knowing how big your project is.

For example, let’s suppose that you ask a builder to build a 10 story building. When the builder has put up 8 stories, you change the requirements and say that the building needs to be 20 stories tall. In addition, you say that you are not going to give him extra time or money to make the change.

In all likelihood, the builder will refuse the change request; the difference is that we try to do this all the time with software.

I Laugh in the Face of Danger

Beware of architects with "combat stress reaction".

Architects that experience project failure due to scope creep sometimes conclude that their architectures are not flexible enough.

This leads architects to overcompensate by over-engineering their next system. Architecture with too much flexibility imposes steep learning curves on the team and will cause very slow software development. You will be able to accommodate scope shifting; you will just do it very slowly and very painfully.

Expecting your architects to defend you against poor requirements is like trying to mow your lawn with scissors.

Not All Requirements are Created Equal

The architecture for a 2 story house is different than the architecture for the 102 story Empire State Building. The size of your software project depends on your core requirements; they in turn dictate what needs to be built out for architecture.

When core requirements are discovered (or added) as the project progresses, the architect will realize that a different set of technologies and/or techniques should have been used in the project. This is when the following conversation will happen:

Manager: We need to have feature X changed to allow behavior Y, how soon can we do this?

(long pause)

Architect: We had asked if feature X would ever need behavior Y and we were told that it would never happen and we designed the architecture based on that. If we have to have behavior Y it will take 4 months to adjust the architecture and we would have to rewrite 10% of the application.

Manager: That would take too long. Look I don't want you to over-engineer this, we need to get behavior Y without taking too much of a hit on the schedule. What if we only need to do this for this screen?

(long pause)

Architect: If we ONLY had to do it for this one screen then we can code a work-around that will only take 2 weeks. But it would be 2 weeks for every screen where you need this. It would be much better in the long run to fix the architecture.

Manager: Let's just code the work around for this screen. We don't have time to fix the architecture.
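The trade-off in this conversation comes down to simple arithmetic, sketched below. The figures of 4 months and 2 weeks per screen come from the dialogue; converting 4 months to 16 weeks is my simplifying assumption, and the WorkaroundBreakEven class is a hypothetical name.

```java
final class WorkaroundBreakEven {
    // Smallest number of screens at which the per-screen work-arounds
    // cost at least as much as fixing the architecture once.
    static int breakEvenScreens(int architectureFixWeeks, int workaroundWeeksPerScreen) {
        if (architectureFixWeeks <= 0 || workaroundWeeksPerScreen <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        // Integer ceiling of architectureFixWeeks / workaroundWeeksPerScreen.
        return (architectureFixWeeks + workaroundWeeksPerScreen - 1) / workaroundWeeksPerScreen;
    }

    public static void main(String[] args) {
        // 4 months taken as 16 weeks; 2 weeks per screen for the work-around.
        System.out.println(breakEvenScreens(16, 2));
    }
}
```

With those numbers the work-around stops being cheaper at the 8th screen, and that count ignores the stability and productivity costs of piling work-around on work-around.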

Damn the Torpedoes, Full Speed Ahead
The effect of discovering a requirement that affects the core architecture depends on when that requirement is discovered. If it is discovered when the project starts then there will be little impact to the architecture; the later it is discovered the greater the impact to the architecture. Regardless of when it is discovered the project deadline is impacted and should be moved back (of course, it won't be).

Impacts to the architecture will cause development productivity to slow down as the team looks for a work-around. The team will need to choose between building the correct architecture to satisfy the requirements and cutting corners.

Unfortunately, pressure from senior management and the lack of intestinal fortitude from IT management will cause the team to cut corners.

When enough corners are cut then we end up with work-around piled on work-around and the stability will start to fail, bugs will increase exponentially, and productivity will diminish sharply.


“Damn the torpedoes, full speed ahead” will cause the architecture to lose flexibility and the team to lose productivity (brown circle is the original requirements). Developers will be spending more time in meetings than writing code and development speed will slow to a crawl.

Whether you succeed with your project or fail will depend on how many core requirements are missing and how late they are discovered in the project:
  • If your productivity is not impacted then you will have a successful project.
  • If your productivity slows down enough then you will be lucky to build a subset of the original scope and declare victory, i.e. mission accomplished.
  • If your productivity slows to a crawl then you will end up with a failure.

Conclusion

Regardless of your development methodology the most important thing is to identify the core requirements that are needed to build the core architecture. This means making sure that you identify all the stakeholders and determine the core functionality.  Failure to talk to all stakeholders will leave you vulnerable to requirements that your architecture cannot handle.

When development starts have the team work on the most uncertain requirements first. Have the team work on core functionality for the system and not CRUD use cases. Some teams build out the screens of the system because they want to show progress. In reality this is just a façade and lulls management into a false sense of security. By working on the core functionality the team will identify missing core requirements at the beginning of the project when they are laying down the architecture.

Since core requirements discovered late in a project are the most deadly, shorten your development cycle. This is one of the reasons that Agile software development works for small and medium sized projects: your development cycle is usually no more than a month, so discovering a core requirement late can't have a huge impact.

If you don't use Agile software development then you should try to shorten your development life cycle and aim for quarterly releases.  This means turning your project into a program of projects with much shorter durations.

At a minimum take a look at some of your failed projects and see if you can figure out where the project fell apart. In all likelihood you will discover core requirements that were added to the project after the core architecture had already been laid down.

For more information on requirements uncertainty check out: Uncertainty Trumps Risk in Software Development

Want me to elaborate on any issue mentioned in this article?
Just write me at dmahal@accelerateddevelopment.ca