Disaster Prevention


Don't ignore the signs of impending doom for your project. Avert the worst by conducting a technical and management audit, followed by recovery planning for the future.

By William H. Roetzheim

You're developing new software, and your project team has slipped the schedule. Should you be concerned? Not if this is the first slip. Software projects are more difficult to manage than construction projects, and you wouldn't go crazy if your home remodeling project slipped a few weeks.

However, a second or third slip is a definite sign of potential trouble. You might think that the trouble lies in the schedule delays, but this is merely the surface manifestation of a deeper problem. Unfortunately, when you look beneath the surface of a project missing deadlines, you often find that the underlying architecture and code itself are seriously, perhaps even fatally, flawed. There are two possible reasons for this:


  1. Most frequently, the developers are in over their heads, building a system whose complexity exceeds their experience or ability.
  2. Less frequently, but not uncommonly, the developers are capable of building the system, but the effort and time required were so badly underestimated that they have nowhere near enough time to do the job right.

If you ignore the initial warning sign of multiple schedule slips, you've laid a foundation for total project failure and cancellation, to be revealed in one or more of the following forms when the system is delivered:


  • The system has so many defects, and crashes or operates incorrectly so often, that it isn't usable.
  • The system is missing key functionality that's necessary for it to be deployed operationally.
  • Minor system enhancements are difficult and costly to implement, often resulting in unexpected problems in other parts of the application.
  • The system is so slow that it isn't feasible to deploy it operationally.

Damage Auditing
If you suspected that a subsidiary corporation was in financial trouble, and was hiding the problem's magnitude within its accounting department, you'd call in outside help (certified public accountants) to do an audit and tell you where you stood.

Similarly, if you suspect a project is in trouble, you need to call in outside experts immediately to do an audit. The audit team should be composed of very senior managers and software engineers. The audit should last between three days and three weeks, depending on the size of the project, and should have both a management component and a technical component. Typically, the technical audit is most critical on smaller projects, while the management audit dominates on larger projects (over $5 million).

The technical audit focuses on the design team and, to a lesser extent, the programmers doing the actual implementation. It begins with the overall system architecture and database design. The question isn't whether these are right or wrong, but rather whether they're appropriate to the nature of the application (usage, transaction volume, database size, planned evolution and so on). In my experience, if these two elements are correct, the project has a solid foundation and salvage is possible, even if there are other problems. On the other hand, if these two elements are flawed, the remainder of the system probably needs a total rewrite. Running a close third in importance are the object design and the business application server design. If these are wrong, the system can often be made to work, but maintenance will be difficult. You must then decide whether to fix and deploy the current system while immediately redesigning a follow-on system, or to redesign immediately.
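
The priority order above amounts to a simple decision rule. The sketch below encodes it in Python purely for illustration; the `DesignFindings` fields and the verdict labels are hypothetical, and in a real audit each finding is a senior engineer's judgment rather than a yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class DesignFindings:
    """Hypothetical audit findings; True means 'appropriate for this application'."""
    architecture_ok: bool
    database_ok: bool
    object_design_ok: bool
    app_server_ok: bool


def salvage_verdict(f: DesignFindings) -> str:
    """Map design findings to a recommendation, in the priority order given in the text."""
    # Architecture and database design are the foundation: if either is flawed,
    # the remainder of the system probably needs a total rewrite.
    if not (f.architecture_ok and f.database_ok):
        return "rewrite"
    # Object and application server design run a close third: the system can often
    # be made to work, but management must choose between fix-and-deploy and redesign.
    if not (f.object_design_ok and f.app_server_ok):
        return "fix-or-redesign"
    # Solid foundation: salvage is possible even if other problems exist.
    return "salvage"


print(salvage_verdict(DesignFindings(True, True, False, True)))  # fix-or-redesign
```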

Once the design has been reviewed, the implementation must be examined. The process begins by using automated tools to measure comment density (both in headers and embedded within functions) across the application as a whole and by function or module. A similar automated analysis computes McCabe's cyclomatic complexity metric for each function. Functions with high complexity are candidates for simplification and likely trouble points for defects. The code itself (including data access code such as SQL statements and stored procedures) is then examined, either in its entirety or by sampling. The audit team looks for inefficient coding techniques, missing or improper error and exception handling, duplicated code blocks (duplicate code should be encapsulated in a function or object, not just cut and pasted), and other obvious problems. Finally, the user interface is examined for usability and conformance with industry standards. Of all the items mentioned, the user interface is the easiest to fix if deficient.
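
The article doesn't name the automated tools. As one illustration, assuming the code under review is Python, the standard library's `ast` and `tokenize` modules are enough to compute both metrics. This is a simplified sketch: the complexity count (1 plus the number of decision points) approximates McCabe's metric, docstrings aren't counted as comments, and the default flagging threshold of 10 is the commonly cited McCabe limit, used here as an assumption.

```python
import ast
import io
import tokenize

# Node types counted as one decision point each (simplified McCabe count).
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.
    In this sketch, nested functions count toward the enclosing function."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds one branch per extra operand.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each comprehension loop and each of its 'if' filters is a branch.
            complexity += 1 + len(node.ifs)
    return complexity


def comment_density(source: str) -> float:
    """Fraction of non-blank lines carrying a '#' comment (docstrings not counted)."""
    comment_lines = {
        tok.start[0]
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    }
    nonblank = sum(1 for line in source.splitlines() if line.strip())
    return len(comment_lines) / nonblank if nonblank else 0.0


def audit_module(source: str, threshold: int = 10) -> dict:
    """Per-function complexity plus module comment density; flags functions above threshold."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    scores = {f.name: cyclomatic_complexity(f) for f in funcs}
    return {
        "comment_density": comment_density(source),
        "complexity": scores,
        "flagged": sorted(name for name, score in scores.items() if score > threshold),
    }


sample = '''
def classify(x):
    # Sign of x, with a special case for small values.
    if x > 0 and x < 100:
        return "small positive"
    elif x > 0:
        return "large positive"
    return "non-positive"
'''

report = audit_module(sample, threshold=3)
print(report["complexity"], report["flagged"], round(report["comment_density"], 2))
```

For the sample function, the `if`, the `elif` and the extra `and` condition each add one to the base of 1, giving a complexity of 4, so it is flagged at a threshold of 3; one of its seven non-blank lines is a comment.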

The management audit has three steps: gather metric-oriented input data, prepare project baselines using industry-standard approaches, and compare actual or projected values with the resulting baseline. The project baseline will include total effort and schedule, deliverables (including page counts), labor loading curves over time by skill set, development team skills and experience, maintenance projections, defect projections by category, and so on. Comparing historical staffing and other metric values with the baseline values identifies deviations to be analyzed. Similarly, forward-looking project plans can be compared to the baseline values and their deviations examined. Examples of the types of problems that pop out are shown in "Management Audit Problem Areas."
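
To make the comparison step concrete, the following sketch computes the ratio of each audited value to its baseline, using the figures from the "Management Audit Problem Areas" table, and flags anything outside a tolerance band. The 25 percent tolerance and the `MetricComparison` type are assumptions for illustration, not part of any industry-standard method.

```python
from dataclasses import dataclass


@dataclass
class MetricComparison:
    name: str
    baseline: float  # industry-standard baseline value
    actual: float    # actual or planned project value

    @property
    def ratio(self) -> float:
        """Actual value as a fraction of the baseline."""
        return self.actual / self.baseline


def flag_deviations(metrics, tolerance=0.25):
    """Return metrics whose actual/baseline ratio lies outside 1 +/- tolerance.
    The default tolerance is a hypothetical choice, not an industry rule."""
    return [m for m in metrics if abs(m.ratio - 1.0) > tolerance]


# Figures from the "Management Audit Problem Areas" table.
metrics = [
    MetricComparison("Software Design Description (pages)", 2110, 493),
    MetricComparison("Software Test Description (pages)", 873, 42),
    MetricComparison("Software testers (person-months)", 72, 14),
    MetricComparison("Integration and test time (calendar months, planned)", 4.2, 0.75),
]

for m in flag_deviations(metrics):
    print(f"{m.name}: {m.ratio:.0%} of baseline")
```

All four values fall between about 5 and 23 percent of their baselines, so each is flagged for analysis.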

Disaster Prevention Audits
With periodic audits throughout the project lifecycle, problems can be identified early and corrective action taken in a timely fashion. These preventative audits have significantly raised success rates on large projects for several agencies of the State of California, and are now a requirement for all large projects. At each audit, the team examines all the work performed to date, and any plans for future work, to identify potential problems with each area (see "Audit How-To").

In addition, audits are normally conducted at least every six months, so if a phase lasts longer than that, a progress audit is conducted in the middle of the phase.
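
The scheduling rule can be sketched as follows: one audit at each milestone, plus enough evenly spaced progress audits that no interval exceeds six months. For a phase longer than six but no longer than twelve months, that yields the single mid-phase audit described above. The milestone months in the example are hypothetical.

```python
import math


def audit_schedule(milestones, max_gap=6.0):
    """Return (month, label) audit points for a list of (milestone, month) pairs.

    An audit is held at every milestone. If a phase between milestones runs longer
    than max_gap months, evenly spaced progress audits are added so that no interval
    exceeds max_gap; a phase of more than 6 and at most 12 months gets one mid-phase audit.
    """
    audits = []
    prev = None
    for name, month in milestones:
        if prev is not None:
            gap = month - prev
            parts = math.ceil(gap / max_gap)  # fewest equal sub-intervals <= max_gap
            for k in range(1, parts):
                audits.append((prev + gap * k / parts, f"progress audit before {name}"))
        audits.append((month, f"{name} audit"))
        prev = month
    return audits


# Hypothetical milestone dates, in months from project start.
plan = [
    ("Project initiation", 0),
    ("Software requirements review", 3),
    ("Software design review", 7),
    ("Completion of coding", 17),
    ("Delivery", 21),
]

for month, label in audit_schedule(plan):
    print(f"month {month:g}: {label}")
```

Here the ten-month coding phase gets one progress audit at month 12.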

Saving Your Job and Sanity—Recovery Planning
Suppose that the project audit determines that your project is heading for disaster. What are your options? You could cancel the project immediately and cut your losses. For non-mission-critical systems that don't have a large return, this is often the best choice. In many cases, however, failure is not an option, so recovery planning is the next phase.

Recovery planning begins with software triage. Triage is a military term used by medical personnel after a major battle to separate the wounded into three categories: those who will get better on their own, those who will die no matter what is done, and those who might be saved by medical attention. The doctors then focus all of their attention on the third group. In an existing software project, the initial step in recovery planning is to conduct triage: What is usable as is? What can be economically fixed and used? What should be discarded? In this step, the core system functionality is also identified, and all extraneous features that can be deleted or delayed are called out.
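
Software triage can be recorded as a simple classification. In this sketch, "economically fixable" is assumed to mean that the estimated repair cost is lower than the cost of rebuilding, and the `Component` fields and cost figures are hypothetical; in practice the audit team weighs these judgments rather than computing them.

```python
from dataclasses import dataclass
from enum import Enum


class Triage(Enum):
    USE_AS_IS = "use as is"
    FIX = "fix and use"
    DISCARD = "discard"


@dataclass
class Component:
    name: str
    usable_as_is: bool   # audit team's judgment (hypothetical field)
    fix_cost: float      # estimated repair cost, e.g. person-months
    rebuild_cost: float  # estimated cost to rebuild from scratch
    core: bool           # part of the core system functionality?


def triage(c: Component) -> Triage:
    if c.usable_as_is:
        return Triage.USE_AS_IS
    # Assumption: "economically fixable" means repair is cheaper than rebuilding.
    if c.fix_cost < c.rebuild_cost:
        return Triage.FIX
    return Triage.DISCARD


def triage_plan(components):
    """Group components by triage outcome and list non-core items to delete or delay."""
    plan = {t: [] for t in Triage}
    for c in components:
        plan[triage(c)].append(c.name)
    deferrable = [c.name for c in components if not c.core]
    return plan, deferrable


components = [
    Component("Order entry screens", True, 0, 8, core=True),
    Component("Pricing engine", False, 3, 10, core=True),
    Component("Custom report writer", False, 12, 6, core=False),
]
plan, deferrable = triage_plan(components)
for outcome, names in plan.items():
    print(f"{outcome.value}: {names}")
print("delete or delay:", deferrable)
```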

You then develop a new, zero-based baseline plan. Existing plans and schedules are thrown out, and the project is planned from its current state to an achievable completion. This requires formal techniques for estimating effort and schedule, analyzing the time-cost trade-off, and similar tasks. Delivered functionality and other project-estimating parameters are adjusted until an acceptable completion date is achieved, or until it becomes obvious that no acceptable completion date is possible.
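
The article doesn't prescribe a particular estimating model. As a stand-in, this sketch uses Boehm's basic COCOMO equations for organic-mode projects (effort = 2.4 × KLOC^1.05 person-months, schedule = 2.5 × effort^0.38 calendar months) and defers non-core features, lowest priority first, until the estimated schedule meets the target date. The feature list, sizes and priorities are hypothetical, and the time-cost trade-off is not modeled.

```python
def basic_cocomo_organic(kloc: float) -> tuple[float, float]:
    """Basic COCOMO, organic mode: (effort in person-months, schedule in calendar months)."""
    effort = 2.4 * kloc ** 1.05
    schedule = 2.5 * effort ** 0.38
    return effort, schedule


def replan(features, target_months):
    """features: list of (name, remaining_kloc, priority, is_core); higher priority matters more.

    Defers non-core features, lowest priority first, until the estimated schedule meets
    the target. Returns (kept_features, effort, schedule), or None if even the core
    functionality alone cannot meet the target.
    """
    kept = list(features)
    deferrable = sorted((f for f in features if not f[3]), key=lambda f: f[2])
    while True:
        effort, schedule = basic_cocomo_organic(sum(f[1] for f in kept))
        if schedule <= target_months:
            return kept, effort, schedule
        if not deferrable:
            return None
        kept.remove(deferrable.pop(0))  # defer the lowest-priority remaining feature


# Hypothetical remaining work, sized in thousands of lines of code (KLOC).
features = [
    ("Core transactions", 20, 10, True),
    ("Reporting", 8, 5, False),
    ("Data export", 4, 2, False),
    ("Dashboard", 6, 1, False),
]

result = replan(features, target_months=14)
if result is None:
    print("No acceptable completion date is possible.")
else:
    kept, effort, schedule = result
    print([f[0] for f in kept], f"{effort:.0f} person-months, {schedule:.1f} months")
```

With a 14-month target, deferring the lowest-priority feature brings the estimate to about 13.9 months. With a 5-month target, even the core functionality alone doesn't fit, so the function returns `None`: the case where no acceptable completion date is possible.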

The Final Word
Outside project audits are critical for any project you suspect is in trouble, but remember that auditors should have no vested interest in the project's success or failure. For that reason, they shouldn't come from outside development shops, which may disparage the current team in order to place their own staff on the project.

Much as CPAs conduct regular audits to ensure accuracy and to provide management and investors with vital information, outside audits of IT projects ensure the accuracy of project status reporting and help management and investors make informed decisions regarding those projects.

Management Audit Problem Areas

Metric                                                 Industry Standard   Audit Results
Software Design Description page count                 2,110               493
Software Test Description page count                   873                 42
Software testers (person-months)                       72                  14
Software integration and test time (calendar months)   4.2                 0.75 (planned)

Audit How-To
Preventative audits are conducted at the following five milestones.

Milestone                         Scope of Audit
1. Project initiation             Project baseline plans
2. Software requirements review   Requirements, architecture, plans
3. Software design review         Scope creep, design, architecture, database, interfaces, test documentation and approach, coding guidelines, plans
4. Completion of coding           Code implementation (complexity, adherence to guidelines, encapsulation, algorithm order of magnitude, plans, user interface)
5. Delivery                       Maintainability, conformance to requirements, usability