A little over a year ago, I got a call from Kent Beck inviting me to be his "eyes on the ground" (though I have come to prefer to call my role "on-site mentor") on a payroll project for one of the big three auto makers, let's call them Auto. He explained to me that the project had been going on for well over a year, that it had been deemed a failure, and that he had recommended starting over. His recommendation had been accepted.
As a long-time admirer and would-be follower of Kent's work, and being out of work at the time, I reluctantly agreed immediately. Here, then, is a report on the project I have begun calling "the best Smalltalk project in the world", from the viewpoint of business transactions.
Our task was to pay some 9,000 professional and managerial employees, with a program that could be extended until it paid all of Auto's more than 100,000 people. We needed a program flexible enough to pay monthly, biweekly, and weekly payrolls, including salaried and hourly employees, and able to interface correctly with the tens of input streams and tens of output streams already in existence in the company. The program was to be done in Smalltalk, and we wanted it to be a showcase of what good objects and Smalltalk could do.
In this paper, we'll look at the program from the traditional angles of input, process, and output.
Input to the system consists of literally hundreds of fixed-format transactions, with clever code names like FX30 and 2844. These transactions signify deductions to be taken (income tax, lease car, safety shoes), entitlements to be paid (refunds, bonuses), or historical values such as year- or quarter-to-date values of deductions taken.
Input to the system is read from several different flat files, and any file can be provided more than once during the pay cycle. The system was to be set up to read files frequently throughout the cycle, to reduce the time required during the actual payment process.
Some inputs merely signify events for specific employees. When an event is identified, the system must access multiple tables from a DB2 database of salary and personal information for the affected employee.
For each employee to be paid, processing consists of converting hours worked into dollars earned, summarizing other entitlements such as awards or refunds, and computing tax and the other deductions that make up the check.
Once an employee's check has been calculated, about 15 different output files need to be created. Some of these present the same information in formats expected by different downstream processes; others produce summarizations and subsettings of the information: general ledger summaries, overall payment summaries, and so on.
Our First Big Mistake
We made our first big mistake in the first weeks of the project. Although we had the blessing of the corporation to start completely over, we decided to retain and reuse a piece of the framework from the previous attempt at the project. This piece was a large and complex network of objects for performing formatted input and output to and from flat files and relational databases. I'll call the subsystem IO (not its real name).
At the time, as Sarek said when Spock asked him why he married Amanda, it seemed to be the logical thing to do. The problem was thought to be large, and after all, the objects all existed. Most everyone knew how to use the IO tool, and many of the record definitions were already in place from the previous lifetime of the project. The tool might need a little work, but its overall conception and implementation were thought to be good. In retrospect, the decision cost us in time, reliability, and efficiency.
I must emphasize that the IO system was written by a respected developer from the first attempt at writing the payroll program. It was known to work; it was known to be reliable. Its structure made sense, and it was even well-commented. Overall, it was an excellent IO framework. Yet, with all that, using it to handle our input and output transactions turned out to be a costly mistake.
Drilling Down: Input
The IO system included the ability to define fixed record formats using a "language" of nested arrays, much like those used in VisualWorks' window definitions. These formats were supported by a convenient GUI that could be used to define each of the fields of each of the many record formats we had to support. It was possible to define records using the GUI, or by editing the arrays directly, and then to return to the GUI for further editing.
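To give the flavor of these definitions, here is a minimal sketch of what one such nested-array record format might have looked like; the transaction name, field layout, and parse selectors shown are my own invention, not the actual IO definitions.

    "A hypothetical fixed-format record definition, in the spirit of a
     VisualWorks window spec: each field gives a name, a starting
     column, a length, and the selector used to parse it."
    fx30Spec := #(#FX30
        (#employeeId     1  9 #parseString:)
        (#deductionCode 10  5 #parseString:)
        (#amount        15 11 #parseDecimal:)
        (#effectiveDate 26  8 #parseDate:)).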
When input needed to be done, the primary record definition for the file would be sent to the IO system. The IO system would read each record, interpret the field definitions, compute the selector to be sent to parse each input field, parse the field, and place the result in a holding object until the record was complete.
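A minimal sketch of that interpretation loop, assuming a spec shaped like the hypothetical one above and invented per-type parsing helpers such as #parseDecimal:, might read:

    parseRecord: aLine using: aSpec
        "Hypothetical sketch: walk the field definitions, slice the
         input line, and perform each field's parse selector,
         accumulating the results in a holding Dictionary."
        | holder |
        holder := Dictionary new.
        (aSpec copyFrom: 2 to: aSpec size) do: [:field |
            | start length raw |
            start := field at: 2.
            length := field at: 3.
            raw := aLine copyFrom: start to: start + length - 1.
            holder at: (field at: 1)
                put: (self perform: (field at: 4) with: raw)].
        ^holder

    parseDecimal: aString
        "Hypothetical helper: legacy amounts arrive with implied cents."
        ^aString asNumber / 100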
For variable-format files, IO included the ability to use a key field in the record to determine a more specific record format to parse the remainder of the fields.
When the record parsing was complete, the IO system would format an actual domain object from the holding object, then send that object the message #addToDomain, which would cause the object to take whatever action was necessary to place it into the domain model in the appropriate place.
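As a hypothetical sketch of what such an #addToDomain method might have looked like (the domain classes and accessors here, PayrollDomain, employeeNumbered:, and recordDeduction:amount:, are invented for illustration):

    addToDomain
        "Hypothetical sketch: the freshly built transaction object
         inserts itself into the domain model. Note the inversion:
         the input reaches into the domain, rather than the domain
         pulling in its transactions."
        | employee |
        employee := PayrollDomain current employeeNumbered: employeeId.
        employee recordDeduction: deductionCode amount: amount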
The upshot of all this was that the input process drove the system. It wasn't as if you could view the input files as a stream of transaction objects: the entire domain model had to be prepared to receive #addToDomain messages willy-nilly. The flow of the system was being controlled by the IO system. Now in some transaction-oriented systems, you must let the transactions control the system flow. In a batch-oriented system like payroll, however, there is the flexibility to process input and output with the system in control.
Lesson learned: we needed our software to drive the I/O, not the other way around. It took us a long time to realize how this difference was shaping our system.
Additionally, the IO system handled all its own errors. Originally most of these were handled silently. Later modifications caused the IO system to send notification messages to the domain when things went wrong. Since the IO system was controlling the domain, not the other way around, the ability to recover from errors was severely curtailed. In addition, many errors were silently swallowed for months before we discovered that they were being "handled" without comment by some low-level part of the IO system.
Lesson learned: be cautious with bullet-proofing your underlying frameworks. Their very reliability may mask problems in your legacy domain.
Overall, the effect was that we were slowed down because the IO system, rather than the domain, controlled the flow of the program; because errors were being handled silently deep inside the framework; and because we did not understand the framework well enough to change either situation quickly.
Most important lesson learned: no matter how powerful and well-designed your frameworks are, you need to be fully on top of how they work and how they are implemented. You won't always need to know, but when you need to know, you need it badly.
Drilling Down: Process
The metaphor on which the core model is built is that of manufacturing. Inputs and outputs of the system are parts, which are stored in named bins. Stations take parts from one or more bins, process them, and place result parts into other bins.
Input transactions all become transaction parts. They are all read into a single bin, named "Legacy Transactions". Specialized stations convert transaction parts into data parts that are placed into specific input bins. Work hour transaction parts become hour parts in a WorkHours bin, year-to-date paid FICA transaction parts become dollar parts in a FICA bin, and so on.
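A minimal sketch of the metaphor, with all class and message names invented for illustration, might look like this:

    Station >> runFor: aPerson
        "Hypothetical sketch: a station drains parts from one of the
         person's named bins, transforms each part, and drops the
         results into another bin."
        | inBin outBin |
        inBin := aPerson binNamed: self inputBinName.
        outBin := aPerson binNamed: self outputBinName.
        inBin parts do: [:each |
            outBin addPart: (self transform: each)]

    WorkHoursStation >> transform: aTransactionPart
        "A specialized station turning a work-hours transaction part
         into an hour part destined for the WorkHours bin."
        ^HourPart hours: aTransactionPart hours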
Some transaction parts are intended to update employee profile information rather than to provide input values to process; an example might be a declaration of the number of exemptions desired for federal tax. This information is not stored in bins, but is part of the person. When the corresponding stations process these updating transactions, they convert them into command parts. Command parts are processed at the end of input processing: each is little more than a stored message that is sent to the target object to update it. The deferral of command parts ensures that all data values in bins are up to date, in case the update operation needs to look at current input values.
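Hypothetically, a command part might be little more than the following; the class, its instance variables, and the example message are invented:

    CommandPart >> target: anObject selector: aSymbol arguments: anArray
        "Remember a message to be sent once all input has been read."
        target := anObject.
        selector := aSymbol.
        arguments := anArray

    CommandPart >> execute
        "Send the stored message to its target."
        ^target perform: selector withArguments: arguments

    "For example, a federal-exemptions transaction might defer:"
    CommandPart new
        target: aPerson taxProfile
        selector: #federalExemptions:
        arguments: (Array with: 3)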
To summarize the input processing: file inputs become transaction parts. Transaction parts are processed by specialized stations that transform them to input parts (actual hours or dollars), or that convert them to commands to be executed when all inputs are present.
Payment consists of running additional stations: stations that convert hours worked to dollars earned, stations that summarize other entitlements such as awards or refunds, and finally stations that compute tax and other deductions. Some of these stations hide facilities for calling a commercial tax package or a commercial AI-based wage attachment package.
When all the stations have run, the person's bins contain all the calculated values that make up his check, corresponding input to the general ledger and other downstream programs, and so on. It is then time to export that individual's data.
Drilling Down: Output
The output process is implemented by creating an object (the packing helper) that looks to the IO system like a single object with an accessor for each field the IO record should contain. This object is in fact a proxy for all the bins and all the personal values of the person being exported. The IO system works through the record definition, sends messages to the packing helper, receives values back, and places them into a giant string, ready for export. The packing helper and its collaborators embody all the translations between "reasonable" domain values and the special requirements of the legacy systems downstream. Often, even though the real domain model wants to keep two kinds of values separate, some specific downstream program wants them summed or differenced. The packing helper implements such requirements; the idea is that someday we will replace the downstream programs and remove the corresponding nasty bits from the packing helper.
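For illustration, a couple of hypothetical packing-helper accessors might look like the following; the field names, bin names, and the summing requirement shown are invented, but they give the flavor of the translations involved:

    PackingHelper >> grossPay
        "To the IO system this is just another field accessor; in fact
         it reaches into the person's bins."
        ^(person binNamed: 'Earnings') total

    PackingHelper >> combinedUnionDues
        "One downstream program wants two values the domain keeps
         separate reported as a single sum."
        ^(person binNamed: 'UnionDues') total
            + (person binNamed: 'UnionInitiation') total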
Using the IO package didn't affect output as much as input, but we were slowed down there as well. The IO system, in output mode, liked to format the record by going through the record definition asking the domain for the value for field 1, field 2, and so on. Accommodating this meant implementing literally thousands of messages, the ones the IO system would send, and each of these messages had to be ready to be sent at random, whenever the IO system wanted the value.
Lesson learned: again, if we had had a stronger understanding of the tool, we might have discovered earlier what was going on.
On the output side, we found that the IO system was particularly inefficient, largely because it insisted on building a 12,000-byte record before writing it out, rather than streaming out the bytes a few at a time. Since we left detailed efficiency considerations like these to the end of the project, it was late in the schedule before we fully understood the impact. There may have been viable alternatives already built into the system; because we didn't have deep knowledge of it, we didn't know whether they were there.
As a rough indicator of the impact on output, some later parts of the system produce their output files using a simple #storeOn: kind of processing, just writing encoded values to an output file stream. These files are not entirely comparable to those done with the IO system, but they appear to be easier to understand and nearly an order of magnitude more efficient.
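As a sketch of that simpler approach (the class and field names are invented; only the #storeOn: shape reflects what we actually did), each domain object writes its own fields straight onto the output stream, with no intermediate buffer:

    CheckLine >> storeOn: aStream
        "Encode this line directly onto the stream, instead of having
         the IO system pull values field by field into a 12,000-byte
         buffer."
        aStream
            nextPutAll: employeeId printString; tab;
            nextPutAll: grossPay printString; tab;
            nextPutAll: netPay printString; cr

    Paymaster >> exportChecksOn: aStream
        checkLines do: [:each | each storeOn: aStream]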
Lesson learned: if you have a simple thing to do, first try using a simple solution. It is easier to enhance a simple solution than to optimize a complex one.
At the workshop, I hope to discuss the lessons learned from this experience in handling input and output transactions, and to compare them with the experiences of other attendees who have faced similar problems. Here are some starting ideas about what I think we have learned: