Porting legacy code

Matt Davey poses an interesting question:

The problem:
  • C++ Corba legacy codebase (5+ years old, 1 million lines)
  • No unit tests
  • Little test data
  • Limited knowledge transfer from the original development team.
  • A flake environment to run the application in.
The requirement:
  • Port the C++ result accumulation and session management code to Java
Do you:
  1. Write C+ unit tests to understand the current system, then write Java equivalent code using TDD
  2. Write Java tests using TDD based on your understanding of the C++ code
  3. Hope you understand the C++ code, and JFDI in Java
  4. Give up and go home
  5. Get the original development team to do the work
Ah, I love the smell of legacy code in the morning. :-)

My answer: depends. (Typical.) Here's what I mean:

  • Option 1 is clearly the "best" answer if the goal is to produce code that will most accurately match what the current C++ code is doing, but also represents the greatest time and energy commitment, as well as making the fundamental assumption that what the C++ code does today is correct in the first place.
  • Option 2 is the approach to take if the time crunch is a bit tighter and/or if the C++ unit tests can't be sold to management ("You're just going to throw them away anyway!"), particularly if the team working on the port has many or all of the original C++ devs. It also allows for the inevitable "You know, we always wanted to change how that code worked, so why don't we...." requirements changes.
  • Option 3 is probably appropriate in those shops where WHISKEY (Why the Hell Isn't Somebody Koding Everything Yet) is considered an acceptable development methodology, but the lack of unit tests for the Java port will catch up to you someday (as it always does).
  • Option 4 is probably best if the company you work for is seriously considering Option 3. :-)
  • Option 5 is only viable if the original development team is available (not going to happen if you outsourced it, by the way), able to work on it (meaning they've flipped the switch to Java at both a syntactic and semantic level), and isn't otherwise engaged on another project (which is probably the dealbreaker).
Matt also left out a few options:
  • 6. Let management believe in the whizzy-bang code conversion wizard that such-and-such company is trying to sell them on that "guarantees" 99% code translation and compatibility
  • 7. Let management outsource the port, and let them worry about it
  • 8. Give it all up and start from scratch--who needs that system anyway? It's not like anybody ever really used it, right?

Porting legacy code is one of the least-favorite projects of any software developer, but what few developers seem to realize is that they're also the least-favorite of management, too: it's a project that has no discernible ROI beyond that of "getting us out of the Stone Age". You might argue that the code becomes more maintainable if it's written in whatever-the-latest-technology-flavor-is-today, but the truth of the matter is, today's hot language is tomorrow's legacy language, subject to being rewritten in tommorrow's hot language. (Any programmer who's been writing code for more than five years probably already knows this, and any programmer who's been writing code for more than 10 years almost certainly knows this.)

Companies have been on this hamster wheel for far too long. Having gone through several transitions, particularly the C++-to-COM/CORBA-to-Java/EJB transitions over the last decade--and they're starting to resist if not outright reject the idea. Instead, they're preferring to find ways to create interoperable solutions rather than ported solutions--hence the huge interest in Web services when they first came out (and the interest in CORBA when it first came out, and the interest in middleware products in general like Tuxedo when they first came out, and so on). Integration still remains the "hard problem" of our industry, one that none of the new languages or platforms seem to want to address until they have to. Witness, for example, Sun's reluctance to really adopt any sort of external-facing technology into Java until they had to (meaning the Java Connector Architecture; their adoption of CORBA was half-hearted at best and a PR move at worst). .NET suffers the same problem, though fortunately Microsoft was wise enogh to realize that shipping .NET without a good Win32/COM interop story was going to kill it before it left the gate. C++ at least had the advantage of being call-compatible with C (if you declared the prototypes correctly), and so could automatically interop against the operating system's libraries pretty easily. In fact, it could be argued that C has long been the de-facto call-level compatibility interoperability standard (Python has C bindings, Ruby has C bindings, Java reluctantly, it seems, support C bindings through JNI, and so on), but of course that only works to a given platform/OS, since C offers so little by way of standardization and the operating systems have never been able to create a portable OS layer beyond the simple stuff; POSIX was arguably the closest they came, and many's the POSIX programmer who will tell you just how successful THAT was.

My point? I hereby declare a rule that any new language developed should think first about its interoperability bindings, and developers contemplating the adoption of a new language must flesh out, in concrete form, how they will integrate the hot new language into their existing architecture, or else they can't use it. (Yes, this applies equally to Ruby, Java, .NET, C++, and all the rest, even FORTRAN--no exceptions.) If you can't describe how it'll integrate into your current stuff, then you're just fascinated with the bright shiny new toy and need to grow up. It doesn't really matter to me how it integrates--through a database, through files on a filesystem, through a message-passing interface like JMS, or through a call-level interface, just have SOME kind of plan for hooking your new <technology X> project into the rest of the enterprise. (And yes, those answers are there for each of those languages/platforms; the test is not whether such answers exist, but how they map into your existing infrastructure.)

What's more, I hereby rededicate this blog to finding interoperabilty solutions across the technology spectrum--got an interop problem you're not sure how to solve? Email me and (with your permission) I'll post the response--sort of an "Ann Landers" for interop geeks. :-)

By the way, this conundrum can be genericized pretty easily using generics/templates:

enum Q
  No, Bad, Little, Flakey, Untouchable
enum technology
  C, C++, Java, C#, C++/CLI, VB 6, VB 7, VB 8, FORTRAN, COBOL, Smalltalk, Lisp, ...

Problem<technology X, technology Y, type T extends AbstractTest, enum Q>: {

  • <X> legacy codebase (<int N where N > 1> years old, <int L where L > 1000> lines)
  • No <type T> tests
  • <Q> test data
  • <Q> knowledge transfer from the original development team
  • <Q> environment to run the application in.
} returning requirement:
  • Port the <X> project to <Y>
(I thought about doing it in Schema, but this seemed geekier... and easier, given all the angle-brackets XSD would require. ;-) )