 Thursday, August 4, 2011
Of communities, companies, and bugs (Or, “Dr Dobb’s Journal is a slut!”)

Andrew Binstock (Editor-in-Chief at DDJ) has taken a shot at Oracle’s Java 7 release, and I feel a need to respond.

In his article, Andrew notes that

… what really turned up the heat was Oracle's decision to ship the compiler aware that the known defects would cause one of two types of errors: hang the program or silently generate incorrect results. Given that Java 7 took five years to see light, it seems to me and many others that Oracle could have waited a bit longer to fix the bug before releasing the software. To a large extent, there is a feeling in the Java community that Oracle does not understand Java (despite the company's earlier acquisition of BEA). That may or may not be, but I would have expected it to understand enterprise software enough not to ship a compiler with defects that hang a valid program.

There are so many things in this paragraph alone that I want to respond to that I feel it necessary to deconstruct it and respond to each one individually:

  • “Oracle’s decision to ship the compiler aware that the known defects…” According to the post that went out to the Apache Solr mailing list (quoted in a blog post), “These problems were detected only 5 days before the official Java 7 release, so Oracle had no time to fix those bugs… .” I’m sorry, folks, but a bug found five days before the release is not a “known defect”. It’s a late-breaking bug. This is yellow journalism, if you ask me.
  • “Given that Java 7 took five years to see light…” Much of that time went to open-sourcing the JDK itself (1.5 years) and to the Oracle acquisition (1.5 years), plus the community’s wrangling over closures, on which Sun couldn’t find a way to build consensus. Remember when they stood on the stage at Devoxx one year and promised “no closures”, only to turn around at the same conference the following year and say, “Yes closures”? Sun had a history of flip-flopping on commitments worse than a room full of politicians. Slapping Oracle with the implicit “you had all this time and you wasted it” argument is just unfair.
  • “… it seems to me and many others that Oracle could have waited a bit longer to fix the bug before releasing the software.” First of all, what “many others”? Remember when Sun proposed the “Java7 now with less features vs Java7 later with more features” question? Overwhelmingly, everybody voted for now, citing “It’s been so long already, just ship *something*” as a reason. If Oracle had slipped the date, the howls would still be echoing across the hills and valleys, and Andrew would be writing, “If Oracle commits to a date, they really should stick with this date…” But secondly, remember, the bug was noticed five days before the release. Those of you who’ve never seen a bug show up during a production rollout, please cover your eyes. The rest of you know good and well that trying to abort a rollout like that mid-stream sometimes causes far more damage than just leaving the bug in place. Particularly if there’s a workaround. (Which there is, by the way.)
  • “To a large extent, there is a feeling in the Java community that Oracle does not understand Java.” Hmm. Not surprising, really, when pundits continually hammer away at how Oracle doesn’t get Java and doesn’t understand that everything should be given away for free and when people bitch and complain you should immediately buy them all ponies and promise that they’ll never do anything wrong again…. Seriously? Oracle doesn’t understand Java? Or is it that Oracle refuses to play the same bullshit game that Sun played? Let’s see, what is Sun’s stock price these days? Oh, right.
  • “I would have expected it to understand enterprise software enough…” And frankly, I would have expected an editor to understand journalism enough to at least attempt a fair and unbiased story. It’s disappointing, really. Andrew has struck me as a pretty nice and intelligent guy (we’ve chatted over email), but this piece clearly falls way short on a number of levels.
  • “… not to ship a compiler with defects that hang a valid program.” Let’s move on to the next paragraph to get into this one.

Andrew’s next paragraph reveals some disturbing analysis:

The problem, from what is known so far, derives from a command-line optimization switch on the Java compiler. This switch incorrectly optimized loops, resulting in the various reported errors. In Java 7, this switch is on by default, while it was off by default in previous releases. Regardless of the state of the switch, the resulting optimizations were not tested sufficiently.

This is a curious problem, because compilers are one of the most demonstrably easy products to test. Text file in, easily parsed binary file out. Or earlier in the compilation process: text file in, AST out. The easy generation of input and the simple validation of output make it possible to create literally tens of thousands of regression tests that can explore every detail of the generated code in an automated fashion. These tests are known to be especially important in the case of optimizations because defects in optimized code are far more difficult for developers to locate and identify. The implicit contract by the compiler is that going from debug code during development to optimized code for release does not change functionality. Consequently, optimizations must be tested extra carefully.

Actually, no: the problem, according once again to the Solr mailing list entry, is with the HotSpot JIT compiler, not with the javac compiler itself. Andrew demonstrates a shocking lack of comprehension with this explanation: JIT compilation is nothing like traditional compilation (unless you hyperfocus on the optimization phases of the traditional compiler toolchain), and often has nothing to do with ASTs and so forth. In short, Andrew saw “compiler” and leapt to conclusions. It’s a sin of which I’m guilty as well, but damn, somebody should have caught this somewhere along the way, including Andrew himself. How about contacting Oracle and asking them to explain the problem?

Nah, it’s much better (and gets DDJ a lot more hits) if we leave it the way it’s written. Sensationalism sells. Hence my title.
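To make the javac-versus-HotSpot distinction concrete: javac emits the same bytecode no matter what HotSpot later does with it, so a JIT optimization bug only shows up once a method has run often enough to get compiled. Here is a minimal sketch of a differential check along those lines. The class and method names are hypothetical, and the loop is purely illustrative; it is NOT the code that actually broke Solr. It computes a reference result on the first (most likely interpreted) call and compares it against many later calls that HotSpot is likely, but not guaranteed, to have compiled.

```java
// Hypothetical differential check: compare early (likely interpreted) results
// with later (likely JIT-compiled) results of the same method.
public class JitWarmupCheck {

    // Illustrative loop-heavy method; not the code from the Java 7 bug reports.
    static long sumOfSquares(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += (long) data[i] * data[i];
        }
        return total;
    }

    // Returns how many of the repeated calls disagreed with the first call.
    // On a correct JVM this should be 0.
    static int countMismatches(int iterations) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i - 500;
        }
        // The first call most likely runs in the interpreter.
        long reference = sumOfSquares(data);
        int mismatches = 0;
        for (int n = 0; n < iterations; n++) {
            // Once HotSpot decides the method is hot, these calls are likely
            // to run compiled code; when that happens is up to the JVM.
            if (sumOfSquares(data) != reference) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int mismatches = countMismatches(50_000);
        System.out.println(mismatches == 0
                ? "interpreted and compiled results agree"
                : mismatches + " mismatching results");
    }
}
```

The point is that the interesting comparison isn’t “source in, bytecode out” at all; it’s the same bytecode behaving the same way before and after the JITter gets its hands on it.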

And, it turns out, if they’re optimizations in the JITter, they can be disabled:

At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruptions.

Please note: Also Java 6 users are affected, if they use one of those JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts

Oh, did we mention? It turns out these optimizations have been there in Java 6 as well, so apparently not only is Oracle an idiot for not finding these bugs before now, but so is the entire Java ecosystem. (It seems these bugs surface only now because the optimizations are on by default in Java 7, where they used to be off.)
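If you do apply a workaround like -XX:-UseLoopPredicate, it’s worth checking that it actually took effect in the running JVM. Here is a minimal sketch using the HotSpotDiagnosticMXBean from com.sun.management; the VmFlagCheck class and flagValue helper are hypothetical names of mine. It assumes a HotSpot-based JVM and simply reports “not available” if the JVM doesn’t expose that bean or doesn’t know the flag, since flags like these may differ or no longer exist on other JVMs and JDK versions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: report the current value of a HotSpot -XX flag.
public class VmFlagCheck {

    // Returns the flag's value as a string (e.g. "true"/"false"), or null if
    // this JVM has no HotSpot diagnostic bean or does not know the flag.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return null; // management interface not implemented by this JVM
        }
        try {
            VMOption option = bean.getVMOption(name);
            return option.getValue();
        } catch (IllegalArgumentException e) {
            return null; // no such flag on this JVM version
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseLoopPredicate", "OptimizeStringConcat", "AggressiveOpts"};
        for (String flag : flags) {
            String value = flagValue(flag);
            System.out.println(flag + " = " + (value == null ? "(not available)" : value));
        }
    }
}
```

Run it as, say, `java -XX:-UseLoopPredicate VmFlagCheck`; on a HotSpot JVM that still has that flag, I’d expect UseLoopPredicate to show up as false.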

Andrew continues:

But even if Oracle's in-house testing was not complete, I have to wonder why they were not testing the code on some of the large open-source codebases currently available. One program that reported the fatal bug was Apache Solr, which most developers would agree is a high profile, open source project. Projects such as Solr provide almost ideal test beds: a large code base that is widely used. Certainly, Oracle might not cotton to writing UATs and other tests to validate what the compiler did with the Solr code. But, in fact, it didn’t have to write a test at all. It simply needed to run the package and the SIGSEGV segmentation fault would occur.

Oh, right. With the acquisition of Sun, Oracle also inherited a responsibility to test their software against every open-source software package known to man. Those people working on those projects have no responsibility to test it themselves, it’s all Oracle’s fault if it all doesn’t work right out of the box. Particularly with fast-moving source bases like those seen in open-source projects. Hmm.

I have to hope that this event will be a sharp lesson to Oracle to begin using the large codebases at its disposal as a fruitful proving ground for its tools. While the sloppiness I've discussed is disturbing, it's made worse by the fact that the same defects can be found in Java 6. The reason they suddenly show up now is that the optimization switch is off by default on Java 6, while on in Java 7. This suggests that Sun's testing was no better than Oracle's. (And given that much of the JDK team at Oracle is the same team that was at Sun, this is no surprise.) The crucial difference is that Oracle knew about the bugs prior to release and went ahead with the release anyway, while there is no evidence Sun was aware of the problems.

I have to hope that this event won’t be a sharp lesson to Oracle that the community is basically made up of a bunch of whiny bitches who complain when a workaroundable bug shows up in their products. Frankly, I would.

Did we mention that all of this was done on an open-source project? At any point anyone can grab the source, build it, and test it for themselves. So, Andrew, are you volunteering to run every build against every open-source project out there? After all, if this is a “community”, then you should be willing to donate all of your time for the community’s benefit, right? Where are the hordes of developers willing to volunteer and donate their time to working on the JDK itself? You’re all quite ready to throw rocks at Oracle (and before that, Sun), but how many of you are willing to put down the rock, pick up a hammer, and start working to build it better?

Yeah, I kind of thought so.

Oracle's decision was political, not technical. And here Oracle needs to really reassess its commitment to its users. Is Java a sufficiently important enterprise technology that shipping showstopper bugs will no longer be permitted? The long-term future of Java, the language, hangs in the balance.

Unless you were in the room when they made the decision, Andrew, you’re basically blowing hot air out your ass, and it smells about as good as when anyone else does. This is a blatantly stupid thing to say, and quite frankly, if Oracle refuses to talk to you ever again, I’d say they were back to making good decisions. You can’t responsibly declare what the rationale for a decision was unless you were in the room when it was made, and sometimes not even then.

Worse than that, the Solr mailing list entry even points out that Oracle acknowledged the bugs and discussed with the community (the Solr maintainers, in this case, it seems) when and how the fix could come out:

In response to our questions, they proposed to include the fixes into service release u2 (eventually into service release u1, see [6]).

Wow. Oracle actually responded to the bug and discussed when the fix would come out. Clearly they are unengaged with the community and don’t “get” Java.

Maybe I should rename this post to “Sloppy Work at Dr Dobb’s Journal”.

Nah. Sensationalism sells better. Even when it turns out to be completely unfounded.




Thursday, August 4, 2011 12:45:02 PM (Pacific Standard Time, UTC-08:00)