Tuesday, July 29, 2008
More Thoughts on Architects and Architecture

Speaking of things crossing my Inbox, Shane Paterson sent me this email:

Hi Ted,

How’s things in the USA?

I just wrote the following little blog entry I wanted to share with you, which I thought you may find interesting.

I used to work with a Naval Architect a few years back. On day we were discussing where the name "Naval Architect" came from. He explained that "Naval Architecture" is really "Naval Engineering" or "Ship Engineering". The word Engineering came into use AFTER the industrial revolution, however ship design existed hundreds of years before. At that time, people needed a name for ship designers, so they reasoned that Architects designed buildings, therefore ship designers must be a kind of Architect too - hence the name "Naval Architect".

Clearly IT didn't exist before the industrial revolution, and IT Architects don't design buildings, so the question begs to be asked "How did we end up with the word Architecture being used for a role in IT". It seems to me to be a rather vague and grandiose name for a role which is probably better described by the use of the word "Engineer".

Perhaps instead of the names "Solution Architect" and "Enterprise Architect" we should actually be using the names "IT Solution Engineer", "IT Enterprise Engineer"?

I've heard this idea put forward before, and I have to say, I'm not overly fond of it, because I believe that what we do as architects is fundamentally different from what a civil engineer does when he engages in the architect's role.

When a civil architect--be it Frank Lloyd Wright or the folks who designed the I-35 bridge in Minnesota--sits down to draw up the plans for a given project, they do so under a fairly strict set of laws and codes governing the process. Not only must the civic restrictions about safety and appearance be honored and respected, but also the basic physical laws of the universe--weight loads, stress, wind shear and harmonics (which the engineers of the first infamous Tacoma Narrows bridge discovered, to everlasting infamy). Ignoring these has disastrous consequences, and a discipline of mathematical calculation joins in with legal regulation to ensure that those laws are obeyed. Only then can the architect engage in the artistry that Lloyd Wright made so much a part of his craft.

Software architecture, though, is a different matter. Not only do we mostly enjoy complete freedom from legal regulation (Sarbannes-Oxley compliance being perhaps the most notable exception, and even then it routinely fails to apply at the small- to medium-sized project levels), we can also ignore most of the laws of physics (the speed of digital signal across a cable or through the air being perhaps our most notable barrier at the moment). "Access data in Tokyo from a web server in Beijing and send the rendered results to a browser in San Francisco? Sure, yeah, no problem, so long as there's a TCP/IP connection, I don't see why not...." There's just so much less by way of physical restrictions in software than there is in civil (or any other kind) of engineering, it seems.

And that sort of hits the central point squarely on the head--there's a lot we don't know about building software yet. We keep concocting these tortured metaphors and imperfect analogies to other practices, industries and disciplines in an attempt to figure out how best to build software, and they keep leading us astray in various ways. When's the last time you heard an accountant say, "Well, what I do is kinda like what the clerk in a retail store does--handle money--so therefore I should take ideas on how to do my job better from retail store clerks"? Sure, maybe the basic premise is true at some levels, but clearly the difference is in the details. And analogies and metaphors have this dangerous habit--they cause us to lose sight of those limitations, in the pursuit of keeping the analogy pure. Remember when everybody was getting purist about objects, such that an automobile repair shop's accounting system had to model "Car" as an object having an "Engine" object and four "Tire" objects and so on, not because these were things that needed to be repaired and tracked somehow, but because cars in real life have an engine and four tires and other things? (Stroustrup even touches on this at one point, talking about an avionics system which ran into design difficulties trying to decide if "Cloud" objects were owned by the "Sky" object, or something along those lines.) All analogies break down somewhere.

Now, to go back to architects and architecture. At the risk of offering up yet another of those tortured metaphors, let me proffer my own architect analogy: an architect is not like a construction architect, but more like the conductor of a band or symphony. Yes, the band could play without him, but at the end of the day, the band plays better with one guy coordinating the whole thing. The larger the band, the more necessary a conductor becomes. Sometimes the conductor is the same thing as the composer (and perhaps that's the most accurate analogous way to view this), in which case it's his "vision" of how the music in his head should come out in real life, and his job is to lead the performers into contributing towards that vision. Each performer has their own skills, freedom to interpret, and so on, but within the larger vision of the work.

Is it a perfect analogy? Heavens, no. It falls apart, just as every other analogy does, if you stress it too hard. But it captures the essence of art and rigor that I think seeing it as "architecture" along the lines of civil engineering just can't. At least, not easily.

Tuesday, July 29, 2008 10:40:31 AM (Pacific Daylight Time, UTC-07:00)
Comments [2]  | 
 Sunday, July 27, 2008
Quotable Quotes, Notable Notes

Overheard, at the NFJS Phoenix 2008 show:

"We (ThoughtWorkers) are firm believers in aggressively promiscuous pairing." --Neal Ford

Sunday, July 27, 2008 1:10:59 PM (Pacific Daylight Time, UTC-07:00)
Comments [1]  | 
 Friday, July 25, 2008
From the "Gosh, You Wanted Me to Quote You?" Department...

This comment deserves response:

First of all, if you're quoting my post, blocking out my name, and attacking me behind my back by calling me "our intrepid troll", you could have shown the decency of linking back to my original post. Here it is, for those interested in the real discussion:

Well, frankly, I didn't get your post from your blog, I got it from an email 'zine (as indicated by the comment "This crossed my Inbox..."), and I didn't really think that anybody would have any difficulty tracking down where it came from, at least in terms of the email blast that put it into my Inbox. Coupled with the fact that, quite honestly, I don't generally make a practice of using peoples' names without their permission (and my email to the author asking if I could quote the post with his name attached generated no response), so I blocked out the name. Having said that, I'm pleased to offer full credit as to its source.

Now, let's review some of your remarks:

"COBOL is (at least) twenty years old, so therefore any use of COBOL must clearly be as idiotic."

I never talked about COBOL, or any other programming language. I was talking about old practices that are nowadays considered harmful and seriously damaging. (Like practising waterfall project management, instead of agile project management.) I don't see how programming in COBOL could seriously damage a business. Why do you compare COBOL with lobotomies? I don't understand. I couldn't care less about programming languages. I care about management practices.

Frankly, the distinction isn't very clear in your post, and even more frankly, to draw a distinction here is a bit specious. "I didn't mean we should throw away the good stuff that's twenty years old, only the bad stuff!" doesn't seem much like a defense to me. There are cases where waterfall style development is exactly the right thing to do a more agile approach is exactly the wrong thing to do--the difference, as I'm fond of saying, lies entirely in the context of the problem. Analogously, there are cases where keeping an existing COBOL system up and running is the wrong thing to do, and replacing it with a new system is the right thing to do. It all depends on context, and for that reason, any dogmatic suggestion otherwise is flawed.

"How can a developer honestly claim to know "what it can be good for", without some kind of experience to back it?"

I'm talking about gaining knowledge from the experience of others. If I hear 10 experts advising the same best practice, then I still don't have any experience in that best practice. I only have knowledge about it. That's how you can apply your knowledge without your experience.

Leaving aside the notion that there is no such thing as best practices (another favorite rant of mine), what you're suggesting is that you, the individual, don't necessarily have to have experience in the topic but others have to, before we can put faith into it. That's a very different scenario than saying "We don't need no stinkin' experience", and is still vastly more dangerous than saying, "I have used this, it works." I (and lots of IT shops, I've found) will vastly prefer the latter to the former.

"Knowledge, apparently, isn't enough--experience still matters"

Yes, I never said experience doesn't matter. I only said it has no value when you don't have gained the appropriate knowledge (from other sources) on when to apply it, and when not.

You said it when you offered up the title, "Knowledge, not Experience".

"buried under all the ludicrous hyperbole, he has a point"

Thanks for agreeing with me.

You're welcome! :-) Seriously, I think I understand better what you were trying to say, and it's not the horrendously dangerous comments that I thought you were saying, so I will apologize here and now for believing you to be a wet-behind-the-ears/lets-let-technology-solve-all-our-problems/dangerous-to-the-extreme developer that I've run across far too often, particularly in startups. So, please, will you accept my apology?

"developers, like medical professionals, must ensure they are on top of their game and spend some time investing in themselves and their knowledge portfolio"


Exactly. :-)

"this doesn't mean that everything you learn is immediately applicable, or even appropriate, to the situation at hand"

I never said that. You're putting words into my mouth.

My only claim is that you need to KNOW both new and old practices and understand which ones are good and which ones can be seriously damaging. I simply don't trust people who are bragging about their experience. What if a manager tells me he's got 15 years of experience managing developers? If he's a micro-manager I still don't want him. Because micro-management is considered harmful these days. A manager should KNOW that.

Again, this was precisely the read I picked up out of the post, and my apologies for the misinterpretation. But I stand by the idea that this is one of those posts that could be read in a highly dangerous fashion, and used to promote evil, in the form of "Well, he runs a company, so therefore he must know what he's doing, and therefore having any kind of experience isn't really necessary to use something new, so... see, Mr. CEO boss-of-mine? We're safe! Now get out of my way and let me use Software Factories to build our next-generation mission-critical core-of-the-company software system, even though nobody's ever done it before."

To speak to your example for a second, for example: Frankly, there are situations where a micro-manager is a good thing. Young, inexperienced developers, for example, need more hand-holding and mentoring than older, more senior, more experienced developers do (speaking stereotypically, of course). And, quite honestly, the guy with 15 years managing developers is far more likely to know how to manage developers than the guy who's never managed developers before at all. The former is the safer bet; not a guarantee, certainly, but often the safer bet, and that's sometimes the best we can do in this industry.

"And we definitely shouldn't look at anything older than five years ago and equate it to lobotomies."

I never said that either. Why do you claim that I said this? I don't have a problem with old techniques. The daily standup meeting is a 80-year old best practice. It was already used by pilots in the second world war. How could I be against that? It's fine as it is.

Um... because you used the term "lobotomies" first? And because your title pretty clearly implies the statement, perhaps? (And let's lose the term "best practice" entirely, please? There is no such thing--not even the daily standup.)

It's ok you didn't like my article. Sure it's meant to be provocative, and food for thought. The article got twice as many positive votes than negative votes from DZone readers. So I guess I'm adding value. But by taking the discussion away from its original context (both physically and conceptually), and calling me names, you're not adding any value for anyone.

I took it in exactly the context it was given--a DZone email blast. I can't help it if it was taken out of context, because that's how it was handed to me. What's worse, I can see a development team citing this as an "expert opinion" to their management as a justification to try untested approaches or technologies, or as inspiration to a young developer, who reads "knowledge, not experience", and thinks, "Wow, if I know all the cutting-edge latest stuff, I don't need to have those 10 years of experience to get that job as a senior application architect." If your article was aimed more clearly at the development process side of things, then I would wish it had appeared more clearly in the arena of development processes, and made it more clear that your aim was to suggest that managers (who aren't real big on reading blogs anyway, I've sadly found) should be a bit more pragmatic and open-minded about who they hire.

Look, I understand the desire for a provocative title--for me, the author of "The Vietnam of Computer Science", to cast stones at another author for choosing an eye-catching title is so far beyond hypocrisy as to move into sheer wild-eyed audacity. But I have seen, first-hand, how that article has been used to justify the most incredibly asinine technology decisions, and it moves me now to say "Be careful what you wish for" when choosing titles that meant to be provocative and food for thought. Sure, your piece got more positive votes than negative ones. So too, in their day, did articles on client-server, on CORBA, on Six-Sigma, on the necessity for big up-front design....


Let me put it to you this way. Assume your child or your wife is sick, and as you reach the hospital, the admittance nurse offers you a choice of the two doctors on call. Who do you want more: the doctor who just graduated fresh from medical school and knows all the latest in holistic and unconventional medicine, or the doctor with 30 years' experience and a solid track record of healthy patients?

.NET | C++ | Conferences | Development Processes | F# | Java/J2EE | Languages | LLVM | Mac OS | Parrot | Ruby | Visual Basic | Windows | XML Services

Friday, July 25, 2008 12:03:40 AM (Pacific Daylight Time, UTC-07:00)
Comments [9]  | 
 Thursday, July 24, 2008
From the "You Must Be Trolling for Hits" Department...

Recently this little gem crossed my Inbox....

Professionalism = Knowledge First, Experience Last
By J----- A-----

Do you trust a doctor with diagnosing your mental problems if the doctor tells you he's got 20 years of experience? Do you still trust that doctor when he picks up his tools, and asks you to prepare for a lobotomy?

Would you still be impressed if the doctor had 20 years of experience in carrying out lobotomies?

I am always skeptic when people tell me they have X years of experience in a certain field or discipline, like "5 years of experience as a .NET developer", "8 years of experience as a project manager" or "12 years of experience as a development manager". It is as if people's professional levels need to be measured in years of practice.

This, of course, is nonsense.

Professionalism is measured by what you are going to do now...

Are you going to use some discredited technique from half a century ago?
•    Are you, as a .NET developer, going to use Response.Write, because you've got 5 years of experience doing exactly that?
•    Are you, as a project manager, going to create Gantt charts, because that's what you've been doing for 8 years?
•    Are you, as a development manager, going to micro-manage your team members, as you did in the 12 years before now?

If so, allow me to illustrate the value of your experience...

(Photo of "Zero" signs)

Here's an example of what it means to be a professional:

There's a concept called Kanban making headlines these days in some parts of the agile community. I honestly and proudly admit that I have no experience at all in applying Kanban. But that's just a minor inconvenience. Because I have attained the knowledge of what it is and what it can be good for. And now there are some planning issues in our organization for which this Kanban-stuff might be the perfect solution. I'm sure we're going to give it a shot, in a controlled setting, with time allocated for a pilot and proper evaluations afterwards. That's the way a professional tries to solve a problem.

Professionals don't match problems with their experiences. They match them with their knowledge.

Sure, experience is useful. But only when you already have the knowledge in place. Experience has no value when there's no knowledge to verify that you are applying the right experience.

Knowledge Comes First, Experience Comes Last

This is my message to anyone who wants to be a professional software developer, a professional project manager, or a professional development manager.

You must gain and apply knowledge first, and experience will help you after that. Professionals need to know about the latest developments and techniques.

They certainly don't bother measuring years of experience.

Are you still practicing lobotomies?



Let's start with the logical fallacy in the first section. Do I trust a doctor with diagnosing my mental problems if he tells me he's got 20 years of experience? Generally, yes, unless I have reasons to doubt this. If the guy picks up a skull-drill and starts looking for a place to start boring into my skull, sure, I'll start to question his judgement.... But what does this have to do with anything? I wouldn't trust the guy if he picked up a chainsaw and started firing it up, either.

Look, I get the analogy: "Doctor has 20 years of experience using outdated skills", har har. Very funny, very memorable, and totally inappropriate metaphor for the situation. To stand here and suggest that developers who aren't using the latest-and-greatest, so-bleeding-edge-even-saying-the-name-cuts-your-skin tools or languages or technologies are somehow practicing lobotomies (which, by the way, are still a recommended practice in certain mental disorder cases, I understand) in order to solve any and all mental-health issues, is a gross mischaracterization--and the worst form of negligence--I've ever heard suggested.

And it comes as no surprise that it's coming from the CIO of a consulting company. (Note to self: here's one more company I don't want anywhere near my clients' IT infrastructure.)

Let's take this analogy to its logical next step, shall we?

COBOL is (at least) twenty years old, so therefore any use of COBOL must clearly be as idiotic as drilling holes in your skull to let the demons out. So any company currently using COBOL has no real option other than to immediately upgrade all of their currently-running COBOL infrastructure (despite the fact that it's tested, works, and cashes most of the US banking industry's checks on a daily basis) with something vastly superior and totally untested (since we don't need experience, just knowlege), like... oh, I dunno.... how about Ruby? Oh, no, wait, that's at least 10 years old. Ditto for Java. And let's not even think about C, Perl, Python....

I know; let's rewrite the entire financial industry's core backbone in Groovy, since it's only, what, 6 years old at this point? I mean, sure, we'll have to do all this over again in just four years, since that's when Groovy will turn 10 and thus obviously begin it's long slide into mediocrity, alongside the "four humors" of medicine and Aristotle's (completely inaccurate) anatomical depictions, but hey, that's progress, right? Forget experience, it has no value compared to the "knowledge" that comes from reading the documentation on a brand-new language, tool, library, or platform....

What I find most appalling is this part right here:

I honestly and proudly admit that I have no experience at all in applying Kanban. But that's just a minor inconvenience. Because I have attained the knowledge of what it is and what it can be good for.

How can a developer honestly claim to know "what it can be good for", without some kind of experience to back it? (Hell, I'll even accept that you have familiarity and experience with something vaguely relating to the thing at hand, if you've got it--after all, experience in Java makes you a pretty damn good C# developer, in my mind, and vice versa.)

And, to make things even more interesting, our intrepid troll, having established the attention-gathering headline, then proceeds to step away from the chasm, by backing away from this "knowledge-not-experience" position in the same paragraph, just one sentence later:

I'm sure we're going to give it a shot, in a controlled setting, with time allocated for a pilot and proper evaluations afterwards.

Ah... In other words, he and his company are going to experiment with this new technique, "in a controlled setting, with time allocated for a pilot and proper evaluations afterwards", in order to gain experience with the technique and see how it works and how it doesn't.

In other words....

.... experience matters.

Knowledge, apparently, isn't enough--experience still matters, and it matters a lot earlier than his "knowledge first, experience last" mantra seems to imply. Otherwise, once you "know" something, why not apply it immediately to your mission-critical core?

At the end of the day, buried under all the ludicrous hyperbole, he has a point: developers, like medical professionals, must ensure they are on top of their game and spend some time investing in themselves and their knowledge portfolio. Jay Zimmerman takes great pains to point this out at every No Fluff Just Stuff show, and he's right: those who spend the time to invest in their own knowledge portfolio, find themselves the last to be fired and the first to be hired. But this doesn't mean that everything you learn is immediately applicable, or even appropriate, to the situation at hand. Just because you learned Groovy last weekend in Austin doesn't mean you have the right--or the responsibility--to immediately start slipping Groovy in to the production servers. Groovy has its share of good things, yes, but it's got its share of negative consequences, too, and you'd better damn well know what they are before you start betting the company's future on it. (No, I will not tell you what those negative consequences are--that's your job, because what if it turns out I'm wrong, or they don't apply to your particular situation?) Every technology, language, library or tool has a positive/negative profile to it, and if you can't point out the pros as well as the cons, then you don't understand the technology and you have no business using it on anything except maybe a prototype that never leaves your local hard disk. Too many projects were built with "flavor-of-the-year" tools and technologies, and a few years later, long after the original "bleeding-edge" developer has gone on to do a new project with a new "bleeding-edge" technology, the IT guys left to admin and upkeep the project are struggling to find ways to keep this thing afloat.

If you're languishing at a company that seems to resist anything and everything new, try this exercise on: go down to the IT guys, and ask them why they resist. Ask them to show you a data flow diagram of how information feeds from one system to another (assuming they even have one). Ask them how many different operating systems they have, how many different languages were used to create the various software programs currently running, what tools they have to know when one of those programs fails, and how many different data formats are currently in use. Then go find the guys currently maintaining and updating and bug-fixing those current programs, and ask to see the code. Figure out how long it would take you to rewrite the whole thing, and keep the company in business while you're at it.

There is a reason "legacy code" exists, and while we shouldn't be afraid to replace it, we shouldn't be cavalier about tossing it out, either.

And we definitely shouldn't look at anything older than five years ago and equate it to lobotomies. COBOL had some good ideas that still echo through the industry today, and Groovy and Scala and Ruby and F# undoubtedly have some buried mines that we will, with benefit of ten years' hindsight, look back at in 2018 and say, "Wow, how dumb were we to think that this was the last language we'd ever have to use!".

That's experience talking.

And the funny thing is, it seems to have served us pretty well. When we don't listen to the guys claiming to know how to use something effectively that they've never actually used before, of course.

Caveat emptor.

.NET | C++ | Development Processes | F# | Java/J2EE | Languages | LLVM | Mac OS | Parrot | Ruby | Visual Basic | Windows | XML Services

Thursday, July 24, 2008 12:53:02 AM (Pacific Daylight Time, UTC-07:00)
Comments [9]  | 
 Wednesday, July 16, 2008
Blog change? Ads? What gives?

If you've peeked at my blog site in the last twenty minutes or so, you've probably noticed some churn in the template in the upper-left corner; by now, it's been finalized, and it reads "JOB REFERRALS".

WTHeck? Has Ted finally sold out? Sort of, not really. At least, I don't think so.

Here's the deal: the company behind those ads, Entice Labs, contacted me to see if I was interested in hosting some job ads on my blog, given that I seem to generate a moderate amount of traffic. I figured it was worthwhile to at least talk to them, and the more I did, the more I liked what I heard--the ads are focused specifically at developers of particular types (I chose a criteria string of "Software Developers", subcategorized by "Java, .NET, .NET (Visual Basic), .NET (C#), C++, Flex, Ruby on Rails, C, SQL, JavaScript, HTML" though I'm not sure whether "HTML" will bring in too many web-designer jobs), and visitors to my blog don't have to click through the ads to get to the content, which was important to me. And, besides, given the current economic climate, if I can help somebody find a new job, I'd like to.

Now for the full disclaimer: I will be getting money back from these job ads, though how much, to be honest with you, I'm not sure. I'm really not doing this for the money, so I make this statement now: I will take 50% of whatever I make through this program and donate it to a charitable organization. The other 50% I will use to offset travel and expenses to user groups and/or CodeCamps and/or for-free conferences put on throughout the country. (Email me if you know of one that you're organizing or attending and would like to see me speak at, and I'll tell you if there's any room in the budget left for it. :-) )

Anyway, I figured if the ads got too obnoxious, I could always remove them; it's an experiment of sorts. Tell me what you think.

.NET | C++ | Conferences | F# | Flash | Java/J2EE | Languages | Mac OS | Parrot | Ruby | Visual Basic | VMWare | Windows | XML Services

Wednesday, July 16, 2008 7:29:46 PM (Pacific Daylight Time, UTC-07:00)
Comments [4]  | 
Object.hashCode implementation

After the previous post, I just had to look. The implementation of Object.equals is, as was previously noted, just "return this == obj", but the implementation of Object.hashCode is far more complicated.

Taken straight from the latest hg-pulled OpenJDK sources, Object.hashCode is a native method registered from Object.c that calls into a Hotspot-exported function, JVM_IHashCode(), from hotspot\src\share\vm\prims\jvm.cpp:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
// as implemented in the classic virtual machine; return 0 if object is NULL
return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;

which in turn calls ObjectSynchronizer::FastHashCode, defined in hotspot\src\share\vm\runtime\synchronizer.cpp as:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
if (UseBiasedLocking) {
// NOTE: many places throughout the JVM do not expect a safepoint
// to be taken here, in particular most operations on perm gen
// objects. However, we only ever bias Java instances and all of
// the call sites of identity_hash that might revoke biases have
// been checked to make sure they can handle a safepoint. The
// added check of the bias pattern is to avoid useless calls to
// thread-local storage.
if (obj->mark()->has_bias_pattern()) {
// Box and unbox the raw reference just in case we cause a STW safepoint.
Handle hobj (Self, obj) ;
// Relaxing assertion for bug 6320749.
assert (Universe::verify_in_progress() ||
"biases should not be seen by VM thread here");
BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
obj = hobj() ;
assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");

// hashCode() is a heap mutator ...
// Relaxing assertion for bug 6320749.
assert (Universe::verify_in_progress() ||
!SafepointSynchronize::is_at_safepoint(), "invariant") ;
assert (Universe::verify_in_progress() ||
Self->is_Java_thread() , "invariant") ;
assert (Universe::verify_in_progress() ||
((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

ObjectMonitor* monitor = NULL;
markOop temp, test;
intptr_t hash;
markOop mark = ReadStableMark (obj);

// object should remain ineligible for biased locking
assert (!mark->has_bias_pattern(), "invariant") ;

if (mark->is_neutral()) {
hash = mark->hash(); // this is a normal header
if (hash) { // if it has hash, just return it
return hash;
hash = get_next_hash(Self, obj); // allocate a new hash code
temp = mark->copy_set_hash(hash); // merge the hash code into header
// use (machine word version) atomic operation to install the hash
test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
if (test == mark) {
return hash;
// If atomic operation failed, we must inflate the header
// into heavy weight monitor. We could add more code here
// for fast path, but it does not worth the complexity.
} else if (mark->has_monitor()) {
monitor = mark->monitor();
temp = monitor->header();
assert (temp->is_neutral(), "invariant") ;
hash = temp->hash();
if (hash) {
return hash;
// Skip to the following code to reduce code size
} else if (Self->is_lock_owned((address)mark->locker())) {
temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
assert (temp->is_neutral(), "invariant") ;
hash = temp->hash(); // by current thread, check if the displaced
if (hash) { // header contains hash code
return hash;
// The displaced header is strictly immutable.
// It can NOT be changed in ANY cases. So we have
// to inflate the header into heavyweight monitor
// even the current thread owns the lock. The reason
// is the BasicLock (stack slot) will be asynchronously
// read by other threads during the inflate() function.
// Any change to stack may not propagate to other threads
// correctly.

// Inflate the monitor to set hash code
monitor = ObjectSynchronizer::inflate(Self, obj);
// Load displaced header and check it has hash code
mark = monitor->header();
assert (mark->is_neutral(), "invariant") ;
hash = mark->hash();
if (hash == 0) {
hash = get_next_hash(Self, obj);
temp = mark->copy_set_hash(hash); // merge hash code into header
assert (temp->is_neutral(), "invariant") ;
test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
if (test != mark) {
// The only update to the header in the monitor (outside GC)
// is install the hash code. If someone add new usage of
// displaced header, please update this code
hash = test->hash();
assert (test->is_neutral(), "invariant") ;
assert (hash != 0, "Trivial unexpected object/monitor header usage.");
// We finally get the hash
return hash;

Hope this answers all the debates. :-)

Editor's note: Yes, I know it's a long quotation of code completely out of context; my goal here is simply to suggest that the hashCode() implementation is not just a integerification of the object's address in memory, as was suggested in other discussions. For whatever it's worth, the get_next_hash() implementation that's referenced in the FastHashCode() method looks like:

// hashCode() generation :
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
// 2654435761 = 2^32 * Phi (golden ratio)
// HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
// in undesirable regularity in the hashCode values of adjacent objects
// (objects allocated back-to-back, in particular). This could potentially
// result in hashtable collisions and reduced hashtable efficiency.
// There are simple ways to "diffuse" the middle address bits over the
// generated hashCode values:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = intptr_t(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
if (hashCode == 4) {
value = intptr_t(obj) ;
} else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11) ;
Self->_hashStateX = Self->_hashStateY ;
Self->_hashStateY = Self->_hashStateZ ;
Self->_hashStateZ = Self->_hashStateW ;
unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
Self->_hashStateW = v ;
value = v ;

value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD ;
assert (value != markOopDesc::no_hash, "invariant") ;
return value;

Thus (hopefully) putting the idea that it might be allocating a hash based on the object's identity completely to rest.

For the record, this is all from the OpenJDK source base--naturally, it's possible that earlier VM implementations did something entirely different.


Wednesday, July 16, 2008 1:18:19 AM (Pacific Daylight Time, UTC-07:00)
Comments [3]  | 
 Tuesday, July 15, 2008
Of Zealotry, Idiocy, and Etiquette...

I'm not sure what it is about our industry that promotes the flame war, but for some reason exchanges like this one, unheard of in any other industry I've ever touched (even tangentially), are far too common, too easy to get into, and entirely too counterproductive.

I'm not going to weigh in on one side or the other here; frankly, I have a hard time following the debate and figuring out who's exactly arguing for what. I can see, however, that the entire debate follows some traditional patterns of the flame war:

  1. Citing yourself as the final authority. At no point during the debate does anybody reach for their copy of Effective Java, a widely-accepted source of Java guidance, for a potential resolution to the discussion. Instead, the various players simply say, "Fact A is true" or "Fact A is false", with zero supporting information, citations, or demonstrations either way. (A few people cite the Javadoc, but there is enough ambiguity there to merit further citation.)
  2. Refusal to accept the possibility of an alternative viewpoint. At no point, near as I can tell, did any of the participants bother to say, "You know, you could be right, but I remain unconvinced. Can you give me more information to support your point of view?" The entire time, everybody is arguing from "fact", and nobody even considers the possibility that different JVMs can have different implementations, despite the fact that the Javadoc being quoted says as much.
  3. Degeneration into personal attacks. I don't care who started it, I don't care who called who the worse name. Fact is, reasonable people can reasonably disagree, and nobody in that transcript seemed overly reasonable to me.
  4. Nobody ever really gets around to answering the question because they're too busy arguing their position or point. Poor "doub", the initiator of the question, tries valiantly to circle the conversation back on topic, but the various players are too busy whipping out their instruments of manhood onto the table so everybody can see how much bigger it is than the other guys'. When "doub" points out that writing some sample code "gave me a very loose but still usefull information about my object, and took less time than the conversation about my question :-)", or in other words, "Hey, guys, I kinda already got my answer, can we move on now?", the conversation continues as if the comment never occurred--the question has turned into a "biggest-geek" argument by this point. "doub" even asks, at 10:12:12, "do i get bad karma points for being the initiator of a conflict?", and the image I get in my head is that of the poor kid, hiding in his bedroom while his parents yell and scream downstairs, feeling awful because the fight started over his backpack lying in the hallway where Mom told him to put it and Dad thought he left it instead of putting it away. ("doub", if you read this, no, you get no bad karma points, at least not in my universe.)

The interesting thing, though, is that this conversation has nothing to do with Scala. "dysinger" twitters:

Frankly, "dysinger", it's kinda hard to have much sympathy for somebody when they blame the language or tool for a conversation that's had around it; this would be like blaming Python, the language, for the community around it (which some people do, I understand). I can understand the frustration, on both sides, since everybody was essentially arguing past one another, but why is that Scala's fault, pray tell?

And frankly, I find the dig at the academics to be a tad disingenuous. Yes, academics have a reputation--duly earned in some cases--of being removed from reality and the slings and arrows of a life spent developing software for production environments, but name for me a language in the popular mainstream that doesn't owe a huge debt to the preliminary work laid down by academics before it. In every other industry, academics are revered and honored--it's only in this industry they are used as an example of degradation and insult. Way to bite the hand that makes your life easier, folks....

At the end of the day, these kind of debates do nothing but harm the innocent, "doub", in this case. "dysinger", "DrMacIver", "JamesIry", all of you, right or wrong, didn't exactly cover yourselves in glory, nor did you really convince anybody of anything. Instead, you shouted at each other really loudly, made lots of noise, got angry over nothing in particular, and really failed to achieve much of anything. Regardless of your intentions, now Scala, Java, the JVM and the entire ecosystem have seen their reputation tarnished just a touch more than it was when you started. Great job.

Here's a tip for all of you: Try listening.


Tuesday, July 15, 2008 11:18:43 PM (Pacific Daylight Time, UTC-07:00)
Comments [9]  | 
 Friday, July 11, 2008
So You Say You Want to Kill XML....

Google (or at least some part of it) has now weighed in on the whole XML discussion with the recent release of their "Protocol Buffers" implementation, and, quite naturally, the debates have begun, with all the carefully-weighed logic, respectful discourse, and reasoned analysis that we've come to expect and enjoy from this industry.

Yeah, right.

Anyway, without trying to take sides either way in this debate--yes, the punchline is that I believe in a world where both XML and Protocol Buffers are useful--I thought I'd weigh in on some of the aspects about PBs that are interesting/disturbing, but more importantly, try to frame some of the debate and discussions around these two topics in a vain attempt to wring some coherency and sanity out of what will likely turn into a large shouting match.

For starters, let's take a quick look at how PBs work.

Protocol Buffers 101

The idea behind PBs is pretty straightforward: given a PB definition file, a code-generator tool builds C++, Java or Python accessors/generators that know how to parse and produce files in the Protocol Buffer format. The generated classes follow a pretty standard format, using the traditional POJO/JavaBean style get/set for Java classes, and something similar for both the Python and C++ classes. (The Python implementation is a tad different from the C++ and Java versions, as it makes use of Python metaclasses to generate the class at runtime, rather than at generation-time.) So, for example, given the Google example of:

message Person {
required string name = 1;
required int32 id = 2;
optional string email = 3;

enum PhoneType {
HOME = 1;
WORK = 2;

message PhoneNumber {
required string number = 1;
optional PhoneType type = 2 [default = HOME];

repeated PhoneNumber phone = 4;

... using the corresponding generated C++ class would look something like

Person person;
person.set_name("John Doe");
fstream output("myfile", ios::out | ios::binary);

... and the Java implementation would look somewhat similar, except using a Builder to create the Person.

The Protocol Buffer interoperable definition language is relatively complete, with support for a full range of scalar types, enumerations, variable-length collections of these types, nested message types, and so on. Each field in the message is tagged with a unique field number, as you can see above, and the language provides the ability to "reserve" field IDs via the "extension" mechanism, presumably to allow for either side to have flexibility in extending the PB format.

There's certainly more depth to the PB format, including the "service" and "rpc" features of the language that will generate full RPC proxies, but this general overview serves to provide the necessary background to be able to engage in an analysis of PBs.

Protocol Buffers: An Analysis

When looking at Protocol Buffers, it's important to realize that this is an open-source implementation, and that Google has already issued a call to others to modify them and submit patches to the project. Anything that comes across as a criticism or inherent flaw in the implementation is thus correctable, though whether Google would apply those fixes to the version they use internally is an open question--there's a tension in any open-source project sponsored by a company, between "What we need the project for" and "What other people using the project need it for", and it's not always clear how that tension will play out in the long term.

So, without further ado ...

For starters, Protocol Buffers' claim to be language and/or platform-neutral is hardly justifiable, given that they have zero support for the .NET platform out of the box. Now, before the Googlists and Google-fanbois react angrily, let me be the first to admit, that yes, nothing stops anybody from producing said capability and contributing it back to the project. In fact, there's even some verbage to that effect on the Protocol Buffers' FAQ page. But without it coming out of the box, it's not fair to claim language- and platform-neutrality, unless, of course, they are willing to suggest that COM's Structured Storage was also language- and platform-neutral, and that it wasn't Microsoft's fault that nobody went off and built implementations for it under *nix or the Mac. Frankly, any binary format, regardless of how convoluted, could be claimed to be language- and platform-neutral under those conditions, which I think makes the claim spurious to make. The fact that Google doesn't care about the .NET platform doesn't mean that their implementation is incapable of running on it; the fact that Google wrote their code generator to support C++, Java and Python doesn't mean that Protocol Buffers are language- and platform-neutral, either. XML still holds the edge here, by a long shot--until we see implementations of Protocol Buffers for Perl/Parrot, C, D, .NET, Ruby, JavaScript, mainframes and others, PBs will have to take second-place status behind XML in terms of "reach" across the wide array of systems. Does that diminish their usefulness? Hardly. It just depends on how far a developer wants their data format to stretch.

Having gotten that little bit out of the way ...

Remember a time when binary formats were big? Back in the early- to mid-90's, binary formats were all the rage, and criticism of them abounded: they were hard to follow, hard to debug, bloated, inherently inflexible, tightly-coupled, and so on. Consider, for example, this article comparing XML against its predecessor information-exchange technologies:

A great example ... is a data feed of all orders taken electronically via a web site into an internal billing and shipping system. The main advantage of XML here is it's flexibility - developers create their own tags and dictionaries, as they deem necessary. Therefore no matter what type of data is being transferred, the right XML representation of it will accurately describe the data.

Logically, each order can be described as one customer purchasing one or more items using their credit card and potentially entering different billing and shipping addresses. The contents of the file are very easy to read, even for a person who is not familiar with XML. The information within each of the order tags is well structured and organized. This enables developers to use parsing components and easily access any data within the document. Each item in the order is logically a unique entity, and is also represented with a separate tag. All item properties are defined as "child" nodes of the item tag.

XML is the language of choice for two major reasons. First of all, an XML formatted document can be easily processed under any OS, and in any language, as long as XML parsing components are available. On the other hand, XML files are still raw data, which enables merchants to format the data any way they want to. All in all, document structure and wide acceptance of this format has made it possible to enable customers to build more efficient internal order Tracking systems based on XML-formatted order files. Other online merchant sites are making similar functionality available on their Web sites as well.

And yet, now, the criticism goes the other way, complaining that XML is bloated, hard to follow, and so on:

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

How do we reconcile these apparently contradictory positions?

First of all, I don't think anybody in the XML community has ever sought to argue that XML was designed for efficiency; in fact, quite the opposite, as the XML specification itself states in the abstract:

The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML. ... This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web.

The Introduction section is even more explicit:

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

And if that wasn't clear enough....

The design goals for XML are:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

In essence, then, the goal of XML was never to be small or fast, but still clearly simple. And, despite your personal opinion about the ecosystem that has grown up around XML (SOAP, WS-*, and so on), it's still fairly easy to defend the idea that XML itself is a simple technology, particularly if we make some basic assumptions around things that usually complicate text like character sets and encoding and such.

Note: I am deliberately ignoring the various attempts at a binary Infoset specification, which has been on the TODO list of many XML working groups in the past and has yet to really make any sort of impact in the industry. Theoretically, yes, XML Infoset-compliant documents could be rendered into a binary format that would have a much smaller, more efficient footprint than the textual representation. If and when we ever get there, it would be interesting to see what the results look like. I'm not holding my breath.

Why, then, did XML take on a role as data-transfer format if, on the surface of things, using text here was such a bad idea?


"With Web services your accounting departments Win 2k servers billing system can connect with your IT suppliers UNIX server." --

"Conventional application development often means developing for multiple devices, and that form factors of the client devices can be dramatically different. If you base an application on a web browser display size of 800x600, it would never work on a device with a resolution of 4 lines by 20 characters. Conversely, if you took a lowest common denominator approach and sized for the smaller device, the user interface would be lost on an 800x600 device.Using a non-XML approach, this leaves us writing multiple clients speaking to the server, or writing multiple clients speaking to multiple severs ... Limitations of this approach include:

  • Tightly coupled to browser
  • Multiple code-bases
  • Difficult to adapt to new devices
  • Major development efforts
  • Slow time to market

"In this case, only one application exists. It runs against the back-end database, and produces an XML stream. A "translator" takes this XML stream, and applies an XSLT transformation to it. Every device could either use a generic XSLT, or have a specialized XSLT that would produce the required device-specific output. The transformation occurs on the server, meaning that no special client capabilities are required.This "hub and spoke" architecture yields tremendous flexibility. When a new device appears, anew spoke can be added to accommodate it. The application itself does not need to be changed, only the translator needs to be informed about the existence of the new device, and which XSLT to use for it." --

Certainly the interoperability argument doesn't require a text-based format, it was just always cited that way. In fact, both the CORBA supporters and the developers over at ZeroC will both agree with Google in suggesting that a binary format can and will be an efficient and effective interoperability format. (I'll let the ZeroC folks talk to the pros and cons of their Ice format as compared to IIOP and SOAP.)

Some of the key inherent advantages in XML that are lost in this new binary format, however, center around the XML Infoset itself and the fact that it has a number of ancillary tools around it, which gets us into the next point; consider some of those inherent advantages in XML that are lost in this new binary, structured, tightly-coupled format:

  • XPath. The ability to selectively extract nodes from inside a document is incredibly powerful and highly underutilized by most developers.
  • XSLT. Although somewhat on the down-and-out in popularity with developers because of its complexity, XSLT stands as a shining example of how a general-purpose transformation tool can be built once the basic structure is well-understood and independent of the actual domain. (SQL is another.)
  • Structureless parsing. In order to use Protocol Buffers, each side must have a copy of the .proto file that generated the proxies. (While it's theoretically possible to build an API that picks through the structure element-by-element in the same way an XML API does, I haven't found such an API in the PB code yet, and doing so would mean spending some quality time with the Protocol Buffer binary encoding format. The same was always true of IIOP or the DCOM MEOW format, too.)

In essence, a Protocol Buffer format consumer is tightly-coupled against the .proto file that it was generated against, whereas in XML we can follow Noah Mendelsohn's advice of years ago ("Schema are relative") and parse XML in whatever manner suits the consumer, with or without schema, without or without schema-based-and-bound proxy code. The advantage to the XML approach, of course, is that it provides a degree of flexibility; the advantage of the Protocol Buffer approach is that the code to produce and consume the elements can be much simpler, and therefore, faster.

Note: it's only fair to point out at this point that the Protocol Buffer approach already contains a certain degree of flexibility that earlier binary formats lacked (if I remember correctly), via PB's use of "optional" vs "required" tags for the various fields. Whether this turns out to be sufficient over time is yet to be decided, though Google claims to be using Protocol Buffers throughout the company for its own internal purposes, which is certainly supporting evidence that cannot be discarded casually.

But there's a corollary effect here, as well: because XML documents are intended to be self-descriptive, the Protocol Buffer format can contain just the data, and leave the format and structure to be enforced by the code on either side of the producer/consumer relationship. Whether you consider this a Good Thing or a Bad Thing probably stands as a good indicator of whether you like the Protocol Buffer approach or the XML approach better.

Note, too, by the way, that many of the XML-based binding APIs can now parse objects out of part of an XML document, as opposed to having to read in the whole thing and then convert--JAXB, JiBX and XMLBeans can all pick up parsing an XML document from any node beyond the initial root element, for example--whereas, at least as of this point (as far as I can tell, and I'd love to have somebody at Google tell me I'm wrong here, because I think it's a major flaw), the Protocol Buffers approach assumes it will read in the entire object from the file. I don't see any way, short of putting developer-specified "separation markers" into the stream or some other kind of encoding or convention, of doing a "partial read" of an object model from a PB data file.

To see what I mean by this, consider the AddressBook example. Suppose the AddressBook holds several thousand records, and my processing system only cares about a select few (less than five, perhaps, who all have the last name "Neward"). In a Protocol Buffer scheme, I deserialize the entire AddressBook, then go through the persons item by item, looking for the one I want. In an XML-based scheme, I can pull-mode parse (StAX in Java, or using the pull-mode XML parser in .NET, for example) the nodes, throwing away nodes until I see one where the <lastName> node contains "Neward", and then JAXB-consume the next n number of nodes into a Person object before continuing through the remainder of the AddressBook.

Let's also note that the Protocol Buffer scheme assumes working with a stream-based (which usually means file-based) storage style for when Protocol Buffers are used as a storage mechanism. But frankly, if I want to store objects, I'd rather use a system that understands objects a bit more deeply than PBs do, and gives me some additional support beyond just input and output. This gets us into the long and involved discussion around object databases, which is another one that's likely to devolve into a shouting match, so I'll leave it at that. Suffice it to say that for object storage, I can see using (for example) db4o-storing-to-a-file as a vastly more long-term solution than I can using PBs, at least for now. (Undoubtedly the benchmarks will be along soon to try and convince us one way or another.) One area that is of particular interest along these lines, though, will be the evolutionary capabilities of each--from my (limited) study of PBs thus far, I believe db4o has a more flexible evolutionary scheme, and I have to admit, I don't like the idea of having to run the codegen step before being able to store my objects, but that's a minor nit that's easily solved with tools like Ant and a global rule saying "Nobody touches the .proto file without checking with me first."

Which, by the way, brings up another problem, the same one that plagues CORBA, COM/DCOM, WSDL-based services, and anything that relies on a shared definition file that is used for code-generation purposes, what I often call The Myth of the One True Schema. Assuming a developer creates a working .proto/.idl/.wsdl definition, and two companies agree on it, what happens when one side wants to evolve or change that definition? Who gets to decide the evolutionary progress of that file? Who "owns" that definition, in effect? And this, of course, presumes that we can even get some kind of definition as to what a "Customer" looks like across the various departments of the company in the first place, much less across companies. Granted, the "optional" tag in PBs help with this, but we're still stuck with an inherently unscalable problem as the number of participants in the system grows.

I'd give a great deal of money to see what the 12,000-odd .proto files look like inside Google, and then again at what they look like in five years, particularly if they are handed out to paying customers as APIs against which to compile and link. There's ways to manage this, of course, but they all look remarkably like the ways we managed them back in the COM/DCOM-vs-CORBA days, too.

Long story short, the Protocol Buffer approach looks like a good one, but let's not let the details get lost in the shouting: Protocol Buffers, as with any binary protocol format and/or RPC mechanism (and I'm not going to go there; the weaknesses of RPC are another debate for another day), are great for those situations where performance is critical and both ends of the system are well-known and controlled. If Google wants to open up their services such that third-parties can call into those systems using the Protocol Buffers approach, then more power to them... but let's not lose sight of the fact that it's yet another proprietary API, and that if Microsoft were to do this, the world would be screaming about "vendor lock-in" and "lack of standards compliance". (In fact, I heard exactly these complaints from Java developers during WCF Q&A when they were told that WCF-to-WCF endpoints could "negotiate up" to a faster, binary, protocol between them.)

In the end, if you want an endpoint that is loosely coupled and offers the maximum flexibility, stick with XML, either wrapped in a SOAP envelope or in a RESTful envelope as dictated by the underlying transport (which means HTTP, since REST over anything else has never really been defined clearly by the Restafarians). If you need a binary format, then Protocol Buffers are certainly one answer... but so is ICE, or even CORBA (though this is fast losing its appeal thanks to the slow decline of the players in this space). Don't lose sight of the technical advantages or disadvantages of each of those solutions just because something has the Google name on it.

Friday, July 11, 2008 2:02:47 AM (Pacific Daylight Time, UTC-07:00)
Comments [11]  | 
 Wednesday, July 02, 2008
Polyglot Plurality

The Pragmatic Programmer says, "Learn a new language every year". This is great advice, not just because it puts new tools into your mental toolbox that you can pull out on various occasions to get a job done, but also because it opens your mind to new ideas and new concepts that will filter their way into your code even without explicit language support. For example, suppose you've looked at (J/Iron)Ruby or Groovy, and come to like the "internal iterator" approach as a way of simplifying moving across a collection of objects in a uniform way; for political and cultural reasons, though, you can't write code in anything but Java. You're frustrated, because local anonymous functions (also commonly--and, I think, mistakenly--called closures) are not a first-class concept in Java. Then, you later look at Haskell/ML/Scala/F#, which makes heavy use of what Java programmers would call "static methods" to carry out operations, and realize that this could, in fact, be adapted to Java to give you the "internal iteration" concept over the Java Collections:

   1: package com.tedneward.util;
   3: import java.util.*;
   5: public interface Acceptor
   6: {
   7:   public void each(Object obj);
   8: }
  10: public class Collection
  11: {
  12:   public static void each(List list, Acceptor acc)
  13:   {
  14:     for (Object o : list)
  15:       acc.each(o);
  16:   }
  17: }

Where using it would look like this:

   1: import com.tedneward.util.*;
   3: List personList = ...;
   4: Collection.each(new Accpetor() {
   5:   public void each(Object person) {
   6:     System.out.println("Found person " + person + ", isn't that nice?");
   7:   }
   8: });

Is it quite as nice or as clean as using it from a language that has first-class support for anonymous local functions? No, but slowly migrating over to this style has a couple of definitive effects, most notably that you will start grooming the rest of your team (who may be reluctant to pick up these new languages) towards the new ideas that will be present in Groovy, and when they finally do see them (as they will, eventually, unless they hide under rocks on a daily basis), they will realize what's going on here that much more quickly, and start adding their voices to the call to start using (J/Iron)Ruby/Groovy for certain things in the codebase you support.

(By the way, this is so much easier to do in C# 2.0, thanks to generics, static classes and anonymous delegates...

   1: namespace TedNeward.Util
   2: {
   3:   public delegate void EachProc<T>(T obj);
   4:   public static class Collection
   5:   {
   6:     public static void each(ArrayList list, EachProc proc)
   7:     {
   8:       foreach (Object o in list)
   9:         proc(o);
  10:     }
  11:   }
  12: }
  14: // ...
  16: ArrayList personList = ...;
  17: Collection.each(list, delegate(Object person) {
  18:   System.Console.WriteLine("Found " + person + ", isn't that nice?");
  19: });

... though the collection classes in the .NET FCL are nowhere near as nicely designed as those in the Java Collections library, IMHO. C# programmers take note: spend at least a week studying the Java Collections API.)


This, then, opens the much harder question of, "Which language?" Without trying to infer any sort of order or importance, here's a list of languages to consider, with URLs where applicable; I invite your own suggestions, by the way, as I'm sure there's a lot of languages I don't know about, and quite frankly, would love to. The "current hotness" is to learn the languages marked in bold, so if you want to be daring and different, try one of those that isn't. (I've provided some links, but honestly it's kind of tiring to put all of them in; just remember that Google is your friend, and you should be OK. :-) )

  • Visual Basic. Yes, as in Visual Basic--if you haven't played with dynamic languages before, try turning "Option Strict Off", write some code, and see how interacting with the .NET FCL suddenly changes into a duck-typed scenario. If you're really curious, have a look at the generated code in Reflector or ILDasm, and notice how the generated code looks a lot like the generated JVM code from other dynamic languages on an execution environment, a la Groovy.
  • Ruby (JRuby, IronRuby):
  • Groovy: Some call this "javac 2.0"; I'm not sure it merits that title, or the assumption of the mantle of "King of the JVM" that would seem to go with that title, but the fact is, Groovy's a useful language.
  • Scala: A "SCAlable LAnguage" for the JVM (and CLR, though that feature has been left to the community to support), incorporating both object-oriented and functional concepts, plus a few new ideas, into a single package. I'm obviously bullish on Scala, given the talks and articles I've done on it.
  • F#: Originally OCaml-on-the-CLR, now F# is starting to take on a personality of its own as Microsoft productizes it. Like Scala and Erlang, F# will be immediately applicable in concurrency scenarios, I think. I'm obviously bullish on F#, given the talks, articles, and book I'm doing on it.
  • Erlang: Functional language with a strong emphasis on parallel processing, scalability, and concurrency.
  • Perl: People will perhaps be surprised I say this, given my public dislike of Perl's syntax, but I think every programmer should learn Perl, and decide for themselves what's right and what's wrong about Perl. Besides, there's clearly no argument that Perl is one of the power tools in every *nix sysadmin's toolbox.
  • Python: Again, given my dislike of Python's significant whitespace, my suggestion to learn it here may surprise some, but Python seems to be stepping into Perl's shoes as the sysadmin language tool of choice, and frankly, lots of people like the significant whitespace, since that's how they format their code anyway.
  • C++: The grandaddy of them all, in some ways; if you've never looked at C++ before, you should, particularly what they're doing with templates in the Boost library. As Scott Meyers once put it, "We're a long way from Stack<T>!"
  • D: Walter Bright's native-compiling garbage-collected successor to C++/Java/C#.
  • Objective-C (part of gcc): Great "other" object-oriented C-based language that never gathered the kind of attention C++ did, yet ended up making its mark on the industry thanks to Steve Jobs' love of the language and its incorporation into the NeXT (and later, Mac OS X) toolchain. Obj-C is a message-passing object language, which has some interesting implications in its own right.
  • Common Lisp (Steel Bank Common Lisp): What happens when you create a language that holds as a core principle that the language should hold no clear delineation between "code" and "data"? Or that the syntactic expression of the language should be accessible from within that langauge? You get Lisp, and if you're not sure what I'm talking about, pick up a Lisp or a Scheme implementation and start experimenting.
  • Scheme (PLT Scheme, SISC): Scheme is one of the earliest dialects of Lisp, and much of the same syntactic flexibility and power of Lisp is in Scheme, as well. While the syntaxes are usually not directly interchangeable, they're close enough that learning one is usually enough.
  • Clojure: Rich Hickey (who also built "dotLisp" for the CLR) has done an amazing job of bringing Lisp to the JVM, including a few new ideas, such as some functional concepts and an implementation of software transactional memory, among other things.
  • ECMAScript (E4X, Rhino, ES4): If you've never looking at JavaScript outside of the browser, you're in for a surprise--as Glenn Vanderburg put it during one of his NFJS talks, "There's a real programming language in there!". I'm particularly fond of E4X, which integrates XML as a native primitive type, and the Rhino implementation fully supports it, which makes it attractive to use as an XML services implementation language.
  • Haskell (Jaskell): One of the original functional languages. Learning this will give a programmer a leg up on the functional concepts that are creeping into other environments. Jaskell is an implementation of Haskell on the JVM, and they've taken the concept of functional further, creating a build system ("Neptune") on top of Jaskell + Ant, to yield a syntax that's... well... more Haskellian... for building Java projects. (Whether it's better/cleaner than Ant is debatable, but it certainly makes clear the functional nature of build scripts.)
  • ML: Another of the original functional languages. Probably don't need to learn this if you learn Haskell, but hey, it can't hurt.
  • Heron: Heron is interesting because it looks to take on more of the modeling aspects of programming directly into the language, such as state transitions, which is definitely a novel idea. I'm eagerly looking forward to future drops. (I'm not so interested in the graphical design mode, or the idea of "executable UML", but I think there's a vein of interesting ideas here that could be mined for other languages that aren't quite so lofty in scope.)
  • HaXe: A functional language that compiles to three different target platforms: its own (Neko), Flash, and/or Javascript (for use in Web DOMs).
  • CAL: A JVM-based statically-typed language from the folks who bring you Crystal Reports.
  • E: An interesting tack on distributed systems and security. Not sure if it's production-ready, but it's definitely an eye-opener to look at.
  • Prolog: A language built around the idea of logic and logical inference. Would love to see this in play as a "rules engine" in a production system.
  • Nemerle: A CLR-based language with functional syntax and semantics, and semantic macros, similar to what we see in Lisp/Scheme.
  • Nice: A JVM-based language that permits multi-dispatch methods, sometimes known as multimethods.
  • OCaml: An object-functional fusion that was the immediate predecessor of F#. The HaXe and MTASC compilers are both built in OCaml, and frankly, it's in a startlingly small number of lines of code, highlighting how appropriate functional languages are for building compilers and interpreters.
  • Smalltalk (Squeak, VisualWorks, Strongtalk): Smalltalk was widely-known as "the O-O language that all the C guys turned to in order to learn how to build object-oriented programs", but very few people at the time understood that Smalltalk was wildly different because of its message-passing and loosely/un-typed semantics. Now we know better (I hope). Have a look.
  • TCL (Jacl): Tool Command Language, a procedural scripting language that has some nice embedding capabilities. I'd be curious to try putting a TCL-based language in the hands of end users to see if it was a good DSL base. The Jacl implementation is built on top of the JVM.
  • Forth: The original (near as I can tell) stack-based language, in which all execution happens on an execution stack, not unlike what we see in the JVM or CLR. Given how much Lisp has made out of the "atoms and lists" concept, I'm curious if Forth's stack-based approach yields a similar payoff.
  • Lua: Dynamically-typed language that lives to be embedded; known for its biggest embedder's popularity: World of Warcraft, along with several other games/game engines. A great demonstration of the power of embedding a language into an engine/environment to allow users to create emergent behavior.
  • Fan: Another language that seeks to incorporate both static and dynamic typing, running on top of both the JVM or the CLR.
  • Factor: I'm curious about Factor because it's another stack-based language, with a lot of inspiration from some of the other languages on this list.
  • Boo: A Python-inspired CLR language that Ayende likes for domain-specific languages.
  • Cobra: A Python-inspired language that seeks to encompass both static and dynamic typing into one language. Fascinating stuff.
  • Slate: A "prototype-based object-oriented programming language based on Self, CLOS, and Smalltalk-80." Apparently on hold due to loss of interest from the founder, last release was 0.3.5 in August of 2005.
  • Joy: Factor's primary inspiration, another stack-based language.
  • Raven: A scripting language that "rips off" from Python, Forth, Perl, and the creator's own head.
  • Onyx: "Onyx is a powerful stack-based, multi-threaded, interpreted, general purpose programming language similar to PostScript. It can be embedded as an extension language similarly to ficl (Forth), guile (scheme), librep (lisp dialect), s-lang, Lua, and Tcl."
  • LOLCode: No, you won't use LOLcode on a project any time soon, but LOLCode has had so many different implementations of it built, it's a great practice tool towards building your own languages, a la DSLs. LOLcode has all the basic components a language would use, so if you can build a parser, AST and execution engine (either interpreter or compiler) for LOLcode, then you've got the basic skills in place to build an external DSL.

There's more, of course, but hopefully there's something in this list to keep you busy for a while. Remember to send me your favorite new-language links, and I'll add them to the list as seems appropriate.

Happy hacking!

.NET | C++ | F# | Flash | Java/J2EE | Languages | LLVM | Mac OS | Parrot | Ruby | Visual Basic | Windows

Wednesday, July 02, 2008 7:13:10 PM (Pacific Daylight Time, UTC-07:00)
Comments [2]  | 
The power of Office as a front-end

I recently had the pleasure of meeting Bruce Wilson, a principal with iLink, and we had a pleasant conversation about enterprise applications and trends and such. Last week, in the middle of my trip to Prague and Zurich, he sent me a link to a blog entry he'd written on using Office as a front-end, and it sort of underscored some ideas I've had around Office in general.

The interesting thing is, most of the ideas he talks about here could just as easily be implemented on top of a Java back-end, or a Ruby back-end, as a .NET back-end. Office is a tool that many end-users "get" right away (whether you agree with Microsoft's user interface metaphors or not, or even like the fact that Office is one of the most widely-installed software packages on the planet), and it has a lot of support infrastructure built in. "Mashup" doesn't have to mean YouTube on your website; in fact, I dislike the term "mashup" entirely, since it sounds like something done in the heat of the moment without any planning or thought (which is the antithesis of anything that goes--or should go--into the enterprise). Can we use the term "cardinality" instead? Please?

.NET | Java/J2EE | Windows | XML Services

Wednesday, July 02, 2008 6:17:23 PM (Pacific Daylight Time, UTC-07:00)
Comments [2]  |