Tuesday, February 26, 2008
Lang.NET videos are now online

If you read the three days of Lang.NET posts I did last month and wondered "Man, I wish I could've seen...", fret no more.

My personal favorites:

Of course the other presentations are good, but each of these had a moment in them when I said, "Hmm...."

Tuesday, February 26, 2008 7:49:40 PM (Pacific Standard Time, UTC-08:00)
Comments [1]  | 
 Sunday, February 24, 2008
Apropos of nothing: Job trends

While tracking some of the links relating to the Groovy/Ruby war, I found this website, which purportedly tracks job trends based on a whole mess of different job sites. So, naturally, I had to plug in a few search terms to get a graph of C#, C++, Java, Ruby, and VB:

Interesting. I don't think it proves anything one way or another, mind you, but interesting nonetheless. Having said that, a few things stand out to me after looking at this for all of thirty seconds:

  • Wow, what the hell happened in 1Q and 2Q of 2005? Java takes a huge drop in 2005, and all of them take a small drop of some form around the same time in 2006. What is it with summertime? Did the HR supervisor suddenly take a look at the company's job board and mutter, "Damn, I thought we closed all those listings already..."? (Or maybe, "Thank God for cheap college interns..."?)
  • C++ jobs still outnumber C# jobs, even in 4Q 2007?
  • C++ jobs remain essentially flat from 1Q 2005 to 4Q 2007; apparently, there's a lot more C++ going on than most companies are willing to admit to.... (Can't you picture it? The nervous candidate, sitting at the table, as the interviewer shuffles the paper and says, "So, you're here for a programming job?" The candidate sort of squirms in his chair as he replies, "Well, actually, I was hoping for a... a... C++ job." The interviewer quickly looks around to see who might be listening as he says loudly, "C++? What ever gave you the idea that we do C++ here at BigCorp?" Meanwhile, he surreptitiously scribbles on the back of a business card and slides it across the table to the candidate, then stands up and says loudly, "I'm afraid you've come to the wrong place, sir. You can see yourself out, I take it?" The candidate palms the card, and only once has he left the building does he look at the back, which reads, "8PM, corner of Mission and Vine, password is 'Lippman, Stroustrup, Sutter, and Meyers!' Viva C++!"...)
  • VB jobs fall to below C#? So much for those vast hordes of VB programmers that supposedly form the "long tail" of the .NET community....
  • Java jobs remain essentially flat from 1Q 2005 to 4Q 2007, despite numerous ups and downs. So much for the idea that Java is somehow going away....
  • Ruby's penetration into the job market is much smaller than what I would have guessed.
  • I couldn't help myself, I did another query with "cobol" added in, but I'll leave it to you to run your own query to see what that looks like. It's surprising....

Of course, statistics without any sort of understanding of how they were gathered or from what sources are essentially meaningless, but ooooh, it's in color....

.NET | C++ | Java/J2EE | Languages | Ruby

Sunday, February 24, 2008 9:33:02 PM (Pacific Standard Time, UTC-08:00)
Comments [6]  | 
Some interesting tidbits about LLVM

LLVM definitely does some interesting things as part of its toolchain.

Consider the humble HelloWorld:

   1: #include <stdio.h>
   2: 
   3: int main() {
   4:   printf("hello world\n");
   5:   return 0;
   6: }

Assuming you have a working llvm and llvm-gcc on your system, you can compile it into LLVM bitcode. This bitcode is directly executable using the lli.exe from llvm:

$ lli < hello.bc
hello world

Meh. Not so interesting. Let's look at the LLVM bitcode for the code, though (here disassembled into its readable textual form)--that's interesting as a first peek at what LLVM bitcode might look like:

   1: ; ModuleID = '<stdin>'
   2: target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64"
   3: target triple = "mingw32"
   4: @.str = internal constant [12 x i8] c"hello world\00"        ; <[12 x i8]*> [#uses=1]
   5: 
   6: define i32 @main() {
   7: entry:
   8:     %tmp2 = tail call i32 @puts( i8* getelementptr ([12 x i8]* @.str, i32 0, i32 0) )        ; <i32> [#uses=0]
   9:     ret i32 0
  10: }
  11: 
  12: declare i32 @puts(i8*)

Hmm. Now of course, LLVM also has to be able to get down to actual machine instructions, and in point of fact there is a tool in the LLVM toolchain, called llc, that can do this transformation ahead-of-time, like so:

$ llc hello.bc -o hello.bc.s -march=x86

And, looking at the results, we see...

   1: .text
   2: .align    16
   3: .globl    _main
   4: .def     _main;    .scl    2;    .type    32;    .endef
   5: _main:
   6: pushl    %ebp
   7: movl    %esp, %ebp
   8: subl    $8, %esp
   9: andl    $4294967280, %esp
  10: movl    $16, %eax
  11: call    __alloca
  12: call    ___main
  13: movl    $_.str, (%esp)
  14: call    _puts
  15: xorl    %eax, %eax
  16: movl    %ebp, %esp
  17: popl    %ebp
  18: ret
  19: .data
  20: _.str:                # .str
  21: .asciz    "hello world"
  22: .def     _puts;    .scl    2;    .type    32;    .endef

Bleah. Assembly language, and in AT&T syntax, to boot. (What did you expect, anyway?)

Of course, assembly language and C were always considered fairly close together in terms of their abstraction layer (C was designed as a replacement for assembly language when porting Unix, remember), so it might not be too hard to...

$ llc hello.bc -o hello.bc.c -march=c

And get...

   1: /* Provide Declarations */
   2: #include <stdarg.h>
   3: #include <setjmp.h>
   4: /* get a declaration for alloca */
   5: #if defined(__CYGWIN__) || defined(__MINGW32__)
   6: #define  alloca(x) __builtin_alloca((x))
   7: #define _alloca(x) __builtin_alloca((x))
   8: #elif defined(__APPLE__)
   9: extern void *__builtin_alloca(unsigned long);
  10: #define alloca(x) __builtin_alloca(x)
  11: #define longjmp _longjmp
  12: #define setjmp _setjmp
  13: #elif defined(__sun__)
  14: #if defined(__sparcv9)
  15: extern void *__builtin_alloca(unsigned long);
  16: #else
  17: extern void *__builtin_alloca(unsigned int);
  18: #endif
  19: #define alloca(x) __builtin_alloca(x)
  20: #elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
  21: #define alloca(x) __builtin_alloca(x)
  22: #elif defined(_MSC_VER)
  23: #define inline _inline
  24: #define alloca(x) _alloca(x)
  25: #else
  26: #include <alloca.h>
  27: #endif
  29: #ifndef __GNUC__  /* Can only support "linkonce" vars with GCC */
  30: #define __attribute__(X)
  31: #endif
  33: #if defined(__GNUC__) && defined(__APPLE_CC__)
  34: #define __EXTERNAL_WEAK__ __attribute__((weak_import))
  35: #elif defined(__GNUC__)
  36: #define __EXTERNAL_WEAK__ __attribute__((weak))
  37: #else
  38: #define __EXTERNAL_WEAK__
  39: #endif
  41: #if defined(__GNUC__) && defined(__APPLE_CC__)
  42: #define __ATTRIBUTE_WEAK__
  43: #elif defined(__GNUC__)
  44: #define __ATTRIBUTE_WEAK__ __attribute__((weak))
  45: #else
  46: #define __ATTRIBUTE_WEAK__
  47: #endif
  49: #if defined(__GNUC__)
  50: #define __HIDDEN__ __attribute__((visibility("hidden")))
  51: #endif
  53: #ifdef __GNUC__
  54: #define LLVM_NAN(NanStr)   __builtin_nan(NanStr)   /* Double */
  55: #define LLVM_NANF(NanStr)  __builtin_nanf(NanStr)  /* Float */
  56: #define LLVM_NANS(NanStr)  __builtin_nans(NanStr)  /* Double */
  57: #define LLVM_NANSF(NanStr) __builtin_nansf(NanStr) /* Float */
  58: #define LLVM_INF           __builtin_inf()         /* Double */
  59: #define LLVM_INFF          __builtin_inff()        /* Float */
  60: #define LLVM_PREFETCH(addr,rw,locality) __builtin_prefetch(addr,rw,locality)
  61: #define __ATTRIBUTE_CTOR__ __attribute__((constructor))
  62: #define __ATTRIBUTE_DTOR__ __attribute__((destructor))
  63: #define LLVM_ASM           __asm__
  64: #else
  65: #define LLVM_NAN(NanStr)   ((double)0.0)           /* Double */
  66: #define LLVM_NANF(NanStr)  0.0F                    /* Float */
  67: #define LLVM_NANS(NanStr)  ((double)0.0)           /* Double */
  68: #define LLVM_NANSF(NanStr) 0.0F                    /* Float */
  69: #define LLVM_INF           ((double)0.0)           /* Double */
  70: #define LLVM_INFF          0.0F                    /* Float */
  71: #define LLVM_PREFETCH(addr,rw,locality)            /* PREFETCH */
  72: #define __ATTRIBUTE_CTOR__
  73: #define __ATTRIBUTE_DTOR__
  74: #define LLVM_ASM(X)
  75: #endif
  77: #if __GNUC__ < 4 /* Old GCC's, or compilers not GCC */ 
  78: #define __builtin_stack_save() 0   /* not implemented */
  79: #define __builtin_stack_restore(X) /* noop */
  80: #endif
  82: #define CODE_FOR_MAIN() /* Any target-specific code for main()*/
  84: #ifndef __cplusplus
  85: typedef unsigned char bool;
  86: #endif
  89: /* Support for floating point constants */
  90: typedef unsigned long long ConstantDoubleTy;
  91: typedef unsigned int        ConstantFloatTy;
  92: typedef struct { unsigned long long f1; unsigned short f2; unsigned short pad[3]; } ConstantFP80Ty;
  93: typedef struct { unsigned long long f1; unsigned long long f2; } ConstantFP128Ty;
  96: /* Global Declarations */
  97: /* Helper union for bitcasts */
  98: typedef union {
  99:   unsigned int Int32;
 100:   unsigned long long Int64;
 101:   float Float;
 102:   double Double;
 103: } llvmBitCastUnion;
 105: /* External Global Variable Declarations */
 107: /* Function Declarations */
 108: double fmod(double, double);
 109: float fmodf(float, float);
 110: long double fmodl(long double, long double);
 111: unsigned int main(void);
 112: unsigned int puts(unsigned char *);
 113: unsigned char *malloc();
 114: void free(unsigned char *);
 115: void abort(void);
 118: /* Global Variable Declarations */
 119: static unsigned char _2E_str[12];
 122: /* Global Variable Definitions and Initialization */
 123: static unsigned char _2E_str[12] = "hello world";
 126: /* Function Bodies */
 127: static inline int llvm_fcmp_ord(double X, double Y) { return X == X && Y == Y; }
 128: static inline int llvm_fcmp_uno(double X, double Y) { return X != X || Y != Y; }
 129: static inline int llvm_fcmp_ueq(double X, double Y) { return X == Y || llvm_fcmp_uno(X, Y); }
 130: static inline int llvm_fcmp_une(double X, double Y) { return X != Y; }
 131: static inline int llvm_fcmp_ult(double X, double Y) { return X <  Y || llvm_fcmp_uno(X, Y); }
 132: static inline int llvm_fcmp_ugt(double X, double Y) { return X >  Y || llvm_fcmp_uno(X, Y); }
 133: static inline int llvm_fcmp_ule(double X, double Y) { return X <= Y || llvm_fcmp_uno(X, Y); }
 134: static inline int llvm_fcmp_uge(double X, double Y) { return X >= Y || llvm_fcmp_uno(X, Y); }
 135: static inline int llvm_fcmp_oeq(double X, double Y) { return X == Y ; }
 136: static inline int llvm_fcmp_one(double X, double Y) { return X != Y && llvm_fcmp_ord(X, Y); }
 137: static inline int llvm_fcmp_olt(double X, double Y) { return X <  Y ; }
 138: static inline int llvm_fcmp_ogt(double X, double Y) { return X >  Y ; }
 139: static inline int llvm_fcmp_ole(double X, double Y) { return X <= Y ; }
 140: static inline int llvm_fcmp_oge(double X, double Y) { return X >= Y ; }
 142: unsigned int main(void) {
 143:   unsigned int llvm_cbe_tmp2;
 145:   CODE_FOR_MAIN();
 146:   llvm_cbe_tmp2 =  /*tail*/ puts((&(_2E_str[((signed int )((unsigned int )0))])));
 147:   return ((unsigned int )0);
 148: }

Granted, it's some ugly-looking C code, with all those preprocessor fragments floating around in there, but if you take a few moments and go down to the main() definition, you can see it plainly: C to bitcode to C. We've come full circle.

Looking back at that first disassembly dump, I'm struck by how LLVM bitcode looks a lot like any other high-level assembly or low-level virtual machine language, even reminiscent of MSIL. In fact, there's probably a pretty close correlation between LLVM bitcode and MSIL.

In point of fact, LLVM knows this, too:

$ llc hello.bc -o hello.bc.msil -march=msil

And check out what it generates:

   1: .assembly extern mscorlib {}
   2: .assembly MSIL {}
   4: // External
   5: .method static hidebysig pinvokeimpl("MSVCRT.DLL")
   6:     unsigned int32 modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 'puts'(void* ) preservesig {}
   8: .method static hidebysig pinvokeimpl("MSVCRT.DLL")
   9:     vararg void* modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 'malloc'() preservesig {}
  11: .method static hidebysig pinvokeimpl("MSVCRT.DLL")
  12:     void modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 'free'(void* ) preservesig {}
  14: .method public hidebysig static pinvokeimpl("KERNEL32.DLL" ansi winapi)  native int LoadLibrary(string) preservesig {}
  15: .method public hidebysig static pinvokeimpl("KERNEL32.DLL" ansi winapi)  native int GetProcAddress(native int, string) preservesig {}
  16: .method private static void* $MSIL_Import(string lib,string sym)
  17:  managed cil
  18: {
  19:     ldarg    lib
  20:     call    native int LoadLibrary(string)
  21:     ldarg    sym
  22:     call    native int GetProcAddress(native int,string)
  23:     dup
  24:     brtrue    L_01
  25:     ldstr    "Can no import variable"
  26:     newobj    instance void [mscorlib]System.Exception::.ctor(string)
  27:     throw
  28: L_01:
  29:     ret
  30: }
  32: .method static private void $MSIL_Init() managed cil
  33: {
  34:     ret
  35: }
  37: // Declarations
  38: .class value explicit ansi sealed 'unsigned int8 [12]' { .pack 1 .size 12 }
  40: // Definitions
  41: .field static private valuetype 'unsigned int8 [12]' '.str' at '.str$data'
  42: .data '.str$data' = {
  43: int8 (104),
  44: int8 (101),
  45: int8 (108),
  46: int8 (108),
  47: int8 (111),
  48: int8 (32),
  49: int8 (119),
  50: int8 (111),
  51: int8 (114),
  52: int8 (108),
  53: int8 (100),
  54: int8 (0) [1]
  55: }
  57: // Startup code
  58: .method static public int32 $MSIL_Startup() {
  59:     .entrypoint
  60:     .locals (native int i)
  61:     .locals (native int argc)
  62:     .locals (native int ptr)
  63:     .locals (void* argv)
  64:     .locals (string[] args)
  65:     call    string[] [mscorlib]System.Environment::GetCommandLineArgs()
  66:     dup
  67:     stloc    args
  68:     ldlen
  69:     conv.i4
  70:     dup
  71:     stloc    argc
  72:     ldc.i4    4
  73:     mul
  74:     localloc
  75:     stloc    argv
  76:     ldc.i4.0
  77:     stloc    i
  78: L_01:
  79:     ldloc    i
  80:     ldloc    argc
  81:     ceq
  82:     brtrue    L_02
  83:     ldloc    args
  84:     ldloc    i
  85:     ldelem.ref
  86:     call    native int [mscorlib]System.Runtime.InteropServices.Marshal::StringToHGlobalAnsi(string)
  87:     stloc    ptr
  88:     ldloc    argv
  89:     ldloc    i
  90:     ldc.i4    4
  91:     mul
  92:     add
  93:     ldloc    ptr
  94:     stind.i
  95:     ldloc    i
  96:     ldc.i4.1
  97:     add
  98:     stloc    i
  99:     br    L_01
 100: L_02:
 101:     call void $MSIL_Init()
 102:     call    unsigned int32 modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) main()
 103:     conv.i4
 104:     ret
 105: }
 107: .method static public unsigned int32 modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 'main'
 108:     () cil managed
 109: {
 110:     .locals (unsigned int32 'ltmp_0_1')
 111:     .maxstack    16
 112: ltmp_1_2:
 114: //    %tmp2 = tail call i32 @puts( i8* getelementptr ([12 x i8]* @.str, i32 0, i32 0) )        ; <i32> [#uses=0]
 116:     ldsflda    valuetype 'unsigned int8 [12]' '.str'
 117:     conv.u4
 118:     call    unsigned int32 modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl) 'puts'(void* )
 119:     stloc    'ltmp_0_1'
 121: //    ret i32 0
 123:     ldc.i4    0
 124:     ret
 125: }

Holy frickin' crap. I think I'm in love.

.NET | C++ | Languages | LLVM

Sunday, February 24, 2008 5:00:17 AM (Pacific Standard Time, UTC-08:00)
Comments [0]  | 

Some quotes I've found to be thought-provoking over the last week or so:

"Some programming languages manage to absorb change, but withstand progress."

"In a 5 year period we get one superb programming language. Only we can't control when the 5 year period will begin."

"Every program has (at least) two purposes: the one for which it was written and another for which it wasn't."

"If a listener nods his head when you're explaining your program, wake him up."

"A language that doesn't affect the way you think about programming, is not worth knowing."

"Wherever there is modularity there is the potential for misunderstanding: Hiding information implies a need to check communication."

(All of the above, Alan Perlis)


"Program testing can be used to show the presence of bugs, but never to show their absence!"

"The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague."

"How do we convince people that in programming simplicity and clarity —in short: what mathematicians call "elegance"— are not a dispensable luxury, but a crucial matter that decides between success and failure?"

"Are you quite sure that all those bells and whistles, all those wonderful facilities of your so called powerful programming languages, belong to the solution set rather than the problem set?"

"Object-oriented programming is an exceptionally bad idea which could only have originated in California."

"The prisoner falls in love with his chains."

"Write a paper promising salvation, make it a 'structured' something or a 'virtual' something, or 'abstract', 'distributed' or 'higher-order' or 'applicative' and you can almost be certain of having started a new cult."

"I remember from those days two design principles that have served me well ever since, viz.

  1. before really embarking on a sizable project, in particular before starting the large investment of coding, try to kill the project first, and
  2. start with the most difficult, most risky parts first."

(All of the above, Edsger Dijkstra)

Make of them what you will....

Languages | Reading

Sunday, February 24, 2008 3:16:52 AM (Pacific Standard Time, UTC-08:00)
Comments [0]  | 
 Saturday, February 23, 2008
Building LLVM on Windows using MinGW32

As I've mentioned in passing, one of the things I'm playing with in my spare time (or will play with, now that I've got everything working, I think) is the LLVM toolchain. In essence, it looks to be a parallel to Microsoft's Phoenix, except that it's out, it's been in use in production environments (Apple is a major contributor to the project and uses it pretty extensively, it seems), and it supports not only C/C++ and Objective-C, but also Ada and Fortran. It's also a useful back-end for people writing languages, hence my interest.

One of the things that appeals about LLVM is that it uses an "intermediate representation" that in many ways reminds me of Phoenix's Low IR, though I'm sure there are significant differences that I'm not well-practiced enough to spot. Consider this bit of Fibonacci code, for example:

   1: define i32 @fib(i32 %AnArg) {
   2: EntryBlock:
   3:     %cond = icmp sle i32 %AnArg, 2        ; <i1> [#uses=1]
   4:     br i1 %cond, label %return, label %recurse
   5: 
   6: return:        ; preds = %EntryBlock
   7:     ret i32 1
   8: 
   9: recurse:        ; preds = %EntryBlock
  10:     %arg = sub i32 %AnArg, 1        ; <i32> [#uses=1]
  11:     %fibx1 = tail call i32 @fib( i32 %arg )        ; <i32> [#uses=1]
  12:     %arg1 = sub i32 %AnArg, 2        ; <i32> [#uses=1]
  13:     %fibx2 = tail call i32 @fib( i32 %arg1 )        ; <i32> [#uses=1]
  14:     %addresult = add i32 %fibx1, %fibx2        ; <i32> [#uses=1]
  15:     ret i32 %addresult
  16: }
  17: 
  18: declare void @abort()

It's rather interesting to imagine this as a direct by-product of that first pass off of the hypothetical Universal AST....
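For reference, that IR maps almost line-for-line onto a plain recursive Fibonacci. Rendered back into C++ (my reconstruction, not the actual source the bitcode came from), it's just:

```cpp
#include <cassert>

// My reconstruction of the IR above: "icmp sle i32 %AnArg, 2" is the
// n <= 2 test, the two sub/tail-call pairs are the recursive calls,
// and "add" combines the results.
int fib(int n) {
    if (n <= 2)                       // br i1 %cond, %return, %recurse
        return 1;                     // ret i32 1
    return fib(n - 1) + fib(n - 2);   // %fibx1 + %fibx2 -> %addresult
}
```

Note that this is the "Fibonacci starts at 1, 1" variant--fib(1) and fib(2) both return 1, exactly as the two basic blocks in the IR dictate.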

Getting this thing to build has been an exercise of patience, however.

The documentation on the website, while extensive, isn't very Windows-friendly. For example, there's a page that describes how to build it with Visual Studio, but it's a touch out-of-date. On top of that, it turns out that the VS/LLVM tools can't compile to LLVM bitcode, only execute it once it's in that format; you need "llvm-gcc" to compile to bitcode, which means you're left with a two-machine solution: a *nix box using llvm-gcc to compile the code, and then your Windows box to run it. Ugh.

Fortunately, Windows users have two choices for dealing with *nix solutions: Cygwin and MinGW32. The first tries to lay down a *nix-like layer on top of the Win32 APIs (meaning everything depends on cygwin1.dll once built), the second tries to provide an adapter layer such that when a *nix tool is done building, it has no dependencies beyond what you'd see from any other Win32 app. Debates rage about the validity of each, and rather than seem like I'm coming down in favor of one or the other, I'll simply note that I have both installed in my Languages VMWare image now, and leave it at that.

Building LLVM with MinGW was a bit more painful than I expected, however, so for a long time I just didn't bother. Last night that changed, thanks to Anton Korobeynikov, who spent the better part of three or four hours in back-and-forth email conversation with me, walking me patiently through the step-by-step of getting MinGW and msys up and running on my machine long enough to build the LLVM 2.2+ (meaning the tip beyond the current 2.2 release) code base. I can't thank him enough--both for the direct help in getting the MinGW bits up and in the right places as well as for the casual conversation about MinGW along the way--so I thought I'd replicate what we did on my box to the 'Net in an attempt to spare others the effort.

First, there's a pile of tarballs from the MinGW download page that require downloading and extracting:

  • gcc-g++-3.4.5-20060117-1.tar.gz
  • binutils-2.18.50-20080109.tar.gz
  • mingw-runtime-3.14.tar.gz
  • gcc-core-3.4.5-20060117-1.tar.gz
  • w32api-3.11.tar.gz

Note that I also pulled down the other gcc- tarballs (gcj, objc and so on), just because I wanted to play with the MinGW versions of these tools. Extract all of these into a directory; on my system, that's C:/Prg/MinGW.

(There is a .exe installer on the Sourceforge page that supposedly manages all this for you, but it installed the binutils-2.17 package instead of 2.18, and I couldn't figure out how to get it to grab 2.18. All it does is download these packages and extract them, so going without it isn't a huge ordeal.)

By the way, if you're curious about experimenting with gcj as well (hey, it's a Java compiler that compiles to native code--that's interesting in its own right, if you ask me), take careful note that as it stands right now in the installation process, you can run gcj but can't compile with it--it complains about a missing library, "iconv". This is a known bug, it seems, and the solution is to install libiconv from the GnuWin32 project--just extract the "bin" and "lib" packages into C:/Prg/MinGW.

At this point, you're done with C:/Prg/MinGW.

Next, there's a couple of installers and additional tarballs that need downloading and extracting:

  • MSYS-1.0.10.exe
  • msysDTK-1.0.1.exe
  • bash-3.1-MSYS-1.0.11-1.tar.bz2
  • bison-2.3-MSYS-1.0.11.tar.bz2
  • flex-2.5.33-MSYS-1.0.11.tar.bz2
  • regex-0.12-MSYS-1.0.11.tar.bz2 (required by flex)

The first two just execute and install; on my system, that is C:/Prg/msys/1.0. The next one just extracts into the C:/Prg/msys/1.0 directory. The last three are a tad tricky, however--apparently they assume that everything should be installed into a top-level "usr" directory, and that's not quite where we want them. Apparently, we want them installed directly (so that "/usr/bin" from bison goes into "/bin" inside of "C:/Prg/msys/1.0"), so extract these to a temporary directory, then xcopy everything inside the temp/usr directory over to C:/Prg/msys/1.0. (That is, "cd temp", then "cd usr", then "xcopy /s/e * C:/Prg/msys/1.0".)

At this point, we're done with the setup--create a directory into which you want LLVM built (on my system, that's C:/Prg/LLVM/msys-build, where the source from SVN is held in C:/Prg/LLVM/llvm-svn), and execute the "configure" script in this directory (that is, "cd C:/Prg/LLVM/msys-build" and "../llvm-svn/configure"). The script will deposit a bunch of makefiles and directories into the build directory, after which a simple "make" suffices to build everything (in Debug; if you want Release, do "make ENABLE_OPTIMIZED=1", as per the LLVM documentation).

Thanks again, Anton! Now can you help me get llvm-gcc working? :-)

C++ | Java/J2EE | Languages | LLVM | Windows

Saturday, February 23, 2008 8:34:35 PM (Pacific Standard Time, UTC-08:00)
Comments [1]  | 
I love it when good accounting girls go geek

Erik Mork, C++ and .NET programmer extraordinaire and bright guy in his own right, has subverted my sister-in-law to programming, and the pair of them are now opening the doors of their new company, Silver Bay Labs, with a series of podcasts on Silverlight and "sparkling clients" in general. Have a listen, if you're interested in the whole "rich client" thing....

.NET | Windows

Saturday, February 23, 2008 2:21:33 AM (Pacific Standard Time, UTC-08:00)
Comments [0]  | 
 Friday, February 22, 2008
URLs as first-class concepts in a language

While perusing the E Tutorial, I noticed something that was simple and powerful all at the same time: URLs as first-class concepts in the language. Or, if you will, URLs as a factory for creating objects. Check out this snippet of E:

? pragma.syntax("0.8")

? def poem := <>
# value: <>

? <file:c:/jabbertest>.mkdirs(null);
? <file:c:/jabbertest/jabberwocky.txt>.setText(poem.getText())

Notice how the initialization of the "poem" variable is set to what looks like an HTTP URL? This essentially downloads the contents of that file and stores it into poem (in a form I don't precisely understand yet--I think it's an object that wraps the contents, but I could be wrong). Then the script uses file URLs to create the local directory (jabbertest) and to create a new file (jabberwocky.txt) and set the contents of that file to be the same as the contents of the stored "poem" object.

That, my friends, is just slick. It also neatly avoids the whole "how are files and directories and stuff different from URLs" question that tends to make doing this same bit of code in Java or C# that much more difficult.

.NET | C++ | Java/J2EE | Languages | Parrot

Friday, February 22, 2008 11:40:06 PM (Pacific Standard Time, UTC-08:00)
Comments [1]  | 
More language features revisited

Since we're examining various aspects of the canonical O-O languages (the three principals being C++, Java and C#/VB.NET), let's review another recent post, this time on the use of "new" in said languages.

All of us have probably written code like this:

Foo f = new Foo();

And what could be simpler?  As long as the logic in the constructor is simple (or better yet, the constructor is empty), it would seem that the simplest code is the best, so just use the constructor.  Certainly the MSDN documentation is rife with code that uses public constructors.  You can probably find plenty of public constructors used right here on my blog.  Why invest the effort in writing (and using) a factory class that will probably never do anything useful, other than call a public constructor?

In his excellent podcast entitled "Emergent Design: The Evolutionary Nature of Software Development," Scott Bain of Net Objectives nevertheless makes a strong case against the routine use of public constructors.  The problem, notes Scott, is that the use of a public constructor ties the calling code to the implementation of Foo as a concrete class.  But suppose that you later discover that there need to be many subtypes of Foo, and Foo should therefore be an abstract class instead of a concrete class--what then?  You've got a big problem, that's what; a lot of client code that has been making use of Foo's public constructor suddenly becomes invalid.

I just love it when people rediscover advice that they could have had much earlier, had they only been aware of the prior art in the field. I refer the curious C#/VB.NET developer to the book Effective Java, by Joshua Bloch, in which Item 1 states, "Consider providing static factory methods instead of constructors". Quoting from said book, we see:

One advantage of static factory methods is that, unlike constructors, they have names. If the parameters to a constructor do not, in and of themselves, describe the object being returned, a static factory with a well-chosen name can make a class easier to use and the resulting client code easier to read. ...

A second advantage of static factory methods is that, unlike constructors, they are not required to create a new object each time they're invoked. This allows immutable classes (Item 13) to use preconstructed instances or to cache instances as they're constructed and to dispense these instances repeatedly so as to avoid creating unnecessary duplicate values. ...

A third advantage of static factory methods is that, unlike constructors, they can return an object of any subtype of their return type. This gives you great flexibility in choosing the class of the returned object. ...

The main disadvantage of static factory methods is that classes without public or protected constructors cannot be subclassed. The same is true for nonpublic classes returned by public static factories.

A second disadvantage of static factory methods is that they are not readily distinguishable from other static methods. They do not stand out in API documentation the way that constructors do.

C# and VB.NET developers are encouraged to read the book to discover about 30 or so other nuggets of wisdom that are directly applicable to the .NET framework. Note that Josh is in the process, this very month, of revising the book for rerelease as a second edition, taking into account the wide variety of changes that have taken place in the Java language since EJ's initial release.
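To make Item 1 concrete in C++ terms (Foo and fromName() are hypothetical names of my own, not from either book or post), a named static factory standing in front of a non-public constructor might look like:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical Foo, per Bloch's Item 1: clients go through a named
// static factory, so Foo can later become abstract, hand back a cached
// instance, or return a subtype, all without breaking any callers.
class Foo {
public:
    static std::unique_ptr<Foo> fromName(const std::string& name) {
        return std::unique_ptr<Foo>(new Foo(name));  // today: plain construction
    }
    const std::string& name() const { return name_; }
private:
    explicit Foo(const std::string& name) : name_(name) {}  // deliberately not public
    std::string name_;
};
```

Client code reads `Foo::fromName("example")` rather than `new Foo("example")`, which is precisely the indirection Bain and Bloch are arguing for.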


One thing that's been nagging at me is how I think Java and C# missed the boat with respect to the various ways we'd like to construct objects. The presumption was always that allocation and initialization would (a) always take place at the same time, and (b) always take place in the same manner--the underlying system would allocate the memory, the object would be laid out in this newly-minted chunk of heap, and your constructor would then initialize the contents. Neither assumption can be taken to be true, as we've seen over the years; the object may need to come from pre-existing storage (a la the object cache), or the object may need to be a derived type (a la the covariant return Josh mentions in advantage #3 above), or in some cases you want to mint the object from an entirely different part of the process.

C++ actually had an advantage over C# and Java here, in that you could overload operator new() for a class (which then meant you had to overload operator delete(), and oh-by-the-way don't forget to overload array new, that is, operator new[]() and its corresponding twin, array delete, operator delete[](), which was a bit of a pain) to gain better control over both allocation and initialization, to a degree. Initially we always used it to control allocation--the idea being one would create a class-specific allocator, on the grounds that knowing some of the assumptions of the class, such as its size, would allow you to write faster allocation routines for it. But one of the rarely-used features of operator new() was that it could take additional parameters, using a truly obscure syntactic corner of C++:

   1: void* operator new(size_t s, const string& message)
   2: {
   3:     cout << "Operator new sez " << message << endl;
   4:     return ::operator new(s); // allocate s bytes and return; Foo ctor will be invoked automagically
   5: }
   6: Foo* newFoo = new ("Howdy, world!") Foo();

Officially, one such overloaded operator was recognized by the standard: the placement new operator, which took a void* as a parameter, indicating the exact location at which your object was to be laid down (no allocation takes place at all). This meant that C++ developers could take storage from some other part of the process (including, shudder, a pointer they'd made up out of thin air) and drop the initialized object right there. While useful in its own right, placement new opened up a whole new world of construction options to the C++ developer that we never really took advantage of, since now you could pass parameters to the construction process without involving the constructor.
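For anyone who's never actually seen placement new in action, here's a minimal sketch (the Point type and constructAt helper are mine, purely for illustration). Note the two obligations it puts on you: the storage is yours to provide, and destruction is manual, since delete can't be used on memory you didn't get from operator new:

```cpp
#include <cassert>
#include <new>      // declares the placement form of operator new

struct Point {
    int x, y;
    Point(int x, int y) : x(x), y(y) { }
};

// Construct a Point inside caller-provided storage; no heap allocation occurs.
Point* constructAt(void* where, int x, int y) {
    return new (where) Point(x, y);   // placement new: just runs the ctor at 'where'
}
```

The caller then does something like `alignas(Point) unsigned char storage[sizeof(Point)]; Point* p = constructAt(storage, 3, 4);` and later `p->~Point();` by hand.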

That's kind of nifty, in an obscure and slightly terrifying fashion. One thought I'd always had was that it would be cool if a C++ O/R-M overloaded operator new() for database-bound objects to indicate which database connection to use during construction:

   DBConnection conn;
   Person* newFoo = new (conn) Person("Ted", "Neward");


Of course, such syntax has the immediate drawback of eliciting a chorus of "WTF?!?" at the next code review, but still....
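For the curious, making that syntax legal is just a matter of giving Person a two-argument operator new; here's a minimal sketch (the DBConnection type is an empty stand-in, and a real O/R-M would obviously do real work where the comments are):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct DBConnection { /* imagine a live database handle here */ };

class Person {
public:
    Person(const char* first, const char* last) : first(first), last(last) { }

    // Class-specific operator new taking the extra argument; this is what
    // the expression  new (conn) Person(...)  resolves to
    static void* operator new(std::size_t size, DBConnection& /*conn*/) {
        // a real O/R-M might reserve a row or hand back cached storage here
        return ::operator new(size);
    }
    // matching operator delete, called only if the Person ctor throws
    static void operator delete(void* p, DBConnection&) { ::operator delete(p); }
    // the ordinary operator delete, for the usual  delete person;
    static void operator delete(void* p) { ::operator delete(p); }

    const char* first;
    const char* last;
};
```

The matching two-argument operator delete is easy to forget, but without it a throwing constructor leaks the allocation.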

Meanwhile, other languages--Ruby and Smalltalk being two of them--choose to view new as one of those nasty static methods Gilad dislikes so much. That is to say, construction basically calls into a static method on a class, which has the nice effect of keeping the number of "special" parts of the language to a minimum (since now "new" is just a method, not a keyword), makes it easier to have different-yet-similar names representing slightly different concepts ("create" vs "new" vs "fetch" vs "allocate", and so on) sitting side by side, and helps eliminate Josh's second disadvantage above. I'm not certain how exactly this could eliminate Josh's first disadvantage (that of inheritance and inaccessible constructors), but it's not entirely unimaginable that the language would have a certain amount of incestuous knowledge here, to be able to reach those static methods (constructors) in the same way it does currently.

(It actually works better if they aren't static methods at all, but instance methods on class objects, to which the language automatically defers when it sees a "new"; that is, when it sees

Person ann = new Person("Ann", "Sheriff");

the language automatically changes this to read:

Person ann = Person.new("Ann", "Sheriff");

which would be eminently doable in Java, were class objects available for modification/definition somehow. In a language built on top of the JVM or CLR, the class object would be a standalone singleton, a la "object" definitions in Scala.)
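You can approximate the different-names-for-different-construction-policies idea even in C++ today with named static factory methods (essentially Josh's Item 1). A sketch, with the Color type and its caching policy invented purely for illustration:

```cpp
#include <cassert>
#include <map>
#include <string>

class Color {
    int r, g, b;
    Color(int r, int g, int b) : r(r), g(g), b(b) { }   // ctor hidden from clients
public:
    // "create": always mints a fresh object
    static Color create(int r, int g, int b) { return Color(r, g, b); }

    // "fetch": hands back a cached, shared instance for well-known names
    static const Color& fetch(const std::string& name) {
        static std::map<std::string, Color> cache;
        if (cache.find(name) == cache.end())
            cache.insert(std::make_pair(name, name == "red" ? Color(255, 0, 0)
                                                            : Color(0, 0, 0)));
        return cache.find(name)->second;
    }

    int red() const { return r; }
};
```

Because "create" and "fetch" are just methods with ordinary names, the two policies sit side by side without fighting over a single constructor signature.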

.NET | C++ | Java/J2EE | Languages | Parrot | Ruby

Friday, February 22, 2008 1:49:49 AM (Pacific Standard Time, UTC-08:00)
Comments [2]  | 
 Thursday, February 21, 2008
Static considered harmful?

Gilad makes the case that static, that staple of C++, C#/VB.NET, and Java, does not belong:

Most imperative languages have some notion of static variable. This is unfortunate, since static variables have many disadvantages. I have argued against static state for quite a few years (at least since the dawn of the millennium), and in Newspeak, I’m finally able to eradicate it entirely.

I think Gilad conflates a few things, but he's also got some good points. To the dissecting table!

To begin:

Static variables are bad for security. See the E literature for extensive discussion on this topic. The key idea is that static state represents an ambient capability to do things to your system, that may be taken advantage of by evildoers.

Eh.... I'm not sure I buy into this. For evildoers to be able to change static state, they have to have some kind of "poke" access inside the innards of your application, and if they have that, then just about anything is vulnerable. Now, granted, I haven't spent a great deal of time on the E literature, so maybe I'm missing the point here, but if an attacker has data-manipulability into my program, then I'm in a whole world of pain, whether he's attacking statics or instances. Having said that, statics have to be stored in a particular well-known location inside the process, so maybe that makes them a touch more vulnerable. Still, this seems a specious argument.

Static variables are bad for distribution. Static state needs to either be replicated and sync’ed across all nodes of a distributed system, or kept on a central node accessible by all others, or some compromise between the former and the latter. This is all difficult/expensive/unreliable.

Now this one I buy into, but the issue isn't the "static"ness of the data, but the fact that it's effectively a Singleton, and Singletons in any distributed system are Evil. I talked a great deal about this in Effective Enterprise Java, so I'll leave that alone, but let me point out that any Singleton is evil, whether it's represented in a static, a Singleton object, a Newspeak module, or a database. The "static"ness here is a red herring.

Static variables are bad for re-entrancy. Code that accesses such state is not re-entrant. It is all too easy to produce such code. Case in point: javac. Originally conceived as a batch compiler, javac had to undergo extensive reconstructive surgery to make it suitable for use in IDEs. A major problem was that one could not create multiple instances of the compiler to be used by different parts of an IDE, because javac had significant static state. In contrast, the code in a Newspeak module definition is always re-entrant, which makes it easy to deploy multiple versions of a module definition side-by-side, for example.

Absolutely, but this is true for instance fields, too--any state that is modified as part of two or more method bodies is vulnerable to a re-entrancy concern, since now the field is visibly modified state to that particular instance. How deeply do you want your code to be re-entrant? Gilad's citation of the javac compiler points out that the compiler was hardly re-entrant at any reasonable level, but the fact is that the compiler *could* have been used in a parallelized fashion using the isolation properties of ClassLoaders. (It's ugly, and Java desperately needs Isolates for that reason.)
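A tiny illustration of the distinction, in C++ (both Counter types are invented for the purpose): the static version can never support two independent "compilations" at once, javac-style, while the instance version can:

```cpp
#include <cassert>

// Non-re-entrant: every caller shares one piece of static state
struct StaticCounter {
    static int errors;
    void report() { ++errors; }
};
int StaticCounter::errors = 0;

// Re-entrant: each "compiler instance" carries its own state
struct InstanceCounter {
    int errors;
    InstanceCounter() : errors(0) { }
    void report() { ++errors; }
};
```

Two StaticCounters stomp on each other's error counts; two InstanceCounters stay blissfully independent, which is exactly the surgery javac needed.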

Static variables are bad for memory management. This state has to be handled specially by implementations, complicating garbage collection. The woeful tale of class unloading in Java revolves around this problem. Early JVMs lost application’s static state when trying to unload classes. Even though the rules for class unloading were already implicit in the specification, I had to add a section to the JLS to state them explicitly, so overzealous implementors wouldn’t throw away static application state that was not entirely obvious.

This one I can't really comment on, since I'm not in the habit of writing memory-management code. I'll take Gilad's word for it, though I'm curious to know why this is so, in more detail.

Static variables are bad for startup time. They encourage excess initialization up front. Not to mention the complexities that static initialization engenders: it can deadlock, applications can see uninitialized state, and unless you have a really smart runtime, you find it hard to compile efficiently (because you need to test if things are initialized on every use).

I'm not sure I see how this is different for any startup/initialization code--anything that the user can specify as part of startup will run the risk of deadlocks and viewing uninitialized state. Consider the alternative, however--if the user didn't have the ability to specify startup code, then they would have to either write their own, post-runtime, startup code, or else they have to constantly check the state of their uninitialized objects and initialize them on first use, the very thing that he claims is hard to compile efficiently.

Static variables are bad for concurrency. Of course, any shared state is bad for concurrency, but static state is one more subtle time bomb that can catch you by surprise.

Absolutely: any shared state is bad for concurrency. However, I think we need to go back to first principles here. Since any shared state is bad for concurrency, and since static data is always shared by definition, it follows that static data is bad for concurrency. Pay particular attention to that chain of reasoning, however: any shared state is bad for concurrency, whether it's held by the process in a special non-instance-aligned location or in a data store that happens to be reachable from multiple paths of control. This means that your average database table is also bad for concurrency, were it not for the transactional protections that surround the table. This isn't an indictment of static variables, per se, but of shared state.
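To make the chain of reasoning concrete (the Stats type is invented, and I'm using the modern std::thread and std::mutex purely for brevity): a static is shared by definition, so concurrent updates need exactly the kind of "transactional protection" the database table gets--here, a humble mutex. Drop the lock_guard below and increments start getting lost:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Stats {
    static long hits;            // shared by definition: every thread sees this one
    static std::mutex lock;      // the "transactional protection", by analogy
    static void record() {
        std::lock_guard<std::mutex> guard(lock);
        ++hits;                  // without the guard, concurrent ++ loses updates
    }
};
long Stats::hits = 0;
std::mutex Stats::lock;
```

The point isn't that statics are uniquely broken; an instance reachable from multiple threads needs the very same lock.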

Gilad goes on to describe how Newspeak solves this problem of static:

It may seem like you need static state, somewhere to start things off, but you don’t. You start off by creating an object, and you keep your state in that object and in objects it references. In Newspeak, those objects are modules.

Newspeak isn’t the only language to eliminate static state. E has also done so, out of concern for security. And so has Scala, though its close cohabitation with Java means Scala’s purity is easily violated. The bottom line, though, should be clear. Static state will disappear from modern programming languages, and should be eliminated from modern programming practice.

I wish Newspeak were available for widespread use, because I'd love to explore this concept further; in the CLR, for example, there is the same idea of "modules", in that modules are singleton entities in which methods and data can reside, at a higher level than individual objects themselves. Assemblies, for example, form modules, and this is where "global variables" and "global methods" exist (when supported by the compiling language in question). At the end of the day, though, these are just statics by another name, and face most, if not all, of the same problems Gilad lays out above. Scala "objects" have the same basic property.

I think the larger issue here is that one should be careful where one stores state, period. Every piece of data has a corresponding scope of accessibility, and developers have grown complacent about considering that scope when putting data there: they consider the accessibility at the language level (public, private, what-have-you), and fail to consider the scope beyond that (concurrency, re-entrancy, and so on).

At the end of the day, it's simple: static entities and instance entities are just entities. Nothing more, nothing less. Caveat emptor.

.NET | C++ | Java/J2EE | Languages | Parrot | Ruby | XML Services

Thursday, February 21, 2008 8:07:37 PM (Pacific Standard Time, UTC-08:00)
Comments [7]  | 
 Tuesday, February 19, 2008
The Fallacies Remain....

Just recently, I got this bit in an email from the Redmond Developer News ezine:


In the course of just over a week starting on Jan. 30, a total of five undersea data cables linking Europe, Africa and the Middle East were damaged or disrupted. The first two cables to be lost link Europe with Egypt and terminate near the Port of Alexandria.

Early speculation placed the blame on ship anchors that might have dragged across the sea floor during heavy weather. But the subsequent loss of cables in the Persian Gulf and the Mediterranean has produced a chilling numbers game. Someone, it seems, may be trying to sabotage the global network.

It's a conclusion that came up at a recent International Telecommunication Union (ITU) press conference. According to an Associated Press report, ITU head of development Sami al-Murshed isn't ready to "rule out that a deliberate act of sabotage caused the damage to the undersea cables over two weeks ago."

You think?

In just seven or eight days, five undersea cables were disrupted.

Five. All of them serving or connecting to the Middle East. And thus far, only one cable cut -- linking Oman and the United Arab Emirates -- has been identified as accidental, caused by a dragging ship anchor.

So what does it mean for developers? A lot, actually. Because it means that the coming wave of service-enabled applications needs to take into account the fact that the cloud is, literally, under attack.

This isn't new. For as long as the Internet has been around, concerns about attacks on the network have centered on threats posed by things like distributed denial of service (DDOS) and other network-borne attacks. Twice -- once in 2002 and again in 2007 -- DDOS attacks have targeted the 13 DNS root servers, threatening to disrupt the Internet.

But assaults on the remote physical infrastructure of the global network are especially concerning. These cables lie hundreds or even thousands of feet beneath the surface. This wasn't a script-kiddie kicking off an ill-advised DOS attack on a server. This was almost certainly a sophisticated, well-planned, well-financed and well-thought-out effort to cut off an entire section of the world from the global Internet.

Clearly, efforts need to be made to ensure that the intercontinental cable infrastructure of the Internet is hardened. Redundant, geographically dispersed links, with plenty of excess bandwidth, are a good start.

But development planners need to do their part, as well. Web-based applications shouldn't be crafted with the expectation of limitless bandwidth. Services and apps must be crafted so that they can fail gracefully, shift to lower-bandwidth media (such as satellite) and provide priority to business-critical operations. In short, your critical cloud-reliant apps must continue to work, when almost nothing else will.

And all this, I might add, as the industry prepares to welcome the second generation of rich Internet application tools and frameworks.

Silverlight 2.0 will debut at MIX08 next month. Adobe is upping the ante with its latest offerings. Developers will enjoy a major step up in their ability to craft enriched, Web-entangled applications and environments.

But as you make your plans and write your code, remember this one thing: The people, organization or government that most likely sliced those four or five cables in the Mediterranean and Persian Gulf -- they can do it again.

There's a couple of things to consider here, aside from the geopolitical ramifications of a concerted attack on the global IT infrastructure (which does more to damage corporations and the economy than it does to disrupt military communications, which to my understanding are mostly satellite-based).

First, this attack on the global infrastructure raises a huge issue with respect to outsourcing--if you lose touch with your development staff for a day, a week, a month (just how long does it take to lay down new trunk cable, anyway?), what sort of chaos is this going to strike with your project schedule? In The World is Flat, Friedman mentions that a couple of fast-food restaurants have outsourced the drive-thru--you drive up to the speaker, and as you place your order, you're talking to somebody half a world away who's punching it into a computer that's flashing the data back to the fast-food joint in question for harvesting (it's not like they make the food when you order it, just harvest it from the fields of pre-cooked burgers ripening under infrared lamps in the back) and disbursement as you pull forward the remaining fifty feet to the first window.

The ludicrousness of this arrangement notwithstanding, this means that the local fast-food joint is now dependent on the global IT infrastructure in the same way that your ERP system is. Aside from the obvious "geek attraction" to a setup like this, I find it fascinating that at no point did somebody stand up and yell out, "What happened to minimizing the risks?" Effective project development relies heavily on the ability to touch base with the customer every so often to ensure things are progressing in the way the customer was anticipating. When the development team is one ocean and two continents away in one direction, or one ocean and a whole pile of islands away in the other direction, or even just a few states over, that vital communication link is now at the mercy of every single IT node in between them and you.

We can make huge strides, but at the end of the day, the huge distances involved can only be "fractionalized", never eliminated.

Second, as Desmond points out, this has a huge impact on the design of applications that assume 100% (or even 99.9%) Internet uptime. Yes, I'm looking at you, GMail and Google Calendar and the other so-called "next-generation Internet applications" based on technologies like AJAX. (I categorically refuse to call them "Web 2.0" applications--there is no such thing as "Web 2.0".) The more we look to the future for an "always-on" networking infrastructure, the more we delude ourselves about the practical realities of life: there is no such thing as "always-on" infrastructure. Networking or otherwise.

I know this personally, since last year here in Redmond, some stronger-than-normal winter storms knocked down a whole slew of power lines and left my house without electricity for a week. To very quickly discover how much of modern Western life depends on "always-on" assumptions, go without power to the house for a week. We were fortunate--parts of Redmond and nearby neighborhoods got power back within 24 hours, so if I needed to recharge the laptop or get online to keep doing business, much less get a hot meal or just find a place where it was warm, it meant a quick trip down to the local strip mall where a restaurant with WiFi (Canyon's, for those of you that visit Redmond) kept me going. For others in Redmond, the power outage meant a brief vacation down at the Redmond Town Center Marriott, where power was available pretty much within an hour or two of its disruption.

The First Fallacy of Enterprise Systems states that "The network is reliable". The network is only as reliable as the infrastructure around it, and not just the infrastructure that your company lays down from your workstation to the proxy or gateway or cable modem. Take a "traceroute" reading from your desktop machine to the server on which your application is running--if it's not physically in the same building as you, then you're probably looking at 20 - 30 "hops" before it reaches the server. Every single one of those "hops" is a potential point of failure. Granted, the architecture of TCP/IP suggests that we should be able to route around any localized points of failure, but how many of those points are, in fact, to your world view, completely unroutable? If your gateway machine goes down, how does TCP/IP try to route around that? If your ISP gets hammered by a Denial-of-Service attack, how do clients reach the server?

If we cannot guarantee 100% uptime for electricity, something we've had close to a century to perfect, then how can we assume similar kinds of guarantees for network availability? And before any of you point out that "Hey, most of the time, it just works so why worry about it?", I humbly suggest you walk into your Network Operations Center and ask the helpful IT people to point out the Uninterruptible Power Supplies that fuel the servers there "just in case".

When they in turn ask you to point out the "just in case" infrastructure around the application, what will you say?

Remember, the Fallacies only bite you when you ignore them:

1) The network is reliable

2) Latency is zero

3) Bandwidth is infinite

4) The network is secure

5) Topology doesn't change

6) There is one administrator

7) Transport cost is zero

8) The network is homogeneous

9) The system is monolithic

10) The system is finished

Every project needs, at some point, to have somebody stand up in the room and shout out, "But how do we minimize the risks?" If this is truly a "mission-critical" application, then somebody needs the responsibility of cooking up "What if?" scenarios and answers, even if the answer is to say, "There's not much we can reasonably do in that situation, so we'll just accept that the company shuts its doors in that case".

.NET | C++ | Development Processes | Java/J2EE | Ruby | Security | XML Services

Tuesday, February 19, 2008 9:25:03 PM (Pacific Standard Time, UTC-08:00)
Comments [1]  | 
 Monday, February 18, 2008
Who herds the cats?

Recently I've been looking more closely at the various (count them, four of them) proposals for adding new features into the Java language, the "BGGA", "FCM", "CICE" and "JCA" proposals. All of them are interesting and have their merits. A few other proposals for Java 7 have emerged as well, such as extension methods, enhancements to switch, the so-called "multi-catch" enhancement to exceptions, properties, better null support, and some syntax to support lists and maps natively. All of them intriguing ideas, and highly subject to reasonable debate among reasonable people. My concern lies in a different direction.

Who herds this bunch of cats?

This isn't just a question of process within the JCP. And it's not just a question of closures or the other features we're looking at for Java 7. This is a question about the moral leadership of Java.

In the C# space, we have Anders. He clearly "owns" the language, and acts as the benevolent dictator. Nothing goes into "his" language without his explicit and expressed OK. Other languages have similar personages in similar roles. Python has Guido. Perl has Larry. C++ has Bjarne. Ruby has Matz. Certainly other individuals "float" around these languages and lend their impressive weight towards the language's design--Scott Meyers and Herb Sutter in C++, for example, or Dave Thomas and Martin Fowler in Ruby--but the core language design principles rest firmly inside the head of one man.

Whereso for Java? James Gosling? Please--Jimmy abandoned the language shortly after its release, and now only comes out every so often to launch T-shirts into the crowd, answer reporters' questions whenever something Java-related comes up, and blog his two cents' worth. He's a reminder of the "good old days", for sure, but he's not coming out with new directions of his own accord and taking the reins to lead us there. He's the Teddy Kennedy of the Java Party. His endorsement weighs in as about as influential as Bob Dole's--interesting to an analytical few, but hardly meaningful in the grand scheme of things.

Unfortunately, the two most recognized "benevolent dictators" of the Java language, Neal Gafter and Joshua Bloch, are on opposing sides of the aisle on this. Each has put forth a competing proposal for how the Java language should evolve. Each has his good reasons for how he wants to implement closures in Java. Each has his impressive list of names supporting him. It's Clinton and Obama, Java Edition. The fact is, though, that when these two disagreed on how to move forward, lots of Java developers found themselves in the uncomfortable position faced by the children when the parents fight: do you take sides? Do you try to make peace between them? Or do you just go hide your head under a pillow until the yelling stops?

This is the real danger facing Java right now: there is no one with enough moral capital and credibility in the Java space to make this call. We can take polls and votes and strawman proposals until the cows come home, but language design by committee has generally not worked well in the past. If someone without that authority ends up making the decision, it will alienate half the Java community regardless of which way the decision goes. The split is too even to expect one to come out as the obvious front-runner. And expecting a JSR committee process to somehow resolve the differences between these four proposals into a single direction forward is asking a lot.

So who makes the call?

Java/J2EE | Languages

Monday, February 18, 2008 9:47:38 PM (Pacific Standard Time, UTC-08:00)
Comments [14]  | 
Why we need both static and dynamic in the same language

Stu demonstrates one of the basic problems with an all-dynamic language: "I just spent an hour figuring out why some carefully-tested code went no-op after adding RSpec to a project." As much as I berate Stu at times (both in person and in blog), the fact is, I deeply respect and admire his programming skill, and if he can lose an hour to something that (I submit for your consideration) could have been caught by a static analysis tool fairly easily, then clearly that was a wasted hour of Stu's life. Worse, the problem is not yet solved, since now he has to make a hard choice about which definition to use, or else find a way to hack around the two definitions and create a third. Or perhaps something even uglier than this....

And this presumes that all developers using Ruby will have Stu's skill and his sense of responsibility when coming up with the solution. Asking that of all programmers across the globe is simply too much.

But clearly we cannot simply abandon the power of the dynamic language, either. Quoting again from the same source, Stu points out the very reason why dynamic languages are so powerful: "Once you start treating code as data, the elegance of your code is dependent on your skill. You cannot hide behind the limitations of your programming language anymore, because there aren't any."

What's a language designer left to do?

Choose both, of course.

The more I think about it, the more I think Cobra (and other languages) has it right: a programming language should have both static and dynamic features within it, simultaneously. This is the first "modern" language I've seen come along that espouses the "static when you can, dynamic when you want" principle as a first-class concept. Even at that, I imagine that there's much more that could be done than what Cobra espouses. Imagine combining the power of Scala's type inferencing system with the flexibility of a Groovy or Ruby.


Monday, February 18, 2008 4:22:11 AM (Pacific Standard Time, UTC-08:00)
Comments [2]  | 
Modular Toolchains

During the Lang.NET Symposium, a couple of things "clicked" all simultaneously, giving me one of those "Oh, I get it now" moments that just doesn't want to leave you alone.

During the Intentional Software presentation, as the demo wound onwards I (and the rest of the small group gathered there) found myself looking at the same source code, but presented in a variety of new ways, some of which appealed to me as the programmer, others of which appealed to the mathematicians in the room, others of which appealed to the non-programmers in the room. (I heard one of the Microsoft hosts, a non-technical program manager, I think, say, "Wow, even I could understand that spreadsheet view, and that was writing code?")

During the spreadsheet-written-in-IronPython presentation (ResolverOne), we were essentially looking at new ways of writing IronPython code, thus leveraging all the syntactic power of a programming language with a nicer front end.

During the aspect-oriented talk (the one by Stefan Wenig and Fabian Schmeid), we found ourselves looking at a tool that essentially takes compiled assemblies and weaves in additional code based on descriptors from outside that codebase; in essence, just another aspect-oriented tool.

But combine this with my own investigations into Soot, LLVM, Parrot, and Phoenix, alongside the usual discussions around the DLR, CLR, JVM and DaVinci machine, couple that with the presentation Harry gave about parser expression grammars and the research in the functional community into parser combinators, throw in the aspect-oriented and metaprogramming facilities that the Rubyists and other dynamic linguists go on for days about, and what do you end up with?

Folks, the future is in modular toolchains.

This is an oversimplification, and a radical oversimplification at that, but imagine for a moment:

  1. A parser takes your source code (let's assume it is Java, just for grins) and builds an AST out of it. Not an AST that's inherently deeply coupled to the Java language, mind you, but a general-purpose one that stands as a union of Java, C#, C++, Perl, Python, Smalltalk, and other languages. (Note that some of the linguistic concepts in some of those languages may not end up in this AST, but instead operate on the AST itself, a la C++'s template facilities.) Said parser is now finished, and can either output a binary (or potentially XML, though it'd probably be hideously verbose) version of this AST to disk for later consumption, or, more than likely, pass it directly along to the next beast in the chain.
  2. In the simplest scenario, the next beast would be a code generator, which takes the AST and seeks to export some kind of back-end code out of it. Here, since we're working with a general-purpose AST, we can assume that this back-end is flexible and open, a la the Phoenix toolkit (where either native or MSIL can be generated).
  3. In a slightly more complicated scenario, verification of the correctness of the AST (against whatever libraries are specified) is checked, usually prior to code-gen, thus making this particular toolchain a statically-checked chain; were verification left out, it would need to happen at runtime, in which case we'd be talking about a dynamically-checked chain.
    Note that I stay away from the term "statically-typed" or "dynamically-typed" for the moment. That would be a measurement of the parser, not the verifier. Verification still occurs in a lot of these dynamically-typed languages, just as it does in statically-typed languages.
    Assuming the verification process succeeds, the AST can be again, written out or passed to the next step in the chain.
  4. Another potential step in the process, usually post-parser and pre-verification, would be an "aspect" step, in which a tool takes the AST, consults some external descriptors, and modifies the AST based on what it finds there. (This is how most of your non-AspectJ-like AOP tools work today, except that they have to rebuild the AST from compiled .class files or assemblies first.)
  5. Naturally, another step in the process would be an optimize step, but this has to be considered carefully, since some "high-level" optimizations can be done without regard to code-gen backend, and some will need to be done with regard to code-gen backend; for example, register spill is (from what I've heard, can't say I know too much about this) generally only useful if you know how many registers you're targeting. Plus, it's not hard to imagine certain optimizations that are only generally useful on the x86 architecture, versus those that are useful on other CPU platforms. Even operating systems I would imagine would have an impact here. (It turns out that many compiler toolchains go through a dozen or so optimization steps today, so it's not hard to imagine a "code-gen backend" being a series of a half-dozen or so targeted optimization steps before actually generating code.)
  6. Bear in mind, too, that these ASTs should have enough information to be directly executable, thus giving us an interpreter back-end instead of a code-generation back-end, a la the DLR instead of the CLR.
  7. Also, given the standard AST format, it would be relatively trivial to create a whole series of different "parser"s to get to the AST, along the lines of what the Intentional Software guys have created, thus blowing open the whole concept of "DSL" into areas that heretofore have only been imagined. You still get the complete support of the rest of the toolchain, which is what makes the whole DSL concept viable in the first place, including aspects and verification and your choice of either interpretation or compilation.
  8. While we're at it, bear in mind that this AST could/should also be reachable from within the code itself, thus giving languages that want to operate on their own AST at runtime the ability to do so, because the AST is in a standard format and the interpreter could be bundled as part of the generated executable, thus providing a compile-when-you-can-interpret-when-you-must flavor that is currently the reigning meme in language/platform environments like JRuby. (It would also have the happy side effect of making Paul Graham shut up about Lisp, at least for a while. Yes, Paul, code-as-data, it's brilliant, it's wonderful, we get it.)
  9. Nothing says this toolchain needs to be one-way, by the way: many of the toolkits I mentioned before (LLVM, Phoenix, Soot) can start from a compiled binary and work back to an AST, thus offering us the opportunity to do surgery of either the exploratory kind (static analysis) or the manipulative kind (aspect-weaving, etc.) on compiled code in a relatively clean way. Reflector demonstrates the power of being able to go "back and forth" in this way (even in the relatively limited way Reflector does so), so imagine how powerful it would be to do this from end to end throughout the toolchain.
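The "directly executable AST" idea in point 6 is easy to sketch. Here's a toy interpreter (the tuple-based node shapes are my own invented convention, not any real toolchain's format) that walks an expression tree directly instead of handing it to a code-gen backend:

```python
# Toy AST nodes as nested tuples: ("num", n), ("add", l, r), ("mul", l, r).
# The same tree a code-gen backend would consume can simply be walked,
# which is all an interpreter back-end really is.
def interpret(node):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "add":
        return interpret(node[1]) + interpret(node[2])
    if kind == "mul":
        return interpret(node[1]) * interpret(node[2])
    raise ValueError(f"unknown node kind: {kind}")

# 4 + (2 * 3)
tree = ("add", ("num", 4), ("mul", ("num", 2), ("num", 3)))
print(interpret(tree))  # -> 10
```

The point being: once the AST carries enough information, "compile it" and "run it" are just two different consumers of the same data structure.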
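Point 8's "AST reachable from within the code itself" isn't purely hypothetical, either; Python already does a small-scale version of it. Its AST is a standard, documented format that a running program can parse, inspect or rewrite, and then either compile or leave as data:

```python
import ast

# Parse source into the standard AST, reachable from within the program itself.
tree = ast.parse("x * 2 + 1", mode="eval")

# The tree is plain data we can inspect (or rewrite) before executing it...
print(ast.dump(tree.body))

# ...and the very same tree compiles and runs: code-as-data, in miniature.
code = compile(tree, filename="<ast>", mode="eval")
print(eval(code, {"x": 20}))  # -> 41
```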
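And for point 9's "start from compiled binary and work back", Python's standard dis module gives a small taste of the same trick Reflector pulls on .NET assemblies--walking backward from an already-compiled artifact to an inspectable structure (the exact opcode names vary by Python version, so treat the output as illustrative):

```python
import dis

# Start from an already-compiled code object and walk backward to an
# inspectable instruction stream, rather than forward from source.
code = compile("a + b * 2", "<src>", "eval")
ops = [ins.opname for ins in dis.get_instructions(code)]
print(ops)  # e.g. ['LOAD_NAME', 'LOAD_NAME', 'LOAD_CONST', ...]
```

Obviously this only recovers a linear instruction stream, not a full AST, but tools like Soot and Reflector show the rest of the round trip is doable.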

How likely is this utopian vision? I'm not sure, honestly--certainly tools like LLVM and Phoenix seem to imply that there are ways to represent code across languages in a fairly generic form, but clearly there's much more work to be done, starting with this notion of the "uber-AST" that I've been so casually tossing around without definition. Every AST is more or less tied to the language it is supposed to represent, and there's clearly no way to imagine an AST that could represent every language ever invented. Just imagine trying to create an AST that could incorporate Java, COBOL and Brainf*ck, for example. But if we can get to a relatively stable 80/20, where we manage to represent the most-commonly-used 80% of languages within this AST (such as an AST that can incorporate Java, C#, and C++, for starters), then maybe there's enough of a critical mass there to move forward.

Now all I need to do is find somebody who'll fund this little bit of research... anybody got a pile of cash they don't know what to do with? :-)

Update: By the way, in case you want a graphical depiction of what I'm thinking about, the Phoenix page has one (though obviously it's limited to the Phoenix scope of vision, and you may have to be a Microsoft CONNECT member to see it).

.NET | C++ | Flash | Java/J2EE | Languages | Mac OS | Parrot | Ruby

Monday, February 18, 2008 1:55:53 AM (Pacific Standard Time, UTC-08:00)
Comments [1]  | 
 Sunday, February 10, 2008
An Appeal:

Long-time readers of this blog know that as a general rule, I try not to include much in the way of personal stuff here; I try (sometimes with more success than others) to keep the subject material focused on the technology space: Java, .NET, Ruby, languages, XML services, and so on.

This, however, is a deviation from that norm.

A near and dear friend of mine has asked that I help spread the word about the disappearance of a family member (a cousin, in fact). I don't know the details of the disappearance other than what anybody else can read on the website, but I do know that if someone in my family were to go missing for an inexplicable reason, I would want the help of anybody and everybody I knew to try and find them.

I would like to ask everyone's help in finding my brother John. He went missing January 28, 2008 and official search efforts were called off last Friday, even though the family has mounted their own search. Please go to the website to see a picture of John and print off a flyer. If you could put it in your car window or some other visible place, it would help us a lot. It is possible that he could have traveled out of the area where he went missing, so we are trying to get the word out on a national level, to cover all possible scenarios. Thank you all for your help.

Donna Jean Glasgow

February 6, 2008

If you reside near the Little Rock, Arkansas area in particular, please take a look at the photo below ...


... and let somebody (either me or through the above-mentioned website) know if you've seen him, one way or another.

I won't ask you to forward this to everyone you know; instead I just ask that if you feel a twinge of sympathy for a missing family member that's connected to you through less than two degrees of separation, then do what you think would help.

Thanks for your time.

Sunday, February 10, 2008 6:09:10 AM (Pacific Standard Time, UTC-08:00)
Comments [0]  | 
 Sunday, February 03, 2008
Maybe 'twould be better to suggest "done like the Giants"

Wow. Giants 17, Patriots 14, when just about everybody had the Patriots by two touchdowns or so.

Just goes to show, shouldn't count the little guy out 'til the fat lady sings and the cows come home.

Also just goes to show, I shouldn't be blogging after an emotional heart-jerker like that one.

Sunday, February 03, 2008 9:38:19 PM (Pacific Standard Time, UTC-08:00)
Comments [0]  | 
 Saturday, February 02, 2008
My Secret (?) Shame (Or, Building Parrot 0.5.2)

OK, after a week of getting the Internet equivalent of Bad Mojo being sent my way by every Perl developer on the planet, I have to admit something that may strike readers as inconsistent and incongruous.

I want Parrot to work.

I don't really care about Perl 6, per se. As I've said before, the language has a lot of linguistic inconsistencies and too many violations of the Principle of Least Surprise to curry much favor with me. Whether Perl-the-language lives or dies really doesn't make a significant dent in my life.

But Parrot.... now there's something I care about.

Following the open debate on Perl (a surprising side-effect, given the subject matter of the post that spawned it), and chromatic's insistence that Parrot development was moving along, I decided to give in to my secret hopes, and pull the Parrot bits down again for a look-see.

In the spirit of the OpenJDK post last month, this is a quick chronicle of how I got Parrot to build on a Win32 system.

Installation details

Just for the record, I'm doing this in a VMWare image (one in which I keep all the languages I play with) with both Visual Studio 2008 and Visual Studio 2005 installed. The Parrot docs explicitly reference using Visual Studio 2003 (or the free Visual C++ Toolkit, which has since turned into Visual C++ 2005 Express), but I'm going to first have a shot at it with VS 2008 before falling back to VS 2005. This shouldn't make any difference, because 2008 is supposed to be a superset of 2005, but... well, you know how that old chestnut goes.

svn co parrot

Checking Parrot's code out is easy: just svn co parrot-svn . (I use the -svn suffix on directories to distinguish between svn-pulled source trees and downloaded source trees. Helps in case I ever need/want to pull down a named release and keep the svn-pulled source at the same time.) I pull all this into a directory underneath C:\Prg, so the total path to Parrot's source base is C:\Prg\parrot-svn.


From there, as with many Unix-based projects, you have to run the Configure script. I opened up a VS 2008 Command Prompt, and used ActiveState's Perl [1] to run it. It chugs away and comes back with this message:

Parrot Version 0.5.2 Configure 2.0
Copyright (C) 2001-2008, The Perl Foundation.

Hello, I'm Configure. My job is to poke and prod your system to figure out
how to build Parrot. The process is completely automated, unless you passed in
the `--ask' flag on the command line, in which case I'll prompt you for a few
pieces of info.

Since you're running this program, you obviously have Perl 5--I'll be pulling
some defaults from its configuration.

Checking MANIFEST.....................................................done.
Setting up Configure's default values.................................done.
Setting up installation paths.........................................done.
Tweaking settings for miniparrot...................................skipped.
Loading platform and local hints files................................done.
Finding header files distributed with Parrot..........................done.
Determining what C compiler and linker to use.........................done.
Determining whether make is installed..................................yes.
Determining whether lex is installed...............................skipped.
Determining whether yacc is installed..............................skipped.
Determining if your C compiler is actually
Determining whether libc has the backtrace* functions (glibc only)
Determining Fink location on Darwin................................skipped.
Determining if your C compiler is actually Visual C++..................yes.
Detecting compiler attributes (-DHASATTRIBUTE_xxx)....................done.
Detecting supported compiler warnings (-Wxxx)......................skipped.
Determining flags for building shared libraries.......................done.
Determine if parrot should be linked against a shared library..........yes.
Determining what charset files should be compiled in..................done.
Determining what encoding files should be compiled in.................done.
Determining what types Parrot should use..............................done.
Determining what opcode files should be compiled in...................done.
Determining what pmc files should be compiled in......................done.
Determining your minimum pointer alignment......................... 1 byte.
Probing for C headers.................................................done.
Determining some sizes................................................done.
Computing native byteorder for Parrot's wordsize.............little-endian.
Test the type of va_ptr (this test is likely to segfault)............stack.
Figuring out how to pack() Parrot's types.............................done.
Figuring out what formats should be used for sprintf..................done.
Determining if your C library has a working
Determining CPU architecture and OS...................................done.
Determining architecture, OS and JIT capability.......................done.
Generating CPU specific stuff.........................................done.
Verifying that the compiler supports function pointer casts............yes.
Determining whether your compiler supports computed
Determining if your compiler supports inline...........................yes.
Determining what allocator to use.....................................done.
Determining if your C library supports
Determining some signal stuff.........................................done.
Determining whether there is
Determining if your C library has setenv / unsetenv...............unsetenv.
Determining if your platform supports
Determining if your platform supports
Determining if your platform supports
Determining if your platform supports
Testing snprintf......................................................done.
Determining whether perldoc is installed...............................yes.
Determining whether python is installed.........................yes, 2.5.1.
Determining whether GNU m4 is installed................................yes.
Determining whether (exuberant) ctags is
Determining Parrot's revision.......................................r25452.
Determining whether ICU is installed................................failed.
Generating C headers..................................................done.
Generating core pmc list..............................................done.
Generating runtime/parrot/include.....................................done.
Configuring languages.................................................done.
Generating makefiles and other build files............................done.
Moving platform files into place......................................done.
Recording configuration data for later retrieval......................done.
Okay, we're done!

You can now use `nmake' to build your Parrot.
After that, you can use `nmake test' to run the test suite.

Happy Hacking,
        The Parrot Team


Looks good so far. I kick off nmake (which is still running as I write this). Note that the Configure script discovers ActiveState's Perl as part of its rummaging around on my system, so that's what it uses to do the build steps that require execution of Perl. I have no idea what the least-acceptable version of AS Perl is, but the version I pulled down is probably about a year old.

(Note: I have to admit, the Configure stuff is slick. I don't like opening those files and looking at what's in there, but you'll never hear me criticize the existence of Perl, for this reason alone: having a scripting language that can rummage around your machine and figure out the paths to all the cr*p it needs to build is a hideously useful thing. I do admit to wishing those scripts were written in something I feel better about reading, though, like Ruby, but this is a practice that far pre-dates me, so I'll just shut up and ride along because I find it useful when it works. As it does here.)

Note to the Parrot guys: under VS 2008, the build generates a ton of warnings. In particular, VS 2008 complains about the use of the Wp64 flag, which it says is deprecated and will be removed in a future release. (Chromatic, if you want a full build log, I can clean-and-build again and send you the piped output, if it'll help.)

After about 10 minutes of disk churn and a ton of warnings reported (most of which seem to be just three or four warnings being repeated throughout the code, so either it's something in a couple of header files that're included from everywhere, or these are spurious warnings that could be turned off via a #pragma)... success! I have a parrot.exe, along with a few other .exe utilities, in the root of the parrot-svn directory.

Next step: "nmake test".

Well, clearly parrot must be working pretty well, because it's churning through a ton of tests with "ok" results for everything except that which is platform-specific (a la the Fink tests intended for Darwin/Mac OS X, which are obviously going to fail on my XP box and therefore get skipped). A couple of tests get skipped (in the compilers tree?) with explanations that I don't quite understand, but it doesn't look like these are errors, per se, so I'm willing to accept on faith that we're all kosher. So while the tests are still running, I'll post this and offer up kudos to chromatic and the crew for something that at least builds, runs, and passes a whole slew of unit tests. Now for the fun part--finding out how extensive PMC, PIR and PASM are, and thinking about how this VM fits in the Grand Scheme of Things against the Da Vinci Machine and the DLR and the JVM and the CLR.... :-)

(Note to self: must suggest to John Lam and the guys on the DLR team to invite chromatic up to the Lang.NET 2009 Symposium. If the Sun folks can be made to feel welcome on the Microsoft campus for this kind of event, then surely the Parrot guys can come and feel welcome and--hopefully--carry away some interesting ideas, too.)

Update: Well, I might have spoken too soon--it looks like the tests failed after all. To be exact, the tests hung for a while, and I Ctrl-C'ed the process because it didn't look like it was going anywhere; these are the last few lines:

        1/22 skipped: test not written
t/library/pcre...............................NOK 1#     Failed test (t/library/p
cre.t at line 48)
# Exited with error code: 1
# Received:
# ok 1
# ok 2
# Null PMC access in invoke()
# current instr.: 'parrot;PCRE;compile' pc 118 (C:\Prg\parrot-svn\runtime\parrot
# called from Sub 'main' pc 83 (C:\Prg\parrot-svn\t\library\pcre_1.pir:49)
# Expected:
# ok 1
# ok 2
# ok 3
# ok 4
# ok 5
# Looks like you failed 1 test of 1.
        Test returned status 1 (wstat 256, 0x100)
        Failed 1/1 tests, 0.00% okay
t/library/pg.................................Terminating on signal SIGINT(2)
NMAKE : fatal error U1077: NMAKE : fatal error U1058: terminated by user


Not sure what this means, but bear in mind, this is off today's tip, so it may be a temporary thing.




[1] Why, you may ask, do I have Active State's Perl installed if I so despise the language? Rotor (SSCLI 2.0) uses it as part of its build process, and I like spelunking with Rotor, as some of you will have noticed.

Languages | Parrot | Windows

Saturday, February 02, 2008 6:43:19 PM (Pacific Standard Time, UTC-08:00)
Comments [2]  | 
Diving into the Grails-vs-Rails wars (Or, Here we go again....)

Normally, I like to stay out of these kinds of wars, but this post by Stu (whom I deeply respect and consider a friend, though he may not reciprocate by the time I'm done here) just really irked me somewhere sensitive. I'm not entirely sure why, but something about it just... rubbed me the wrong way, I guess is the best way to say it.

Let's dissect, shall we?

Stu begins with the following two candidates:

1. Joe has a problem to solve. The problem is specific, the need is immediate, and the scope is well-constrained.
2. Jane has a problem to solve. The problem is poorly understood, the need is ongoing, and the scope is ambiguous.

For starters, Joe doesn't exist. Or rather, he exists only in the theoretical. Of course, neither does Jane really exist, either. Fact is, almost all projects are a combination of Joe and Jane. More importantly, Stu's effort here to force people into the "either/or" approach to categorization is a subtle (or perhaps not-so-subtle) ploy to force people into the decision-making path he thinks should be taken.

It's sort of like saying, most people fall into two categories:

  1. Joe lives in Ghettopia, where all the men are dumb, the women are ugly, and the children are rejects from the ADHD Clinic.
  2. Jane lives in Utopia, where all the men are smart, the women are good-looking, and the children are well-behaved.

Think about it: you're at work, you have a project, and you happen across Stu's page. Faced with the typical project (too little time, too few resources, too vague in the understanding of requirements and domain comprehension), with whom are you likely to identify? Disturbingly happy Joe, who has a specific problem in a well-constrained scope? Hardly. So from the beginning, you're expected to identify with Jane, which (not surprisingly) leads you into Stu's preferred conclusion.

He goes on:

How should Joe and Jane think differently about software platforms?

   1. Joe's platform needs to be mainstream. It needs to offer immediate productivity, and the toolset should closely match the problem. Also, Joe doesn't want to climb a learning curve.
   2. Jane's needs are quite the opposite. Jane needs flexibility. She needs glue that doesn't set. She needs a way to control technical debt (Joe doesn't care.)

For my part, I am interested in Jane's problems. (And anyway, Joe often discovers he is actually Jane midway through projects.)

Hey, Stu, quick reality check for ya: most developers want all of the above. It's not a binary choice, productivity and toolset vs. flexibility and dynamism. The fact is, the Java language has a degree of flexibility, just not as much as is offered by the Ruby language. For that matter, if you want real flexibility, maybe you oughta look into Lisp, or even Smalltalk, since it (ST) can get at the underlying stack frames from the ST language itself! Now that's flexibility you Ruby guys can only dream of. (Oh, I know, Rubinius will give you that flexibility. Someday. Justin even alludes to how Rubinius is essentially an attempt to recapture that dynamism from Smalltalk. Ironic, then, isn't it, that the guys who wrote the fastest Smalltalk VM on the planet (Strongtalk, which is open-source now, by the way) ended up working at Sun... on the thing that later came to be called Hotspot? You think maybe they have a little familiarity and experience with VMs?)

And that crack about "control technical debt (Joe doesn't care)"? Bullshit.

Let me repeat that in case you missed it: BULL-SHIT.

Joe and Jane both care about technical debt. Each may be willing to spend their currency on different problems, granted, but both of them care about technical debt. Not caring about technical debt is what got Chandler into trouble, and it had nothing to do with language or tools whatsoever. It's insulting to suggest that either of them don't care about technical debt, particularly the guy that chooses differently than you.

(Shame on you, Stu. You know better. Quit trolling.)

We continue:

So how does this affect platform choice? If you are Joe, you care about specific details about what a toolset can do right now. Most of Graeme's Top 10 reasons are in the "Right here, right now" category. This is true regardless of whether you think he is right. (Sometimes he is, sometimes not.)

I'll grant you, some of Graeme's Top 10 reasons are a bit spurious, and Stu-and-company do a good job of pointing those out. Frankly, anybody who makes a technical selection based on version numbers or whether or not a book exists for it seems to be missing the point, if you ask me. Of far greater concern is the stability of the language/tool, or the wealth of documentation for it. (And yes, this may seem to fly in the face of my arguments against Parrot a few posts ago; actually, it's not. If Parrot were more stable and/or more fully fleshed out, and the version updates just kept going, I'd be happy to say, "Go get this thing and give it a spin". But it doesn't feel stable to me, so I can't.)

But Stu's argument here is spurious: I don't care if you're Joe or if you're Jane, you always care about specific details about what a toolset can do, right now or otherwise. Certain concerns may be concerns that you can put off until later, but those concerns are always a part of the platform selection. Consider a hypothetical for a second: you currently are developing on Windows, and your project will run on Windows servers, with a possibility that it may need to run on non-Windows servers at some point in the future. Do you consider .NET or not? This is exactly the kind of detail that needs to be discussed--how likely is the move to a non-Windows server going to be? If it's <25%, then the CLR and ASP.NET might be a good choice, particularly if your developers are less "plumbing wonk" than "GUI designer", and you rely on being able to move the assemblies to a non-Windows server later via Mono.

Note: I'm not suggesting this a good choice in all scenarios. I'm making the point that the details of the toolset matter in your choice of toolsets, based on what your particular project needs are.

Jane cares just as much about toolset details as Joe does. I can't imagine a scenario where either of them don't care.

To continue:

My advice to Joe: Know exactly what you need, and then pick the platform that comes closest to solving it out of the box. Depending on Joe's needs, either Rails or Grails might be appropriate (or neither!). A particular point in Grails' favor would be an established team of Spring ninjas.

"Know exactly what you need"? Ah, right, because Joe belongs to that .01% of projects that have "specific problems, immediate need, and well-constrained scope". Nothing like conceding a point to the other guys, in preparation for the "killer blow":

If you are Jane, you care more about architecture. I mean this term in two senses:

   1. Architecture: the decisions you cannot unmake easily.
   2. Architecture: the constraints on how you think and work.

If you are Jane, you care about how and why the platform was assembled, because you are likely to have to adapt it quite a bit.

You know, I don't think I've ever been on a project where I didn't care about architecture or in having to "adapt it quite a bit". Of course, back in the days when I was writing C++, this meant either subclassing CWnd or TWindow in interesting ways, or else sometimes even going so far as to reach into the source code and making some tweaks, either at compile-time or through some well-established hackery. (Yes, I wrote a template class called THackOMatic that allowed me to bang away on private fields. Sue me. It worked, I documented the hell out of it, and ripped the hack back out once the bug was fixed.) Point is, both Joe and Jane care about the architecture.

Now, I think what Stu means here is that the architecture of the web framework is more malleable in Rails than it is in Grails, because Rails is written on top of Ruby and Grails is written on top of Groovy, Spring, the JEE container architecture, and Java:

Most of the commenters on my earlier post (and Graeme in his addendum) correctly identified the real architectural difference between Grails and Rails. Rails builds on Ruby, while Grails builds on Groovy and Spring.

Yes! I agree with this so far. (In fact, everybody should, because these are simple statements of fact.) But then Stu takes the cake for the Best Parting Non-Supported Shot Ever:

Rails wins this architecture bakeoff twice:

    * Ruby is a better language than Groovy.
    * Spring does most of its heavy lifting in the stable layer, which is not the right place.


Ruby is perhaps a more flexible language than Groovy (and that's an arguable point, folks, and one which I really don't care to get into), but Ruby also runs on a less-flexible and less-scalable and less-supported platform than Groovy. I dunno that this makes Ruby better. It simply makes it different. Try convincing your IT guys to add yet another platform into their already-overwhelmingly complex suite of tools, particularly given the surprisingly sparse amount of monitoring information that the Ruby platform offers. Stu may want to argue that Ruby-the-language is more flexible, regardless of what platform it runs on, and if so, then we're arguing languages, not platforms, and while he might win much of his "Ruby is a better language than Groovy" argument, he's going to lose the "Ruby is more dynamic than Groovy" argument, because on the JVM they have to be implemented under the same set of restrictions. You can't have it both ways.

(By the way, if you're one of those Ruby/Rails enthusiasts who's going to counterclaim that "Ruby-meaning-MRI is fast enough", I've heard the argument, and I think it's specious and ignorant. "Fast enough" is an argument that rests on your project being able to remain within the expected performance and scalability curve known at the beginning of the project, and remember, Jane's problem is that she doesn't know those sorts of things yet. So either you know, and have some better scope around the problem than Stu gives credit to Jane for having, or else you don't know, and can't assume that the Ruby interpreter will be able to handle the load.)

And WTF is up with the idea that "Spring does most of its heavy lifting in the stable layer, which is not the right place"? I think Stu means to say that Spring is a static layer, not stable layer[1], because hey, stability is kinda important to a few folks. (I'll give Stu the benefit of the doubt here and assume he cares about stability, too. I know his customers do.) Spring has its flaws, mind you, but arguing that it's not up to the heavy lifting seems to be like arguing that Java cannot scale. (Even Microsoft has given up on that argument, by the way.)

The worst part of this is, I've had discussions like this with Stu in the past, and he's much more articulate about it in person than he is in this blog post. Frankly, I think the most interesting space here is the intersection of Graeme's and Stu's positions, which is to say JRuby (and IronRuby or Ruby.NET, but that's for a different platform and out of the scope of this discussion entirely... yet still compelling and relevant, strangely enough). At the end of the day, these arguments about "my web framework is better than your web framework" are really just stupid. (As long as you're not trying to claim that Perl is the best web framework, anyway. Yes, Perl enthusiasts, I'm picking on you.)

My advice to Jane: Rails over Grails.

My advice to Jane: pick a consulting firm that doesn't have preconceived dogma about which web framework... or language, or any other toolset... to use. [2]

And if Jane can't afford a consulting firm, then Jane needs to do the research on her own and make her own decision based on the problem set, the context, and the whole range of tools available to her. (Anybody making a decision based solely on the basis of a blog-post-flame-war deserves what they get, regardless.)

As for Joe? Well, Joe could probably benefit from the goodness inherent in the dynamic languages that are popping up all over the place, too, not to mention the goodness inherent in the type-inferred languages that are starting to poke their heads through the Barrier of Adoption, all the while not ignoring the fact that he could probably benefit from the inherent performance and scalability of the major virtual machine technologies that have been a decade or more in production...

Meaning Joe probably needs to go through the same decision-making criteria Jane does. Thank God it turns out, as is so often the case, that both of them work on the same project.

Meanwhile, I'm done with this thread. It's a pointless, stupid argument. Use the right tool for the job. Or, if you prefer, "From each language, according to its abilities, to each project, according to its needs."

Just remember that both shipping and supporting are features, too. Don't neglect the other in favor of the one.




[1] Yes, I saw the hyperlink to Ola's post about languages, and his definitions therein. Ironically, Ola's own comments there state that "Java is really the only choice here", which directly contradicts Stu's choice of MRI (the native Ruby interpreter). More importantly, I think Stu's point rests on the static nature of the Java layer in Groovy, and while it's certainly more flexible to be able to hack at any layer of the stack, this is only realistically possible in small applications--this isn't my opinion, it's the opinion of Gregor Kiczales, who spent many years in CLOS and determined that CLOS's extremely flexible MOP system (more so than what Ruby currently supports, in fact) led to inherent problems in larger-scale projects. It was this thought that led him to create AspectJ in the first place.

[2] By the way, if there's any temptation in you[3] to post commentary and say, "Dude, you just don't understand Ruby" or "How can you agree with Graeme this way?", just don't. I do understand Ruby, and I like the language. (Much more than I do Rails, anyway.) And I'm not intrinsically agreeing that Grails is better than Rails, because I don't believe that, either. I believe in the basic equation that says the solution you pick is the one that is the right solution to the given problem in the stated context that yields the most desirable consequences.

[3] This includes you, Stu. Or Justin, or Graeme, or anybody working for Relevance, or anybody working for G2One, Inc.

.NET | C++ | Java/J2EE | Languages | Ruby | Windows

Saturday, February 02, 2008 3:14:20 AM (Pacific Standard Time, UTC-08:00)
Comments [4]  | 
 Friday, February 01, 2008
Latest installment of "Pragmatic Architecture" (Data Access) is up ...

... here. (Yes, it's an MSDN web page, but the article itself--as have all of its brethren in the series--is actually quite technology-neutral.) Enjoy and flame away....

Friday, February 01, 2008 9:57:38 PM (Pacific Standard Time, UTC-08:00)
Comments [1]  |