|
JOB REFERRALS
|
|
|
|
ON THIS PAGE
|
|
|
|
|
ARCHIVES
|
| July, 2008 (8) |
| June, 2008 (5) |
| May, 2008 (10) |
| April, 2008 (13) |
| March, 2008 (11) |
| February, 2008 (18) |
| January, 2008 (17) |
| December, 2007 (12) |
| November, 2007 (2) |
| October, 2007 (6) |
| September, 2007 (1) |
| August, 2007 (2) |
| July, 2007 (7) |
| June, 2007 (1) |
| May, 2007 (1) |
| April, 2007 (2) |
| March, 2007 (2) |
| February, 2007 (1) |
| January, 2007 (16) |
| December, 2006 (3) |
| November, 2006 (7) |
| October, 2006 (5) |
| September, 2006 (1) |
| June, 2006 (4) |
| May, 2006 (3) |
| April, 2006 (3) |
| March, 2006 (17) |
| February, 2006 (5) |
| January, 2006 (13) |
| December, 2005 (2) |
| November, 2005 (6) |
| October, 2005 (15) |
| September, 2005 (16) |
| August, 2005 (17) |
|
|
|
CATEGORIES
|
|
|
|
|
BLOGROLL
|
|
|
|
|
LINKS
|
|
|
|
|
SEARCH
|
|
|
|
|
MY BOOKS
|
|
|
|
|
DISCLAIMER
|
Powered by:
newtelligence dasBlog 1.9.7067.0
The opinions expressed herein are my own personal opinions and do not represent
my employer's view in any way.
© Copyright
2008
,
Ted Neward
E-mail
|
|
|
|
|
 Friday, February 22, 2008
|
More language features revisited
|
|
Since we're examining various aspects of the canonical O-O language (the three principals being C++, Java and C#/VB.NET), let's take in a review of another recent post, this time on the use of "new" in said languages. All of us have probably written code like this: Foo f = new Foo(); And what could be simpler? As long as the logic in the constructor is simple (or better yet, the constructor is empty), it would seem that the simplest code is the best, so just use the constructor. Certainly the MSDN documentation is rife with code that uses public constructors. You can probably find plenty of public constructors used right here on my blog. Why invest the effort in writing (and using) a factory class that will probably never do anything useful, other than call a public constructor? In his excellent podcast entitled "Emergent Design: The Evolutionary Nature of Software Development," Scott Bain of Net Objectives nevertheless makes a strong case against the routine use of public constructors. The problem, notes Scott, is that the use of a public constructor ties the calling code to the implementation of Foo as a concrete class. But suppose that you later discover that there need to be many subtypes of Foo, and Foo should therefore be an abstract class instead of a concrete class--what then? You've got a big problem, that's what; a lot of client code that has been making use of Foo's public constructor suddenly becomes invalid. I just love it when people rediscover advice that they could have had much earlier, had they only been aware of the prior art in the field. I refer the curious C#/VB.NET developer to the book Effective Java, by Joshua Bloch, in which Item 1 states, "Consider providing static factory methods instead of constructors". Quoting from said book, we see: One advantage of static factory methods is that, unlike constructors, they have names. If the parameters to a constructor do not, in and of themselves, describe the object being returned, a static factory with a well-chosen name can make a class easier to user and the resulting client code easier to read. ... A second advantage of static factory methods is that, unlike constructors, they are not required to create a new object each time they're invoked. This allows immutable classes (Item 13) to use preconstructed instances or to cache instances as they're constructed and to dispense these instances repeatedly so as to avoid creating unnecessary duplicate values. ... A third advantage of static factory methods is that, unlike constructors, they can return an object of any subtype of their return type. This gives you great flexibility in choosing the class of the returned object. ... The main disadvantage of static factory methods is that classes without public or protected constructors cannot be subclassed. The same is true for nonpublic classes returned by public static factories. A second disadvantage of static factory methods is that they are not readily distinguishable from other static methods. They do not stand out in API documentation the way that constructors do. C# and VB.NET developers are encouraged to read the book to discover about 30 or so other nuggets of wisdom that are directly applicable to the .NET framework. Note that Josh is in the process, this very month, of revising the book for rerelease as a second edition, taking into account the wide variety of changes that have taken place in the Java language since EJ's initial release. Meanwhile.... One thing that's been nagging at me is how I think Java and C# missed the boat in respect to the various ways we'd like to construct objects. The presumption was always that allocation and initialization would (a) always take place at the same time, and (b) always take place in the same manner--the underlying system would allocate the memory, the object would be laid out in this newly-minted chunk of heap, and your constructor would then initialize the contents. Neither assumption can be taken to be true, as we've seen over the years; the object may need to come from pre-existing storage (a la the object cache), or the object may need to be a derived type (a la the covariant return Josh mentions in #3 advantage above), or in some cases you want to mint the object from an entirely different part of the process. C++ actually had an advantage over C# and Java here, in that you could overload operator new() for a class (which then meant you had to overload operator delete(), and oh-by-the-way don't forget to overload array new, that is, operator new[]() and its corresponding twin, array delete, operator delete[](), which was a bit of a pain) to gain better control over both allocation and initialization, to a degree. Initially we always used it to control allocation--the idea being one would create a class-specific allocator, on the grounds that knowing some of the assumptions of the class, such as its size, would allow you to write faster allocation routines for it. But one of the rarely-used features of operator new() was that it could take additional parameters, using a truly obscure syntactic corner of C++: 1: void* operator new(size_t s, const string& message) 2: { 3: cout << "Operator new sez " << message << endl; 4: // allocate s bytes and return; Foo ctor will be invoked automagically 5: } 6: Foo* newFoo = new ("Howdy, world!") Foo();
Officially, one such overloaded operator was recognized, the placement new operator, which took a void* as a parameter, indicating the exact location in which your object was to be allocated and thus laid down. This meant that C++ developers could allocate from some other part of the process (including shudder a pointer they'd made up out of thin air) and drop the initialized object right there. While useful in its own right, placement new opened up a whole new world of construction options to the C++ developer that we never really took advantage of, since now you could pass parameters to the construction process without involving the constructor.
That's kind of nifty, in an obscure and slightly terrifying fashion. One thought I'd always had was that it would be cool if a C++ O/R-M overloaded operator new() for database-bound objects to indicate which database connection to use during construction:
1: DBConnection conn; 2: 3: Person* newFoo = new (conn) Person("Ted", "Neward");
Of course, such syntax has the immediate drawback of eliciting a chorus of "WTF?!?" at the next code review, but still....
Meanwhile, other languages choose to view new as one of those nasty static methods Gilad dislikes so much, Ruby and Smalltalk being two of them. That is to say, construction now basically calls into a static method on a class, which has the nice effect of keeping the number of "special" parts of the language to a minimum (since now "new" is just a method, not a keyword), makes it easier to have different-yet-similar names to represent slightly different concepts ("create" vs "new" vs "fetch" vs "allocate", and so on) sitting side by side, and helps eliminate Josh's second disadvantage above. I'm not certain how exactly this could eliminate Josh's first disadvantage (that of inheritance and inaccessible constructors), but it's not entirely unimaginable that the language would have a certain amount of incestuous knowledge here to be able to reach those static method (constructors) in the same way it does currently.
(It actually works better if they aren't static methods at all, but instance methods on class objects, to which the language automatically defers when it sees a "classname.new"; that is, when it sees
Person ann = Person.new("Ann", "Sheriff");
the language automatically changes this to read:
Person ann = Person.class.new("Ann", "Sheriff");
which would be eminently doable in Java, were class objects available for modification/definition somehow. In a language built on top of the JVM or CLR, the class object would be a standalone singleton, a la "object" definitions in Scala.)
|
 Thursday, February 21, 2008
|
Static considered harmful?
|
|
Gilad makes the case that static, that staple of C++, C#/VB.NET, and Java, does not belong: Most imperative languages have some notion of static variable. This is unfortunate, since static variables have many disadvantages. I have argued against static state for quite a few years (at least since the dawn of the millennium), and in Newspeak, I’m finally able to eradicate it entirely. I think Gilad conflates a few things, but he's also got some good points. To the dissecting table! To begin: Static variables are bad for security. See the E literature for extensive discussion on this topic. The key idea is that static state represents an ambient capability to do things to your system, that may be taken advantage of by evildoers. Eh.... I'm not sure I buy into this. For evildoers to be able to change static state, they have to have some kind of "poke" access inside the innards of your application, and if they have that, then just about anything is vulnerable. Now, granted, I haven't spent a great deal of time on the E literature, so maybe I'm missing the point here, but if an attacker has data-manipulability into my program, then I'm in a whole world of pain, whether he's attacking statics or instances. Having said that, statics have to be stored in a particular well-known location inside the process, so maybe that makes them a touch more vulnerable. Still, this seems a specious argument. Static variables are bad for distribution. Static state needs to either be replicated and sync’ed across all nodes of a distributed system, or kept on a central node accessible by all others, or some compromise between the former and the latter. This is all difficult/expensive/unreliable. Now this one I buy into, but the issue isn't the "static"ness of the data, but the fact that it's effectively a Singleton, and Singletons in any distributed system are Evil. I talked a great deal about this in Effective Enterprise Java, so I'll leave that alone, but let me point out that any Singleton is evil, whether it's represented in a static, a Singleton object, a Newspeak module, or a database. The "static"ness here is a red herring. Static variables are bad for re-entrancy. Code that accesses such state is not re-entrant. It is all too easy to produce such code. Case in point: javac. Originally conceived as a batch compiler, javac had to undergo extensive reconstructive surgery to make it suitable for use in IDEs. A major problem was that one could not create multiple instances of the compiler to be used by different parts of an IDE, because javac had significant static state. In contrast, the code in a Newspeak module definition is always re-entrant, which makes it easy to deploy multiple versions of a module definition side-by-side, for example. Absolutely, but this is true for instance fields, too--any state that is modified as part of two or more method bodies is vulnerable to a re-entrancy concern, since now the field is visibly modified state to that particular instance. How deeply do you want your code to be re-entrant? Gilad's citation of the javac compiler points out that the compiler was hardly re-entrant at any reasonable level, but the fact is that the compiler *could* have been used in a parallelized fashion using the isolational properties of ClassLoaders. (Its ugly, and Java desperately needs Isolates for that reason.) Static variables are bad for memory management. This state has to be handed specially by implementations, complicating garbage collection. The woeful tale of class unloading in Java revolves around this problem. Early JVMs lost application’s static state when trying to unload classes. Even though the rules for class unloading were already implicit in the specification, I had to add a section to the JLS to state them explicitly, so overzealous implementors wouldn’t throw away static application state that was not entirely obvious. This one I can't really comment on, since I'm not in the habit of writing memory-management code. I'll take Gilad's word for it, though I'm curious to know why this is so, in more detail. Static variables are bad for for startup time. They encourage excess initialization up front. Not to mention the complexities that static initialization engenders: it can deadlock, applications can see uninitialized state, and unless you have a really smart runtime, you find it hard to compile efficiently (because you need to test if things are initialized on every use). I'm not sure I see how this is different for any startup/initialization code--anything that the user can specify as part of startup will run the risk of deadlocks and viewing uninitialized state. Consider the alternative, however--if the user didn't have the ability to specify startup code, then they would have to either write their own, post-runtime, startup code, or else they have to constantly check the state of their uninitialized objects and initialize them on first use, the very thing that he claims is hard to compile efficiently. Static variables are bad for for concurrency. Of course, any shared state is bad for concurrency, but static state is one more subtle time bomb that can catch you by surprise. Absolutely: any shared state is bad for concurrency. However, I think we need to go back to first principles here. Since any shared state is bad for concurrency, and since static data is always shared by definition, it follows that static data is bad for concurrency. Pay particular attention to that chain of reasoning, however: any shared state is bad for concurrency, whether it's held by the process in a special non-instance-aligned location or in an data store that happens to be reachable from multiple paths of control. This means that your average database table is also bad for concurrency, were it not for the transactional protections that surround the table. This isn't an indictment of static variables, per se, but of shared state. Gilad goes on to describe how Newspeak solves this problem of static: It may seem like you need static state, somewhere to start things off, but you don’t. You start off by creating an object, and you keep your state in that object and in objects it references. In Newspeak, those objects are modules. Newspeak isn’t the only language to eliminate static state. E has also done so, out of concern for security. And so has Scala, though its close cohabitation with Java means Scala’s purity is easily violated. The bottom line, though, should be clear. Static state will disappear from modern programming languages, and should be eliminated from modern programming practice. I wish Newspeak were available for widespread use, because I'd love to explore this concept further; in the CLR, for example, there is the same idea of "modules", in that modules are singleton entities in which methods and data can reside, at a higher level than individual objects themselves. Assemblies, for example, form modules, and this is where "global variables" and "global methods" exist (when supported by the compiling language in question). At the end of the day, though, these are just statics by another name, and face most, if not all, of the same problems Gilad lays out above. Scala "objects" have the same basic property. I think the larger issue here is that one should be careful where one stores state, period. Every piece of data has a corresponding scope of accessibility, and developers have grown complacent about considering that scope when putting data there: they consider the accessibility at the language level (public, private, what-have-you), and fail to consider the scope beyond that (concurrency, re-entrancy, and so on). At the end of the day, it's simple: static entities and instance entities are just entities. Nothing more, nothing less. Caveat emptor.
|
 Tuesday, February 19, 2008
|
The Fallacies Remain....
|
|
Just recently, I got this bit in an email from the Redmond Developer News ezine: TWO IF BY SEA In the course of just over a week starting on Jan. 30, a total of five undersea data cables linking Europe, Africa and the Middle East were damaged or disrupted. The first two cables to be lost link Europe with Egypt and terminate near the Port of Alexandria. http://reddevnews.com/columns/article.aspx?editorialsid=2502 Early speculation placed the blame on ship anchors that might have dragged across the sea floor during heavy weather. But the subsequent loss of cables in the Persian Gulf and the Mediterranean has produced a chilling numbers game. Someone, it seems, may be trying to sabotage the global network. It's a conclusion that came up at a recent International Telecommunication Union (ITU) press conference. According to an Associated Press report, ITU head of development Sami al-Murshed isn't ready to "rule out that a deliberate act of sabotage caused the damage to the undersea cables over two weeks ago." http://tinyurl.com/3bjtdg You think? In just seven or eight days, five undersea cables were disrupted. Five. All of them serving or connecting to the Middle East. And thus far, only one cable cut -- linking Oman and the United Arab Emirates -- has been identified as accidental, caused by a dragging ship anchor. So what does it mean for developers? A lot, actually. Because it means that the coming wave of service-enabled applications needs to take into account the fact that the cloud is, literally, under attack. This isn't new. For as long as the Internet has been around, concerns about attacks on the network have centered on threats posed by things like distributed denial of service (DDOS) and other network-borne attacks. Twice -- once in 2002 and again in 2007 -- DDOS attacks have targeted the 13 DNS root servers, threatening to disrupt the Internet. But assaults on the remote physical infrastructure of the global network are especially concerning. These cables lie hundreds or even thousands of feet beneath the surface. This wasn't a script-kiddie kicking off an ill-advised DOS attack on a server. This was almost certainly a sophisticated, well-planned, well-financed and well-thought-out effort to cut off an entire section of the world from the global Internet. Clearly, efforts need to be made to ensure that the intercontinental cable infrastructure of the Internet is hardened. Redundant, geographically dispersed links, with plenty of excess bandwidth, are a good start. But development planners need to do their part, as well. Web-based applications shouldn't be crafted with the expectation of limitless bandwidth. Services and apps must be crafted so that they can fail gracefully, shift to lower-bandwidth media (such as satellite) and provide priority to business-critical operations. In short, your critical cloud-reliant apps must continue to work, when almost nothing else will. And all this, I might add, as the industry prepares to welcome the second generation of rich Internet application tools and frameworks. Silverlight 2.0 will debut at MIX08 next month. Adobe is upping the ante with its latest offerings. Developers will enjoy a major step up in their ability to craft enriched, Web-entangled applications and environments. But as you make your plans and write your code, remember this one thing: The people, organization or government that most likely sliced those four or five cables in the Mediterranean and Persian Gulf -- they can do it again. There's a couple of things to consider here, aside from the geopolitical ramifications of a concerted attack on the global IT infrastructure (which does more to damage corporations and the economy than it does to disrupt military communications, which to my understanding are mostly satellite-based). First, this attack on the global infrastructure raises a huge issue with respect to outsourcing--if you lose touch with your development staff for a day, a week, a month (just how long does it take to lay down new trunk cable, anyway?), what sort of chaos is this going to strike with your project schedule? In The World is Flat, Friedman mentions that a couple of fast-food restaurants have outsourced the drive-thru--you drive up to the speaker, and as you place your order, you're talking to somebody half a world way who's punching it into a computer that's flashing the data back to the fast-food join in question for harvesting (it's not like they make the food when you order it, just harvest it from the fields of pre-cooked burgers ripening under infrared lamps in the back) and disbursement as you pull forward the remaining fifty feet to the first window. The ludicrousness of this arrangement notwithstanding, this means that the local fast-food joint is now dependent on the global IT infrastructure in the same way that your ERP system is. Aside from the obvious "geek attraction" to a setup like this, I find it fascinating that at no point did somebody stand up and yell out, "What happened to minimizing the risks?" Effective project development relies heavily on the ability to touch base with the customer every so often to ensure things are progressing in the way the customer was anticipating. When the development team is one ocean and two continents away in one direction, or one ocean and a whole pile of islands away in the other direction, or even just a few states over, that vital communication link is now at the mercy of every single IT node in between them and you. We can make huge strides, but at the end of the day, the huge distances involved can only be "fractionalized", never eliminated. Second, as Desmond points out, this has a huge impact on the design of applications that are assuming a 100% or 99.9% Internet uptime. Yes, I'm looking at you, GMail and Google Calendar and the other so-called "next-generation Internet applications" based on technologies like AJAX. (I categorically refuse to call them "Web 2.0" applications--there is no such thing as "Web 2.0".) As much as we keep looking to the future for an "always-on" networking infrastructure, the more we delude ourselves to the practical realities of life: there is no such thing as "always-on" infrastructure. Networking or otherwise. I know this personally, since last year here in Redmond, some stronger-than-normal winter storms knocked down a whole slew of power lines and left my house without electricity for a week. To very quickly discover how much of modern Western life depends on "always-on" assumptions, go without power to the house for a week. We were fortunate--parts of Redmond and nearby neighborhoods got power back within 24 hours, so if I needed to recharge the laptop or get online to keep doing business, much less get a hot meal or just find a place where it was warm, it meant a quick trip down to the local strip mall where a restaurant with WiFi (Canyon's, for those of you that visit Redmond) kept me going. For others in Redmond, the power outage meant a brief vacation down at the Redmond Town Center Marriott, where power was available pretty much within an hour or two of its disruption. The First Fallacy of Enterprise Systems states that "The network is reliable". The network is only as reliable as the infrastructure around it, and not just the infrastructure that your company lays down from your workstation to the proxy or gateway or cable modem. Take a "traceroute" reading from your desktop machine to the server on which your application is running--if it's not physically in the building as you, then you're probably looking at 20 - 30 "hops" before it reaches the server. Every single one of those "hops" is a potential point of failure. Granted, the architecture of TCP/IP suggests that we should be able to route around any localized points of failure, but how many of those points are, in fact, to your world view, completely unroutable? If your gateway machine goes down, how does TCP/IP try to route around that? If your ISP gets hammered by a Denial-of-Service attack, how do clients reach the server? If we cannot guarantee 100% uptime for electricity, something we've had close to a century to perfect, then how can you assume similar kinds of guarantees for network availability? And before any of you point out that "Hey, most of the time, it just works so why worry about it?", I humbly suggest you walk into your Network Operations Center and ask the helpful IT people to point out the Uninterruptible Power Supplies that fuel the servers there "just in case". When they in turn ask you to point out the "just in case" infrastructure around the application, what will you say? Remember, the Fallacies only bite you when you ignore them: 1) The network is reliable 2) Latency is zero 3) Bandwidth is infinite 4) The network is secure 5) Topology doesn't change 6) There is one administrator 7) Transport cost is zero 8) The network is homogeneous 9) The system is monolithic 10) The system is finished Every project needs, at some point, to have somebody stand up in the room and shout out, "But how do we minimize the risks?" If this is truly a "mission-critical" application, then somebody needs the responsibility of cooking up "What if?" scenarios and answers, even if the answer is to say, "There's not much we can reasonably do in that situation, so we'll just accept that the company shuts its doors in that case".
|
 Monday, February 18, 2008
|
Who herds the cats?
|
|
Recently I've been looking more closely at the various (count them, four of them) proposals for adding new features into the Java language, the "BGGA", "FCM", "CICE" and "JCA" proposals. All of them are interesting and have their merits. A few other proposals for Java 7 have emerged as well, such as extension methods, enhancements to switch, the so-called "multi-catch" enhancement to exceptions, properties, better null support, and some syntax to support lists and maps natively. All of them intriguing ideas, and highly subject to reasonable debate among reasonable people. My concern lies in a different direction. Who herds this bunch of cats? This isn't just a question of process within the JCP. And it's not just a question of closures or the other features we're looking at for Java 7. This is a question about the moral leadership of Java. In the C# space, we have Anders. He clearly "owns" language, and acts as the benevolent dictator. Nothing goes into "his" language without his explicit and expressed OK. Other languages have similar personages in similar roles. Python has Guido. Perl has Larry. C++ has Bjarne. Ruby has Matz. Certainly other individuals "float" around these languages and lend their impressive weight towards the language's design--Scott Meyers, and Herb Sutter in C++, for example, or Dave Thomas and Martin Fowler in Ruby--but the core language design principles rest firmly inside the head of one man. Whereso for Java? James Gosling? Please--Jimmy abandoned the language shortly after its release, and now only comes out every so often to launch T-shirts into the crowd, answer reporters' questions whenever something Java-related comes up, and blog his two cents' worth. He's a reminder of the "good old days", for sure, but he's not coming out with new directions of his own accord and taking the reins to lead us there. He's the Teddy Kennedy of the Java Party. His endorsement weighs in as about as influential as Bob Dole's--interesting to an analytical few, but hardly meaningful in the grand scheme of things. Unfortunately, the two most recognized "benevolent dictators" of the Java language, Neal Gafter and Joshua Bloch, are on opposing sides of the aisle on this. Each has put forth a competing proposal for how the Java language should evolve. Each has his good reasons for how he wants to implement closures in Java. Each has his impressive list of names supporting him. It's Clinton and Obama, Java Edition. The fact is, though, that when these two disagreed on how to move forward, lots of Java developers found themselves in the uncomfortable position faced by the children when the parents fight: do you take sides? Do you try to make peace between them? Or do you just go hide your head under a pillow until the yelling stops? This is the real danger facing Java right now: there is no one with enough moral capital and credibility in the Java space to make this call. We can take polls and votes and strawman proposals until the cows come home, but language design by committee has generally not worked well in the past. If someone without that authority ends up making the decision, it will alienate half the Java community regardless of which way the decision goes. The split is too even to expect one to come out as the obvious front-runner. And expecting a JSR committee process to somehow resolve the differences between these four proposals into a single direction forward is asking a lot. So who makes the call?
Java/J2EE | Languages
Monday, February 18, 2008 9:47:38 PM (Pacific Standard Time, UTC-08:00)
|
|
|
Why we need both static and dynamic in the same language
|
|
Stu demonstrates one of the basic problems with an all-dynamic language: "I just spent an hour figuring out why some carefully-tested code went no-op after adding RSpec to a project." As much as I berate Stu at times (both in person and in blog), the fact is, I deeply respect and admire his programming skill, and if he can lose an hour to something that (I submit for your consideration) could have been caught by a static analysis tool fairly easily, then clearly that was a wasted hour of Stu's life. Worse, the problem is not yet solved, since now he has to make a hard choice about which definition to use, or else find a way to hack around the two definitions and create a third. Or perhaps something even uglier than this.... And this presumes that all developers using Ruby will have Stu's skill and his sense of responsibility when coming up with the solution. Asking that of all programmers across the globe is simply too much. But clearly we cannot simply abandon the power of the dynamic language, either. Quoting again from the same source, Stu points out the very reason why dynamic languages are so powerful: "Once you start treating code as data, the elegance of your code is dependent on your skill. You cannot hide behind the limitations of your programming language anymore, because there aren't any." What's a language designer left to do? Choose both, of course. The more I think about it, the more I think Cobra (and other languages) has it right: a programming language should have both static and dynamic features within it, simultaneously. This is the first "modern" language I've seen come along that espouses the "static when you can, dynamic when you want" principle as a first-class concept. Even at that, I imagine that there's much more that could be done than what Cobra espouses. Imagine combining the power of Scala's type inferencing system with the flexibility of a Groovy or Ruby. Shivers.
Monday, February 18, 2008 4:22:11 AM (Pacific Standard Time, UTC-08:00)
|
|
|
Modular Toolchains
|
|
During the Lang.NET Symposium, a couple of things "clicked" all simultaneously, giving me one of those "Oh, I get it now" moments that just doesn't want to leave you alone. During the Intentional Software presentation, as the demo wound onwards I (and the rest of the small group gathered there) found myself looking at the same source code, but presented in a variety of new ways, some of which appealed to me as the programmer, others of which appealed to the mathematicians in the room, others of which appealed to the non-programmers in the room. (I heard one of the Microsoft hosts, a non-technical program manager, I think, say, "Wow, even I could understand that spreadsheet view, and that was writing code?") During the spreadsheet-written-in-IronPython presentation (ResolverOne), we were essentially looking at new ways of writing IronPython code, thus leveraging all the syntactic power of a programming language with a nicer front end. During the aspect-oriented talk (the one by Stefan Wenig and Fabian Schmeid), we found ourselves looking at a tool that essentially takes compiled assemblies and weaves in additional code based on descriptors from outside that codebase; in essence, just another aspect-oriented tool. But combine this with my own investigations into Soot, LLVM, Parrot, and Phoenix, alongside the usual discussions around the DLR, CLR, JVM and DaVinci machine, couple that with the presentation Harry gave about parser expression grammars and the research in the functional community into parser combinators, throw in the aspect-oriented and metaprogramming facilities that the Rubyists and other dynamic linguists go on for days about, and what do you end up with? Folks, the future is in modular toolchains. This is an oversimplification, and a radical oversimplification at that, but imagine for a moment: - A parser takes your source code (let's assume it is Java, just for grins) and builds an AST out of it. Not an AST that's inherently deeply coupled to the Java language, mind you, but a general-purpose one that stands as a union of Java, C#, C++, Perl, Python, Smalltalk, and other languages. (Note that some of the linguistic concepts in some of those languages may not end up in this AST, but instead operate on the AST itself, a la C++'s template facilities.) Said parser is now finished, and can either output a binary (or potentially XML, though it'd probably be hideously verbose) version of this AST to disk for later consumption, or would more than likely be passed directly along to the next beast in the chain.
- In the simplest scenario, the next beast would be a code generator, which takes the AST and seeks to export some kind of back-end code out of it. Here, since we're working with a general-purpose AST, we can assume that this back-end is flexible and open, a la the Phoenix toolkit (where either native or MSIL can be generated).
- In a slightly more complicated scenario, verification of the correctness of the AST (against whatever libraries are specified) is checked, usually prior to code-gen, thus making this particular toolchain a statically-checked chain; were verification left out, it would need to happen at runtime, in which case we'd be talking about a dynamically-checked chain.
Note that I stay away from the term "statically-typed" or "dynamically-typed" for the moment. That would be a measurement of the parser, not the verifier. Verification still occurs in a lot of these dynamically-typed languages, just as it does in statically-typed languages. Assuming the verification process succeeds, the AST can be again, written out or passed to the next step in the chain. - Another potential step in the process, usually post-parser and pre-verification, would be an "aspect" step, in which a tool takes the AST, consults some external descriptors, and modifies the AST based on what it finds there. (This is how most of your non-AspectJ-like AOP tools work today, except that they have to rebuild the AST from compiled .class files or assemblies first.)
- Naturally, another step in the process would be an optimize step, but this has to be considered carefully, since some "high-level" optimizations can be done without regard to code-gen backend, and some will need to be done with regard to code-gen backend; for example, register spill is (from what I've heard, can't say I know too much about this) generally only useful if you know how many registers you're targeting. Plus, it's not hard to imagine certain optimizations that are only generally useful on the x86 architecture, versus those that are useful on other CPU platforms. Even operating systems I would imagine would have an impact here. (It turns out that many compiler toolchains go through a dozen or so optimization steps today, so it's not hard to imagine a "code-gen backend" being a series of a half-dozen or so targeted optimization steps before actually generating code.)
- Bear in mind, too, that these ASTs should have enough information to be directly executable, thus giving us an interpreter back-end instead of a code-generation back-end, a la the DLR instead of the CLR.
- Also, given the standard AST format, it would be relatively trivial to create a whole series of different "parser"s to get to the AST, along the lines of what the Intentional Software guys have created, thus blowing open the whole concept of "DSL" into areas that heretofore have only been imagined. You still get the complete support of the rest of the toolchain, which is what makes the whole DSL concept viable in the first place, including aspects and verification and your choice of either interpretation or compilation.
- While we're at it, bear in mind that this AST could/should also be reachable from within the code itself, thus giving languages that want to operate on their own AST at runtime the ability to do so, because the AST is in a standard format and the interpreter could be bundled as part of the generated executable, thus providing a compile-when-you-can-interpret-when-you-must flavor that is currently the reigning meme in language/platform environments like JRuby. (It would also have the happy side effect of making Paul Graham shut up about Lisp, at least for a while. Yes, Paul, code-as-data, it's brilliant, it's wonderful, we get it.)
- Nothing says this toolchain needs be one-way, by the way: many of the toolkits I mentioned before (LLVM, Phoenix, Soot) can start from compiled binary and work back to AST, thus offering us the opportunity to do surgery of either the exploratory kind (static analysis) or the manipulative kind (aspect-weaving, etc) on compiled code in a relatively clean way. Reflector demonstrates the power of being able to go "back and forth" in this way (even in the relatively limited way Reflector does so), so imagine how powerful it would be to do this from end-to-end throughout the toolchain.
How likely is this utopian vision? I'm not sure, honestly--certainly tools like LLVM and Phoenix seem to imply that there's ways to represent code across languages in a fairly generic form, but clearly there's much more work to be done, starting with this notion of the "uber-AST" that I've been so casually tossing around without definition. Every AST is more or less tied to the language it is supposed to represent, and there's clearly no way to imagine an AST that could represent every language ever invented. Just imagine trying to create an AST that could incorporate Java, COBOL and Brainf*ck, for example. But if we can get to a relatively stable 80/20, where we manage to represent the most-commonly-used 80% of languages within this AST (such as an AST that can incorporate Java, C#, and C++, for starters), then maybe there's enough of a critical mass there to move forward. Now all I need to do is find somebody who'll fund this little bit of research... anybody got a pile of cash they don't know what to do with?  Update: By the way, in case you want a graphical depiction of what I'm thinking about, the Phoenix page has one (though obviously it's limited to the Phoenix scope of vision, and you may have to be a Microsoft CONNECT member to see it).
|
 Sunday, February 10, 2008
|
An Appeal: www.findjohnglasgow.com
|
|
Long-time readers of this blog know that as a general rule, I try not to include much in the way of personal stuff here; I try (sometimes with more success than others) to keep the subject material focused on the technology space: Java, .NET, Ruby, languages, XML services, and so on. This, however, is a deviation from that norm. A near and dear friend of mine has asked that I help spread the word about the disappearance of a family member (a cousin, in fact). I don't know the details of the disappearance other than what anybody else can read on the website, but I do know that if someone in my family were to go missing for an inexplicable reason, I would want the help of anybody and everybody I knew to try and find them. I would like to ask everyone's help in finding my brother John. He went missing January 28, 2008 and official search efforts were called off last Friday, even though the family has mounted their own search. Please go to www.findjohnglasgow.com to see a picture of John and print off a flyer. If you could put it in your car window or some other visible place, it would help us a lot. It is possible that he could have traveled out of the area where he went missing, so we are trying to get the word out on a national level, to cover all possible scenarios. Thank you all for your help. Donna Jean Glasgow February 6, 2008 If you reside near the Little Rock, Arkansas area in particular, please take a look at the photo below ...  ... and let somebody (either me or through the above-mentioned website) know if you've seen him, one way or another. I won't ask you to forward this to everyone you know; instead I just ask that if you feel a twinge of sympathy for a missing family member that's connected to you through less than two degrees of separation, then do what you think would help. Thanks for your time.
Sunday, February 10, 2008 6:09:10 AM (Pacific Standard Time, UTC-08:00)
|
|
 Sunday, February 03, 2008
|
Maybe 'twould be better to suggest "done like the Giants"
|
|
Wow. Giants 17, Patriots 14, when just about everybody had the Patriots by two touchdowns or so. Just goes to show, shouldn't count the little guy out 'til the fat lady sings and the cows come home. Also just goes to show, I shouldn't be blogging after an emotional heart-jerker like that one.
Sunday, February 03, 2008 9:38:19 PM (Pacific Standard Time, UTC-08:00)
|
|
 Saturday, February 02, 2008
|
My Secret (?) Shame (Or, Building Parrot 0.5.2)
|
|
OK, after a week of getting the Internet equivalent of Bad Mojo being sent my way by every Perl developer on the planet, I have to admit something that may strike readers as inconsistent and incongruous. I want Parrot to work. I don't really care about Perl 6, per se. As I've said before, the language has a lot of linguistic inconsistencies and too many violations of the the Principle of Least Surprise to carry a lot of favor with me. Whether Perl-the-language lives or dies really doesn't make a significant dent in my life. But Parrot.... now there's something I care about. Following the open debate on Perl (a surprising side-effect, given the subject matter of the post that spawned it), and chromatic's insistence that Parrot development was moving along, I decided to give in to my secret hopes, and pull the Parrot bits down again for a look-see. In the spirit of the OpenJDK post last month, this is a quick chronicle of how I got Parrot to build on a Win32 system. Installation details Just for the record, I'm doing this in a VMWare image (one in which I keep all the languages I play with) with both Visual Studio 2008 and Visual Studio 2005 installed. The Parrot docs explicitly reference using Visual Studio 2003 (or the free Visual C++ Toolkit, which has since turned into Visual C++ 2005 Express), but I'm going to first have a shot at it with VS 2008 before falling back to VS 2005. This shouldn't make any difference, because 2008 is supposed to be a superset of 2005, but... well, you know how that old chestnut goes. svn co parrot Checking Parrot's code out is easy: just svn co https://svn.perl.org/parrot/trunk parrot-svn . (I use the -svn suffix on directories to distinguish between svn-pulled source trees and downloaded source trees. Helps in case I ever need/want to pull down a named release and keep the svn-pulled source at the same time.) I pull all this into a directory underneath C:\Prg, so the total path to Parrot's source base is C:\Prg\parrot-svn. Configure From there, as with many Unix-based projects, you have to run the "Configure.pl" script. I opened up a VS 2008 Command Prompt, and used ActiveState's Perl [1] to run the Configure script. It chugs away and comes back with this message: C:\Prg\parrot-svn>perl Configure.pl Parrot Version 0.5.2 Configure 2.0 Copyright (C) 2001-2008, The Perl Foundation. Hello, I'm Configure. My job is to poke and prod your system to figure out how to build Parrot. The process is completely automated, unless you passed in the `--ask' flag on the command line, in which case I'll prompt you for a few pieces of info. Since you're running this program, you obviously have Perl 5--I'll be pulling some defaults from its configuration. Checking MANIFEST.....................................................done. Setting up Configure's default values.................................done. Setting up installation paths.........................................done. Tweaking settings for miniparrot...................................skipped. Loading platform and local hints files................................done. Finding header files distributed with Parrot..........................done. Determining what C compiler and linker to use.........................done. Determining whether make is installed..................................yes. Determining whether lex is installed...............................skipped. Determining whether yacc is installed..............................skipped. Determining if your C compiler is actually gcc..........................no. Determining whether libc has the backtrace* functions (glibc only)......no. Determining Fink location on Darwin................................skipped. Determining if your C compiler is actually Visual C++..................yes. Detecting compiler attributes (-DHASATTRIBUTE_xxx)....................done. Detecting supported compiler warnings (-Wxxx)......................skipped. Enabling optimization...................................................no. Determining flags for building shared libraries.......................done. Determine if parrot should be linked against a shared library..........yes. Determining what charset files should be compiled in..................done. Determining what encoding files should be compiled in.................done. Determining what types Parrot should use..............................done. Determining what opcode files should be compiled in...................done. Determining what pmc files should be compiled in......................done. Determining your minimum pointer alignment......................... 1 byte. Probing for C headers.................................................done. Determining some sizes................................................done. Computing native byteorder for Parrot's wordsize.............little-endian. Test the type of va_ptr (this test is likely to segfault)............stack. Figuring out how to pack() Parrot's types.............................done. Figuring out what formats should be used for sprintf..................done. Determining if your C library has a working S_ISREG.....................no. Determining CPU architecture and OS...................................done. Determining architecture, OS and JIT capability.......................done. Generating CPU specific stuff.........................................done. Verifying that the compiler supports function pointer casts............yes. Determining whether your compiler supports computed goto................no. Determining if your compiler supports inline...........................yes. Determining what allocator to use.....................................done. Determining if your C library supports memalign.........................no. Determining some signal stuff.........................................done. Determining whether there is socklen_t..................................no. Determining if your C library has setenv / unsetenv...............unsetenv. Determining if your platform supports AIO...............................no. Determining if your platform supports GMP...............................no. Determining if your platform supports readline..........................no. Determining if your platform supports gdbm..............................no. Testing snprintf......................................................done. Determining whether perldoc is installed...............................yes. Determining whether python is installed.........................yes, 2.5.1. Determining whether GNU m4 is installed................................yes. Determining whether (exuberant) ctags is installed......................no. Determining Parrot's revision.......................................r25452. Determining whether ICU is installed................................failed. Generating C headers..................................................done. Generating core pmc list..............................................done. Generating runtime/parrot/include.....................................done. Configuring languages.................................................done. Generating makefiles and other build files............................done. Moving platform files into place......................................done. Recording configuration data for later retrieval......................done. Okay, we're done! You can now use `nmake' to build your Parrot. After that, you can use `nmake test' to run the test suite. Happy Hacking, The Parrot Team C:\Prg\parrot-svn> Looks good so far. I kick off nmake (which is still running as I write this). Note that the Configure script discovers ActiveState's Perl as part of its rummaging around on my system, so that's what it uses to do the build steps that require execution of Perl. I have no idea what the least-acceptable version of AS Perl is, but the version I pulled down was probably about a year ago. (Note: I have to admit, the Configure stuff is slick. I don't like opening those files and looking at what's in there, but you'll never hear me criticize the existence of Perl, for this reason alone: having a scripting language that can rummage around your machine and figure out the paths to all the cr*p it needs to build is a hideously useful thing. I do admit to wishing those scripts were written in something I feel better about reading, though, like Ruby, but this is a practice that far pre-dates me, so I'll just shut up and ride along because I find it useful when it works. As it does here.) Note to the Parrot guys: under VS 2008, the build generates a ton of warnings. Most of all, VS 2008 complains about the use of the Wp64 flag, which it says is deprecated and will be removed in a future release. (Chromatic, if you want a full build log, I can clean-and-build again and send you the piped output, if it'll help.) After about 10 minutes of disk churn and a ton of warnings reported (most of which seem to be just three or four warnings being repeated throughout the code, so either it's something in a couple of headers files that're included from everywhere, or these are spurious warnings that could be turned off via a #pragma)... success! I have a parrot.exe, along with a few other .exe utilities, in the root of the parrot-svn directory. Next step: "nmake test". Well, clearly parrot must be working pretty well, because it's churning through a ton of tests with "ok" results for everything except that which is platform-specific (a la the Fink tests intended for Darwin/Mac OS X, which are obviously going to fail on my XP box and therefore get skipped). A couple of tests get skipped (in the compilers tree?) with explanations that I don't quite understand, but it doesn't look like these are errors, per se, so I'm willing to accept on faith that we're all kosher. So while the tests are still running, I'll post this and offer up kudos to chromatic and the crew for something that at least builds, runs, and passes a whole slew of unit tests. Now for the fun part--finding out how extensive PMC, PIR and PASM are, and thinking about how this VM fits in the Grand Scheme of Things against the Da Vinci Machine and the DLR and the JVM and the CLR....  (Note to self: must suggest to John Lam and the guys on the DLR team to invite chromatic up to the Lang.NET 2009 Symposium. If the Sun folks can be made to feel welcome on the Microsoft campus for this kind of event, then surely the Parrot guys can come and feel welcome and--hopefully--carry away some interesting ideas, too.) Update: Well, might have spoken too soon, looks like the tests failed after all. To be exact, the tests hung for a while, and I Ctrl-C'ed the process because it didn't look like it was going anywhere; this is the last few lines: t/library/cgi_query_hash.....................ok t/library/coroutine..........................ok t/library/data_escape........................ok 1/22 skipped: test not written t/library/dumper.............................ok t/library/File_Spec..........................ok t/library/getopt_obj.........................ok t/library/iter...............................ok t/library/md5................................ok t/library/mime_base64........................ok t/library/parrotlib..........................ok t/library/pcre............................... t/library/pcre...............................NOK 1# Failed test (t/library/p cre.t at line 48) # Exited with error code: 1 # Received: # ok 1 # ok 2 # Null PMC access in invoke() # current instr.: 'parrot;PCRE;compile' pc 118 (C:\Prg\parrot-svn\runtime\parrot \library\pcre.pir:127) # called from Sub 'main' pc 83 (C:\Prg\parrot-svn\t\library\pcre_1.pir:49) # # Expected: # ok 1 # ok 2 # ok 3 # ok 4 # ok 5 # # Looks like you failed 1 test of 1. t/library/pcre...............................dubious Test returned status 1 (wstat 256, 0x100) DIED. FAILED test 1 Failed 1/1 tests, 0.00% okay t/library/pg.................................Terminating on signal SIGINT(2) NMAKE : fatal error U1077: NMAKE : fatal error U1058: terminated by user Stop. C:\Prg\parrot-svn> Not sure what this means, but bear in mind, this is off today's tip, so it may be a temporary thing. [1] Why, you may ask, do I have Active State's Perl installed if I so despise the language? Rotor (SSCLI 2.0) uses it as part of its build process, and I like spelunking with Rotor, as some of you will have noticed.
|
|