© Copyright 2008, Ted Neward
For those in the Java community who've heard brief rumors about the proposed feature set of C# 3.0, announced last week at PDC, let me be the first to point out that nothing proposed for the language couldn't be done in the Java language or on top of the JVM, aside from generics, which Microsoft did right in C# 2.0 by integrating them into the virtual machine rather than adopting the type-erasure-based approach that Java chose. In fact, most of the features of C# 3.0 are, arguably, nothing but syntactic sugar designed to make programming more productive. What I plan to do here is explain each of the features of C# 3.0, show how they're implemented by examining the generated CIL (at least in the preview Microsoft handed out at PDC), and in doing so demonstrate how Java could be extended to support exactly the same sorts of features.
Standard disclaimer applies: all of this is based on the PDC preview of C# 3.0, no guarantees or warranties implied, use at your own risk, yadda yadda yadda. In short, if you install it, and it blows up your hard drive, it's your own fault.
For starters, C# 3.0 will support implicitly typed local variables, meaning that programmers can now write code in a more "ignorant" fashion: they need not worry so much about getting the types exactly correct when declaring local variables, because the compiler infers each variable's static type from its initializer:

var i = 5;
var s = "This is an implicitly typed local variable";
var a = new int[] { 1, 2, 3 };

As far as the compiler is concerned, this is exactly equivalent to:

int i = 5;
string s = "This is an implicitly typed local variable";
int[] a = new int[] { 1, 2, 3 };

and the generated CIL bears that out, declaring the three locals as int32, string and int32[]:

.method private hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size       28 (0x1c)
  .maxstack  3
  .locals init (int32 V_0,
           string V_1,
           int32[] V_2)
  IL_0000:  nop
  IL_0001:  ldc.i4.5
  IL_0002:  stloc.0
  IL_0003:  ldstr      "This is an implicitly typed local variable"
  IL_0008:  stloc.1
  IL_0009:  ldc.i4.3
  IL_000a:  newarr     [mscorlib]System.Int32
  IL_000f:  dup
  IL_0010:  ldtoken    field valuetype '{E4ADF86B-1985-4CA3-90AF-B705A8279423}'/'__StaticArrayInitTypeSize=12' '{E4ADF86B-1985-4CA3-90AF-B705A8279423}'::'$$method0x6000001-1'
  IL_0015:  call       void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
  IL_001a:  stloc.2
  IL_001b:  ret
} // end of method Sample::Main
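Since the whole point here is that Java could do the same, here is a minimal Java sketch of what a hypothetical Java "var" would have to expand to. ImplicitTypingSketch and staticTypeOf are names made up for illustration. Java chooses among overloads at compile time using an argument's static type, so the overloads report the type each desugared local was given, mirroring the .locals init line in the CIL above.

```java
// Sketch: the expansion a hypothetical Java "var" would need. The compiler
// copies each initializer's static type into the declaration; nothing changes
// at run time. Class and method names are illustrative only.
class ImplicitTypingSketch {
    // Overload resolution uses the argument's *static* type at compile time,
    // so these overloads report which type each desugared local was given.
    static String staticTypeOf(int value)    { return "int"; }
    static String staticTypeOf(String value) { return "String"; }
    static String staticTypeOf(int[] value)  { return "int[]"; }

    static String[] desugaredLocals() {
        // var i = 5;
        int i = 5;
        // var s = "This is an implicitly typed local variable";
        String s = "This is an implicitly typed local variable";
        // var a = new int[] { 1, 2, 3 };
        int[] a = new int[] { 1, 2, 3 };
        return new String[] { staticTypeOf(i), staticTypeOf(s), staticTypeOf(a) };
    }

    public static void main(String[] args) {
        // prints: int, String, int[]
        System.out.println(String.join(", ", desugaredLocals()));
    }
}
```

Nothing about the run-time representation changes; all the work happens in the compiler, which is exactly what makes it sugar.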
It may seem an odd, even trivial, feature to add, but it will turn out to be a profound one when coupled with object initializers, which come next.
One of the more annoying aspects of C# (and Java, or C++ for that matter) is that we end up writing a lot of redundant code, much of it expressing the same basic idea over and over. Constructors are one such area of redundancy: far too often, we write classes whose constructors do the most basic thing a constructor can do, which is of course to initialize the object's fields to their desired values. Object initializer syntax allows for simple initialization of objects without requiring an explicit constructor to be written:
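public class Point
{
  int x;
  int y;

  public int X { get { return x; } set { x = value; } }
  public int Y { get { return y; } set { y = value; } }
}

Point p = new Point { X = 0, Y = 1 };

The compiler treats the initializer as shorthand for a call to the default constructor followed by a call to each property setter, roughly:

Point p = new Point();
p.X = 0;
p.Y = 1;

as the generated CIL shows:

.method private hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size       28 (0x1c)
  .maxstack  2
  .locals init (class Point V_0,
           class Point V_1)
  IL_0000:  nop
  IL_0001:  nop
  IL_0002:  newobj     instance void Point::.ctor()
  IL_0007:  stloc.1
  IL_0008:  ldloc.1
  IL_0009:  ldc.i4.0
  IL_000a:  callvirt   instance void Point::set_X(int32)
  IL_000f:  nop
  IL_0010:  ldloc.1
  IL_0011:  ldc.i4.1
  IL_0012:  callvirt   instance void Point::set_Y(int32)
  IL_0017:  nop
  IL_0018:  ldloc.1
  IL_0019:  nop
  IL_001a:  stloc.0
  IL_001b:  ret
} // end of method Program::Main

The newobj calls the default constructor, and the callvirt set_X and callvirt set_Y instructions invoke the property setters. Strictly speaking, all of this happens on a hidden temporary (V_1), which is stored into p (V_0) only after the last setter has run. (The nop instructions are there for the debugger in a non-optimized build and can be ignored.)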
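In Java terms, a compiler supporting object initializers would only need to emit what the CIL above shows. A minimal sketch, assuming JavaBeans-style getX/setX accessors stand in for C# properties (Java has none); the class and method names are illustrative:

```java
// Sketch: the expansion a hypothetical Java object-initializer syntax would
// need, following the CIL above. Names are illustrative; Java has no
// properties, so C#'s X and Y become getX/setX and getY/setY.
class ObjectInitializerSketch {
    static class Point {
        private int x;
        private int y;

        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
        public int getY() { return y; }
        public void setY(int y) { this.y = y; }
    }

    // Point p = new Point { X = 0, Y = 1 };   (hypothetical syntax)
    static Point expandInitializer() {
        Point tmp = new Point();  // newobj Point::.ctor(), held in a hidden temporary
        tmp.setX(0);              // callvirt set_X
        tmp.setY(1);              // callvirt set_Y
        Point p = tmp;            // only now is the finished object assigned to p
        return p;
    }

    public static void main(String[] args) {
        Point p = expandInitializer();
        System.out.println(p.getX() + ", " + p.getY());  // prints: 0, 1
    }
}
```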
And this isn't limited to properties of primitive type, either; we can do the same for properties of complex type, as in:

public class Rectangle
{
  Point p1;
  Point p2;

  public Point UpperLeft { get { return p1; } set { p1 = value; } }
  public Point LowerRight { get { return p2; } set { p2 = value; } }
}

Rectangle r = new Rectangle
{
  UpperLeft = new Point { X = 0, Y = 0 },
  LowerRight = new Point { X = 5, Y = 5 }
};
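Nesting changes nothing fundamental. Here is a Java sketch of one plausible expansion, assuming the compiler creates the outer object first and then evaluates each member initializer in order, each with its own hidden temporary. The names are again illustrative:

```java
// Sketch: a plausible expansion of the nested Rectangle initializer, assuming
// the same hidden-temporary pattern is applied at each level.
// Names are illustrative only.
class NestedInitializerSketch {
    static class Point {
        private int x, y;
        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
        public int getY() { return y; }
        public void setY(int y) { this.y = y; }
    }

    static class Rectangle {
        private Point p1, p2;
        public Point getUpperLeft() { return p1; }
        public void setUpperLeft(Point p) { p1 = p; }
        public Point getLowerRight() { return p2; }
        public void setLowerRight(Point p) { p2 = p; }
    }

    // Rectangle r = new Rectangle {
    //     UpperLeft = new Point { X = 0, Y = 0 },
    //     LowerRight = new Point { X = 5, Y = 5 } };   (hypothetical syntax)
    static Rectangle expandInitializer() {
        Rectangle rTmp = new Rectangle();  // outer object first
        Point ulTmp = new Point();         // first member initializer
        ulTmp.setX(0);
        ulTmp.setY(0);
        rTmp.setUpperLeft(ulTmp);
        Point lrTmp = new Point();         // second member initializer
        lrTmp.setX(5);
        lrTmp.setY(5);
        rTmp.setLowerRight(lrTmp);
        Rectangle r = rTmp;                // assigned once fully built
        return r;
    }
}
```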
Note that along with object initializers, C# 3 also introduces a similar syntax for initializing arrays and collections of various forms; this is more fully documented in the C# 3.0 Language Specification that ships with the PDC Preview bits, but lexically looks pretty similar to object initializers, so I'll just refer you to that document for details.
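For comparison, Java's array initializers already cover arrays, but Java has no equivalent syntax for collections. Assuming a collection initializer such as new List<int> { 1, 2, 3 } reduces to one Add call per element on a freshly constructed collection (the specification has the exact rules), the Java expansion would look something like this sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: collection-initializer desugaring, assuming each element becomes one
// Add call on a freshly constructed collection held in a temporary.
// Class and method names are hypothetical.
class CollectionInitializerSketch {
    // List<int> list = new List<int> { 1, 2, 3 };   (C# 3.0 syntax, for comparison)
    static List<Integer> expandInitializer() {
        List<Integer> tmp = new ArrayList<Integer>();
        tmp.add(1);  // one add call per element, in source order
        tmp.add(2);
        tmp.add(3);
        List<Integer> list = tmp;
        return list;
    }
}
```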
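Combine object initializers with implicitly typed locals and you can drop the type name entirely, creating an anonymous type:

var x = new { UpperLeft = new Point { X = 0, Y = 0 }, LowerRight = new Point { X = 5, Y = 5 } };

For this, the compiler generates a class along these lines (its real name is chosen by the compiler, hence the placeholder):

class __This_Name_Really_Doesnt_Matter
{
  private Point _Field1;
  private Point _Field2;

  public Point UpperLeft { get { return _Field1; } set { _Field1 = value; } }
  public Point LowerRight { get { return _Field2; } set { _Field2 = value; } }

  public override bool Equals(object rhs) { ... }
  public override string ToString() { ... }
  public override int GetHashCode() { ... }
}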
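In Java, the same generated class would look something like the following sketch. Since the listing above elides the bodies of Equals, ToString and GetHashCode, I'm assuming here that they compare, hash and print property by property. The class name is borrowed from the listing; everything else is illustrative:

```java
import java.util.Objects;

// Sketch: a hand-written Java stand-in for the class generated for an
// anonymous type. Assumption: equals/hashCode/toString work property by
// property, which the elided bodies in the C# listing do not spell out.
class AnonymousTypeSketch {
    // Minimal immutable Point with value equality, so the comparison is meaningful.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
        @Override public String toString() { return "{ X = " + x + ", Y = " + y + " }"; }
    }

    // Name borrowed from the listing above; the real one is compiler-chosen.
    static final class __This_Name_Really_Doesnt_Matter {
        private Point field1;
        private Point field2;

        Point getUpperLeft() { return field1; }
        void setUpperLeft(Point value) { field1 = value; }
        Point getLowerRight() { return field2; }
        void setLowerRight(Point value) { field2 = value; }

        @Override public boolean equals(Object rhs) {
            if (!(rhs instanceof __This_Name_Really_Doesnt_Matter)) return false;
            __This_Name_Really_Doesnt_Matter other = (__This_Name_Really_Doesnt_Matter) rhs;
            return Objects.equals(field1, other.field1) && Objects.equals(field2, other.field2);
        }
        @Override public int hashCode() { return Objects.hash(field1, field2); }
        @Override public String toString() {
            return "{ UpperLeft = " + field1 + ", LowerRight = " + field2 + " }";
        }
    }

    // var x = new { UpperLeft = ..., LowerRight = ... };  expands to roughly:
    static __This_Name_Really_Doesnt_Matter makeX() {
        __This_Name_Really_Doesnt_Matter tmp = new __This_Name_Really_Doesnt_Matter();
        tmp.setUpperLeft(new Point(0, 0));
        tmp.setLowerRight(new Point(5, 5));
        return tmp;
    }
}
```

Two separately built instances with the same property values then compare equal, which is what makes such generated classes usable as lightweight value holders.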
Another significant addition to the C# 3.0 language will be extension methods, whereby one class can lexically "inject" methods into another class by declaring a specific form of static method on a static class. Once again, it will become clear that this is all just compiler syntactic sugar, and once again it will play a significant role in DLinq.
To declare an extension method, create a static class (new in C# 2.0, a static class is one that can never be instantiated; in many respects, it formalizes the old procedural-library concept from C or Pascal) containing a static method as usual, with one minor difference: to make the method an extension method, mark its first parameter with an additional modifier, the this keyword, to indicate the type the method extends.
This is a bit confusing, but bear with me--a few examples will make it clearer.
From time to time, every object programmer has lamented the inability to "slip in" functionality on a base class they do not control--one of the classes from the Framework Class Library, perhaps, or a class that comes out of a commercial third-party library to which they do not own the source. (Even open-source projects are resistant to this kind of injected change, because forking an open-source project is not a task undertaken lightly--you will have to make the same changes to every successive version of the library, an unenviable task.) Using an extension method, the compiler will effectively "pretend" that the extension method is declared on that class, and allow for invocation of the extension method as an instance method of the object.
Begin with a basic class, perhaps our Point class from before:
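public class Point
{
  int x, y;

  public int X { get { return x; } set { x = value; } }
  public int Y { get { return y; } set { y = value; } }
}

Now suppose we want Point to be able to render itself as XML, but Point comes from a library we don't control. Extension methods offer a way out:

namespace Extender
{
  public static class XMLUtil
  {
    public static string ToXML(this Point pt)
    {
      // Imagine cool XML code here; this placeholder just builds a trivial element
      return "<Point X=\"" + pt.X + "\" Y=\"" + pt.Y + "\" />";
    }
  }
}

Client code brings the extension method into scope with a using directive for the Extender namespace, and can then call it as though it were an instance method of Point:

using System;
using Extender;   // at the top of the client's source file

Point p = new Point { X = 0, Y = 1 };
Console.WriteLine("p.ToXML() = {0}", p.ToXML());

The generated CIL shows there is no magic involved: the "instance" call is just a static call to Extender.XMLUtil::ToXML(class Point), with the Point passed as its argument:

.method private hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size       45 (0x2d)
  .maxstack  2
  .locals init (class Point V_0,
           class Point V_1)
  IL_0000:  nop
  IL_0001:  nop
  IL_0002:  newobj     instance void Point::.ctor()
  IL_0007:  stloc.1
  IL_0008:  ldloc.1
  IL_0009:  ldc.i4.0
  IL_000a:  callvirt   instance void Point::set_X(int32)
  IL_000f:  nop
  IL_0010:  ldloc.1
  IL_0011:  ldc.i4.1
  IL_0012:  callvirt   instance void Point::set_Y(int32)
  IL_0017:  nop
  IL_0018:  ldloc.1
  IL_0019:  nop
  IL_001a:  stloc.0
  IL_001b:  ldstr      "p.ToXML() = {0}"
  IL_0020:  ldloc.0
  IL_0021:  call       string Extender.XMLUtil::ToXML(class Point)
  IL_0026:  call       void [mscorlib]System.Console::WriteLine(string, object)
  IL_002b:  nop
  IL_002c:  ret
} // end of method Program::Main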
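The same trick needs nothing from the JVM either: a Java compiler would simply rewrite p.toXML() into a static call. A minimal sketch, with illustrative names standing in for Extender.XMLUtil and Point:

```java
// Sketch: what a hypothetical Java extension-method syntax would compile to.
// The "extension" is an ordinary static method whose first parameter is the
// extended type, and the call site becomes a static call, just as the CIL
// shows a call to Extender.XMLUtil::ToXML(class Point). Names are illustrative.
class ExtensionMethodSketch {
    static class Point {
        private final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int getX() { return x; }
        int getY() { return y; }
    }

    // Stand-in for the static class Extender.XMLUtil.
    static final class XMLUtil {
        private XMLUtil() {}  // like a C# static class: never instantiated

        static String toXML(Point pt) {
            return "<Point X=\"" + pt.getX() + "\" Y=\"" + pt.getY() + "\" />";
        }
    }

    static String demo() {
        Point p = new Point(0, 1);
        // Source (hypothetical syntax):  p.toXML()
        // After desugaring:              XMLUtil.toXML(p)
        return XMLUtil.toXML(p);
    }

    public static void main(String[] args) {
        // prints: p.toXML() = <Point X="0" Y="1" />
        System.out.println("p.toXML() = " + demo());
    }
}
```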
Of course, extension methods introduce some interesting method-overload-resolution rules, such as when an extension method clashes with a method on the extended type (the type's own instance method wins) or when two extension methods with the same name and signature are both brought in via using directives (in which case the "most nested" using directive, inside namespace declarations, wins). These rules are likely to change as feedback on the PDC bits filters in, so if you're going to bet the farm on this particular aspect of the language (pun intended), make sure to keep up with the latest C# 3.0 specification changes as well.
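To make the first of those rules concrete, here is a toy Java model of the lookup order: look for an applicable instance method on the receiver's type first, and fall back to the extension method only if there is none. A real compiler does this lookup at compile time; reflection merely stands in for it here, and the model only handles no-argument methods. All names are hypothetical:

```java
import java.lang.reflect.Method;

// Toy model of "the type's own instance method wins": reflection stands in
// for the compiler's compile-time member lookup, and only no-argument
// methods are modeled. All names are hypothetical.
class ExtensionResolutionSketch {
    static class Point {
        public String toXML() { return "<Point/>"; }  // the type's own method
    }

    static class Circle { }  // declares no toXML at all

    static String resolve(Class<?> receiverType, String methodName) {
        try {
            // getMethod finds public no-arg methods, including inherited ones.
            Method m = receiverType.getMethod(methodName);
            return "instance method " + m.getDeclaringClass().getSimpleName() + "." + methodName;
        } catch (NoSuchMethodException e) {
            // No instance candidate, so the extension method is used.
            return "extension method XMLUtil." + methodName;
        }
    }
}
```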
Note also that, as of this writing, the documentation for the PDC Preview bits carries this note:
Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible. ... Extension members of other kinds, such as properties, events, and operators, are being considered but are currently not supported.
The lambda expression, long a favorite of Lisp programmers, has come to C#. While the .NET platform has always had the capability to create delegates, which are essentially managed function pointers, and while delegates could always be used as a poor man's substitute for lambda expressions, former Lisp programmers have always yearned to see real lambda expressions in their favorite .NET language. Anders heard the call, and answered: where C# 2.0 introduced anonymous methods (often called anonymous delegates), method bodies written inline that the compiler turns into a generated method, hosted on a compiler-generated class when the body captures local variables, C# 3.0 introduces lambda expressions, the ability to define a method body (or, more accurately, just a block of code) in a fairly terse and elegant way. Lambda expressions are probably the hardest part of the C# 3.0 specification to grok if you've not seen them before, however, so be prepared to spend a little time with them before it all makes intuitive sense.
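The compiler-generated class behind a capturing anonymous method is easy to write by hand. Here is a Java sketch of roughly what it amounts to when an anonymous method captures a variable (here, a parameter): the captured variable moves into a field, and the body becomes the class's one method. The names are hypothetical, and the real generated class differs in detail:

```java
// Sketch: a hand-written equivalent of the class generated for an anonymous
// method that captures a variable. All names are hypothetical.
class ClosureSketch {
    // Plays the role of a delegate type: one method, one signature.
    interface IntFunc {
        int invoke(int arg);
    }

    // The generated class: the captured variable lives in a field, and the
    // anonymous method body becomes the class's single method.
    static final class MultiplierClosure implements IntFunc {
        int factor;

        @Override
        public int invoke(int arg) {
            return arg * factor;
        }
    }

    // C# 2.0 source, roughly:
    //   Func<int, int> f = delegate(int i) { return i * factor; };
    static IntFunc makeMultiplier(int factor) {
        // The captured parameter is moved into the closure at method entry;
        // from here on, every use goes through closure.factor.
        MultiplierClosure closure = new MultiplierClosure();
        closure.factor = factor;
        return closure;
    }
}
```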
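In essence, a lambda expression follows the pattern laid down by a delegate, so to begin we declare a delegate type to which lambda expressions can be assigned; the PDC preview documentation, for example, uses this one:

delegate R Func<A, R>(A arg);

The traditional way to create an instance of that delegate is to point it at a named method:

Func<int, int> f1 = new Func<int, int>(MyClass.MyMethodTakingAnIntAndReturningAnInt);

C# 2.0 lets us write the method body inline, as an anonymous method:

Func<int, int> f1 = delegate(int i) { return i * i; };

and C# 3.0 lets us write the same thing as a lambda expression, with the parameter type inferred from the delegate type:

Func<int, int> f1 = x => x * x;

However f1 was created, invoking it looks like any other method call:

Console.WriteLine(f1(12)); // prints 144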
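Java has had the ingredients for all of this since anonymous inner classes arrived in Java 1.1. Here is a sketch, assuming Func<A, R> maps onto a single-method generic Java interface; the names are illustrative, and the anonymous classes are what a hypothetical Java lambda syntax would desugar to:

```java
// Sketch: the Func<A, R> delegate modeled as a single-method Java interface,
// with the named-method and anonymous-method/lambda forms desugared by hand.
// All names are illustrative.
class LambdaSketch {
    // delegate R Func<A, R>(A arg);
    interface Func<A, R> {
        R invoke(A arg);
    }

    // Stand-in for MyClass.MyMethodTakingAnIntAndReturningAnInt.
    static int square(int i) {
        return i * i;
    }

    // new Func<int, int>(MyClass.MyMethod...): wrap the named method.
    static final Func<Integer, Integer> NAMED = new Func<Integer, Integer>() {
        public Integer invoke(Integer arg) { return square(arg); }
    };

    // delegate(int i) { return i * i; }  and  x => x * x
    // both come down to a class whose one method is the body.
    static final Func<Integer, Integer> LAMBDA = new Func<Integer, Integer>() {
        public Integer invoke(Integer x) { return x * x; }
    };

    public static void main(String[] args) {
        System.out.println(NAMED.invoke(12));   // prints 144
        System.out.println(LAMBDA.invoke(12));  // prints 144
    }
}
```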
The last of the new features of C# 3.0, the query language features, isn't really a language feature per se but a close integration between the compiler and the library support it expects to compile against, and as such doesn't really qualify as "language innovation", in my opinion. That said, it's damn useful, and what's more interesting, Java already has a tool that can provide this kind of capability: the OpenJava compiler tool (from the same folks who brought you Javassist, the bytecode manipulation tool at the heart of JBoss, among other open-source projects), which gives you full metaobject protocol capabilities, including the ability to add new keywords to the language.
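To see why the query features lean so heavily on extension methods and lambdas, here is a hand-desugared Java sketch. It assumes the translation the C# 3.0 specification describes, in which a query such as from p in points where p.X > 0 select p.Y becomes points.Where(p => p.X > 0).Select(p => p.Y). The where/select helpers and the interfaces are hypothetical stand-ins for the library side:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: a query expression reduced by hand to ordinary method calls.
// Assumed translation:
//   from p in points where p.X > 0 select p.Y
//   ==> points.Where(p => p.X > 0).Select(p => p.Y)
// The helper names below are hypothetical.
class QuerySketch {
    interface Predicate<T> { boolean test(T t); }
    interface Selector<T, R> { R apply(T t); }

    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Library side: Where and Select as plain static methods.
    static <T> List<T> where(List<T> source, Predicate<T> predicate) {
        List<T> result = new ArrayList<T>();
        for (T item : source) {
            if (predicate.test(item)) result.add(item);
        }
        return result;
    }

    static <T, R> List<R> select(List<T> source, Selector<T, R> selector) {
        List<R> result = new ArrayList<R>();
        for (T item : source) result.add(selector.apply(item));
        return result;
    }

    // Compiler side: the query expression after translation, with each
    // lambda spelled out as an anonymous class.
    static List<Integer> query(List<Point> points) {
        return select(
            where(points, new Predicate<Point>() {
                public boolean test(Point p) { return p.x > 0; }
            }),
            new Selector<Point, Integer>() {
                public Integer apply(Point p) { return p.y; }
            });
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(new Point(0, 1), new Point(2, 3), new Point(4, 5));
        System.out.println(query(points));  // prints [3, 5]
    }
}
```

One simplification: these helpers build their result lists eagerly, whereas the library's Where and Select defer execution until the results are enumerated.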
And that's ultimately my point here: as you've seen, nothing that C# 3.0 introduces is really all that revolutionary once we get past the compiler. Even extension methods and lambda expressions are defined in terms of what's already present within the language and framework, making the entire exercise one in compiler syntactic sugar. Very sweet, very addictive sugar, perhaps, but syntactic sugar nonetheless. And yet, because these features are built in terms of the CLR, we keep full-fidelity static typing even through the syntactic sugar (unlike what happens with Java generics, whose type arguments are erased at compile time).
For ten years, Sun has insisted that the Java language and the Java Virtual Machine must remain in lockstep, and as a result language innovation in Java has either stagnated completely (the only real language innovation in Java 5 was the custom annotations model, and that was almost a direct copy of what .NET had done before) or else occurred outside of Sun's, and therefore "official Java"'s, boundaries. Sun needs to realize that the strength of the JVM far exceeds the limited potential of the Java language, and if they don't want to watch Java's popularity begin a steady decline, they need to cut the umbilical cord, let the JVM run free, and let the language innovation truly begin. Otherwise, it's looking like a very CLR world ahead of us.