 Wednesday, September 21, 2005
Language Innovation: C# 3.0 explained

For those in the Java community who've heard brief rumors about the proposed feature set of C# 3.0 announced last week at PDC, let me be the first to point out that nothing proposed for the language couldn't be done in the Java language or on top of the JVM. (The one exception is generics, which Microsoft did right in C# 2.0 by integrating them into the virtual machine rather than taking the type-erasure-based approach that Java chose.) In fact, most of the features of C# 3.0 are, arguably, nothing but syntactic sugar designed to make programming more productive. What I plan to do here is explain each of the features of C# 3.0, show how each is implemented by examining the generated CIL (at least as of the preview Microsoft handed out at PDC), and by doing so demonstrate how Java could be extended in turn to support exactly the same sorts of features.

Standard disclaimer applies: all of this is based on the PDC preview of C# 3.0, no guarantees or warranties implied, use at your own risk, yadda yadda yadda. In short, if you install it, and it blows up your hard drive, it's your own fault. :-)

Implicitly typed variables

For starters, C# 3.0 will support implicitly typed local variables, meaning that programmers can now write code in a more "ignorant" fashion--programmers need not worry so much about getting the types exactly correct when working with local variables:

var i = 5;
var s = "This is an implicitly typed local variable";
var a = new int[] { 1, 2, 3 };
It's important to realize here that these are not "var" types in the JavaScript sense, but are in fact statically-typed references whose type is inferred by the compiler instead of explicitly declared by the programmer; in essence, the code that's generated is the same as if we'd written:
int i = 5;
string s = "This is an implicitly typed local variable";
int[] a = new int[] { 1, 2, 3 };
We can verify this by running the code through the C# compiler and examining the resulting IL:
.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       28 (0x1c)
  .maxstack  3
  .locals init (int32 V_0,
           string V_1,
           int32[] V_2)
  IL_0000:  nop
  IL_0001:  ldc.i4.5
  IL_0002:  stloc.0
  IL_0003:  ldstr      "This is an implicitly typed local variable"
  IL_0008:  stloc.1
  IL_0009:  ldc.i4.3
  IL_000a:  newarr     [mscorlib]System.Int32
  IL_000f:  dup
  IL_0010:  ldtoken    field valuetype 
'{E4ADF86B-1985-4CA3-90AF-B705A8279423}'/'__StaticArrayInitTypeSize=12' 
'{E4ADF86B-1985-4CA3-90AF-B705A8279423}'::'$$method0x6000001-1'
  IL_0015:  call       
    void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array,
        valuetype [mscorlib]System.RuntimeFieldHandle)
  IL_001a:  stloc.2
  IL_001b:  ret
} // end of method Sample::Main
Notice the .locals directive? For those not familiar with IL, that's the declaration of the local variables in the method, and as you can see, the three locals (named V_0, V_1 and V_2) are declared to be of type int32, string and int32[], respectively--the compiler inferred those type values from the literals assigned to them. Which means, correspondingly, since the compiler has to infer the type values, we can't have an implicitly typed local variable without some sort of hint as to what type it should be--therefore, no uninitialized "var" types are allowed.
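The same point can be checked from inside the program rather than from the IL. What follows is a minimal sketch, assuming a released C# 3.0 (or later) compiler rather than the PDC preview; the class and method names (VarDemo, InferredTypeNames) are made up for illustration.

```csharp
using System;

public static class VarDemo
{
    // Returns the runtime type names of three implicitly typed locals.
    public static string[] InferredTypeNames()
    {
        var i = 5;
        var s = "This is an implicitly typed local variable";
        var a = new int[] { 1, 2, 3 };

        // i = "five";  // would not compile: i is statically an int
        // var u;       // would not compile: no initializer to infer a type from

        return new string[]
        {
            i.GetType().FullName,
            s.GetType().FullName,
            a.GetType().FullName
        };
    }

    public static void Main()
    {
        // Prints System.Int32, System.String and System.Int32[], one per line.
        foreach (string name in InferredTypeNames())
            Console.WriteLine(name);
    }
}
```

The two commented-out lines are the interesting part: uncommenting either one should produce a compile-time error, which is exactly what a JavaScript-style "var" would not do.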

It may seem odd and a trivial feature to add, but this will turn out to be a profound feature of the language when coupled with object initializers, next.

Object initializers

One of the more annoying aspects of C# (and Java, or C++ for that matter) is that we end up having to write a lot of redundant code, much of it expressing the same basic idea over and over. One such area of redundancy is constructors--far too often, we write classes whose constructors do the most basic thing a constructor can do, which is of course to initialize its fields to their desired values. Object initializer syntax allows for simple initialization of types without requiring an explicit constructor to be written:

public class Point
{
  int x; int y;

  public int X { get { return x; } set { x = value; } }
  public int Y { get { return y; } set { y = value; } }
}

Point p = new Point { X = 0, Y = 1 };
Again, what gets compiled here is precisely what the client would write, given that there is no constructor for Point:
Point p = new Point();
p.X = 0;
p.Y = 1;
Verifying this in CIL is pretty easy:
.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       28 (0x1c)
  .maxstack  2
  .locals init (class Point V_0,
           class Point V_1)
  IL_0000:  nop
  IL_0001:  nop
  IL_0002:  newobj     instance void Point::.ctor()
  IL_0007:  stloc.1
  IL_0008:  ldloc.1
  IL_0009:  ldc.i4.0
  IL_000a:  callvirt   instance void Point::set_X(int32)
  IL_000f:  nop
  IL_0010:  ldloc.1
  IL_0011:  ldc.i4.1
  IL_0012:  callvirt   instance void Point::set_Y(int32)
  IL_0017:  nop
  IL_0018:  ldloc.1
  IL_0019:  nop
  IL_001a:  stloc.0
  IL_001b:  ret
} // end of method Program::Main
The nops are interesting, but irrelevant to our discussion (they'll get optimized away by the JITter at runtime, anyway). The interesting part of this is the sequence of instructions at 0002, 000a, and 0012: newobj to create the Point instance, then callvirt set_X and callvirt set_Y to set the X and Y properties, respectively. (In C#, the property construct basically maps to compiler-generated get_ and set_ methods.)

And this isn't limited to primitive type fields, either; we can do the same for complex fields, as in:

public class Rectangle
{
  Point p1; Point p2;

  public Point UpperLeft { get { return p1; } set { p1 = value; } }
  public Point LowerRight { get { return p2; } set { p2 = value; } }
}

Rectangle r = new Rectangle { 
    UpperLeft = new Point { X = 0, Y = 0 },
    LowerRight = new Point { X = 5, Y = 5 }
};
Verifying that this is similar IL to the Point example above is left as an exercise to the reader. (Which is to say, it's there, but it's a bit long and doesn't really prove much; trust me on this.)
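For readers who would rather not wade through the IL, here is a sketch of the nested case in source form. It assumes a released C# 3.0 (or later) compiler; InitializerDemo, WithInitializer and ByHand are hypothetical names, and the hand-written version is only roughly what the compiler emits (the real output uses unnamed temporaries).

```csharp
using System;

public class Point
{
    int x; int y;

    public int X { get { return x; } set { x = value; } }
    public int Y { get { return y; } set { y = value; } }
}

public class Rectangle
{
    Point p1; Point p2;

    public Point UpperLeft { get { return p1; } set { p1 = value; } }
    public Point LowerRight { get { return p2; } set { p2 = value; } }
}

public static class InitializerDemo
{
    // The nested initializer form from the text.
    public static Rectangle WithInitializer()
    {
        return new Rectangle {
            UpperLeft = new Point { X = 0, Y = 0 },
            LowerRight = new Point { X = 5, Y = 5 }
        };
    }

    // Roughly the expansion: default constructors followed by property setters.
    public static Rectangle ByHand()
    {
        Rectangle r = new Rectangle();
        Point t1 = new Point();
        t1.X = 0;
        t1.Y = 0;
        r.UpperLeft = t1;
        Point t2 = new Point();
        t2.X = 5;
        t2.Y = 5;
        r.LowerRight = t2;
        return r;
    }

    public static void Main()
    {
        Rectangle a = WithInitializer();
        Rectangle b = ByHand();
        // Prints True: both routes produce the same property values.
        Console.WriteLine(a.LowerRight.X == b.LowerRight.X && a.LowerRight.Y == b.LowerRight.Y);
    }
}
```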

Note that along with object initializers, C# 3 also introduces a similar syntax for initializing arrays and collections of various forms; this is more fully documented in the C# 3.0 Language Specification that ships with the PDC Preview bits, but lexically looks pretty similar to object initializers, so I'll just refer you to that document for details.
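As a taste of what that looks like, here is a minimal sketch of a collection initializer. It uses the syntax as it eventually shipped in released C# 3.0, which may differ in detail from the PDC Preview bits; CollectionInitDemo and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionInitDemo
{
    public static List<int> Build()
    {
        // Compiles to a parameterless constructor call
        // followed by Add(1), Add(2), Add(3).
        return new List<int> { 1, 2, 3 };
    }

    public static void Main()
    {
        List<int> list = Build();
        Console.WriteLine(list.Count); // prints 3
    }
}
```

As with object initializers, nothing here requires runtime support: the compiler simply writes the Add calls for you.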

Anonymous types

Combining the above two features brings us to an interesting conclusion: if we are teaching the compiler to infer static type information and provide some basic defaults for types, then we can actually expect some fairly interesting intuition on the part of the compiler now--in particular, the compiler is now smart enough to be able to infer an entire type during compilation. Thanks to the object-initializer syntax (to provide the necessary constructor capabilities) and the implicitly-typed local variable syntax (to be able to avoid having to name the type), we can write the following and expect a statically-typed class out of it:
var x = new { UpperLeft = new Point { X = 0, Y = 0 }, LowerRight = new Point { X = 5, Y = 5 } };
Again, thanks to the initializer syntax, the compiler now has enough information to be able to auto-generate the following:
class __This_Name_Really_Doesnt_Matter
{
  private Point _Field1;
  private Point _Field2;

  public Point UpperLeft { get { return _Field1; } set { _Field1 = value; } }
  public Point LowerRight { get { return _Field2; } set { _Field2 = value; } }

  public override bool Equals(object rhs) { ... }
  public override string ToString() { ... }
  public override int GetHashCode() { ... }
}
which, if you think about it, is pretty cool. Project DLinq, the relational access project Microsoft introduced at PDC, will use this to address the partial query problem that plagues automated object-relational mapping layers, as now we can introduce new types into the system (as return types from an ad-hoc query) in just a line or two of code, rather than the twenty or so that would otherwise be required.
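A short sketch makes the "statically-typed class" claim concrete. It assumes a released C# 3.0 (or later) compiler, where anonymous-type properties ended up read-only rather than get/set as in the sketch above; AnonymousTypeDemo and its method names are hypothetical.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Two anonymous objects with the same property names and types, in the
    // same order, share one compiler-generated class within an assembly.
    public static bool SameShapeSameType()
    {
        var a = new { Name = "origin", X = 0, Y = 0 };
        var b = new { Name = "origin", X = 0, Y = 0 };
        return a.GetType() == b.GetType();
    }

    // The generated Equals compares property values, not references.
    public static bool ValueEquality()
    {
        var a = new { Name = "origin", X = 0, Y = 0 };
        var b = new { Name = "origin", X = 0, Y = 0 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    public static void Main()
    {
        var p = new { Name = "origin", X = 0, Y = 0 };
        // p.Z would not compile: the members are known statically.
        Console.WriteLine(p.Name + " " + p.X + " " + p.Y);
        Console.WriteLine(SameShapeSameType()); // prints True
        Console.WriteLine(ValueEquality());     // prints True
    }
}
```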

Extension methods

Another significant addition to the C# 3.0 language will be extension methods, whereby one class can lexically "inject" methods into another class by declaring a specific form of static method on a static class. Once again, it will quickly become clear that this is all just compiler syntactic sugar, and once again it will play a significant role in DLinq.

To declare an extension method, create a static class (a new feature of C# 2.0, a static class is a class that can never be instantiated--in many respects, it is a formalization of the old procedural library concept from C or Pascal) that contains a static method as usual, but with one minor difference. To make this method an extension method, declare the first parameter to have an additional modifier, the this keyword, to indicate the type to which this method will extend.

This is a bit confusing, but bear with me--a few examples will make it clearer.

From time to time, every object programmer has lamented the inability to "slip in" functionality on a base class they do not control--one of the classes from the Framework Class Library, perhaps, or a class that comes out of a commercial third-party library to which they do not own the source. (Even open-source projects are resistant to this kind of injected change, because forking an open-source project is not a task undertaken lightly--you will have to make the same changes to every successive version of the library, an unenviable task.) Using an extension method, the compiler will effectively "pretend" that the extension method is declared on that class, and allow for invocation of the extension method as an instance method of the object.

Begin with a basic class, perhaps our Point class from before:

public class Point
{
  int x, y;

  public int X { get { return x; } set { x = value; } }
  public int Y { get { return y; } set { y = value; } }
}
As we work with the Point class, however, it becomes obvious that Point doesn't provide some form of critical functionality--perhaps it doesn't support native translation to and from XML, for example. (The fact that XmlSerializer will provide that functionality for a type this simple is irrelevant for now; substitute your own favorite example, if you prefer.) What we'd like to do is "slip in" a pair of methods, ToXML and FromXML, that produce and take a string, respectively. Unfortunately, Point is not under our control, and although we could decompile it to C# and recompile (which won't work with strongly-named assemblies), that's obviously a hack.

Extension methods offer a way out:

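using System;

namespace Extender
{
  public static class XMLUtil
  {
    // The "this" modifier on the first parameter makes ToXML an extension method on Point.
    public static string ToXML(this Point pt)
    {
      Console.WriteLine("Imagine cool XML code here");
      return "<Point />"; // placeholder result so the method compiles
    }
  }
}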
To "kick in" an extension method (or, perhaps more appropriately, a set of extension methods), we need only reference the namespace in which the extensions are declared with a using directive, as we would otherwise do for a normal class. This tells the compiler that the extension methods are now lexically "in" the class' interface, and are available for use. Now we can use the ToXML method on a Point instance directly, as shown below:
Point pt = new Point { X = 0, Y = 1 };
Console.WriteLine("p.ToXML() = {0}", pt.ToXML());
A horrendous violation of encapsulation? Not particularly--notice what the C# compiler will do with this call:
.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       45 (0x2d)
  .maxstack  2
  .locals init (class Point V_0,
           class Point V_1)
  IL_0000:  nop
  IL_0001:  nop
  IL_0002:  newobj     instance void Point::.ctor()
  IL_0007:  stloc.1
  IL_0008:  ldloc.1
  IL_0009:  ldc.i4.0
  IL_000a:  callvirt   instance void Point::set_X(int32)
  IL_000f:  nop
  IL_0010:  ldloc.1
  IL_0011:  ldc.i4.1
  IL_0012:  callvirt   instance void Point::set_Y(int32)
  IL_0017:  nop
  IL_0018:  ldloc.1
  IL_0019:  nop
  IL_001a:  stloc.0
  IL_001b:  ldstr      "p.ToXML() = {0}"
  IL_0020:  ldloc.0
  IL_0021:  call       string Extender.XMLUtil::ToXML(class Point)
  IL_0026:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                object)
  IL_002b:  nop
  IL_002c:  ret
} // end of method Program::Main
The giveaway is at instruction 0021: the C# compiler is actually generating a standard static method call on Extender.XMLUtil::ToXML, passing in the Point instance in question for manipulation and examination by the extension method. (This is why decorating the first parameter with "this" makes sense: it's conceptually the "this" reference normally implicit in an instance method.) No violation of encapsulation whatsoever. In fact, the extension method has zero access to non-public members of Point, thus avoiding one of the principal concerns over aspects voiced by critics of AOP, that of managing state in aspects and/or across classes and aspects. But for all other purposes, this is aspect-oriented programming in the grand tradition of AspectJ, just with a very limited pointcut capability. (It would be trivial to write the corresponding AspectJ aspect to my ToXML method above, but I'll leave that for Ron Bodkin, Nick Lesiecki or Ramnivas Laddad--or anyone else passingly familiar with AspectJ--to contribute on their own blogs. :-) )
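The equivalence the IL reveals can also be written out directly in source. This is a minimal sketch, assuming a released C# 3.0 (or later) compiler; the XML format returned by ToXML is invented purely for illustration, and ExtensionCallDemo and its method names are hypothetical.

```csharp
using System;

public class Point
{
    int x, y;

    public int X { get { return x; } set { x = value; } }
    public int Y { get { return y; } set { y = value; } }
}

public static class XMLUtil
{
    // Only public members of Point are reachable from here.
    public static string ToXML(this Point pt)
    {
        return "<point x=\"" + pt.X + "\" y=\"" + pt.Y + "\" />";
    }
}

public static class ExtensionCallDemo
{
    public static string ViaInstanceSyntax(Point pt)
    {
        return pt.ToXML();        // what we write
    }

    public static string ViaStaticSyntax(Point pt)
    {
        return XMLUtil.ToXML(pt); // what the compiler emits
    }

    public static void Main()
    {
        Point pt = new Point { X = 0, Y = 1 };
        // Prints True: both spellings are the same static call.
        Console.WriteLine(ViaInstanceSyntax(pt) == ViaStaticSyntax(pt));
    }
}
```

Trying to touch the private fields x or y from inside ToXML should fail to compile, which is the encapsulation guarantee in practice.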

Note that of course extension methods introduce some interesting method-overload-resolution rules, such as when the extension method clashes with a method on the extended type (the extended type wins) or when two extension methods of the same name and signature are both brought in via a using clause (in which case the "most nested" using expression, inside namespace declarations, wins). These rules are likely to change as feedback filters in on the released PDC bits, so if you're going to bet the farm on this particular aspect of the language (pun intended), make sure to keep up with the latest C# 3.0 specification changes as well.
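The first of those rules (the extended type wins) is easy to see in a small sketch. It assumes the rule as it stands in released C# 3.0 and later; Greeter, GreeterExtensions and ResolutionDemo are hypothetical names invented for the example.

```csharp
using System;

public class Greeter
{
    public string Hello() { return "instance"; }
}

public static class GreeterExtensions
{
    // Same name and signature as the instance method:
    // legal to declare, but never chosen by g.Hello().
    public static string Hello(this Greeter g) { return "extension"; }

    // No instance method of this name exists, so this one is picked up.
    public static string Goodbye(this Greeter g) { return "extension"; }
}

public static class ResolutionDemo
{
    public static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Hello());                  // prints instance
        Console.WriteLine(g.Goodbye());                // prints extension
        Console.WriteLine(GreeterExtensions.Hello(g)); // prints extension
    }
}
```

The shadowed extension is still reachable through ordinary static-call syntax, as the last line shows.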

Note also that as of this writing, the PDC Preview bits also come with this note in the documentation:

Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible. ... Extension members of other kinds, such as properties, events, and operators, are being considered but are currently not supported.
If you are a C# programmer and particularly desire those styles of operations, now's the time to let Microsoft know.

Lambda Expressions

The lambda expression, long a favorite of Lisp programmers, has come to C#. While the .NET platform has always had the capability to create delegates, which are essentially managed function pointers, and while delegates could always be used as a poor man's substitute for lambda expressions, former Lisp programmers have always had a yearning in their heart to see real lambda expressions in their favorite .NET language. Anders heard the call, and answered: where C# 2.0 introduced the ability to create anonymous delegates, method bodies that are implicitly converted into a class with a single method (the anonymous method itself), C# 3.0 introduces lambda expressions, the ability to define a method body--or, more accurately, just a block of code--in a fairly terse and elegant way. Lambda expressions are probably the hardest part of the C# 3.0 specification to grok if you've not seen them before, however, so be prepared to spend a little time with them before it all makes intuitive sense.

In essence, a lambda expression follows the pattern laid down by a delegate, so we begin by declaring a delegate type to which lambda expressions can be assigned; the PDC preview documentation, for example, uses this one:

delegate R Func<A, R>(A arg);
For those of you unfamiliar with delegates and generics syntax in use here, we are declaring a generic delegate type that, when constructed, will expect a single argument (the generic argument A) and return a value (the generic argument R). Thus, if we wanted to create an instance of Func around a method that takes an int and returns an int, the delegate instantiation syntax would normally look like:
Func<int, int> f1 = new Func<int, int>(MyClass.MyMethodTakingAnIntAndReturningAnInt);
But in the scenario where that method is a one-off, it's somewhat wasteful to have to write a complete method body inside of a class just for this. For example, if MyMethodTakingAnIntAndReturningAnInt is just multiplying the parameter by itself (a squaring function, in short), then it's a real waste of at least three or four lines of code to write it out as a formal, named method. This was where anonymous methods kicked in, so we could write it as:
Func<int, int> f1 = delegate(int i) { return i * i; };
But many feel that even this syntax is too unintuitive for casual use, so instead, in C# 3.0, a lambda expression can be used:
Func<int, int> f1 = x => x * x;
And, as with all delegates, once constructed, any of the three versions can be invoked using the same syntax:
Console.WriteLine(f1(12)); // prints 144
So, in essence, the lambda expression is an easier way to write a delegate. Or, perhaps more correctly, to write an expression body. The C# Preview docs describe lambdas as "a functional superset of anonymous methods, providing the following additional functionality:
  • "Lambda expressions permit parameter types to be omitted and inferred whereas anonymous methods require parameter types to be explicitly stated.
  • "The body of a lambda expression can be an expression or a statement block whereas the body of an anonymous method can only be a statement block.
  • "Lambda expressions passed as arguments participate in type argument inference and in method overload resolution.
  • "Lambda expressions with an expression body can be converted to expression trees."
But the documentation goes on to note that as of the PDC Preview, lambda expressions with a statement block body are not yet supported. (Hey, it's not even an alpha yet, you have to expect a few of those kinds of wrinkles.) For right now, if you want statement block body lambda expressions, the anonymous method delegate syntax has to be used.
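To put the three delegate spellings and the bulleted claims side by side, here is a minimal sketch. It assumes a released C# 3.0 (or later) compiler, where statement-block lambdas do work, so it will not build on the PDC Preview as written. The delegate is declared nested so it does not collide with the framework's own Func types; LambdaDemo, Square and Apply are hypothetical names.

```csharp
public static class LambdaDemo
{
    // The delegate shape used in the text, declared locally.
    public delegate R Func<A, R>(A arg);

    public static int Square(int i) { return i * i; }

    // Generic method whose type arguments are inferred from its arguments,
    // including the lambda's return type.
    public static R Apply<A, R>(A value, Func<A, R> f) { return f(value); }

    public static void Main()
    {
        Func<int, int> named = new Func<int, int>(Square);            // C# 1.0 style
        Func<int, int> anonymous = delegate(int i) { return i * i; }; // C# 2.0 style
        Func<int, int> lambda = x => x * x;                           // expression body
        Func<int, int> block = x => { int y = x * x; return y; };     // statement body

        System.Console.WriteLine(named(12));     // prints 144
        System.Console.WriteLine(anonymous(12)); // prints 144
        System.Console.WriteLine(lambda(12));    // prints 144
        System.Console.WriteLine(block(12));     // prints 144

        // A and R are both inferred as int; no type is written anywhere.
        System.Console.WriteLine(Apply(12, x => x * x)); // prints 144
    }
}
```

Note that the parameter type of x is never stated in the lambda forms; the compiler takes it from the delegate type on the left, or from the other argument in the Apply call.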

The last of the new features of C# 3, the query language features, isn't really a language feature per se, but a close integration of the compiler and expected library support it's compiling against, and as such doesn't really openly qualify as "language innovation", in my opinion. That said, though, it's damn useful, and what's more interesting, Java actually has a tool that can provide this kind of capability already--the OpenJava compiler tool (from the same folks that brought you Javassist, the bytecode manipulation tool that is at the heart of JBoss, among other open-source projects), which allows you full metaobject protocol capabilities, including the ability to add new keywords to the language.
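To see why the query features are more compiler-plus-library than language, consider this sketch, which writes the same query twice. It assumes the released System.Linq library rather than the PDC Preview's libraries, so names may not match the preview bits; QueryDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryDemo
{
    // Query syntax: the part that looks like a language feature.
    public static List<int> WithQuerySyntax(int[] source)
    {
        return (from n in source
                where n % 2 == 0
                select n * n).ToList();
    }

    // What the compiler turns it into: extension method calls taking lambdas.
    public static List<int> WithMethodCalls(int[] source)
    {
        return source.Where(n => n % 2 == 0)
                     .Select(n => n * n)
                     .ToList();
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", WithQuerySyntax(numbers))); // prints 4,16
        Console.WriteLine(string.Join(",", WithMethodCalls(numbers))); // prints 4,16
    }
}
```

The query form is nothing more than the extension methods and lambda expressions described earlier, arranged by the compiler.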

And that's ultimately my point here: as you've seen, nothing that C# 3.0 introduces is really all that revolutionary once we get past the compiler--even the extension methods and lambda expressions are defined in terms of what's already present within the language and framework, making the entire exercise one in compiler syntactic sugar. Very sweet, very addictive sugar, perhaps, but just syntactic sugar nonetheless. And yet, because these features are still built in terms of the CLR, it means that we have full fidelity static-typing, even through the syntactic sugar (unlike what happens in the case of Java generics).

For ten years, Sun has insisted that Java Language and Java Virtual Machine must remain in lockstep, and as a result the language innovation in Java has either completely stagnated (the only real language innovation in Java 5 was the custom annotations model, and that was almost a direct copy of what .NET had done before), or else occurred outside of Sun's--and therefore "official Java"'s--boundaries. Sun needs to realize that the strength of the JVM by far exceeds the limited language potential of the Java language, and if they don't want to watch Java's popularity begin a steady decline, they need to cut the umbilical and let the JVM run free and the language innovation truly begin. Otherwise, it's looking like a very CLR world ahead of us.


Java/J2EE | .NET | C++

Wednesday, September 21, 2005 6:33:38 PM (Pacific Daylight Time, UTC-07:00)