 Thursday, February 21, 2008
Static considered harmful?

Gilad makes the case that static, that staple of C++, C#/VB.NET, and Java, does not belong:

Most imperative languages have some notion of static variable. This is unfortunate, since static variables have many disadvantages. I have argued against static state for quite a few years (at least since the dawn of the millennium), and in Newspeak, I’m finally able to eradicate it entirely.

I think Gilad conflates a few things, but he's also got some good points. To the dissecting table!

To begin:

Static variables are bad for security. See the E literature for extensive discussion on this topic. The key idea is that static state represents an ambient capability to do things to your system, that may be taken advantage of by evildoers.

Eh... I'm not sure I buy into this. For evildoers to be able to change static state, they have to have some kind of "poke" access inside the innards of your application, and if they have that, then just about anything is vulnerable. Now, granted, I haven't spent a great deal of time on the E literature, so maybe I'm missing the point here, but if an attacker can manipulate data inside my program, then I'm in a whole world of pain, whether he's attacking statics or instances. Having said that, statics have to be stored in a particular well-known location inside the process, so maybe that makes them a touch more vulnerable. Still, this seems a specious argument.
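Here is a minimal Java sketch of the "ambient capability" idea Gilad refers to. All class and method names are invented for illustration. Any code that can name `GlobalAudit` can switch auditing off. The capability-style `AuditLog`, by contrast, can only be disabled by code that holds a reference to it, and the plugin below receives only a `Recorder` (a method reference, not the log itself). Java does not enforce this discipline, since reflection and other statics still exist, so treat this as an illustration of the style rather than a security mechanism.

```java
import java.util.ArrayList;
import java.util.List;

public class AmbientAuthority {
    // Ambient authority: any code that can name this class can flip the flag.
    static class GlobalAudit {
        static boolean enabled = true;
    }

    // Narrow capability: the only thing a holder can do is record an event.
    interface Recorder { void record(String event); }

    // Capability-style audit log: turning it off requires a reference to this object.
    static final class AuditLog {
        private boolean enabled = true;
        private final List<String> entries = new ArrayList<>();

        void record(String event) { if (enabled) entries.add(event); }
        void disable() { enabled = false; }
        boolean isEnabled() { return enabled; }
        List<String> entries() { return entries; }
    }

    // Hypothetical third-party code that was handed only a Recorder.
    static void plugin(Recorder recorder) {
        GlobalAudit.enabled = false;   // ambient: nothing in the type system prevents this
        recorder.record("plugin ran"); // capability: it can record, but holds no way to disable
    }

    public static void main(String[] args) {
        AuditLog log = new AuditLog();
        plugin(log::record);                     // hand over a method reference, not the AuditLog
        System.out.println(GlobalAudit.enabled); // false
        System.out.println(log.isEnabled());     // true
        System.out.println(log.entries());       // [plugin ran]
    }
}
```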

Static variables are bad for distribution. Static state needs to either be replicated and sync’ed across all nodes of a distributed system, or kept on a central node accessible by all others, or some compromise between the former and the latter. This is all difficult/expensive/unreliable.

Now this one I buy into, but the issue isn't the "static"ness of the data, but the fact that it's effectively a Singleton, and Singletons in any distributed system are Evil. I talked a great deal about this in Effective Enterprise Java, so I'll leave that alone, but let me point out that any Singleton is evil, whether it's represented in a static, a Singleton object, a Newspeak module, or a database. The "static"ness here is a red herring.

Static variables are bad for re-entrancy. Code that accesses such state is not re-entrant. It is all too easy to produce such code. Case in point: javac. Originally conceived as a batch compiler, javac had to undergo extensive reconstructive surgery to make it suitable for use in IDEs. A major problem was that one could not create multiple instances of the compiler to be used by different parts of an IDE, because javac had significant static state. In contrast, the code in a Newspeak module definition is always re-entrant, which makes it easy to deploy multiple versions of a module definition side-by-side, for example.

Absolutely, but this is true for instance fields, too: any state that is modified by two or more method bodies raises a re-entrancy concern, since the field is visibly modified state for that particular instance. How deeply do you want your code to be re-entrant? Gilad's citation of the javac compiler points out that the compiler was hardly re-entrant at any reasonable level, but the fact is that the compiler *could* have been used in a parallelized fashion by exploiting the isolation that ClassLoaders provide. (It's ugly, and Java desperately needs Isolates for that reason.)
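The javac-style problem Gilad describes can be shown with a hypothetical sketch that assumes a toy "compiler" which only collects error messages. With a static error list, two independently created instances see each other's diagnostics. With an instance field, they do not.

```java
import java.util.ArrayList;
import java.util.List;

public class ReentrancyDemo {
    // javac-style design (hypothetical): diagnostics live in a static field,
    // so every "compiler instance" in the process shares them.
    static class StaticCompiler {
        static final List<String> errors = new ArrayList<>();

        void compile(String source) {
            if (source.contains("oops")) errors.add("error in: " + source);
        }

        int errorCount() { return errors.size(); }
    }

    // Same logic with per-instance state: separate instances cannot see each other's errors.
    static class InstanceCompiler {
        private final List<String> errors = new ArrayList<>();

        void compile(String source) {
            if (source.contains("oops")) errors.add("error in: " + source);
        }

        int errorCount() { return errors.size(); }
    }

    public static void main(String[] args) {
        StaticCompiler editorA = new StaticCompiler();
        StaticCompiler editorB = new StaticCompiler();
        editorA.compile("oops();");
        System.out.println(editorB.errorCount()); // 1: B "sees" A's error

        InstanceCompiler isolatedA = new InstanceCompiler();
        InstanceCompiler isolatedB = new InstanceCompiler();
        isolatedA.compile("oops();");
        System.out.println(isolatedB.errorCount()); // 0
    }
}
```

Ted's caveat still holds: if two parts of an IDE shared one `InstanceCompiler`, they would collide on its `errors` list exactly as the static version does.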

Static variables are bad for memory management. This state has to be handled specially by implementations, complicating garbage collection. The woeful tale of class unloading in Java revolves around this problem. Early JVMs lost applications' static state when trying to unload classes. Even though the rules for class unloading were already implicit in the specification, I had to add a section to the JLS to state them explicitly, so overzealous implementors wouldn't throw away static application state that was not entirely obvious.

This one I can't really comment on, since I'm not in the habit of writing memory-management code. I'll take Gilad's word for it, though I'm curious to know why this is so, in more detail.

Static variables are bad for startup time. They encourage excess initialization up front. Not to mention the complexities that static initialization engenders: it can deadlock, applications can see uninitialized state, and unless you have a really smart runtime, you find it hard to compile efficiently (because you need to test if things are initialized on every use).

I'm not sure I see how this is different for any startup/initialization code--anything that the user can specify as part of startup will run the risk of deadlocks and viewing uninitialized state. Consider the alternative, however--if the user didn't have the ability to specify startup code, then they would have to either write their own, post-runtime, startup code, or else they have to constantly check the state of their uninitialized objects and initialize them on first use, the very thing that he claims is hard to compile efficiently.
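Gilad's "applications can see uninitialized state" point can be made concrete with a small cycle between two static initializers. The class names in this sketch are invented. Under the JLS class-initialization rules, touching `A.x` first starts `A`'s initializer, which reads `B.y`. That triggers `B`'s initializer, which reads `A.x` while `A` is still mid-initialization on the same thread, so it sees the default value of 0.

```java
public class InitCycle {
    static class A {
        static int x = B.y + 1; // reads B.y, which forces B to initialize first
    }

    static class B {
        static int y = A.x + 1; // A is still initializing here, so A.x is still 0
    }

    public static void main(String[] args) {
        System.out.println(A.x); // 2
        System.out.println(B.y); // 1
    }
}
```

Touching `B.y` first instead would yield `A.x == 1` and `B.y == 2`. The result depends on which class happens to be initialized first, which is part of why such code is hard to reason about.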

Static variables are bad for concurrency. Of course, any shared state is bad for concurrency, but static state is one more subtle time bomb that can catch you by surprise.

Absolutely: any shared state is bad for concurrency. However, I think we need to go back to first principles here. Since any shared state is bad for concurrency, and since static data is always shared by definition, it follows that static data is bad for concurrency. Pay particular attention to that chain of reasoning, however: any shared state is bad for concurrency, whether it's held by the process in a special non-instance-aligned location or in a data store that happens to be reachable from multiple paths of control. This means that your average database table would also be bad for concurrency, were it not for the transactional protections that surround it. This isn't an indictment of static variables, per se, but of shared state.
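The following sketch (names invented) shows that the hazard lies in sharing rather than in `static` itself. Two threads increment a static field, an instance field of one shared object, and an `AtomicInteger`. The first two are unsynchronized read-modify-write operations and can lose updates. Only the atomic counter is guaranteed to reach the exact total.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedStateDemo {
    // Static state: shared by every thread in this class loader.
    static int staticHits = 0;

    static class Counter {
        int hits = 0;                                       // plain field: increments can be lost
        final AtomicInteger safeHits = new AtomicInteger(); // atomic increments
    }

    // Runs two threads that each bump all three counters `times` times.
    static void hammer(Counter shared, int times) {
        Runnable work = () -> {
            for (int i = 0; i < times; i++) {
                staticHits++;                      // racy read-modify-write on static state
                shared.hits++;                     // equally racy on instance state
                shared.safeHits.incrementAndGet(); // safe
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        Counter shared = new Counter();
        hammer(shared, 1_000_000);
        // The first two may print less than 2000000; the atomic count is always exact.
        System.out.println("static:   " + staticHits);
        System.out.println("instance: " + shared.hits);
        System.out.println("atomic:   " + shared.safeHits.get());
    }
}
```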

Gilad goes on to describe how Newspeak solves this problem of static:

It may seem like you need static state, somewhere to start things off, but you don’t. You start off by creating an object, and you keep your state in that object and in objects it references. In Newspeak, those objects are modules.

Newspeak isn’t the only language to eliminate static state. E has also done so, out of concern for security. And so has Scala, though its close cohabitation with Java means Scala’s purity is easily violated. The bottom line, though, should be clear. Static state will disappear from modern programming languages, and should be eliminated from modern programming practice.
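Here is a rough Java approximation of "start off by creating an object," assuming a hypothetical `InventoryModule` class. All state hangs off instances, so two copies can run side by side with different collaborators. Java still needs a static `main` entry point, so this shows the style only, not Newspeak's semantics.

```java
import java.util.ArrayList;
import java.util.List;

public class ModuleStyle {
    // Hypothetical "module": all application state hangs off one instance.
    static final class InventoryModule {
        private final List<String> items = new ArrayList<>();
        private final Reporter reporter;

        InventoryModule(Reporter reporter) { this.reporter = reporter; }

        void add(String item) {
            items.add(item);
            reporter.report("added " + item);
        }

        int size() { return items.size(); }
    }

    interface Reporter { void report(String message); }

    public static void main(String[] args) {
        // main is Java's one unavoidable static entry point; it only builds the object graph.
        InventoryModule first = new InventoryModule(System.out::println);
        InventoryModule second = new InventoryModule(message -> { }); // a silent copy, side by side
        first.add("widget");
        second.add("gadget");
        second.add("gizmo");
        System.out.println(first.size() + " " + second.size()); // 1 2
    }
}
```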

I wish Newspeak were available for widespread use, because I'd love to explore this concept further; in the CLR, for example, there is the same idea of "modules", in that modules are singleton entities in which methods and data can reside, at a higher level than individual objects themselves. Assemblies, for example, form modules, and this is where "global variables" and "global methods" exist (when supported by the compiling language in question). At the end of the day, though, these are just statics by another name, and face most, if not all, of the same problems Gilad lays out above. Scala "objects" have the same basic property.

I think the larger issue here is that one should be careful where one stores state, period. Every piece of data has a corresponding scope of accessibility, and developers have grown complacent about considering that scope when putting data there: they consider the accessibility at the language level (public, private, what-have-you), and fail to consider the scope beyond that (concurrency, re-entrancy, and so on).

At the end of the day, it's simple: static entities and instance entities are just entities. Nothing more, nothing less. Caveat emptor.


Thursday, February 21, 2008 8:58:53 PM (Pacific Standard Time, UTC-08:00)
At least in Java, static state is nothing special, because it is attached to an instance of java.lang.Class. Therefore I don't see much difference between static and non-static state. Furthermore, in Java this can lead to strange results when a Singleton suddenly has more than one instance due to class loading, in which a class is identified by its fully qualified class name and its class loader. This way I can create a class, load it through multiple class loaders, and each of them can have a singleton. One might think that this is purely academic, but it is not. Several simple EJB applications, each containing the same class file, will produce exactly this scenario in any modern J2EE application server.
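Andy's class-loader point can be demonstrated in a single file. This is an illustrative sketch, not how any particular application server works. A hypothetical child-first `IsolatingLoader` defines its own copy of `Counter` from the class file bytes, so two loaders yield two `Class` objects with the same name and independent static state. The sketch assumes the compiled class is readable as a resource from the application class loader, which is the usual case when running from a directory or JAR.

```java
import java.io.IOException;
import java.io.InputStream;

public class PerLoaderSingleton {
    // A classic static "singleton" holder whose state one might expect to be process-wide.
    public static class Counter {
        public static int count = 0;
    }

    // Hypothetical child-first loader: it defines one named class itself, from the parent's
    // bytes, instead of delegating upward, roughly as a container isolates applications.
    static final class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // ordinary parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) throw new ClassNotFoundException(name);
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) resolveClass(c);
                return c;
            }
        }
    }

    // Loads a fresh copy of Counter through a new isolating loader.
    static Class<?> isolatedCounter() {
        String name = Counter.class.getName();
        try {
            return new IsolatingLoader(PerLoaderSingleton.class.getClassLoader(), name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the static count field of one particular copy of Counter via reflection.
    static int readCount(Class<?> counterClass) {
        try {
            return counterClass.getField("count").getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes the static count field of one particular copy of Counter via reflection.
    static void writeCount(Class<?> counterClass, int value) {
        try {
            counterClass.getField("count").setInt(null, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = isolatedCounter();
        Class<?> b = isolatedCounter();
        writeCount(a, 42);
        System.out.println(a.getName().equals(b.getName())); // true: same fully qualified name
        System.out.println(a == b);                           // false: different defining loaders
        System.out.println(readCount(b));                     // 0: b's static state is separate
        System.out.println(Counter.count);                    // 0: so is the application loader's copy
    }
}
```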

In my view, static is no more or less evil than any other code construct AS LONG AS the developer KNOWS what he/she is doing. Unfortunately this is mostly not the case, which leads to a lot of problems but also secures my job as a Java consultant.

I, for example, used static members in an EJB to make sure that I can manage concurrency between the different instances of this EJB. There is no other way to accomplish that because the developer is not managing the EJB instances.

Have fun - Andy
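Here is a sketch of the technique Andy describes, with a hypothetical `ReservationBean` standing in for the EJB. Each simulated request gets its own instance, and a `private static final` lock coordinates all of them. As his first paragraph points out, "static" here means one copy per class loader. The lock would therefore not coordinate beans deployed in different applications or on different servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CrossInstanceLock {
    // Hypothetical bean: the container, not the developer, creates and pools instances,
    // so an instance-level lock cannot coordinate them. The static monitor is shared by
    // every instance, but only within one class loader in one JVM.
    static class ReservationBean {
        private static final Object LOCK = new Object();
        private static int seatsLeft = 0;

        boolean reserve() {
            synchronized (LOCK) {
                if (seatsLeft == 0) return false;
                seatsLeft--;
                return true;
            }
        }

        static void reset(int seats) {
            synchronized (LOCK) { seatsLeft = seats; }
        }

        static int remaining() {
            synchronized (LOCK) { return seatsLeft; }
        }
    }

    // Simulates concurrent requests, each served by its own bean instance; returns successful reservations.
    static int contend(int threads, int attemptsPerThread) {
        AtomicInteger successes = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                ReservationBean bean = new ReservationBean();
                for (int i = 0; i < attemptsPerThread; i++) {
                    if (bean.reserve()) successes.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return successes.get();
    }

    public static void main(String[] args) {
        ReservationBean.reset(100);
        int booked = contend(8, 50); // 400 attempts for 100 seats
        System.out.println(booked + " booked, " + ReservationBean.remaining() + " left"); // 100 booked, 0 left
    }
}
```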
Friday, February 22, 2008 1:18:03 AM (Pacific Standard Time, UTC-08:00)
A module isn't a singleton. It's a unit of code packaging to which is added metadata to form an assembly. Assemblies don't form modules. Modules form assemblies!
RichB
Friday, February 22, 2008 1:54:25 AM (Pacific Standard Time, UTC-08:00)
For evildoers to be able to change static state, they have to have some kind of "poke" access inside the innards of your application, and if they have that, then just about anything is vulnerable. Now, granted, I haven't spent a great deal of time on the E literature, so maybe I'm missing the point here


You have missed the point. Read up on Capability Security, and the Principle of Least Privilege. The main idea is to make it so that parts of 'your program' only have access to the other parts that they absolutely need to. Snippets of code should only be trusted to the extent that they need to be. This can be further generalized across a network, like in E.
mind
Friday, February 22, 2008 2:02:12 AM (Pacific Standard Time, UTC-08:00)
@mind:

I am familiar with the Principle of Least Privilege; what I fail to see is how access to a static creates a security hole. I've been looking through the capability bits on the E site (www.erights.org), and it still seems a little far-fetched to me that this is a major concern.

@RichB:

Dude, you and I know that. But the distinction is lost on 98% of the .NET developers out there, much less the Java and C++ and other non-.NET folks in the world, particularly when you take into account the fact that to a Rubyist, a "module" is a completely different creature. Not to mention that for all intents and purposes, it's 1 module per assembly, at least according to most of the .NET compilers in the wild. :-) (And, by the way, a module is a singleton within the .NET process, since you can't have more than one instance of a given module within that AppDomain. This is why the .module declaration in ILAsm only requires a name parameter.)
Saturday, February 23, 2008 2:11:05 PM (Pacific Standard Time, UTC-08:00)
On capability-based security à la E, what I think you're missing is that it offers an architecture which enables security in a very simple way, rather than an absolute increase in security. With security, the tradeoff is always ease of use versus security. With capabilities, nobody can do anything unless you give them something to do it *with*. This architecture enables simple, intuitive security, but obviously it's also impossible to guarantee without banning static state. To make it really usable, though, you shouldn't have to thread variables throughout your call graph, and that's where nested classes, with the outermost class acting like a module and no visibility between top-level modules, start to make sense: they make the capability hand-off tractable.

On the startup time point, I think you reveal how deeply ingrained static state is in your thinking. If one has banned static state, the phrase "check the state of uninitialized objects" *has no meaning* - every object is initialized by its constructor, and you can't get into a situation whereby you need to check on each call whether initialization is done or not. Every call is performed on an already-initialized object by design - it can't be any other way.

-- Barry
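Barry's "outermost class acting like a module" idea can be approximated in Java with inner (non-static) classes. In this sketch, where all names are hypothetical, `LoggingModule` receives its capabilities once, through its constructor. The inner `Logger` class reaches them via the enclosing instance, so they never have to be passed through each call. Java does not enforce the "no visibility between top-level modules" part, so that remains a convention here.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedModule {
    interface Clock { long now(); }
    interface Sink { void write(String line); }

    // Hypothetical module: capabilities arrive once, through the constructor.
    static final class LoggingModule {
        private final Clock clock;
        private final Sink sink;

        LoggingModule(Clock clock, Sink sink) {
            this.clock = clock;
            this.sink = sink;
        }

        // Inner class: reaches clock and sink through the enclosing module instance,
        // so callers never thread them through method parameters.
        final class Logger {
            private final String name;

            Logger(String name) { this.name = name; }

            void log(String message) {
                sink.write(clock.now() + " [" + name + "] " + message);
            }
        }

        Logger logger(String name) { return new Logger(name); }
    }

    public static void main(String[] args) {
        List<String> captured = new ArrayList<>();
        LoggingModule module = new LoggingModule(() -> 42L, captured::add);
        module.logger("db").log("connected");
        System.out.println(captured); // [42 [db] connected]
    }
}
```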
Sunday, February 24, 2008 1:23:06 PM (Pacific Standard Time, UTC-08:00)
I find this statement to be particularly amusing:

You start off by creating an object, and you keep your state in that object and in objects it references.


The author hasn't eliminated static variables at all; all he did was move them to a different location.
Jonathan Allen
Thursday, February 28, 2008 10:33:03 AM (Pacific Standard Time, UTC-08:00)
I like to build big complex systems from lots of small pieces, usually singleton "service" objects (that have no client-specific internal state). Pragmatically speaking, static variables get in the way.

Using the traditional static getSingleton() method limits you in many ways:

You are limited to the lifecycle defined by the static variable itself.

You are limited to the instance provided by the getSingleton() method, even if that gets in the way of unit or integration testing (or some of your more complicated use cases). This is tight and early binding.

You are limited to a single instance even if the library defining the variable is shared across multiple applications.

These are the factors that drive the use of Dependency Injection, particularly the freedom to defer binding later and later and later.
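Here is a minimal sketch of the contrast this comment draws, using hypothetical names and placeholder exchange rates. `StaticPriceCalculator` is bound to `LiveRateService.getSingleton()` at compile time. `InjectedPriceCalculator` instead receives whatever `RateService` its creator supplies, such as a test double. The sketch uses plain constructor injection; DI containers automate the same wiring.

```java
public class InjectionDemo {
    interface RateService { double rateFor(String currency); }

    // Traditional static singleton: callers are bound to this one instance, early and tightly.
    static final class LiveRateService implements RateService {
        private static final LiveRateService INSTANCE = new LiveRateService();

        static LiveRateService getSingleton() { return INSTANCE; }

        private LiveRateService() { }

        // Placeholder data for illustration only.
        public double rateFor(String currency) { return "EUR".equals(currency) ? 1.5 : 1.0; }
    }

    // Caller hard-wired to the singleton.
    static final class StaticPriceCalculator {
        double convert(double amount, String currency) {
            return amount * LiveRateService.getSingleton().rateFor(currency);
        }
    }

    // Caller with its dependency injected: binding is deferred to whoever constructs it.
    static final class InjectedPriceCalculator {
        private final RateService rates;

        InjectedPriceCalculator(RateService rates) { this.rates = rates; }

        double convert(double amount, String currency) { return amount * rates.rateFor(currency); }
    }

    public static void main(String[] args) {
        InjectedPriceCalculator production = new InjectedPriceCalculator(LiveRateService.getSingleton());
        InjectedPriceCalculator underTest = new InjectedPriceCalculator(currency -> 2.0); // test double
        System.out.println(production.convert(10, "EUR"));                  // 15.0
        System.out.println(underTest.convert(10, "EUR"));                   // 20.0
        System.out.println(new StaticPriceCalculator().convert(10, "EUR")); // 15.0, no way to swap in a double
    }
}
```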
