 Wednesday, January 11, 2006
LINQ paper comments and feedback

A number of you have made comments about my LINQ paper, and rather than respond in comments in turn, I thought I'd gather them up and respond to them en masse. So, without ado....

Stu Smith said:

Nice article. Two things occur to me immediately...
  1. Based on my current understanding of LINQ, it's purely for querying, and so compared to most O/R systems it lacks caching support (i.e., its queries may be optimal, but that's not much consolation if it keeps re-executing them).
  2. The O/R system we use (which admittedly only needs to support a particular kind of application) solves the 'lazy load vs N+1' issue by generating joined queries based on either the path taken to the data, or on particular routes to marked-up tables. That is, if you navigate from a Customer to a collection of Orders, that's a single select. If you then start iterating the Orders and inner-iterating the OrderDetails, then on the first one a second joined select is issued, and for subsequent iterations the data is already cached and thus no further SQL statements are emitted.
Thanks, Stu. Responses:
  1. No, LINQ can, in fact, do any sort of relational manipulation, including INSERTs, UPDATEs and DELETEs, but the real strength of the language integration is in the query aspects, particularly the fact that LINQ can do these queries across any rectangular data store, so it's fair to say that LINQ is mostly about query.
  2. I'd be interested to hear more about your solution to the lazy-load issues, particularly how you handle the situation where you need to display only a small part of the full OrderDetails data--remember, part of the criticism of O/R systems is that they either eager-load too much, or lazy-load too much, and can't infer the amount or areas of data to retrieve that's "just right".
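To make the "only a small part of the OrderDetails data" point concrete, here is a minimal sketch of a LINQ projection, written against the System.Linq API as it exists in current .NET. The OrderDetail and LineSummary types, their properties and the sample data are all hypothetical, invented for illustration. Against an in-memory list the projection only reshapes objects that are already loaded; the assumption is that a provider which translates queries into SQL could use the same projection to decide which columns to fetch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration only; not taken from the paper or from Stu's system.
public class OrderDetail
{
    public int OrderId { get; set; }
    public string ProductCode { get; set; } = "";
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
    public string Notes { get; set; } = "";   // a wide column the summary view never displays
}

// Hypothetical result shape: just the two values the caller wants to show.
public class LineSummary
{
    public string ProductCode { get; set; } = "";
    public decimal LineTotal { get; set; }
}

public static class ProjectionDemo
{
    // The query names exactly the data needed: one order's lines, two values per line.
    public static List<LineSummary> LineTotals(IEnumerable<OrderDetail> details, int orderId)
    {
        var query =
            from d in details
            where d.OrderId == orderId
            select new LineSummary
            {
                ProductCode = d.ProductCode,
                LineTotal = d.Quantity * d.UnitPrice
            };

        // ToList() runs the query now; without it, execution is deferred until enumeration.
        return query.ToList();
    }

    public static void Main()
    {
        var details = new List<OrderDetail>
        {
            new OrderDetail { OrderId = 1, ProductCode = "A1", Quantity = 2, UnitPrice = 5.00m, Notes = "long text" },
            new OrderDetail { OrderId = 1, ProductCode = "B7", Quantity = 1, UnitPrice = 3.50m, Notes = "long text" },
            new OrderDetail { OrderId = 2, ProductCode = "C3", Quantity = 4, UnitPrice = 1.25m, Notes = "long text" }
        };

        foreach (var line in LineTotals(details, 1))
        {
            Console.WriteLine(line.ProductCode + ": " + line.LineTotal);
        }
    }
}
```

The caller never touches Notes, and the query says so explicitly, which is the information an eager-or-lazy O/R mapper has no way to infer from object navigation alone.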

Bryant wrote:

I thought the article was well written and informative. While I think it's cool that I might one day be able to use the same object model to query databases, XML documents, and even the file system I still feel compelled to look back upon my days as a DBA. This looks a lot like building ad hoc SQL statements in the code except we're a little more type safe here. LINQ still does not answer the question of storage abstraction. The developer still needs an intimate knowledge of the database structure. So, LINQ appears to cover "Conflicting type systems" and maybe "Transactional boundaries" and "Query/access capabilities". There are still four more items in your list that I don't see being solved with LINQ. Your article was a great read but sorry, I am still not excited.
We are building ad-hoc statements in the code, although this depends slightly on your definition of "ad-hoc statements" (my early experiences led me to a definition that says "ad-hoc statements" means "users can throw SQL at the database", whereas "developers throwing SQL at the database" isn't ad-hoc, as the SQL itself is known prior to execution; I can see where others' definitions may vary on this, however). I do have to point out, however, that LINQ *does*, in fact, answer the question of storage abstraction, though perhaps not to the degree you prefer. I see LINQ's ability to hide the difference between in-memory storage and external-database storage as storage abstraction, but what it does not do (rightly, in my opinion) is try to hide the differences between rectangular (relational), hierarchical (XML) and referential (objects) storage. That is the area where the impedance mismatches kick in, and that's the hard part to solve. As to the last four items, well, one could always say they're not done yet... :-) Seriously, I think it's a great start, and my excitement comes not necessarily from what LINQ can do right now, but from the idea that it opens up and explores an entirely new avenue of research that nobody else seemed to be interested in exploring.
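As a rough illustration of the in-memory versus external-store point, the sketch below writes a query once against IQueryable<T>, using the System.Linq API as it exists in current .NET. The Customer type, the NamesInCity helper and the sample data are hypothetical. Only the in-memory case is actually run here, via AsQueryable(); the assumption is that a database-backed provider would hand the same method its own IQueryable<Customer> and translate the query itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration only.
public class Customer
{
    public string Name { get; set; } = "";
    public string City { get; set; } = "";
}

public static class CustomerQueries
{
    // Written once against IQueryable<T>; the source behind it decides where the query runs.
    public static IQueryable<string> NamesInCity(IQueryable<Customer> customers, string city)
    {
        return from c in customers
               where c.City == city
               orderby c.Name
               select c.Name;
    }
}

public static class StorageAbstractionDemo
{
    public static void Main()
    {
        // In-memory source; AsQueryable() wraps the list so it satisfies IQueryable<Customer>.
        IQueryable<Customer> inMemory = new List<Customer>
        {
            new Customer { Name = "Bob", City = "Oslo" },
            new Customer { Name = "Ann", City = "Oslo" },
            new Customer { Name = "Cy",  City = "Lima" }
        }.AsQueryable();

        foreach (var name in CustomerQueries.NamesInCity(inMemory, "Oslo"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Note what the sketch does not attempt: the query still assumes rows with a City and a Name, so it abstracts over where rectangular data lives, not over the shape of the data.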

James commented:

I thought the article was great! I'm a little unclear as to why it's not getting ranked better on MSDN but for someone who didn't really understand/appreciate the problem domain LINQ is serving, your article really got me thinking and cleared up a lot of fogginess in my mind. Excellent work!
Thanks for the praise, James, and as for MSDN's ranking schemes, a couple of other MSDN authors have suggested that there's some "article assassination" going around the site, so maybe that's it. I'm glad you find it intriguing and that it "got you thinking"--that, in many ways, was the point in the first place. :-)

Andy Maule said:

Very interesting! I liked the discussion of Rails' ActiveRecord, which I think is an approach that most people miss when talking about O/R mappings.

There's a good research paper discussing the same stuff here. It mentions something recently developed for doing statically typed queries in Java 'Native Queries' which is an interesting comparison to LINQ. Anyone interested in this area should take a look.

I'm currently doing a PhD in this area, and I have to say that LINQ is making things very interesting.
Well, good luck on your PhD, and thanks for the link--I'm definitely interested in following up on anybody who's pursuing this in the Java space. Along those same lines, Marius Gheorghe pointed out that
Karmencita is my lightweight alternative for in memory object querying.
and again, I appreciate the link.

As for the rest of you who offered kudos (Bart De Boeck, Dan Kahler, Paul Wilson, Eric Bachtal), thanks; every author likes to know that their work is appreciated, particularly when so much of what they say seems to stir up more controversy than discussion. :-)


Thursday, January 12, 2006 5:14:09 AM (Pacific Standard Time, UTC-08:00)
Thanks for the reply to my comment.

I'd better start with a brief description of the sort of applications we write here. I'm working on an accounts package, where the unit of work is the client, ie a client of the accountant we sell our software to. Optimistic locking is out of the question here for usability reasons, so a client (and associated data) is only ever modified by one user at a time.

1) I didn't express myself very clearly with regard to LINQ; my concern was about repeated queries to the database. Since only one user may alter a client at a time, once I've loaded in a certain bit of data, there's no point re-loading it every time it's used. Am I right in thinking that were I to use LINQ for this sort of application, I'd probably want to add my own caching layer above it?
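A caching layer of the kind asked about here might look something like the following minimal sketch. CachedLoader, the Order type and the sample data are hypothetical, and the in-memory list merely stands in for a database. One detail matters regardless of the data source: LINQ queries use deferred execution, so the loader has to materialize its results (here with ToList()); caching the query object itself would re-run it on every enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration only.
public class Order
{
    public string ClientCode { get; set; } = "";
    public decimal Total { get; set; }
}

// Hypothetical cache: remembers the result of each load, keyed by the query argument.
// Safe only under the single-writer assumption described above; there is no invalidation.
public sealed class CachedLoader<TKey, TValue>
{
    private readonly Func<TKey, TValue> _load;
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>();

    public int LoadCount { get; private set; }

    public CachedLoader(Func<TKey, TValue> load)
    {
        _load = load;
    }

    public TValue Get(TKey key)
    {
        TValue value;
        if (!_cache.TryGetValue(key, out value))
        {
            value = _load(key);   // the only place the underlying query runs
            _cache[key] = value;
            LoadCount++;
        }
        return value;
    }
}

public static class CachingDemo
{
    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order { ClientCode = "STU01", Total = 12.50m },
            new Order { ClientCode = "STU01", Total = 7.25m },
            new Order { ClientCode = "ABC02", Total = 3.00m }
        };

        // ToList() forces execution so the cache stores results, not a deferred query.
        var ordersByClient = new CachedLoader<string, List<Order>>(
            code => orders.Where(o => o.ClientCode == code).ToList());

        ordersByClient.Get("STU01");
        ordersByClient.Get("STU01");   // served from the cache

        Console.WriteLine("Loads issued: " + ordersByClient.LoadCount);
    }
}
```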

2) You're absolutely right, we can't handle the situation where you only want some of the OrderDetails. However, I don't believe there can be an automated method of knowing that in advance. Our schema (written as little XML files) generates three things: database creation scripts, a thin data access layer (which presents the object/collection interface above, and talks to the data server layer below), plus a schema description which is used by (amongst other things) the query generator. We can mark up the schema by defining points from which joined queries can be made. We choose these where the table represents a logical root of a block of data, and where the fan-out is relatively high (i.e., where there will likely be a lot of iteration within).

In the following example, each of the marked SQL statements is issued only once, and is issued only as the data is needed. If I hadn't started to iterate the OrderItems, the system wouldn't have loaded them. (I've trimmed the SQL slightly for readability; in particular, I've changed it from parameterized SQL.)

Customer customer = CustomerCollection.FromClientCode( se, "STU01" );

--> SELECT [CustomerID], [ClientCode], etc FROM [Customer] WHERE [ClientCode] = 'STU01'

foreach( Order order in customer.OrderCollection )

--> SELECT [OrderID], [CustomerID], etc FROM [Order] WHERE [CustomerID] = 1234

{
    foreach( OrderItem item in order.OrderItemCollection )

--> SELECT t1.[ItemID], t1.[OrderID], t1.[ProductCode], t1.[Quantity], t1.[UnitPrice]
--> FROM ( [Order] AS t0 INNER JOIN [OrderItem] AS t1 ON t0.[OrderID] = t1.[OrderID] )
--> WHERE t0.[CustomerID] = 1234

    {
        total += item.GetUnitPrice().Value;
    }
}

Although it could be seen as a shortcoming that when we start to iterate the OrderItems, we get all the items for that customer, in practice the application would never want just some of them -- when one is loaded, the others will all be required soon. The "intelligence" comes from the developer who writes the schema -- and the advantage here is that if we get it wrong, we just tweak the schema, based on profiling our queries, and that produces a fix across the whole application.
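The joined-load behaviour described above can be approximated as follows. This is only a sketch under stated assumptions, not the system's actual implementation: CustomerItemCache, the OrderItem shape and the loader delegate (which stands in for the single joined SELECT) are all hypothetical. The first request for any order's items loads the items for every order of that customer in one go; later requests are answered from the cached lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration only.
public class OrderItem
{
    public int OrderId { get; set; }
    public decimal UnitPrice { get; set; }
}

// Hypothetical per-customer cache: one joined load, issued lazily, shared by all orders.
public sealed class CustomerItemCache
{
    private readonly Lazy<ILookup<int, OrderItem>> _itemsByOrder;

    public int JoinedLoadCount { get; private set; }

    public CustomerItemCache(int customerId, Func<int, IEnumerable<OrderItem>> loadItemsForCustomer)
    {
        // Nothing is loaded here; the delegate runs on first access to Value.
        _itemsByOrder = new Lazy<ILookup<int, OrderItem>>(() =>
        {
            JoinedLoadCount++;
            return loadItemsForCustomer(customerId).ToLookup(item => item.OrderId);
        });
    }

    // Returns the items for one order; an order with no items yields an empty sequence.
    public IEnumerable<OrderItem> ItemsFor(int orderId)
    {
        return _itemsByOrder.Value[orderId];
    }
}

public static class JoinedLoadDemo
{
    public static void Main()
    {
        var allItems = new List<OrderItem>
        {
            new OrderItem { OrderId = 10, UnitPrice = 5.00m },
            new OrderItem { OrderId = 10, UnitPrice = 7.50m },
            new OrderItem { OrderId = 11, UnitPrice = 2.00m }
        };

        // The delegate stands in for the joined SELECT keyed on the customer.
        var cache = new CustomerItemCache(1234, customerId => allItems);

        decimal total = 0m;
        foreach (var orderId in new[] { 10, 11 })
        {
            foreach (var item in cache.ItemsFor(orderId))
            {
                total += item.UnitPrice;
            }
        }

        Console.WriteLine("Total: " + total + ", joined loads: " + cache.JoinedLoadCount);
    }
}
```

If no order's items are ever requested, the loader delegate is never invoked, which mirrors the "issued only as the data is needed" behaviour.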

I appreciate, of course, that LINQ has to be more of a general solution (although, like DataSets for example, it does seem to have more of a web-app bent than a desktop file-open/file-save one), and that our system is tailored to one pattern of data access.

I'm looking forward to LINQ, but I don't think I'd want to see it even in my object/collection layer; rather, I'd hide it in a lower-level layer. I think there are two main dangers otherwise: first, queries would be so easy to write that a developer might forget the consequence of repeated data-loads; and second, the optimal select is now baked into the code (which is difficult to change), rather than being driven by the schema (which is easier to change).

(And of course it wouldn't give us all the other nice features we already have like long transactions and offline working).

Stu