JOB REFERRALS
    ON THIS PAGE
    ARCHIVES
    CATEGORIES
    BLOGROLL
    LINKS
    SEARCH
    MY BOOKS
    DISCLAIMER
 
 Thursday, May 06, 2010
Code Kata: Compressing Lists

Code Katas are small, relatively simple exercises designed to give you a problem to try and solve. I like to use them as a way to get my feet wet and help write something more interesting than "Hello World" but less complicated than "The Internet's Next Killer App".

 

Rick Minerich mentioned this one on his blog already, but here is the original "problem"/challenge as it was presented to me and which I in turn shot to him over a Twitter DM:

 

I have a list, say something like [4, 4, 4, 4, 2, 2, 2, 3, 3, 2, 2, 2, 2, 1, 1, 1, 5, 5], which consists of varying repetitions of integers. (We can assume that it's always numbers, and the use of the term "list" here is generic—it could be a list, array, or some other collection class, your choice.) The goal is to take this list of numbers, and "compress" it down into a (theoretically smaller) list of numbers in pairs, where the first of the pair is the occurrence number of the value, which is the second number. So, since the list above has four 4's, followed by three 2's, two 3's, four 2's, three 1's and two 5's, it should compress into [4, 4, 3, 2, 2, 3, 3, 1, 2, 5].

Update: Typo! It should compress into [4, 4, 3, 2, 2, 3, 4, 2, 3, 1, 2, 5], not [4, 4, 3, 2, 2, 3, 3, 1, 2, 5]. Sorry!

Using your functional language of choice, implement a solution. (No looking at Rick's solution first, by the way—that's cheating!) Feel free to post proposed solutions here as comments, by the way.

 

This is a pretty easy challenge, but I wanted to try and solve it in a functional mindset, which the challenger had never seen before. I also thought it made for an interesting challenge for people who've never programming in functional languages before, because it requires a very different approach than the imperative solution.

 

Extensions to the kata (a.k.a. "extra credit"):

  • How does the implementation change (if any) to generalize it to a list of any particular type? (Assume the list is of homogenous type—always strings, always ints, always whatever.)
  • How does the implementation change (if any) to generalize it to a list of any type? (In other words, a list of strings, ints, Dates, whatever, mixed together within the list: [1, 1, "one", "one", "one", ...] .)
  • How does the implementation change (if any) to generate a list of two-item tuples (the first being the occurence, the second being the value) as the result instead? Are there significant advantages to this?
  • How does the implementation change (if any) to parallelize/multi-thread it? For your particular language how many elements have to be in the list before doing so yields a significant payoff?

By the way, some of the extension questions make the Kata somewhat interesting even for the imperative/O-O developer; have at, and let me know what you think.


Thursday, May 06, 2010 7:10:07 PM (Pacific Daylight Time, UTC-07:00)
Using Haskell:

map (length &&& head) . group

This one returns a list of pairs.
Friday, May 07, 2010 2:03:12 AM (Pacific Daylight Time, UTC-07:00)
Clojure (formatted nicely) at http://gist.github.com/393215
Friday, May 07, 2010 2:16:47 AM (Pacific Daylight Time, UTC-07:00)
The four 2's don't appear in the result in your example, I assume this is a typo?
Olivier
Friday, May 07, 2010 2:39:57 AM (Pacific Daylight Time, UTC-07:00)
Scala with a fold right (generic, returns a list of pairs) :

def compress[T](l: List[T]): List[(Int, T)] = {
def count(elem: T, tuples: List[(Int, T)]) = tuples match {
case Tuple2(i, `elem`) :: end => Tuple2(i + 1, elem) :: end
case _ => Tuple2(1, elem) :: tuples
}
(l :\ List[(Int, T)]())(count)
}
Olivier
Friday, May 07, 2010 2:40:26 AM (Pacific Daylight Time, UTC-07:00)
Python:

list(itertools.chain.from_iterable(([len(list(g)), k] for k,g in itertools.groupby(d))))

and to have count/key pairs, just remove the chain, i.e:

[[len(list(g)), k] for k,g in itertools.groupby(d)]
Friday, May 07, 2010 4:20:40 PM (Pacific Daylight Time, UTC-07:00)
I agree with @Oliver, isn't the correct answer [4, 4, 3, 2, 2, 3, 4, 2, 3, 1, 2, 5]? Where are the unit tests? :)
Dion Dock
Sunday, May 09, 2010 1:17:03 AM (Pacific Daylight Time, UTC-07:00)
It really takes quite some time to become familiar with FP; I can't believe how long it took me to come up with a solution as convoluted as this one (in Clojure): http://gist.github.com/395016

Sunday, May 09, 2010 2:49:21 AM (Pacific Daylight Time, UTC-07:00)
@Dion, @Olivier--

Alas, BlogMock is still in an alpha state. ;-)

Blog entry fixed--thanks for the catch.

@ the rest--

Thanks for playing! I'll put forth a few more soon.
Monday, May 10, 2010 6:01:33 AM (Pacific Daylight Time, UTC-07:00)
(use '[clojure.contrib.seq-utils :only (flatten partition-by)])

(defn compress-2
"Compresses a list"
[s]
(flatten (map #(list (count %) (first %))
(partition-by identity s))))
Juan Manuel
Monday, May 10, 2010 10:13:15 AM (Pacific Daylight Time, UTC-07:00)
Here's my take on a Clojure 1.2 solution; I could have written in more succinctly but I prefer some readability:

(defn rle [lst]
(let [partitioned (partition-by identity lst)
assembler (fn [part] [(count part) (first part)])]
(mapcat assembler partitioned)))

Clojure doesn't care about types, so this will work for any type of sequence. Also, this is lazy (because partition-by and mapcat are lazy), so it can work on infinite sequences. If you want tuples, replace the mapcat function with map. I haven't played with parallelization, but chances are there's nothing to be gained (the Clojure docs are pretty specific that parallel operations aren't a win unless there's I/O or considerable processing involved). However, changing the last s-expression to:

(apply concat (pmap assembler partitioned))

should run most of the algorithm in parallel ... but since partition-by doesn't work in parallel, not much is to be gained.
Tuesday, May 11, 2010 1:39:19 AM (Pacific Daylight Time, UTC-07:00)
My Scala version:

def compress(l: List[Int]): List[Int] = {
l match {
case Nil => Nil
case x :: xs => List(xs.takeWhile(_ == x).length + 1, x) ::: compress2(l.dropWhile(_ == x))
}
}
Michael Kebe
Thursday, May 13, 2010 6:30:34 PM (Pacific Daylight Time, UTC-07:00)
Another new feature in Clojure 1.2 is the ->> macro:

(defn rle2 [lst]
(->>
(partition-by identity lst)
(mapcat #(vector (count %) (first %)))))

->> is a macro that turns a series of forms into a pipeline. The first form is evaluated and added to the end of the second form (i.e., as a new parameter). If there were more forms, this would continue, with each evaluation being passed to the next form as the last value. Effectively, ->> rewrites the code to the (less readable):

(defn rle2 [lst] (mapcat #(vector (count %) (first %)) (partition-by identity lst)))

Monday, May 24, 2010 3:38:18 PM (Pacific Daylight Time, UTC-07:00)
Using LINQ and C# 4.0:

var results = from n in new List<int>() { 4, 4, 4, 4, 3, 2, 2, 2, 3, 3, 2, 2, 2, 2, 1, 1, 1, 5, 5 }
.GroupBy(i => i)
select new Tuple<int,int>(n.Key, n.Count());

Currently returns a set of tuples with the pairs.

-Simon
Comments are closed.