2 Problems

Posted by david

1 - How can you verify that the destination of a hyperlink is what the description claims?

2 - How can Alice give data to Bob and ensure that Bob does not give that data to Mallory?

Why not make your business cards useful?

Posted by david

Last weekend, as I was cleaning out some old papers, I came across a stack of business cards I had collected at a conference, and it gave me an idea. Typically, when I get a business card, I enter the contact information into my address book or Highrise and then throw the card away. It seems like a waste - it would be more efficient to skip the card and enter that information directly, but that would take too much time in any sort of social context.

As far as I can tell, the reason you may want to offer business cards is that you provide a product or service, and you want the card recipient to remember your name and contact you when he finds himself in need of whatever it is you provide. If that's the case, then the best thing you could hope for (with respect to the business card) is that the recipient would tape the card to his monitor so that when he found himself in need of your product or service, it would be staring him in the face. That seems unlikely to happen, but here is a way to make it happen in at least a few cases: put some immediately useful information on the card.

Ideally, the information you put on your card would be something that is hard to remember but frequently useful. Suppose you are targeting software developers. Most software developers I know have post-it notes of useful snippets (often obscure command-line flags) attached to their monitor. If you can guess what sort of snippet would be useful to them, you're golden.

As an example, the following snippet will find all of the files that have not been added to (or ignored in) a subversion repository and add them:

svn st | grep '^?' | awk '{print $2}' | xargs svn add

There are many ways to do the same thing, but a surprising number of programmers seem to use subversion and know only how to manually add new files.

As another example, I can never remember the Mac startup key combinations since I use them infrequently, but when I need them, it's really inconvenient to look them up (since my Mac is probably powered down at that point).

Even if you don't succeed in getting developers to tape your card to their monitors, you should succeed in getting their attention, which is all that most business cards seem to strive for now anyway.

Learning Haskell

Posted by david

I have a confession to make.

One piece of the Pragmatic Programmer's advice that I often echo to other programmers is to learn a new programming language every year. Well, 2007 is done, and I let it go by without learning a new language. It's time to correct that, so I've decided to really dive into Haskell.

Why Haskell?

I'm not deluded enough to think that I'm going to be using it any time soon (if at all) in my day-to-day work. In fact, I'm skeptical that it will be useful even for my after-hours projects. However, I don't think that any other language can currently be more effective at teaching me general programming concepts.

Here are some of the concepts I'm hoping to grok by learning Haskell:

  • lazy evaluation
  • monads (and more general category theory)
  • software transactional memory

My approach to learning Haskell

My main resources for learning Haskell are guides I've found on the web. Haskell for C Programmers looks like it will give me a decent overview, and from there, I expect to jump to A Gentle Introduction to Haskell. About a year ago, I read a little gem of a book called Purely Functional Data Structures. While that book uses Standard ML for its examples, it provides Haskell examples in the back. I also intend to work through the tutorial "Write Yourself a Scheme in 48 hours", which shows you how to implement a subset of Scheme in Haskell.

To be honest, I expect to have less difficulty learning the language itself than figuring out how to get anything done with it - learning what libraries are available and how to use them, learning how best to set up my environment for rapid development, figuring out the best way to deploy a Haskell application, etc. In my experience, these things are the toughest aspects of moving to a less-mainstream language. For example, even if you can find a library that does what you need, if it is not well-documented, you'll have difficulty finding usage examples on the web. And because the language is less used, its libraries are more likely to be buggy or incomplete. And frankly, learning this stuff is a lot less fun than learning the language itself, so it seems like even more effort than it really is.

As a starting point, I'm lucky that Paul Brown has just done the same thing (rewritten his blog in Haskell) and has some useful writings about the process. My first steps have been to get FastCGI working with Haskell, and that was pretty straightforward, thanks to Paul's post on the subject.

My approach to implementing the blog

At this point, I plan to use lighttpd with FastCGI for the web portion. I'm familiar with both from the time when they were your best bet for deploying Rails applications. I'm not sure what I'm going to do about the stuff a web framework typically gives you (URL routing, response templates, etc.). I'm just writing a blog here, so I don't need anything fancy, and I'm not too concerned.

I'm also not sure what I'm going to do about persistence, but I'm pretty sure I'm not going to use a relational database. Getting Haskell to talk to a database seems to be a tripping point for plenty of people who have tried what I'm attempting, and I don't see the relational model as being appropriate for a publishing system anyway. I'm toying with the idea of using CouchDB for persistence, since I think JSON is an appropriate data format for exporting my current data, and getting Haskell to talk to CouchDB should be straightforward since I'm sure there's a decent HTTP client library available. However, it may be more appropriate to just use the filesystem (which is what Paul did). We'll see.

Whatever I do, I intend to blog my progress, since there seems to be a growing interest in Haskell along with a scarcity of practical guides to getting started. When I hit a roadblock that takes me hours to get past, knowing that I could save someone else the same frustration will make it more tolerable (I hope). Updates will be slow, though, since this undertaking is competing with at least two other projects for my after-hours time. I'm looking forward to it - from what I've seen of the language so far, it possesses a beauty that surpasses even Ruby. I can't wait to see if it still looks beautiful after trying to actually use it.

You ain't got no soul power: Good-bad v. Bad-bad

Posted by david

Disclaimer: This post is only tangentially related to software development. I've had it written for a while, but I can't quite get it to say what I want. Then I realized that, given the subject matter, a crappy post is kind of appropriate.

I recently saw the film Troll 2 for the first time. If you haven't seen this film, stop reading this right now, and add it to your Netflix queue, order it from Amazon, do whatever you need to do to watch this. Troll 2 is easily the most enjoyably awful film I've ever seen. For the uninitiated, here are some highlights:

  • Despite the title, this film has nothing to do with the first Troll movie and is, in fact, about goblins, rather than trolls.
  • The monsters in the film are very obviously little people wearing potato sacks and rubber masks. With the exception of a couple close-up shots, their lips don't move at all.
  • The premise is that goblins feed humans green-colored food that causes the humans to morph into a half-human, half-plant being that the (vegetarian) goblins can then eat.
  • Dialogue:
    • "Do you see this writing? Do you know what it means? Hospitality. And you can't piss on hospitality! I won't allow it!"
    • Sister: "How do we get him to come? By having a seance maybe?" Brother: "You're a genius, big sister!"
    • "You're grandfather's death was very hard on all of us. It was hard on your sister, your father, and on me, his daughter."

I got my wife to watch this by saying we would only watch the first ten minutes and then turn it off, unless she wanted to keep going. She did. After it was over, she turned to me and said, "We need to order this from Amazon right now". This is the type of film that you want to show all of your friends, if only because you'll be quoting it for weeks on end. ("I'm tightening my belt by one loop so that I don't feel hunger pains. Your sister and mother will have to do likewise!")

In case I've not made it clear: this is not a good movie. The plot is ludicrous, the acting would be considered bad in a community theater, and the production qualities are abysmal. Troll 2 may be the worst movie ever made, surpassing such disasters as Manos: The Hands of Fate and Plan 9 from Outer Space. However, I'd like to contrast it with another film I frequently call the worst film ever made: Batman and Robin. I saw Batman and Robin in 1997, when it was in theaters. I haven't seen it since, and have no intention of doing so. Roger Ebert is fond of saying, "Every bad movie is depressing. No good movie is depressing." When I hear this quote, I think of Batman and Robin, which depresses me in a way no other movie does.

To recap so far: in my mind, Batman and Robin and Troll 2 are each candidates for the worst film ever made, but one depresses me and the other brings me a lot of joy. How can this be?

The most obvious difference between the two films is the size of the budgets. Watching Batman and Robin, it's clear that it was an expensive film to make, and that every dollar spent on its production could have been better spent on just about anything else. In the case of Troll 2, it's obvious that little was spent on its production (one of the lead characters received just $1500 for his performance), so while some viewers may feel their 90 minutes have been wasted, it's hard to complain too much about what went into making the movie - I expect that the catering bill for the average studio film is higher than the budget for Troll 2. I don't think the reason why one film is so much fun while the other is just depressing is as simple as budget size, however.

The sins of Batman and Robin have been well-documented, so I won't repeat them here. If you were to survey the highest-grossing films circa 1996 (Independence Day, The Rock, Mission: Impossible), take the surface elements of those films, and cynically combine them in the hopes of creating a blockbuster, I think you'd get a result similar to Batman and Robin. Every element of that film feels as though all artistic decisions were made by a group of suits trying to guess what would be most marketable. I have vivid memories of watching late-night talk shows in the months leading up to Batman and Robin's release and seeing George Clooney (who played Batman) promoting the film several times on the different shows. Each time I saw him, he recounted the same anecdote: Although Batman is supposed to be a superhero, the Batman costume he wore during filming was so heavy that, as he put it, an eight-year-old could kick his ass. What struck me after hearing that story multiple times was how manufactured it felt - as though marketers had come up with a way for him to "show his human side". The movie itself, however, is sorely lacking any human element. All the dialogue I can remember from the film consists of either exposition or short lines that play well in trailers (for example, Robin: “Batgirl? That’s not very PC.”) In short, Batman and Robin feels manufactured, as if it were never touched by human hands.

In contrast, the human elements of Troll 2 stand out - not through dialogue or character development, as in good films, but through the incompetence of the whole thing. It's impossible to watch the hammy or stilted performances and not be fully aware that we are watching actors playing roles. The cheap costumes of the trolls make it impossible for us to ignore the people playing them. The scene where one character states "everyone's in bed at this time of night" when the sun is obviously shining reminds us that what we're seeing has been filmed and (poorly) edited. That human element, I think, is what makes watching Troll 2 so much fun. It feels like you are watching a group of people try their hardest to make a good movie despite the obvious limitations of a bad script, low budget, and sheer incompetence. It feels, in fact, like the type of movie I would end up making if I were to take my savings, hire some cameras, and make a film.

What Troll 2 made me realize is that there can be an enjoyment that comes from failure that isn't schadenfreude, but that there needs to be a sincerity to that failure. Who among us hasn't seen our efforts in something turn into a total disaster? In the best films, we connect deeply with the characters - their joys, sorrows, loves, successes, and failures. What makes Troll 2 so special is that rather than connect with the failures of the characters, we connect with the failures of the cast and crew. And we laugh. Not in a cruel way, but in the way we laugh when a baby makes a mess, when young lovers kiss awkwardly, whenever, in someone's flaws, we see a bit of ourselves and recognize that these flaws are not fatal, but instead are part of what makes us human.

RSpec on Rails tip - Weaning myself off of fixtures

Posted by david

As I've used RSpec more, I've grown less and less fond of fixtures for Rails testing. In everything except model specs, I use mocks for the models and don't touch the database. Sometimes, however, I still want to test the database interaction in my model specs.

I'm working on a project that has plenty of existing test cases (both RSpec and Test::Unit) that depend on fixtures. As much as I'd like to eliminate that dependency, the show must go on, and I need to keep writing specs for new functionality. When I create a new model spec that does test database interaction, I want the database to be in a known state at the start of each spec - and the easiest known state to work with is empty, so today I whipped up the following snippet to empty out relevant tables at the beginning of each spec run:


# Wean ourselves off of fixtures
# Call this inside a describe block to remove all rows from the named tables before each example
module Spec
  module DSL
    module BehaviourCallbacks
      def reset_tables(*tables)
        callback = lambda do 
          tables.each do |table|
            ActiveRecord::Base.connection.delete("DELETE FROM #{table.to_s.tableize}")
          end
        end
        before_each_parts << callback
      end
    end
  end
end

If you drop this into your spec_helper.rb, then, in your specs, you can do the following:


require File.dirname(__FILE__) + '/../spec_helper'

describe BookOrder, "A newly created BookOrder" do

  reset_tables :book_orders, :books, :customer

  before(:each) do
    # setup some stuff
  end

  it "should require a customer" do
    # blah blah
  end

  # and so on
end

Custom XPath Matcher for RSpec

Posted by david

I was using RSpec on Rails to test generation of an RSS feed, and I was surprised that it did not include a built-in way to easily check XML output of a view (such as using XPath). It does have a way to check HTML output, so you can do something like the following:

response.should have_tag('ul') do 
  with_tag('li')
end

It may be tempting to use this to do simple matches on XML output as well. Don't give in to that temptation. The have_tag matcher assumes it's working against HTML, and if you're not, you may see strange behavior if any of your XML tags share their names with HTML tags. For example, the "link" tag in RSS 2.0 feeds will cause strict HTML parsing to fail.
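
For instance, a check like the following against an RSS 2.0 response looks harmless (the selectors here are hypothetical, just to show the shape of the problem), but the "link" element will confuse the HTML parsing that have_tag relies on:

response.should have_tag('channel') do
  with_tag('link')   # RSS's <link> collides with HTML's <link> element
end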

Fortunately, RSpec makes adding your own custom matchers really easy. Thanks to a couple existing tutorials, such as this one, I was able to whip up a custom XPath matcher pretty quickly:

require 'rexml/document'

module Spec
  module Rails
    module Matchers
      class MatchXpath  #:nodoc:
        
        def initialize(xpath)
          @xpath = xpath
        end

        def matches?(response)
          @response_text = response.body
          doc = REXML::Document.new @response_text
          match = REXML::XPath.match(doc, @xpath)
          not match.empty?
        end

        def failure_message
          "Did not find expected xpath #{@xpath}\n" + 
          "Response text was #{@response_text}"
        end

        def description
          "match the xpath expression #{@xpath}"
        end
      end

      def match_xpath(xpath)
        MatchXpath.new(xpath)
      end
    end
  end
end
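
With the matcher defined, a spec can assert directly against the XML output. The action, format, and XPath expression below are hypothetical, just to show the shape of the assertion:

it "should include a link for each item in the feed" do
  get :index, :format => 'rss'
  response.should match_xpath("//rss/channel/item/link")
end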

Where to Define Matchers

All that was left was the question of where to put the MatchXpath definition. It could go into my spec_helper file, but that's already starting to get cluttered, and I don't want this definition to be lost within a bunch of configuration code. Instead, I created a directory called "spec/matchers" and threw this definition in a file called "xpath_matches.rb" within that directory. To load up the definition, I added the following code to "spec_helper.rb":

matchers_path = File.dirname(__FILE__) + "/matchers"
matchers_files = Dir.entries(matchers_path).select {|x| /\.rb\z/ =~ x}
matchers_files.each do |path|
  require File.join(matchers_path, path)
end

Now, any matcher I define in the matchers directory will get picked up by the spec_helper file.

Using RSpec with BackgrounDRb Workers

Posted by david

A Rails app I'm working on performs some expensive operations that should be offloaded to another process, so I'm using this as an opportunity to try out BackgrounDRb. Because I'm doing BDD with RSpec, my first instinct, after installing and generating a worker, was to write a spec for my worker. However, Googling for the best way to do this turned up nothing, so I'm posting what I did. If you have a better way to do this, I'd love to hear it. If not, I hope this saves someone some time.

First, I created a new directory under my spec directory:

svn mkdir spec/workers

Then, I wrote the following in a file called collecting_worker_spec.rb in the newly-created workers directory (my worker is called CollectingWorker):

require File.dirname(__FILE__) + '/../spec_helper'

describe CollectingWorker, "with feeds needing collection" do
  
  before(:each) do
    @worker = CollectingWorker.new
  end       
  
  it "should pull a single feed" do                        
    mock_collector = returning mock('collector') do |m|
      m.should_receive(:collect)
    end
    Collector.should_receive(:pop).and_return(mock_collector)
    @worker.do_work(true)
  end
end

For this spec to run correctly, though, I needed to add some code to my spec_helper.rb. This isn't pretty, but it is working for me:

# Stub out BackgrounDRb's RailsBase so the worker classes can be loaded
# outside of the BackgrounDRb runtime
module BackgrounDRb
  module Worker
    class RailsBase 
      def self.register; end
    end
  end
end 

worker_path = File.dirname(__FILE__) + "/../lib/workers"
spec_files = Dir.entries(worker_path).select {|x| /\.rb\z/ =~ x}
spec_files -= [ File.basename(__FILE__) ]
spec_files.each do |path|
  require(File.join(worker_path, path))
end

All of the necessary RailsBase methods get mocked or stubbed in the spec itself. The class declaration is there so the necessary constants can be found when they're needed. There's probably a better place for that declaration, but the fact that so little code is needed for me to begin speccing my workers is a testament to the power of RSpec and its mocking facilities.

Development Database Maintenance

Posted by david

When working with a software development team, you typically want each developer to have their own database schema. That way, each developer is free to modify their schema as part of their development without getting in the way of other developers. When doing this, however, it's crucial to have the ability to trivially do the following:

  1. Rebuild the database from scratch
  2. Distribute changes to the schema

(1) is important because it's possible during development to put your database into a bad state that's difficult to back out of. If you can easily rebuild the database, you don't need to waste time figuring out how to back out changes that you've decided against.

(2) is important to keep the developers in sync. If developers need to rebuild their database to incorporate every schema change, they will do so less often than if they can run a script that brings their database up to date with the latest schema in version control. And more importantly, by doing (2) as part of your development process, you have a way to test the changes that will ultimately be applied to your production database.

I don't know if these are obvious, but the majority of projects I've seen do not have these processes in place. While I would place these processes among those that the best teams are following (such as automated deployments, continuous integration, test-driven development, etc.), they are not talked about as much among programmers.

The main thing that drew me to Ruby on Rails was that it incorporates so many good development practices - not only making them possible to use, but encouraging you to use them. Databases are no exception. Rails's method of defining database schemas in terms of migrations is the best way to accomplish practice (2) that I've seen. It also makes rebuilding the schema from scratch very easy, going a long way toward accomplishing practice (1).
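
If you haven't used them, a migration is just a Ruby class describing a schema change and how to reverse it. A minimal sketch (the table and column names here are made up for illustration) looks something like this:

class AddShippingAddressToBookOrders < ActiveRecord::Migration
  # Apply the change
  def self.up
    add_column :book_orders, :shipping_address, :text
  end

  # Reverse the change
  def self.down
    remove_column :book_orders, :shipping_address
  end
end

Each developer can then run "rake db:migrate" to bring their own schema up to date with whatever migrations have been checked into version control.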

However, rebuilding the database from migrations gives you an empty database. For your test database, this is what you probably want (using fixtures to populate the tables). However, you probably do not want an empty development database. Among other things, user interface problems (both bugs and usability issues) that are obvious when lots of data is on the screen may remain hidden when the only data present is what the developer has created in the process of trying out their own changes.

If you have a demo environment, used either by your sales force or as part of communicating with your customers, it's helpful for your development database to be populated with the same data as your demo database. Since your demo environment, by definition, is used to show off features of your product, it can also give you good data to work with during development. And by working with the data that will be used to demo your product, you may be able to avoid some nasty surprises during demo.

One way to copy data from your demo environment to your development environment is to just copy the database, DDL and all. However, this approach does not play well with migrations. Fortunately, if you're working with Rails, you have Ruby, Rails, and Rake to help you out. I use the following code to accomplish this task. Just copy it into a Rake file in your tasks directory and create a directory in the top level of your project called "gold" and another within that called "data". Then, running "rake gold:export" will pull all the data from your database into a bunch of YAML files, one for each table. These files are structured like test fixtures. You can import the data using "rake gold:import". This approach has the advantage that your development/demo data can easily be stored in source control.

  namespace :gold do

    # Dump every table (except schema_info) into a fixture-style YAML
    # file under gold/data, one file per table
    task :export do
      require RAILS_ROOT + '/config/environment'
      conn = ActiveRecord::Base.connection
      tables = conn.tables.reject {|i| i == 'schema_info'}
      tables.each do |table|
        filename = RAILS_ROOT + '/gold/data/' + table.pluralize + '.yml'
        open(filename, 'w') do |f|
          rows = conn.select_all("SELECT * FROM #{table}")
          rows.each do |row|
            # Each row becomes a fixture entry keyed by its id
            f.puts("gold_#{row["id"]}:")
            row.keys.each do |key|
              f.puts "  #{key}: #{row[key]}"
            end
          end
        end
      end

    end

    # Load the YAML files back into the database using Rails's
    # fixture machinery
    task :import do
      require RAILS_ROOT + '/config/environment'
      require 'test_help'
      Dir.glob(RAILS_ROOT + '/gold/data/*.yml').each do |file|
        Fixtures.create_fixtures(
          RAILS_ROOT + '/gold/data',
          File.basename(file, '.yml'))
      end
    end

  end

Vacation Reading

Posted by david

I'm leaving for a much-needed vacation in a couple days. In the weeks leading up to a trip, I tend to spend as much time thinking about what I'm going to read on that trip as I do thinking about what we're going to do when we get there. Since vacation is one of the few times I can spend long stretches of time reading, I tend to save books I've been wanting to read for those times.

As a rule, I don't bring my laptop on vacations (my wife appreciates this), so I try to avoid reading anything that will make me want to sit down and start coding right away. This rules out many technical books. I also tend to bring more than I think I will need (especially when travelling to a country where I don't speak the language) because of the fear that one of them will be a stinker.

On my last vacation (which happened to be my honeymoon), I read the following:

  • Beautiful Evidence, Edward Tufte

    I'm a huge Edward Tufte fan, and this is his latest book. It was difficult to wait until vacation to start reading this. While it's not my favorite of his books (The Visual Display of Quantitative Information remains that), it was still a pleasure to read, and it made the flight over the Atlantic go by quickly.

  • A Madman Dreams of Turing Machines, Janna Levin

    The subjects of this book - Alan Turing and Kurt Gödel - are fascinating men. Unfortunately, the book itself was a little disappointing - it was easier to put down than I'd hoped it would be. Its focus was on the mental struggles of these men, with very little description of the mathematics involved. I'd love to find biographies of these two geniuses written by authors who did not assume their readers were afraid of mathematical detail.

  • Special Topics in Calamity Physics, Marisha Pessl

    This novel drew me in early. While it doesn't have the depth that the reviews of it suggest, Marisha Pessl's use of language is entertaining, and the story moves quickly enough that the book seemed much shorter than its 500 pages.

After an embarrassingly large amount of deliberation, I've decided to bring the following on my upcoming vacation:

  • Against the Day -- Thomas Pynchon's latest.

  • Compilers: Principles, Techniques, and Tools -- You know, the dragon book. I'm embarrassed to have not yet read this and am looking forward to finally doing so. This may break my rule about bringing books that make me want to code, but that's a risk I'm willing to take.

  • Dreaming in Code -- I'm a sucker for stories about software projects.

After I get back from my vacation, I'm in Chicago just long enough to shower, catch a nap, and grab my laptop before heading to Portland for RailsConf. I can't wait.

Enumerate, Map, Filter, Accumulate

Posted by david

Chapter 2 of The Structure and Interpretation of Computer Programs provides an excellent description of how to leverage abstractions to make code more expressive. One technique it describes is processing data the way a signal-processing engineer would process a signal - by generating the signal, filtering it, translating it with a map, and then combining the elements of the signal with an accumulator. This enumerate-filter-map-accumulate sequence of operations becomes a pattern that can implement a number of computations in a really understandable way.

The example that the book gives is that of finding the salary of the highest paid programmer, given a collection of employee records. Here's how a Java programmer might implement it:

int maxSalary = 0;
for (Employee employee : employees) {
  if (Role.PROGRAMMER.equals(employee.getRole())) {
    int salary = employee.getSalary();
    if (salary > maxSalary) {
      maxSalary = salary;
    }
  }
}

I don't think this is bad at all. It's more or less obvious at a glance what the code is doing. Let's take a look at how it can be implemented in Scheme using the enumerate-filter-map-accumulate style:

(define (salary-of-highest-paid-programmer records)
  (accumulate 
      max 0 
        (map salary
          (filter programmer? records))))

If you don't have any exposure to functional programming, you probably find the first example clearer. This is a simple example, and most programmers will prefer the one written in the style they're most familiar with. I've spent most of my career programming in Java, but I have had some exposure to functional languages, and I prefer the second example. I find that it more directly expresses the intent of the code. At such a small scale, however, it really doesn't matter which style you use. The true power of this technique lies in how it scales to more complex computations, while still keeping code readable and concise. Let's suppose that instead of finding the salary of the highest-paid programmer, we were to generate a report. That report should contain, for each job title, the maximum salary of any employee with that title, followed by a list of salaries for all employees with that title.

Here's how I would do it in Scheme:

(define (salary-report records)
  (map 
    (lambda (want-role) 
      (let ((salaries 
          (map 
            salary 
            (filter 
              (lambda (employee) (has-role? want-role employee)) 
              records))))
        (list want-role (accumulate max 0 salaries) salaries)))
    (uniq  (map role records))))

And in Java:

Map<Role, List<Employee>> employeesByRole = new HashMap<Role, List<Employee>>();
for (Employee employee : employees) {
  Role role = employee.getRole();
  if (!employeesByRole.containsKey(role)) {
    employeesByRole.put(role, new ArrayList<Employee>());
  }
  employeesByRole.get(role).add(employee);
}

List<List> report = new ArrayList<List>();

for (Role role : employeesByRole.keySet()) {
  Integer maxSalary = 0;
  List<Integer> allSalaries = new ArrayList<Integer>();
  for (Employee employee : employeesByRole.get(role)) {
    Integer salary = employee.getSalary();
    allSalaries.add(salary);
    maxSalary = salary > maxSalary ? salary : maxSalary;
  }
  List reportEntry = new ArrayList();
  reportEntry.add(role);
  reportEntry.add(maxSalary);
  reportEntry.add(allSalaries);
  report.add(reportEntry);
}

Here, the difference should be more apparent. I far prefer the Scheme version. The problem with the Java version is not just that it's more verbose; it does not express the intent of the code as directly as the Scheme one. There's a lot of housekeeping going on with building up the data structures needed to build the report. The fact that we have two explicit loops clouds the intent of the code.

I hope that at this point you can see how the enumerate-filter-map-accumulate pattern can scale up to more complex calculations. In those calculations we may be applying that pattern (or pieces of it) several times to get the final result, without a loss of clarity in the intent of the code. A footnote in the aforementioned chapter of The Structure and Interpretation of Computer Programs mentions a study of the Fortran Scientific Subroutine Package that found 90% of the code fitting into this pattern.

Let's return to the original example: finding the salary of the highest paid programmer. Here's how I would do it in Ruby:

create_employees.
  select {|emp| :programmer == emp.role }.
    map {|emp| emp.salary }.
      inject {|m, v| m > v ? m : v}

Is it possible to do this kind of thing in Java? This is the closest I could come up with:

Integer maxSalary = accumulate(new Accumulation<Integer, Integer>() {
  protected Integer function(Integer a, Integer b) {
    return a > b ? a : b;
  }}, 
  map(new Mapper<Employee, Integer>() {
    protected Integer function(Employee emp) {
      return emp.getSalary();
    }}, 
    filter(new Filter<Employee>() {
      protected boolean function(Employee emp) {
        return Role.PROGRAMMER.equals(emp.getRole());
      }}, 
      enumerate(new Enumeration<Employee>() {
        protected Collection<Employee> function() {
          return Arrays.asList(employees);
      }}))), 
  new Integer(0));

Yuck. Since Java does not include a function type, we need to simulate one with anonymous classes. That creates a lot of syntactic cruft that clouds the code. People are talking about adding closure support to Java 7 to address this. What would this code look like if that were added? Well, there are currently a couple of different proposals out there. If the BGGA proposal were adopted, I think it would look something like this:

Integer maxSalary = 
  accumulate(
    { Integer a, Integer b => a > b ? a : b }, 0,
    map(
      { Employee emp => emp.getSalary() }, 
      filter(
        { Employee emp => Role.PROGRAMMER.equals(emp.getRole()) },
        Arrays.asList(employees) )));

That's a real improvement.

Learning Tools Versus Learning Concepts

Posted by david

One of the best teachers I had during my time at the University of Illinois was John D'Angelo, who taught multivariable calculus. This class was required for all engineering students, and most of the students in the class (myself included) were not very interested in the course itself. We may have wanted to learn to design bridges, circuits, or software, but this class was really just a hoop to jump through in order to study the stuff we really wanted to learn.

My reason for writing about this course is Professor D'Angelo and his lack of patience for a certain type of student. You see, to do well on the exams in math courses at Illinois, you had three options:

  1. To learn the concepts
  2. To learn how to solve the types of problems that would be on the exam
  3. To cheat

While I'm sure there were at least a few students who opted for option three, Professor D'Angelo was notable for designing his exams to foil the students who opted for option two. Many exams featured questions that looked nothing like the homework problems yet were trivial for students who really grasped the concepts behind those problems. After each exam, it was inevitable that some students would stop by his office to argue that they deserved partial credit for one or more answers because if they had done just one or two steps differently, they would have arrived at the correct answer. The professor was famous (or infamous) among the students for being stingy with these partial credit points, since the incorrect steps indicated a lack of true understanding of the mathematics behind the techniques. He once boasted to the class about a student he had sent away by exclaiming, "You deserve zero! You deserve zero!" Despite his temper, he was an excellent teacher, and students who did well in his class were well-prepared for the more advanced math courses to come.

While I was studying computer science at Illinois, I commonly heard students gripe that the coursework was too theoretical, without enough of the practical knowledge that would enable us to land good jobs coming out of school. We may have graduated knowing plenty about complexity analysis and hashing algorithms, but would we know enough Java to succeed at the day-to-day work most of us were heading off to do? I certainly griped about this on at least a couple of occasions, but in hindsight, I'm grateful for the theoretical knowledge I did acquire, due to the type of teaching exemplified by John D'Angelo. I don't know how much the computer science curriculum at Illinois has changed in the few years since I graduated, but I hope it has not moved very far away from teaching the concepts it taught while I was there.

If you make your living as a software developer, you've undoubtedly come across programmers who know all of the surface details of the platform and the tools they work with but lack the conceptual knowledge of how those tools work. Perhaps they're Java programmers who have not only memorized the Collections APIs, but who can write JSTL with lots of custom tags, Hibernate configuration files, and Struts forms without needing to look at a reference.

And perhaps they're also programmers who hear about a technology such as Ruby on Rails or Jython and immediately dismiss it by saying that languages without static typing just aren't "safe" (or fast enough, or secure, etc.) despite having zero or near-zero experience programming in any language other than Java.

While such a stereotypical programmer may honestly believe the reasons he's dismissing the technology in question and may even have a point, he also has a lot to lose in a major technology shift such as a move toward dynamic languages. He has invested a lot of time and effort in learning the technologies he works with. He gets paid a lot of money and has (for now) a secure job because of the effort he's put into that learning. When EJB first came out, he spent several evenings learning it. When EJB2 came out, he spent several months. Once he was convinced that SOAP was going to take over the world, he not only learned XML Schemas, XML namespaces, and WSDL, he also learned how to use JAX-RPC, JAXB, and SAAJ. It wouldn't be fair to him to render that knowledge useless by choosing a new platform with a different set of APIs.

I don't mean the previous sentence sarcastically - any programmer who invests the time to become an expert in development tools isn't lazy. Knowing all the details of the tools you're using is a good thing, not a bad thing, and teams can be well-served by having such an expert available. But whether or not having to learn a new platform is fair is irrelevant. Birds fly, fish swim, and technology changes. If it didn't, it wouldn't really be technology. If the University of Illinois had spent its time teaching me all about the mainstream technologies in use in 2001 (the year I graduated), it would have served me well at that time, but my education would be getting more and more stale with each passing year. Instead, I currently find myself really wishing I had paid more attention during that lecture about fixed-point combinators, not only because they are freaking cool, but because understanding them may prove to be somewhat useful in whatever language I'll be using to earn a living a year from now.

The thing is, to get anything done, we do need tools - we need XML parsers, build scripts, and even ORM frameworks, and we need to know how to use them effectively. A Java programmer who doesn't know the difference between the HashMap and TreeMap classes probably isn't going to be very productive, no matter how well they understand hashing algorithms. We only have so much time we can spend learning, and if we spend it all on high-level concepts, we'll have a hard time actually building anything. That's the tradeoff. So how should we focus our energy?

One of my biggest metrics for the quality of a tool is how much of what I learn when using the tool will be of value when I stop using that tool. In other words, will learning that tool teach me about the concepts behind that tool? Likewise, how much will my existing conceptual knowledge make it easier to learn how to use the tool? For instance, if I'm going to invest the time to learn yet another ORM framework, I want to get a better understanding of the general techniques ORM frameworks use. As another example, learning how to leverage higher-order functions in Ruby translates well to using them in JavaScript or Scheme and will translate in some degree to any language that supports functions as first-class objects.
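
To make that last point concrete, here is a trivial, made-up example of a higher-order function in Ruby - a method that takes its behavior as a block. The same shape carries over almost directly to JavaScript functions or Scheme lambdas:

# A method parameterized by behavior: apply the given block twice
def apply_twice(x)
  yield(yield(x))
end

apply_twice(3) {|n| n * n }   # => 81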

One way to get the benefits of learning a tool well and learning concepts that will prove valuable beyond the tool itself is to learn how that tool works. While I'm picking on database-mapping tools, consider the Ruby on Rails interface to the database - ActiveRecord. I bet that anyone who can honestly put Ruby on Rails on their resume has ventured into the ActiveRecord source code at some point. (The same is certainly not true about Hibernate.) This is due to the fact that Rails is distributed as source code, rather than object code, and because the dynamic nature of Ruby makes it easy to extend the behavior of ActiveRecord in ways other than subclassing. To do so effectively, however, you need to be familiar with how ActiveRecord works, and the most effective way to do that is to go to the source code. While you could argue that needing to go into the source code to find out how something works is not ideal, you cannot argue that the act of doing so will give you a better understanding of the techniques the tool uses to do its job - techniques that may prove applicable in other situations. In fact, the Rails source code is how I originally learned Ruby. After working through the simple examples given in the Rails documentation, I found the Rails source to be a rich bed of information on Ruby idioms and techniques.

Reading back over this, it seems like I'm picking on Java programmers. That wasn't my intention, but now that I think about it, whenever I've met a software developer who only knows a single platform or programming language, that language has always been Java. Java's dominance over the last few years has made it possible to be a financially successful programmer without needing to learn other languages, and the complexity and sheer number of its frameworks has captured the energy and time many developers have for learning. But like the languages before it, Java won't dominate forever, and the developers who have a solid understanding of the concepts (both within Java-land and outside it) are the ones who will be successful transitioning to whatever comes next.