Enumerate, Map, Filter, Accumulate

Posted by david

Chapter 2 of The Structure and Interpretation of Computer Programs provides an excellent description of how to leverage abstractions to make code more expressive. One technique it describes is processing data like a signal-processing engineer would precess a signal - by generating the signal, filtering it, translating it with a map, and then combining the elements of the signal with an accumulator. This enumerate-filter-map-accumulate sequence of operations becomes a pattern that can implement a number of computations in a really understandable way.

The example that the book gives is that of finding the salary of the highest paid programmer, given a collection of employee records. Here's how a Java programmer might implement it:

1
2
3
4
5
6
7
8
9
10

int maxSalary = 0;
for (Employee employee : employees) {
  if (Role.PROGRAMMER.equals(employee.getRole())) {
    int salary = employee.getSalary();
    if (salary > maxSalary) {
      maxSalary = salary;
    }
  }
}

I don't think this is bad at all. It's more or less obvious at a glance what the code is doing. Let's take a look at how it can be implemented in Scheme using the enumerate-map-filter-accumulate style:

1
2
3
4
5
6

(define (salary-of-highest-paid-programmer records)
  (accumulate 
      max 0 
        (map salary
          (filter programmer? records))))

If you don't have any exposure to functional programming, you probably find the first example clearer. This is a simple example, and most programmers will prefer the one written in the style they're most familiar with. I've spent most of my career programming in Java, but I have had some exposure to functional languages, and I prefer the second example. I find that it more directly expresses the intent of the code. At such a small scale, however, it really doesn't matter which style you use. The true power of this technique lies in how it scales to more complex compuations, while still keeping code readable and concise. Let's suppose that instead of finding the salary of the highest-paid programmer, we were to generate a report. That report should contain, for each job title, the maximum salary of any employee with that title, followed by a list of salaries for all employees with that title.

Here's how I would do it in Scheme:

1
2
3
4
5
6
7
8
9
10
11
12

(define (salary-report records)
  (map 
    (lambda (want-role) 
      (let ((salaries 
          (map 
            salary 
            (filter 
              (lambda (employee) (has-role? want-role employee)) 
              records))))
        (list want-role (accumulate max 0 salaries) salaries)))
    (uniq  (map role records))))

And in Java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

Map<Role, List<Employee>> employeesByRole = new HashMap<Role, List<Employee>>();
for (Employee employee : employees) {
  Role role = employee.getRole();
  if (!employeesByRole.containsKey(role)) {
    employeesByRole.put(role, new ArrayList<Employee>());
  }
  employeesByRole.get(role).add(employee);
}

List<List> report = new ArrayList<List>();

for (Role role : employeesByRole.keySet()) {
  Integer maxSalary = 0;
  List<Integer> allSalaries = new ArrayList<Integer>();
  for (Employee employee : employeesByRole.get(role)) {
    Integer salary = employee.getSalary();
    allSalaries.add(salary);
    maxSalary = salary > maxSalary ? salary : maxSalary;
  }
  List reportEntry = new ArrayList();
  reportEntry.add(role);
  reportEntry.add(maxSalary);
  reportEntry.add(allSalaries);
  report.add(reportEntry);
}

Here, the difference should be more apparent. I far prefer the Scheme version. The problem with the Java version is not just that it's more verbose; it does not express the intent of the code as directly as the Scheme one. There's a lot of housekeeping going on with building up the data structures needed to build the report. The fact that we have two explicit loops clouds the intent of the code.

I hope that at this point you can see how the enumerate-filter-map-accumulate pattern can scale up to more complex calculations. In those calculations we may be applying that pattern (or pieces of it) several times to get the final result, without a loss of clarity in the intent of the code. A footnote in the aforementioned chapter of The Structure and Interpretation of Computer Programs mentions a study of the Fortran Scientific Subroutine Package that found 90% of the code fitting into this pattern.

Let's return to the original example: finding the salary of the highest paid programmer. Here's how I would do it in Ruby:

1
2
3
4
5

create_employees.
  select {|emp| :programmer == emp.role }.
    map {|emp| emp.salary }.
      inject {|m, v| m > v ? m : v}

Is it possible to do this kind of thing in Java? This is the closest I could come up with:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

Integer maxSalary = accumulate(new Accumulation<Integer, Integer>() {
  protected Integer function(Integer a, Integer b) {
    return a > b ? a : b;
  }}, 
  map(new Mapper<Employee, Integer>() {
    protected Integer function(Employee emp) {
      return emp.getSalary();
    }}, 
    filter(new Filter<Employee>() {
      protected boolean function(Employee emp) {
        return Role.PROGRAMMER.equals(emp.getRole());
      }}, 
      enumerate(new Enumeration<Employee>() {
        protected Collection<Employee> function() {
          return Arrays.asList(employees);
      }}))), 
  new Integer(0));

Yuck. Since Java does not include a function type, we need to simulate one with anonymous classes. That creates a lot of syntactic cruft that clouds the code. People are talking about adding closure support to Java 7 to address this. How would this code look like if that were added? Well, there are currently a couple different proposals out there. If the BGGA proposal were adopted, I think it would look something like this:

1
2
3
4
5
6
7
8
9

Integer maxSalary = 
  accumulate(
    { Integer a, Integer b => a > b ? a : b }, 0,
    map(
      { Employee emp => emp.getSalary() }, 
      filter(
        { Employee emp => Role.PROGRAMMER.equals(emp.getRole()) },
        Arrays.asList(employees) )));

That's a real improvement.