Governance by those who do the work.

Sunday, October 25, 2009

Implicit Concurrency

There has been a lot of concern over how to harness the speed potential of multi-core processors without a disruptive reinvention of software engineering. Implicit concurrency is a solution.

The specifications of the algorithmic languages C, C++, OCAML, and Scheme deliberately leave unspecified the order of evaluation of arguments to procedure-calls. If coders are to write programs which display the same behavior when (compiled and) run on different platforms, an easily mastered discipline concerning side-effects of argument expressions must be practiced.

It turns out that this discipline is very similar to that needed in order to write programs which display the same behavior even with concurrent evaluation of arguments (and on different platforms).

Changing from unspecified-order-of-evaluation-of-arguments to concurrent-envaluation-of-arguments in these language specifications would enable them to be implicitly-concurrent:

  • able to spawn concurrent subtasks at many points in programs,
  • produce the same result whether run in 1 core (or thread) or 1000,
  • involve no new language primitives, and
  • require no complicated code analysis.

Terminology

closed-expression
Let the term closed-expression designate
  • a Scheme expression and
  • all state accessible from the value of each free variable in the expression or free in any procedures called from the expression or procedures they call (this can include files).

After the closed-expression is evaluated, the resulting state (or output) is:
  • the returned value or values,
  • all state accessible from the value of each free variable in the expression or free in any procedures called from the expression or procedures they call, and
  • files written by evaluating the expression.

singly-terminate
Let the term singly-terminate designate the condition of the evaluation of a closed-expression that:
  • returns once and only once and
  • does not let any continuations escape from the dynamic extent of evaluating the closed expression.

portably-repeatable
Let the term portably-repeatable designate those closed-expressions which singly-terminate and produce identical output for any serial evaluation order allowed by the Scheme report.

The Scheme Reports mandate an unspecified serial evaluation order for arguments to procedure-calls, `let' bindings, `letrec' bindings, `do' bindings, and `map'.

A caveat to portably-repeatable is the non-associativity of floating-point operations. For the extent of this discussion, assume that `+', `*', `-', and `/' with more than 2 arguments are left-associative.

concurrently-repeatable
Let the term concurrently-repeatable designate those closed-expressions which singly-terminate and produce identical output for any concurrent evaluation order where the Scheme report mandates an unspecified serial evaluation order.


Implicitly-Concurrent-Scheme
Let Implicitly-Concurrent-Scheme (ICS) designate an implementation which allows concurrent evaluation anywhere the Scheme report mandates an unspecified serial evaluation order.

Theorems

Theorem 1
All singly-terminating purely functional closed-expressions (including procedures they call) are both portably-repeatable and concurrently-repeatable.
This is because purely functional (sub-)expressions have no side effects; hence they cannot effect the execution of other purely functional (sub-)expressions.

Theorem 2
All concurrently-repeatable closed-expressions are portably-repeatable.
For a given closed-expression, its serial evaluation orders are a subset of its concurrent evaluation orders.

Theorem 3
A single threaded, non-preemptive RnRS (Scheme) implementation is an ICS implementation.
This is because concurrently-repeatable implies portably-repeatable by Theorem 2.

The Programmer Contract

The programmer who wants to write closed-expressions which have repeatable behavior across platforms must currently write portably-repeatable code (among other constraints).

There are relatively few portably-repeatable expressions which are not also concurrently-repeatable. And it is avoiding these few expressions which represents the change of habit for coders. The few expressions involve the potential for multiple threads to have side-effects on shared state.

Examples of portably-repeatable expressions which are not concurrently-repeatable are symmetric calls of a pseudo-random-number-generator having shared state:

(begin
  (require 'random)
  (let ((rs (make-random-state "seed1")))
    (+ (random 9 rs) (random 9 rs))))

Notice that the expressions:

(- (random 9 rs) (random 9 rs))
and
(+ (random 9 rs) (random 19 rs))
are not portably-repeatable; for asymmetrical functions (or asymmetrical argument expressions), the order of calls to `random' causes different results to be returned.

A concurrently-repeatable version of the random sum can be written:

(begin
  (require 'random)
  (let* ((rs (make-random-state "seed1"))
         (rnd1 (random 9 rs)))
    (+ rnd1 (random 9 rs))))

The SLIB random module detects and signals an error if `random' is called concurrently or reentrantly on the same random-state byte-array. A concurrent implementation can thus sometimes catch concurrently-repeatable violations.

The Scheme Reports also leave the order of evaluation unspecified for `map'. Examples of portably-repeatable `map' expressions which are not concurrently-repeatable have side-effecting (set!) shared-state (cnt) within a `map'ped procedure:

(let ((cnt 0))
  (let ((res (map
              (lambda (x)
                (if (negative? x) (set! cnt (+ 1 cnt)))
                (* x x))
              '(-1 2 -3 4 -5 6))))
    (list res cnt)))

Such code can be made concurrently-repeatable by separating the functional and side-effecting parts:

(let ((cnt 0) (lst '(-1 2 -3 4 -5 6)))
  (for-each (lambda (x)
              (if (negative? x) (set! cnt (+ 1 cnt))))
            lst)
  (list (map
         (lambda (x) (* x x))
         lst)
        cnt))

Or the non-functional part can be made functional:

(let ((lst '(-1 2 -3 4 -5 6)))
  (list (map (lambda (x) (* x x)) lst)
        (apply + (map (lambda (x) (if (negative? x) 1 0))
                      lst))))

Or the functional part can be accumulated in the serial `for-each'. This code is forced to run serially by the `for-each'; so it will not gain a speed benefit from concurrent evaluation:

(let ((cnt 0)
      (lst '(-1 2 -3 4 -5 6))
      (nst '()))
  (for-each (lambda (x)
              (if (negative? x) (set! cnt (+ 1 cnt)))
              (set! nst (cons (* x x) nst)))
            lst)
  (list (reverse nst) cnt))

One could also write a serial `map*' procedure to do the same.

Continuations

This expression is portably-repeatable but not concurrently-repeatable because it would return twice:
(call-with-current-continuation
 (lambda (return)
   (/ (return 0) (return 0))))

This expression is neither portably-repeatable nor concurrently-repeatable because whether it returns 0 or 1 depends on the order of evaluation of arguments:

(call-with-current-continuation
 (lambda (return)
   (/ (return 0) (return 1))))

But escape continuations can be used in concurrently-repeatable ways:

(define (careful-div n d)
  (call-with-current-continuation
   (lambda (return)
     (/ n (if (zero? d) (return #f) d)))))

The Implementation Contract

When encountering an expression (list (u) (v)), a parallelizing RnRS implementation must commence an analysis of u and v and everything they call, possibly including other modules, to determine whether any side effects to shared state might depend on concurrent evaluation order. This is the same difficulty that any parallelizing C or Fortran compiler faces. Such an analysis cannot be exhaustive because there is no guarantee that the analysis will terminate (Halting Problem).

When a parallelizing Implicitly-Concurrent-Scheme compiler or interpreter encounters (list (u) (v)), it immediately knows that it can delegate the computation of (u) and (v) to distinct threads. The Programmer Contract to write concurrently-repeatable code insures that no side-effects to shared state will depend on concurrent evaluation order.

Exceptions

When an exception is raised in one thread evaluating an argument in concurrently-repeatable code, it does not effect the evaluation of the other arguments because its side-effects do not interact with the threads of the other arguments. Whether the handler returns or not determines whether the procedure is applied to the result of those argument evaluations.

Conclusion

Implicitly-Concurrent-Scheme is a minor modification to Scheme Report semantics which allows, but does not require, both interpreters and compilers to simply parallelize the execution of programs. No new primitives are introduced. In order to write ICS programs which produce the same output irrespective of concurrent evaluation order requires a regime very similar to the that required of RnRS programs to produce the same output irrespective of serial evaluation order.

Bibliography

[Feeley]
Marc Feeley,
An Efficient and General Implementation
of Futures on Large Scale Shared-Memory Multiprocessors
,
PhD thesis, Brandeis University, April 1993, 213 pages.

2 comments:

AA said...

I wonder if this technique can deliver significant speedups in practice. Scheduling overhead and cache misses from juggling so many fine grained computations over an increasing number of cores may overwhelm the concurrency gains.

Simon Peyton Jones discusses his experience with a similar technique in Haskell at about 5 minutes into this interview

Aubrey Jaffer said...

Implicit-concurrency doesn't mandate that any concurrency be used. And cautious application of concurrency is entirely compatible with implicit-concurrency. For instance, one might only fork threads when the subexpressions called user-defined functions, or user-defined functions whose definitions are over 40 lines long.