Governance by those who do the work.

Wednesday, December 30, 2009

Integers-as-Bits versus Bit-Arrays

Integers As Bits (SRFI-60) is motivated this way:
Treating integers as two's-complement strings of bits is an arcane but important domain of computer science. It is used for:
  • hashing;
  • Galois-field[2] calculations of error-detecting and error-correcting codes;
  • cryptography and ciphers;
  • pseudo-random number generation;
  • register-transfer-level modeling of digital logic designs;
  • index arithmetic of Fast-Fourier transforms;
  • packing and unpacking numbers in persistent data structures;
  • space-filling curves with applications to dimension reduction and sparse multi-dimensional database indexes; and
  • generating initial values for root-finders and transcendental function algorithms.
Many people's first impulse is to conflate integers with bit-vectors. But the conflation is not useful once we realize that functional operations on immutable integers-as-bits are less efficient than side-effecting operations on large bit-vectors.

Bit-vectors can be used to construct very fast indexes for database tables having millions of records. To keep the working-set small, creation of megabit arrays is done explicitly; the bit-vector procedures accumulate their results into one of the operands. Functional operations on bignums would be impractical for such an application.

Let N be the largest number of significant bits in bignum inputs to a bitwise computation, and let P be the number of binary bitwise operations (LOGAND, LOGIOR, ...) in the computation.
Consider the computation of the disjunction of P+1 bit-vectors (a OR b OR c ...), which uses P binary operations. Using (bignum) integers-as-bits, P result bignums will be allocated and the P-1 intermediate results will be discarded in the course of the computation. Thus it consumes O(PN) space and time.

Accumulating the result of multiple disjunctions in one bit-vector consumes only O(N) storage and O(PN) time.

The worst case for bit-vector accumulation is a full binary tree of 2^q bit-vectors with P = 2^q - 1 binary operations. By reusing temporary bit-vectors, storage can be limited to O(Nq) = O(N log P) and O(PN) time.

A clever optimizer could be written to convert functional bitwise expressions into accumulating bit-vector forms, but I haven't found functional bitwise operators to be easier to work with than imperative bit-vector operators for database indexing.
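To make the contrast concrete, here is a hedged sketch (not taken from SCM or SLIB) of the two styles, assuming SRFI-60's `logior' and SCM's `bit-set*!' (described below) are available and that all the bit-vectors have the same length. The functional loop allocates a fresh bignum at every step; the imperative loop accumulates into one bit-vector.

;; O(PN) space: each step allocates a new intermediate bignum.
(define (or-bignums bignums)
  (let loop ((lst bignums) (acc 0))
    (if (null? lst)
        acc
        (loop (cdr lst) (logior acc (car lst))))))

;; O(N) space: the disjunction is accumulated into ACC by side-effect.
;; All bit-vectors in BVS are assumed to be the same length as ACC.
(define (or-bit-vectors! acc bvs)
  (for-each (lambda (bv) (bit-set*! acc bv #t)) bvs)
  acc)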

In addition to the generic Array and Array Mapping procedures of SLIB, the following procedures in SCM were written to support the use of bit-vectors as database indexes. This technique works best for indexes having only a few indexed values (with many records per value). One bit-vector is devoted to each indexed value for a field; a 1-bit indicates that the corresponding record has that value in that field. Indexes for values which are sparse can instead be encoded by vectors of integers, each integer indicating the record having that value in that field. Indexes coded in these ways allow fast calculation of complicated queries.

5.4.3 Bit Vectors

Bit vectors can be written and read as a sequence of 0s and 1s prefixed by #*.

#1At(#f #f #f #t #f #t #f) => #*0001010

Some of these operations will eventually be generalized to other uniform-arrays.

Function: bit-count bool bv
Returns the number of occurrences of bool in bv.

Function: bit-position bool bv k
Returns the minimum index of an occurrence of bool in bv which is at least k. If no bool occurs within the specified range, #f is returned.

Function: bit-invert! bv
Modifies bv by replacing each element with its negation.

Function: bit-set*! bv uve bool
If uve is a bit-vector, then bv and uve must be of the same length. If bool is #t, then uve is OR'ed into bv; if bool is #f, the inversion of uve is AND'ed into bv.

If uve is an unsigned integer vector, then all the elements of uve must be between 0 and the LENGTH of bv. The bits of bv corresponding to the indexes in uve are set to bool.

The return value is unspecified.

Function: bit-count* bv uve bool
Returns
(bit-count (bit-set*! (if bool bv (bit-invert! bv)) uve #t) #t).

bv is not modified.
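A hedged sketch of how these procedures might serve as a database index, assuming an SCM session; the #* literals and record numbering are illustrative only, and in practice the accumulator would be a freshly allocated bit-vector rather than a literal.

(define idx-red  #*0001010)  ; records 3 and 5 have value `red'
(define idx-blue #*0100000)  ; record 1 has value `blue'

(bit-count #t idx-red)       ;; => 2  records matching `red'
(bit-position #t idx-red 0)  ;; => 3  first matching record
(bit-position #t idx-red 4)  ;; => 5  next match at or after index 4

(define acc #*0000000)       ; accumulator, initially all 0s
(bit-set*! acc idx-red #t)   ; acc := acc OR idx-red
(bit-set*! acc idx-blue #t)  ; acc := acc OR idx-blue
(bit-count #t acc)           ;; => 3  records matching `red' OR `blue'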

Tuesday, December 15, 2009

Schlep Toolchains

Schlep Toolchains documents and distributes the hyper-cross-platform technology used to create the WB B-tree database libraries and new implementation of Water for C, C#, and Java from source written in a subset of Scheme.

One might assume that in order for Schlep-dialect code to map to these three languages [C, C#, and Java], the Schlep-dialect must manifest the worst limitations of each language; but translation can actually rectify some of these limitations.

If I had to pick one practical feature of Scheme which elevates it above other languages, it would be internal definitions. The Algorithmic Language Scheme allows procedure definitions (using define, letrec, and named-let) inside of other procedures, which none of C, C#, or Java does. Internal definitions allow calls of internal procedures with a small number of arguments to replace the common alternatives:

  • duplication of code, or
  • top-level definitions and calls passing large numbers of arguments.
C and C# have a `goto' statement, enabling Schlep to emulate calls of internal procedures in tail position using variable assignments (sometimes including temporary variable bindings to emulate simultaneous assignment) followed by a goto statement. The restriction to tail position does not allow internal recursion other than tail-recursion; but even this restricted form accommodates many situations which would otherwise force less desirable practices.

Java lacks a `goto' statement. Tail-called internal procedures are instead implemented using while (true), continue, and break with labels. The resulting Java code is not as readable as the original Scheme-dialect; but that loss in clarity is balanced by greater expressive power.
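For illustration, a hedged sketch (not taken from the Schlep distribution) of the shape of Scheme source described above: an internal named-let whose recursive calls are all in tail position, the kind of code that translates to a goto in C and C# and to a labeled while/continue loop in Java.

(define (count-one-bits n)
  (let loop ((n n) (cnt 0))           ; internal procedure, tail-called only
    (if (zero? n)
        cnt
        (loop (quotient n 2)
              (+ cnt (remainder n 2))))))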

Schlep-example gives an example of a procedure with an internal procedure and how Schlep translates this procedure into C, C#, and Java.

Sunday, November 15, 2009

Implicitly Concurrent `map'

In my first Implicit Concurrency post I stated that
The Scheme Reports mandate an unspecified-but-serial evaluation order for arguments to procedure-calls, `let' bindings, `letrec' bindings, `do' bindings, and `map'.
The only RnRS mention of concurrency appears in the description of Procedure call:
Note: Although the order of evaluation is otherwise unspecified, the effect of any concurrent evaluation of the operator and operand expressions is constrained to be consistent with some sequential order of evaluation. The order of evaluation may be chosen differently for each procedure call.

Because `do', `let', and `letrec' bindings are defined in terms of (`lambda' and) procedure calls, this restriction applies to them as well.

But the Reports are silent as to the evaluation order for `map'; so it shouldn't have been included in my list.

If the Implicitly Concurrent future is already here, why is it a secret? Because misunderstanding of `map' evaluation order is widespread; in [R6RS] Will `map' order remain unspecified?, W. Clinger writes:

... expecting map to process its elements from left to right has been perhaps the most common order-of-evaluation bug in R5RS programs.

There is a deep connection between concurrent `map' and the other implicit concurrencies. If a metacircular-interpreter uses a concurrent `map' to evaluate arguments to procedure calls, then that interpreter's procedure calls will also be concurrent.
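A hedged sketch of that connection; `eval1', `operator', and `operands' are hypothetical helpers of a metacircular interpreter, not procedures from any report or library. Because the operand expressions are evaluated with `map', a concurrent `map' makes argument evaluation in the interpreted procedure call concurrent as well.

(define (eval-call expr env)
  (apply (eval1 (operator expr) env)
         (map (lambda (operand) (eval1 operand env))
              (operands expr))))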

Implicit concurrency can thus be viewed as a basic property of a computing platform. A concurrently-repeatable metacircular-interpreter, written without explicit-concurrency primitives, is concurrent when run by an implicitly concurrent implementation and serial when run by a serial implementation.

Sunday, October 25, 2009

Implicit Concurrency

There has been a lot of concern over how to harness the speed potential of multi-core processors without a disruptive reinvention of software engineering. Implicit concurrency is a solution.

The specifications of the algorithmic languages C, C++, OCaml, and Scheme deliberately leave unspecified the order of evaluation of arguments to procedure-calls. If coders are to write programs which display the same behavior when (compiled and) run on different platforms, an easily mastered discipline concerning side-effects of argument expressions must be practiced.

It turns out that this discipline is very similar to that needed in order to write programs which display the same behavior even with concurrent evaluation of arguments (and on different platforms).

Changing from unspecified-order-of-evaluation-of-arguments to concurrent-evaluation-of-arguments in these language specifications would enable them to be implicitly-concurrent:

  • able to spawn concurrent subtasks at many points in programs,
  • produce the same result whether run in 1 core (or thread) or 1000,
  • involve no new language primitives, and
  • require no complicated code analysis.

Terminology

closed-expression
Let the term closed-expression designate
  • a Scheme expression and
  • all state accessible from the value of each free variable in the expression or free in any procedures called from the expression or procedures they call (this can include files).

After the closed-expression is evaluated, the resulting state (or output) is:
  • the returned value or values,
  • all state accessible from the value of each free variable in the expression or free in any procedures called from the expression or procedures they call, and
  • files written by evaluating the expression.

singly-terminate
Let the term singly-terminate designate the condition of the evaluation of a closed-expression that:
  • returns once and only once and
  • does not let any continuations escape from the dynamic extent of evaluating the closed expression.

portably-repeatable
Let the term portably-repeatable designate those closed-expressions which singly-terminate and produce identical output for any serial evaluation order allowed by the Scheme report.

The Scheme Reports mandate an unspecified serial evaluation order for arguments to procedure-calls, `let' bindings, `letrec' bindings, `do' bindings, and `map'.

A caveat to portably-repeatable is the non-associativity of floating-point operations. For the extent of this discussion, assume that `+', `*', `-', and `/' with more than 2 arguments are left-associative.
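A worked illustration of the caveat, assuming IEEE double-precision flonums: the two groupings of the same three addends give different results, which is why an associativity convention is needed.

(+ (+ 1e16 -1e16) 1.0)  ;; => 1.0   left-associative grouping
(+ 1e16 (+ -1e16 1.0))  ;; => 0.0   -1e16 + 1.0 rounds back to -1e16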

concurrently-repeatable
Let the term concurrently-repeatable designate those closed-expressions which singly-terminate and produce identical output for any concurrent evaluation order where the Scheme report mandates an unspecified serial evaluation order.


Implicitly-Concurrent-Scheme
Let Implicitly-Concurrent-Scheme (ICS) designate an implementation which allows concurrent evaluation anywhere the Scheme report mandates an unspecified serial evaluation order.

Theorems

Theorem 1
All singly-terminating purely functional closed-expressions (including procedures they call) are both portably-repeatable and concurrently-repeatable.
This is because purely functional (sub-)expressions have no side effects; hence they cannot affect the execution of other purely functional (sub-)expressions.

Theorem 2
All concurrently-repeatable closed-expressions are portably-repeatable.
For a given closed-expression, its serial evaluation orders are a subset of its concurrent evaluation orders.

Theorem 3
A single threaded, non-preemptive RnRS (Scheme) implementation is an ICS implementation.
This is because a serial evaluation order is a degenerate concurrent evaluation order, and concurrently-repeatable implies portably-repeatable by Theorem 2.

The Programmer Contract

The programmer who wants to write closed-expressions which have repeatable behavior across platforms must currently write portably-repeatable code (among other constraints).

There are relatively few portably-repeatable expressions which are not also concurrently-repeatable; avoiding these few expressions is the change of habit required of coders. The expressions in question involve the potential for multiple threads to have side-effects on shared state.

An example of a portably-repeatable expression which is not concurrently-repeatable is a symmetric pair of calls to a pseudo-random-number generator having shared state:

(begin
  (require 'random)
  (let ((rs (make-random-state "seed1")))
    (+ (random 9 rs) (random 9 rs))))

Notice that the expressions:

(- (random 9 rs) (random 9 rs))
and
(+ (random 9 rs) (random 19 rs))
are not portably-repeatable; for asymmetrical functions (or asymmetrical argument expressions), the order of calls to `random' causes different results to be returned.

A concurrently-repeatable version of the random sum can be written:

(begin
  (require 'random)
  (let* ((rs (make-random-state "seed1"))
         (rnd1 (random 9 rs)))
    (+ rnd1 (random 9 rs))))

The SLIB random module detects and signals an error if `random' is called concurrently or reentrantly on the same random-state byte-array. A concurrent implementation can thus sometimes catch violations of concurrent-repeatability.

The Scheme Reports also leave the order of evaluation unspecified for `map'. An example of a portably-repeatable `map' expression which is not concurrently-repeatable is one with side-effecting (set!) shared state (cnt) within the `map'ped procedure:

(let ((cnt 0))
  (let ((res (map
              (lambda (x)
                (if (negative? x) (set! cnt (+ 1 cnt)))
                (* x x))
              '(-1 2 -3 4 -5 6))))
    (list res cnt)))

Such code can be made concurrently-repeatable by separating the functional and side-effecting parts:

(let ((cnt 0) (lst '(-1 2 -3 4 -5 6)))
  (for-each (lambda (x)
              (if (negative? x) (set! cnt (+ 1 cnt))))
            lst)
  (list (map
         (lambda (x) (* x x))
         lst)
        cnt))

Or the non-functional part can be made functional:

(let ((lst '(-1 2 -3 4 -5 6)))
  (list (map (lambda (x) (* x x)) lst)
        (apply + (map (lambda (x) (if (negative? x) 1 0))
                      lst))))

Or the functional part can be accumulated in the serial `for-each'. This code is forced to run serially by the `for-each'; so it will not gain a speed benefit from concurrent evaluation:

(let ((cnt 0)
      (lst '(-1 2 -3 4 -5 6))
      (nst '()))
  (for-each (lambda (x)
              (if (negative? x) (set! cnt (+ 1 cnt)))
              (set! nst (cons (* x x) nst)))
            lst)
  (list (reverse nst) cnt))

One could also write a serial `map*' procedure to do the same.
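A hedged sketch of such a hypothetical `map*' (single-list case only): binding each result with `let' before recursing forces left-to-right, serial evaluation.

(define (map* proc lst)
  (if (null? lst)
      '()
      (let ((head (proc (car lst))))    ; evaluated before the recursion
        (cons head (map* proc (cdr lst))))))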

Continuations

This expression is portably-repeatable but not concurrently-repeatable because it would return twice:
(call-with-current-continuation
 (lambda (return)
   (/ (return 0) (return 0))))

This expression is neither portably-repeatable nor concurrently-repeatable because whether it returns 0 or 1 depends on the order of evaluation of arguments:

(call-with-current-continuation
 (lambda (return)
   (/ (return 0) (return 1))))

But escape continuations can be used in concurrently-repeatable ways:

(define (careful-div n d)
  (call-with-current-continuation
   (lambda (return)
     (/ n (if (zero? d) (return #f) d)))))
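For instance, with the definition above:

(careful-div 6 3)  ;; => 2
(careful-div 6 0)  ;; => #f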

The Implementation Contract

When encountering an expression (list (u) (v)), a parallelizing RnRS implementation must commence an analysis of u and v and everything they call, possibly including other modules, to determine whether any side effects to shared state might depend on concurrent evaluation order. This is the same difficulty that any parallelizing C or Fortran compiler faces. Such an analysis cannot be exhaustive because there is no guarantee that the analysis will terminate (Halting Problem).

When a parallelizing Implicitly-Concurrent-Scheme compiler or interpreter encounters (list (u) (v)), it immediately knows that it can delegate the computation of (u) and (v) to distinct threads. The Programmer Contract to write concurrently-repeatable code ensures that no side-effects to shared state will depend on concurrent evaluation order.

Exceptions

When an exception is raised in one thread evaluating an argument in concurrently-repeatable code, it does not affect the evaluation of the other arguments because its side-effects do not interact with the threads of the other arguments. Whether the handler returns or not determines whether the procedure is applied to the results of those argument evaluations.

Conclusion

Implicitly-Concurrent-Scheme is a minor modification to Scheme Report semantics which allows, but does not require, both interpreters and compilers to simply parallelize the execution of programs. No new primitives are introduced. Writing ICS programs which produce the same output irrespective of concurrent evaluation order requires a regime very similar to that required of RnRS programs to produce the same output irrespective of serial evaluation order.


Wednesday, October 21, 2009

 

Noticed these Tricyrtis blooming after our first snowfall.

Saturday, October 17, 2009

Climate Change: Anthropogenesis Doesn't Matter

Climate change deniers express a lot of concern over whether climate change is caused by human activities or not. Implicit in asking whether climate change is anthropogenic is an assumption that if climate change isn't caused by humans, then nothing can or should be done to reverse it.

That assumption is absurd; and only serves to derail discussion of how to combat climate change. To see how absurd it is, consider the case of asteroids. If an asteroid large enough to end life-as-we-know-it were discovered to be on a trajectory to impact the earth, it would not be anthropogenic. Should we then not attempt to deflect it from catastrophe?

The important question is: How can we slow or reverse climate change and the destruction of ocean reefs from increased atmospheric CO2?

Wednesday, October 7, 2009

Search-paths considered harmful

It has been my experience that dynamic search-paths are disastrous for libraries on commercial and educational computer networks. The search-paths at these sites seem to only grow; paths to newer versions are placed ahead of older versions. It is not uncommon for half of the directories in a path to refer to non-existent directories or non-functioning host machines! The timeouts from hosed or inaccessible networked-file-systems can balloon the startup time of programs into minutes.

Furthermore, transient router outages can cause incompatible versions of libraries to be loaded which, in the Scheme or Lisp case, may not be noticed until the program has run for hours; but more likely will just cause a mysterious failure.

The SLIB Scheme Library's solution to this problem is an explicit command [(require 'new-catalog)], typically run after installations, which writes an association list of features and resolved pathnames to a file. SLIB sessions read this file and thereafter resolve library paths with no latency.
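A hedged illustration of the idea (the pathnames are hypothetical and this is not the exact SLIB catalog format): the catalog is an association list from feature symbols to resolved pathnames, written once and then consulted with ordinary list operations.

(define example-catalog
  '((random      . "/usr/local/lib/slib/random.scm")
    (new-catalog . "/usr/local/lib/slib/mklibcat.scm")))

(assq 'random example-catalog)
;; => (random . "/usr/local/lib/slib/random.scm")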

If the network can't reach a particular file, the session fails immediately or after the first timeout rather than conducting covert experiments on library version compatibility.