Governance by those who do the work.

Sunday, April 6, 2014

Recurrence for Multidimensional Space-Filling Functions

"Recurrence for Multidimensional Space-Filling Functions" is the culmination of many years' interest in space-filling curves and functions.  The recurrences result from a de novo analysis that was more engineering than mathematical.  Instead of trying to characterize all possible curves or base patterns, I focused on finding a cell algorithm for all ranks and side lengths (greater than 2) and making that work with a self-similar recurrence subject to scaling laws I discovered from previous implementations.

That cell algorithm is "serpentine".  It turns out that both the Peano and Hilbert (multidimensional) space-filling curves are instances of my recurrence working on serpentine patterns.

From there I generalized the alignment function.  In the alignment function there were degrees of freedom for diagonal-corner cells (with odd side lengths) which were easy to employ to improve their isotropy (Peano curves are anisotropic).  There are degrees of freedom for adjacent-corner cells of rank greater than 2, but deeper analysis will be required to understand them.

While the alignment function seems to work for all serpentine cells (I verified thousands of combinations of rank and side-length), this is not yet proven.  And Figure 20 at the end of the paper shows a non-serpentine 3x3x3 adjacent-corner cell whose symmetrical version works with my alignment function, but whose asymmetrical variant does not.  Future explorations should determine if that asymmetrical cell itself is valid, and discover the constraint on cells if that is not the case.

The non-terminating recurrence is elegant but was tricky to discover because its truncated form returns coordinates at the center of sub-cells rather than at their origins as the integer recurrence does.

The centered form was accomplished by generalizing the offsets I earlier discovered for centering the Peano curve.

Hereditary Dislike of Family

The extensive genealogy work my spouse has done has barely penetrated one branch of my family, and it's not because it lacks descendants! Unlike the other relatives, the ones in this branch admit to knowing little or nothing about their parents' and grandparents' generations. Many in this branch have not responded to our inquiries; two have told us to never contact them again.

So why does antipathy towards relatives seem to be concentrated in this line?  An interesting idea is to consider the evolutionary consequences of a heritable dislike-your-family trait.  Such a trait seems present in the life cycles of some social species where only a minority of adults leave their clans.  While this trait would, on average, probably reduce the survivability of the individuals expressing it, that could be outweighed by the increased dispersion of the clan's genes.

Such a trait could have more than one cause.  One possibility is the types of mental illnesses which often result in the breakup of families.  Schizophrenia in particular tends to manifest in late adolescence, the time when individuals become fertile and able to live independently.  Separation from the clan at that time maximizes their chances of mixing their genes with other clans in offspring.

If mental illness benefits the species by increasing genetic diversity (which is evidenced by its global occurrence), then mental illness is an integral and persistant part of the human genome.

Saturday, October 5, 2013

Easy Accurate Reading and Writing of Floating-Point Numbers

http://people.csail.mit.edu/jaffer/III/EZFPRW

Abstract

Presented here are algorithms for converting between (decimal) scientific-notation and (binary) IEEE-754 double-precision floating-point numbers. These algorithms are much simpler than those previously published. The values are stable under repeated conversions between the formats. The scientific representations generated have only the minimum number of mantissa digits needed to convert back to the original binary values.

Introduction

In both How to Read Floating-Point Numbers Accurately[1990] and Printing floating-point numbers quickly and accurately[1996] the key issue is that successive rounding operations do not have the same effect as a single rounding operation.
The algorithms from both papers are iterative and complicated. The read and write algorithms presented here do at most 2 and 4 bignum divisions, respectively. Over the range of IEEE-754 double-precision numbers, the largest intermediate bignum is 339 decimal digits (1126 bits). This is not large for bignums, being orders of magnitude smaller than the precisions which get speed benefits from FFT multiplication.

Bignums

Both reading and writing of floating-point numbers can involve division of numbers larger than can be stored in the floating-point registers, causing rounding at undesired points of the conversion. Bignums (arbitrary precision integers) can perform division of large integers without rounding. What is needed is a bignum division-with-rounding operator, called round-quotient here. It can be implementated in Scheme as follows:
(define (round-quotient num den)
  (define quo (quotient num den))
  (if ((if (even? quo) > >=) (* (abs (remainder num den)) 2) (abs den))
      (+ quo (if (eqv? (negative? num) (negative? den)) 1 -1))
      quo))
If the remainder is more than half of the denominator, then it rounds up; if it is less, then it rounds down; if it is equal, then it rounds to even. These are the same rounding rules as IEEE Standard for Binary Floating-Point Arithmetic (754-1985) and the Scheme procedure round.
The algorithms presented here use the integer-length function from Common-Lisp and the SLIB Scheme Library:
— Function: integer-length n
Returns the number of bits neccessary to represent n.
Example:
        (integer-length #b10101010)
        ⇒ 8
        (integer-length 0)
        ⇒ 0
        (integer-length #b1111)
        ⇒ 4

For positive argument n, integer-length returns ⌈log2 n⌉.
The algorithms also use ldexp (from C). Here is Scheme code for ldexp:
;; ldexp() from C.
(define (ldexp bmant bexp)
  (define DBL_MIN_10_EXP -1074)
  (if (> DBL_MIN_10_EXP (* -2 (abs bexp)))
      (* (* 1. bmant (expt 2 (quotient bexp 2)))
  (expt 2 (- bexp (quotient bexp 2))))
      (* 1. bmant (expt 2 bexp))))
The conditional in ldexp is in order to work with bexp values where 2bexp is out of floating-point range.

Reading

The algorithm to convert a positive integer mantissa and integer exponent (of 10) to a floating-point number is straightforward.
;; dbl-mant-dig is the number of bits in the mantissa field of the
;; floating-point format.
(define dbl-mant-dig 53)

;; Given positive integer mantissa MANT and exponent (of 10) POINT,
;; find the closest binary floating-point number.
;; bex is the binary shift for quo=mant/scl; bex is negative.
(define (mantexp->dbl mant point)
  (if (>= point 0)
      (exact->inexact (* mant (expt 10 point)))
      (let* ((scl (expt 10 (- point)))
      (bex (- (integer-length mant) (integer-length scl) dbl-mant-dig))
      (num (* mant (expt 2 (- bex))))
      (quo (round-quotient num scl)))
 (cond ((> (integer-length quo) dbl-mant-dig) ;too many bits of quotient
        (set! bex (+ 1 bex))
        (set! quo (round-quotient num (* scl 2)))))
 (ldexp (exact->inexact quo) bex))))
When point is non-negative, the mantissa is multiplied by 10point and the product is rounded to fit in dbl-mant-dig bits by exact->inexact.
With a negative point, the mantissa will be multiplied by a power of 2, then divided by scl=10point. Over the floating-point range, the longest a rounded quotient of a n bit number and a m bit power-of-10 can be is 1+n−m bits; the shortest is n−m bits.
2bex is the power-of-two multiplier. The initial value of bex corresponds to the n−m case above. If the round-quotient is more than dbl-mant-dig bits long, then call round-quotient with double the denominator, 2⋅scl. In either case, the final step is to convert to floating-point using ldexp.

Writing

The algorithm for writing a floating-point number is more complicated because it must generate the shortest decimal mantissa which reads as the original floating-point input.
The approximate number of decimal digits needed to represent a binary number is obtained by multiplying dbl-mant-dig by log210. Try to convert using the floor of this number and test to see whether it would read as the original number. If so, then return; if not, then try with one more decimal digit.
(define (exact-ceiling x) (inexact->exact (ceiling x)))

;; Given positive integer mantissa and exponent (of 2) E2, find the
;; closest integer and exponent (of 10).
(define (dbl->string mant e2)
  (define llog2 (/ (log 2) (log 10)))
  (define f (ldexp mant e2))
  (define quo 0)
  (define point 0)
  (if (> e2 0)
      (let ((num (* mant (expt 2 e2))))
 (set! point
       (max 0 (exact-ceiling (- (/ (log num) (log 10)) (* dbl-mant-dig llog2)))))
 (let ((den (expt 10 point)))
   (set! quo (round-quotient num den))
   (cond ((not (= (mantexp->dbl quo point) f))
   (set! point (+ -1 point))
   (set! quo (round-quotient num (quotient den 10)))))))
      (let ((den (expt 2 (- e2))))
 (set! point (exact-ceiling (* e2 llog2)))
 (let ((num (* mant (expt 10 (- point)))))
   (set! quo (round-quotient num den))
   (cond ((and (positive? f) (not (= (mantexp->dbl quo point) f)))
   (set! point (+ -1 point))
   (set! quo (round-quotient (* num 10) den)))))))
  (let* ((dman (number->string quo)) (lman (string-length dman)))
    (do ((idx (+ -1 lman) (+ -1 idx)))
 ((or (zero? idx) (not (eqv? #\0 (string-ref dman idx))))
  (string-append "." (substring dman 0 (+ 1 idx))
   "e" (number->string (+ point lman)))))))
When e2 is positive, num is bound to the product of mant and 2e2. point is set to an upper-bound of the number of decimal digits of num in excess of the floating-point mantissa's precision. The round-quotient of num and 10point produces the integer quo. If mantexp->dbl of quo and point is not equal to the original floating-point value f, then the round-quotient is computed again with the divisor divided by 10 yielding one more digit of precision.
When e2 is negative, den is bound to 2e2 and point is set to the negation of an upper-bound of the number of decimal digits in 2e2. num is bound to the product of mant and 10point. The round-quotient of num and den produces the integer quo. If mantexp->dbl of quo and point is not equal to the original floating-point value f, then the round-quotient is computed again with num multiplied by 10 yielding one more digit of precision.
The last part of dbl->string produces a string using Scheme's number->string to convert the integer mantissa and exponent components to decimal strings. Mantissa trailing zeros are eliminated by scanning the dman string in reverse for non-zero digits.

References:

Saturday, June 29, 2013

Hash Entropy

http://people.csail.mit.edu/jaffer/III/Hash-Entropy
A browser cookie contains a 128 bit "unique" identifier. 128 bits is an inconvenient size, being longer than native integer size on nearly all CPUs. The 128 bit number can be split into two 64-bit numbers; are both needed to correlate user records?
In my sample of 103 million cookie identifiers, there were only 2 collisions when grouping by the left half of the identifier. For statistical calculations, this 0.000002% error rate is negligible.
The expected number of collisions among k samples out of n possible codes is:
k ( 1 − ( 1 − 1/n )k−1 )
The expected number of matching birthdays among 20 people is 1.016. How many collisions are expected from 103 million samples of randomly distributed 64 bit numbers?
In order to keep the numbers within double-precision range, use the fact that ln(1+ε) ≈ ε for ε near 0:
k ⋅ ( 1 − ( 1 − 1/n )k−1 ) k ⋅ ( 1 − e(1−k)/n )
The expected number of collisions for 103×106 random 64.bit numbers is 575.115×10−6. So the cookie identifiers are not uniformly distributed. Another way of saying this is that the distribution of cookies has less entropy than a 64.bit random variable. How much entropy does the distribution have?
Solving the expected-collisions formula for n will give an estimate of the entropy (c is the number of collisions):
n 1 − k

ln( 1−c/k )
Two collisions among 103×106 cookie-IDs indicate around 52.2 bits of entropy (log2 5.3×1015). Looking at a sample of cookie-IDs notice that the digits under the "M" are all "4" and the digits under "N" are always "8", "9", "a", or "b".
              M    N
3001457e-e07a-4b20-ba16-f3f8624da98f
73f9436b-2868-4325-a62c-bf5a4f38dfdf
1c4df5b6-210b-4aba-9660-bdb129ea6f31
34c3fcb1-bf16-4a96-a7e9-ec377bface30
a1ddb5fc-4bb8-4321-a1bf-753668387197
3edd32e0-5bad-476d-bc1c-8cd8c80b88f8
2ee886c8-d877-480c-b109-455cd3c1b099
5c76c27d-7caf-4adf-8aec-65fee5ce281a
93af502b-61db-4596-bcfe-188c352296b5
ed981151-6179-45cf-8e0d-e4293caf62a0
54ea634b-ebfc-4371-97a7-5b1123264e4b
9a3cce97-2a4b-4a56-8fd9-85d9f07abce4
0bfdbd6f-5fd8-4c74-ab7e-1ccb1a81fd70
84ac5d8a-c64f-48a3-8d11-653aea56e2b4
f5e3afb2-e87b-45e1-9b1c-a49d3e106124
b5f832f5-0771-4554-8a00-da142aab1832
ca025994-9412-4311-bb62-c7e6458a94dc
cf1fc6a6-b3d4-47a5-8aa5-063120d5b8c4
83add10c-b266-4995-8803-805d32c58147
d85c3ec5-3d77-4767-b87b-a42a20dc9f74
b6104d07-b29e-4b07-b3c8-65efd08bd003
21111ab5-83ab-4af8-a39f-1824a4055a9a
78346096-3202-48d8-9ed8-797a0f233804
1d124d3d-8ac2-405b-9a8a-afb70dac4720
e187bd94-c091-4fb5-a3c4-2a50e2fe4e35
8dca4524-cb3d-4a40-ae4d-881976a6173f
94e458b1-04c8-4495-b3fb-4cbbc4c37561
7d0819df-4a08-4553-b10b-379433b09e35
74332f07-3183-479b-a285-04c1a9e61bc4
4ac04ba2-950e-4315-bb14-eb3369dd1591
f8a20163-1db1-47e4-a6b5-d8fa4519eb79
ba778464-0bf4-4575-93e9-ac5b41923e3a
6dda046c-651d-4d9c-9482-4d3c1247a14a
This turns out to be RFC-4122 A Universally Unique IDentifier (UUID) URN Namespace format version 4. The left half of the UUID should have 60 bits of entropy if the random or pseudo-random source is uniformly distributed. The left half having only 52 bits of entropy indicates that the source quality could be improved.
Note that a linearly increasing sequence would not have any collisions although, as a sequence, it has low sequence-entropy. Although having the expected hash entropy is a test which a pseudo-random-number-generator should pass, PRNG sequences must also be "random". So low collision counts are a necessary, but not sufficient condition for PRNG quality.
This collision test's insensitivity to sequence makes it useful for testing hash functions. A good hash function will have have entropy near the number of codes it can output.

Sarah V. Jaffer has a much simpler way to compute the number of collisions. k is much smaller than n. The probability of one choice hitting another is k/n. There are k chances to hit another choice; so the expected number of collisions is roughly k2/n. In the cookie calculation, Sarah's formula matches within one part-per-million.
How are these formulas related? Because ln(1+ε)≈ε, it follows that eε−1≈ε (for small ε). Thus the expected number of collisions is:
k ⋅ ( 1 − e(1−k)/n ) k ⋅ (k−1)

n
Realizing that k≫1 leads to Sarah's formula k2/n.

Wednesday, January 9, 2013

Competing for Energy Savings

MassSAVE (formerly the Massachusetts Residential Conservation Services Program) has been providing conservation services for Massachusetts residential energy consumers since 1980.  It is funded by Massachusetts' gas and electric utilities and energy efficiency providers.

Given that energy-audits and most of the related services are free to Massachusetts households yearly, why are annual participation rates only a few percent?

The largest obstacles to greater participation may be related to everyday suspicions about offers for free services:

* "If it sounds to good to be true, it probably is."

* "If it will save me money, why are they giving it away?"

Countering these suspicions is an uphill battle, as most of the "free" offers people encounter come with hidden agendas.  If a nominal fee were charged, then it would still give people cause for inaction, thinking "an energy-audit probably won't be worth the fee".

Another commonly heard objection to getting an energy-audit is the belief that one will not live in her current residence long enough to justify effort or expense.  This article suggests that we look for motivations beyond the energy savings themselves.

People are less critical about participating in activities organized around civic or charitable purposes.  Even residents without children are often willing to support local school fundraisers.

Unlike energy-audits, those fundraisers involve familiar goods or services like baked goods and car-washes.  To motivate people to participate in an unfamiliar process, it makes sense to frame the activity as a competition, a popular meme in contemporary life. Making that competition be between a few towns (for energy-audit participation rate) increases the chances for winning from the hit-by-lightning odds of lotteries or singing competitions to plausible, achievable levels.

A prize to benefit a town will motivate civic-minded citizens to participate.  If the prize is a green energy source (such as a solar array), then even climate-change deniers must acknowledge that the prize will save the town (and their taxes) money; such savings are valued beyond their true economic worth in contemporary American society.

For a prize benefiting the town, the town's or school's infrastructure for communicating with residents could be harnessed to promote participation.

As for the prize itself, the publicity generated by the competition should be very attractive to green-energy companies, who may be convinced to supply the prize at reduced or no cost.

Because energy-audits can be done annually, these competitions could be repeated each year (perhaps without the previous winning towns).

The energy-audit participation rate competition is an all-winners proposition:

* Energy-audits become widely used and accepted;

* a town wins a green-energy source which lowers their energy costs;

* all participants save money on energy costs;

* electric utilities avoid the need to construct new generation capacity; and

* fossil-carbon pollution is reduced.

I am talking with NextStepLiving about setting up an energy-audit competition northwest of Boston.

Sunday, November 18, 2012

"Mathematical Marbling" article in IEEE-CGA

My article has finally appeared in print:

magazine cover
Lu, S.; Jaffer, A.; Jin, X.; Zhao, H.; Mao, X.; ,
"Mathematical Marbling,"
IEEE Computer Graphics and Applications
Nov.-Dec. 2012 (vol. 32 no. 6) pp 26-35
ISSN: 0272-1716
http://doi.ieeecomputersociety.org/10.1109/MCG.2011.51

Saturday, September 8, 2012

Increasingly Frequent Storm Damage

A recent letter to the editor of the "Bedford Minuteman".

The IPCC says*:

More frequent extreme weather events are predicted to accompany global warming, in part as a consequence of projected increases in convective activity.

The projected increase from 1993 to 2012 is roughly 14% in the Northeast US.

With an increased incidence of severe weather we should also expect to see an increase in tree damage such as fallen limbs and toppled trees. The increase in power outages in recent years is not inconsistent with severe weather having increased 14% in Bedford since 1993.

In addition to causing power outages, power lines being knocked down by tree limbs create a hazard requiring specialists from NStar to repair them. In areas with buried utility cables, downed limbs and trees can be cleared immediately by homeowners and emergency personnel.

In the face of this growing problem, it would make sense for the Town of Bedford to set as a long-term goal the elimination of utility-poles in favor of buried cables. The advantages to Bedford would be:

  1. Reduction in outages of power, telephone, video, and data services due to severe weather;
  2. Elimination of electrocution hazards due to downed power lines;
  3. Faster clearing of damaged trees and downed limbs from roads and yards;
  4. More robust street trees not weakened by pruned centers;
  5. More attractive neighborhoods;
  6. More authentic-looking Historic districts.
Aubrey Jaffer
Bedford Global Warming Action Coalition

* Section 8.3.9.3 "Extreme Weather Events" in
    "The Regional Impacts of Climate Change"
    by the Intergovernmental Panel on Climate Change (IPCC)
    http://www.ipcc.ch/ipccreports/sres/regional/index.php?idp=232