                        PENTIUM STUDY

Summary:

IBM Research focused on the likelihood of error on the Pentium chip in
everyday floating point division activities. Intel has analyzed the
probability of making an error based on the assumption that any possible
64 bit pattern is equally likely to occur in both the numerator and the
denominator. If that were the case, then the chances of the error would be
1 in 9 billion. They also estimate that an average spreadsheet user will
perform 1,000 floating point divisions per day. Based on these assumptions,
Intel estimates that an error will be encountered only once in 9 million
days (or once in about 27,000 years).

Our analysis shows that the chances of an error occurring are significantly
greater when we perform simulations of the types of calculations performed
by financial spreadsheet users, because, in this case, not all bit
patterns are equally probable.

Probability Tests:

We have analyzed a Pentium chip in order to understand the sources of
errors and have found that in order for an error to occur, both the
numerator and denominator must have certain "bad" bit patterns, as
described below.

First, for the denominator to be "at risk", that is, capable of producing
an error with certain numerators, it must contain a long string of
consecutive 1's in its binary representation. Although such numbers
represent only a very small fraction of all possible numbers, they do
occur much more frequently when denominators are created by adding,
subtracting, multiplying or dividing simple numbers. For example, the
number 3.0, represented exactly, does not have that pattern, but the result
of the computation 4.1-1.1 does have that pattern.
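
The 4.1-1.1 example can be checked by looking at the stored bits. The
following is a minimal sketch, assuming IEEE 754 double precision as used
by Python floats, which is not necessarily the format used in our tests.
The helper name fraction_bits is illustrative only.

```python
import struct

def fraction_bits(x: float) -> str:
    """Return the 52 stored fraction bits of an IEEE 754 double as text."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(raw & ((1 << 52) - 1), "052b")

# 3.0 entered exactly: a single 1 followed by zeros.
print(fraction_bits(3.0))

# 3.0 obtained as 4.1 - 1.1: a 0 followed by a long run of 1's,
# because neither 4.1 nor 1.1 is exactly representable in binary.
print(fraction_bits(4.1 - 1.1))
```

The second line of output is the kind of long string of consecutive 1's
described above.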

How many denominators produced in this fashion can be "at risk", that is,
capable of producing an error for certain numerators? When we randomly
added or subtracted ten random numbers having a single digit dollar amount
and two digits in cents, for example, $4.57, then one out of every 300 of
the results was "at risk" and hence capable of producing an error. If we
repeated the test with numbers having two digit dollar amounts and two
digits in cents, then one out of every 2,000 could cause an error. If the
denominator was calculated by dividing two numbers having one digit to the
left and one to the right of the decimal point, then approximately one in
every 200 could cause an error.

For simplicity, suppose that one of every 1000 denominators produced by
some calculations was "at risk."

Now, suppose we have created a bad denominator. What is the chance of now
encountering a bad numerator, which will produce an error? It depends on
the actual value of the "at risk" denominator, but based on our tests, a
conservative estimate would be that only one out of every 100,000
numerators causes a problem.

Finally, when we combine the chances of a bad numerator and the chances of
a bad denominator, the result is that one out of every 100 million
divisions will give a bad result. Our conclusion is vastly different from
Intel's.

Frequency Tests:

We also questioned Intel's analysis and assumption that spreadsheet users
will only perform 1,000 divides in a day. Tests run independently suggest
that a spreadsheet user (Lotus 1-2-3) does about 5,000 divides every
second when he is calculating his spreadsheet. Even if he does this for
only 15 minutes a day, he will perform 4.2 million divides in a day, and
according to our probability findings, on average, a computer could make a
mistake every 24 days. Hypothetically, if 100,000 Pentium customers were
doing 15 minutes of calculations every day, we could expect 4,000 mistakes
to occur each day.

Conclusion:

The Pentium processor does make errors on floating point divisions when
both the numerator and denominator of the division have certain
characteristics. Our study is an analysis based on probabilities and
chances. In reality, a user could face either significantly more errors or
no errors at all. If an error occurs, it will first appear in the fifth or
higher significant digit and it may have no effect or it may have
catastrophic effects.
-----------------------------------------------------------------
Additional Technical Detail:

Some Experiments on Pentium Using Decimal Numbers

According to an Intel white paper, if you were to choose a random binary
bit pattern for the numerator and the denominator, the probability of an
error in the divide is about 1 in 9 billion.

The error occurs when certain divisors (termed "at risk" or bad) are
divided into certain numerators. In order for the error to occur, our
belief is that divisors must lie in a certain range. For each such
denominator, there is a range of numerator values which produce an
incorrect result.

An example of affected numbers is the decimal constants we hardwire in our
programs. For example, if we are converting from months to years and are
interested in 7-8 decimal digits of accuracy, then we can hardwire a
constant to convert from months to years.

         alpha = 1/12 = .083333333

Let us construct a hypothetical example. We have contracted a job which is
expected to last 22.5 months. The total value of the contract is $96000.
From this, tax at the rate of 14 and 2/3 percent has to be deducted. The
taxing authority has defined 14 and 2/3 percent to be 14.66667 percent. We
want to calculate the net take home money on a per annum basis. We do the
following calculations.

                      Tax = 96000*.1466667 = 14080.0032

  Net take home money = 96000 - 14080.0032 = 81919.9968

        The number of years in 22.5 months = 22.5*.083333333
                                           = 1.8749999925 years

             Net take home money per annum =  81919.9968/1.874999925
                                           =  $43690.6667

Most machines give the above answer which satisfies the desired 7-8 digit
accuracy criterion. On Pentium, the answer is $43690.53, which has only 5
correct digits.
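
The steps of this example can be reproduced with a short script. This is
only a sketch: the variable names are illustrative, and the divisor is
taken exactly as printed in the division step above. On a machine with a
correct divider it shows the expected result, and it can serve as a quick
check on a suspect machine.

```python
# Values as printed in the example above; names are illustrative.
contract_value = 96000.0
tax_rate = 0.1466667      # 14.66667 percent
years = 1.874999925       # divisor as printed in the division step

tax = contract_value * tax_rate
net = contract_value - tax
per_annum = net / years

print(f"tax       = {tax:.4f}")
print(f"net       = {net:.4f}")
print(f"per annum = {per_annum:.4f}")
```

A correct divide prints 43690.6667 on the last line.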

In this example, both numerator and denominator are bad numbers. They are
both near some simple integer boundary in their binary representation, and
such numbers occur in the real world at a much higher frequency than the
totally random bit pattern hypothesis would suggest.

Probabilistic Analysis

We are addressing the question of how likely it is to have a bad divisor.
On Pentium, a bad divisor belongs to one of the five bad table entries
characterized by 1.0001, 1.0100, 1.0111, 1.1010, and 1.1101, followed by a
string of 1's in the mantissa.

We have found that if the string of 1's is of length 20 or so, then it is a
bad divisor. Given a bad divisor, the probability of making an error in
the division increases dramatically, compared to the 1 in 9 billion figure
quoted by Intel.

We did some simple experiments using decimal numbers and the findings are
reported below. We counted only those bad divisors which belong to one of
the above five table values, followed by a string of 32 1's. Intel people
argue that all binary patterns are equally likely. If that were really the
case, the probability of finding a bad divisor, as defined above, would be
5/(2**36) or about one in 13 billion random divisors. However, we are
finding the probabilities to be much higher.
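
The 5/(2**36) figure follows from counting fixed bits: four table bits
plus 32 trailing 1's, with five acceptable table values. A small sketch of
that count, with illustrative variable names:

```python
table_values = 5       # the five bad table entries
fixed_bits = 4 + 32    # four table bits plus a string of 32 1's

p_random = table_values / 2**fixed_bits
print(f"one in {1 / p_random:,.0f} random divisors")
```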

Addition/Subtraction of Decimal Numbers

In this experiment, we randomly added or subtracted 10 uniformly
distributed random numbers having one or two decimal digits (as in dollars
and cents) and then we examined the result for the above binary patterns.
Here are the results for three cases. In the first case, we chose only one
digit to the left of the decimal point (as in $3.47) and in the second
case, we chose two digits to the left of the decimal point (as in $29.56).
In the third case, we chose one digit to the right of the decimal point and
two digits to the left. All the digits were chosen randomly with uniform
probability. The results below give the number of times the result of this
experiment has the bit pattern corresponding to a bad divisor.

 Case 1 (one digit to the left, two to the right)   ---   188 out of 100,000
 Case 2 (two digits to the left, two to the right)  ---    45 out of 100,000
 Case 3 (two digits to the left, one to the right)  ---   356 out of 100,000

Clearly, these probabilities are much higher than those obtained with the
random bit pattern hypothesis.
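
An experiment of this kind can be sketched as below. This is not the
program used for the counts above. It assumes IEEE 754 double precision
with round-to-nearest, Python's random module, and a bad divisor defined
as one of the five table values followed by 32 1's. All function names are
illustrative.

```python
import random
import struct

# Four bits after the leading 1 for the five bad table entries in the text.
BAD_PREFIXES = ("0001", "0100", "0111", "1010", "1101")

def is_bad_divisor(x: float, ones: int = 32) -> bool:
    """True if the fraction bits are a bad table value followed by `ones` 1's."""
    if x == 0.0:
        return False
    (raw,) = struct.unpack(">Q", struct.pack(">d", abs(x)))
    bits = format(raw & ((1 << 52) - 1), "052b")
    return bits[:4] in BAD_PREFIXES and bits[4:4 + ones] == "1" * ones

def random_amount(rng, left_digits, right_digits):
    """Random decimal amount, e.g. 3.47 for one left digit and two right."""
    return rng.randrange(10 ** (left_digits + right_digits)) / 10 ** right_digits

def count_bad(trials, left_digits, right_digits, seed=0):
    """Add or subtract ten random amounts per trial; count bad-divisor results."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(10):
            amount = random_amount(rng, left_digits, right_digits)
            total += amount if rng.random() < 0.5 else -amount
        bad += is_bad_divisor(total)
    return bad

if __name__ == "__main__":
    for left, right in ((1, 2), (2, 2), (2, 1)):
        print(left, right, count_bad(100_000, left, right), "out of 100,000")
```

The counts depend on the floating point format, the rounding mode and the
random number generator, so they should not be expected to match the table
above exactly.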

Division Of Two Decimal Numbers:

These experiments were conducted through exhaustive tests on all possible
digit patterns. Here (a.b)/(c.d) represents division of a two digit (one
to the left of the decimal point and one to the right of the decimal
point) number by another two digit number.

         (a.b)/(c.d)   -  44 out of    10,000
         (0.ab)/(0.cd) -  27 out of    10,000
         (a.bc)/(d.ef) - 344 out of 1,000,000
         (ab.c)/(de.f) - 422 out of 1,000,000
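
An exhaustive test of the (a.b)/(c.d) case can be sketched as follows.
Again this is an illustration under assumptions, not the original test
program: it uses IEEE 754 double precision, skips the zero divisor, and
treats a quotient as bad when one of the five table values is followed by
32 1's. The names are illustrative.

```python
import math

# Leading 1 plus the four table bits of the five bad entries in the text.
BAD_TOP5 = {0b10001, 0b10100, 0b10111, 0b11010, 0b11101}

def is_bad_divisor(x: float, ones: int = 32) -> bool:
    """True if the significand is a bad table value followed by `ones` 1's."""
    if x <= 0.0:
        return False
    mantissa, _ = math.frexp(x)       # mantissa lies in [0.5, 1)
    sig = int(mantissa * 2**53)       # 53-bit integer significand
    mask = (1 << ones) - 1
    return (sig >> 48) in BAD_TOP5 and ((sig >> (48 - ones)) & mask) == mask

def count_bad_quotients(ones: int = 32):
    """Enumerate every (a.b)/(c.d) with a nonzero divisor."""
    bad = total = 0
    for num in range(100):            # a.b = num / 10
        for den in range(1, 100):     # c.d = den / 10, zero skipped
            total += 1
            bad += is_bad_divisor((num / 10) / (den / 10), ones)
    return bad, total

if __name__ == "__main__":
    bad, total = count_bad_quotients()
    print(bad, "out of", total)
```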

Multiplication of Two Numbers

Here we are multiplying a decimal number by another number which was
computed as a reciprocal of another decimal number as in scaling by a
constant.

         (a.b)  * (1/(c.d))   -    37 out of    10,000
         (a.bc) * (1/(d.e))   -   139 out of   100,000
         (a.bc) * (1/(d.ef))  -   434 out of 1,000,000

To summarize, for the decimal calculations of the type given above, the
probability of having a result which falls into the category of being a
bad divisor is rather high. It appears to be somewhere between 1 in 3000
and 1 in 250. Let us say that it is of the order of 1 in 1000.

Furthermore, if the rounding mode corresponds to truncate, the probability
of arriving at bad divisors increases significantly.

The Dependency on Numerator

Given a bad divisor, the divide error occurs for some range of values of
the numerator. If we were to take a totally random bit pattern for the
numerator, the probability of error appears to be of the order of one in
100,000. This is a first cut rough estimate and probably could be
improved. It appears that probabilities are different for different table
values. The table corresponding to '1.0001' seems to have the most error.
For numerator also, there are bands of values where the error is much more
likely. Again these bands are more prominent near whole numbers. For
example, if we were using (19.4 - 10.4) = '9' as a divisor (a bad one),
and picked a random value between 6 and 6.01 as the numerator, then
the chance of error increases to about one in 1000.

For the purpose of our simplistic analysis, we will use the figure of 1 in
100,000 for a bad numerator. This assumes that we are picking up a random
numerator. Using the value of 1 in 1000 as the probability for a bad
divisor, the overall probability for a 'typical' divide being incorrect
seems to be of the order of 1 in 100 million. This is about two orders of
magnitude higher than the Intel estimate of 1 in 9 billion.

Probability of a Divide Instruction

Let us assume that a Pentium operating at 90 MHz does an op in 1.2 cycles
on the average. That will give about 75 Million ops per second of actual
compute time. We will use a figure of 1 divide per 16,000 instructions,
even though many estimates suggest a much higher frequency of divide.

Thus, using this conservative estimate of one divide per 16,000
instructions, we arrive at about 4687 divides per second. Let us further
assume that a typical spreadsheet user does only about 15 minutes of
actual intensive computing per day. Then, he is likely to do 4687*900 =
4.2 million divides per day. Assuming an error rate of 1 in 100 million,
it will take about 24 days for an error to occur for an individual user.
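
The chain of estimates in this section can be written out as a short
calculation. The inputs are the assumptions stated above and the variable
names are illustrative.

```python
clock_hz = 90_000_000          # 90 MHz Pentium
cycles_per_op = 1.2            # assumed average
ops_per_divide = 16_000        # one divide per 16,000 instructions
seconds_per_day = 15 * 60      # 15 minutes of intensive computing
error_rate = 1 / 100_000_000   # 1 in 100 million divides

ops_per_second = clock_hz / cycles_per_op
divides_per_second = ops_per_second / ops_per_divide
divides_per_day = divides_per_second * seconds_per_day
days_per_error = 1 / (divides_per_day * error_rate)

print(f"ops per second:     {ops_per_second:,.0f}")
print(f"divides per second: {divides_per_second:,.1f}")
print(f"divides per day:    {divides_per_day:,.0f}")
print(f"days per error:     {days_per_error:.1f}")
```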

Combining this with the fact that there are millions of Pentium users
worldwide, we quickly come to the conclusion that on a typical day a large
number of people are making mistakes in their computation without
realizing it.

  IBM Corporation
  ibmstudy@watson.ibm.com

 ============================================================
 From the 'New Product News' Electronic News Service provided
 via AOL (Keyword = New Products) & Delphi (GO BUSINESS PROD)
 ============================================================
