More Probability

The Binomial Distribution

by

Sam Allen

In the last installment we discussed the game of Chuck-a-Luck and found out why, despite his expectations, Joe Tourist will end up a loser if he plays it for any length of time. In order to do so we had to do some deep thinking and perform some calculations. This time we shall show how, by invoking a somewhat abstract concept, a problem such as this may be solved more quickly and with less chance of error. First of all it would be appropriate to give some definitions.

The theory of probability is the science of making decisions based on incomplete information. It is appropriate to use it whenever it is impossible to gather enough information to be absolutely certain or when the cost of being wrong is less than the cost of obtaining enough information to be absolutely certain.

A probability variable is an algebraic entity which expresses a degree of confidence in the existence of something or in the truth of some statement or concept. It is customary to express a probability variable by a letter of the alphabet when its value is not precisely known. If its value becomes known it must be a number in the range 0 to 1. A value of 0 corresponds to impossibility, a value of 1 corresponds to certainty, and any intermediate value expresses a degree of confidence.

There is another doctrine called "fuzzy logic" which has some similarities to the theory of probability. It was created to deal with the inexactness of ordinary language and to calculate what people probably mean when they make imprecise or ambiguous statements. We shall try to avoid having to use fuzzy logic theory by using precise and unambiguous language. Fuzzy logic theory is only twenty years old, whereas the theory of probability has a much longer history. At first most people viewed fuzzy logic as a parody of the theory of probability such as might appear in a college humor magazine, but in recent years some people have started to take it seriously.

The advanced tool which we shall use to calculate the probability of certain combinations of numbers coming up in Chuck-a-Luck is the Binomial Probability Function. It takes three arguments and I shall use the notation B (n, m, p). It gives the probability that in n trials an event having the probability p of occurring in a single trial will occur exactly m times.

An essential condition for using this function is that p does not change, that is, that if the event occurs one or more times it does not alter the probability that it will occur again. This is the case when we roll dice, since the dice have no memory, but it may not be the case when we are dealing with animals or people. We can calculate the probabilities in the case of Chuck-a-Luck merely by giving the Binomial Probability Function specific arguments and finding its value. How do we find its value? There are tables of values in the back of most textbooks on Probability and Statistics, and more extensive tables have been published in book form. The trouble is that values for the arguments we are using may not have been tabulated. I have at hand a book which has a table for values of p of 0.1, 0.2, 0.25, 0.3, 1/3, 0.4, and 0.5, but not for p = 1/6. At best this table would be good for a rough check on our calculations, since 1/6 (about 0.167) lies between the tabulated values 0.1 and 0.2. If we anticipated doing a lot of work with dice we might consider creating a table for p = 1/6. Such work usually is done by computers, and the computer which we shall put to the task is the Commodore.

For the moment, let's assume that we have a table of the Binomial Probability Function for p = 1/6 and apply it to the Chuck-a-Luck problem. First note that rolling three dice once is essentially the same as rolling one die three times. We want to know the probability that a particular number will come up exactly once. This is given by B (3, 1, 1/6) and the value in the table is 0.3472.

Since the table we need for the Chuck-a-Luck problem is quite short, I'll give it in full:

B (3, 0, 1/6) = 0.5787
B (3, 1, 1/6) = 0.3472
B (3, 2, 1/6) = 0.0694
B (3, 3, 1/6) = 0.0046
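
These values can be checked, at least roughly, without any advanced machinery. The chosen number fails to appear only when all three dice show something else, and a single appearance can fall on any one of the three dice:

B (3, 0, 1/6) = (5/6) x (5/6) x (5/6) = 125/216 = 0.5787
B (3, 1, 1/6) = 3 x (1/6) x (5/6) x (5/6) = 75/216 = 0.3472

The remaining two entries can be worked out in the same spirit and agree with the table.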

By using the Binomial Probability Function and either looking up the values in a table or invoking a program to compute it we can analyze the game of Chuck-a-Luck much more easily than before. All we must do is make certain that the conditions for applying the BPF are met. Suppose we have no table for the BPF suitable for our problem. We can do one of three things:

  1. We can calculate the values by hand.
  2. We can write a program in Commodore BASIC to compute individual values.
  3. We can write a program in Commodore BASIC to compute and print out a complete table.

Before we can write a program we have to know how to calculate the BPF by hand. It is defined in terms of another function, the combinatorial function. Fortunately, this function is on the keyboard of many scientific hand calculators. It takes two arguments and on the keyboard of my Casio it is represented by nCr. It has a use of its own: it represents the number of ways it is possible to select a group of r members from a population of n members. If you have such a calculator, try using it on the following problem:

PROBLEM: In RVUG, in how many ways is it possible to elect 5 officers from a membership of 25?

This is the same as asking how many distinct groups of 5 people you can make up out of a total of 25 people and the answer is given by the Combinatorial Function.

If your calculator is like mine, enter 25, press the nCr key, enter 5 and then press "=." The answer is 53,130. I'll bet this is more than you thought, but remember that two panels are distinct even if they differ by only one member.

In the above calculation we have not taken account of which title each officer has. If we want to distinguish panels on the basis not only of which persons are involved but also of which title each one holds, we would use the nPr key instead of the nCr key, and the answer would be 6,375,600. I'll bet you never knew you had so many options. At the next election, speak up!
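
For readers whose calculators lack these keys, both answers can be checked by hand:

C (25, 5) = (25 x 24 x 23 x 22 x 21) / (1 x 2 x 3 x 4 x 5) = 6,375,600 / 120 = 53,130

and the count which distinguishes titles is simply the numerator, 25 x 24 x 23 x 22 x 21 = 6,375,600.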

Assuming that the nCr function is available, the BPF may be calculated simply as

B (n, m, p) = (nCm)(p^m)(1-p)^(n-m)

where "^" denotes exponentiation. This function is not available on the Commodore and it will be necessary to compute it. The computation looks easy in principle, but there are subtle complications if one wants computation to be fast and to allow for large values of n and m. How the function nCr can be computed is easily illustrated by a few examples. For convenience I'll use the notation C (n, r) instead. This is the kind of notation you would use in programming such a function.

C (5, 3) = 5 x 4 x 3 / (1 x 2 x 3)
C (7, 4) = (7 x 6 x 5 x 4 ) / (1 x 2 x 3 x 4)

The function can be calculated by forming a fraction, the numerator of which is a product of consecutive integers starting with n and working downward, and the denominator of which is a product of consecutive integers starting with 1 and working upward. Both the numerator and the denominator have r terms. If you were computing a table you might want to calculate C in just this way. You would be computing a sequence of values and could calculate the next value in the sequence by multiplying and dividing the previous value by simple integers.
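
As a rough sketch of this method in Commodore BASIC (the variable names N, R, and C are simply my own choices), a short program along these lines should reproduce the two examples above:

10 REM COMPUTE C(N,R) BY THE RUNNING PRODUCT METHOD
20 INPUT "N, R"; N, R
30 C = 1
40 FOR I = 1 TO R
50 C = C * (N - I + 1) / I : REM MULTIPLY DOWN FROM N, DIVIDE UP FROM 1
60 NEXT I
70 PRINT "C("; N; ","; R; ") ="; C

Asked for 5 and 3 it should print 10, and asked for 7 and 4 it should print 35. With C (3, 1) = 3 in hand, the formula given earlier yields B (3, 1, 1/6) = 3 x (1/6) x (5/6)^2 = 0.3472, in agreement with the table.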

If you were planning on doing a considerable amount of computation which needed only arbitrary values of C (n, r) you might consider preparing a table of values in advance. This would take a few seconds initially, but each subsequent computation involving C (n, r) would be much faster. I don't recommend preparing a table of C (n, r) as such, though, as such a table would require a two dimensional array for storage and would consume a lot of memory. Instead I recommend preparing a one dimensional table of the factorial function. C (n, r) can be computed from the factorial function with three table lookups, one multiplication and one division, which is a good compromise between efficiency of computation and conservation of memory.
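
A sketch of that arrangement in Commodore BASIC might look like the following; the array name F and the upper limit of 25 are my own choices, the limit being a nod to the overflow problem discussed in the next paragraph:

10 REM BUILD A TABLE OF FACTORIALS 0! THROUGH 25!
20 DIM F(25)
30 F(0) = 1
40 FOR I = 1 TO 25
50 F(I) = F(I-1) * I
60 NEXT I
70 REM THREE LOOKUPS, ONE MULTIPLICATION, ONE DIVISION
80 INPUT "N, R"; N, R
90 PRINT "C("; N; ","; R; ") ="; F(N) / (F(R) * F(N-R))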

The factorial function of n is usually written as n! and is calculated as 1 x 2 x 3 x ... x n. The combinatorial function can be calculated as C (n, r) = n! / [r! (n-r)!]. It is a simple matter to write code to generate n! for n from 1 up to about 25 and store the values in consecutive elements of an array. If you go much higher than this, though, you will encounter numerical overflow. In order to increase the range of arguments you can use, I recommend instead constructing a table of logarithms of factorials. That way you can construct a table from log 1! all the way up to log 100! without fear of overflow. Computing C (n, r) would then involve three table lookups, two subtractions, and finally an exponential operation. For values of log n! where n > 100 you can use Stirling's Formula, which is accurate for large values of n but not for small ones. I won't go into Stirling's Formula at this time. It is complicated, but the Commodore can handle it.
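
Here is a similarly rough sketch of the log-factorial scheme; LF is my own name for the table, and the Commodore's LOG and EXP functions work in natural logarithms, which is all that is needed here:

10 REM TABLE OF LOG FACTORIALS, LOG 0! THROUGH LOG 100!
20 DIM LF(100)
30 LF(0) = 0
40 FOR I = 1 TO 100
50 LF(I) = LF(I-1) + LOG(I)
60 NEXT I
70 REM THREE LOOKUPS, TWO SUBTRACTIONS, ONE EXPONENTIAL
80 INPUT "N, R"; N, R
90 C = EXP(LF(N) - LF(R) - LF(N-R))
100 PRINT "C("; N; ","; R; ") ="; INT(C + .5) : REM ROUND TO NEAREST WHOLE NUMBER

The rounding in line 100 makes up for the small error introduced by taking logarithms and exponentiating; asked for 25 and 5 the program should once again print 53130. For very large arguments the answer is, of course, only as accurate as the Commodore's floating point arithmetic allows.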

The assignment for next time is to write a program in Commodore BASIC for computing the Binomial Probability Function. It doesn't have to be as sophisticated as I have outlined - just so that it works.

A collection of values of B (n, m, p) from m = 0 to m = n is called a Binomial Probability Distribution. It is the fundamental probability distribution, as all others either are derived from it or are related to it.
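
Since a Binomial Probability Distribution accounts for every possible number of occurrences, its values must add up to 1. Our little table for p = 1/6 passes this test: 0.5787 + 0.3472 + 0.0694 + 0.0046 = 0.9999, which is 1 except for rounding in the last decimal place.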

All necessary calculations are within the capabilities of the Commodore. From time to time the Atlantic City casinos offer promotions at which the player actually can make money. It is possible to prove this mathematically, and a library of probability software can help us calculate how best to take advantage of them.
