
 

Computers and Creativity
Part 3: The Black Box Problem, or
Too Much Learning Can Be Dangerous

by Richard P. Ten Dyke

 

In this series of articles we use simple computer problems to illustrate concepts of the creative process. In Part 1 we started with optimization, the process of making continual improvements to achieve a result that is the best we can do. Then in Part 2, looking for a magic square, we introduced the concept of the local optimum: a solution that is better than its nearest neighbors but not the best overall. We used random behavior as a means of making changes and of breaking out of traps, and in doing so we discovered an interesting paradox: it is possible to use disorder to lead to order. Now, in Part 3, we will explore a new kind of problem, called the “Black Box.”

A Black Box is like an unopened birthday present. We ask the question “what’s inside?” We look at it (how big is it?), we pick it up (how heavy is it?), and we shake it (does it have parts?). Then we run down the list of possibilities ... a book? ... a necktie? ... an electric train?

The airplane flight recorder, a device carried aboard aircraft to record the events that precede a crash, is called a “Black Box.” It gets its name from a device of the same name that is familiar to first-year students of electrical engineering. There, the infamous Black Box housed an electrical circuit. It was, literally, a sealed black wooden box with two connections for input, two connections for output, and a circuit inside consisting of some combination of a resistor, a coil, and a capacitor. The student was told to apply a series of electrical signals to the input terminals, measure the signals that emerged as output, and, with that information, deduce and explain the circuit inside the box.

The Black Box illustrates a basic situation in research. We know how a system behaves, but we are curious to know what is causing it to behave that way. More important, we want to predict how it will behave in the future.

Our characterization of the Black Box problem is quite simple. Instead of an electrical circuit, we will look at a process that is the result of a mathematical equation. We will use a problem that is found in intelligence tests. Consider the sequence of numbers below: can you fill in the blank?

-1, 1, 0, 1, 1, ___?

We have an unknown “system” that is creating a series of numbers. We look at the numbers that have been created, and try to guess the next number. This sequence is produced, we think, by a mathematical equation. But what equation? Can we program a computer to find the answer? There might be more than one equation that will reproduce the series, but we will be looking for just one — a simple one.

To solve it, we will use the principles established in the previous two parts. First, we need a measure of success. Second, we need to find trial solutions that we can modify. Third, we need to test the trial solutions against the goal. We will again use randomness as a tool.

The equation we seek will use the basic mathematical operations: addition, subtraction, multiplication, division, and exponentiation. We will also need some arguments (operands) for the operators to work with. These will be selected, at random, from the numbers in the series, the index value of each number in the series, and a few constants, such as the integers from 0 to 5.

We need some trial equations to choose from, so we use combinations of operators and operands assembled by a “random algorithm generator.” For each trial we see if an equation can predict the fourth number in the series based on the first, second, and third. If we have a success, we then apply the same randomly created equation to see if it will also predict the fifth number in the series based on the second, third, and fourth. If we get a match on the second try, we assume success and calculate the sixth value based on the values for the third, fourth, and fifth.
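
The original program is not shown in this article, but a minimal sketch of the idea in Python might look like the following. The names are mine, the operator set is trimmed to addition, subtraction, and multiplication, and operands are drawn only from previous terms and small constants:

    import random

    SERIES = [-1, 1, 0, 1, 1]      # the known terms; we want the sixth
    OPS = ['+', '-', '*']          # a reduced operator set for the sketch

    def random_operand():
        """An operand is either one of the three previous terms
        or a small integer constant, chosen at random."""
        if random.random() < 0.7:
            return ('back', random.randint(1, 3))   # n terms back
        return ('const', random.randint(0, 5))

    def random_equation():
        return (random.choice(OPS), random_operand(), random_operand())

    def operand_value(operand, series, i):
        kind, n = operand
        return series[i - n] if kind == 'back' else n

    def predict(eq, series, i):
        op, a, b = eq
        x = operand_value(a, series, i)
        y = operand_value(b, series, i)
        return {'+': x + y, '-': x - y, '*': x * y}[op]

    def search(series, trials=100_000):
        """Generate random equations until one predicts the fourth
        term from its predecessors and then the fifth term as well."""
        for _ in range(trials):
            eq = random_equation()
            if (predict(eq, series, 3) == series[3] and
                    predict(eq, series, 4) == series[4]):
                return eq
        return None

    eq = search(SERIES)
    if eq:
        print("equation:", eq, "-> sixth term:", predict(eq, SERIES, 5))

A run typically turns up the add-the-two-previous-terms rule within a modest number of trials, though other equations can pass the same two-step test.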

It actually works. In this particular case, after a large number of trials, the computer finds that the fourth number in the series is the sum of the second and third. Shifting to the right, it finds that the fifth number is the sum of the third and fourth. Therefore, the sixth number is predicted to be the sum of the fourth and fifth, which is 2 (1 plus 1), so that is our solution. Granted, other algorithms may meet the criteria and come up with the same or even a different result, but our simple answer works and is therefore an acceptable solution to the problem.

The same method will solve almost any series that can be expressed in terms of simple mathematical relationships between the values. In each case, the computer will search through thousands of randomly generated equations until it finds one that works for that particular series.

Playing around with this toy problem many years ago, I came upon an interesting and unexpected insight. I added a feature called “learning.” The program had to discard many proposed solutions before it found one that worked. So, when the computer finally achieved a solution to a series, I had it adjust a table of probabilities so that the key characteristics of that solution would be more likely to be selected the next time, increasing the chance that a trial solution would succeed. I then asked the computer to solve the same sequence several times in succession to see whether it could solve subsequent trials more quickly.
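
The article does not say what the table of probabilities contained. One plausible reading, assuming a single weight per operator, is a weighted random choice whose weights are bumped after every success. A sketch:

    import random

    # Hypothetical learning table: one weight per operator.
    weights = {'+': 1.0, '-': 1.0, '*': 1.0}

    def pick_operator():
        ops = list(weights)
        return random.choices(ops, [weights[op] for op in ops])[0]

    def reinforce(op, bump=0.5):
        """After a solution is found, make its operator more
        likely to be picked in future trials."""
        weights[op] += bump

Each time the Fibonacci-like series is solved, reinforcing '+' raises the weight on addition, and trial equations that use it turn up sooner on the next run.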

When the experiment was being run — many years ago — it was necessary to submit jobs to the computer center and wait several hours for the result, so I decided to batch up some experiments. I took two distinct series and asked the computer to solve the first one about 100 times, then do the same with the second. I expected to get a curve showing the value of learning in each case. But I got an unexpected result. The computer would solve the first series with ease, and subsequent trials with the same series proved faster. But the computer would hang on the second. I did not understand this, since the computer had been able to solve both series on a previous occasion. So I switched the order of the two and tried again. Now the computer was able to solve the first one — the one it had not solved before — but was unable to solve the second. Then the light bulb went on. The reason was obvious: the process of learning had done me in. After solving one series so many times in succession, the computer’s ability to come up with randomized solutions had vanished. The computer had “learned” to the point where it could solve only one problem, whichever series came first, but had lost the “imagination” to recognize a new situation when it occurred.
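
With the weighted-choice sketch above, the failure is easy to reproduce. Reinforce one operator a hundred times and the others all but vanish from the draw:

    weights = {'+': 1.0, '-': 1.0, '*': 1.0}
    for _ in range(100):          # solve the first series 100 times
        weights['+'] += 0.5       # every success reinforces addition

    total = sum(weights.values())
    print({op: round(w / total, 3) for op, w in weights.items()})
    # -> {'+': 0.962, '-': 0.019, '*': 0.019}
    # If the table also covers operand choices the odds compound, and
    # an equation that needs '-' or '*' is almost never generated.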

How human! I had discovered the principle of overlearning.

I easily overcame the difficulty by asking the computer to solve a variety of series in mixed-up order, so that it never got the chance to overlearn.

I have thought about this experiment for many years. It illustrates for me the value of a person having a variety of experiences early in life, so that paths to new ways of thinking remain available later on. In the interesting-coincidence department, a recent Wall Street Journal contained a letter by Jim Pandjiris concerning the problems of today’s railroad industry. Quoting from the letter: “Many senior officials had worked only in the railroad industry, and many operating practices were based on custom (‘we always do it that way’).” Overlearning, it seems, can be a problem in the corporate world as well.

So we ask: is this just a toy problem, one that does not scale to larger and more significant situations? Yes and no. Similar methods are used today to find relationships for setting company policies and practices.

For example, a teller in a bank follows policies to determine whether the bank will cash a customer’s check. Does the person have an account at the bank? Is the check drawn on this branch or another branch? Does the issuer of the check have an account at the bank? Computer programs use a history of outcomes to develop predictive algorithms based on real experiences in cashing checks. Bank policies are then established to reduce the likelihood that bad checks will be accepted in the future. Similarly, credit card companies use predictive algorithms to try to guess when a credit card is being used fraudulently.
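
As a hypothetical illustration (the history and the 20 percent threshold below are invented), such a policy could be read off a frequency table of past outcomes:

    from collections import defaultdict

    # Invented history: (has_account, same_branch, issuer_has_account)
    history = [
        ((True,  True,  True),  'good'),
        ((True,  True,  False), 'good'),
        ((True,  False, True),  'good'),
        ((False, True,  False), 'bad'),
        ((False, False, False), 'bad'),
        ((False, False, False), 'bad'),
    ]

    counts = defaultdict(lambda: {'good': 0, 'bad': 0})
    for features, outcome in history:
        counts[features][outcome] += 1

    def bad_rate(features):
        c = counts[features]
        seen = c['good'] + c['bad']
        return c['bad'] / seen if seen else None   # None: no history yet

    # Policy derived from experience: refuse to cash a check when the
    # observed bad-check rate for this combination exceeds 20 percent.
    def cash_check(features):
        rate = bad_rate(features)
        return rate is not None and rate <= 0.2

    print(cash_check((True, True, False)))    # True  -> cash it
    print(cash_check((False, False, False)))  # False -> refuse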

One technique that has been applied to these kinds of problems is the “neural network,” which uses mathematical functions that can be assembled and manipulated to represent complex logic and relationships. It is a form of reverse engineering. I was once given a quantity of market research data in which several hundred respondents were asked to select and rank a variety of proposed products. The products were similar in function but differed in several key characteristics. The client wanted to know which characteristics were the most important to ensure the success of the product. Using a neural network technique, we created a model of the decision process using these characteristics, and then “tuned” the model to reproduce the results of the market research as closely as possible. We could then disassemble the model to find out which of the product characteristics were the most important to the potential customer.
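
The article gives no details of that model, but the idea can be sketched with a single linear layer (the simplest possible network) tuned by gradient descent on invented data; the sizes of the learned weights then stand in for the importance of each characteristic:

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented stand-in for the survey: 200 respondents score products
    # described by four characteristics (say price, size, speed, style).
    X = rng.random((200, 4))
    hidden = np.array([3.0, 0.2, 1.5, 0.1])    # what really drives choices
    y = X @ hidden + rng.normal(0, 0.1, 200)   # observed preference scores

    # "Tune" the model: gradient descent on the squared error until the
    # layer reproduces the survey scores as closely as possible.
    w = np.zeros(4)
    for _ in range(2000):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= 0.1 * grad

    # "Disassemble" the model: the largest weights mark the
    # characteristics that mattered most to the respondents.
    print(np.round(w, 2))    # close to [3.0, 0.2, 1.5, 0.1]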

The results were then compared to a parallel study in which the same respondents were asked directly which characteristics were the most important.

The results were surprising. When faced with an actual decision, the model showed, the respondents used entirely different criteria to make their selections than they claimed to use in the parallel study. It is as if the respondents did not really know how they made their decisions. When asked “what characteristics are important to you?” they provided answers that they thought were reasonable or perhaps “expected,” rather than those they actually used when faced with a real decision. A warning to market researchers.

But using these techniques sometimes requires a leap of faith rather than logic. Models may be created to fit the results, without knowing whether the results are caused by a similar model. One can only say that the model would have worked in the past, and perhaps it will continue to work in the future. The amount of risk inherent in this approach depends on the situation, so the technique should be used with great care, for it is possible to create a model that can explain history but has no predictive capabilities whatsoever. Nevertheless, it remains a useful tool in those cases where cause and effect relationships do exist.
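
That danger is easy to demonstrate. The data below are pure noise by construction, yet a sufficiently flexible model explains them perfectly while predicting nothing:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 10)
    y = rng.random(10)             # pure noise: there is nothing to learn

    # A degree-9 polynomial passes through all ten points exactly...
    coef = np.polyfit(x, y, 9)
    print(abs(np.polyval(coef, x) - y).max())   # effectively zero

    # ...but ask it about a point it has not seen and the answer is wild.
    print(np.polyval(coef, 1.1))   # typically far outside the 0-to-1 range

A perfect fit to history, in other words, says nothing by itself about predictive power.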

On a much broader scale, as we approach the 100th anniversary of Einstein’s 1905 paper on relativity, physicists are still searching for a theory that will unite Relativity and Quantum Mechanics and explain the forces in the universe, including gravitation and magnetism, and the sources of energy. One way to describe the problem physicists are attacking is as a complex Black Box problem: we know the inputs and the outputs, and physicists are looking for the mathematical relationships that will explain a vast quantity of these well-documented results.

Next month: Playing Poker.


Richard Ten Dyke, a member of the Danbury Area Computer Society, has previously contributed to this newsletter on the topic of Digital Photography. He is retired from IBM and can be reached at tendyke@bedfordny.com. All opinions are his own, and he welcomes comments.

© 2004 Richard P. Ten Dyke

