Wed Nov 03 03:31:36 1999
exception thrown #3
Patterns and chaos and primes, Oh No!
This article was originally started as a letter to Sam, Karen and Efrem. It was written in response to Sam's stock recommendation for Sun Microsystems, which he based on Sun's participation in the coming field of nanotechnology.
Dear Sam,
Thanks! I'm always looking for some kind of long term investment...
I just picked up an interesting book you might like to look at - it's called Feynman and Computation: Exploring the Limits of Computers, edited by Anthony Hey, 1999, Perseus. It was apparently intended as a kind of companion to Feynman Lectures on Computation, 1996. But it's quite a bit more than that. The book contains up-to-date articles by numerous people who had appeared as guest lecturers for Dr. Richard Feynman's well-known class on computation at Caltech (Minsky, John Archibald Wheeler, and Gerald Sussman, for example).
Both books nearly fell off the shelf at a Borders bookstore here, so I felt I had to pick at least one of them up. Right now I'm reading a section by Dr. Richard Hughes of the Los Alamos National Laboratory on quantum computers. The part of his section that I found very interesting deals with the RSA problem and prime numbers, and Shor's factoring algorithm - the latter being the first known application of the quantum computer.
At the end of the RSA section, Hughes briefly mentions a problem similar to RSA called the discrete logarithm problem - also addressed by a Shor algorithm - and states "this problem, like factoring, is computationally intractable, and is also widely used in public key cryptography."
Ah, the dazzling Zen of problem solving... it's Douglas Hofstadter in reverse! Intractability is no drag at all. If you can view problems as opportunities, then maybe the bigger the problem, the bigger the opportunity. So there may be no need at all for frustration about incompleteness. If a problem is NP-complete, it could still have a highly rewarding and completely suitable application - such as, maybe, cryptography. In the physicist's view, if you have to protect a piece of information for N years, use an algorithm that takes N+ years to break.
One might say, perhaps, that the physicist's tendency toward practicality (however optimistic at times) can succeed in applying what mathematics finds to be a limitation. In other words, if you can't knock over the windmill with your spear, 1) use that fact to construct a windmill that can't be easily knocked over, or 2) try creating a Quantum Spear instead.
One of my favorite NP problems - so far I haven't seen any encryption based on it, but I'm sure it's coming - is called Ramsey Theory -- more on this further down. Ramsey Theory concerns ordered patterns within chaos and became a pet puzzle of Paul Erdös, a mathematician who inspired many others.
Feynman has inspired many others as well. A parallel theme in the book is Feynman's fascination with making things very, very small, as in his 1959 lecture There's Plenty of Room at the Bottom: creating tiny motors and writing the Encyclopaedia Britannica on the head of a pin. Not only did this inspire scientists and engineers to actually do such things, but the lecture, poised on the edge of the transistor age, foreshadows the development of large-scale integration and microprocessors, just as today we are poised on the edge of nanotechnology.
Like Sherlock Holmes, I have to believe that nothing is actually an accident. Since nothing was an accident, we don't really need it in the number system, eh? Yet it is the other end, the problems of the very, very large, which is the hobgoblin of primes and factoring.
"When all other possibilities have been eliminated, what remains, no matter how improbable, must become a Java library" -- Sherlock Gosling.
Michael Swaine of Dr. Dobb's Journal also has a column this month called Magic (Programming Paradigms), discussing Shor's factoring algorithm and the quantum computer. Swaine claims that his broad-brush article and explanation of factoring isn't all that complete or useful for constructing Shor's algorithm. However, Hughes' explanation is - or at least it looked interesting enough to make me want to code an algorithm from it. Probably more importantly, he refers to the General Number Field Sieve - an algorithm that is fairly representative of current research on primes and factoring. The latest thinking on "non-quantum" algorithms, at least, is to use an elliptic curve function. For more information on primes and factoring, see the Quick guide to "Prime real-estate" on the web.
So to begin, I opened up a Visual FoxPro window and started typing. I got through the phi formula: phi(N) = the number of "relative primes" of N. A "relative prime" to a number N is a number less than N which is not a factor of N or a multiple of a factor of N. More formally, a relative prime is one whose greatest common divisor (GCD) with N is equal to 1. This is actually not too hard to calculate - as long as N is fairly small - less than 64 bits.
So for the number 6, for example, the integers 2, 3, and 4 are all either factors of 6 or multiples of factors - so only 1 and 5 are relative primes. Since there are two of them, phi(6) = 2.
I won't go over the program yet - it's complicated - but suffice it to say that, once written, you can use phi() to easily determine whether a number is prime. If N is a prime, then all numbers less than N are relative primes, so phi(N) = N - 1. So pick any number and calculate phi of it: if phi is 1 less than the number, the number is prime.
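For the curious, here's a minimal brute-force sketch in Java - not the FoxPro program described here, just a straight count of relative primes using Euclid's GCD - that reproduces the numbers that come up below:

public class Totient {
    // Euclid's algorithm: greatest common divisor of a and b.
    static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    // Brute-force phi(N): count the integers 1..N-1 whose GCD with N is 1,
    // i.e. the "relative primes" of N.  Fine for small N, hopeless for big N.
    static long phi(long n) {
        long count = 0;
        for (long k = 1; k < n; k++)
            if (gcd(k, n) == 1) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(phi(6));              // 2 -- only 1 and 5
        System.out.println(phi(31523));          // 30408
        // Primality test via phi: N is prime exactly when phi(N) == N - 1.
        System.out.println(phi(1087) == 1086);   // true, 1087 is prime
    }
}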
Of course, along the way in calculating phi, the program has to find all the factors of N. And if N is not a prime, calculating phi(N) can take either a lot of space or a long time. For non-primes, phi() produces some interesting results. If I type:
? phi(31523)
and go away from the computer for about 6 seconds, it comes back with:
30408
The number 30408 looks kind of suspicious. It's fairly close to 31523 - it's only 1115 away. We can draw three conclusions.
First, there must not be many factors of 31523 - probably just 2.
Second, one factor of 31523 must account for almost all of that gap - that is, it must go into 31523 about 1100 times.
Third, the other factor must then be close to 1100.
If you divide 31523 by 1115, you get 28.271749... So a good guess might be that one factor of 31523 is 29.
And it is. The other happens to be 1087. And both are primes. When you get a phi(N) which is fairly close to N, it's probably a product of two primes. But the next equation definitely threw me.
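As an aside, for a product of exactly two primes the guessing can be skipped entirely: phi(pq) = (p - 1)(q - 1), so p + q = N - phi(N) + 1, and p and q are just the two roots of a quadratic. A small sketch of that shortcut (only good while N fits in a 64-bit long):

public class TwoPrimeFactors {
    // If N = p * q with p and q prime, then phi(N) = (p - 1)(q - 1),
    // so p + q = N - phi(N) + 1 and p, q are the roots of
    //     x^2 - (p + q)x + N = 0.
    public static void main(String[] args) {
        long n = 31523, phi = 30408;                   // the values from above
        long sum = n - phi + 1;                        // p + q = 1116
        long disc = sum * sum - 4 * n;                 // (p - q)^2
        long root = (long) Math.sqrt((double) disc);   // 1058 -- exact here, since N is small
        long p = (sum - root) / 2;                     // 29
        long q = (sum + root) / 2;                     // 1087
        System.out.println(p + " * " + q + " = " + p * q);   // 29 * 1087 = 31523
    }
}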
You choose two primes and a public exponent (e), and the formula calls for calculating the decrypting exponent using modular arithmetic. The formula is easy to type into Basic or Visual FoxPro:
d = mod(e^(phi(phi(N))-1), phi(N))
(^ is the exponentiation operator and mod is the modular arithmetic function).
But making it work is not so easy. Let's say you pick 13 and 17 as primes - N = 13 * 17 = 221.
phi(N) = 192
But now you need to pick a fairly low number as the exponent, because otherwise you run into the 64-bit problem very quickly. It has to be a relative prime to 192 - so suppose I pick 5. When I type in
? mod( 5 ^(phi(phi(N))-1), phi(N))
What I get back is:
0.0000
This does not look promising. Even with 5, some data was lost or truncated. It's the 64-bit problem.
Not all languages or computer hardware are adept at large numbers - numbers greater than 64 bits, or about 20 decimal digits. In fact, many computer architectures - excluding a Cray, of course - have integers that go up to about 64 bits, and then it's double-precision time - meaning part of the number is used as an exponent and the rest of the resolution is truncated. But truncated numbers and modular arithmetic do not mix. To do modulus, you need integers. To get big integers and exponents, you need crunch power.
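The failure above is easy to reproduce in Java with plain doubles - 5^63 needs about 147 bits, a double keeps only 53, and the bits that get rounded away are exactly the ones the modulus depends on:

public class SixtyFourBitProblem {
    public static void main(String[] args) {
        // 5^63 is roughly 1.08e44 -- about 147 bits of integer.  A double keeps
        // only 53 bits of mantissa, so the low-order bits are rounded away, and
        // those are exactly the bits a modulus operation depends on.
        double big = Math.pow(5, 63);
        System.out.println(big);         // an approximation of 5^63
        System.out.println(big % 192);   // meaningless -- the true remainder is 77
    }
}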
The Univac 1100, which had a 36-bit word, could do integer arithmetic up to 72 bits. The CDC 6000-7000 series could handle double-word integers of 120 bits. To do more than that, you need to create a specialized non-native program or library. Enter Java - the final coding place - the eventual destination of all specialized program libraries.
I decided to take a look at Java's BigInteger library. According to the documentation, the BigInteger library can handle "arbitrary-precision integers" with an "infinite word size."
Not bad for a start. Not only that, Java's BigInteger happens to be designed especially for problems involving prime numbers, prime generation, and GCD calculations. For instance, it has a method called "isProbablePrime()". I haven't found a use for that one yet.
But it also has a method called modPow() that will do most of the modular-exponentiation formula above. How could it get better than this?
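Here's the 13-and-17 example again as a short Java sketch, with BigInteger doing the crunching - modPow() for the formula above, plus modInverse(), which gets the same answer more directly:

import java.math.BigInteger;

public class RsaToy {
    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(13), q = BigInteger.valueOf(17);
        BigInteger n = p.multiply(q);                              // 221
        BigInteger phiN = p.subtract(BigInteger.ONE)
                           .multiply(q.subtract(BigInteger.ONE));  // 192
        BigInteger e = BigInteger.valueOf(5);                      // public exponent

        // The formula above: d = e^(phi(phi(N)) - 1) mod phi(N).
        // phi(192) = 64, so d = 5^63 mod 192.
        BigInteger phiPhiN = BigInteger.valueOf(64);
        BigInteger d = e.modPow(phiPhiN.subtract(BigInteger.ONE), phiN);
        System.out.println(d);                   // 77, since 5 * 77 = 385 = 2 * 192 + 1

        // BigInteger can also compute the inverse directly:
        System.out.println(e.modInverse(phiN));  // 77 again

        // Round trip: encrypt the message 42, then decrypt it.
        BigInteger m = BigInteger.valueOf(42);
        BigInteger c = m.modPow(e, n);
        System.out.println(c.modPow(d, n));      // 42
    }
}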
Well, I think it could get better if you didn't have to do so much crunching. You need more uproar. More fun. Essentially, you can do this with less crunch and more chaos.
They say whatever problems a psychotherapist is working on will appear in their patients. Maybe the same can be said for mathematicians as well. Paul Erdös spent so much time travelling and wrote so many papers with so many people that - as you are probably aware - many people now publish their "Erdös number" on their web site.
The Erdös number means how many co-authors you are away from Paul Erdös. He's kind of the Kevin Bacon of math. Apparently, according to "the Oracle of Bacon," Brett Tjaden's mathematical observation of "the six degrees of Kevin Bacon" exists because of Erdös -- or at least because of Erdös numbers. I guess it's a chicken and egghead question.
But how very appropriate that Ramsey Theory should have been uncovered and organized by Erdös. According to Cheryl Mootz's paper The Mathematician Who Never Died, Esther Klein introduced Erdös and George Szekeres to the original problem that became Ramsey Theory. As Erdös construes Ramsey Theory -- in terms almost anyone can relate to -- it's about parties.
It asks how many guests I have to invite to guarantee that either N guests all know each other - or M guests are total strangers who don't know each other. N and M can be, and often are, equal. And they're usually very small numbers like 3 or 4. In fact, if they're bigger than about 5, you can't solve the problem at all. (Maybe here's another application for a quantum computer.) But the good news is that you don't need big integers or exponents to look at Ramsey problems.
Let's say you want to know how many guests to invite so that at least two people know each other - or don't know each other. This you can do in your head. Let's call the first person A and the second B, and so on. We assume that there aren't any celebrities and there aren't any therapists (who are pretending not to know one of their patients). If A doesn't know B, then B doesn't know A.
A is at the party and B arrives. You ask "Hey A, do you know B?" He says "No." You're done. You now have two people who don't know each other.
If you want to know the answer for 3 people, you can actually figure this out on a piece of paper drawing circles and lines. Suppose they go around in a circle for a while: A knows B. B knows A and C - but A doesn't know C. C knows D and B. D doesn't know B. But D and A know each other. After a while, when you add the next guest, three of the guests will have to not know each other or else all know each other.
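That three-guest claim is small enough to check exhaustively by machine: with six guests you can never avoid three mutual acquaintances or three mutual strangers (the Ramsey number R(3,3) = 6), while with five guests you still can. A brute-force sketch:

public class PartyCheck {
    // Each pair of guests either "knows" each other (bit = 1) or doesn't (bit = 0).
    // A coloring packs all n*(n-1)/2 pair-bits into one int.

    // Bit index of the pair {i, j} (i < j) among the n*(n-1)/2 pairs.
    static int pairIndex(int i, int j, int n) {
        return i * n - i * (i + 1) / 2 + (j - i - 1);
    }

    // Does some trio of guests all know each other, or all not know each other?
    static boolean hasTrio(int n, int coloring) {
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n; b++)
                for (int c = b + 1; c < n; c++) {
                    int ab = (coloring >> pairIndex(a, b, n)) & 1;
                    int ac = (coloring >> pairIndex(a, c, n)) & 1;
                    int bc = (coloring >> pairIndex(b, c, n)) & 1;
                    if (ab == ac && ac == bc) return true;
                }
        return false;
    }

    public static void main(String[] args) {
        // 6 guests: 15 pairs, 2^15 = 32768 possible parties.  Every single one
        // contains such a trio, so 6 guests always suffice.
        boolean sixAlwaysWorks = true;
        for (int c = 0; c < (1 << 15); c++)
            if (!hasTrio(6, c)) sixAlwaysWorks = false;
        System.out.println(sixAlwaysWorks);    // true

        // 5 guests: some arrangement avoids both kinds of trio, so 5 is not enough.
        boolean fiveCanFail = false;
        for (int c = 0; c < (1 << 10); c++)
            if (!hasTrio(5, c)) fiveCanFail = true;
        System.out.println(fiveCanFail);       // true
    }
}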
See if you can figure out how to add in the next guest E. The problem for 4 guests takes 18 total guests. To demonstrate in a practical way, first let's get rid of the idea of "knows" and "doesn't know" and substitute "neighborhoods" instead. Let's assume that all the characters in an alphabet come from particular neighborhoods. The neighborhoods do not overlap. Let's say A, B, C, and D are in one neighborhood, E, F, G, and H are in another, and so on with I, J, K, L, etc.
Neighborhood:  1     2     3     4     5     6     7     8     9
Members:       ABCD  EFGH  IJKL  MNOP  QRST  UVWX  YZ12  3456  7890
Now, we start inviting guests. We'll invite 18 guests: F, A, 4, M, L, B, 1, X, Q, 5, Z, E, V, J, P, R, W, and A.
Yes, we can invite A more than once, and of course A and A are in the same neighborhood. Now, let's group the guests by neighborhood:
1: ABA
2: EF
3: JL
4: MP
5: QR
6: VWX
7: Z1
8: 45
Looking over our guests, we see that there are three from each of neighborhoods 1 and 6, but there are no neighborhoods with four or more representatives. According to Ramsey theory, this means there must be at least four guests from different neighborhoods - and there are.
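A few lines of Java confirm the head count (this is just bookkeeping on the table above, not part of the scheme itself):

public class GuestCount {
    static final String[] HOODS = {
        "ABCD", "EFGH", "IJKL", "MNOP", "QRST", "UVWX", "YZ12", "3456", "7890"
    };

    public static void main(String[] args) {
        String party = "FA4MLB1XQ5ZEVJPRWA";   // the 18 guests invited above
        int[] counts = new int[HOODS.length];
        for (char guest : party.toCharArray())
            for (int h = 0; h < HOODS.length; h++)
                if (HOODS[h].indexOf(guest) >= 0) counts[h]++;
        for (int h = 0; h < HOODS.length; h++)
            System.out.println("Neighborhood " + (h + 1) + ": " + counts[h] + " guests");
        // Neighborhoods 1 and 6 have three guests each, none has four,
        // and eight different neighborhoods are represented.
    }
}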
Now find the fourth guest from a different neighborhood in the original series - in this case it would be the "M." Then go one character to the right - this would be the "L." We now send that guest home and replace him with the first character of
"Hey babe take a walk on the wild side."
The series of guests now reads as follows.
FA4MHB1XQ5ZEVJPRWA.
Now we start another party and invite 18 more guests. One of the guests will be the fourth from the same or a different neighborhood. We replace the next guest with the letter "E." And then - you guessed (guest?) it - another party!
At the end, we will have sent only 40 guests home so we can send an actual message. It's a sacrifice on their part - considering that we could be sending 18 x 18 x 18 x ... to the 40th power messages.
There are some other details to work out, of course, since certain guests are party animals (like Erdös) and are more likely to attend than others. Some parties may get out of hand and have to be scrapped, because adding a different guest can alter the 4-same / 4-different resolution.
We need additional neighborhoods for space and punctuation marks. The neighborhoods don't always have to be the same or in groups of 4, and we can invite more than 18 guests to a party if we want more noise...
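Putting the pieces together, here is one possible reading of the party code as a rough Java sketch - the nine neighborhoods from the table plus a made-up tenth one for space and punctuation, a random 18-guest party for every character of the message, and the "4 same / 4 different" marker deciding which guest gets sent home:

import java.util.Random;

public class PartyCode {
    // The nine neighborhoods from the table, plus a made-up tenth neighborhood
    // for space and punctuation (the grouping of those extra characters is an
    // assumption; only the need for them is described above).
    static final String[] HOODS = {
        "ABCD", "EFGH", "IJKL", "MNOP", "QRST", "UVWX", "YZ12", "3456", "7890", " .,'"
    };
    static final String ALPHABET = String.join("", HOODS);
    static final int GUESTS = 18;
    static final Random RANDOM = new Random();

    static int hood(char c) {
        for (int h = 0; h < HOODS.length; h++)
            if (HOODS[h].indexOf(c) >= 0) return h;
        throw new IllegalArgumentException("no neighborhood for " + c);
    }

    // Index of the first guest who is either the fourth from one neighborhood or
    // the fourth distinct neighborhood seen -- one reading of the marker above.
    // With 18 guests this always happens within the first 10 arrivals.
    static int marker(char[] party) {
        int[] perHood = new int[HOODS.length];
        int distinct = 0;
        for (int i = 0; i < party.length; i++) {
            int h = hood(party[i]);
            if (perHood[h]++ == 0) distinct++;
            if (perHood[h] == 4 || distinct == 4) return i;
        }
        throw new IllegalStateException("no marker found");
    }

    // Encode one character: throw a random 18-guest party, then send home the
    // guest just after the marker and seat the message character there instead.
    static String encodeChar(char messageChar) {
        char[] party = new char[GUESTS];
        for (int i = 0; i < GUESTS; i++)
            party[i] = ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length()));
        party[marker(party) + 1] = messageChar;
        return new String(party);
    }

    // Decode one party: find the same marker and read the guest after it.
    static char decodeChar(String party) {
        char[] guests = party.toCharArray();
        return guests[marker(guests) + 1];
    }

    public static void main(String[] args) {
        String message = "HEY BABE, TAKE A WALK ON THE WILD SIDE.";
        StringBuilder decoded = new StringBuilder();
        for (char c : message.toCharArray()) {
            String party = encodeChar(c);          // looks like 18 random guests
            decoded.append(decodeChar(party));
        }
        System.out.println(decoded);               // the message comes back intact
    }
}

Each run scatters the message across freshly random parties - that's where the extra noise comes in; party animals, scrapped parties, and variable neighborhoods would be refinements on top of this sketch.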