BigInt - Multiprecision Integers
Welcome to the home of BigInt, my class for handling arbitrarily large integers.
BigInt has been around for a few years now and has proved quite popular.
But apparently some Johnny-come-lately outfit called Microsoft is trying to
muscle in on the market for large integer arithmetic with their new
System.Numerics.BigInteger type. I say Bring 'em on.
A special welcome to those who are using BigInt to solve problems on
Project Euler! If you find it useful please let me know - I would
love to hear from you. (One problem that responds well to the BigInt treatment
is #56).
Here's the C# file that you need:
Choosing the target audience
I didn't appreciate at first the impact of the storage base on the end product.
The trouble is that converting between binary and decimal is tedious and slow.
Think about it - adding another most-significant decimal digit to a number can change
the value of any bit in its binary representation except bit zero. So if possible
you should choose a base and stick with it. Binary has the advantage of native
speed and efficiency, but decimal has the advantage of being more human-friendly.
If you were doing something like SSL where humans don't need to see the numbers
then you would choose binary. But BigInt's target audience is people who want
to input lots of big base 10 numbers and see lots of big base 10 answers, so I went
with base 10 storage. As you will see, this makes BigInt pretty slow for things
like RSA encryption.
Storage
To store a BigInt, I break the number up into "chunks" of three decimal digits.
Chunk zero holds the three least significant decimal digits, chunk one holds the
next three digits, and so on. Obviously, each chunk has 1,000 possible values
(0-999). Each chunk is held in a block of ten bits. Now, ten bits can
actually hold 1,024 different values, so there is about 2% wastage in this method,
but that's a pretty good trade for the speed and simplicity of storage. Of
course, the computer works in multiples of eight bits, so some shifting and
masking is required to manipulate the chunks. The chunks are divided into
"blocks" of 8 chunks, each of which occupies 10 x 8 = 80 bits, which is 10 bytes
or 5 shorts. An array of bit masks tells us which bits of which bytes within
the block hold the chunk we want.
In addition to the bit array, there is a Sign property to indicate positive (1)
or negative (-1). That two's-complement stuff just gives me a headache.
I decided to make BigInts immutable, like .NET strings. Once you have created
a BigInt (through construction, addition, or whatever), you can't change its value.
Addition (and subtraction)
Adding a + b (a,b >= 0) was pretty simple - just add a.chunk[i] + b.chunk[i]
+ carry to get answer.chunk[i]. O(N). Then I tried to add support for
signs and it got a little tricky. In the end, two extra steps were required
for the addition of two numbers with opposite signs (a.k.a subtraction)...
First, the number with larger magnitude (absolute value) has to be the positive
one (so that intermediate steps can borrow if needs be). If the larger number is
the negative one then we compute -((-a) + (-b)) instead. For example, if asked
to compute 11 + (-99) we would actually compute -(-11 + 99). Unary minus is
of course a very fast operation since we just flip the sign indicator.
Second, you have to borrow 1000 when the positive number's chunk is less than the
negative number's chunk (plus carry). In this case the carry for the next
chunk becomes -1.
Multiplication
Armed with addition, (naive) multiplication is easy. Easier than addition
really, because you don't have to worry about signs or borrowing on the intermediate
steps. Just multiply every chunk in a by every chunk in b (with shifts) and
add up the results. The sign of the result is a.Sign * b.Sign. The only
trouble is, this is an O(N2 ) operation. Thanks to
http://www.tc.umn.edu/~ringx004/article-main.html for pointing out the faster
Divide and Conquer method, which proved easier to implement than it first appeared.
The BigInt class now defines a threshold number of chunks, above which the D&C
algorithm is used and below which the naive algorithm is used.
Div and mod
This was the tough one: how to divide a by b and get the (integer) quotient and
remainder? I just wanted to implement a naive division that essential 'undoes'
the steps of naive multiplication, but for a while I couldn't see how to unpick
all those additions. Eventually I realised something: if you divide the first
few digits of a by the first few digits of b, you will get an estimate for the first
few digits of the quotient that will be high by at most 1. Picking two numbers
at random...
a = 64,745,023,423,004,376,034,552
b = 798,963,467,349,673,448
To start calculating a/b, BigInt takes the most significant chunk of each number
(64 and 798). Since 64 < 798, it appends the next chunk of a to make 67745.
Then its guess for the first few digits of the quotient is 67745 / 798 = 81.
There is a chance that this is one too high, so the next step is to check by calculating
(81000 * b), which is 64716040855323549288000. Since this is less than a,
we know that 81 is correct. Now we subtract (81000 * b) from a to get the
remainder and go round again. Keep going until the remainder is less than
b and there you have it.
Raising to a power
The latest version of BigInt usurps the ^ operator, which is supposed to mean
XOR in C#, and uses it as the raise-to-a-power operator. So (BigInt)10^100 == BigInt.Power(10, 100)
== 10 to the power of 100 == one googol. I chose this operator because
it's often used for this purpose in other applications, and because there isn't
an obvious alternative.
The one drawback is that the precedence isn't quite right, because the power
operation binds tighter than xor. You would expect (BigInt)2 ^ 3 + 4
to yield 12 (8 + 4), but in fact it yields 128 (2^7). So by all means use
the ^ operator, but remember to use plenty of parentheses!
BigInt in action
Armed with all of this, what can we do? In RSA with BigInts
you can see yet another crude RSA encryption system, this time using BigInt to crunch
(almost) realistically big messages and keys.
Homer's Last Theorem
Believe it or not, we can use BigInt in the context of a Simpsons episode.
In the Homer3 segment of Treehouse of Horror VI, Homer enters
the hypothetical "z" axis of the Third Dimension. Here in the background
floats the "equation" 178212 + 184112 = 192212.
If this statement were true it would disprove Fermat's Last Theorem. But
in fact it is not true, it is just a "Fermat near miss". Since the
nine most significant digits on the right and left hand sides are equal,
it would look true when calculated on a typical (non-BigInt) calculator. The
following code searches sums of powers of twelve of the numbers between 1,000 and
2,000 and finds a couple of these near misses, including Homer's. It also
shows the true values in these sums, which are obviously nowhere near equal.
const
int ARRAY_SIZE = 2000;
BigInt[] powers =
new BigInt[ARRAY_SIZE
+ 1];
for (int
x = 1; x < ARRAY_SIZE; x++)
{
powers[x] = (BigInt)x ^ 12;
}
string[] frontDigits =
new string[ARRAY_SIZE
+ 1];
for (int
i = 500; i < ARRAY_SIZE; i++)
{
frontDigits[i] =
powers[i].ToString().Substring(0, 9);
}
for (int
i = 1000; i < ARRAY_SIZE; i++)
{
for (int j = i +
1; j < ARRAY_SIZE; j++)
{
BigInt fermatMissTest = powers[i] +
powers[j];
string missTestDigits =
fermatMissTest.ToString().Substring(0, 9);
for (int k = i;
k <= ARRAY_SIZE; k++)
{
if (frontDigits[k] == missTestDigits)
{
Console.WriteLine("{0}^12 + {1}^12 \"=\" {2}^12", i, j, k);
Console.WriteLine(" {0}^12
+ {1}^12 = {2}", i, j, fermatMissTest);
Console.WriteLine("
{0}^12 = {1}\n", k, powers[k]);
}
}
}
}
Here is the output:
1363^12 + 1808^12 "=" 1813^12
1363^12 + 1808^12 = 1261170053084770027529532240400562560817
1813^12 = 1261170051398003589253823949865506481681
1782^12 + 1841^12 "=" 1922^12
1782^12 + 1841^12 = 2541210258614589176288669958142428526657
1922^12 = 2541210259314801410819278649643651567616