Friday, April 4, 2008

Integer Partitionings: 3-Partitions

This will be, I think, a relatively gentle treatment of what could be a rather complicated matter.

The problem came up in problem seminar this week, the matter of integer partitioning. Specifically, given an integer n, how many different ways can you write it as the sum of smaller integers? More specifically, and a notably easier problem, how many different ways can you write it as the sum of three smaller integers?

For example, 6 can be written as 1 + 4 + 1, 3 + 2 + 1, and 2 + 2 + 2. Notice that we're counting 1 + 4 + 1 as the same as 4 + 1 + 1. 5 can be written as 1 + 1 + 3, 1 + 2 + 2. So the number of 3-partitions for 6 is 3, and the number of 3-partitions for 5 is 2. Clearly 0, 1, and 2 all have no 3-partitions, and 3 has one, 1 + 1 + 1. But for a general n, how many are there?

The basic approach is going to be to come up with a method of constructing all 3-partitions of a given number, and then the formula for the total number falls out almost automatically. What follows then is a reduction from the initially simple to understand but difficult to compute formula to a relatively difficult to understand but surprisingly simple formula. It's interesting.

So, we have an integer n, and we want to construct a 3-partition of it. Constructing a partition basically amounts to choosing 3 integers i, k, j such that i + k + j = n.

Notice though, that we want to, in picking our numbers, avoid duplication. For example, when n = 6, we want to avoid choosing 1, 1, 4 and also 1, 4, 1. To avoid this, we're going to make sure to pick our numbers in sorted order. That is, we're going to make sure that the first number we pick will be the smallest of the three, and that the second number we pick will be at least as big as the first, and that the third is at least as big as the second. We'll call them, in order, i, k, j.

So, how many values can we pick for i, the smallest of the three? Clearly the smallest value we can have for i is 1. But what is the largest? i can be, at most, n/3. Think about it. If i is greater than n/3, then what is left of n to partition between k and j? Something less than n - n/3, or 2n/3. Partitioning 2n/3 between two numbers, one of them is going to have a value less than n/3 - but this contradicts our definition of i as the smallest of the three. Therefore, i is at most n/3.

But, since we're dealing with integers, i is really at most n/3, rounded down to the nearest integer. In math speak, this is the floor of n/3, or . So, for example with n = 5, n/3 is 1 and 2/3rds, which rounds down to 1. Looking back up at the introduction, you'll see that in the two partitions I give for n = 5, the smallest number in each case is 1.

So, we declare a value for i between 1 and . Notice though, that since k and j are at least as big as i, in deciding the value of i, we've 'pre-partitioned' 3*i of the total n. We know that the sum of the partition is going to be at least i + i + i. So really, in choosing k and j, what we're doing is partitioning n - 3*i between k and j, each of which is already at least i.

So, having picked a value for i, and thus set a minimum value of k, how can we partition n - 3*i between k and j such that j is greater than or equal to k? Similar to before, we can add, at most, half the remaining value to be partitioned to k. If we add any more, we'll end up adding less value to j, and k will be greater than j - which violates how we're setting things up. Remembering the fact that k must be an integer though, we can add at most to k. That gives us the upper bound. For the lower, remember that it's perfectly acceptable for i to equal k, but k cannot be less than i, so i is a lower bound for k.

To summarize,

We have that i is between 1 and .
We have that k is between i and i + .

Now what of j?

This is largely a trick question. Remember that we're after i + k + j = n. If we've chosen values for i and k, then j = n - (i + k). There's only one choice for j. And, by the way we've bounded our choices for i and k, it guarantees that j is the largest of the three numbers.

And that's it : ) This gives us a method for constructing all the partitions of n into 3 integers, such that no partition is constructed twice.

Now, as to counting the number of such partitions, it amounts to counting every possible choice of i and k. So, we basically count through every i, and for each i count through every possible k, and sum the number of times we do this Counting amounts to adding 1 a certain number of times. Thus we have our partition counting function,


That's the easy to understand formula, because you can see where every part of it comes from. Now to simplify : ) Notice though that it is difficult to compute, because it basically amounts to running through every possible partition, though in a convenient way, and counting them. If there are 10000000 bajillion partitions, that double sum will take a long time to compute.

Firstly, summing 1 over a given range is just equal to the number of numbers in that range. So we can reduce the double sum to a single sum of the form


Now instead of summing over all possible i and then all possible k, we've reduced it to a sum over all possible i. But we can do better. Notice, for example, that the floor function isn't very ... managable. There's not a lot you can do with it algebraically. But we can replace it with something just as good.

First, think about . If n - 3*i is even, then it is divisible by 2, and floor of half that is an integer, equal to (1/2)(n - 3 i). If n - 3*i is odd, then half that equals an integer plus 1/2. Rounding that down will yield, in effect, (1/2)(n - 3 i) - (1/2).

So, think about that behavior. For even values of n - 3*i, we want a function that returns (1/2)(n - 3 i), and for odd values of n - 3*i, we want one that returns (1/2)(n - 3 i) - (1/2). The only thing that really changes is the subtracting of one half.

So notice that (-1)m equals 1 for even m, and -1 for odd m. Then 1 - (-1)m equals 0 for even m, and -2 for odd m. Then, (1/4)(1 - (-1)m) equals 0 for even m, and -1/2 for odd m.

That being so, we can rewrite our floor function as


We can then reform our summation as


The rest is just a matter of algebra and series formulae. Expanding everything,


And then I could step through each step of simplifying the sum, but that would be even more tedious than everything I've done so far, so I'll skip straight to the punchline,


And that is awesome. Because we've reduced a function that was a multiple summation to a function that requires a fixed number of compuations, -dramatically- easier to calculate. Plugging in n = 0, 1, and 2, we get P3 is 0 for each. For n = 3, when = 1, we get









So there is only one partition of n = 3, which we know is true since the only partition is 1 + 1 + 1.

I don't really feel like doing any more examples, since I've gone on well long enough. I think it is very interesting though that the resulting function turned out to be (effectively) quadratic in terms of n. Honestly, I find it interesting the formula reduced half as nicely as it did anyway : )

So that's our final result for the number of ways of partitioning an integer n into three pieces. I've gone on long enough at this point though, and it's time enough for me to get back to work.

Tuesday, April 1, 2008

Progress on Riemann?

The Riemann Hypothesis is one of the greatest unsolved math problems in the world today - probably. If not, it's easily one of the most famous.

It centers on the Riemann Zeta Function, given by



The function being defined over the complex plane. The Hypothesis is, in its most basic form, that if , then s = 1/2 + t*i, for some real number t. That any zero of the zeta function has a real part of one half.

Why this is interesting though is the connection with prime numbers. There's a function called the prime counting function, pi(x), and it basically gives the number of prime numbers between 0 and n. As the distribution of primes is so bizarre, being able to calculate pi(x) correctly, without just counting all the prime numbers from 1 to x, would be a very neat trick. We have some very nice approximations for pi(x), involving integrals of relatively simple functions, but the question is the, how good are these approximations?

So enters the Riemann Hypothesis. Basically, it's been shown that the Riemann Hypothesis, as stated above, is equivalent to saying, for our best approximation of pi(x), pi(x) differs from this approximation by no more than some constant times , for x sufficiently large.

Which is a very interesting result.

But a proof was proposed a few months ago, and it seems really quite promising, and at the very least opens up some new avenues for further inquiry. As usual for such things, I don't have the background to explain fully, but I think I can break down a couple of the key points to an understandable level. Maybe Mark can pick this up? More below the fold.

Read more »

Friday, March 14, 2008

Quantum Computers: Math and Time Travel

I'm currently half way through my Spring Break, so the math is slow/non-existent. I've got an interesting math problem half written up, but I'm trying to decide if there's a better way to reach my final conclusion. It's still nice to get a break from the insanity, though.

In the mean time, in this months Scientific American, there was an article on the Limits of Quantum Computers. You'll need to register/pay to read the online version, so that's sort of a bum deal. But it's a very interesting read. But it draws together two of my favoritest things, math and time travel. And a big heap of quantum computing to boot.

Without getting into the nitty gritty of complexity theory (I'll leave that to Kurt, I think) there are several classes of math problems, which could best be described as 'hard' and 'easy'. Easy problems are ones for which we have developed highly efficient algorithms for solving, sorting a list of data, for example. For hard problems, however, any algorithm we've developed for solving them runs incredibly inefficiently, quickly needing unmanageable amounts of time or space to arrive at the solution.

In a relatively simplified version of things, suppose a problem's size can be represented by some variable n. For example, n cities you're trying to find a shortest path between, or trying to determine whether or not a number is in a list of size n. For easy problems, the number of steps needed to solve the problem grows proportionate to some polynomial of n, like n2. Hard problems, however, the number of steps needed grows exponentially, like 2n. As n increases, it becomes vastly more resource intensive to solve the hard problems than the easy ones.

The million dollar question, literally, is whether or not 'hard' problems are actually 'easy', but we just haven't found a good way of solving them yet.

The point of the article is that there is a massive hoopla surrounding quantum computation, based on the idea that quantum computers will be able to solve hard problems easily. The best way of describing how they might do such a thing is that, theoretically, quantum computers would be able to check every possible solution to a given problem simultaneously, whereas modern computers currently have to construct solutions, or check possible solutions some small number at a time. So these unfathomably difficult problems could be felled easily, by even simple quantum computers. In theory.

Except, according to the author, the excitement is relatively unfounded.

The main problem is that the constructive and destructive interference of waveforms that represent superimposed solutions in quantum computers are incredibly difficult to handle without interference from outside the computer, never mind the fact that if you actually measure the state of the computer while some number of possible solutions are in superposition, you can only ever measure one solution at a time. Any algorithm for using quantum computers to solve such problems must not only overcome these difficulties, but exploit them in solving the problem.

But aside from the physical difficulties of handling such a machine, there seem to be mathematical issues as well.

We need to abstract. Imagine a completely general problem, with some number of potential solutions, S. Using a usual computer, trying to find the actual solution, in the worst case we would have to check each of the S many solutions. In computer science, we always assume the worst. So that is, we anticipate, S many steps to find the actual solution.

Now, imagine you had a quantum computer. We'll disregard the actual construction and innerworkings of such a computer, but assume that it has the capacity to take a potential solution and verify it. Now, in the optimal case, as described, we could superimpose every single solution of the S many at once in the computer, and verify them all at the same time. Effectively solving the problem in one step. But that makes many assumptions about the structure and nature of the problem and its solutions. In this abstract, general case we are assuming, it has actually been proven that using a quantum computer, the best speedup you can hope for is to solve the problem in S1/2 many steps. It's the absolute best case scenario. Of course, best case scenario the first solution you check is the correct one and you solve the problem in one step, but again assuming the absolute worst, it's been shown that the worst case number of steps to solve the problem is S1/2.

As my friend Dan would say, we proved it using math.

Now, going from a worst case of S to a worst case of S1/2 is a significant speed up, yes. But in these hard problems, as I described, the solution space, S, grows exponentially. As such, S1/2 also grows exponentially, and again we're in a situation where as the size of the problem grows, it rapidly becomes unmanageable.

Does this show that quantum computers won't be able to solve these hard problems? Not necessarily. At best, what this shows is that any solution to such problems can't be 'general'. A solution algorithm to a given problem will have to be specific to that problem, exploiting its nature in some way that we don't yet understand.

But here's where the article got very interesting.

The author then goes on to describe theoretic computers, above and beyond quantum computers, that might be able to tackle such problems. One in particular that caught my attention approached this problems by manipulating time. In layman's terms, it solved the problem using time travel - possibly my favorite subject. The fellow who conceived of such things devised a computer which, in theory, could solve these 'hard' problems easily through the use of time travel, through what are known as Closed Timelike Curves. Not to be confused with Closed Timely Curves.

Of course, no one knows if such things are physically possible.

Now, this is my favorite part. The question of whether these hard problems are in fact easy basically boils down to a math problem. Can it be proved, one way or the other. However hard it may be, it is a math problem. And if it is proved that these hard problems are -not- easy, any method by which they could be solved easily becomes impossible. As such, given the above discussion, it would serve as a mathematical disproof to the possibility of time travel.

Of course, if it is proven that hard problems are in fact, easy, it wouldn't say a single thing about whether or not time travel is possible. The implication is one way, but still interesting.

Very exciting. Very exciting.

Friday, March 7, 2008

The Carnival of Mathematics: No. 28

Tyler and Foxy's Scientific and Mathematical Adventure Land is proud to present the 28th. Carnival of Mathematics! According to the treasured tradition of this fine event, I will exposit something interesting about the number 28. Fortunately, I got the perfect number to do so, literally. 28 is a perfect number, such a thing being defined as the sum of its positive divisors that are not equal to the number itself. That is, 28 is the sum of 1, 2, 4, 7 and 14. The next guy who gets to talk about perfect numbers will be the guy who happens to be doing the 496th. Carnival of Mathematics. Good luck, whoever you end up being!

And now, on to the good stuff...

Over at The Universe of Discourse, Mark Dominus provides us with three interesting posts. The first concerns an interesting property in the world of coding theory known as "unique decodability", or whether particular strings are the result of unique encodings. The second is on rational roots of polynomial equations. The third concerns techniques of algebraic manipulation that work...sometimes.

From the website Old-Wizard dot com, we have this list of the top 10 Mathematicians of All Time. In my mind, they get it mostly right. The only serious problem I can see is that Leibniz is nowhere to be found. Seriously, WTF?

My esteemed blogmate Foxy over at Foxmaths 2.0! provides us with this post on an interesting geometric question: can you divide a cube into a finite number of smaller cubes, all of different sizes? The answer is....just kidding, I'm not a spoiler. Go read!

Charles Daney at Science and Reason sends us this post on important concepts in algebraic number theory, in particular concepts from ring theory. The key concept is that of the "Dedekind domain", a specific type of commutative ring in which every nonzero ideal can be uniquely factored into a product of prime ideals.

Over at The Number Warrior, Jason Dyer provides us with this fascinating historical discussion of the ancient Egyptian calculation of pi. It even has tables of Egyptian numeric characters showing the development of the quantity over the years, I think that's pretty damn neat.

Julie Rehmeyer provides us with two posts on Sophie Germain, the first person ever to lay out a realistic plan for rigorously proving Fermat's Last Theorem, one of the peskiest little assertions in the history of number theory.

In the math education and pedagogy category, Mr K. over at Math Stories sends us this entry on how overemphasis on mechanical shortcuts over conceptual understanding harms long term learning in mathematics.

Over at Wild About Math!, we have a probability puzzle, apparently the first in a regular series untitled "Monday Math Madness".

In the aesthetics of mathematics category, we recieve this entry from Art Black, which compares the perception of beauty of mathematics to the perception of beauty of music.

The winner in the terms of sheer volume of material for this edition of the Math Carnival is Blake Stacey, who sends us a series of five posts on supersymmetric quantum mechanics! [1, 2, 3, 4, 5]

And finally, at the Teaching College Math Technology Blog, we recieve this entry from Maria Andersen on software for teaching math courses online.

And that's all, folks! It has been a pleasure to host this event, and I hope I have done so to your satisfaction. Good luck, and godless speed.

Tuesday, February 19, 2008

Carnival of Mathematics Submission Guidelines

I'm proud to say that the Carnival of Mathematics is going to hosted here on March 7th. To help potential authors I've decided to provide this set of submission guidlines:

1. The submission deadline is set to 10:00PM EST on March 5, no submissions will be accepted after that. I plan on finalizing the entry selection that night, writing up the carnival on Thursday and posting it on Friday morning, for purposes of work schedule compatibility.

2. Include a brief (2-5 sentence-long) description of what your entry is about. I'm a reasonably math savvy person, but there is no way I'll be able to fully capture the content of all the entries without a bit of assistance. Help a fellow geek out!

3. Send all the entries to my personal email (tylerofmanyminds AT gmail dot com).

Good luck, and godless speed.

Thursday, February 7, 2008

Probabilistic Chess Coloring

Now isn't that an inviting title.

So, the idea is that in general, you have an NxN chessboard, with all white squares, and you're going to go through each square and color it black with a probability p. So ideally, you're left with p*N*N many black squares and (1 - p)*N*N many white squares. Consider the 20x20 board below. I hope this comes out formatted properly.

X__X_XXXX________XX____X_X__XX
_X_____XXX____X_X_XXX___X_X__X
_______X___X_XX_____XXXX_X__X_
X__XX____XXX__X_X__X_____X_X__
X___XX__X__XXXXXX____XXX__X__X
___________XXXXX__XX__X_XX__XX
XX__XX___X__X___X___X_____X__X
X_X_X_XXX_______X___X_____XX_X
_XXX__X_X___XX_X__XXX__X___XX_
X___X__X_XX_X_XXX____X_____X_X
X_X_X_X____X_X_______________X
_XXXX_X__X_____XX_X_X___X___X_
____XX_XX_X___XX____XX_X____X_
X_X_XXXXXX____XX_X_____X____X_
X___X_X_XX______XX_X____XX_XXX
X_X__X__X___XX_XXX____X___XXX_
_X_X_XX_X____X_______XXX__XX_X
_X_XXX_______XX__XXX__XX__X_XX
____X_______X_XXX_XX_X_X______
__XX_X__XX_X_X___XX_XX_X__XX_X
X___XX_XXXXX__X____XXX_X______
__X_XX__XX_X__________XX__XX_X
______XXXXX_______X_X________X
X__X_X__XX__XX_XX_XX_____XXXX_
X___XX_XXXXX_XXX_X_XX___XXX_X_
_XX_____X_XX_X___X_X___XXXXXX_
__XX_X_X____X_X__X_______X_X__
X__X_X___X_XXX_XXX_X__XX_X__X_
___X_X__X_XXX_____XX__XX______
_X_XX_X__X___XXX_X__X_X_______

The question I would like to consider then is, what is the size, in number of squares, of the largest continuous black section? We'll say that two black squares are touching only if they share a side - no diagonals. So, for a given value of p on a given board, what is the largest number of black squares you expect to find all touching each other?

This is a variant of a problem in graph theory known as looking for the largest connected component of a random graph. Analysis of that problem yields some interesting results, but I'm not sure how to extend it to here, where the geometry of the graph is modified.

However, we can approximate, test, and simulate. Using p from 0 to 1, at increments of 0.01, I generated such boards (I think it was a 70x70 board), filled them, and determined the maximum connected component on the board. And then I repeated this multiple times, averaging them all together with a convenient ruby script. Here is a plot of the results, in terms of the probability p along the x, and the maximum size as a fraction of the whole board along the y.

Chess Connected Size Graph


Notice that for p less than about 1/2, the size of the maximum component is ridiculously small. Which means that if you color only half or less of the squares, you can expect, on average, very tiny connected blobs of black. But soon after that, the behavior of the curve changes drastically, practically exponentially. A dramatic increase in the expected size of the maximum component. It is as though the board reaches a sort of carrying capacity for the smaller blobs, after which point (increasing p and adding more black squares), it becomes necessary to begin connecting smaller blobs into larger. There's no way to fit any more smaller blobs onto the board, and as you continue to add black squares, the smaller blobs become connected at an increasing rate.

It's very interesting, I think.

And makes me think, abstractly, of Blokus.

Tuesday, January 22, 2008

Average-Case Complexity

(Cross-posted at PowerUp)

Analysis of algorithms is important at just about all levels of computer science as well as software engineering. When analyzing algorithms, we usually have in mind the property of running-time complexity, a measure of the efficiency of the algorithm by how many steps it takes. In addition, we often must also take into account unavoidable costs by invoking constant terms. Unavoidable costs show up in many practical areas, an example from my main interest (computer graphics) is the draw call cost associated with rendering objects to a screen or to a memory buffer. Here, we primarily take a look at the techniques of average case analysis through the common, though instructive, example of the hiring problem.

The hiring problem is defined as such. You are given the task of hiring an office assistant, and the task itself is a two pronged process of interviewing and hiring from a list of n candidates. Both interviewing and hiring have associated costs. Interviewing has a low cost, which we will denote by ci. However, actually hiring an assistant has a much higher cost, which we will denote by ch. After interviewing an assistant, if they are superior to the currently operating office assistant, the candidate becomes the office assistant. All candidates must be interviewed, so the lower cost is guaranteed to be incurred for each candidate. However, hiring is not a guaranteed for each candidate. We add to this the assumption that the candidates are offered for interviews according to uniform distribution, i.e., they come in a statistically random order. And to simplify, we also assume that the quality of the candidates represents of a total order, i.e., all candidates yield to hard inequality (either > or <).

Essentially, the algorithm for computing this problem can be defined as such.


1. set current_assist to 1
2. for candid(2-N)
3. if candid > current_assist
4. do current_assist <-- candid


Essentially, all we really need to determine in the running time for the algorithm is how often we execute step 4. We know we incur the heavy cost of hiring at least once, as per step 1, and we have to go through interviewing each candidate as per steps 2 and 3. Trivially, the best case running time is $ O(nci\ +\ ch) $, since you only incur the cost of hiring once and no candidate inteviewed is superior to your first hiree. In the worst case, the candidates are interviewed in a perfect ascending order of quality, meaning you must execute the hiring operation a total of n times, yielding a running-time of $ O(nci\ +\ nch) $. Clearly, neither the best or worst case is representative of the expected running-time of the algorithm. What we are looking for is the average case, or the expected running-time when all inputs are considered.

Usually, average-case analysis is done through an enumeration of all possible inputs and a derivation of the expectation or variance of the inputs. For many algorithms this can be a cumbersome operation to perform. For the hiring problem, this actually isn't too difficult. Cormen, et al. do so in Introduction to Algorithms. In short, since the candidates are interviewed according to uniform distribution, the probability that any candidate i in the list of all n candidates is superior is 1/i. The expected running-time thus coincides with the harmonic series $ \sum_{i=1}^n\frac{1}{i} $, which converges to $ lg\ n\ +\ O(1) $. Thus the average case running-time of the algorithm is on the order of $ O(nci\ +\ ch\ lg\ n) $.

That would be an example of the obtaining the result via the probabilistic method. An alternative method for average case analysis is known as the incompressibility method. The latter takes advantage of a well acknowledged fact from the theory of Kolmogorov complexity: very few strings have short descriptions. In other words, the vast majority of members of any given set of objects will be Kolmogorov random. This gives us a way to determine the average-case running time by examining one input that is guaranteed to be representative of the ensemble of inputs. To demonstrate this, we take an arbitrary permutation p of the list of n candidates and fix a universal partial recursive function f to compute p given n. Given n inputs, there are a total of n! possible permutations. Assuming uniform distribution of inputs and thus equidistributed hirees in the hiring assistant algorithm, most strings would contain $ O(lg\ n) $ hirees. Assume the contrary, that most strings contain significantly more or fewer than $ O(lg\ n) $ hirees. Then there would be a short description of the majority of the inputs, being highly regular sequences. This however would contradict the well known fact that for every set A of cardinality n:

$ K_f(p\ |\ n)\ \ge\ lg\ n\ -\ c $

For all but about $ n(1\ -\ 2^{-c})\ +\ O(1) $ elements. Here, $ K_f(p\ |\ n) $ represents the Kolmogorov complexity of the permutation p given n (that is, the quantity n is known in advance to us and is encoded by the partial recursive function), and c is an arbitrary positive integer. A string x is said to be c-incompressible if $ K(x)\ \ge\ l(x)\ -\ c $, where $ l(x) $ is the length of the sequence. Since most strings are incompressible, the contradiction shows that most permutations p of n inputs can't have short descriptions. The average case running time is thus, as it was before, $ O(nci\ +\ ch\ lg\ n) $.

Further reading: I kind of handwaved on some of the stuff here, especially definitions of Kolmogorov complexity. For that, you should consult either Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications (the entirety of chapter 6 is on the incompressibility method) or Peter Gacs' lecture notes on Algorithmic Information Theory. Probabilistic analysis is covered by Cormen, et al. in chapter five of Introduction to Algorithms.

Monday, January 14, 2008

Recursive Grab Bag

A rather interesting topic in math (what a general way to start a post) is the matter of recursive functions. These are functions that are expressed in terms of themselves. A fairly traditional example would be factorial. n! is defined to be



Now, a nicely concise way to write this, not to mention computationally convenient, is to define the following




So, the operation is defined, effectively, with respect to itself. There are a lot of operations that can be defined very nicely in this way. Consider the following, for example.





Calling f(2, 3) then, it unwraps to give





In general, it's clear that f(x, n) returns n*x. In effect, this is a recursive definition of multiplication. So, many nice things can be expressed very nicely using recursion.

But, many interesting things can be expressed with recursion that can't be expressed nicely otherwise. Consider, for example, the Ackermann Function, A(m, n) defined as

If m = 0:
If m > 0 and n = 0:
If m > 0 and n > 0:

A(m, n) has no non-recursive definition, no closed-form expression, or anything near as nice as the two recursive examples I already showed you have. The above is the best you're going to get. Interestingly, it is the simplest function for which that is the case. The other point of interest about the Ackermann Function is how incredibly fast it grows, even for small m and n. In fact, A(4,2) has 19729 digits, which is far more than I am comfortable writing out here. The full number can be found here, if you're interested. A very, very large number. In fact, at that magnitude, I'd argue that the full text of the number is next to meaningless, when compared with the much more concise description of the same number as merely A(4,2).

Alternately, there are things that are expressed very nicely with recursion, that become much less nice when unraveled. Take, for example, the Fibonacci Sequence. Starting with a 1 and a 1, the sequence is formed by taking the sum of the two previous terms. This produces 1, 1, 2, 3, 5, 8, ... and so on. More concisely, it can be written as







A very clean and elegant description of the sequence. Unraveling the recursion, with all due care, it can be shown that the following definition is equivalent.



Hardly the most elegant of formulae. Somehow, the recursive definition captures the relationship between the terms of the sequence far better. However, and this becomes more clear with more complicated recursive definitions, it is increasingly difficult for humans to fully grasp the shape and form of the sequence - something that becomes much simpler even in the more complicated form. In that way, the goal of unraveling these recursive definitions is a markedly human endeavor. The whole of the sequence or function is defined through the recursion, and the act of unraveling it merely serves to make it understandable to us. To make the full implications of the levels of self reference clear. In many ways, I think this is indicative of the goals of mathematics as a whole. In general, you start with a concise set of axioms and assumptions, and tease out some conclusion or theorem from them. The conclusion or theorem of course already existed as truth, through the statement of the axioms, but the act of proof teased out a bit of comprehensible meaning that was otherwise unclear from the statement of the axioms themselves. The whole of math exists in the statement of the axioms - the point of math is to bring the full weight and import of those simple statements to a understandable level. As much as I am loathe to admit it ... it's all a very human enterprise.

But I digress. That is a topic for another time.

So, in general, given a recursive function or sequence, it's an interesting mathematical question of whether the recursion can be unwrapped to give some form of closed-form, non-recursive description of the same function or sequence. In my next post, or something soon at least, I'll discuss generating functions, a neat tool used to such ends.

The Necessity of Relativity

Relativity holds a rather curious place in physics, as I see it. Other branches - nuclear, mechanics, E&M, so forth, all seem firmly based in the study of physical objects, systems, and principles. Relativity is, in point of fact, much more of a math than a physics. It starts with simple axioms about our understanding of the universe and derives conclusions from them about the way the universe must work. The tools of relativity are not physical laws, but rather logic.

Now, as I said, relativity starts with simple axioms, and works from them to derive its conclusions. The axiom in question in this case is that the speed of light is finite, and constant. No matter who is measuring it, how fast they are going, what they are doing, who else is measuring, all observers always measure the same speed of light. That speed we'll call c here, though more traditionally it is 299 792 458 meters per second.Now to the experiment.

Alice Cooper is riding in a train which is moving at speed v. At some time, which we'll call even one, a flash of light is emitted from the floor of the train, straight up towards the ceiling, being absorbed by the ceiling at event two. If we say that the train is of height h, then the entire transit time of the flash, from the floor to the ceiling, as seen by Alice, is h/c. And needless to say, the length of the path of the light flash, from the floor to the ceiling, as seen by Alice, is simply h.

Train Observer


At the same time, a man with no distinguishing characteristics, code named 'Observer' is standing at a station watching Alice Cooper's train pass. He sees the light emitted, watches it travel, and watches it be absorbed by the ceiling of the train. Below is a diagram of what he sees.

Station Observer


The problem should begin to be clear. Once the light is emitted, some time passes before it is absorbed by the ceiling. During this time, the train continues forwards, so the point of absorption is no longer directly above the point the light was emitted from. As such, there is now a horizontal component to the path of the light that was not present in the first case. As such, the light travels further for the station observer than it does for Alice Cooper.And -that- is ridiculous! How could someone measure something as one length, and someone else measure it as another? To make matters worse, since as we said, they both measure the speed of light to be the same thing, c, if they measure the paths as having different lengths, then they must measure the time between the two events as different as well, the station observer measuring a longer time since it takes longer for light to travel along that path.

Clearly a contradiction. Nevertheless - nothing in this thought experiment is strange or alien or impossible. There are no buses going the speed of light, no perfect spheres. If we wanted to, we could put Alice Cooper on a train, we could speed it up to some velocity, and some man could be observing the train. Everything is doable, and the contradiction is simply a logical result of what we know to be true. This is the essence of relativity - it is a mathematical description of how the -must- universe work. It is a necessity.

Let us ask then, exactly what the station observer measures between the two events in terms of distance and time. Let the distance between the emission and the absorption be L, and the time be T. Because the speed of light is constant, we know that the time the station observer measures must be T = L/c.

But what of L? Notice the diagram below, with the conveniently drawn right triangle.

Observer Geometry


The hypotenuse of the diagrammed triangle is the path the light takes, so it is of length L. The vertical leg of the triangle is simply the height of the train, which is h. The horizontal leg is the distance the point of emission travels during the time interval between events. In other words, v*T. Using the Pythagorean theorem and that T = L/c,











Notice that L, the distance the station observer measures the path to be, is simply h, the distance Alice Cooper measured, scaled by a curious factor. Now look at the time.





Notice that T, the time the station observer measures the path to take, is simply h/c, the time Alice Cooper measured, scaled by the same curious factor. Length and time have been dilated. The conclusion is inescapable, bizarre, completely logical, and utterly fascinating.

So we have two observers, measuring ostensibly the same thing, but each getting markedly different results in their measurements of time and distance. This is a drive by introduction to special relativity.

The important thing to note here is this - many people object to the conclusions of relativity as bizarre, counter intuitive, nonsensical, and downright impossible. However, the problem is not with relativity. Relativity is simply a precise, mathematical description of what -must be true- about the universe based on what we know to be true. There are no special forces at work here - no magic, no hand waving. Each step of the derivation follows logically from the rest. Relativity is simply a mathematical necessity based on our understanding of the universe.

A mathematical necessity. And there's really little more to say than that.