(Cross-posted at PowerUp)
Analysis of algorithms is important at just about all levels of computer science as well as software engineering. When analyzing algorithms, we usually have in mind the property of running-time complexity, a measure of the efficiency of the algorithm by how many steps it takes. In addition, we often must also take into account unavoidable costs by invoking constant terms. Unavoidable costs show up in many practical areas, an example from my main interest (computer graphics) is the draw call cost associated with rendering objects to a screen or to a memory buffer. Here, we primarily take a look at the techniques of average case analysis through the common, though instructive, example of the hiring problem.
The hiring problem is defined as such. You are given the task of hiring an office assistant, and the task itself is a two pronged process of interviewing and hiring from a list of n candidates. Both interviewing and hiring have associated costs. Interviewing has a low cost, which we will denote by ci. However, actually hiring an assistant has a much higher cost, which we will denote by ch. After interviewing an assistant, if they are superior to the currently operating office assistant, the candidate becomes the office assistant. All candidates must be interviewed, so the lower cost is guaranteed to be incurred for each candidate. However, hiring is not a guaranteed for each candidate. We add to this the assumption that the candidates are offered for interviews according to uniform distribution, i.e., they come in a statistically random order. And to simplify, we also assume that the quality of the candidates represents of a total order, i.e., all candidates yield to hard inequality (either > or <).
Essentially, the algorithm for computing this problem can be defined as such.
1. set current_assist to 1
2. for candid(2-N)
3. if candid > current_assist
4. do current_assist <-- candid
Essentially, all we really need to determine in the running time for the algorithm is how often we execute step 4. We know we incur the heavy cost of hiring at least once, as per step 1, and we have to go through interviewing each candidate as per steps 2 and 3. Trivially, the best case running time is $ O(nci\ +\ ch) $, since you only incur the cost of hiring once and no candidate inteviewed is superior to your first hiree. In the worst case, the candidates are interviewed in a perfect ascending order of quality, meaning you must execute the hiring operation a total of n times, yielding a running-time of $ O(nci\ +\ nch) $. Clearly, neither the best or worst case is representative of the expected running-time of the algorithm. What we are looking for is the average case, or the expected running-time when all inputs are considered.
Usually, average-case analysis is done through an enumeration of all possible inputs and a derivation of the expectation or variance of the inputs. For many algorithms this can be a cumbersome operation to perform. For the hiring problem, this actually isn't too difficult. Cormen, et al. do so in Introduction to Algorithms. In short, since the candidates are interviewed according to uniform distribution, the probability that any candidate i in the list of all n candidates is superior is 1/i. The expected running-time thus coincides with the harmonic series $ \sum_{i=1}^n\frac{1}{i} $, which converges to $ lg\ n\ +\ O(1) $. Thus the average case running-time of the algorithm is on the order of $ O(nci\ +\ ch\ lg\ n) $.
That would be an example of the obtaining the result via the probabilistic method. An alternative method for average case analysis is known as the incompressibility method. The latter takes advantage of a well acknowledged fact from the theory of Kolmogorov complexity: very few strings have short descriptions. In other words, the vast majority of members of any given set of objects will be Kolmogorov random. This gives us a way to determine the average-case running time by examining one input that is guaranteed to be representative of the ensemble of inputs. To demonstrate this, we take an arbitrary permutation p of the list of n candidates and fix a universal partial recursive function f to compute p given n. Given n inputs, there are a total of n! possible permutations. Assuming uniform distribution of inputs and thus equidistributed hirees in the hiring assistant algorithm, most strings would contain $ O(lg\ n) $ hirees. Assume the contrary, that most strings contain significantly more or fewer than $ O(lg\ n) $ hirees. Then there would be a short description of the majority of the inputs, being highly regular sequences. This however would contradict the well known fact that for every set A of cardinality n:
$ K_f(p\ |\ n)\ \ge\ lg\ n\ -\ c $
For all but about $ n(1\ -\ 2^{-c})\ +\ O(1) $ elements. Here, $ K_f(p\ |\ n) $ represents the Kolmogorov complexity of the permutation p given n (that is, the quantity n is known in advance to us and is encoded by the partial recursive function), and c is an arbitrary positive integer. A string x is said to be c-incompressible if $ K(x)\ \ge\ l(x)\ -\ c $, where $ l(x) $ is the length of the sequence. Since most strings are incompressible, the contradiction shows that most permutations p of n inputs can't have short descriptions. The average case running time is thus, as it was before, $ O(nci\ +\ ch\ lg\ n) $.
Further reading: I kind of handwaved on some of the stuff here, especially definitions of Kolmogorov complexity. For that, you should consult either Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications (the entirety of chapter 6 is on the incompressibility method) or Peter Gacs' lecture notes on Algorithmic Information Theory. Probabilistic analysis is covered by Cormen, et al. in chapter five of Introduction to Algorithms.
Tuesday, January 22, 2008
Average-Case Complexity
Posted by
TylerD
at
11:04 AM
Subscribe to:
Comment Feed (RSS)

|