跳转至

Chapter 5 Probabilistic Analysis and Randomized Algorithms

5.1 The hiring problem

You decide to use an employment agency to hire a new office assistant. You interview that person and then decide either to hire that person or not. You must pay the employment agency a small fee to interview an applicant. To actually hire an applicant is more costly, however, since you must fire your current office assistant and pay a substantial hiring fee to the employment agency. You are committed to having, at all times, the best possible person for the job. Therefore, you decide that, after interviewing each applicant, if that applicant is better qualified than the current office assistant, you will fire the current office assistant and hire the new applicant. You are willing to pay the resulting price of this strategy, but you wish to estimate what that price will be.

HIRE-ASSISTENT(n)
best = 0
for i = 1 to n
 interview candidate i
 if candidate i is better than condidate best
  best = i
  hire candidate i

Interviewing has a low cost, say \(c_i\), whereas hiring is expensive, costing \(c_h\). Letingn \(m\) be the number of people hired, the total cost associated with this algorithm is \(O(c_in + c_hm)\).

Worst-case analysis

In the worst case, we actually hire every candidate that we interview. Total cost of hiring is \(O(c_hn)\).

It is natural to ask what we expect to happen in a typical or average case.

Probabilistic analysis

Probabilistic analysis is the use of probability in analysis of problem.

average-case running time.

Assume that we can compare any two candidates and decide which one is better qualified; that is there is a total order on the candidates. Rank each candidate with a unique number from 1 through \(n\). This list of ranks is equally likely to be any one of the \(n!\) permutations appears with equal probability.

Randomized algorithms

随机化算法,打乱输入进行分析。

5.2 Indicator random variables

引入概率论的相关概念,指示随机变量(Indicator random variables)

Indicator random variables provide a convenient method for converting(转换) between probabilities and expectations(期望).

Given a sample space S and an event A.

indicator random variable I{A} associated with event A is defined as

\[ I\{A\}= \begin{cases} 1 \quad if A occurs,\\ 0 \quad if A does not occur. \end{cases} \]

indicator variable(指示变量) \(X_A = I\{A\}\)

So the expectations is

\[ E[X_A] = E[I\{A\}]\\ = 1 \cdot Pr\{A occurs\} + 0 \cdot Pr\{A does not occur\} \]

The expected value of an indicator random variable associated with an event A is equal to the probability that A occurs.

Lemma 5.1

Given a sample space S and an event in the sample space S, let \(X_A = I\{A\}.\) The \(E[X_A] = Pr\{A\}.\)

Proof By the definition of an indicator random variable from equation and the definition of expected value, we have

\[ E[X_A] = E[I\{A\}]\\ = 1 \cdot Pr\{A\} + 0 \cdot Pr\{\overline{A}\} =Pr\{A\} \]

where \(\overline{A}\) denotes \(S - A\), the complement of A.

The indicator random variables can simplify the calculation of the expected number of heads.

Analysis of the hiring problem using indicator random variables

To compute the expected number of times that we hire a new office assistant.

In order to use a probabilistic analysis.

  • Assume that the candidates arrive in a random order.
  • Let X be the random variable whose value equals the number of times we hire a new office assistant.

Apply the definition of expected value frim equation to obtain

\[ E[X] = \sum_{x = 1}^{n}xPr\{X = x\} \]

This calculation would be cumbersome. We shall instead use indicator random variables to greatly simplify the calculation.

Instead of computing the expected number by defining one variable associated with the number of times we hire a new office assistant, we define \(n\) variables ralated to whether or not each particular candidate is fired.

Then wo let \(X_i\) be the indicator random variable associated with the event in which the ith candidate is hired. Thus,

\[ X_i = I\{candidate i is hired\}\\ = \begin{cases} 1 \quad if candidate i is hired,\\ 0 \quad if candidate i is not hired, \end{cases} \]

and

\[ X = X_1 + X_2 + \cdots + X_n. \]

By Lemma 5.2, we have that

\[ E[X_i] = Pr\{candidate i is hired\}, \]

and we must therefore compute the probability that lines 5-6 of HIRE-ASSISTANT are executed.

Candidate i is hired, exactly when candidate i is better tham each of candidates 1 through i - 1. Because we have assumed that the candidates arrive in a random order, the first i candidates hace appeared in a random order. Any one of these first i candidates is equally likely to be the best-qualified so far.

Candidate i has a probability of 1/i of being better qualified than candidates 1 through i - 1 and thus a probability of 1/i of being hired.

By Lemma 5.1, we conclude that

\[ E[X_i] = 1/i. \]

Now we can compute the expected value:

\[ E[X] = E[\sum_{i = 1}^{n}X_i]\\ = \sum_{i = 1}^nE[X_i]\\ = \sum_{i = 1}^n 1/i\\ = lnn + Q(1) \]

The answer means even though we interview n people, we actully hire only approximately lnn of thme, on average.

Lemma 5.2

Assuming that the candidates are presented in a random order, algorithm HIRE-ASSISTANT has an average-case total hiring cost of \(O(c_hlnn)\).

Proof The bound follows immediately from our definition of the firing cost and quation, which shows that the expected number of hires is approximately lnn.

The average-case hiring caost is a significant improvement over the worst-case hiring cost of \(O(c_hn)\).

5.3 Randomized algorithms

We expect to hire a new office assistant approximately lnn times and make it be the case for any input, rather than for inputs drawn from a particular distribution.

\[ RANDOMIZED-HIRE-ASSISTANT(n) 1 \quad randomly permute the list of candidates\\ 2 \quad best = 0 // candidate 0 is a least-qualified dummy candidate\\ 3 \quad for i = 1 to n\\ 4 \quad \quad interview candidate i\\ 5 \quad \quad if candidate i is better than candidate best\\ 6 \quad \quad \quad best = i\\ 7 \quad \quad \quad hire candidate i \]

With this simple change, we hace created a randomized algorightm whose performance matches that obtained by assuming that the candidates were presented in a random order.