ith Order Statistic

The ith order statistic of a set of n elements is the ith smallest element.
The minimum of a set of elements is the first order statistic, while the maximum is the nth order statistic.
The ith order statistic is used, for instance, to create filters in image processing.
The median is the item at the halfway point of the set (when the set is sorted).
If n is odd, the median is the item at position (n+1)/2.
If n is even, there are two medians, at positions n/2 and n/2 + 1; we take the one at position ⌊(n+1)/2⌋ = n/2, also called the lower median.
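The definitions above can be illustrated with a minimal Python sketch that finds an order statistic by sorting a copy of the input. The function names `order_statistic` and `lower_median` are illustrative choices, not part of the course material, and positions are 1-based to match the slides.

```python
def order_statistic(values, i):
    """Return the ith smallest element (1-based) by sorting a copy."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


def lower_median(values):
    """Return the lower median: the element at position floor((n + 1) / 2)."""
    n = len(values)
    return order_statistic(values, (n + 1) // 2)


data = [7, 1, 9, 4, 3, 8]
print(order_statistic(data, 1))          # minimum (1st order statistic): 1
print(order_statistic(data, len(data)))  # maximum (nth order statistic): 9
print(lower_median(data))                # lower median of six elements: 4
```

Sorting costs O(n log n), so this sketch only fixes the definitions; it is not the fastest way to select.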

MSc Computer Science ICS 801 Design and Analysis of Algorithms

We need to be able to select the ith order statistic from a set of n numbers. The problem is stated formally as follows:

Input: A set A of n numbers and an integer i, with 1 ≤ i ≤ n.
Output: The ith smallest value in A.

The problem can be solved in O(n log n) time if we sort the numbers.
Practical algorithms exist for solving the problem in O(n) time.

1st and nth Order Statistics

The 1st and nth order statistics can be found using n - 1 comparisons with the following algorithm and its variants:

minimum(A)
    min = A[1]
    for i = 2 to n
        if min > A[i]
            min = A[i]
    return min

Exercise:
1. Analyze the algorithm to get its running time.
2. Modify the algorithm to get the 2nd order statistic.
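As a rough illustration, the minimum(A) procedure can be sketched in Python with a counter that records how many element comparisons are made. The counter and the tuple return value are additions for illustration only; the list is 0-based, unlike the 1-based pseudocode.

```python
def minimum(a):
    """Return (smallest element, number of element comparisons used)."""
    if not a:
        raise ValueError("a must be non-empty")
    smallest = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1      # one comparison per remaining element
        if smallest > x:
            smallest = x
    return smallest, comparisons


print(minimum([7, 1, 9, 4, 3, 8]))  # (1, 5): six elements, n - 1 = 5 comparisons
```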

Selection in Expected Linear Time

There is a divide-and-conquer algorithm for the selection problem, RANDOMIZED-SELECT.
It partitions the array of numbers in the same way as quicksort, but recurses into only one side of the partition.
Whereas quicksort has an expected running time of Θ(n log n), the expected running time of RANDOMIZED-SELECT is Θ(n).
It uses RANDOMIZED-PARTITION to increase the likelihood of the items being partitioned into two sets of roughly equal size.
The algorithm is given below.

RANDOMIZED-SELECT(A, p, r, i)
    if p = r
        return A[p]
    q ← RANDOMIZED-PARTITION(A, p, r)
    k ← q - p + 1
    if i = k    // the pivot value is the answer
        return A[q]
    elseif i < k
        return RANDOMIZED-SELECT(A, p, q - 1, i)
    else return RANDOMIZED-SELECT(A, q + 1, r, i - k)
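A minimal Python sketch of RANDOMIZED-SELECT, together with the two partitioning routines it relies on, might look as follows. It assumes 0-based array indices for p and r while keeping i 1-based as in the pseudocode, and it reorders the list in place, so the usage line passes a copy.

```python
import random


def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r):
    """Swap a randomly chosen element into position r, then partition."""
    i = random.randint(p, r)  # inclusive on both ends
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_select(a, p, r, i):
    """Return the ith smallest element (1-based i) of a[p..r]."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1             # rank of the pivot within a[p..r]
    if i == k:                # the pivot value is the answer
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)


data = [7, 1, 9, 4, 3, 8]
print(randomized_select(list(data), 0, len(data) - 1, 3))  # 3rd smallest: 4
```

Only one side of each partition is searched further, which is what separates selection from a full quicksort.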

RANDOMIZED-PARTITION(A, p, r)
    i ← RANDOM(p, r)
    exchange A[r] ↔ A[i]
    return PARTITION(A, p, r)

PARTITION(A, p, r)
    x ← A[r]
    i ← p - 1
    for j ← p to r - 1
        if A[j] ≤ x
            i ← i + 1
            exchange A[i] ↔ A[j]
    exchange A[i + 1] ↔ A[r]
    return i + 1

If we assume that the random choice of pivot always splits A into two partitions of equal size, then the following summation describes the running time of the algorithm. Note that we sum the time used to partition the array at each step, and that each step works on half as many elements as the one before:

$$n + \frac{n}{2} + \frac{n}{4} + \dots + 1 = n\left(1 + \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{n}\right) = n \sum_{i=0}^{\log n} \left(\frac{1}{2}\right)^i$$

Using the following geometric series,

$$\sum_{i=0}^{\infty} x^i = \frac{1}{1 - x}, \quad \text{if } |x| < 1$$

we then have

$$n \sum_{i=0}^{\log n} \left(\frac{1}{2}\right)^i < n\left(\frac{1}{1 - 0.5}\right) = 2n = \Theta(n)$$

The sum is also at least n, since its first term is n, so the bound is tight.
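The bound can be checked numerically with a short Python sketch. It assumes n is a power of two, so that log n is an integer, and the helper name `partition_cost` is chosen here for illustration only.

```python
import math


def partition_cost(n):
    """Sum n * (1/2)**i for i = 0 .. log2(n), i.e. n + n/2 + n/4 + ... + 1."""
    levels = int(math.log2(n))
    return sum(n * 0.5 ** i for i in range(levels + 1))


for n in (8, 1024):
    # The total stays just below 2n, matching the geometric-series bound.
    print(n, partition_cost(n), 2 * n)
```

For n = 8 the sum is 8 + 4 + 2 + 1 = 15, one short of 2n = 16.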

