Lecture · L04

Sorting Algorithms


The Problem We Are Solving

You have a sequence of $n$ numbers. You want to rearrange them so they are in non-decreasing order.

More precisely: given $[a_1, a_2, \ldots, a_n]$, find a permutation $p$ of the indices $\{1, \ldots, n\}$ such that $a_{p(1)} \leq a_{p(2)} \leq \cdots \leq a_{p(n)}$.

A permutation is just a reordering. For $[7, 32, 88, 21, 92, -4]$ the output is $[-4, 7, 21, 32, 88, 92]$.

Each element has a key (what you sort by) and possibly a value (data attached to it). You sort by key, you carry the value along for the ride.


Two Properties That Classify Every Sorting Algorithm

In-Place

The algorithm sorts the array using only the original array itself, with at most a constant amount of extra memory (a few variables, not another array of size $n$). No auxiliary array proportional to $n$ is needed.

Why does this matter? Memory. If your array is 4GB, you can't afford a second 4GB copy just to sort it.

Stable

If two elements have equal keys, they appear in the output in the same relative order they had in the input.

Why does this matter? Consider sorting a list of students first by grade, then by name. If the name sort is stable, students with the same name will remain in grade order from the first sort. If it is not stable, the grade ordering is destroyed.

Making any algorithm stable: replace each key $k$ with the pair $(k, i)$ where $i$ is the original index. Compare pairs lexicographically: first by $k$, then by $i$. This breaks all ties by original position, guaranteeing stability. Cost: slightly larger keys, same asymptotic complexity.
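The decorate-with-index trick can be sketched in Python (`stable_order` is an illustrative name, not a library function). It returns the sorting permutation $p$ from the problem statement, with ties broken by original position:

```python
def stable_order(keys):
    """Permutation p that sorts `keys`, breaking ties by original index.
    Comparing the pairs (keys[i], i) lexicographically forces any
    correct sort to behave stably."""
    return sorted(range(len(keys)), key=lambda i: (keys[i], i))

# the two 7s keep their input order: index 0 before index 2
print(stable_order([7, 3, 7, 1]))  # → [3, 1, 0, 2]
```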


The Array Data Structure

An array is a fixed-size block of memory. Element $A[i]$ lives at a fixed memory address computable from the base address and $i$, so every access is $O(1)$ regardless of $i$.

Arrays cannot grow. To resize, you allocate a new bigger array and copy everything over — $O(n)$ work. This is fine if you do it rarely.

Notation: $A[i \ldots j]$ is the subarray $A[i], A[i+1], \ldots, A[j]$.


Part I — Incremental Algorithms

The idea behind incremental sorting is simple: maintain a sorted prefix and extend it one element at a time.

Formally, at step $k$ you assume $A[1 \ldots k]$ is sorted and you extend the sorting to $A[1 \ldots k+1]$.

Both algorithms below are in-place.


SelectionSort

The Idea

At step $i$, the first $i-1$ positions already contain their final values (the $i-1$ smallest elements in sorted order). Find the minimum of the remaining unsorted portion $A[i \ldots n]$ and swap it into position $i$.

Pseudocode

SelectionSort(A[1..n]):
    for i = 1 to n-1:
        m = MinIndex(A, i)      // find index of min in A[i..n]
        Swap(A, i, m)

MinIndex(A[1..n], i):
    for j = i+1 to n:
        if A[j] < A[i]:
            i = j
    return i

Swap(A[1..n], i, j):
    tmp = A[i]
    A[i] = A[j]
    A[j] = tmp
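A direct Python transcription of the pseudocode, shifted to 0-based indexing (a sketch, not a library routine):

```python
def selection_sort(a):
    """In-place SelectionSort (0-indexed version of the pseudocode)."""
    n = len(a)
    for i in range(n - 1):
        m = i                        # MinIndex: find index of min in a[i..n-1]
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]      # Swap
    return a

print(selection_sort([7, 2, 5, 7, 4, 1]))  # → [1, 2, 4, 5, 7, 7]
```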

Traced Example

Array: $[7, 2, 5, 7, 4, 1]$

  • Step 1: min of $[7,2,5,7,4,1]$ is $1$ at index 6. Swap positions 1 and 6 → $[1, 2, 5, 7, 4, 7]$
  • Step 2: min of $[2,5,7,4,7]$ is $2$ at index 2. Swap positions 2 and 2 → $[1, 2, 5, 7, 4, 7]$
  • Step 3: min of $[5,7,4,7]$ is $4$ at index 5. Swap positions 3 and 5 → $[1, 2, 4, 7, 5, 7]$
  • Step 4: min of $[7,5,7]$ is $5$ at index 5. Swap positions 4 and 5 → $[1, 2, 4, 5, 7, 7]$
  • Step 5: min of $[7,7]$ is $7$ at index 5. Swap positions 5 and 5 → done.

Final: $[1, 2, 4, 5, 7, 7]$

Why It Is Not Stable

The swap in step 1 of the example moved the $7$ from position 1 all the way to position 6, jumping over the other $7$ at position 4. Two elements with equal keys have swapped their relative order. A stable variant would have to insert the minimum by shifting the whole prefix instead of doing a single swap, at the cost of many more element moves — the standard swap-based SelectionSort sacrifices stability by design.

Time Complexity

At step $i$, MinIndex scans from position $i+1$ to $n$: that is $n - i$ comparisons. The swap is $O(1)$.

Crucially, both loops always run to completion regardless of the input. The array could already be sorted; it doesn't matter. The inner loop always scans the full remaining portion.

Total comparisons:

$$\sum_{i=1}^{n-1}(n - i) = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = \Theta(n^2)$$

Best case, average case, worst case: all $\Theta(n^2)$.

The input order is completely irrelevant to the running time.


InsertionSort

The Idea

At step $i$, assume $A[1 \ldots i-1]$ is sorted. Take element $A[i]$ and walk it leftward, swapping it with its left neighbor as long as it is smaller. It bubbles into its correct position within the sorted prefix.

This is exactly how most people sort a hand of playing cards: you pick up one card at a time and slide it left until it fits.

Pseudocode

InsertionSort(A[1..n]):
    for i = 2 to n:
        j = i
        while j > 1 and A[j] < A[j-1]:
            Swap(A, j, j-1)
            j = j - 1
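The same algorithm in Python, 0-indexed (a sketch):

```python
def insertion_sort(a):
    """In-place InsertionSort (0-indexed version of the pseudocode)."""
    for i in range(1, len(a)):
        j = i
        # strict <: equal keys are never swapped past each other (stable)
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
    return a

print(insertion_sort([7, 2, 5, 7, 4, 1]))  # → [1, 2, 4, 5, 7, 7]
```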

Traced Example

Array: $[7, 2, 5, 7, 4, 1]$

  • $i=2$: $A[2]=2 < A[1]=7$. Swap. $[2, 7, 5, 7, 4, 1]$. $j=1$, stop.
  • $i=3$: $A[3]=5 < A[2]=7$. Swap. $[2, 5, 7, 7, 4, 1]$. $5 > 2$, stop.
  • $i=4$: $A[4]=7 \geq A[3]=7$. Condition is strict ($<$), so no swap. Stop.
  • $i=5$: $A[5]=4 < A[4]=7$. Swap. Then $4 < 7$ again. Swap. Then $4 < 5$. Swap. Then $4 > 2$, stop. $[2, 4, 5, 7, 7, 1]$
  • $i=6$: $1$ bubbles all the way left. $[1, 2, 4, 5, 7, 7]$

Why It Is Stable

The while loop condition uses strict inequality: A[j] < A[j-1]. When two elements are equal, the condition is false and the loop stops. Equal elements are never swapped past each other. Their original relative order is preserved.

This is not accidental — it is an intentional design choice. If you changed < to <=, InsertionSort would become unstable.

Time Complexity — Worst Case

The worst case is when the array is sorted in reverse order (largest to smallest). Then at step $i$, the element $A[i]$ is smaller than all of $A[1 \ldots i-1]$, so the while loop runs $i-1$ times (it travels all the way to position 1).

$$\sum_{i=2}^{n}(i-1) = 1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2} = \Theta(n^2)$$

Time Complexity — Best Case

The best case is when the array is already sorted. At step $i$, $A[i] \geq A[i-1]$, so the while loop condition is false immediately and the loop body never executes.

$$\sum_{i=2}^{n} 1 = n - 1 = \Theta(n)$$

This is linear. SelectionSort never achieves this — it always does $\Theta(n^2)$ work.

Time Complexity — Average Case

We assume that at step $i$, the while loop runs $(i-1)/2$ times on average (the element $A[i]$ lands somewhere in the middle of $A[1 \ldots i-1]$).

$$\sum_{i=2}^{n}\frac{i-1}{2} = \frac{1}{2}\sum_{i=1}^{n-1}i = \frac{1}{2} \cdot \frac{n(n-1)}{2} = \Theta(n^2)$$

The constant factor is halved compared to the worst case, but the growth rate is still $\Theta(n^2)$.

Nearly Sorted Arrays — A Key Insight

The best case $\Theta(n)$ generalizes beyond perfectly sorted arrays.

Suppose the array is sorted except for $k$ misplaced elements. The outer for loop always runs $n-1$ times, contributing $\Theta(n)$. The while loop only does significant work on the $k$ misplaced elements. Each one travels at most $n$ positions leftward.

Total cost: $\Theta(n) + O(nk) = O(nk)$.

When $k = O(1)$ (constant number of misplaced elements), this is $\Theta(n)$ — linear.

This is important in practice. Databases, log files, and sensor streams often produce nearly-sorted data. InsertionSort handles these efficiently while MergeSort or QuickSort would still pay $\Theta(n \log n)$.
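The $O(nk)$ bound can be checked empirically. In this sketch (`insertion_sort_swaps` is a hypothetical helper that uses swap count as the cost measure), an array of 1000 elements with $k = 2$ misplaced entries needs on the order of $n$ swaps, not $n^2/2 = 500{,}000$:

```python
def insertion_sort_swaps(a):
    """InsertionSort that also counts the swaps it performs."""
    swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
            swaps += 1
    return swaps

nearly = list(range(1000))
nearly[10], nearly[990] = nearly[990], nearly[10]   # k = 2 misplaced elements
print(insertion_sort_swaps(nearly))                 # on the order of 2n, not n^2/2
```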


Part II — Divide and Conquer

A general algorithmic strategy:

  1. Divide: split the problem into smaller subproblems.
  2. Conquer: solve each subproblem recursively (base case: trivially small).
  3. Combine: merge the subproblem solutions into a solution for the whole.

The two algorithms below both split the array in two and recurse, but they differ in where the work happens:

  • MergeSort: trivial divide, hard combine (merge).
  • QuickSort: hard divide (partition), trivial combine (nothing to do).

MergeSort

The Algorithm

  1. Divide: split $A[p \ldots r]$ at the midpoint $q = \lfloor(p+r)/2\rfloor$ into $A[p \ldots q]$ and $A[q+1 \ldots r]$.
  2. Conquer: recursively sort both halves.
  3. Combine: merge the two sorted halves into a single sorted array.

Base case: a subarray of length 1 is already sorted (nothing to do).

MergeSort(A[1..n], p, r):
    if p < r:
        q = floor((p + r) / 2)
        MergeSort(A, p, q)
        MergeSort(A, q+1, r)
        Merge(A, p, q, r)

First call: MergeSort(A, 1, n).

The Merge Procedure

This is the heart of the algorithm. You have two sorted subarrays $A[p \ldots q]$ and $A[q+1 \ldots r]$ and you need to produce a single sorted subarray in $A[p \ldots r]$.

The approach: use two pointers, one for each half. At each step, compare the two pointed-to elements and take the smaller one into a temporary array $B$. When one half is exhausted, dump the remainder of the other half.

Merge(A[1..n], p, q, r):
    B = new array of size r - p + 1
    i = p          // pointer into left half A[p..q]
    j = q + 1      // pointer into right half A[q+1..r]
    k = 1          // pointer into B

    while i <= q and j <= r:
        if A[i] <= A[j]:
            B[k] = A[i];  i++
        else:
            B[k] = A[j];  j++
        k++

    // drain whichever half still has elements
    while i <= q:
        B[k] = A[i];  k++;  i++
    while j <= r:
        B[k] = A[j];  k++;  j++

    // copy B back into A[p..r]
    for k = 1 to r - p + 1:
        A[p + k - 1] = B[k]

Why Merge Is $\Theta(r - p + 1)$

Each iteration of each while loop increments exactly one of $i$, $j$, or $k$. The pointers $i$ and $j$ together move from $p$ and $q+1$ all the way to $q$ and $r$ respectively — exactly $r - p + 1$ total increments across all three while loops. The final copy loop runs $r - p + 1$ times. Total: $\Theta(r - p + 1)$, linear in the size of the subarray.

Why MergeSort Is Stable

The merge condition is A[i] <= A[j] (with $\leq$, not $<$). When two elements are equal, the one from the left half is taken first. Since the left half corresponds to earlier positions in the original array, relative order is preserved.

MergeSort Is Not In-Place

The merge step allocates a temporary array $B$ of size $r - p + 1$. At the top level, this is $\Theta(n)$ extra memory. It is possible to merge in-place, but the resulting algorithm is significantly more complex and slower in practice.
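The two procedures translate to Python as follows (0-indexed sketch; the `<=` in the merge is what preserves stability):

```python
def merge_sort(a, p=0, r=None):
    """Sort a[p..r] (inclusive) by recursive halving."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)
    return a

def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] into sorted a[p..r]."""
    b = []
    i, j = p, q + 1
    while i <= q and j <= r:
        if a[i] <= a[j]:             # <= : left half wins ties (stable)
            b.append(a[i]); i += 1
        else:
            b.append(a[j]); j += 1
    b.extend(a[i:q + 1])             # drain whichever half remains
    b.extend(a[j:r + 1])
    a[p:r + 1] = b                   # copy back into the original array

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # → [1, 2, 2, 3, 4, 5, 6, 7]
```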

Time Complexity

The recurrence:

$$T(n) = \begin{cases} \Theta(1) & n \leq 1 \\ 2T(n/2) + \Theta(n) & n > 1 \end{cases}$$

Two recursive calls each on half the array, plus $\Theta(n)$ for the merge.

Solving with the Master Theorem: this is the form $T(n) = aT(n/b) + f(n)$ with $a = 2$, $b = 2$, $f(n) = \Theta(n)$. We compare $f(n)$ with $n^{\log_b a} = n^{\log_2 2} = n^1 = n$. Since $f(n) = \Theta(n) = \Theta(n^{\log_b a})$, we are in case 2 of the Master Theorem: $T(n) = \Theta(n \log n)$.

Intuition without the Master Theorem: at each level of recursion, the total work for all merges at that level is $\Theta(n)$ (every element participates in exactly one merge per level). There are $\Theta(\log n)$ levels (the recursion halves each time). Total: $\Theta(n \log n)$.

Best case = average case = worst case = $\Theta(n \log n)$. The initial order of the array is completely irrelevant.


QuickSort

The Algorithm

  1. Divide: choose a pivot element. Partition the array so that everything $\leq$ pivot goes left and everything $>$ pivot goes right. The pivot lands in its final sorted position.
  2. Conquer: recursively sort the left and right parts.
  3. Combine: nothing. Once both recursive calls return, the array is sorted.

QuickSort(A[1..n], p, r):
    if p < r:
        q = Partition(A, p, r)
        QuickSort(A, p, q - 1)
        QuickSort(A, q + 1, r)

First call: QuickSort(A, 1, n).

The Partition Procedure

Partition(A[1..n], p, r) -> int:
    x = A[r]       // pivot = last element
    i = p - 1
    for j = p to r - 1:
        if A[j] <= x:
            Swap(A, i + 1, j)
            i = i + 1
    Swap(A, i + 1, r)   // place pivot in its final position
    return i + 1

What this maintains: at all times, the subarray is divided into three regions:

  • $A[p \ldots i]$: elements $\leq x$ (the pivot value)
  • $A[i+1 \ldots j-1]$: elements $> x$
  • $A[j \ldots r-1]$: not yet examined
  • $A[r]$: the pivot, waiting to be placed

As $j$ scans from $p$ to $r-1$: if $A[j] \leq x$, swap it into the $\leq x$ region (swap $A[j]$ with $A[i+1]$) and advance $i$. If $A[j] > x$, do nothing — the $> x$ region expands naturally.

After the loop, swap $A[i+1]$ with $A[r]$ to place the pivot. Now everything to the left of position $i+1$ is $\leq x$ and everything to the right is $> x$, and $A[i+1] = x$ is in its final position.

Traced Partition Example

Array: $[7, 2, 5, 9, 1, 4]$, pivot $= A[6] = 4$, $i = 0$

  • $j=1$: $A[1]=7 > 4$, nothing.
  • $j=2$: $A[2]=2 \leq 4$. Swap $A[1]$ and $A[2]$. $i=1$. Array: $[2,7,5,9,1,4]$
  • $j=3$: $A[3]=5 > 4$, nothing.
  • $j=4$: $A[4]=9 > 4$, nothing.
  • $j=5$: $A[5]=1 \leq 4$. Swap $A[2]$ and $A[5]$. $i=2$. Array: $[2,1,5,9,7,4]$
  • Loop ends. Swap $A[3]$ and $A[6]$. Array: $[2,1,4,9,7,5]$
  • Return $q = 3$.

Verify: $A[1 \ldots 2] = [2,1]$ are all $\leq 4$. $A[3] = 4$ is the pivot in its final position. $A[4 \ldots 6] = [9,7,5]$ are all $> 4$. ✓

Partition always costs $\Theta(r - p + 1)$: it scans every element exactly once. It is in-place (only swaps) but not stable (swaps disrupt the relative order of equal elements).
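In Python, 0-indexed and mirroring the pseudocode (a sketch with the last element as pivot):

```python
def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]                           # pivot = last element
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]    # grow the <= x region
    a[i + 1], a[r] = a[r], a[i + 1]    # place the pivot
    return i + 1

def quicksort(a, p=0, r=None):
    """In-place QuickSort of a[p..r] (inclusive)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a

print(quicksort([7, 2, 5, 9, 1, 4]))  # → [1, 2, 4, 5, 7, 9]
```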

Worst-Case Analysis

The worst case for QuickSort is when every partition splits the array as unequally as possible: one subarray has size 0 and the other has size $n-1$.

When does this happen? When the pivot is always the smallest or largest element in the subarray. For example, if the array is already sorted (or reverse-sorted) and you always pick the last element as pivot, every partition is maximally unbalanced.

Recurrence:

$$T(n) = T(n-1) + T(0) + \Theta(n) = T(n-1) + \Theta(n)$$

($T(0) = O(1)$, the empty subarray costs nothing.)

Unrolling the recurrence:

$$T(n) = T(n-1) + n \approx T(n-2) + (n-1) + n \approx \cdots \approx \sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \Theta(n^2)$$

Worst-case QuickSort is as slow as SelectionSort. The recursion tree is a completely unbalanced chain of depth $n$.

Best-Case Analysis

The best case is when every partition splits the array exactly in half: two subarrays of size $n/2$.

Recurrence:

$$T(n) = 2T(n/2) + \Theta(n)$$

This is identical to MergeSort's recurrence. By the Master Theorem: $T(n) = \Theta(n \log n)$.

The recursion tree is perfectly balanced, with $\log n$ levels and $\Theta(n)$ work per level.

Average-Case Analysis

In the general case, the partition lands at some position $i$ (0-indexed), creating subarrays of size $i$ and $n - i - 1$:

$$T(n) = T(i) + T(n - i - 1) + \Theta(n)$$

The value of $i$ changes at every recursive call — it depends on the pivot value relative to the subarray. Under the assumption that all $n$ possible split positions are equally likely (each with probability $1/n$), the average cost is:

$$T(n) = \frac{1}{n}\sum_{i=0}^{n-1}\bigl[T(i) + T(n-i-1)\bigr] + n$$

Since $\sum_{i=0}^{n-1} T(n-i-1) = \sum_{i=0}^{n-1} T(i)$ (the same terms, just listed in reverse order):

$$T(n) = \frac{2}{n}\sum_{i=0}^{n-1} T(i) + n$$

Solving the Average-Case Recurrence

Multiply both sides by $n$:

$$n \cdot T(n) = 2\sum_{i=0}^{n-1} T(i) + n^2 \tag{1}$$

Replace $n$ with $n-1$:

$$(n-1) \cdot T(n-1) = 2\sum_{i=0}^{n-2} T(i) + (n-1)^2 \tag{2}$$

Subtract (2) from (1). The sums almost cancel — only $T(n-1)$ remains from the left sum:

$$n \cdot T(n) - (n-1) \cdot T(n-1) = 2T(n-1) + 2n - 1$$

$$n \cdot T(n) = (n+1) \cdot T(n-1) + 2n - 1$$

Divide both sides by $n(n+1)$:

$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2}{n+1} - \frac{1}{n(n+1)} \leq \frac{T(n-1)}{n} + \frac{2}{n+1}$$

Now define $S(n) = T(n)/(n+1)$. The inequality becomes:

$$S(n) \leq S(n-1) + \frac{2}{n+1}$$

Apply this telescopically $n-1$ times, down to $S(1) = T(1)/2 = 1/2$:

$$S(n) \leq S(1) + \sum_{k=3}^{n+1}\frac{2}{k} = \frac{1}{2} + 2\sum_{k=3}^{n+1}\frac{1}{k}$$

The harmonic sum $\sum_{k=1}^{m} 1/k = H_m \approx \ln m$. So:

$$S(n) \leq \frac{1}{2} + 2\ln n$$

Multiplying back:

$$T(n) = (n+1) \cdot S(n) \leq (n+1)\left(\frac{1}{2} + 2\ln n\right) = O(n \log n)$$

The average case is $O(n \log n)$.

The constant hidden in the $O$ is about $2\ln n \approx 1.386 \log_2 n$, roughly $1.39$ times the constant for MergeSort. In practice, QuickSort's cache-friendliness and low overhead make it faster than MergeSort despite the slightly larger constant.

The Pivot Problem and Randomization

The average-case analysis assumes all $n$ split positions are equally likely. With the deterministic pivot (always A[r]), this assumption is about the input. Nearly-sorted arrays violate it badly — they trigger the worst case.

Fix: choose the pivot uniformly at random from $A[p \ldots r]$.

RandomizedPartition(A[1..n], p, r) -> int:
    i = Random(p, r)    // uniform random index in [p, r]
    Swap(A, i, r)
    return Partition(A, p, r)

With a random pivot, the probability of a particular split is $1/n$ regardless of the input order. The "equally likely" assumption is now a property of the algorithm, not a hope about the input. The expected running time is $O(n \log n)$ for every possible input.
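A self-contained Python sketch of the randomized variant, with the pivot index drawn via the standard library's `random.randint` (inclusive on both ends) and the usual partition inlined:

```python
import random

def randomized_quicksort(a, p=0, r=None):
    """Expected O(n log n) on every input: the pivot index is uniform in [p, r]."""
    if r is None:
        r = len(a) - 1
    if p < r:
        k = random.randint(p, r)       # uniform random pivot index in [p, r]
        a[k], a[r] = a[r], a[k]        # move pivot to the end, then partition as usual
        x, i = a[r], p - 1
        for j in range(p, r):
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        q = i + 1
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)
    return a

# a reverse-sorted input: the worst case for the deterministic pivot
print(randomized_quicksort(list(range(1000, 0, -1))) == list(range(1, 1001)))  # → True
```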


Sorting in Python: TimSort and PowerSort

Python's sorted() and list.sort() use PowerSort (since Python 3.11), a refined version of TimSort (Python 2.3 to 3.10).

TimSort is a hybrid: it exploits structure already present in the data.

The algorithm identifies runs — maximal already-sorted (or reverse-sorted, which it flips) contiguous subarrays. Short runs are extended using InsertionSort up to a minimum size (typically 32–64 elements). Then runs are merged using a MergeSort-style merge procedure. Runs of similar size are merged preferentially to maintain balance.

Why this is clever: real-world data often has natural order already embedded in it. TimSort finds and exploits this structure instead of ignoring it.

Complexity: $O(n)$ best case (already sorted), $O(n \log n)$ worst and average case. Stable. Not in-place.

PowerSort improves the run-merging strategy using a near-optimal merge tree, reducing the number of comparisons in practice.
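Because Python's built-in sort is stable, the two-pass idiom from the stability discussion works directly: sort by the secondary key first, then by the primary key. A small demo with made-up records:

```python
records = [("carol", 85), ("alice", 92), ("bob", 85), ("dave", 92)]

by_name = sorted(records)                            # pass 1: by name
by_grade = sorted(by_name, key=lambda r: r[1])       # pass 2: by grade (stable)

# within each grade, names stay alphabetical thanks to stability
print(by_grade)  # → [('bob', 85), ('carol', 85), ('alice', 92), ('dave', 92)]
```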


Summary Table — Comparison-Based Algorithms

| Algorithm | Best | Average | Worst | In-Place | Stable |
|---|---|---|---|---|---|
| SelectionSort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | No |
| InsertionSort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | Yes |
| MergeSort | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $\Theta(n \log n)$ | No | Yes |
| QuickSort | $\Theta(n \log n)$ | $O(n \log n)$ | $\Theta(n^2)$ | Yes | No |
| TimSort | $O(n)$ | $O(n \log n)$ | $O(n \log n)$ | No | Yes |

No single algorithm dominates on all axes. This motivates the next question.


Part III — The Lower Bound

Can We Do Better Than $O(n \log n)$?

Can any comparison-based algorithm sort faster than $O(n \log n)$ in the worst case?

No. This is a theorem, not an empirical observation.

Decision Trees

Any comparison sort can be modeled as a decision tree. Each internal node represents a comparison $A[i]$ vs $A[j]$. The left branch is taken if $A[i] \leq A[j]$, the right branch if $A[i] > A[j]$. Each leaf represents a specific output permutation.

A run of the algorithm on a given input traces a root-to-leaf path. The number of comparisons for that input equals the depth of the leaf. The worst-case number of comparisons equals the height of the tree.

Every comparison sort has its own decision tree, but all correct sorts share a key property: the tree must have at least $n!$ leaves, one for each possible permutation of $n$ elements (any permutation could be the correct answer for some input, so the algorithm must be able to output all of them).

Key Lemma: Height of a Binary Tree

Theorem. A binary tree with $k$ leaves where every internal node has exactly two children has height at least $\log_2 k$.

Proof by induction on $k$.

Base case ($k = 1$): the tree is a single node (the leaf). Height $= 0 \geq \log_2 1 = 0$. ✓

Inductive step: assume the result holds for all trees with fewer than $k$ leaves. Let $T_k$ be a tree with $k$ leaves. It has a root with two subtrees $T_{k_1}$ (left) and $T_{k_2}$ (right), where $k_1 + k_2 = k$ and without loss of generality $k_1 \geq k_2 \geq 1$.

By the inductive hypothesis (since $k_1, k_2 < k$):

$$h(T_k) = 1 + \max(h(T_{k_1}), h(T_{k_2}))$$

$$\geq 1 + h(T_{k_1}) \qquad\text{(take the larger subtree)}$$

$$\geq 1 + \log_2 k_1 \qquad\text{(inductive hypothesis)}$$

$$= \log_2 2 + \log_2 k_1 = \log_2 2k_1$$

$$\geq \log_2(k_1 + k_2) = \log_2 k \qquad\text{(since } k_1 \geq k_2 \text{ implies } 2k_1 \geq k_1 + k_2\text{)}$$

So $h(T_k) \geq \log_2 k$. ✓

The $\Omega(n \log n)$ Lower Bound

Theorem. Any comparison sort requires $\Omega(n \log n)$ comparisons in the worst case.

Proof. A correct comparison sort's decision tree must have at least $n!$ leaves (it must be able to output every permutation). By the lemma above, its height is at least $\log_2(n!)$.

By Stirling's approximation, $\log_2(n!) = \Omega(n \log n)$. (More precisely: $n! \geq (n/e)^n$, so $\log_2(n!) \geq n\log_2(n/e) = n\log_2 n - n\log_2 e = \Omega(n \log n)$.)

The worst-case number of comparisons $\geq$ height of decision tree $\geq \log_2(n!) = \Omega(n \log n)$. ✓

Consequence: MergeSort is asymptotically optimal among comparison sorts. Its worst-case cost $\Theta(n \log n)$ matches the lower bound exactly.

Important caveat: this lower bound applies only to algorithms that determine order exclusively through pairwise comparisons. If you know something extra about the keys — like they are integers in a bounded range — you can exploit that structure and break the barrier.


Part IV — Non-Comparison Sorting

The Key Insight

The $\Omega(n \log n)$ bound comes from modeling sorts as decision trees, where each node is a comparison. If instead your algorithm uses arithmetic on the keys (like treating a key as an array index), it is not a comparison sort and the lower bound does not apply.

The tradeoff: you need assumptions about the structure of the keys.


CountingSort

Assumption

All keys are integers in the range $[a, b]$.

The Idea

Instead of comparing elements, count how many times each possible key value appears. Then reconstruct the sorted array directly from the counts.

Simple Version (Integers Only)

CountingSort(A[1..n]):
    a = min(A),  b = max(A),  k = b - a + 1
    B = new array of size k, initialized to 0

    // Count occurrences
    for i = 1 to n:
        B[A[i] - a + 1]  +=  1

    // Reconstruct A from counts
    j = 1
    for i = 1 to k:
        while B[i] > 0:
            A[j] = i + a - 1
            B[i] -= 1
            j += 1

The index shift $A[i] - a + 1$ maps the range $[a, b]$ to $[1, k]$, so it works even if $a$ is negative.
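A 0-indexed Python sketch of the simple version, including the shift that handles negative minimums:

```python
def counting_sort_ints(a):
    """Rebuild `a` in sorted order from value counts; Theta(n + k) time,
    where k = max(a) - min(a) + 1."""
    if not a:
        return a
    lo = min(a)
    counts = [0] * (max(a) - lo + 1)
    for v in a:
        counts[v - lo] += 1          # count occurrences, shifted by lo
    j = 0
    for offset, c in enumerate(counts):
        for _ in range(c):           # write each value back `c` times
            a[j] = lo + offset
            j += 1
    return a

print(counting_sort_ints([7, 2, -1, 7, 4, 1]))  # → [-1, 1, 2, 4, 7, 7]
```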

This version is not stable and destroys associated data — it just reconstructs values, not the original elements.

Stable Version (With Associated Data)

To sort elements that carry data alongside their key, we need to know where each element goes, not just how many there are. We use prefix sums.

After counting, transform $B$ so that $B[i]$ holds the number of elements with key $\leq i + a - 1$. This tells us the last position where an element with key $i + a - 1$ should land in the output.

To preserve stability, iterate $A$ from right to left during placement. This ensures that among elements with the same key, the one appearing later in $A$ (higher index) is placed later in $C$ (higher index) — preserving relative order.

CountingSort(A[1..n]):
    a = minkey(A),  b = maxkey(A),  k = b - a + 1
    B = new array of size k, initialized to 0
    C = new array of size n

    // Count
    for i = 1 to n:
        B[A[i].key - a + 1] += 1

    // Prefix sums: B[i] = number of keys <= i + a - 1
    for i = 2 to k:
        B[i] = B[i] + B[i-1]

    // Place elements (right to left for stability)
    for i = n downto 1:
        C[B[A[i].key - a + 1]] = A[i]
        B[A[i].key - a + 1] -= 1

    // Copy back
    for i = 1 to n:
        A[i] = C[i]

Why the Right-to-Left Pass Ensures Stability

Suppose two elements $A[s]$ and $A[t]$ have the same key, with $s < t$ (so $A[s]$ appeared earlier). After the prefix sums, both map to the same counter in $B$. The right-to-left pass processes $A[t]$ first (it has the higher index). It takes the current value $B[\ldots]$ and then decrements it. When $A[s]$ is processed later, it gets the decremented value — a lower index in $C$. Lower index in $C$ means earlier in the output. So $A[s]$ (earlier in input) ends up earlier in output. ✓
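A Python sketch of the stable version. Instead of a `.key` field it takes a `key` function; note that with 0-based output indices the counter is decremented before placing, whereas the 1-based pseudocode places first and then decrements:

```python
def counting_sort(items, key):
    """Stable counting sort of `items` by the integer key(item)."""
    ks = [key(x) for x in items]
    a = min(ks)
    count = [0] * (max(ks) - a + 1)
    for kv in ks:
        count[kv - a] += 1
    for i in range(1, len(count)):          # prefix sums: count[i] = #keys <= i + a
        count[i] += count[i - 1]
    out = [None] * len(items)
    for x in reversed(items):               # right to left: preserves stability
        count[key(x) - a] -= 1
        out[count[key(x) - a]] = x
    return out

pairs = [(3, "a"), (1, "b"), (3, "c")]
# the two key-3 elements keep their input order: "a" before "c"
print(counting_sort(pairs, key=lambda t: t[0]))  # → [(1, 'b'), (3, 'a'), (3, 'c')]
```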

Time Complexity

Computing min and max: $\Theta(n)$. Count loop: $\Theta(n)$. Prefix sum loop: $\Theta(k)$. Placement loop: $\Theta(n)$. Copy loop: $\Theta(n)$.

Total: $\Theta(n + k)$ for all cases (best, average, worst).

Critical observation: if $k = O(n)$ (the range is proportional to the number of elements), this is $\Theta(n)$ — genuinely linear. But if $k \gg n$, CountingSort is wasteful. Sorting $A = [1, 2^{100}]$ gives $n = 2$ but $k = 2^{100}$: catastrophically bad.

CountingSort is stable (in the general version) and not in-place (it needs arrays $B$ of size $k$ and $C$ of size $n$).


RadixSort

Assumption

Keys are composed of digits or characters — integers, strings, etc. The radix $r$ is the number of distinct digit values (10 for decimal, 26 for lowercase letters, 256 for bytes, etc.).

The Idea

Sort digit by digit, starting from the least significant digit (rightmost), using a stable sort at each digit level.

RadixSort(A[1..n]):
    d = max number of digits in any key
    for i = 1 to d:
        stable sort A on digit i (digit 1 = rightmost)

Why Least-Significant-Digit First?

This is the non-obvious part of the algorithm.

Invariant: after processing digits $1$ through $i$, the array is sorted with respect to its last $i$ digits.

Base case ($i = 0$): trivially true.

Inductive step: assume after $i$ passes, the array is sorted by its last $i$ digits. Now we sort by digit $i+1$ using a stable sort. After this sort:

  • Elements with different digit $i+1$ values are in the correct order relative to each other (the new sort handles this).
  • Elements with the same digit $i+1$ value remain sorted by their last $i$ digits (stability of the new sort guarantees this — equal elements are not reordered).

So after $i+1$ passes, the array is sorted by its last $i+1$ digits. ✓

After all $d$ passes, it is sorted by all $d$ digits = fully sorted.

Why not most-significant-digit first? If you sort by the most significant digit first, elements that share that digit need to be sorted among themselves by the remaining digits — you'd need to recursively sort each bucket. That is essentially a different (and more complex) algorithm.

Why Stability Is Not Optional

Consider keys $123$ and $153$. After sorting by the last digit, they might be in the order $[123, 153]$ (both end in 3, so their relative order depends on what came before). After sorting by the middle digit, $2 < 5$, so $123$ should stay before $153$. A stable sort guarantees that elements with the same digit in the current position are not reordered — so the ordering from previous passes is preserved. A non-stable sort would scramble the previous work.

Time Complexity

Using CountingSort as the per-digit stable sort, each of the $d$ passes costs $\Theta(n + k)$, where $k$ is the number of possible digit values (the radix).

Total: $\Theta(d(n + k))$.

If $k = O(n)$ and $d$ is constant (key length doesn't grow with $n$), this is $\Theta(n)$ — linear.

RadixSort is stable and not in-place.
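A Python sketch for non-negative integers, least significant digit first, with an inlined per-digit CountingSort so that each pass is stable:

```python
def radix_sort(a, base=10):
    """LSD RadixSort for non-negative integers; Theta(d(n + base)) time."""
    if not a:
        return a
    m, exp = max(a), 1
    while m // exp > 0:
        # stable counting sort on digit (v // exp) % base
        count = [0] * base
        for v in a:
            count[(v // exp) % base] += 1
        for i in range(1, base):              # prefix sums
            count[i] += count[i - 1]
        out = [0] * len(a)
        for v in reversed(a):                 # right to left: stable pass
            d = (v // exp) % base
            count[d] -= 1
            out[count[d]] = v
        a = out
        exp *= base
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # → [2, 24, 45, 66, 75, 90, 170, 802]
```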


Final Summary

| Algorithm | Best | Average | Worst | In-Place | Stable |
|---|---|---|---|---|---|
| SelectionSort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | No |
| InsertionSort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | Yes |
| MergeSort | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $\Theta(n \log n)$ | No | Yes |
| QuickSort | $\Theta(n \log n)$ | $O(n \log n)$ | $\Theta(n^2)$ | Yes | No |
| TimSort | $O(n)$ | $O(n \log n)$ | $O(n \log n)$ | No | Yes |
| CountingSort | $\Theta(n+k)$ | $\Theta(n+k)$ | $\Theta(n+k)$ | No | Yes |
| RadixSort | $\Theta(d(n+k))$ | $\Theta(d(n+k))$ | $\Theta(d(n+k))$ | No | Yes |

Where: $n$ = number of elements, $k$ = range of key values (or digit values), $d$ = maximum number of digits in a key.

The Narrative Arc

The incremental algorithms (SelectionSort, InsertionSort) are simple and quadratic — they don't exploit structure. The divide-and-conquer algorithms (MergeSort, QuickSort) break through to $O(n \log n)$ by decomposing the problem recursively. The lower bound theorem proves this is optimal for any comparison sort — no pairwise-comparison algorithm can do better. CountingSort and RadixSort escape this bound by exploiting key structure (bounded integers, digit decomposition), achieving linear time under the right conditions. Each improvement comes not from coding tricks but from a deeper understanding of what information the algorithm has access to and how to use it.