Lecture · L04

Sorting Algorithms


The Problem We Are Solving

You have a sequence of $n$ numbers. You want to rearrange them so they are in non-decreasing order.

More precisely: given $[a_1, a_2, \ldots, a_n]$, find a permutation $p$ of the indices $\{1, \ldots, n\}$ such that $a_{p(1)} \leq a_{p(2)} \leq \cdots \leq a_{p(n)}$.

A permutation is just a reordering. For $[7, 32, 88, 21, 92, -4]$ the output is $[-4, 7, 21, 32, 88, 92]$.

Each element has a key (what you sort by) and possibly a value (data attached to it). You sort by key, you carry the value along for the ride.


Two Properties That Classify Every Sorting Algorithm

In-Place

The algorithm sorts the array using only the original array itself, with at most a constant amount of extra memory (a few variables, not another array of size $n$). No auxiliary array proportional to $n$ is needed.

Why does this matter? Memory. If your array is 4GB, you can't afford a second 4GB copy just to sort it.

Stable

If two elements have equal keys, they appear in the output in the same relative order they had in the input.

Why does this matter? Consider sorting a list of students first by grade, then by name. If the name sort is stable, students with the same name will remain in grade order from the first sort. If it is not stable, the grade ordering is destroyed.

Making any algorithm stable: replace each key $k$ with the pair $(k, i)$ where $i$ is the original index. Compare pairs lexicographically: first by $k$, then by $i$. This breaks all ties by original position, guaranteeing stability. Cost: slightly larger keys, same asymptotic complexity.
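The decorate-with-index trick can be sketched in Python (`stable_order` is an illustrative name, not a library function). It returns the sorting permutation $p$ from the problem statement, with ties broken by original position:

```python
def stable_order(keys):
    """Permutation p that sorts `keys`, breaking ties by original index.
    Comparing the pairs (keys[i], i) lexicographically forces any
    correct sort to behave stably."""
    return sorted(range(len(keys)), key=lambda i: (keys[i], i))

# the two 7s keep their input order: index 0 before index 2
print(stable_order([7, 3, 7, 1]))  # → [3, 1, 0, 2]
```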


The Array Data Structure

An array is a fixed-size block of memory. Element $A[i]$ lives at a fixed memory address computable from the base address and $i$, so every access is $O(1)$ regardless of $i$.

Arrays cannot grow. To resize, you allocate a new bigger array and copy everything over — $O(n)$ work. This is fine if you do it rarely.

Notation: $A[i \ldots j]$ is the subarray $A[i], A[i+1], \ldots, A[j]$.


Part I — Incremental Algorithms

The idea behind incremental sorting is simple: maintain a sorted prefix and extend it one element at a time.

Formally, at step $k$ you assume $A[1 \ldots k]$ is sorted and you extend the sorting to $A[1 \ldots k+1]$.

Both algorithms below are in-place.


SelectionSort

The Idea

At step $i$, the first $i-1$ positions already contain their final values (the $i-1$ smallest elements in sorted order). Find the minimum of the remaining unsorted portion $A[i \ldots n]$ and swap it into position $i$.

Pseudocode

SelectionSort(A[1..n]):
    for i = 1 to n-1:
        m = MinIndex(A, i)      // find index of min in A[i..n]
        Swap(A, i, m)

MinIndex(A[1..n], i):
    for j = i+1 to n:
        if A[j] < A[i]:
            i = j
    return i

Swap(A[1..n], i, j):
    tmp = A[i]
    A[i] = A[j]
    A[j] = tmp
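A direct Python transcription of the pseudocode, shifted to 0-based indexing (a sketch, not a library routine):

```python
def selection_sort(a):
    """In-place SelectionSort (0-indexed version of the pseudocode)."""
    n = len(a)
    for i in range(n - 1):
        m = i                        # MinIndex: find index of min in a[i..n-1]
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]      # Swap
    return a

print(selection_sort([7, 2, 5, 7, 4, 1]))  # → [1, 2, 4, 5, 7, 7]
```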

Traced Example

Array: $[7, 2, 5, 7, 4, 1]$

  • Step 1: min of $[7,2,5,7,4,1]$ is $1$ at index 6. Swap positions 1 and 6 → $[1, 2, 5, 7, 4, 7]$
  • Step 2: min of $[2,5,7,4,7]$ is $2$ at index 2. Swap positions 2 and 2 → $[1, 2, 5, 7, 4, 7]$
  • Step 3: min of $[5,7,4,7]$ is $4$ at index 5. Swap positions 3 and 5 → $[1, 2, 4, 7, 5, 7]$
  • Step 4: min of $[7,5,7]$ is $5$ at index 5. Swap positions 4 and 5 → $[1, 2, 4, 5, 7, 7]$
  • Step 5: min of $[7,7]$ is $7$ at index 5. Swap positions 5 and 5 → done.

Final: $[1, 2, 4, 5, 7, 7]$

Why It Is Not Stable

The swap in step 1 of the example moved the $7$ from position 1 all the way to position 6, jumping over the other $7$ at position 4. Two elements with equal keys have swapped their relative order. A stable variant would have to insert the minimum by shifting the whole prefix instead of doing a single swap, at the cost of many more element moves — the standard swap-based SelectionSort sacrifices stability by design.

Time Complexity

At step $i$, MinIndex scans from position $i+1$ to $n$: that is $n - i$ comparisons. The swap is $O(1)$.

Crucially, both loops always run to completion regardless of the input. The array could already be sorted; it doesn't matter. The inner loop always scans the full remaining portion.

Total comparisons:

$$\sum_{i=1}^{n-1}(n - i) = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = \Theta(n^2)$$

Best case, average case, worst case: all $\Theta(n^2)$.

The input order is completely irrelevant to the running time.


InsertionSort

The Idea

At step $i$, assume $A[1 \ldots i-1]$ is sorted. Take element $A[i]$ and walk it leftward, swapping it with its left neighbor as long as it is smaller. It bubbles into its correct position within the sorted prefix.

This is exactly how most people sort a hand of playing cards: you pick up one card at a time and slide it left until it fits.

Pseudocode

InsertionSort(A[1..n]):
    for i = 2 to n:
        j = i
        while j > 1 and A[j] < A[j-1]:
            Swap(A, j, j-1)
            j = j - 1
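The same algorithm in Python, 0-indexed (a sketch):

```python
def insertion_sort(a):
    """In-place InsertionSort (0-indexed version of the pseudocode)."""
    for i in range(1, len(a)):
        j = i
        # strict <: equal keys are never swapped past each other (stable)
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
    return a

print(insertion_sort([7, 2, 5, 7, 4, 1]))  # → [1, 2, 4, 5, 7, 7]
```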

Traced Example

Array: $[7, 2, 5, 7, 4, 1]$

  • $i=2$: $A[2]=2 < A[1]=7$. Swap. $[2, 7, 5, 7, 4, 1]$. $j=1$, stop.
  • $i=3$: $A[3]=5 < A[2]=7$. Swap. $[2, 5, 7, 7, 4, 1]$. $5 > 2$, stop.
  • $i=4$: $A[4]=7 \geq A[3]=7$. Condition is strict ($<$), so no swap. Stop.
  • $i=5$: $A[5]=4 < A[4]=7$. Swap. Then $4 < 7$ again. Swap. Then $4 < 5$. Swap. Then $4 > 2$, stop. $[2, 4, 5, 7, 7, 1]$
  • $i=6$: $1$ bubbles all the way left. $[1, 2, 4, 5, 7, 7]$

Why It Is Stable

The while loop condition uses strict inequality: A[j] < A[j-1]. When two elements are equal, the condition is false and the loop stops. Equal elements are never swapped past each other. Their original relative order is preserved.

This is not accidental — it is an intentional design choice. If you changed < to <=, InsertionSort would become unstable.

Time Complexity — Worst Case

The worst case is when the array is sorted in reverse order (largest to smallest). Then at step $i$, the element $A[i]$ is smaller than all of $A[1 \ldots i-1]$, so the while loop runs $i-1$ times (it travels all the way to position 1).

$$\sum_{i=2}^{n}(i-1) = 1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2} = \Theta(n^2)$$

Time Complexity — Best Case

The best case is when the array is already sorted. At step $i$, $A[i] \geq A[i-1]$, so the while loop condition is false immediately and the loop body never executes.

$$\sum_{i=2}^{n} 1 = n - 1 = \Theta(n)$$

This is linear. SelectionSort never achieves this — it always does $\Theta(n^2)$ work.

Time Complexity — Average Case

We assume that at step $i$, the while loop runs $(i-1)/2$ times on average (the element $A[i]$ lands somewhere in the middle of $A[1 \ldots i-1]$).

$$\sum_{i=2}^{n}\frac{i-1}{2} = \frac{1}{2}\sum_{i=1}^{n-1}i = \frac{1}{2} \cdot \frac{n(n-1)}{2} = \Theta(n^2)$$

The constant factor is halved compared to the worst case, but the growth rate is still $\Theta(n^2)$.

Nearly Sorted Arrays — A Key Insight

The best case $\Theta(n)$ generalizes beyond perfectly sorted arrays.

Suppose the array is sorted except for $k$ misplaced elements. The outer for loop always runs $n-1$ times, contributing $\Theta(n)$. The while loop only does significant work on the $k$ misplaced elements. Each one travels at most $n$ positions leftward.

Total cost: $\Theta(n) + O(nk) = O(nk)$.

When $k = O(1)$ (constant number of misplaced elements), this is $\Theta(n)$ — linear.

This is important in practice. Databases, log files, and sensor streams often produce nearly-sorted data. InsertionSort handles these efficiently while MergeSort or QuickSort would still pay $\Theta(n \log n)$.
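The $O(nk)$ bound can be checked empirically. In this sketch (`insertion_sort_swaps` is a hypothetical helper that uses swap count as the cost measure), an array of 1000 elements with $k = 2$ misplaced entries needs on the order of $n$ swaps, not $n^2/2 = 500{,}000$:

```python
def insertion_sort_swaps(a):
    """InsertionSort that also counts the swaps it performs."""
    swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
            swaps += 1
    return swaps

nearly = list(range(1000))
nearly[10], nearly[990] = nearly[990], nearly[10]   # k = 2 misplaced elements
print(insertion_sort_swaps(nearly))                 # on the order of 2n, not n^2/2
```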


Part II — Divide and Conquer

A general algorithmic strategy:

  1. Divide: split the problem into smaller subproblems.
  2. Conquer: solve each subproblem recursively (base case: trivially small).
  3. Combine: merge the subproblem solutions into a solution for the whole.

The two algorithms below both split the array in two and recurse, but they differ in where the work happens:

  • MergeSort: trivial divide, hard combine (merge).
  • QuickSort: hard divide (partition), trivial combine (nothing to do).

MergeSort

The Algorithm

  1. Divide: split $A[p \ldots r]$ at the midpoint $q = \lfloor(p+r)/2\rfloor$ into $A[p \ldots q]$ and $A[q+1 \ldots r]$.
  2. Conquer: recursively sort both halves.
  3. Combine: merge the two sorted halves into a single sorted array.

Base case: a subarray of length 1 is already sorted (nothing to do).

MergeSort(A[1..n], p, r):
    if p < r:
        q = floor((p + r) / 2)
        MergeSort(A, p, q)
        MergeSort(A, q+1, r)
        Merge(A, p, q, r)

First call: MergeSort(A, 1, n).

The Merge Procedure

This is the heart of the algorithm. You have two sorted subarrays $A[p \ldots q]$ and $A[q+1 \ldots r]$ and you need to produce a single sorted subarray in $A[p \ldots r]$.

The approach: use two pointers, one for each half. At each step, compare the two pointed-to elements and take the smaller one into a temporary array $B$. When one half is exhausted, dump the remainder of the other half.

Merge(A[1..n], p, q, r):
    B = new array of size r - p + 1
    i = p          // pointer into left half A[p..q]
    j = q + 1      // pointer into right half A[q+1..r]
    k = 1          // pointer into B

    while i <= q and j <= r:
        if A[i] <= A[j]:
            B[k] = A[i];  i++
        else:
            B[k] = A[j];  j++
        k++

    // drain whichever half still has elements
    while i <= q:
        B[k] = A[i];  k++;  i++
    while j <= r:
        B[k] = A[j];  k++;  j++

    // copy B back into A[p..r]
    for k = 1 to r - p + 1:
        A[p + k - 1] = B[k]

Why Merge Is $\Theta(r - p + 1)$

Each iteration of each while loop increments exactly one of $i$, $j$, or $k$. The pointers $i$ and $j$ together move from $p$ and $q+1$ all the way to $q$ and $r$ respectively — exactly $r - p + 1$ total increments across all three while loops. The final copy loop runs $r - p + 1$ times. Total: $\Theta(r - p + 1)$, linear in the size of the subarray.

Why MergeSort Is Stable

The merge condition is A[i] <= A[j] (with $\leq$, not $<$). When two elements are equal, the one from the left half is taken first. Since the left half corresponds to earlier positions in the original array, relative order is preserved.

MergeSort Is Not In-Place

The merge step allocates a temporary array $B$ of size $r - p + 1$. At the top level, this is $\Theta(n)$ extra memory. It is possible to merge in-place, but the resulting algorithm is significantly more complex and slower in practice.
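The two procedures translate to Python as follows (0-indexed sketch; the `<=` in the merge is what preserves stability):

```python
def merge_sort(a, p=0, r=None):
    """Sort a[p..r] (inclusive) by recursive halving."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)
    return a

def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] into sorted a[p..r]."""
    b = []
    i, j = p, q + 1
    while i <= q and j <= r:
        if a[i] <= a[j]:             # <= : left half wins ties (stable)
            b.append(a[i]); i += 1
        else:
            b.append(a[j]); j += 1
    b.extend(a[i:q + 1])             # drain whichever half remains
    b.extend(a[j:r + 1])
    a[p:r + 1] = b                   # copy back into the original array

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # → [1, 2, 2, 3, 4, 5, 6, 7]
```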

Time Complexity

The recurrence:

$$T(n) = \begin{cases} \Theta(1) & n \leq 1 \\ 2T(n/2) + \Theta(n) & n > 1 \end{cases}$$

Two recursive calls each on half the array, plus $\Theta(n)$ for the merge.

Solving with the Master Theorem: this is the form $T(n) = aT(n/b) + f(n)$ with $a = 2$, $b = 2$, $f(n) = \Theta(n)$. We compare $f(n)$ with $n^{\log_b a} = n^{\log_2 2} = n^1 = n$. Since $f(n) = \Theta(n) = \Theta(n^{\log_b a})$, we are in case 2 of the Master Theorem: $T(n) = \Theta(n \log n)$.

Intuition without the Master Theorem: at each level of recursion, the total work for all merges at that level is $\Theta(n)$ (every element participates in exactly one merge per level). There are $\Theta(\log n)$ levels (the recursion halves each time). Total: $\Theta(n \log n)$.

Best case = average case = worst case = $\Theta(n \log n)$. The initial order of the array is completely irrelevant.


QuickSort

The Algorithm

  1. Divide: choose a pivot element. Partition the array so that everything $\leq$ pivot goes left and everything $>$ pivot goes right. The pivot lands in its final sorted position.
  2. Conquer: recursively sort the left and right parts.
  3. Combine: nothing. Once both recursive calls return, the array is sorted.

QuickSort(A[1..n], p, r):
    if p < r:
        q = Partition(A, p, r)
        QuickSort(A, p, q - 1)
        QuickSort(A, q + 1, r)

First call: QuickSort(A, 1, n).

The Partition Procedure

Partition(A[1..n], p, r) -> int:
    x = A[r]       // pivot = last element
    i = p - 1
    for j = p to r - 1:
        if A[j] <= x:
            Swap(A, i + 1, j)
            i = i + 1
    Swap(A, i + 1, r)   // place pivot in its final position
    return i + 1

What this maintains: at all times, the subarray is divided into three regions:

  • $A[p \ldots i]$: elements $\leq x$ (the pivot value)
  • $A[i+1 \ldots j-1]$: elements $> x$
  • $A[j \ldots r-1]$: not yet examined
  • $A[r]$: the pivot, waiting to be placed

As $j$ scans from $p$ to $r-1$: if $A[j] \leq x$, swap it into the $\leq x$ region (swap $A[j]$ with $A[i+1]$) and advance $i$. If $A[j] > x$, do nothing — the $> x$ region expands naturally.

After the loop, swap $A[i+1]$ with $A[r]$ to place the pivot. Now everything to the left of position $i+1$ is $\leq x$ and everything to the right is $> x$, and $A[i+1] = x$ is in its final position.

Traced Partition Example

Array: $[7, 2, 5, 9, 1, 4]$, pivot $= A[6] = 4$, $i = 0$

  • $j=1$: $A[1]=7 > 4$, nothing.
  • $j=2$: $A[2]=2 \leq 4$. Swap $A[1]$ and $A[2]$. $i=1$. Array: $[2,7,5,9,1,4]$
  • $j=3$: $A[3]=5 > 4$, nothing.
  • $j=4$: $A[4]=9 > 4$, nothing.
  • $j=5$: $A[5]=1 \leq 4$. Swap $A[2]$ and $A[5]$. $i=2$. Array: $[2,1,5,9,7,4]$
  • Loop ends. Swap $A[3]$ and $A[6]$. Array: $[2,1,4,9,7,5]$
  • Return $q = 3$.

Verify: $A[1 \ldots 2] = [2,1]$ are all $\leq 4$. $A[3] = 4$ is the pivot in its final position. $A[4 \ldots 6] = [9,7,5]$ are all $> 4$. ✓

Partition always costs $\Theta(r - p + 1)$: it scans every element exactly once. It is in-place (only swaps) but not stable (swaps disrupt the relative order of equal elements).
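In Python, 0-indexed and mirroring the pseudocode (a sketch with the last element as pivot):

```python
def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]                           # pivot = last element
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]    # grow the <= x region
    a[i + 1], a[r] = a[r], a[i + 1]    # place the pivot
    return i + 1

def quicksort(a, p=0, r=None):
    """In-place QuickSort of a[p..r] (inclusive)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a

print(quicksort([7, 2, 5, 9, 1, 4]))  # → [1, 2, 4, 5, 7, 9]
```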

Worst-Case Analysis

The worst case for QuickSort is when every partition splits the array as unequally as possible: one subarray has size 0 and the other has size $n-1$.

When does this happen? When the pivot is always the smallest or largest element in the subarray. For example, if the array is already sorted (or reverse-sorted) and you always pick the last element as pivot, every partition is maximally unbalanced.

Recurrence:

$$T(n) = T(n-1) + T(0) + \Theta(n) = T(n-1) + \Theta(n)$$

($T(0) = O(1)$, the empty subarray costs nothing.)

Unrolling the recurrence:

$$T(n) = T(n-1) + n \approx T(n-2) + (n-1) + n \approx \cdots \approx \sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \Theta(n^2)$$

Worst-case QuickSort is as slow as SelectionSort. The recursion tree is a completely unbalanced chain of depth $n$.

Best-Case Analysis

The best case is when every partition splits the array exactly in half: two subarrays of size $n/2$.

Recurrence:

$$T(n) = 2T(n/2) + \Theta(n)$$

This is identical to MergeSort's recurrence. By the Master Theorem: $T(n) = \Theta(n \log n)$.

The recursion tree is perfectly balanced, with $\log n$ levels and $\Theta(n)$ work per level.

Average-Case Analysis

In the general case, the partition lands at some position $i$ (0-indexed), creating subarrays of size $i$ and $n - i - 1$:

$$T(n) = T(i) + T(n - i - 1) + \Theta(n)$$

The value of $i$ changes at every recursive call — it depends on the pivot value relative to the subarray. Under the assumption that all $n$ possible split positions are equally likely (each with probability $1/n$), the average cost is:

$$T(n) = \frac{1}{n}\sum_{i=0}^{n-1}\bigl[T(i) + T(n-i-1)\bigr] + n$$

Since $\sum_{i=0}^{n-1} T(n-i-1) = \sum_{i=0}^{n-1} T(i)$ (the same terms, just listed in reverse order):

$$T(n) = \frac{2}{n}\sum_{i=0}^{n-1} T(i) + n$$

Solving the Average-Case Recurrence

Multiply both sides by $n$:

$$n \cdot T(n) = 2\sum_{i=0}^{n-1} T(i) + n^2 \tag{1}$$

Replace $n$ with $n-1$:

$$(n-1) \cdot T(n-1) = 2\sum_{i=0}^{n-2} T(i) + (n-1)^2 \tag{2}$$

Subtract (2) from (1). The sums almost cancel — only $T(n-1)$ remains from the left sum:

$$n \cdot T(n) - (n-1) \cdot T(n-1) = 2T(n-1) + 2n - 1$$

$$n \cdot T(n) = (n+1) \cdot T(n-1) + 2n - 1$$

Divide both sides by $n(n+1)$:

$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2}{n+1} - \frac{1}{n(n+1)} \leq \frac{T(n-1)}{n} + \frac{2}{n+1}$$

Now define $S(n) = T(n)/(n+1)$. The inequality becomes:

$$S(n) \leq S(n-1) + \frac{2}{n+1}$$

Apply this telescopically $n-1$ times, down to $S(1) = T(1)/2 = 1/2$:

$$S(n) \leq S(1) + \sum_{k=3}^{n+1}\frac{2}{k} = \frac{1}{2} + 2\sum_{k=3}^{n+1}\frac{1}{k}$$

The harmonic sum $\sum_{k=1}^{m} 1/k = H_m \approx \ln m$. So:

$$S(n) \leq \frac{1}{2} + 2\ln n$$

Multiplying back:

$$T(n) = (n+1) \cdot S(n) \leq (n+1)\left(\frac{1}{2} + 2\ln n\right) = O(n \log n)$$

The average case is $O(n \log n)$.

The constant hidden in the $O$ is about $2\ln n \approx 1.386 \log_2 n$, roughly $1.39$ times the constant for MergeSort. In practice, QuickSort's cache-friendliness and low overhead make it faster than MergeSort despite the slightly larger constant.

The Pivot Problem and Randomization

The average-case analysis assumes all $n$ split positions are equally likely. With the deterministic pivot (always A[r]), this assumption is about the input. Nearly-sorted arrays violate it badly — they trigger the worst case.

Fix: choose the pivot uniformly at random from $A[p \ldots r]$.

RandomizedPartition(A[1..n], p, r) -> int:
    i = Random(p, r)    // uniform random index in [p, r]
    Swap(A, i, r)
    return Partition(A, p, r)

With a random pivot, the probability of a particular split is $1/n$ regardless of the input order. The "equally likely" assumption is now a property of the algorithm, not a hope about the input. The expected running time is $O(n \log n)$ for every possible input.
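A self-contained Python sketch of the randomized variant, with the pivot index drawn via the standard library's `random.randint` (inclusive on both ends) and the usual partition inlined:

```python
import random

def randomized_quicksort(a, p=0, r=None):
    """Expected O(n log n) on every input: the pivot index is uniform in [p, r]."""
    if r is None:
        r = len(a) - 1
    if p < r:
        k = random.randint(p, r)       # uniform random pivot index in [p, r]
        a[k], a[r] = a[r], a[k]        # move pivot to the end, then partition as usual
        x, i = a[r], p - 1
        for j in range(p, r):
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        q = i + 1
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)
    return a

# a reverse-sorted input: the worst case for the deterministic pivot
print(randomized_quicksort(list(range(1000, 0, -1))) == list(range(1, 1001)))  # → True
```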


Sorting in Python: TimSort and PowerSort

Python's sorted() and list.sort() use PowerSort (since Python 3.11), a refined version of TimSort (Python 2.3 to 3.10).

TimSort is a hybrid: it exploits structure already present in the data.

The algorithm identifies runs — maximal already-sorted (or reverse-sorted, which it flips) contiguous subarrays. Short runs are extended using InsertionSort up to a minimum size (typically 32–64 elements). Then runs are merged using a MergeSort-style merge procedure. Runs of similar size are merged preferentially to maintain balance.

Why this is clever: real-world data often has natural order already embedded in it. TimSort finds and exploits this structure instead of ignoring it.

Complexity: $O(n)$ best case (already sorted), $O(n \log n)$ worst and average case. Stable. Not in-place.

PowerSort improves the run-merging strategy using a near-optimal merge tree, reducing the number of comparisons in practice.
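Because Python's built-in sort is stable, the two-pass idiom from the stability discussion works directly: sort by the secondary key first, then by the primary key. A small demo with made-up records:

```python
records = [("carol", 85), ("alice", 92), ("bob", 85), ("dave", 92)]

by_name = sorted(records)                            # pass 1: by name
by_grade = sorted(by_name, key=lambda r: r[1])       # pass 2: by grade (stable)

# within each grade, names stay alphabetical thanks to stability
print(by_grade)  # → [('bob', 85), ('carol', 85), ('alice', 92), ('dave', 92)]
```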


Summary Table — Comparison-Based Algorithms

| Algorithm | Best | Average | Worst | In-Place | Stable |
|---|---|---|---|---|---|
| SelectionSort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | No |
| InsertionSort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | Yes |
| MergeSort | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $\Theta(n \log n)$ | No | Yes |
| QuickSort | $\Theta(n \log n)$ | $O(n \log n)$ | $\Theta(n^2)$ | Yes | No |
| TimSort | $O(n)$ | $O(n \log n)$ | $O(n \log n)$ | No | Yes |

No single algorithm dominates on all axes. This motivates the next question.


Part III — The Lower Bound

Can We Do Better Than $O(n \log n)$?

Can any comparison-based algorithm sort faster than $O(n \log n)$ in the worst case?

No. This is a theorem, not an empirical observation.

Decision Trees

Any comparison sort can be modeled as a decision tree. Each internal node represents a comparison $A[i]$ vs $A[j]$. The left branch is taken if $A[i] \leq A[j]$, the right branch if $A[i] > A[j]$. Each leaf represents a specific output permutation.

A run of the algorithm on a given input traces a root-to-leaf path. The number of comparisons for that input equals the depth of the leaf. The worst-case number of comparisons equals the height of the tree.

Every comparison sort has its own decision tree, but all correct sorts share a key property: the tree must have at least $n!$ leaves, one for each possible permutation of $n$ elements (any permutation could be the correct answer for some input, so the algorithm must be able to output all of them).

Key Lemma: Height of a Binary Tree

Theorem. A binary tree with $k$ leaves where every internal node has exactly two children has height at least $\log_2 k$.

Proof by induction on $k$.

Base case ($k = 1$): the tree is a single node (the leaf). Height $= 0 \geq \log_2 1 = 0$. ✓

Inductive step: assume the result holds for all trees with fewer than $k$ leaves. Let $T_k$ be a tree with $k$ leaves. It has a root with two subtrees $T_{k_1}$ (left) and $T_{k_2}$ (right), where $k_1 + k_2 = k$ and without loss of generality $k_1 \geq k_2 \geq 1$.

By the inductive hypothesis (since $k_1, k_2 < k$):

$$h(T_k) = 1 + \max(h(T_{k_1}), h(T_{k_2}))$$

$$\geq 1 + h(T_{k_1}) \qquad\text{(take the larger subtree)}$$

$$\geq 1 + \log_2 k_1 \qquad\text{(inductive hypothesis)}$$

$$= \log_2 2 + \log_2 k_1 = \log_2 2k_1$$

$$\geq \log_2(k_1 + k_2) = \log_2 k \qquad\text{(since } k_1 \geq k_2 \text{ implies } 2k_1 \geq k_1 + k_2\text{)}$$

So $h(T_k) \geq \log_2 k$. ✓

The $\Omega(n \log n)$ Lower Bound

Theorem. Any comparison sort requires $\Omega(n \log n)$ comparisons in the worst case.

Proof. A correct comparison sort's decision tree must have at least $n!$ leaves (it must be able to output every permutation). By the lemma above, its height is at least $\log_2(n!)$.

By Stirling's approximation, $\log_2(n!) = \Omega(n \log n)$. (More precisely: $n! \geq (n/e)^n$, so $\log_2(n!) \geq n\log_2(n/e) = n\log_2 n - n\log_2 e = \Omega(n \log n)$.)

The worst-case number of comparisons $\geq$ height of decision tree $\geq \log_2(n!) = \Omega(n \log n)$. ✓

Consequence: MergeSort is asymptotically optimal among comparison sorts. Its worst-case cost $\Theta(n \log n)$ matches the lower bound exactly.

Important caveat: this lower bound applies only to algorithms that determine order exclusively through pairwise comparisons. If you know something extra about the keys — like they are integers in a bounded range — you can exploit that structure and break the barrier.


Part IV — Non-Comparison Sorting

The Key Insight

The $\Omega(n \log n)$ bound comes from modeling sorts as decision trees, where each node is a comparison. If instead your algorithm uses arithmetic on the keys (like treating a key as an array index), it is not a comparison sort and the lower bound does not apply.

The tradeoff: you need assumptions about the structure of the keys.


CountingSort

Assumption

All keys are integers in the range $[a, b]$.

The Idea

Instead of comparing elements, count how many times each possible key value appears. Then reconstruct the sorted array directly from the counts.

Simple Version (Integers Only)

CountingSort(A[1..n]):
    a = min(A),  b = max(A),  k = b - a + 1
    B = new array of size k, initialized to 0

    // Count occurrences
    for i = 1 to n:
        B[A[i] - a + 1]  +=  1

    // Reconstruct A from counts
    j = 1
    for i = 1 to k:
        while B[i] > 0:
            A[j] = i + a - 1
            B[i] -= 1
            j += 1

The index shift $A[i] - a + 1$ maps the range $[a, b]$ to $[1, k]$, so it works even if $a$ is negative.
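A 0-indexed Python sketch of the simple version, including the shift that handles negative minimums:

```python
def counting_sort_ints(a):
    """Rebuild `a` in sorted order from value counts; Theta(n + k) time,
    where k = max(a) - min(a) + 1."""
    if not a:
        return a
    lo = min(a)
    counts = [0] * (max(a) - lo + 1)
    for v in a:
        counts[v - lo] += 1          # count occurrences, shifted by lo
    j = 0
    for offset, c in enumerate(counts):
        for _ in range(c):           # write each value back `c` times
            a[j] = lo + offset
            j += 1
    return a

print(counting_sort_ints([7, 2, -1, 7, 4, 1]))  # → [-1, 1, 2, 4, 7, 7]
```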

This version is not stable and destroys associated data — it just reconstructs values, not the original elements.

Stable Version (With Associated Data)

To sort elements that carry data alongside their key, we need to know where each element goes, not just how many there are. We use prefix sums.

After counting, transform $B$ so that $B[i]$ holds the number of elements with key $\leq i + a - 1$. This tells us the last position where an element with key $i + a - 1$ should land in the output.

To preserve stability, iterate $A$ from right to left during placement. This ensures that among elements with the same key, the one appearing later in $A$ (higher index) is placed later in $C$ (higher index) — preserving relative order.

CountingSort(A[1..n]):
    a = minkey(A),  b = maxkey(A),  k = b - a + 1
    B = new array of size k, initialized to 0
    C = new array of size n

    // Count
    for i = 1 to n:
        B[A[i].key - a + 1] += 1

    // Prefix sums: B[i] = number of keys <= i + a - 1
    for i = 2 to k:
        B[i] = B[i] + B[i-1]

    // Place elements (right to left for stability)
    for i = n downto 1:
        C[B[A[i].key - a + 1]] = A[i]
        B[A[i].key - a + 1] -= 1

    // Copy back
    for i = 1 to n:
        A[i] = C[i]

Why the Right-to-Left Pass Ensures Stability

Suppose two elements $A[s]$ and $A[t]$ have the same key, with $s < t$ (so $A[s]$ appeared earlier). After the prefix sums, both map to the same counter in $B$. The right-to-left pass processes $A[t]$ first (it has the higher index). It takes the current value $B[\ldots]$ and then decrements it. When $A[s]$ is processed later, it gets the decremented value — a lower index in $C$. Lower index in $C$ means earlier in the output. So $A[s]$ (earlier in input) ends up earlier in output. ✓
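A Python sketch of the stable version. Instead of a `.key` field it takes a `key` function; note that with 0-based output indices the counter is decremented before placing, whereas the 1-based pseudocode places first and then decrements:

```python
def counting_sort(items, key):
    """Stable counting sort of `items` by the integer key(item)."""
    ks = [key(x) for x in items]
    a = min(ks)
    count = [0] * (max(ks) - a + 1)
    for kv in ks:
        count[kv - a] += 1
    for i in range(1, len(count)):          # prefix sums: count[i] = #keys <= i + a
        count[i] += count[i - 1]
    out = [None] * len(items)
    for x in reversed(items):               # right to left: preserves stability
        count[key(x) - a] -= 1
        out[count[key(x) - a]] = x
    return out

pairs = [(3, "a"), (1, "b"), (3, "c")]
# the two key-3 elements keep their input order: "a" before "c"
print(counting_sort(pairs, key=lambda t: t[0]))  # → [(1, 'b'), (3, 'a'), (3, 'c')]
```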

Time Complexity

Computing min and max: $\Theta(n)$. Count loop: $\Theta(n)$. Prefix sum loop: $\Theta(k)$. Placement loop: $\Theta(n)$. Copy loop: $\Theta(n)$.

Total: $\Theta(n + k)$ for all cases (best, average, worst).

Critical observation: if $k = O(n)$ (the range is proportional to the number of elements), this is $\Theta(n)$ — genuinely linear. But if $k \gg n$, CountingSort is wasteful. Sorting $A = [1, 2^{100}]$ gives $n = 2$ but $k = 2^{100}$: catastrophically bad.

CountingSort is stable (in the general version) and not in-place (it needs arrays $B$ of size $k$ and $C$ of size $n$).


RadixSort

Assumption

Keys are composed of digits or characters — integers, strings, etc. The radix $r$ is the number of distinct digit values (10 for decimal, 26 for lowercase letters, 256 for bytes, etc.).

The Idea

Sort digit by digit, starting from the least significant digit (rightmost), using a stable sort at each digit level.

RadixSort(A[1..n]):
    d = max number of digits in any key
    for i = 1 to d:
        stable sort A on digit i (digit 1 = rightmost)

Why Least-Significant-Digit First?

This is the non-obvious part of the algorithm.

Invariant: after processing digits $1$ through $i$, the array is sorted with respect to its last $i$ digits.

Base case ($i = 0$): trivially true.

Inductive step: assume after $i$ passes, the array is sorted by its last $i$ digits. Now we sort by digit $i+1$ using a stable sort. After this sort:

  • Elements with different digit $i+1$ values are in the correct order relative to each other (the new sort handles this).
  • Elements with the same digit $i+1$ value remain sorted by their last $i$ digits (stability of the new sort guarantees this — equal elements are not reordered).

So after $i+1$ passes, the array is sorted by its last $i+1$ digits. ✓

After all $d$ passes, it is sorted by all $d$ digits = fully sorted.

Why not most-significant-digit first? If you sort by the most significant digit first, elements that share that digit need to be sorted among themselves by the remaining digits — you'd need to recursively sort each bucket. That is essentially a different (and more complex) algorithm.

Why Stability Is Not Optional

Consider keys $123$ and $153$. After sorting by the last digit, they might be in the order $[123, 153]$ (both end in 3, so their relative order depends on what came before). After sorting by the middle digit, $2 < 5$, so $123$ should stay before $153$. A stable sort guarantees that elements with the same digit in the current position are not reordered — so the ordering from previous passes is preserved. A non-stable sort would scramble the previous work.

Time Complexity

Using CountingSort as the per-digit stable sort, each of the $d$ passes costs $\Theta(n + k)$, where $k$ is the number of possible digit values (the radix).

Total: $\Theta(d(n + k))$.

If $k = O(n)$ and $d$ is constant (key length doesn't grow with $n$), this is $\Theta(n)$ — linear.

RadixSort is stable and not in-place.
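A Python sketch for non-negative integers, least significant digit first, with an inlined per-digit CountingSort so that each pass is stable:

```python
def radix_sort(a, base=10):
    """LSD RadixSort for non-negative integers; Theta(d(n + base)) time."""
    if not a:
        return a
    m, exp = max(a), 1
    while m // exp > 0:
        # stable counting sort on digit (v // exp) % base
        count = [0] * base
        for v in a:
            count[(v // exp) % base] += 1
        for i in range(1, base):              # prefix sums
            count[i] += count[i - 1]
        out = [0] * len(a)
        for v in reversed(a):                 # right to left: stable pass
            d = (v // exp) % base
            count[d] -= 1
            out[count[d]] = v
        a = out
        exp *= base
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # → [2, 24, 45, 66, 75, 90, 170, 802]
```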


Final Summary

| Algorithm | Best | Average | Worst | In-Place | Stable |
|---|---|---|---|---|---|
| SelectionSort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | No |
| InsertionSort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | Yes |
| MergeSort | $\Theta(n \log n)$ | $\Theta(n \log n)$ | $\Theta(n \log n)$ | No | Yes |
| QuickSort | $\Theta(n \log n)$ | $O(n \log n)$ | $\Theta(n^2)$ | Yes | No |
| TimSort | $O(n)$ | $O(n \log n)$ | $O(n \log n)$ | No | Yes |
| CountingSort | $\Theta(n+k)$ | $\Theta(n+k)$ | $\Theta(n+k)$ | No | Yes |
| RadixSort | $\Theta(d(n+k))$ | $\Theta(d(n+k))$ | $\Theta(d(n+k))$ | No | Yes |

Where: $n$ = number of elements, $k$ = range of key values (or digit values), $d$ = maximum number of digits in a key.

The Narrative Arc

The incremental algorithms (SelectionSort, InsertionSort) are simple and quadratic — they don't exploit structure. The divide-and-conquer algorithms (MergeSort, QuickSort) break through to $O(n \log n)$ by decomposing the problem recursively. The lower bound theorem proves this is optimal for any comparison sort — no pairwise-comparison algorithm can do better. CountingSort and RadixSort escape this bound by exploiting key structure (bounded integers, digit decomposition), achieving linear time under the right conditions. Each improvement comes not from coding tricks but from a deeper understanding of what information the algorithm has access to and how to use it.