Lecture · L02

Asymptotic Notation and Computational Complexity


The previous lecture established that the same function can be computed by algorithms that differ by an exponential factor in efficiency. What we need now is a precise, machine-independent language for describing and comparing those differences. That language is asymptotic notation. This lecture also introduces a more careful taxonomy of what it means to measure the cost of an algorithm — distinguishing best, worst, and average case — and a fourth mode of analysis, amortized cost, which applies when we care about the cost of a sequence of operations rather than a single one.


The Cost Function

We define a cost function $f(n) \geq 0$ as the amount of a given computational resource (running time or memory) required by some algorithm on inputs of size $n$, where $n \geq 0$.

Two design choices are embedded in this definition. First, we consider only non-negative input sizes and resource amounts — negative inputs and negative costs are not meaningful here. Second, the output $f(n)$ is real-valued even though input size is an integer; this allows intermediate calculations to produce non-integer quantities before we classify them asymptotically.

What we want from $f(n)$ is its growth rate: how fast the resource usage scales as $n$ increases. Two things are deliberately ignored:

  • Constant factors. An algorithm that does $3n$ operations and one that does $100n$ operations both scale linearly. The factor 3 vs 100 is machine-dependent, compiler-dependent, and irrelevant to the asymptotic class.
  • Low-order terms. In $f(n) = n^2 + 10n + 5$, the $10n + 5$ term becomes negligible relative to $n^2$ as $n$ grows. For large $n$, only the dominant term matters.

This is not an approximation for convenience — it is the right abstraction. What determines whether an algorithm is usable in practice is the growth rate of its cost, not the exact constant.
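
To see this numerically, here is a minimal sketch (Python, illustrative only) showing how the low-order terms of $f(n) = n^2 + 10n + 5$ vanish relative to the dominant $n^2$ term:

# Ratio of f(n) = n^2 + 10n + 5 to its dominant term n^2.
# As n grows, the ratio approaches 1: the constant 5 and the 10n term stop mattering.
def f(n):
    return n**2 + 10 * n + 5

for n in [10, 100, 1000, 10000]:
    print(n, f(n) / n**2)
# 10     2.05
# 100    1.1005
# 1000   1.010005
# 10000  1.00100005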


Asymptotic Notation

There are five standard notations. Three are primary (Big-$O$, Big-$\Omega$, $\Theta$) and two are strict variants (little-$o$, little-$\omega$). Each defines a set of functions, not a single function. When we write $f(n) = O(g(n))$, the technically correct statement is $f(n) \in O(g(n))$ — we are saying that $f$ belongs to the class of functions bounded above by $g$ up to a constant. The equality notation is conventional abuse, and we use it throughout.

Big-$O$ (Upper Bound)

$$O(g(n)) = \{ f(n) \mid \exists\, c > 0,\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) \leq c\,g(n) \}$$

$O(g(n))$ is the set of functions whose order of growth is at most that of $g(n)$. The constants $c$ and $n_0$ exist but need not be unique — any valid pair witnesses membership in the set. Note that $g(n) = O(g(n))$: a function is always an upper bound on itself.

Worked example. Prove $f(n) = 3n^2 + 10n = O(n^2)$, i.e., take $g(n) = n^2$.

We need $c > 0$ and $n_0 \geq 0$ such that $3n^2 + 10n \leq cn^2$ for all $n \geq n_0$. Dividing through by $n^2$ (valid for $n > 0$):

$$c \geq \frac{3n^2 + 10n}{n^2} = 3 + \frac{10}{n}$$

For $n \geq 1$, the right-hand side is at most $3 + 10 = 13$, so any $c \geq 13$ and $n_0 = 1$ works. Alternatively, $c = 4$ and $n_0 = 10$ also works, since for $n \geq 10$ the term $10/n \leq 1$ gives $3 + 10/n \leq 4$. There is no unique witness — what matters is that one exists.
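
The witnesses can be sanity-checked numerically; the sketch below (illustrative only, over a finite range) verifies both pairs:

# Check 3n^2 + 10n <= c*n^2 for the two witness pairs found above.
def f(n):
    return 3 * n**2 + 10 * n

assert all(f(n) <= 13 * n**2 for n in range(1, 100000))   # c = 13, n0 = 1
assert all(f(n) <= 4 * n**2 for n in range(10, 100000))   # c = 4,  n0 = 10
print("both witness pairs hold on the tested range")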

Big-$\Omega$ (Lower Bound)

$$\Omega(g(n)) = \{ f(n) \mid \exists\, c > 0,\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) \geq c\,g(n) \}$$

$\Omega(g(n))$ is the set of functions whose order of growth is at least that of $g(n)$. It is the mirror of Big-$O$: where Big-$O$ says the function does not grow faster than $g$, Big-$\Omega$ says it does not grow slower.

Worked example. Prove $f(n) = n^3 + 2n^2 = \Omega(n^2)$.

We need $c > 0$ and $n_0 \geq 0$ such that $n^3 + 2n^2 \geq cn^2$ for all $n \geq n_0$. Dividing by $n^2$:

$$c \leq \frac{n^3 + 2n^2}{n^2} = n + 2$$

For $n \geq 0$, the right-hand side is always $\geq 2$, so any $c \leq 2$ and $n_0 = 0$ works. We can also choose $c = 12$ with $n_0 = 10$, since for $n \geq 10$ we have $n + 2 \geq 12$.

$\Theta$ (Tight Bound)

$$\Theta(g(n)) = \{ f(n) \mid \exists\, c_1, c_2 > 0,\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, c_1\,g(n) \leq f(n) \leq c_2\,g(n) \}$$

$\Theta(g(n))$ is the set of functions that are asymptotically equivalent to $g(n)$: squeezed between two constant multiples of $g$ for all sufficiently large $n$. This is the tightest classification — it gives both an upper and lower bound simultaneously.

Key theorem. $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

This means you can prove a $\Theta$ bound by proving both an $O$ and an $\Omega$ bound separately.

Worked example. Prove $f(n) = n^3 + 2n^2 = \Theta(n^3)$.

We need $c_1, c_2 > 0$ and $n_0 \geq 0$ such that $c_1 n^3 \leq n^3 + 2n^2 \leq c_2 n^3$ for all $n \geq n_0$. Dividing through by $n^3$:

$$c_1 \leq 1 + \frac{2}{n} \leq c_2$$

For $n \geq 1$, the middle expression satisfies $1 \leq 1 + 2/n \leq 3$, so $c_1 = 1$, $c_2 = 3$, $n_0 = 1$ works.

Little-$o$ (Strict Upper Bound)

$$o(g(n)) = \{ f(n) \mid \forall\, c > 0,\, \exists\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) < c\,g(n) \}$$

The critical difference from Big-$O$: the quantifier on $c$ flips. Big-$O$ requires the bound to hold for some $c > 0$. Little-$o$ requires it to hold for all $c > 0$. This means $f$ must eventually be crushed below any positive multiple of $g$, no matter how small — $g$ grows strictly faster than $f$.

A direct consequence: $g(n) \neq o(g(n))$ for any $g$. A function cannot dominate itself strictly. In contrast, $g(n) = O(g(n))$ is always true. So $n^2 = O(n^2)$ but $n^2 \neq o(n^2)$, while $n = o(n^2)$.

If $f(n) = o(g(n))$ then certainly $f(n) = O(g(n))$. The converse does not hold in general.

Little-$\omega$ (Strict Lower Bound)

$$\omega(g(n)) = \{ f(n) \mid \forall\, c > 0,\, \exists\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) > c\,g(n) \}$$

The strict analog of Big-$\Omega$: $f$ must eventually exceed every positive multiple of $g$. Again, $g(n) \neq \omega(g(n))$ for any $g$, and $f(n) = \omega(g(n))$ implies $f(n) = \Omega(g(n))$.


Connecting to Limits

The five notations can be identified via limits, which is often the most direct way to classify a given pair of functions:

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \implies f(n) = o(g(n)) \implies f(n) = O(g(n))$$

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty \implies f(n) = \omega(g(n)) \implies f(n) = \Omega(g(n))$$

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = k > 0 \implies f(n) = \Theta(g(n))$$

The limit characterization is a practical tool. To show $n \log n = o(n^2)$, compute $\lim_{n \to \infty} n \log n / n^2 = \lim_{n \to \infty} \log n / n = 0$ (by L'Hôpital or standard results). Done.
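
The same limit can be checked symbolically, for instance with SymPy (shown only as an illustration; any computer algebra system will do):

# Symbolic check that lim n*log(n)/n^2 = 0, i.e. n log n = o(n^2).
from sympy import symbols, log, limit, oo

n = symbols('n', positive=True)
print(limit(n * log(n) / n**2, n, oo))   # prints 0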


Properties and Rules

Structural Properties

The five notations satisfy a collection of algebraic properties. These are not incidental — they encode the behavior of the $\leq, <, =, >, \geq$ ordering on growth rates.

Transitivity holds for all five notations. For example: if $f(n) = O(g(n))$ and $g(n) = O(h(n))$, then $f(n) = O(h(n))$, just as $\leq$ is transitive on the reals.

Reflexivity holds for $O$, $\Omega$, and $\Theta$ — a function is its own upper bound, lower bound, and tight bound. It does not hold for $o$ or $\omega$, since those are strict relations.

Symmetry is unique to $\Theta$: $f(n) = \Theta(g(n)) \iff g(n) = \Theta(f(n))$. Two functions in the same asymptotic class are mutually equivalent.

Transpose symmetry links $O$ to $\Omega$ and $o$ to $\omega$:

$$f(n) = O(g(n)) \iff g(n) = \Omega(f(n))$$
$$f(n) = o(g(n)) \iff g(n) = \omega(f(n))$$

This makes sense: if $f$ is upper-bounded by $g$, then $g$ is lower-bounded by $f$.

Computation Rules

Three rules make it easy to compose asymptotic bounds:

Sum rule. If $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) + f_2(n) = O(g_1(n) + g_2(n))$. In practice, since $O(g_1 + g_2) = O(\max(g_1, g_2))$, the complexity of a sum is dominated by the larger term.

Product rule. If $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) \cdot f_2(n) = O(g_1(n) \cdot g_2(n))$.

Constant multiplication. If $f(n) = O(g(n))$ and $a > 0$, then $a \cdot f(n) = O(g(n))$. Multiplying by a positive constant does not change the asymptotic class — constants are absorbed.

These rules also hold for $\Omega$, $\Theta$, $o$, and $\omega$, with appropriate care.
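
A concrete reading of the sum and product rules, as a minimal sketch: sequential loops add their operation counts, nested loops multiply them.

# Sequential loops add their costs (sum rule); nested loops multiply them (product rule).
def sequential(n):
    ops = 0
    for _ in range(n):        # Theta(n) block
        ops += 1
    for _ in range(n * n):    # Theta(n^2) block
        ops += 1
    return ops                # n + n^2 operations: O(n^2), the larger term dominates

def nested(n):
    ops = 0
    for _ in range(n):        # outer loop: Theta(n) iterations
        for _ in range(n):    # inner loop: Theta(n) iterations each
            ops += 1
    return ops                # n * n operations: O(n^2)

print(sequential(100), nested(100))   # 10100 10000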

Asymptotic Notation Inside Equations

Asymptotic notation can appear inside expressions. The notation $2n^2 + \Theta(n)$ means "there exists some function $f(n) \in \Theta(n)$ such that the expression equals $2n^2 + f(n)$." The statement $2n^2 + \Theta(n) = \Theta(n^2)$ asserts that for any such $f(n)$, the resulting sum is in $\Theta(n^2)$. This use is legitimate — it lets us suppress explicit intermediate terms while preserving the asymptotic claim.

Incomparable Functions

Unlike the real numbers, asymptotic growth rates are not always comparable. For real numbers $a$ and $b$, exactly one of $a < b$, $a = b$, $a > b$ holds. For functions this is not guaranteed. The pair $f(n) = n$ and $g(n) = n^{\sin(n) + 1}$ is a concrete counterexample: when $\sin(n)$ is near $-1$, $g(n)$ is near $1$, so $f$ dominates $g$; when $\sin(n)$ is near $1$, $g(n)$ is near $n^2$, so $g$ dominates $f$. Neither $f(n) = O(g(n))$ nor $f(n) = \Omega(g(n))$ holds, and no $\Theta$ relationship exists. Such pairs are incomparable under asymptotic ordering.
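
The oscillation is easy to observe numerically (a rough sketch; at integer $n$ the value $\sin(n)$ never equals $\pm 1$ exactly, but it gets arbitrarily close, which is enough):

import math

# f(n) = n versus g(n) = n^(sin(n) + 1): the ratio f/g swings across orders of
# magnitude forever, so neither function eventually dominates the other.
ratios = [(n / n ** (math.sin(n) + 1), n) for n in range(2, 10000)]
print(min(ratios))   # ratio far below 1: g is winning near sin(n) ~ +1
print(max(ratios))   # ratio far above 1: f is winning near sin(n) ~ -1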


Orders of Growth

The following table lists the standard complexity classes in increasing order of growth rate. Each row grows strictly faster (is $\omega$) of the rows above it, except where the parameterized classes ($\log^k n$, $n^k$) overlap with their neighbors at boundary values of $k$.

| Class | Name |
| --- | --- |
| $O(1)$ | Constant |
| $O(\log n)$ | Logarithmic |
| $O(\log^k n)$, $k \geq 1$ | Polylogarithmic |
| $O(n^k)$, $0 < k < 1$ | Sublinear |
| $O(n)$ | Linear |
| $O(n \log n)$ | Pseudolinear |
| $O(n^k)$, $k > 1$ | Polynomial |
| $O(n^2)$ | Quadratic |
| $O(n^3)$ | Cubic |
| $O(c^n)$, $c > 1$ | Exponential |
| $O(n!)$ | Factorial |
| $O(n^n)$ | Exponential (base $n$) |

Convention used throughout: $\log n = \log_2 n$, and $\log^k n = (\log n)^k$.

The gap between polynomial and exponential classes is enormous in practice, not just theoretically. The dividing line between $O(n^k)$ and $O(c^n)$ is roughly the dividing line between "feasible for large inputs" and "feasible only for small inputs." This is why the class $\mathbf{P}$ (problems solvable in polynomial time) is the standard definition of tractability in complexity theory.
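
To make the gap concrete, here is a small sketch printing raw operation counts for a few growth rates (illustrative values only):

import math

# Raw operation counts: the exponential and factorial columns blow up long before
# the polynomial ones become uncomfortable.
print(f"{'n':>4} {'n log n':>10} {'n^3':>10} {'2^n':>18} {'n!':>12}")
for n in [10, 20, 30, 50]:
    print(f"{n:>4} {int(n * math.log2(n)):>10} {n**3:>10} {2**n:>18} {math.factorial(n):>12.2e}")

For $n = 50$, $n^3$ is 125,000, while $2^n$ already exceeds $10^{15}$ and $n!$ exceeds $10^{64}$.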


Computational Complexity

Having established the notation, we can now define complexity precisely.

Computational complexity of an algorithm. An algorithm $\mathcal{A}$ has computational complexity $O(f(n))$ with respect to a given resource if the amount of that resource required to run $\mathcal{A}$ on inputs of size $n$ is $O(f(n))$.

Computational complexity of a problem. A problem $\mathcal{P}$ has computational complexity $O(f(n))$ with respect to a given resource if the best algorithm solving $\mathcal{P}$ has that cost.

The distinction matters. An algorithm's complexity is a property of that specific algorithm. A problem's complexity is a property of the problem itself — a lower bound on what any algorithm solving it must cost. Proving a tight problem complexity requires both an algorithm achieving a given bound and a proof that no algorithm can do better.

The same definitions apply to all five asymptotic notations, not just Big-$O$.


Best, Worst, and Average Case

For a fixed input size $n$, different inputs of that size can cause the algorithm to behave very differently. This motivates three distinct modes of analysis.

Best case describes the algorithm's cost on the most favorable input of size $n$. It is a lower bound on the actual cost, but usually not very informative for algorithm design — the best case may be a rare or unrepresentative scenario.

Worst case describes the cost on the most unfavorable input of size $n$. This is typically the most important quantity: it is a guarantee that holds for every input, regardless of what the input happens to be. When designing algorithms, worst-case performance is the primary target.

Average case describes the expected cost under some probability distribution over inputs of size $n$. This is the most informative measure when inputs are drawn from a realistic distribution, but it requires specifying and justifying that distribution — a non-trivial assumption.

Example: linear search, which scans an unsorted array front to back until it finds $x$ or exhausts the array.

function linsearch(int A[1···n], int x) → int:
    for i = 1, ···, n do:
        if A[i] == x then:
            return i
    return −1

Best case. $x$ is the first element. The loop executes once and returns. Cost: $O(1)$.

Worst case. $x$ is the last element or is absent. The loop runs through all $n$ elements. Cost: $\Theta(n)$.

Average case. Assume uniform probability: $x$ is at position $i$ with probability $P_i = \frac{1}{n+1}$ for $i = 1, \ldots, n$, and $x$ is absent (treated as "position $n+1$") also with probability $\frac{1}{n+1}$. The cost of finding $x$ at position $i$ is $T_i = i$ (we inspect $i$ elements), and the cost of determining absence is $T_{n+1} = n+1$ (a full scan).

$$\text{Average cost} = \sum_{i=1}^{n+1} P_i T_i = \frac{1}{n+1} \sum_{i=1}^{n+1} i = \frac{1}{n+1} \cdot \frac{(n+1)(n+2)}{2} = \frac{n+2}{2} = \Theta(n)$$

In this case the best and worst cases differ, but the average and the worst case are both $\Theta(n)$.
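
A direct simulation reproduces the calculation (a minimal sketch; absence is counted as $n$ inspections here rather than $n+1$, so the two numbers agree only up to that off-by-one):

def linsearch_cost(A, x):
    # Number of elements inspected before linear search stops.
    for i, a in enumerate(A, start=1):
        if a == x:
            return i
    return len(A)

n = 1000
A = list(range(n))
targets = A + [n]                        # every present value once, plus one absent value
avg = sum(linsearch_cost(A, x) for x in targets) / len(targets)
print(avg, (n + 2) / 2)                  # ~501.0 vs 501.0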

Binary search requires a sorted array. Rather than scanning sequentially, it checks the midpoint and eliminates half the remaining search space at each step.

function binsearch(int A[1···n], int x) → int:
    i = 1, j = n
    while i ≤ j do:
        m = ⌊(i + j)/2⌋
        if A[m] == x then:
            return m
        else if A[m] < x then:
            i = m + 1
        else:
            j = m − 1
    return −1

Best case. $x$ equals the middle element $A[\lfloor (1+n)/2 \rfloor]$ on the first iteration. Cost: $O(1)$.

Worst case. $x$ is absent. Each iteration halves the search space. Let the maximum number of iterations be $i$. After $i$ iterations, the search space has size $n/2^{i-1}$. The loop terminates when this falls below 1:

$$\frac{n}{2^{i-1}} < 1 \implies n < 2^{i-1} \implies \log_2 n < i - 1 \implies i > \log_2 n + 1$$

The loop therefore executes $\Theta(\log n)$ iterations in the worst case, each doing $O(1)$ work. Worst-case cost: $\Theta(\log n)$.

Average case. Under uniform probability $P_i = 1/n$ (excluding the missing-element case to simplify), the cost for position $i$ depends on how many steps binary search takes to reach it. The structure of binary search on $n$ elements is a balanced binary tree: the root (midpoint) requires 1 step, its two children require 2 steps, their four children 3 steps, and so on. There is 1 element at depth 1, 2 at depth 2, 4 at depth 3, and $2^{k-1}$ at depth $k \leq \log_2 n$.

The average cost is:

$$T(n) \leq \frac{1}{n} \sum_{i=1}^{\log_2 n} i \cdot 2^{i-1}$$

To bound this sum, replace each $i$ with its maximum value $\log_2 n$:

$$\frac{1}{n} \sum_{i=1}^{\log_2 n} i \cdot 2^{i-1} \leq \frac{\log_2 n}{n} \sum_{i=1}^{\log_2 n} 2^i = \frac{\log_2 n}{n} \cdot \frac{2^{\log_2 n + 1} - 2}{2 - 1} = \frac{\log_2 n}{n} \cdot (2n - 2)$$

which is $O(\log n)$. The average case matches the worst case: $O(\log n)$.
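
An empirical check (sketch only) averages the iteration count of binary search over every present key and compares it with $\log_2 n$:

import math

def binsearch_iters(A, x):
    # Number of loop iterations binary search performs before stopping.
    i, j, iters = 0, len(A) - 1, 0
    while i <= j:
        iters += 1
        m = (i + j) // 2
        if A[m] == x:
            return iters
        elif A[m] < x:
            i = m + 1
        else:
            j = m - 1
    return iters

n = 1024
A = list(range(n))
print(sum(binsearch_iters(A, x) for x in A) / n, math.log2(n))   # ~9.0 vs 10.0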

Example: finding the minimum of an unsorted array.

function min(int A[1···n]) → int:
    m = 1
    for i = 2, ···, n do:
        if A[i] < A[m] then:
            m = i
    return m

The loop always runs exactly $n-1$ times with no early termination possible — you cannot know you have found the minimum without examining every element. Best, worst, and average cases all coincide: $\Theta(n)$.

[!NOTE] When best, worst, and average cases all give the same bound, that bound is tight in a strong sense: no input structure can help the algorithm, and none can hurt it. $\Theta$ notation is appropriate here, not just $O$.


Amortized Analysis

Best, worst, and average case analysis all measure the cost of a single operation. Amortized analysis is a different kind of question: what is the average cost per operation over a sequence of $n$ operations?

This is not the same as average-case analysis. Average case uses a probability distribution over inputs. Amortized analysis makes no probabilistic assumptions — it analyzes a concrete sequence of operations and gives a guaranteed average cost per operation in the worst case over all possible sequences.

Why is this needed? Some algorithms have operations that are cheap most of the time but occasionally expensive. Worst-case analysis of a single operation in isolation may be too pessimistic about the sequence as a whole — it imagines every operation hitting the expensive case, which may be structurally impossible. Amortized analysis captures the true cost more precisely.

The Binary Counter

Consider incrementing a $k$-bit binary counter stored in an array $A[1 \ldots k]$ (most significant bit at position 1).

function increment(int A[1···k]):
    i = k
    while i ≥ 1 and A[i] == 1 do:
        A[i] = 0
        i = i − 1
    if i ≥ 1 then:
        A[i] = 1

The logic: scan from the least significant bit, flipping 1s to 0s until you hit a 0, then flip that 0 to 1. This is standard binary addition.

Best case. The least significant bit is 0 — one flip suffices. Cost: $O(1)$.

Worst case. The counter holds $11\ldots1$ (all ones) — all $k$ bits flip. Cost: $O(k)$.

A naive worst-case bound for $n$ increments is therefore $O(nk)$. But is it really possible for every increment to flip all $k$ bits? No. After an all-ones counter is incremented, it resets to zero. The expensive operation cannot repeat immediately. The question is how to quantify this precisely.

Aggregate Method

The aggregate method computes the total cost of $n$ operations directly and divides by $n$.

Count how many times each bit position flips across $n$ increments. Bit position $k$ (the least significant) flips on every increment: $n$ times. Bit $k-1$ flips every 2 increments: $n/2$ times. Bit $k-j$ flips every $2^j$ increments: $n/2^j$ times. The total cost is:

$$\sum_{j=0}^{k-1} \frac{n}{2^j} = n \sum_{j=0}^{k-1} \frac{1}{2^j} \leq n \sum_{j=0}^{\infty} \frac{1}{2^j} = n \cdot \frac{1}{1 - 1/2} = 2n$$

The total cost of $n$ increments is $O(n)$. The amortized cost per operation is therefore $O(n)/n = O(1)$.

Note: the bound $\leq 2n$ holds even when $n > 2^k$. Beyond $2^k$ increments the counter simply overflows and wraps back to zero, and each bit position keeps flipping with the same frequency as before.
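
The aggregate bound is easy to confirm by brute force (a minimal sketch counting individual bit flips; the counter deliberately has fewer than $\log_2 n$ bits so that it wraps around):

def increment(A):
    # Increment a binary counter stored MSB-first in A; return the number of bit flips.
    flips, i = 0, len(A) - 1
    while i >= 0 and A[i] == 1:
        A[i] = 0
        flips += 1
        i -= 1
    if i >= 0:
        A[i] = 1
        flips += 1
    return flips

k, n = 8, 1000                            # n > 2^k, so the counter overflows and cycles
counter = [0] * k
total = sum(increment(counter) for _ in range(n))
print(total, "<=", 2 * n)                 # 1990 <= 2000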

Accounting Method

The accounting method works by assigning a fictitious "charge" to each operation — an amortized cost — and maintaining a credit balance to handle operations whose actual cost exceeds their charge.

The rules are: (1) each operation is charged its amortized cost; (2) if actual cost $<$ amortized cost, the difference is saved as credit; (3) if actual cost $>$ amortized cost, the excess is paid from credit; (4) the credit balance must never go negative.

If we can show that a valid amortized cost assignment exists with the credit never going negative, then the sum of amortized costs over $n$ operations is an upper bound on the total actual cost.

For the binary counter, assign an amortized cost of 2 units to every increment:

  • Spend 1 unit to pay for the actual bit flip from 0 to 1.
  • Save 1 unit as credit on that bit.

When a bit is subsequently flipped from 1 to 0, the flip is paid using the 1 unit of credit sitting on that bit. We never charge separately for flips to 0.

Why the credit is never negative. Every 1-bit in the array has exactly 1 unit of credit on it (deposited when it was set to 1, not yet spent). Since the number of 1-bits is always $\geq 0$, the total available credit is always $\geq 0$.

Total cost bound. Each increment sets at most one bit from 0 to 1, so the 2 units charged to it always cover its actual flips. For $n$ increments: total amortized cost $= 2n$. Since the credit never goes negative, the total actual cost $\leq 2n = O(n)$.

The amortized cost per operation is $O(1)$.
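
The argument can also be simulated directly: charge 2 units per increment, pay the actual flips out of the charge plus stored credit, and watch the balance (sketch only; the increment helper is redefined so the block runs on its own):

def increment(A):
    # MSB-first binary counter; returns the number of bit flips performed.
    flips, i = 0, len(A) - 1
    while i >= 0 and A[i] == 1:           # each reset to 0 is paid by that bit's credit
        A[i] = 0
        flips += 1
        i -= 1
    if i >= 0:                            # the single 0 -> 1 flip is paid by this charge
        A[i] = 1
        flips += 1
    return flips

counter, credit = [0] * 8, 0
for _ in range(1000):
    credit += 2 - increment(counter)      # charge 2 units, pay the actual cost
    assert credit >= 0                    # the balance never goes negative
    assert credit >= sum(counter)         # every 1-bit still holds its unit of credit
print("final credit:", credit, "(at least one unit per 1-bit)")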

When to Use Each Method

| Method | Best suited for |
| --- | --- |
| Aggregate | When the total cost over $n$ operations can be computed directly (e.g., counting bit flips across all operations) |
| Accounting | When operations are heterogeneous and assigning per-operation charges is more natural |

Both methods give the same conclusion in the binary counter example. The accounting method generalizes more easily when a data structure has different operation types with different costs (e.g., a dynamic array that occasionally resizes).
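
A sketch of that generalization (illustrative only): a doubling array whose append costs 1 normally but capacity + 1 when full, charged a flat 3 units per append: 1 for the write itself and 2 banked toward the next resize.

# Doubling dynamic array: a flat charge of 3 per append covers the occasional resize.
capacity, size, credit = 1, 0, 0
for _ in range(10000):
    actual = 1 if size < capacity else capacity + 1   # a resize copies `capacity` elements
    if size == capacity:
        capacity *= 2
    size += 1
    credit += 3 - actual
    assert credit >= 0                                # the accounting invariant holds
print("appends:", size, "capacity:", capacity, "leftover credit:", credit)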

[!NOTE] The amortized $O(1)$ cost of a counter increment does not mean each individual increment is cheap — single operations can still cost $O(k)$. It means that averaged over a long sequence, the cost per operation is constant. This is the guarantee amortized analysis provides, and it is often the guarantee that actually matters for designing efficient systems.


Summary

The five asymptotic notations provide a complete vocabulary for describing growth rates. $O$ bounds from above, $\Omega$ from below, $\Theta$ pins both sides simultaneously. The little variants $o$ and $\omega$ express strict dominance — asymptotically, one function eventually falls below, or rises above, every positive multiple of the other.

On top of this, computational complexity analysis has four distinct modes: best case, worst case, average case, and amortized. The worst case is the most generally useful — it gives an unconditional guarantee. Average case is more informative but requires a distributional assumption. Amortized analysis fills the remaining gap: it gives tight guarantees about sequences of operations without probabilistic assumptions, by accounting for the fact that expensive operations can only occur when previous cheap operations have "set up" the expensive state.

The binary counter is the canonical demonstration: a single operation can cost $O(k)$, but the amortized cost over any sequence is $O(1)$, because the state that causes an expensive operation always takes many cheap operations to recreate.