The previous lecture established that the same function can be computed by algorithms that differ by an exponential factor in efficiency. What we need now is a precise, machine-independent language for describing and comparing those differences. That language is asymptotic notation. This lecture also introduces a more careful taxonomy of what it means to measure the cost of an algorithm — distinguishing best, worst, and average case — and a fourth mode of analysis, amortized cost, which applies when we care about the cost of a sequence of operations rather than a single one.
We define a cost function $f : \mathbb{N} \to \mathbb{R}^{\ge 0}$ as the amount $f(n)$ of a given computational resource (running time or memory) required by some algorithm on inputs of size $n$.
Two design choices are embedded in this definition. First, we consider only non-negative input sizes and resource amounts — negative inputs and negative costs are not meaningful here. Second, the output is real-valued even though input size is an integer; this allows intermediate calculations to produce non-integer quantities before we classify them asymptotically.
What we want from $f(n)$ is its growth rate: how fast the resource usage scales as $n$ increases. Two things are deliberately ignored:

- multiplicative constants, which reflect the machine and the implementation rather than the algorithm;
- lower-order terms, which become negligible as $n$ grows.
This is not an approximation for convenience — it is the right abstraction. What determines whether an algorithm is usable in practice is the growth rate of its cost, not the exact constant.
There are five standard notations. Three are primary (Big-$O$, Big-$\Omega$, $\Theta$) and two are strict variants (little-$o$, little-$\omega$). Each defines a set of functions, not a single function. When we write $f(n) = O(g(n))$, the technically correct statement is $f(n) \in O(g(n))$ — we are saying that $f$ belongs to the class of functions bounded above by $g$ up to a constant factor. The equality notation is conventional abuse, and we use it throughout.
$O(g(n))$ is the set of functions whose order of growth is at most that of $g(n)$:

$$O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le f(n) \le c \cdot g(n) \text{ for all } n \ge n_0 \,\}.$$

The constants $c$ and $n_0$ exist but need not be unique — any valid pair witnesses membership in the set. Note that $g(n) \in O(g(n))$: a function is always an upper bound on itself.
Worked example. Prove $3n^2 + 10n = O(n^2)$, i.e., take $f(n) = 3n^2 + 10n$ and $g(n) = n^2$.

We need $c$ and $n_0$ such that $3n^2 + 10n \le c \cdot n^2$ for all $n \ge n_0$. Dividing through by $n^2$ (valid for $n \ge 1$):

$$c \ge 3 + \frac{10}{n}.$$

For $n \ge 1$, the right-hand side is at most $13$, so $c = 13$ and $n_0 = 1$ works. Alternatively, $c = 4$ and $n_0 = 10$ also works, since for $n \ge 10$ the term $10/n \le 1$ gives $3 + 10/n \le 4$. There is no unique witness — what matters is that one exists.
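The witnesses can be sanity-checked numerically (a finite-range check, not a proof; $3n^2 + 10n$ is the example function above):

```python
def f(n):
    return 3 * n**2 + 10 * n

def witness_holds(c, n0, upto=10_000):
    """Check f(n) <= c * n^2 for every n in [n0, upto)."""
    return all(f(n) <= c * n * n for n in range(n0, upto))

print(witness_holds(13, 1))    # c = 13 works from n0 = 1 onward
print(witness_holds(4, 10))    # c = 4 works from n0 = 10 onward
print(f(9) <= 4 * 9 * 9)       # but c = 4 fails below n0: f(9) = 333 > 324
```

Note the last line: a witness pair is only required to hold from $n_0$ onward, not everywhere.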
$\Omega(g(n))$ is the set of functions whose order of growth is at least that of $g(n)$. It is the mirror of Big-$O$: where Big-$O$ says the function does not grow faster than $g(n)$, Big-$\Omega$ says it does not grow slower.
Worked example. Prove $3n^2 + 10n = \Omega(n^2)$.

We need $c$ and $n_0$ such that $3n^2 + 10n \ge c \cdot n^2$ for all $n \ge n_0$. Dividing by $n^2$:

$$c \le 3 + \frac{10}{n}.$$

For $n \ge 1$, the right-hand side is always $\ge 3$, so any $c \le 3$ and $n_0 = 1$ works. We can also choose $c = 1$ with $n_0 = 1$, since for $n \ge 1$ we have $3 + 10/n \ge 1$.
$\Theta(g(n))$ is the set of functions that are asymptotically equivalent to $g(n)$: squeezed between two constant multiples $c_1 g(n)$ and $c_2 g(n)$ for all sufficiently large $n$. This is the tightest classification — it gives both an upper and lower bound simultaneously.
Key theorem. $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

This means you can prove a $\Theta$ bound by proving an $O$ bound and an $\Omega$ bound separately.
Worked example. Prove $3n^2 + 10n = \Theta(n^2)$.

We need $c_1$, $c_2$, and $n_0$ such that $c_1 n^2 \le 3n^2 + 10n \le c_2 n^2$ for all $n \ge n_0$. Dividing through by $n^2$:

$$c_1 \le 3 + \frac{10}{n} \le c_2.$$

For $n \ge 1$, the middle expression satisfies $3 \le 3 + 10/n \le 13$, so $c_1 = 3$, $c_2 = 13$, $n_0 = 1$ works.
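The two-sided witnesses can be checked the same way (again a finite-range sanity check for $3n^2 + 10n$, not a proof):

```python
def f(n):
    return 3 * n**2 + 10 * n

# c1 = 3, c2 = 13, n0 = 1: verify 3n^2 <= f(n) <= 13n^2 over a finite range
ok = all(3 * n * n <= f(n) <= 13 * n * n for n in range(1, 10_000))
print(ok)   # True
```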
The critical difference from Big-$O$: the quantifier on $c$ flips. Big-$O$ requires the bound $f(n) \le c \cdot g(n)$ to hold for some $c > 0$. Little-$o$ requires it to hold for all $c > 0$. This means $f(n)$ must eventually be crushed below any positive multiple of $g(n)$, no matter how small — $g$ grows strictly faster than $f$.
A direct consequence: $f(n) \ne o(f(n))$ for any $f$. A function cannot dominate itself strictly. In contrast, $f(n) = O(f(n))$ is always true. So $n = O(n)$ but $n \ne o(n)$, while $n = o(n^2)$.
If $f(n) = o(g(n))$ then certainly $f(n) = O(g(n))$. The converse does not hold in general.
The strict analog of Big-$\Omega$: $f(n)$ must eventually exceed every positive multiple of $g(n)$. Again, $f(n) \ne \omega(f(n))$ for any $f$, and $f(n) = \omega(g(n))$ implies $f(n) = \Omega(g(n))$.
The five notations can be identified via limits, which is often the most direct way to classify a given pair of functions:

- if $\lim_{n\to\infty} f(n)/g(n) = 0$, then $f(n) = o(g(n))$;
- if $\lim_{n\to\infty} f(n)/g(n) < \infty$, then $f(n) = O(g(n))$;
- if $\lim_{n\to\infty} f(n)/g(n) = c$ with $0 < c < \infty$, then $f(n) = \Theta(g(n))$;
- if $\lim_{n\to\infty} f(n)/g(n) > 0$ (possibly $\infty$), then $f(n) = \Omega(g(n))$;
- if $\lim_{n\to\infty} f(n)/g(n) = \infty$, then $f(n) = \omega(g(n))$.

(These are sufficient conditions: the limit may fail to exist even when an asymptotic bound holds.)
The limit characterization is a practical tool. To show $\log n = o(n)$, compute $\lim_{n\to\infty} \frac{\log n}{n} = 0$ (by L'Hôpital or standard results). Done.
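The convergence can also be watched numerically — here the ratio $\log n / n$ from the example, shrinking toward 0 as $n$ grows:

```python
import math

# f(n)/g(n) = log(n)/n for increasing n: the ratio tends to 0
ratios = [math.log(n) / n for n in (10, 100, 1000, 10**6)]
print(ratios)   # strictly decreasing toward 0
```

This is evidence, not a proof — only the limit argument establishes $\log n = o(n)$.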
The five notations satisfy a collection of algebraic properties. These are not incidental — they encode the behavior of the ordering on growth rates.
Transitivity holds for all five notations. For example: if $f(n) = O(g(n))$ and $g(n) = O(h(n))$, then $f(n) = O(h(n))$. The ordering on growth rates is therefore transitive, as expected.
Reflexivity holds for $O$, $\Omega$, and $\Theta$ — a function is its own upper bound, lower bound, and tight bound. It does not hold for $o$ or $\omega$, since those are strict relations.
Symmetry is unique to $\Theta$: $f(n) = \Theta(g(n))$ if and only if $g(n) = \Theta(f(n))$. Two functions in the same asymptotic class are mutually equivalent.
Transpose symmetry links $O$ to $\Omega$ and $o$ to $\omega$:

$$f(n) = O(g(n)) \iff g(n) = \Omega(f(n)), \qquad f(n) = o(g(n)) \iff g(n) = \omega(f(n)).$$

This makes sense: if $f$ is upper-bounded by $g$, then $g$ is lower-bounded by $f$.
Three rules make it easy to compose asymptotic bounds:
Sum rule. If $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) + f_2(n) = O(g_1(n) + g_2(n))$. In practice, since $O(g_1(n) + g_2(n)) = O(\max(g_1(n), g_2(n)))$, the complexity of a sum is dominated by the larger term.
Product rule. If $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) \cdot f_2(n) = O(g_1(n) \cdot g_2(n))$.
Constant multiplication. If $f(n) = O(g(n))$ and $a > 0$ is a constant, then $a \cdot f(n) = O(g(n))$. Multiplying by a positive constant does not change the asymptotic class — constants are absorbed.
These rules also hold for $\Omega$, $\Theta$, $o$, and $\omega$, with appropriate care.
Asymptotic notation can appear inside expressions. The notation $2n^2 + O(n)$ means "there exists some function $h(n) \in O(n)$ such that the expression equals $2n^2 + h(n)$." The statement $2n^2 + O(n) = O(n^2)$ asserts that for any such $h(n)$, the resulting sum is in $O(n^2)$. This use is legitimate — it lets us suppress explicit intermediate terms while preserving the asymptotic claim.
Unlike the real numbers, asymptotic growth rates are not always comparable. For real numbers $a$ and $b$, exactly one of $a < b$, $a = b$, $a > b$ holds. For functions this is not guaranteed. The pair $f(n) = n$ and $g(n) = n^{1 + \sin n}$ is a concrete counterexample: when $\sin n = 1$ we get $g(n) = n^2$, so $g$ dominates $f$; when $\sin n = -1$ we get $g(n) = n^0 = 1$, so $f$ dominates $g$. Neither $f(n) = O(g(n))$ nor $f(n) = \Omega(g(n))$ holds, and no $\Theta$ relationship exists. Such pairs are incomparable under asymptotic ordering.
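Taking the classic incomparable pair $f(n) = n$, $g(n) = n^{1+\sin n}$, the ratio $g(n)/f(n) = n^{\sin n}$ can be sampled to watch it swing; the sample points 11 and 33 are chosen because they happen to land near $\sin n \approx -1$ and $\sin n \approx +1$ respectively:

```python
import math

def ratio(n):
    # g(n)/f(n) = n^(1 + sin n) / n = n^(sin n)
    return n ** math.sin(n)

print(ratio(11))   # sin(11) ≈ -1: ratio is tiny, f dominates here
print(ratio(33))   # sin(33) ≈ +1: ratio is large, g dominates here
```

Since the ratio keeps returning to both extremes as $n$ grows, no single inequality can hold for all large $n$.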
The following table lists the standard complexity classes in increasing order of growth rate. Successive rows grow strictly faster (each row is $\omega$ of the rows above it, with the parameterized families overlapping their listed special cases).

| Class | Name |
|---|---|
| $O(1)$ | Constant |
| $O(\log n)$ | Logarithmic |
| $O(\log^c n)$, $c > 1$ | Polylogarithmic |
| $O(n^c)$, $0 < c < 1$ | Sublinear |
| $O(n)$ | Linear |
| $O(n \log n)$ | Pseudolinear |
| $O(n^c)$, $c > 1$ | Polynomial |
| $O(n^2)$ | Quadratic |
| $O(n^3)$ | Cubic |
| $O(c^n)$, $c > 1$ | Exponential |
| $O(n!)$ | Factorial |
| $O(n^n)$ | Exponential (base $n$) |

Convention used throughout: $\log n$ denotes $\log_2 n$, and $\log^c n$ denotes $(\log n)^c$.
The gap between polynomial and exponential classes is enormous in practice, not just theoretically. The dividing line between $O(n^c)$ and $O(c^n)$ is roughly the dividing line between "feasible for large inputs" and "feasible only for small inputs." This is why the class $\mathrm{P}$ (problems solvable in polynomial time) is the standard definition of tractability in complexity theory.
Having established the notation, we can now define complexity precisely.
Computational complexity of an algorithm. An algorithm $A$ has computational complexity $O(f(n))$ with respect to a given resource if the amount of that resource required to run $A$ on inputs of size $n$ is $O(f(n))$.

Computational complexity of a problem. A problem $P$ has computational complexity $O(f(n))$ with respect to a given resource if the best algorithm solving $P$ has that cost.
The distinction matters. An algorithm's complexity is a property of that specific algorithm. A problem's complexity is a property of the problem itself — a lower bound on what any algorithm solving it must cost. Proving a tight problem complexity requires both an algorithm achieving a given bound and a proof that no algorithm can do better.
The same definitions apply to all five asymptotic notations, not just Big-$O$.
For a fixed input size $n$, different inputs of that size can cause the algorithm to behave very differently. This motivates three distinct modes of analysis.
Best case describes the algorithm's cost on the most favorable input of size $n$. It is a lower bound on the actual cost, but usually not very informative for algorithm design — the best case may be a rare or unrepresentative scenario.

Worst case describes the cost on the most unfavorable input of size $n$. This is typically the most important quantity: it is a guarantee that holds for every input, regardless of what the input happens to be. When designing algorithms, worst-case performance is the primary target.

Average case describes the expected cost under some probability distribution over inputs of size $n$. This is the most informative measure when inputs are drawn from a realistic distribution, but it requires specifying and justifying that distribution — a non-trivial assumption.
```
function linsearch(int A[1···n], int x) → int:
    for i = 1, ···, n do:
        if A[i] == x then:
            return i
    return −1
```
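A direct Python port of the pseudocode (0-based indexing, unlike the 1-based pseudocode):

```python
def linsearch(A, x):
    """Return the first index of x in A, or -1 if x is absent."""
    for i in range(len(A)):
        if A[i] == x:
            return i
    return -1

print(linsearch([4, 2, 7, 2], 7))   # found at index 2
print(linsearch([4, 2, 7, 2], 5))   # absent: -1
```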
Best case. $x$ is the first element. The loop executes once and returns. Cost: $\Theta(1)$.

Worst case. $x$ is the last element or is absent. The loop runs through all $n$ elements. Cost: $\Theta(n)$.
Average case. Assume uniform probability: $x$ is at position $i$ with probability $\frac{1}{n+1}$ for $i = 1, \dots, n$, and is absent (treated as "position $n+1$") also with probability $\frac{1}{n+1}$. The cost of finding $x$ at position $i$ is $i$ (we inspect $i$ elements), and the cost of determining absence is $n$ (a full scan). The expected cost is

$$\frac{1}{n+1}\left(\sum_{i=1}^{n} i + n\right) = \frac{1}{n+1}\left(\frac{n(n+1)}{2} + n\right) = \frac{n}{2} + \frac{n}{n+1} = \Theta(n).$$

In this case best and worst cases differ, but the average and worst are both $\Theta(n)$.
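Under the uniform model above, the exact expected cost can be computed directly; it grows like $n/2$, i.e. linearly:

```python
def expected_cost(n):
    # position i (1..n) costs i comparisons; absence costs n;
    # each of the n + 1 outcomes has probability 1/(n + 1)
    return (sum(range(1, n + 1)) + n) / (n + 1)

print(expected_cost(10))            # 65/11 ≈ 5.909
print(expected_cost(1000) / 1000)   # ≈ 0.5: the average scales as n/2
```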
Binary search requires a sorted array. Rather than scanning sequentially, it checks the midpoint and eliminates half the remaining search space at each step.
```
function binsearch(int A[1···n], int x) → int:
    i = 1, j = n
    while i ≤ j do:
        m = ⌊(i + j)/2⌋
        if A[m] == x then:
            return m
        else if A[m] < x then:
            i = m + 1
        else:
            j = m − 1
    return −1
```
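A Python port (0-based indices; `//` is Python's floor division, matching the floored midpoint):

```python
def binsearch(A, x):
    """A must be sorted ascending. Return an index of x, or -1 if absent."""
    i, j = 0, len(A) - 1
    while i <= j:
        m = (i + j) // 2      # floored midpoint of the current range
        if A[m] == x:
            return m
        elif A[m] < x:
            i = m + 1         # discard the left half
        else:
            j = m - 1         # discard the right half
    return -1

print(binsearch([1, 3, 5, 7, 9], 7))   # index 3
print(binsearch([1, 3, 5, 7, 9], 4))   # absent: -1
```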
Best case. $x$ equals the middle element on the first iteration. Cost: $\Theta(1)$.
Worst case. $x$ is absent. Each iteration halves the search space. Let the maximum number of iterations be $t$. After $t$ iterations, the search space has size $n/2^t$. The loop terminates when this falls below 1:

$$\frac{n}{2^t} < 1 \iff 2^t > n \iff t > \log_2 n.$$

The loop therefore executes at most $\lfloor \log_2 n \rfloor + 1$ iterations, each doing $\Theta(1)$ work. Worst-case cost: $\Theta(\log n)$.
Average case. Under uniform probability (excluding the missing-element case to simplify), the cost for position $i$ depends on how many steps binary search takes to reach it. The structure of binary search on $n$ elements is a balanced binary tree: the root (midpoint) requires 1 step, its two children require 2 steps, their four children 3 steps, and so on. There is 1 element at depth 1, 2 at depth 2, 4 at depth 3, and $2^{d-1}$ at depth $d$.
The average cost is:

$$\bar{T}(n) = \frac{1}{n} \sum_{d=1}^{k} d \cdot 2^{d-1}, \qquad k = \log_2(n+1),$$

taking $n = 2^k - 1$ so the tree is perfect. To bound this sum, replace each $d$ with its maximum value $k$:

$$\bar{T}(n) \le \frac{k}{n} \sum_{d=1}^{k} 2^{d-1} = \frac{k}{n}\left(2^k - 1\right) = k = \log_2(n+1),$$

which is $O(\log n)$. The average case matches the worst case: $\Theta(\log n)$.
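The averaged depth sum can be evaluated exactly for perfect trees ($n = 2^k - 1$), confirming that the average stays within a small constant of the worst-case depth $k$:

```python
def avg_depth(k):
    """Average search depth over a perfect tree with n = 2^k - 1 nodes."""
    n = 2**k - 1
    # 2^(d-1) nodes sit at depth d, each costing d comparisons to reach
    total = sum(d * 2**(d - 1) for d in range(1, k + 1))
    return total / n

for k in (3, 10, 20):
    print(k, avg_depth(k))   # the average is between k - 2 and k
```

Most nodes of a balanced tree sit in the bottom levels, which is why the average is so close to the maximum depth.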
```
function min(int A[1···n]) → int:
    m = 1
    for i = 2, ···, n do:
        if A[i] < A[m] then:
            m = i
    return m
```
The loop always runs exactly $n - 1$ times with no early termination possible — you cannot know you have found the minimum without examining every element. Best, worst, and average cases all coincide: $\Theta(n)$.
> [!NOTE]
> When best, worst, and average cases all give the same bound, that bound is tight in a strong sense: no input structure can help the algorithm, and none can hurt it. $\Theta$ notation is appropriate here, not just $O$.
Best, worst, and average case analysis all measure the cost of a single operation. Amortized analysis is a different kind of question: what is the average cost per operation over a sequence of operations?
This is not the same as average-case analysis. Average case uses a probability distribution over inputs. Amortized analysis makes no probabilistic assumptions — it analyzes a concrete sequence of operations and gives a guaranteed average cost per operation in the worst case over all possible sequences.
Why is this needed? Some algorithms have operations that are cheap most of the time but occasionally expensive. Worst-case analysis of a single operation in isolation may be too pessimistic about the sequence as a whole — it imagines every operation hitting the expensive case, which may be structurally impossible. Amortized analysis captures the true cost more precisely.
Consider incrementing a $k$-bit binary counter stored in an array $A[1···k]$ (most significant bit at position 1).
```
function increment(int A[1···k]):
    i = k
    while i ≥ 1 and A[i] == 1 do:
        A[i] = 0
        i = i − 1
    if i ≥ 1 then:
        A[i] = 1
```
The logic: scan from the least significant bit, flipping 1s to 0s until you hit a 0, then flip that 0 to 1. This is standard binary addition.
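A Python port, using a list whose last element is the least significant bit (the 0-based analogue of the pseudocode):

```python
def increment(A):
    """Add 1 to the binary counter A in place; A[-1] is the least significant bit."""
    i = len(A) - 1
    while i >= 0 and A[i] == 1:
        A[i] = 0          # flip trailing 1s to 0
        i -= 1
    if i >= 0:
        A[i] = 1          # flip the first 0 encountered
    # if i < 0 the counter was all ones: it overflows to all zeros

A = [0, 0, 0]
increment(A); print(A)   # [0, 0, 1]
increment(A); print(A)   # [0, 1, 0]
```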
Best case. The least significant bit is 0 — one flip suffices. Cost: $\Theta(1)$.

Worst case. The counter holds $2^k - 1$ (all ones) — all $k$ bits flip. Cost: $\Theta(k)$.

A naive worst-case bound for $n$ increments is therefore $O(nk)$. But is it really possible for every increment to flip all bits? No. After an all-ones counter is incremented, it resets to zero. The expensive operation cannot repeat immediately. The question is how to quantify this precisely.
The aggregate method computes the total cost of $n$ operations directly and divides by $n$.
Count how many times each bit position flips across $n$ increments. Bit position $k$ (the least significant) flips on every increment: $n$ times. Bit $k-1$ flips every 2 increments: $\lfloor n/2 \rfloor$ times. Bit $k-i$ flips every $2^i$ increments: $\lfloor n/2^i \rfloor$ times. The total cost is:

$$\sum_{i=0}^{k-1} \left\lfloor \frac{n}{2^i} \right\rfloor < n \sum_{i=0}^{\infty} \frac{1}{2^i} = 2n.$$

The total cost of $n$ increments is less than $2n = O(n)$. The amortized cost per operation is therefore $O(n)/n = O(1)$.
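The $2n$ aggregate bound can be confirmed by counting every flip directly (an empirical check of the sum; the width $k = 32$ is an arbitrary choice large enough to avoid overflow here):

```python
def total_flips(n, k=32):
    """Count every bit flip performed over n increments of a k-bit counter."""
    A = [0] * k
    flips = 0
    for _ in range(n):
        i = k - 1
        while i >= 0 and A[i] == 1:
            A[i] = 0
            flips += 1
            i -= 1
        if i >= 0:
            A[i] = 1
            flips += 1
    return flips

for n in (1, 100, 10_000):
    print(n, total_flips(n), 2 * n)   # the total always stays below 2n
```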
Note: the bound holds regardless of how $n$ and $k$ compare. If $n > 2^k$, the counter simply overflows and cycles back through the same flip pattern; and a position $k-i$ with $2^i > n$ is never flipped at all, contributing zero to the sum.
The accounting method works by assigning a fictitious "charge" to each operation — an amortized cost — and maintaining a credit balance to handle operations whose actual cost exceeds their charge.
The rules are: (1) each operation is charged its amortized cost; (2) if the actual cost is less than the amortized cost, the difference is saved as credit; (3) if the actual cost exceeds the amortized cost, the excess is paid from credit; (4) the credit balance must never go negative.

If we can show that a valid amortized cost assignment exists with the credit never going negative, then the sum of amortized costs over $n$ operations is an upper bound on the total actual cost.
For the binary counter, assign an amortized cost of 2 units to every increment: 1 unit pays for the single $0 \to 1$ flip the increment performs, and the remaining 1 unit is deposited as credit on the bit that was just set.

When that bit is later flipped from 1 to 0, the flip is paid using the 1 unit of credit sitting on it. We never charge separately for flips to 0.
Why the credit is never negative. Every 1-bit in the array has exactly 1 unit of credit on it (deposited when it was set to 1, not yet spent). Since the number of 1-bits is always $\ge 0$, the total available credit is always $\ge 0$.
Total cost bound. Each increment sets at most one bit from 0 to 1 and is charged exactly 2 amortized units. For $n$ increments: total amortized cost $= 2n$. Since the credit never goes negative, the total actual cost is $\le 2n = O(n)$.

The amortized cost per operation is $2 = O(1)$.
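The accounting argument can be simulated directly: charge 2 units per increment, spend 1 unit per flip, and check that the credit never goes negative — in fact, before any overflow, the credit equals the number of 1-bits:

```python
def simulate(n, k=32):
    """Run n increments with the 2-unit charging scheme; return the final credit."""
    A = [0] * k
    credit = 0
    for _ in range(n):
        credit += 2                # amortized charge for this increment
        i = k - 1
        while i >= 0 and A[i] == 1:
            A[i] = 0
            credit -= 1            # each 1 -> 0 flip is paid from stored credit
            i -= 1
        if i >= 0:
            A[i] = 1
            credit -= 1            # the single 0 -> 1 flip is paid directly
        assert credit == sum(A)    # invariant (no overflow): credit = number of 1-bits
    return credit

print(simulate(1000))   # final credit = popcount(1000) = 6
```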
| Method | Best suited for |
|---|---|
| Aggregate | When total cost over $n$ operations can be computed directly (e.g., counting bit flips across all operations) |
| Accounting | When operations are heterogeneous and assigning per-operation charges is more natural |
Both methods give the same conclusion in the binary counter example. The accounting method generalizes more easily when a data structure has different operation types with different costs (e.g., a dynamic array that occasionally resizes).
> [!NOTE]
> The $O(1)$ amortized cost of a counter increment does not mean each individual increment is cheap — single operations can still cost $\Theta(k)$. It means that averaged over a long sequence, the cost per operation is constant. This is the guarantee amortized analysis provides, and it is often the guarantee that actually matters for designing efficient systems.
The five asymptotic notations provide a complete vocabulary for describing growth rates. $O$ bounds from above, $\Omega$ from below, $\Theta$ pins both sides simultaneously. The little variants $o$ and $\omega$ express strict dominance — asymptotically, one function is strictly absorbed into or strictly exceeds any multiple of the other.
On top of this, computational complexity analysis has four distinct modes: best case, worst case, average case, and amortized. The worst case is the most generally useful — it gives an unconditional guarantee. Average case is more informative but requires a distributional assumption. Amortized analysis fills the remaining gap: it gives tight guarantees about sequences of operations without probabilistic assumptions, by accounting for the fact that expensive operations can only occur when previous cheap operations have "set up" the expensive state.
The binary counter is the canonical demonstration: a single operation can cost $\Theta(k)$, but the amortized cost over any sequence is $O(1)$, because the state that causes an expensive operation always takes many cheap operations to recreate.