Lecture · L02

Asymptotic Notation and Computational Complexity


The previous lecture established that the same function can be computed by algorithms that differ by an exponential factor in efficiency. What we need now is a precise, machine-independent language for describing and comparing those differences. That language is asymptotic notation. This lecture also introduces a more careful taxonomy of what it means to measure the cost of an algorithm — distinguishing best, worst, and average case — and a fourth mode of analysis, amortized cost, which applies when we care about the cost of a sequence of operations rather than a single one.


The Cost Function

We define a cost function $f(n) \geq 0$ as the amount of a given computational resource (running time or memory) required by some algorithm on inputs of size $n$, where $n \geq 0$.

Two design choices are embedded in this definition. First, we consider only non-negative input sizes and resource amounts — negative inputs and negative costs are not meaningful here. Second, the output $f(n)$ is real-valued even though input size is an integer; this allows intermediate calculations to produce non-integer quantities before we classify them asymptotically.

What we want from $f(n)$ is its growth rate: how fast the resource usage scales as $n$ increases. Two things are deliberately ignored:

  • Constant factors. An algorithm that does $3n$ operations and one that does $100n$ operations both scale linearly. The factor 3 vs 100 is machine-dependent, compiler-dependent, and irrelevant to the asymptotic class.
  • Low-order terms. In $f(n) = n^2 + 10n + 5$, the $10n + 5$ term becomes negligible relative to $n^2$ as $n$ grows. For large $n$, only the dominant term matters.

This is not an approximation for convenience — it is the right abstraction. What determines whether an algorithm is usable in practice is the growth rate of its cost, not the exact constant.
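
To see this numerically, here is a minimal sketch (Python, illustrative only) showing how the low-order terms of $f(n) = n^2 + 10n + 5$ vanish relative to the dominant $n^2$ term:

# Ratio of f(n) = n^2 + 10n + 5 to its dominant term n^2.
# As n grows, the ratio approaches 1: the constant 5 and the 10n term stop mattering.
def f(n):
    return n**2 + 10 * n + 5

for n in [10, 100, 1000, 10000]:
    print(n, f(n) / n**2)
# 10     2.05
# 100    1.1005
# 1000   1.010005
# 10000  1.00100005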


Asymptotic Notation

There are five standard notations. Three are primary (Big-$O$, Big-$\Omega$, $\Theta$) and two are strict variants (little-$o$, little-$\omega$). Each defines a set of functions, not a single function. When we write $f(n) = O(g(n))$, the technically correct statement is $f(n) \in O(g(n))$ — we are saying that $f$ belongs to the class of functions bounded above by $g$ up to a constant. The equality notation is conventional abuse, and we use it throughout.

Big-$O$ (Upper Bound)

$$O(g(n)) = \{ f(n) \mid \exists\, c > 0,\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) \leq c\,g(n) \}$$

$O(g(n))$ is the set of functions whose order of growth is at most that of $g(n)$. The constants $c$ and $n_0$ exist but need not be unique — any valid pair witnesses membership in the set. Note that $g(n) = O(g(n))$: a function is always an upper bound on itself.

Worked example. Prove $f(n) = 3n^2 + 10n = O(n^2)$, i.e., take $g(n) = n^2$.

We need $c > 0$ and $n_0 \geq 0$ such that $3n^2 + 10n \leq cn^2$ for all $n \geq n_0$. Dividing through by $n^2$ (valid for $n > 0$):

$$c \geq \frac{3n^2 + 10n}{n^2} = 3 + \frac{10}{n}$$

For $n \geq 1$, the right-hand side is at most $3 + 10 = 13$, so any $c \geq 13$ and $n_0 = 1$ works. Alternatively, $c = 4$ and $n_0 = 10$ also works, since for $n \geq 10$ the term $10/n \leq 1$ gives $3 + 10/n \leq 4$. There is no unique witness — what matters is that one exists.
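
The witnesses can be sanity-checked numerically; the sketch below (illustrative only, over a finite range) verifies both pairs:

# Check 3n^2 + 10n <= c*n^2 for the two witness pairs found above.
def f(n):
    return 3 * n**2 + 10 * n

assert all(f(n) <= 13 * n**2 for n in range(1, 100000))   # c = 13, n0 = 1
assert all(f(n) <= 4 * n**2 for n in range(10, 100000))   # c = 4,  n0 = 10
print("both witness pairs hold on the tested range")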

Big-$\Omega$ (Lower Bound)

$$\Omega(g(n)) = \{ f(n) \mid \exists\, c > 0,\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) \geq c\,g(n) \}$$

$\Omega(g(n))$ is the set of functions whose order of growth is at least that of $g(n)$. It is the mirror of Big-$O$: where Big-$O$ says the function does not grow faster than $g$, Big-$\Omega$ says it does not grow slower.

Worked example. Prove $f(n) = n^3 + 2n^2 = \Omega(n^2)$.

We need $c > 0$ and $n_0 \geq 0$ such that $n^3 + 2n^2 \geq cn^2$ for all $n \geq n_0$. Dividing by $n^2$:

$$c \leq \frac{n^3 + 2n^2}{n^2} = n + 2$$

For $n \geq 0$, the right-hand side is always $\geq 2$, so any $c \leq 2$ and $n_0 = 0$ works. We can also choose $c = 12$ with $n_0 = 10$, since for $n \geq 10$ we have $n + 2 \geq 12$.

$\Theta$ (Tight Bound)

$$\Theta(g(n)) = \{ f(n) \mid \exists\, c_1, c_2 > 0,\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, c_1\,g(n) \leq f(n) \leq c_2\,g(n) \}$$

$\Theta(g(n))$ is the set of functions that are asymptotically equivalent to $g(n)$: squeezed between two constant multiples of $g$ for all sufficiently large $n$. This is the tightest classification — it gives both an upper and lower bound simultaneously.

Key theorem. $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

This means you can prove a $\Theta$ bound by proving both an $O$ and an $\Omega$ bound separately.

Worked example. Prove $f(n) = n^3 + 2n^2 = \Theta(n^3)$.

We need $c_1, c_2 > 0$ and $n_0 \geq 0$ such that $c_1 n^3 \leq n^3 + 2n^2 \leq c_2 n^3$ for all $n \geq n_0$. Dividing through by $n^3$:

$$c_1 \leq 1 + \frac{2}{n} \leq c_2$$

For $n \geq 1$, the middle expression satisfies $1 \leq 1 + 2/n \leq 3$, so $c_1 = 1$, $c_2 = 3$, $n_0 = 1$ works.

Little-$o$ (Strict Upper Bound)

$$o(g(n)) = \{ f(n) \mid \forall\, c > 0,\, \exists\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) < c\,g(n) \}$$

The critical difference from Big-$O$: the quantifier on $c$ flips. Big-$O$ requires the bound to hold for some $c > 0$. Little-$o$ requires it to hold for all $c > 0$. This means $f$ must eventually be crushed below any positive multiple of $g$, no matter how small — $g$ grows strictly faster than $f$.

A direct consequence: $g(n) \neq o(g(n))$ for any $g$. A function cannot dominate itself strictly. In contrast, $g(n) = O(g(n))$ is always true. So $n^2 = O(n^2)$ but $n^2 \neq o(n^2)$, while $n = o(n^2)$.

If $f(n) = o(g(n))$ then certainly $f(n) = O(g(n))$. The converse does not hold in general.

Little-$\omega$ (Strict Lower Bound)

$$\omega(g(n)) = \{ f(n) \mid \forall\, c > 0,\, \exists\, n_0 \geq 0 \text{ such that } \forall n \geq n_0,\, f(n) > c\,g(n) \}$$

The strict analog of Big-$\Omega$: $f$ must eventually exceed every positive multiple of $g$. Again, $g(n) \neq \omega(g(n))$ for any $g$, and $f(n) = \omega(g(n))$ implies $f(n) = \Omega(g(n))$.


Connecting to Limits

The five notations can be identified via limits, which is often the most direct way to classify a given pair of functions:

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \implies f(n) = o(g(n)) \implies f(n) = O(g(n))$$

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty \implies f(n) = \omega(g(n)) \implies f(n) = \Omega(g(n))$$

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = k > 0 \implies f(n) = \Theta(g(n))$$

The limit characterization is a practical tool. To show $n \log n = o(n^2)$, compute $\lim_{n \to \infty} n \log n / n^2 = \lim_{n \to \infty} \log n / n = 0$ (by L'Hôpital or standard results). Done.
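
The same limit can be checked symbolically, for instance with SymPy (shown only as an illustration; any computer algebra system will do):

# Symbolic check that lim n*log(n)/n^2 = 0, i.e. n log n = o(n^2).
from sympy import symbols, log, limit, oo

n = symbols('n', positive=True)
print(limit(n * log(n) / n**2, n, oo))   # prints 0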


Properties and Rules

Structural Properties

The five notations satisfy a collection of algebraic properties. These are not incidental — they encode the behavior of the $\leq, <, =, >, \geq$ ordering on growth rates.

Transitivity holds for all five notations. For example: if $f(n) = O(g(n))$ and $g(n) = O(h(n))$, then $f(n) = O(h(n))$, just as $\leq$ is transitive on the reals.

Reflexivity holds for $O$, $\Omega$, and $\Theta$ — a function is its own upper bound, lower bound, and tight bound. It does not hold for $o$ or $\omega$, since those are strict relations.

Symmetry is unique to $\Theta$: $f(n) = \Theta(g(n)) \iff g(n) = \Theta(f(n))$. Two functions in the same asymptotic class are mutually equivalent.

Transpose symmetry links $O$ to $\Omega$ and $o$ to $\omega$:

$$f(n) = O(g(n)) \iff g(n) = \Omega(f(n))$$
$$f(n) = o(g(n)) \iff g(n) = \omega(f(n))$$

This makes sense: if $f$ is upper-bounded by $g$, then $g$ is lower-bounded by $f$.

Computation Rules

Three rules make it easy to compose asymptotic bounds:

Sum rule. If $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) + f_2(n) = O(g_1(n) + g_2(n))$. In practice, since $O(g_1 + g_2) = O(\max(g_1, g_2))$, the complexity of a sum is dominated by the larger term.

Product rule. If $f_1(n) = O(g_1(n))$ and $f_2(n) = O(g_2(n))$, then $f_1(n) \cdot f_2(n) = O(g_1(n) \cdot g_2(n))$.

Constant multiplication. If $f(n) = O(g(n))$ and $a > 0$, then $a \cdot f(n) = O(g(n))$. Multiplying by a positive constant does not change the asymptotic class — constants are absorbed.

These rules also hold for $\Omega$, $\Theta$, $o$, and $\omega$, with appropriate care.
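
A concrete reading of the sum and product rules, as a minimal sketch: sequential loops add their operation counts, nested loops multiply them.

# Sequential loops add their costs (sum rule); nested loops multiply them (product rule).
def sequential(n):
    ops = 0
    for _ in range(n):        # Theta(n) block
        ops += 1
    for _ in range(n * n):    # Theta(n^2) block
        ops += 1
    return ops                # n + n^2 operations: O(n^2), the larger term dominates

def nested(n):
    ops = 0
    for _ in range(n):        # outer loop: Theta(n) iterations
        for _ in range(n):    # inner loop: Theta(n) iterations each
            ops += 1
    return ops                # n * n operations: O(n^2)

print(sequential(100), nested(100))   # 10100 10000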

Asymptotic Notation Inside Equations

Asymptotic notation can appear inside expressions. The notation $2n^2 + \Theta(n)$ means "there exists some function $f(n) \in \Theta(n)$ such that the expression equals $2n^2 + f(n)$." The statement $2n^2 + \Theta(n) = \Theta(n^2)$ asserts that for any such $f(n)$, the resulting sum is in $\Theta(n^2)$. This use is legitimate — it lets us suppress explicit intermediate terms while preserving the asymptotic claim.

Incomparable Functions

Unlike the real numbers, asymptotic growth rates are not always comparable. For real numbers $a$ and $b$, exactly one of $a < b$, $a = b$, $a > b$ holds. For functions this is not guaranteed. The pair $f(n) = n$ and $g(n) = n^{\sin(n) + 1}$ is a concrete counterexample: when $\sin(n)$ is near $-1$, $g(n)$ is near $1$, so $f$ dominates $g$; when $\sin(n)$ is near $1$, $g(n)$ is near $n^2$, so $g$ dominates $f$. Neither $f(n) = O(g(n))$ nor $f(n) = \Omega(g(n))$ holds, and no $\Theta$ relationship exists. Such pairs are incomparable under asymptotic ordering.
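
The oscillation is easy to observe numerically (a rough sketch; at integer $n$ the value $\sin(n)$ never equals $\pm 1$ exactly, but it gets arbitrarily close, which is enough):

import math

# f(n) = n versus g(n) = n^(sin(n) + 1): the ratio f/g swings across orders of
# magnitude forever, so neither function eventually dominates the other.
ratios = [(n / n ** (math.sin(n) + 1), n) for n in range(2, 10000)]
print(min(ratios))   # ratio far below 1: g is winning near sin(n) ~ +1
print(max(ratios))   # ratio far above 1: f is winning near sin(n) ~ -1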


Orders of Growth

The following table lists the standard complexity classes in increasing order of growth rate. Each row grows strictly faster (is $\omega$) of the rows above it, except where the parameterized classes ($\log^k n$, $n^k$) overlap with their neighbors at boundary values of $k$.

| Class | Name |
| --- | --- |
| $O(1)$ | Constant |
| $O(\log n)$ | Logarithmic |
| $O(\log^k n)$, $k \geq 1$ | Polylogarithmic |
| $O(n^k)$, $0 < k < 1$ | Sublinear |
| $O(n)$ | Linear |
| $O(n \log n)$ | Pseudolinear |
| $O(n^k)$, $k > 1$ | Polynomial |
| $O(n^2)$ | Quadratic |
| $O(n^3)$ | Cubic |
| $O(c^n)$, $c > 1$ | Exponential |
| $O(n!)$ | Factorial |
| $O(n^n)$ | Exponential (base $n$) |

Convention used throughout: $\log n = \log_2 n$, and $\log^k n = (\log n)^k$.

The gap between polynomial and exponential classes is enormous in practice, not just theoretically. The dividing line between $O(n^k)$ and $O(c^n)$ is roughly the dividing line between "feasible for large inputs" and "feasible only for small inputs." This is why the class $\mathbf{P}$ (problems solvable in polynomial time) is the standard definition of tractability in complexity theory.
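
To make the gap concrete, here is a small sketch printing raw operation counts for a few growth rates (illustrative values only):

import math

# Raw operation counts: the exponential and factorial columns blow up long before
# the polynomial ones become uncomfortable.
print(f"{'n':>4} {'n log n':>10} {'n^3':>10} {'2^n':>18} {'n!':>12}")
for n in [10, 20, 30, 50]:
    print(f"{n:>4} {int(n * math.log2(n)):>10} {n**3:>10} {2**n:>18} {math.factorial(n):>12.2e}")

For $n = 50$, $n^3$ is 125,000, while $2^n$ already exceeds $10^{15}$ and $n!$ exceeds $10^{64}$.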


Computational Complexity

Having established the notation, we can now define complexity precisely.

Computational complexity of an algorithm. An algorithm $\mathcal{A}$ has computational complexity $O(f(n))$ with respect to a given resource if the amount of that resource required to run $\mathcal{A}$ on inputs of size $n$ is $O(f(n))$.

Computational complexity of a problem. A problem $\mathcal{P}$ has computational complexity $O(f(n))$ with respect to a given resource if the best algorithm solving $\mathcal{P}$ has that cost.

The distinction matters. An algorithm's complexity is a property of that specific algorithm. A problem's complexity is a property of the problem itself — a lower bound on what any algorithm solving it must cost. Proving a tight problem complexity requires both an algorithm achieving a given bound and a proof that no algorithm can do better.

The same definitions apply to all five asymptotic notations, not just Big-$O$.


Best, Worst, and Average Case

For a fixed input size $n$, different inputs of that size can cause the algorithm to behave very differently. This motivates three distinct modes of analysis.

Best case describes the algorithm's cost on the most favorable input of size $n$. It is a lower bound on the actual cost, but usually not very informative for algorithm design — the best case may be a rare or unrepresentative scenario.

Worst case describes the cost on the most unfavorable input of size $n$. This is typically the most important quantity: it is a guarantee that holds for every input, regardless of what the input happens to be. When designing algorithms, worst-case performance is the primary target.

Average case describes the expected cost under some probability distribution over inputs of size $n$. This is the most informative measure when inputs are drawn from a realistic distribution, but it requires specifying and justifying that distribution — a non-trivial assumption.

Example: linear search, which scans an unsorted array front to back until it finds $x$ or exhausts the array.

function linsearch(int A[1···n], int x) → int:
    for i = 1, ···, n do:
        if A[i] == x then:
            return i
    return −1

Best case. $x$ is the first element. The loop executes once and returns. Cost: $O(1)$.

Worst case. $x$ is the last element or is absent. The loop runs through all $n$ elements. Cost: $\Theta(n)$.

Average case. Assume uniform probability: $x$ is at position $i$ with probability $P_i = \frac{1}{n+1}$ for $i = 1, \ldots, n$, and $x$ is absent (treated as "position $n+1$") also with probability $\frac{1}{n+1}$. The cost of finding $x$ at position $i$ is $T_i = i$ (we inspect $i$ elements), and the cost of determining absence is $T_{n+1} = n+1$ (a full scan).

$$\text{Average cost} = \sum_{i=1}^{n+1} P_i T_i = \frac{1}{n+1} \sum_{i=1}^{n+1} i = \frac{1}{n+1} \cdot \frac{(n+1)(n+2)}{2} = \frac{n+2}{2} = \Theta(n)$$

In this case the best and worst cases differ, but the average and the worst case are both $\Theta(n)$.
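
A direct simulation reproduces the calculation (a minimal sketch; absence is counted as $n$ inspections here rather than $n+1$, so the two numbers agree only up to that off-by-one):

def linsearch_cost(A, x):
    # Number of elements inspected before linear search stops.
    for i, a in enumerate(A, start=1):
        if a == x:
            return i
    return len(A)

n = 1000
A = list(range(n))
targets = A + [n]                        # every present value once, plus one absent value
avg = sum(linsearch_cost(A, x) for x in targets) / len(targets)
print(avg, (n + 2) / 2)                  # ~501.0 vs 501.0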

Binary search requires a sorted array. Rather than scanning sequentially, it checks the midpoint and eliminates half the remaining search space at each step.

function binsearch(int A[1···n], int x) → int:
    i = 1, j = n
    while i ≤ j do:
        m = ⌊(i + j)/2⌋
        if A[m] == x then:
            return m
        else if A[m] < x then:
            i = m + 1
        else:
            j = m − 1
    return −1

Best case. $x$ equals the middle element $A[\lfloor (1+n)/2 \rfloor]$ on the first iteration. Cost: $O(1)$.

Worst case. $x$ is absent. Each iteration halves the search space. Let the maximum number of iterations be $i$. After $i$ iterations, the search space has size $n/2^{i-1}$. The loop terminates when this falls below 1:

$$\frac{n}{2^{i-1}} < 1 \implies n < 2^{i-1} \implies \log_2 n < i - 1 \implies i > \log_2 n + 1$$

The loop therefore executes $\Theta(\log n)$ iterations in the worst case, each doing $O(1)$ work. Worst-case cost: $\Theta(\log n)$.

Average case. Under uniform probability $P_i = 1/n$ (excluding the missing-element case to simplify), the cost for position $i$ depends on how many steps binary search takes to reach it. The structure of binary search on $n$ elements is a balanced binary tree: the root (midpoint) requires 1 step, its two children require 2 steps, their four children 3 steps, and so on. There is 1 element at depth 1, 2 at depth 2, 4 at depth 3, and $2^{k-1}$ at depth $k \leq \log_2 n$.

The average cost is:

$$T(n) \leq \frac{1}{n} \sum_{i=1}^{\log_2 n} i \cdot 2^{i-1}$$

To bound this sum, replace each $i$ with its maximum value $\log_2 n$:

$$\frac{1}{n} \sum_{i=1}^{\log_2 n} i \cdot 2^{i-1} \leq \frac{\log_2 n}{n} \sum_{i=1}^{\log_2 n} 2^i = \frac{\log_2 n}{n} \cdot \frac{2^{\log_2 n + 1} - 2}{2 - 1} = \frac{\log_2 n}{n} \cdot (2n - 2)$$

which is $O(\log n)$. The average case matches the worst case: $O(\log n)$.
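
An empirical check (sketch only) averages the iteration count of binary search over every present key and compares it with $\log_2 n$:

import math

def binsearch_iters(A, x):
    # Number of loop iterations binary search performs before stopping.
    i, j, iters = 0, len(A) - 1, 0
    while i <= j:
        iters += 1
        m = (i + j) // 2
        if A[m] == x:
            return iters
        elif A[m] < x:
            i = m + 1
        else:
            j = m - 1
    return iters

n = 1024
A = list(range(n))
print(sum(binsearch_iters(A, x) for x in A) / n, math.log2(n))   # ~9.0 vs 10.0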

Example: finding the minimum of an unsorted array.

function min(int A[1···n]) → int:
    m = 1
    for i = 2, ···, n do:
        if A[i] < A[m] then:
            m = i
    return m

The loop always runs exactly $n-1$ times with no early termination possible — you cannot know you have found the minimum without examining every element. Best, worst, and average cases all coincide: $\Theta(n)$.

[!NOTE] When best, worst, and average cases all give the same bound, that bound is tight in a strong sense: no input structure can help the algorithm, and none can hurt it. $\Theta$ notation is appropriate here, not just $O$.


Amortized Analysis

Best, worst, and average case analysis all measure the cost of a single operation. Amortized analysis is a different kind of question: what is the average cost per operation over a sequence of $n$ operations?

This is not the same as average-case analysis. Average case uses a probability distribution over inputs. Amortized analysis makes no probabilistic assumptions — it analyzes a concrete sequence of operations and gives a guaranteed average cost per operation in the worst case over all possible sequences.

Why is this needed? Some algorithms have operations that are cheap most of the time but occasionally expensive. Worst-case analysis of a single operation in isolation may be too pessimistic about the sequence as a whole — it imagines every operation hitting the expensive case, which may be structurally impossible. Amortized analysis captures the true cost more precisely.

The Binary Counter

Consider incrementing a $k$-bit binary counter stored in an array $A[1 \ldots k]$ (most significant bit at position 1).

function increment(int A[1···k]):
    i = k
    while i ≥ 1 and A[i] == 1 do:
        A[i] = 0
        i = i − 1
    if i ≥ 1 then:
        A[i] = 1

The logic: scan from the least significant bit, flipping 1s to 0s until you hit a 0, then flip that 0 to 1. This is standard binary addition.

Best case. The least significant bit is 0 — one flip suffices. Cost: $O(1)$.

Worst case. The counter holds $11\ldots1$ (all ones) — all $k$ bits flip. Cost: $O(k)$.

A naive worst-case bound for $n$ increments is therefore $O(nk)$. But is it really possible for every increment to flip all $k$ bits? No. After an all-ones counter is incremented, it resets to zero. The expensive operation cannot repeat immediately. The question is how to quantify this precisely.

Aggregate Method

The aggregate method computes the total cost of $n$ operations directly and divides by $n$.

Count how many times each bit position flips across $n$ increments. Bit position $k$ (the least significant) flips on every increment: $n$ times. Bit $k-1$ flips every 2 increments: $n/2$ times. Bit $k-j$ flips every $2^j$ increments: $n/2^j$ times. The total cost is:

$$\sum_{j=0}^{k-1} \frac{n}{2^j} = n \sum_{j=0}^{k-1} \frac{1}{2^j} \leq n \sum_{j=0}^{\infty} \frac{1}{2^j} = n \cdot \frac{1}{1 - 1/2} = 2n$$

The total cost of $n$ increments is $O(n)$. The amortized cost per operation is therefore $O(n)/n = O(1)$.

Note: the bound $\leq 2n$ holds even when $n > 2^k$. Beyond $2^k$ increments the counter simply overflows and wraps back to zero, and each bit position keeps flipping with the same frequency as before.
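
The aggregate bound is easy to confirm by brute force (a minimal sketch counting individual bit flips; the counter deliberately has fewer than $\log_2 n$ bits so that it wraps around):

def increment(A):
    # Increment a binary counter stored MSB-first in A; return the number of bit flips.
    flips, i = 0, len(A) - 1
    while i >= 0 and A[i] == 1:
        A[i] = 0
        flips += 1
        i -= 1
    if i >= 0:
        A[i] = 1
        flips += 1
    return flips

k, n = 8, 1000                            # n > 2^k, so the counter overflows and cycles
counter = [0] * k
total = sum(increment(counter) for _ in range(n))
print(total, "<=", 2 * n)                 # 1990 <= 2000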

Accounting Method

The accounting method works by assigning a fictitious "charge" to each operation — an amortized cost — and maintaining a credit balance to handle operations whose actual cost exceeds their charge.

The rules are: (1) each operation is charged its amortized cost; (2) if actual cost $<$ amortized cost, the difference is saved as credit; (3) if actual cost $>$ amortized cost, the excess is paid from credit; (4) the credit balance must never go negative.

If we can show that a valid amortized cost assignment exists with the credit never going negative, then the sum of amortized costs over $n$ operations is an upper bound on the total actual cost.

For the binary counter, assign an amortized cost of 2 units to every increment:

  • Spend 1 unit to pay for the actual bit flip from 0 to 1.
  • Save 1 unit as credit on that bit.

When a bit is subsequently flipped from 1 to 0, the flip is paid using the 1 unit of credit sitting on that bit. We never charge separately for flips to 0.

Why the credit is never negative. Every 1-bit in the array has exactly 1 unit of credit on it (deposited when it was set to 1, not yet spent). Since the number of 1-bits is always $\geq 0$, the total available credit is always $\geq 0$.

Total cost bound. Each increment sets at most one bit from 0 to 1, so the 2 units charged to it always cover its actual flips. For $n$ increments: total amortized cost $= 2n$. Since the credit never goes negative, the total actual cost $\leq 2n = O(n)$.

The amortized cost per operation is $O(1)$.
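
The argument can also be simulated directly: charge 2 units per increment, pay the actual flips out of the charge plus stored credit, and watch the balance (sketch only; the increment helper is redefined so the block runs on its own):

def increment(A):
    # MSB-first binary counter; returns the number of bit flips performed.
    flips, i = 0, len(A) - 1
    while i >= 0 and A[i] == 1:           # each reset to 0 is paid by that bit's credit
        A[i] = 0
        flips += 1
        i -= 1
    if i >= 0:                            # the single 0 -> 1 flip is paid by this charge
        A[i] = 1
        flips += 1
    return flips

counter, credit = [0] * 8, 0
for _ in range(1000):
    credit += 2 - increment(counter)      # charge 2 units, pay the actual cost
    assert credit >= 0                    # the balance never goes negative
    assert credit >= sum(counter)         # every 1-bit still holds its unit of credit
print("final credit:", credit, "(at least one unit per 1-bit)")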

When to Use Each Method

| Method | Best suited for |
| --- | --- |
| Aggregate | When the total cost over $n$ operations can be computed directly (e.g., counting bit flips across all operations) |
| Accounting | When operations are heterogeneous and assigning per-operation charges is more natural |

Both methods give the same conclusion in the binary counter example. The accounting method generalizes more easily when a data structure has different operation types with different costs (e.g., a dynamic array that occasionally resizes).
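
A sketch of that generalization (illustrative only): a doubling array whose append costs 1 normally but capacity + 1 when full, charged a flat 3 units per append: 1 for the write itself and 2 banked toward the next resize.

# Doubling dynamic array: a flat charge of 3 per append covers the occasional resize.
capacity, size, credit = 1, 0, 0
for _ in range(10000):
    actual = 1 if size < capacity else capacity + 1   # a resize copies `capacity` elements
    if size == capacity:
        capacity *= 2
    size += 1
    credit += 3 - actual
    assert credit >= 0                                # the accounting invariant holds
print("appends:", size, "capacity:", capacity, "leftover credit:", credit)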

[!NOTE] The amortized $O(1)$ cost of a counter increment does not mean each individual increment is cheap — single operations can still cost $O(k)$. It means that averaged over a long sequence, the cost per operation is constant. This is the guarantee amortized analysis provides, and it is often the guarantee that actually matters for designing efficient systems.


Summary

The five asymptotic notations provide a complete vocabulary for describing growth rates. $O$ bounds from above, $\Omega$ from below, $\Theta$ pins both sides simultaneously. The little variants $o$ and $\omega$ express strict dominance — asymptotically, one function eventually falls below, or rises above, every positive multiple of the other.

On top of this, computational complexity analysis has four distinct modes: best case, worst case, average case, and amortized. The worst case is the most generally useful — it gives an unconditional guarantee. Average case is more informative but requires a distributional assumption. Amortized analysis fills the remaining gap: it gives tight guarantees about sequences of operations without probabilistic assumptions, by accounting for the fact that expensive operations can only occur when previous cheap operations have "set up" the expensive state.

The binary counter is the canonical demonstration: a single operation can cost $O(k)$, but the amortized cost over any sequence is $O(1)$, because the state that causes an expensive operation always takes many cheap operations to recreate.