Testing Theory

The following are general guidelines for proper software testing.

[ companion video ]


A State-Based Theory of Testing

A persistent misconception in software testing is the belief that a system can be tested by validating each of its operations independently. This view treats a system as a bag of operations rather than what it actually is: a stateful system whose behavior depends on history. This page presents a general testing methodology applicable to any object—an ADT, a module, a subsystem, or an entire system—and explains why correct testing must focus on state transitions, not isolated operations.

What We Mean by “System”

For testing purposes, a system is anything that:
  • Maintains internal state
  • Accepts inputs or operations over time
  • Produces outputs that depend on prior interactions
This includes:
  • Abstract Data Types (stacks, lists, trees)
  • Modules and libraries
  • Services and APIs
  • Concurrent subsystems
  • Entire applications and distributed systems
The scale changes. The model does not.
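The definition above can be made concrete with a minimal sketch. The class and operation names below are illustrative, not taken from any particular system:

```python
class Turnstile:
    """A tiny system: it maintains internal state, accepts operations
    over time, and its outputs depend on prior interactions."""

    def __init__(self):
        self.locked = True  # internal state

    def coin(self):
        self.locked = False  # transition: locked -> unlocked

    def push(self):
        # The output depends on history: pushing after a coin succeeds,
        # pushing a fresh turnstile does not.
        if self.locked:
            return "blocked"
        self.locked = True
        return "pass"
```

Note that `push()` cannot be validated by its signature alone; its result is only meaningful relative to the sequence of operations that preceded it.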

The Fundamental Error: Treating Systems as Bags of Operations

A bag-of-operations mindset assumes:
  • Each operation can be tested independently
  • Correctness of parts implies correctness of the whole
  • State history is incidental, not central
This assumption is false. Most systems do not fail because an operation is incorrect in isolation. They fail because:
  • The system entered a state the developer did not anticipate
  • An operation violated an invariant after a particular sequence
  • Two correct operations interacted in an incorrect way
Correct testing must therefore focus on how operations compose over time.
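A sketch of the failure mode, with a deliberately seeded bug (the `Log` class and its defect are invented for illustration): every operation passes an isolated test, yet a particular sequence fails.

```python
class Log:
    """Each operation looks correct alone, but a sequence exposes a
    defect: clear() forgets to reset the cursor (a seeded bug)."""

    def __init__(self):
        self.entries = []
        self.cursor = 0  # index of the next unread entry

    def append(self, msg):
        self.entries.append(msg)

    def read_next(self):
        msg = self.entries[self.cursor]
        self.cursor += 1
        return msg

    def clear(self):
        self.entries = []
        # Bug: self.cursor is not reset, so after new appends a
        # read_next() skips entries or raises IndexError.
```

Testing `append`, `read_next`, and `clear` each from a fresh instance finds nothing; only the sequence append, read, clear, append, read exposes the broken composition.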

Systems as State Machines

Every stateful system can be modeled (explicitly or implicitly) as a state machine:
  • States: Internal configurations of the system
  • Transitions: Operations that move the system between states
  • Invariants: Properties that must always hold
Testing, at its core, is about answering: “Does the system behave correctly across all relevant states and transitions?” Testing isolated operations without considering transitions is equivalent to testing only a few nodes of the state graph while ignoring its edges.
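The state-machine view can be written down explicitly. This sketch (the door example and its names are assumptions, not from the text) separates the three ingredients: a set of states, a transition relation, and an invariant.

```python
# An explicit state-machine model of a lockable door (illustrative).
# Transitions map (state, operation) -> next state; edges not listed
# are invalid and leave the state unchanged.
TRANSITIONS = {
    ("locked", "unlock"): "closed",
    ("closed", "lock"): "locked",
    ("closed", "open"): "open",
    ("open", "close"): "closed",
}

VALID_STATES = {"locked", "closed", "open"}

def step(state, op):
    """Apply one transition; invalid operations are no-ops."""
    return TRANSITIONS.get((state, op), state)

def invariant(state):
    """A property that must hold after every transition."""
    return state in VALID_STATES
```

A test over this model walks edges of the graph and checks the invariant after each one, rather than probing single nodes in isolation.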

Why Exhaustive Testing Is Not Practically Possible for Large Systems

For any non-trivial system:
  • The number of possible states is enormous or infinite
  • The number of possible operation sequences grows exponentially
  • External inputs multiply the space further
This is known as state-space explosion. As a result:
  • Exhaustive testing is computationally infeasible
  • Testing can never prove the absence of bugs
  • This is not a failure of testing—it is a fundamental theoretical limit
That said, complete and exhaustive testing is possible for small, trivial, or highly constrained systems. In that space we can test thoroughly, and then build up confidence as we move to more complex systems by applying the principles outlined below.
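For a system small enough, the whole state graph can be checked. A sketch under an assumed toy system (a 2-bit counter with two operations): every (state, operation) pair is enumerated, so this test really is exhaustive.

```python
from itertools import product

# A tiny system: 4 states, 2 operations. Small enough to check
# every single (state, operation) pair exhaustively.
STATES = range(4)
OPS = {
    "inc": lambda s: (s + 1) % 4,   # wraps around at the top
    "reset": lambda s: 0,
}

def exhaustive_check():
    """Verify the closure invariant over the entire state graph:
    no operation ever leaves the state space."""
    for s, (name, op) in product(STATES, OPS.items()):
        s2 = op(s)
        assert s2 in STATES, f"{name} left the state space from {s}"
    return True
```

Once a component of this size is exhaustively verified, larger systems built from it can be tested with the structured, non-exhaustive techniques that follow.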

What Testing Can Give Us Instead: Theoretical Confidence

Although exhaustive testing is impossible for non-trivial systems, we can achieve theoretical confidence by:
  • Partitioning the state space into equivalence classes
  • Explicitly testing boundary and edge states
  • Covering state transitions, not just states
  • Asserting invariants after every meaningful transition
  • Using randomness to explore additional sequences probabilistically
Testing does not provide proof. It provides structured evidence that failure is unlikely within the tested model.
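The first two bullets can be sketched concretely. Assuming a hypothetical fixed-capacity stack, the state space partitions into three equivalence classes (empty, partially full, full), and the boundary states are where the interesting transitions live:

```python
class BoundedStack:
    """Illustrative fixed-capacity stack. Its states fall into three
    equivalence classes: empty, partially full, and full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        if len(self.items) == self.capacity:
            raise OverflowError("full")
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("empty")
        return self.items.pop()

def drive_to(stack, n):
    """Drive the stack into the class containing n elements."""
    for i in range(n):
        stack.push(i)
    return stack
```

One representative per class, plus the two boundaries (pop on empty, push on full), gives far more confidence per test than sampling many interior states.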

The Core Principle

Correctness emerges from transitions, not operations. An operation may be correct in isolation and still be incorrect in context. Testing must therefore:
  • Drive the system into meaningful states
  • Perform operations from those states
  • Assert behavior and invariants afterward
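The three steps above give every state-aware test the same shape. A minimal sketch, using a plain Python list as a stand-in system under test:

```python
def test_pop_after_growth():
    """Drive into a meaningful state, operate, then assert both
    behavior and invariants."""
    stack = []                  # stand-in system under test
    for i in range(100):        # 1. drive into a non-trivial state
        stack.append(i)
    top = stack.pop()           # 2. perform the operation from that state
    assert top == 99            # 3. assert the behavior...
    assert len(stack) == 99     # ...the size invariant...
    assert stack[-1] == 98      # ...and that ordering is preserved
    return True
```

The point is the shape, not the subject: the operation is exercised in context, after a history, rather than against a freshly constructed object.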

Three Complementary Layers of Proper Testing

  • Layer 1: Local Contract Testing. Tests that:
    • Validate individual operations
    • Confirm basic pre- and postconditions
    • Ensure error handling is correct
    • These tests are necessary—but insufficient.
  • Layer 2: State and Transition Testing. Tests that:
    • Explicitly construct non-trivial system states
    • Execute sequences of operations
    • Validate invariants across transitions
    • This is where most defects are found
  • Layer 3: Property-Based and Randomized Testing. Tests that:
    • Generate many operation sequences
    • Assert general properties rather than specific outputs
    • Increase coverage of unanticipated paths
    • Random testing amplifies confidence but cannot replace explicit state testing
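Layer 3 can be sketched as model-based random testing: generate random operation sequences and assert a general property (agreement with a trusted reference model) instead of hand-written expected outputs. The `LinkedStack` implementation under test is invented for illustration.

```python
import random

class LinkedStack:
    """Toy implementation under test (illustrative)."""
    def __init__(self):
        self.head = None
    def push(self, x):
        self.head = (x, self.head)
    def pop(self):
        x, self.head = self.head
        return x
    def empty(self):
        return self.head is None

def random_sequence_test(seed, steps=500):
    """Run a random operation sequence, checking the implementation
    against a trusted model (a Python list) after every step."""
    rng = random.Random(seed)   # seeded, so failures are reproducible
    sut, model = LinkedStack(), []
    for _ in range(steps):
        if model and rng.random() < 0.5:
            # Property: the implementation agrees with the model.
            assert sut.pop() == model.pop()
        else:
            x = rng.randrange(1000)
            sut.push(x)
            model.append(x)
        # Invariant asserted after every transition.
        assert sut.empty() == (not model)
    return True
```

Seeding the generator matters: a failing sequence can be replayed exactly, which is what turns a random failure into a debuggable one.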

Why Random Testing Alone Fails

Random testing cannot guarantee:
  • Coverage of rare boundary states
  • Exploration of critical transitions
  • Reproduction of known failure patterns
Random testing is valuable only when guided by:
  • Well-defined properties
  • Known invariants
  • Explicitly tested edge cases
Otherwise, it is hope masquerading as methodology.

Fresh State vs. Test Isolation

A crucial distinction:
  • Test isolation is mandatory
  • State reset between operations is not
  • Each test case should be independent of others
But within a test case, the system must be allowed—and often forced—to evolve through multiple states. Resetting the system between every operation destroys the very conditions that expose failures.
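The distinction looks like this in practice (the account example and its helpers are invented for illustration): each test builds its own fresh instance, but the state is deliberately allowed to evolve inside the test.

```python
def make_account():
    """Each test constructs its own system: isolation between tests,
    but no resets within a test."""
    return {"balance": 0, "history": []}

def deposit(acct, n):
    acct["balance"] += n
    acct["history"].append(("dep", n))

def withdraw(acct, n):
    acct["balance"] -= n
    acct["history"].append(("wd", n))

def test_overdraft_after_history():
    acct = make_account()       # isolated: no state shared with other tests
    for _ in range(3):          # evolve through several states first...
        deposit(acct, 10)
        withdraw(acct, 10)
    withdraw(acct, 5)           # ...then perform the interesting operation
    assert acct["balance"] == -5
    assert len(acct["history"]) == 7
    return True
```

Resetting the account before every `withdraw` would make the final assertion meaningless: the behavior under test only exists after the full sequence.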

Real-World Failure Patterns

Most real bugs follow this pattern: “The system works—until you do this, then that, and then this other thing.” These failures are:
  • Rare
  • Sequence-dependent
  • State-dependent
  • Invisible to isolated tests
Testing strategies that ignore state evolution are precisely how such bugs escape.

ADTs as a Special Case of the General Theory

Testing an ADT is not fundamentally different from testing:
  • A memory manager
  • A transaction system
  • A protocol implementation
They all:
  • Maintain state
  • Enforce invariants
  • Depend on operation sequences
ADT testing is simply the most approachable example of this universal principle.

Summary

  • Systems are not bags of operations; they are state machines.
  • Isolated operation testing is necessary but insufficient.
  • Most bugs arise from state transitions and interactions.
  • Exhaustive testing is impossible, but structured confidence is achievable.
  • Proper testing explicitly explores states, transitions, and invariants.
  • Random testing supports, but never replaces, deliberate design.
  • If your testing strategy does not explicitly reason about how a system evolves over time, then it is not testing the system—it is testing a falsehood.