Testing Theory

The following are general guidelines for proper software testing.

[ companion video ]


A State-Based Theory of Testing

A persistent misconception in software testing is the belief that a system can be tested by validating each of its operations independently. This view treats a system as a bag of operations rather than what it actually is: a stateful system whose behavior depends on history. This page presents a general testing methodology applicable to any object—an ADT, a module, a subsystem, or an entire system—and explains why correct testing must focus on state transitions, not isolated operations.

What We Mean by “System”

For testing purposes, a system is anything that:
  • Maintains internal state
  • Accepts inputs or operations over time
  • Produces outputs that depend on prior interactions
This includes:
  • Abstract Data Types (stacks, lists, trees)
  • Modules and libraries
  • Services and APIs
  • Concurrent subsystems
  • Entire applications and distributed systems
The scale changes. The model does not.
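The definition above can be made concrete with a minimal sketch. The class and operation names below are illustrative, not taken from any particular system:

```python
class Turnstile:
    """A tiny system: it maintains internal state, accepts operations
    over time, and its outputs depend on prior interactions."""

    def __init__(self):
        self.locked = True  # internal state

    def coin(self):
        self.locked = False  # transition: locked -> unlocked

    def push(self):
        # The output depends on history: pushing after a coin succeeds,
        # pushing a fresh turnstile does not.
        if self.locked:
            return "blocked"
        self.locked = True
        return "pass"
```

Note that `push()` cannot be validated by its signature alone; its result is only meaningful relative to the sequence of operations that preceded it.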

The Fundamental Error: Treating Systems as Bags of Operations

A bag-of-operations mindset assumes:
  • Each operation can be tested independently
  • Correctness of parts implies correctness of the whole
  • State history is incidental, not central
This assumption is false. Most systems do not fail because an operation is incorrect in isolation. They fail because:
  • The system entered a state the developer did not anticipate
  • An operation violated an invariant after a particular sequence
  • Two correct operations interacted in an incorrect way
Correct testing must therefore focus on how operations compose over time.
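A sketch of the failure mode, with a deliberately seeded bug (the `Log` class and its defect are invented for illustration): every operation passes an isolated test, yet a particular sequence fails.

```python
class Log:
    """Each operation looks correct alone, but a sequence exposes a
    defect: clear() forgets to reset the cursor (a seeded bug)."""

    def __init__(self):
        self.entries = []
        self.cursor = 0  # index of the next unread entry

    def append(self, msg):
        self.entries.append(msg)

    def read_next(self):
        msg = self.entries[self.cursor]
        self.cursor += 1
        return msg

    def clear(self):
        self.entries = []
        # Bug: self.cursor is not reset, so after new appends a
        # read_next() skips entries or raises IndexError.
```

Testing `append`, `read_next`, and `clear` each from a fresh instance finds nothing; only the sequence append, read, clear, append, read exposes the broken composition.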

Systems as State Machines

Every stateful system can be modeled (explicitly or implicitly) as a state machine:
  • States: Internal configurations of the system
  • Transitions: Operations that move the system between states
  • Invariants: Properties that must always hold
Testing, at its core, is about answering: “Does the system behave correctly across all relevant states and transitions?” Testing isolated operations without considering transitions is equivalent to testing only a few nodes of the state graph while ignoring its edges.
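The state-machine view can be written down explicitly. This sketch (the door example and its names are assumptions, not from the text) separates the three ingredients: a set of states, a transition relation, and an invariant.

```python
# An explicit state-machine model of a lockable door (illustrative).
# Transitions map (state, operation) -> next state; edges not listed
# are invalid and leave the state unchanged.
TRANSITIONS = {
    ("locked", "unlock"): "closed",
    ("closed", "lock"): "locked",
    ("closed", "open"): "open",
    ("open", "close"): "closed",
}

VALID_STATES = {"locked", "closed", "open"}

def step(state, op):
    """Apply one transition; invalid operations are no-ops."""
    return TRANSITIONS.get((state, op), state)

def invariant(state):
    """A property that must hold after every transition."""
    return state in VALID_STATES
```

A test over this model walks edges of the graph and checks the invariant after each one, rather than probing single nodes in isolation.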

Why Exhaustive Testing Is Not Practically Possible for Large Systems

For any non-trivial system:
  • The number of possible states is enormous or infinite
  • The number of possible operation sequences grows exponentially
  • External inputs multiply the space further
This is known as state-space explosion. As a result:
  • Exhaustive testing is computationally infeasible
  • Testing can never prove the absence of bugs
  • This is not a failure of testing—it is a fundamental theoretical limit
That said, complete and exhaustive testing is possible for small, trivial, or highly constrained systems. In that space we can test thoroughly, and then build up confidence as we move to more complex systems by applying the principles outlined below.
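For a system small enough, the whole state graph can be checked. A sketch under an assumed toy system (a 2-bit counter with two operations): every (state, operation) pair is enumerated, so this test really is exhaustive.

```python
from itertools import product

# A tiny system: 4 states, 2 operations. Small enough to check
# every single (state, operation) pair exhaustively.
STATES = range(4)
OPS = {
    "inc": lambda s: (s + 1) % 4,   # wraps around at the top
    "reset": lambda s: 0,
}

def exhaustive_check():
    """Verify the closure invariant over the entire state graph:
    no operation ever leaves the state space."""
    for s, (name, op) in product(STATES, OPS.items()):
        s2 = op(s)
        assert s2 in STATES, f"{name} left the state space from {s}"
    return True
```

Once a component of this size is exhaustively verified, larger systems built from it can be tested with the structured, non-exhaustive techniques that follow.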

What Testing Can Give Us Instead: Theoretical Confidence

Although exhaustive testing is impossible for non-trivial systems, we can achieve theoretical confidence by:
  • Partitioning the state space into equivalence classes
  • Explicitly testing boundary and edge states
  • Covering state transitions, not just states
  • Asserting invariants after every meaningful transition
  • Using randomness to explore additional sequences probabilistically
Testing does not provide proof. It provides structured evidence that failure is unlikely within the tested model.
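The first two bullets can be sketched concretely. Assuming a hypothetical fixed-capacity stack, the state space partitions into three equivalence classes (empty, partially full, full), and the boundary states are where the interesting transitions live:

```python
class BoundedStack:
    """Illustrative fixed-capacity stack. Its states fall into three
    equivalence classes: empty, partially full, and full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        if len(self.items) == self.capacity:
            raise OverflowError("full")
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("empty")
        return self.items.pop()

def drive_to(stack, n):
    """Drive the stack into the class containing n elements."""
    for i in range(n):
        stack.push(i)
    return stack
```

One representative per class, plus the two boundaries (pop on empty, push on full), gives far more confidence per test than sampling many interior states.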

The Core Principle

Correctness emerges from transitions, not operations. An operation may be correct in isolation and still be incorrect in context. Testing must therefore:
  • Drive the system into meaningful states
  • Perform operations from those states
  • Assert behavior and invariants afterward
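The three steps above give every state-aware test the same shape. A minimal sketch, using a plain Python list as a stand-in system under test:

```python
def test_pop_after_growth():
    """Drive into a meaningful state, operate, then assert both
    behavior and invariants."""
    stack = []                  # stand-in system under test
    for i in range(100):        # 1. drive into a non-trivial state
        stack.append(i)
    top = stack.pop()           # 2. perform the operation from that state
    assert top == 99            # 3. assert the behavior...
    assert len(stack) == 99     # ...the size invariant...
    assert stack[-1] == 98      # ...and that ordering is preserved
    return True
```

The point is the shape, not the subject: the operation is exercised in context, after a history, rather than against a freshly constructed object.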

Three Complementary Layers of Proper Testing

  • Layer 1: Local Contract Testing. Tests that:
    • Validate individual operations
    • Confirm basic pre- and postconditions
    • Ensure error handling is correct
    • These tests are necessary—but insufficient.
  • Layer 2: State and Transition Testing. Tests that:
    • Explicitly construct non-trivial system states
    • Execute sequences of operations
    • Validate invariants across transitions
    • This is where most defects are found
  • Layer 3: Property-Based and Randomized Testing. Tests that:
    • Generate many operation sequences
    • Assert general properties rather than specific outputs
    • Increase coverage of unanticipated paths
    • Random testing amplifies confidence but cannot replace explicit state testing
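Layer 3 can be sketched as model-based random testing: generate random operation sequences and assert a general property (agreement with a trusted reference model) instead of hand-written expected outputs. The `LinkedStack` implementation under test is invented for illustration.

```python
import random

class LinkedStack:
    """Toy implementation under test (illustrative)."""
    def __init__(self):
        self.head = None
    def push(self, x):
        self.head = (x, self.head)
    def pop(self):
        x, self.head = self.head
        return x
    def empty(self):
        return self.head is None

def random_sequence_test(seed, steps=500):
    """Run a random operation sequence, checking the implementation
    against a trusted model (a Python list) after every step."""
    rng = random.Random(seed)   # seeded, so failures are reproducible
    sut, model = LinkedStack(), []
    for _ in range(steps):
        if model and rng.random() < 0.5:
            # Property: the implementation agrees with the model.
            assert sut.pop() == model.pop()
        else:
            x = rng.randrange(1000)
            sut.push(x)
            model.append(x)
        # Invariant asserted after every transition.
        assert sut.empty() == (not model)
    return True
```

Seeding the generator matters: a failing sequence can be replayed exactly, which is what turns a random failure into a debuggable one.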

Why Random Testing Alone Fails

Random testing cannot guarantee:
  • Coverage of rare boundary states
  • Exploration of critical transitions
  • Reproduction of known failure patterns
Random testing is valuable only when guided by:
  • Well-defined properties
  • Known invariants
  • Explicitly tested edge cases
Otherwise, it is hope masquerading as methodology.

Fresh State vs. Test Isolation

A crucial distinction:
  • Test isolation is mandatory
  • State reset between operations is not
  • Each test case should be independent of others
But within a test case, the system must be allowed—and often forced—to evolve through multiple states. Resetting the system between every operation destroys the very conditions that expose failures.
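The distinction looks like this in practice (the account example and its helpers are invented for illustration): each test builds its own fresh instance, but the state is deliberately allowed to evolve inside the test.

```python
def make_account():
    """Each test constructs its own system: isolation between tests,
    but no resets within a test."""
    return {"balance": 0, "history": []}

def deposit(acct, n):
    acct["balance"] += n
    acct["history"].append(("dep", n))

def withdraw(acct, n):
    acct["balance"] -= n
    acct["history"].append(("wd", n))

def test_overdraft_after_history():
    acct = make_account()       # isolated: no state shared with other tests
    for _ in range(3):          # evolve through several states first...
        deposit(acct, 10)
        withdraw(acct, 10)
    withdraw(acct, 5)           # ...then perform the interesting operation
    assert acct["balance"] == -5
    assert len(acct["history"]) == 7
    return True
```

Resetting the account before every `withdraw` would make the final assertion meaningless: the behavior under test only exists after the full sequence.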

Real-World Failure Patterns

Most real bugs follow this pattern: “The system works—until you do this, then that, and then this other thing.” These failures are:
  • Rare
  • Sequence-dependent
  • State-dependent
  • Invisible to isolated tests
Testing strategies that ignore state evolution are precisely how such bugs escape.

ADTs as a Special Case of the General Theory

Testing an ADT is not fundamentally different from testing:
  • A memory manager
  • A transaction system
  • A protocol implementation
They all:
  • Maintain state
  • Enforce invariants
  • Depend on operation sequences
ADT testing is simply the most approachable example of this universal principle.

Summary

  • Systems are not bags of operations; they are state machines.
  • Isolated operation testing is necessary but insufficient.
  • Most bugs arise from state transitions and interactions.
  • Exhaustive testing is impossible, but structured confidence is achievable.
  • Proper testing explicitly explores states, transitions, and invariants.
  • Random testing supports, but never replaces, deliberate design.
  • If your testing strategy does not explicitly reason about how a system evolves over time, then it is not testing the system—it is testing a falsehood.