Architecture and Design¶
This document describes the internal architecture and design principles of coola.
Overview¶
coola is designed around a flexible, extensible comparison framework that can handle various data
types through a plugin-like architecture. The core design follows these principles:
- Separation of concerns: Comparison logic is separated from data type handling
- Extensibility: New data types can be added without modifying core code
- Type safety: Strong type checking to prevent subtle bugs
- Composability: Complex comparisons are built from simpler ones
Core Components¶
1. Comparison Functions¶
The main entry points for users:
objects_are_equal: Checks exact equalityobjects_are_allclose: Checks equality within tolerance
These functions provide a simple interface while delegating to the internal comparison system.
2. Testers¶
Testers are responsible for orchestrating the comparison process.
BaseEqualityTester¶
Abstract base class defining the tester interface:
class BaseEqualityTester:
def equal(self, actual: Any, expected: Any, config: EqualityConfig) -> bool:
"""Check if two objects are equal."""
...
EqualityTester¶
The default implementation that uses a registry of comparators:
- Maintains a registry mapping types to comparators
- Uses Method Resolution Order (MRO) to find the most specific comparator
- Delegates comparison to the appropriate comparator
Key Features:
- Type-based dispatch
- Support for inheritance hierarchies
- Extensible through registration
3. Comparators¶
Comparators implement comparison logic for specific types.
BaseEqualityComparator¶
Abstract base class for comparators:
class BaseEqualityComparator:
def equal(self, actual: Any, expected: Any, config: EqualityConfig) -> bool:
"""Compare two objects of a specific type."""
...
def clone(self) -> BaseEqualityComparator:
"""Create a copy of this comparator."""
...
Built-in Comparators¶
DefaultEqualityComparator: Handles basic Python typesMappingEqualityComparator: Handles dict and mapping typesSequenceEqualityComparator: Handles list, tuple, and sequencesTorchTensorComparator: Handles PyTorch tensorsNumpyArrayComparator: Handles NumPy arraysPandasDataFrameComparator: Handles pandas DataFrames- And more for other supported types...
4. Configuration¶
EqualityConfig¶
Configuration object that carries comparison settings through the comparison tree:
@dataclasses.dataclass
class EqualityConfig:
tester: BaseEqualityTester
show_difference: bool = False
# Additional settings...
This allows behavior to be customized without changing comparator signatures.
5. Handlers¶
Handlers provide reusable comparison logic for common checks:
DTypeHandler: Compares data typesShapeHandler: Compares array shapesDeviceHandler: Compares PyTorch device placementNativeEqualHandler: Performs native equality checks- And more...
Handlers promote code reuse and consistency across comparators.
Data Flow¶
Here's how a comparison flows through the system:
User calls objects_are_equal(obj1, obj2)
↓
Creates EqualityConfig with settings
↓
Calls tester.equal(obj1, obj2, config)
↓
Tester looks up comparator based on obj1's type
↓
Calls comparator.equal(obj1, obj2, config)
↓
Comparator performs type-specific checks
↓
May recursively call tester.equal() for nested objects
↓
Returns boolean result
Design Patterns¶
1. Strategy Pattern¶
Comparators implement different comparison strategies for different types, allowing the algorithm to vary independently from the clients that use it.
2. Chain of Responsibility¶
The MRO-based comparator lookup implements a chain of responsibility, trying more specific comparators before falling back to general ones.
3. Template Method¶
Many comparators follow a template:
- Check types match
- Check metadata (shape, dtype, etc.)
- Check values
- Optionally show differences
4. Registry Pattern¶
The comparator registry allows runtime type-to-comparator mapping, enabling extensibility.
5. Visitor Pattern¶
The recursive nature of comparison through nested structures follows a visitor-like pattern.
Extension Points¶
Adding Support for New Types¶
To add support for a custom type:
-
Implement a Comparator:
class MyTypeComparator(BaseEqualityComparator): def equal(self, actual: MyType, expected: Any, config: EqualityConfig) -> bool: # Type check if type(actual) is not type(expected): return False # Custom comparison logic return actual.compare_to(expected) def clone(self): return MyTypeComparator() -
Register the Comparator:
tester = EqualityTester.local_copy() tester.add_comparator(MyType, MyTypeComparator()) -
Use with Custom Tester:
objects_are_equal(obj1, obj2, tester=tester)
Type System¶
Strict Type Checking¶
coola enforces strict type checking:
1(int) ≠1.0(float) ≠True(bool)list≠tupledict≠OrderedDict
This prevents subtle bugs from type coercion.
Type Hierarchy Support¶
Through MRO-based lookup, coola supports inheritance:
- A comparator for
Sequenceapplies tolist,tuple, etc. - More specific comparators override general ones
- Custom subclasses inherit parent comparators
Performance Considerations¶
Early Exit¶
Comparators check fast properties first:
- Type check (very fast)
- Metadata checks (fast: shape, dtype, device)
- Value comparison (potentially slow)
Lazy Evaluation¶
Comparisons short-circuit on first difference when possible.
Caching¶
The tester caches comparator lookups by type for performance.
Recursive Depth¶
For deeply nested structures, comparison is recursive. Very deep nesting may hit recursion limits ( typically ~1000 levels in Python).
Error Handling¶
Graceful Degradation¶
When a specific comparator is not available, coola falls back to:
- More general comparator (via MRO)
- Default comparator (for
object) - Native equality check as last resort
Informative Messages¶
When show_difference=True, comparators log:
- What objects differ
- Where in the structure the difference is
- The actual values that differ
Testing Strategy¶
The coola codebase uses:
- Unit tests: Test individual comparators in isolation
- Integration tests: Test complete comparison workflows
- Property-based tests: Test invariants (e.g., reflexivity)
- Cross-library tests: Test integration with PyTorch, NumPy, etc.
Dependencies¶
Core Dependencies¶
- Python 3.10+: Core language features
Optional Dependencies¶
- torch: PyTorch tensor support
- numpy: NumPy array support
- pandas: DataFrame support
- polars: Polars DataFrame support
- xarray: xarray support
- jax: JAX array support
- pyarrow: PyArrow table support
Each optional dependency is only imported when used (lazy loading).
Module Organization¶
coola/
├── comparison.py # Main public API
├── equality/
│ ├── comparators/ # Type-specific comparators
│ │ ├── base.py
│ │ ├── default.py
│ │ ├── collection.py # Mapping, Sequence
│ │ ├── torch_.py
│ │ ├── numpy_.py
│ │ └── ...
│ ├── testers/ # Comparison orchestration
│ │ ├── base.py
│ │ └── default.py
│ ├── handlers/ # Reusable comparison logic
│ └── config.py # Configuration
├── allclose/ # Tolerance-based comparison
└── utils/ # Utility functions
Design Decisions¶
Why Strict Type Checking?¶
Rationale: Prevents subtle bugs from implicit type coercion. In scientific computing, knowing
that 1 (int) and 1.0 (float) are treated differently can catch numerical issues.
Trade-off: Less convenient for some use cases, but more explicit and safe.
Why Registry-Based Dispatch?¶
Rationale: Allows extensibility without modifying core code. Users can add support for their own types.
Trade-off: Slightly more complex than if/else chains, but much more maintainable.
Why Separate Testers and Comparators?¶
Rationale: Separation of concerns. Testers handle dispatch and orchestration, comparators handle type-specific logic.
Trade-off: More classes/files, but better modularity.
Why Handlers?¶
Rationale: Code reuse. Many comparators need similar checks (dtype, shape, etc.).
Trade-off: One more abstraction layer, but reduces duplication.
Future Directions¶
Potential areas for enhancement:
- Parallel comparison: For large independent comparisons
- Streaming comparison: For very large objects that don't fit in memory
- Approximate structural matching: For comparing objects with similar but not identical structure
- Diff generation: Not just boolean result, but detailed diff
- Performance optimizations: Cython/Numba for hot paths
References¶
- PEP 8: Python style guide
- PyTorch documentation
- NumPy documentation
- Design Patterns: Gang of Four patterns
Contributing¶
To contribute to coola's architecture:
- Understand the existing patterns
- Follow the established conventions
- Document design decisions
- Write tests for new components
- Update this document for significant changes
See the contributing guide for more details.