Summarizing Nested Data¶

This page describes the coola.summary package, which provides utilities for creating human-readable text summaries of complex and nested data structures.

Prerequisites: You'll need to know a bit of Python. For a refresher, see the Python tutorial.

Overview¶

The coola.summary package provides a type-based system for generating formatted string representations of Python objects, with special support for nested data structures. Unlike Python's built-in repr() or str(), this package offers:

Configurable depth control: Limit how deep nested structures are expanded
Type-specific formatting: Different types are formatted according to their structure
Truncation support: Limit the number of items shown in collections
Extensibility: Register custom summarizers for your own types
Automatic type dispatch: The right summarizer is selected from the type of the data

This is particularly useful for debugging, logging, and displaying complex data in a readable format without being overwhelmed by deeply nested or large structures.

Basic Usage¶

The summarize() Function¶

The simplest way to create a summary is to call the summarize() function:

>>> from coola.summary import summarize
>>> print(summarize({"a": 1, "b": "abc"}))
<class 'dict'> (length=2)
  (a): 1
  (b): abc

The function automatically detects the type of data and formats it appropriately.

Simple Values¶

For simple scalar values, the summary shows the type and value:

>>> from coola.summary import summarize
>>> print(summarize(42))
<class 'int'> 42
>>> print(summarize(3.14159))
<class 'float'> 3.14159
>>> print(summarize("hello"))
<class 'str'> hello
>>> print(summarize(True))
<class 'bool'> True

Lists and Tuples¶

Sequences are displayed with their type, length, and indexed items:

>>> from coola.summary import summarize
>>> print(summarize([1, 2, 3, 4, 5]))
<class 'list'> (length=5)
  (0): 1
  (1): 2
  (2): 3
  (3): 4
  (4): 5

Tuples work the same way:

>>> from coola.summary import summarize
>>> print(summarize((10, 20, 30)))
<class 'tuple'> (length=3)
  (0): 10
  (1): 20
  (2): 30

Dictionaries¶

Dictionaries show their type, length, and key-value pairs:

>>> from coola.summary import summarize
>>> print(summarize({"name": "Alice", "age": 30, "city": "NYC"}))
<class 'dict'> (length=3)
  (name): Alice
  (age): 30
  (city): NYC

Sets¶

Sets display their type, length, and elements:

>>> from coola.summary import summarize
>>> print(summarize({1, 2, 3, 4, 5}))
<class 'set'> (length=5)
  (0): 1
  (1): 2
  (2): 3
  (3): 4
  (4): 5

Note: Sets are unordered, so the element order in the summary may vary.
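
If a reproducible summary matters (for example in tests or log diffs), one option is to sort the elements before summarizing them. The helper below is a minimal sketch in plain Python; stable_order is a hypothetical name and not part of coola. Note that it returns a list, so a summary of the result would report a list rather than a set.

```python
def stable_order(values):
    """Return the elements of a set in a deterministic order."""
    try:
        # Works when all elements are mutually comparable.
        return sorted(values)
    except TypeError:
        # Mixed element types (e.g. int and str) cannot be compared,
        # so fall back to ordering by their repr() strings.
        return sorted(values, key=repr)


print(stable_order({3, 1, 2}))
```

The sorted list can then be passed to summarize() in place of the original set.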

Working with Nested Structures¶

Depth Control¶

The max_depth parameter controls how many levels of a nested structure are expanded. Anything at or beyond that depth is rendered with its plain string representation. The examples below call summarize() on the default registry:

>>> from coola.summary import get_default_registry
>>> nested_data = {"level1": {"level2": {"level3": [1, 2, 3]}}}
>>> registry = get_default_registry()

max_depth=0: Shows the raw string representation (no expansion):

>>> print(registry.summarize(nested_data, max_depth=0))
{'level1': {'level2': {'level3': [1, 2, 3]}}}

max_depth=1: Expands only the top level:

>>> print(registry.summarize(nested_data, max_depth=1))
<class 'dict'> (length=1)
  (level1): {'level2': {'level3': [1, 2, 3]}}

max_depth=2: Expands two levels deep:

>>> print(registry.summarize(nested_data, max_depth=2))
<class 'dict'> (length=1)
  (level1): <class 'dict'> (length=1)
      (level2): {'level3': [1, 2, 3]}

max_depth=3: Expands three levels deep:

>>> print(registry.summarize(nested_data, max_depth=3))
<class 'dict'> (length=1)
  (level1): <class 'dict'> (length=1)
      (level2): <class 'dict'> (length=1)
          (level3): [1, 2, 3]
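
The depth rule shown above can be sketched in a few lines of plain Python. This is only an illustration of the idea, assuming the behavior visible in the outputs on this page; sketch_summarize is a hypothetical function, it handles only dict, list, and tuple, and it is not how coola is implemented.

```python
import textwrap


def sketch_summarize(data, depth=0, max_depth=1, num_spaces=2):
    """Return a depth-limited summary string (illustrative only)."""
    if depth >= max_depth:
        # Depth budget exhausted: fall back to the plain string form.
        return str(data)
    if isinstance(data, dict):
        items = list(data.items())
    elif isinstance(data, (list, tuple)):
        items = list(enumerate(data))
    else:
        # Scalars are rendered as "<type> value".
        return f"{type(data)} {data}"
    pad = " " * num_spaces
    lines = []
    for key, value in items:
        child = sketch_summarize(value, depth + 1, max_depth, num_spaces)
        # Shift the continuation lines of a multi-line child to the right.
        child = child.replace("\n", "\n" + pad)
        lines.append(f"({key}): {child}")
    body = textwrap.indent("\n".join(lines), pad)
    return f"{type(data)} (length={len(data)})\n{body}"


nested_data = {"level1": {"level2": {"level3": [1, 2, 3]}}}
print(sketch_summarize(nested_data, max_depth=2))
```

Each recursive call increases depth by one, so max_depth acts as a budget: once it is used up, the remaining structure is printed with str() instead of being expanded.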

Complex Nested Examples¶

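Nested lists and dictionaries (with the default max_depth=1, only the top level is expanded and nested values are shown as raw strings):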

>>> from coola.summary import summarize
>>> data = {
...     "users": [
...         {"name": "Alice", "scores": [95, 87, 92]},
...         {"name": "Bob", "scores": [88, 91, 85]},
...     ],
...     "metadata": {"count": 2, "version": "1.0"},
... }
>>> print(summarize(data))
<class 'dict'> (length=2)
  (users): [{'name': 'Alice', 'scores': [95, 87, 92]}, {'name': 'Bob', 'scores': [88, 91, 85]}]
  (metadata): {'count': 2, 'version': '1.0'}

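Mixed types (the dictionary has six entries, so the last one is replaced by ... because mappings show at most 5 items by default):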

>>> from coola.summary import summarize
>>> mixed = {
...     "int": 42,
...     "float": 3.14,
...     "string": "hello",
...     "list": [1, 2, 3],
...     "tuple": (4, 5, 6),
...     "dict": {"nested": "value"},
... }
>>> print(summarize(mixed))
<class 'dict'> (length=6)
  (int): 42
  (float): 3.14
  (string): hello
  (list): [1, 2, 3]
  (tuple): (4, 5, 6)
  ...

Controlling Output Size¶

Limiting Items in Collections¶

By default, collections show a maximum of 5 items. Longer collections are truncated with ...:

>>> from coola.summary import summarize
>>> long_list = list(range(20))
>>> print(summarize(long_list))
<class 'list'> (length=20)
  (0): 0
  (1): 1
  (2): 2
  (3): 3
  (4): 4
  ...

Custom max_items¶

You can change the number of items shown by registering a summarizer configured with a different max_items value:

>>> from coola.summary import SummarizerRegistry, SequenceSummarizer, DefaultSummarizer
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(list, SequenceSummarizer(max_items=3))
>>> long_list = list(range(10))
>>> print(registry.summarize(long_list))
<class 'list'> (length=10)
  (0): 0
  (1): 1
  (2): 2
  ...

To show all items, set max_items=-1:

>>> from coola.summary import SummarizerRegistry, SequenceSummarizer, DefaultSummarizer
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(list, SequenceSummarizer(max_items=-1))
>>> print(registry.summarize(list(range(10))))
<class 'list'> (length=10)
  (0): 0
  (1): 1
  (2): 2
  (3): 3
  (4): 4
  (5): 5
  (6): 6
  (7): 7
  (8): 8
  (9): 9
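
The truncation rule can be sketched as follows. This is a minimal plain-Python illustration, assuming the behavior shown in the outputs above; limited_lines is a hypothetical helper and not part of coola.

```python
from itertools import islice


def limited_lines(values, max_items=5):
    """Return "(index): value" lines, stopping after max_items entries."""
    # A negative max_items means "no limit", mirroring max_items=-1 above.
    shown = values if max_items < 0 else islice(values, max_items)
    lines = [f"({i}): {v}" for i, v in enumerate(shown)]
    if 0 <= max_items < len(values):
        # Signal that some entries were left out.
        lines.append("...")
    return lines


print("\n".join(limited_lines(list(range(20)))))
```

Using islice means that only the first max_items entries are ever formatted, which keeps the cost of summarizing a very long collection small.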

Truncating Long Strings¶

For very long strings or values, you can use DefaultSummarizer with max_characters:

>>> from coola.summary import SummarizerRegistry, DefaultSummarizer
>>> long_string = (
...     "This is a very long string that should be truncated when max_characters is set"
... )
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer(max_characters=30))
>>> print(registry.summarize(long_string))
<class 'str'> This is a very long string tha...
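
The effect of max_characters can be reproduced with a small helper. This is a sketch under the assumption that truncation keeps the first max_characters characters and appends "...", as the output above suggests; truncate is a hypothetical name, not a coola function.

```python
def truncate(text, max_characters=-1):
    """Shorten text to max_characters, marking the cut with '...'."""
    # A negative limit disables truncation, matching the default of -1.
    if max_characters >= 0 and len(text) > max_characters:
        return text[:max_characters] + "..."
    return text


long_string = (
    "This is a very long string that should be truncated when max_characters is set"
)
print(truncate(long_string, max_characters=30))
```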

Customizing Indentation¶

The num_spaces parameter controls indentation for nested structures:

>>> from coola.summary import (
...     SummarizerRegistry,
...     MappingSummarizer,
...     SequenceSummarizer,
...     DefaultSummarizer,
... )
>>> data = {"a": [1, 2, 3], "b": {"nested": "value"}}
>>> # Default: 2 spaces
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(list, SequenceSummarizer(num_spaces=2))
>>> registry.register(dict, MappingSummarizer(num_spaces=2))
>>> print(registry.summarize(data, max_depth=3))
<class 'dict'> (length=2)
  (a): <class 'list'> (length=3)
      (0): <class 'int'> 1
      (1): <class 'int'> 2
      (2): <class 'int'> 3
  (b): <class 'dict'> (length=1)
      (nested): <class 'str'> value

With 4 spaces for clearer nesting:

>>> from coola.summary import (
...     SummarizerRegistry,
...     MappingSummarizer,
...     SequenceSummarizer,
...     DefaultSummarizer,
... )
>>> data = {"a": [1, 2, 3], "b": {"nested": "value"}}
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(list, SequenceSummarizer(num_spaces=4))
>>> registry.register(dict, MappingSummarizer(num_spaces=4))
>>> print(registry.summarize(data, max_depth=3))
<class 'dict'> (length=2)
    (a): <class 'list'> (length=3)
            (0): <class 'int'> 1
            (1): <class 'int'> 2
            (2): <class 'int'> 3
    (b): <class 'dict'> (length=1)
            (nested): <class 'str'> value

Working with NumPy and PyTorch¶

NumPy Arrays¶

The NDArraySummarizer creates compact summaries of NumPy arrays:

>>> import numpy as np
>>> from coola.summary import SummarizerRegistry, NDArraySummarizer, DefaultSummarizer
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(np.ndarray, NDArraySummarizer())
>>> arr = np.arange(100).reshape(10, 10)
>>> print(registry.summarize(arr))
<class 'numpy.ndarray'> | shape=(10, 10) | dtype=int64

By default, only metadata is shown. To see the actual data, use show_data=True:

>>> from coola.summary import SummarizerRegistry, NDArraySummarizer, DefaultSummarizer
>>> import numpy as np
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(np.ndarray, NDArraySummarizer(show_data=True))
>>> arr = np.arange(5)
>>> print(registry.summarize(arr))
array([0, 1, 2, 3, 4])

PyTorch Tensors¶

The TensorSummarizer works similarly for PyTorch tensors:

>>> import torch
>>> from coola.summary import SummarizerRegistry, TensorSummarizer, DefaultSummarizer
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(torch.Tensor, TensorSummarizer())
>>> tensor = torch.randn(3, 4, 5)
>>> print(registry.summarize(tensor))
<class 'torch.Tensor'> | shape=torch.Size([3, 4, 5]) | dtype=torch.float32 | device=cpu | requires_grad=False

With show_data=True:

>>> from coola.summary import SummarizerRegistry, TensorSummarizer, DefaultSummarizer
>>> import torch
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(torch.Tensor, TensorSummarizer(show_data=True))
>>> tensor = torch.arange(5)
>>> print(registry.summarize(tensor))
tensor([0, 1, 2, 3, 4])

Creating Custom Summarizers¶

Defining a Custom Summarizer¶

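To support your own types, subclass BaseSummarizer and implement summarize(), which returns the summary string. The example below also defines equal(), which here treats two summarizers as equal when they have the same type: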

>>> from coola.summary import BaseSummarizer, SummarizerRegistry

>>> class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
...
>>> class PersonSummarizer(BaseSummarizer):
...     def equal(self, other: object) -> bool:
...         return type(self) is type(other)
...     def summarize(self, data, registry, depth=0, max_depth=1):
...         return f"Person(name={data.name!r}, age={data.age})"
...

Registering Custom Summarizers¶

Register the custom summarizer with register_summarizers() so that summarize() uses it for Person instances:

>>> from coola.summary import register_summarizers
>>> register_summarizers({Person: PersonSummarizer()})
>>> person = Person("Alice", 30)
>>> from coola.summary import summarize
>>> print(summarize(person))
Person(name='Alice', age=30)

Working with the Registry¶

Understanding the Registry¶

The SummarizerRegistry manages the mapping from types to summarizers:

>>> from coola.summary import get_default_registry
>>> registry = get_default_registry()
>>> registry
SummarizerRegistry(
  (state): TypeRegistry(
      (<class 'object'>): DefaultSummarizer(max_characters=-1)
      (<class 'str'>): DefaultSummarizer(max_characters=-1)
      (<class 'int'>): DefaultSummarizer(max_characters=-1)
      (<class 'float'>): DefaultSummarizer(max_characters=-1)
      (<class 'complex'>): DefaultSummarizer(max_characters=-1)
      (<class 'bool'>): DefaultSummarizer(max_characters=-1)
      (<class 'list'>): SequenceSummarizer(max_items=5, num_spaces=2)
      (<class 'tuple'>): SequenceSummarizer(max_items=5, num_spaces=2)
      (<class 'collections.abc.Sequence'>): SequenceSummarizer(max_items=5, num_spaces=2)
      (<class 'set'>): SetSummarizer(max_items=5, num_spaces=2)
      (<class 'frozenset'>): SetSummarizer(max_items=5, num_spaces=2)
      (<class 'dict'>): MappingSummarizer(max_items=5, num_spaces=2)
      (<class 'collections.abc.Mapping'>): MappingSummarizer(max_items=5, num_spaces=2)
      ...
    )
)
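
One common way to implement this kind of type-to-handler lookup is to walk the method resolution order (MRO) of the data's type and return the first registered match. The sketch below illustrates that idea in plain Python. Whether coola resolves types exactly this way is an assumption, and SketchTypeRegistry with its methods is hypothetical.

```python
class SketchTypeRegistry:
    """Map types to handlers and resolve lookups through the MRO."""

    def __init__(self):
        self._state = {}

    def register(self, data_type, handler):
        self._state[data_type] = handler

    def has(self, data_type):
        # Exact match only: inherited handlers are not counted here.
        return data_type in self._state

    def find(self, data_type):
        # Walk from the most specific class to object and return the
        # first registered handler. Virtual ABC subclasses (such as
        # list for collections.abc.Sequence) are not in the MRO, so
        # this sketch would not find handlers registered on those ABCs.
        for base in data_type.__mro__:
            if base in self._state:
                return self._state[base]
        raise KeyError(f"no handler registered for {data_type}")


registry = SketchTypeRegistry()
registry.register(object, "default handler")
registry.register(int, "int handler")
print(registry.find(bool))  # bool subclasses int
print(registry.find(str))  # falls back to object
```

Registering a handler for object, as the examples on this page do, guarantees that every lookup finds at least a fallback.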

Creating a Custom Registry¶

For complete control, create your own registry:

>>> from coola.summary import SummarizerRegistry, SequenceSummarizer, DefaultSummarizer
>>> registry = SummarizerRegistry()
>>> registry.register(object, DefaultSummarizer())
>>> registry.register(list, SequenceSummarizer())
>>> print(registry.summarize([1, 2, 3]))
<class 'list'> (length=3)
  (0): 1
  (1): 2
  (2): 3

Registering Multiple Types¶

Use register_many() to register multiple types at once:

>>> from coola.summary import (
...     SummarizerRegistry,
...     SequenceSummarizer,
...     MappingSummarizer,
...     DefaultSummarizer,
... )
>>> registry = SummarizerRegistry()
>>> registry.register_many(
...     {object: DefaultSummarizer(), list: SequenceSummarizer(), dict: MappingSummarizer()}
... )

Checking Registered Types¶

Check if a type has a registered summarizer:

>>> from coola.summary import get_default_registry
>>> registry = get_default_registry()
>>> registry.has_summarizer(list)
True
>>> registry.has_summarizer(int)
True

Common Use Cases¶

Debugging Complex Data¶

Quickly inspect complex data structures during debugging:

>>> from coola.summary import summarize
>>> config = {
...     "database": {
...         "host": "localhost",
...         "port": 5432,
...         "credentials": {"user": "admin", "password": "secret"},
...     },
...     "cache": {"enabled": True, "ttl": 3600, "backends": ["redis", "memcached"]},
... }
>>> print(summarize(config))
<class 'dict'> (length=2)
  (database): {'host': 'localhost', 'port': 5432, 'credentials': {'user': 'admin', 'password': 'secret'}}
  (cache): {'enabled': True, 'ttl': 3600, 'backends': ['redis', 'memcached']}

Logging Large Tensors¶

Log tensor shapes without filling logs with data:

>>> import torch
>>> from coola.summary import summarize
>>> model_weights = {
...     "layer1": torch.randn(1000, 1000),
...     "layer2": torch.randn(1000, 500),
...     "bias": torch.randn(500),
... }
>>> print(summarize(model_weights, max_depth=2))
<class 'dict'> (length=3)
  (layer1): <class 'torch.Tensor'> | shape=torch.Size([1000, 1000]) | dtype=torch.float32 | device=cpu | requires_grad=False
  (layer2): <class 'torch.Tensor'> | shape=torch.Size([1000, 500]) | dtype=torch.float32 | device=cpu | requires_grad=False
  (bias): <class 'torch.Tensor'> | shape=torch.Size([500]) | dtype=torch.float32 | device=cpu | requires_grad=False
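
When summaries are written through the logging module, it can help to defer the summarization until a record is actually emitted, so that disabled log levels cost nothing. The sketch below shows one way to do that with the standard library. LazySummary and shape_only are hypothetical names. shape_only is a stand-in so the example runs without coola; in real code, summarize from coola.summary could be passed instead.

```python
import logging


class LazySummary:
    """Defer an expensive summary until the log record is formatted."""

    def __init__(self, data, summarize_fn):
        self._data = data
        self._summarize_fn = summarize_fn

    def __str__(self):
        # Called by logging only when the message is actually rendered.
        return self._summarize_fn(self._data)


def shape_only(data):
    # Stand-in summarizer used only to keep this sketch self-contained.
    return f"{type(data).__name__} with {len(data)} entries"


logger = logging.getLogger("summary_sketch")
# With %s-style arguments, str() is applied lazily, so nothing is
# summarized unless DEBUG records are enabled for this logger.
logger.debug("weights: %s", LazySummary({"layer1": [0.0] * 4}, shape_only))
```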

Inspecting API Responses¶

Summarize complex API responses:

>>> from coola.summary import summarize
>>> api_response = {
...     "status": "success",
...     "data": {
...         "items": [
...             {"id": 1, "name": "Item 1", "tags": ["tag1", "tag2"]},
...             {"id": 2, "name": "Item 2", "tags": ["tag3"]},
...             # ... potentially many more items
...         ],
...         "pagination": {"page": 1, "total_pages": 10},
...     },
... }
>>> print(summarize(api_response))
<class 'dict'> (length=2)
  (status): success
  (data): {'items': [{'id': 1, 'name': 'Item 1', 'tags': ['tag1', 'tag2']}, {'id': 2, 'name': 'Item 2', 'tags': ['tag3']}], 'pagination': {'page': 1, 'total_pages': 10}}

Comparing Data Structures¶

Get a quick overview to compare different data structures. Note that with the default max_depth=1, nested structures are shown as raw strings without truncation:

>>> from coola.summary import summarize
>>> data1 = {"users": [1, 2, 3, 4, 5], "version": "1.0"}
>>> data2 = {"users": [1, 2, 3], "version": "2.0"}
>>> print(summarize(data1))
<class 'dict'> (length=2)
  (users): [1, 2, 3, 4, 5]
  (version): 1.0
>>> print(summarize(data2))
<class 'dict'> (length=2)
  (users): [1, 2, 3]
  (version): 2.0
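
Because summaries are plain multi-line strings, they can be compared line by line with the standard library. The sketch below uses difflib on two summary strings copied from the output above. diff_summaries is a hypothetical helper, not part of coola.

```python
import difflib


def diff_summaries(left, right):
    """Return a unified diff of two summary strings."""
    return "\n".join(
        difflib.unified_diff(
            left.splitlines(),
            right.splitlines(),
            fromfile="data1",
            tofile="data2",
            lineterm="",
        )
    )


# Summary strings as printed by summarize(data1) and summarize(data2).
summary1 = "<class 'dict'> (length=2)\n  (users): [1, 2, 3, 4, 5]\n  (version): 1.0"
summary2 = "<class 'dict'> (length=2)\n  (users): [1, 2, 3]\n  (version): 2.0"
print(diff_summaries(summary1, summary2))
```

Lines prefixed with - appear only in the first summary and lines prefixed with + only in the second. Identical summaries produce an empty diff.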

For deeper inspection with truncation, increase max_depth:

>>> from coola.summary import get_default_registry
>>> registry = get_default_registry()
>>> data = {"users": list(range(20)), "version": "1.0"}
>>> print(registry.summarize(data, max_depth=2))
<class 'dict'> (length=2)
  (users): <class 'list'> (length=20)
      (0): 0
      (1): 1
      (2): 2
      (3): 3
      (4): 4
      ...
  (version): <class 'str'> 1.0

Available Summarizers¶

The coola.summary package provides the following built-in summarizers:

DefaultSummarizer: For generic objects and scalar types (int, float, str, bool, etc.)
    Configurable: max_characters (default: -1, no limit)
SequenceSummarizer: For sequences (list, tuple, Sequence ABC)
    Configurable: max_items (default: 5), num_spaces (default: 2)
MappingSummarizer: For mappings (dict, Mapping ABC)
    Configurable: max_items (default: 5), num_spaces (default: 2)
SetSummarizer: For sets (set, frozenset)
    Configurable: max_items (default: 5), num_spaces (default: 2)
NDArraySummarizer: For NumPy arrays (requires NumPy)
    Configurable: show_data (default: False)
TensorSummarizer: For PyTorch tensors (requires PyTorch)
    Configurable: show_data (default: False)

Design Principles¶

The coola.summary package is designed with the following principles:

Type-based dispatch: Automatically selects the appropriate summarizer based on data type
Recursive summarization: Handles deeply nested structures through the registry pattern
Configurable output: Control depth, item limits, and formatting to suit your needs
Extensibility: Easy to add support for custom types via the registry
Sensible defaults: Works out-of-the-box for common Python types
Metadata focus: For large data structures (arrays, tensors), show metadata instead of data