Iterating Over Nested Data¶

This page describes the coola.iterator package, which provides utilities for iterating over nested data structures using different traversal strategies.

Prerequisites: You'll need to know a bit of Python. For a refresher, see the Python tutorial.

Overview¶

The coola.iterator package provides functions to iterate over nested data structures (lists, dicts, tuples, sets, etc.) using two different traversal strategies:

Depth-First Search (DFS): Follows each branch down to its leaves before moving on to the next branch.
Breadth-First Search (BFS): Visits the structure level by level, finishing one depth before going deeper.

The package also provides filtering utilities to extract only values of specific types from heterogeneous collections.

Depth-First Search (DFS) Iteration¶

Basic Usage¶

The dfs_iterate function performs depth-first traversal of nested data structures:

>>> from coola.iterator import dfs_iterate
>>> list(dfs_iterate({"a": 1, "b": "abc"}))
[1, 'abc']
>>> list(dfs_iterate([1, [2, 3], {"x": 4}]))
[1, 2, 3, 4]

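The function yields only leaf values, i.e. atomic values that are not containers. Containers themselves are never yielded, so an empty container contributes nothing to the output. Strings are treated as leaves rather than iterated character by character, as the 'abc' result above shows.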
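The leaf-only behaviour can be illustrated with a minimal standalone sketch of a depth-first leaf generator. This is not coola's implementation: the name `iter_leaves` and the choice of which types count as containers are assumptions made for illustration.

```python
from typing import Any, Iterator

# Hypothetical helper for illustration only; not part of coola.
# Assumption: these sequence/set types are the ones treated as containers.
CONTAINERS = (list, tuple, set, frozenset)


def iter_leaves(data: Any) -> Iterator[Any]:
    """Yield leaf values depth-first; containers themselves are never yielded."""
    if isinstance(data, dict):
        # Only the values of a mapping are traversed, not its keys.
        for value in data.values():
            yield from iter_leaves(value)
    elif isinstance(data, CONTAINERS):
        for item in data:
            yield from iter_leaves(item)
    else:
        # Anything else (numbers, strings, None, ...) is treated as a leaf.
        yield data


print(list(iter_leaves([1, [2, 3], {"x": 4}])))  # [1, 2, 3, 4]
print(list(iter_leaves({"a": [], "b": {}})))  # [] (empty containers yield nothing)
```

Because strings fall through to the final branch, they are yielded whole instead of being split into characters.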

DFS Traversal Examples¶

Simple nested lists:

>>> from coola.iterator import dfs_iterate
>>> list(dfs_iterate([1, 2, [3, 4, [5, 6]]]))
[1, 2, 3, 4, 5, 6]

Nested dictionaries:

>>> from coola.iterator import dfs_iterate
>>> data = {"level1": {"level2": {"level3": 42}}}
>>> list(dfs_iterate(data))
[42]

Mixed structures:

>>> from coola.iterator import dfs_iterate
>>> data = {
...     "list": [1, 2, 3],
...     "dict": {"a": 4, "b": 5},
...     "set": {6, 7},
...     "value": 8,
... }
>>> sorted(dfs_iterate(data))  # sorted for consistent output
[1, 2, 3, 4, 5, 6, 7, 8]

Nested tuples:

>>> from coola.iterator import dfs_iterate
>>> list(dfs_iterate((1, (2, 3))))
[1, 2, 3]
>>> list(dfs_iterate((1, (2, (3, 4)))))
[1, 2, 3, 4]

Understanding DFS Order¶

Depth-first search goes deep into each branch before moving to the next branch:

>>> from coola.iterator import dfs_iterate
>>> data = {
...     "branch1": [1, [2, 3]],
...     "branch2": [4, [5, 6]],
... }
>>> # DFS will fully explore branch1 before moving to branch2
>>> list(dfs_iterate(data))  # Note: dict order may vary in Python < 3.7
[1, 2, 3, 4, 5, 6]

Breadth-First Search (BFS) Iteration¶

Basic Usage¶

The bfs_iterate function performs breadth-first traversal of nested data structures:

>>> from coola.iterator import bfs_iterate
>>> list(bfs_iterate({"a": 1, "b": "abc"}))
[1, 'abc']
>>> list(bfs_iterate([1, [2, 3], {"x": 4}]))
[1, 2, 3, 4]

Like dfs_iterate, it yields only the leaf values, excluding containers.

BFS Traversal Examples¶

Nested lists:

>>> from coola.iterator import bfs_iterate
>>> list(bfs_iterate([1, [2, [3, [4]]]]))
[1, 2, 3, 4]

Multi-level structure:

>>> from coola.iterator import bfs_iterate
>>> data = {
...     "level1_a": 1,
...     "level1_b": {"level2_a": 2, "level2_b": {"level3": 3}},
... }
>>> # BFS processes level by level: 1 at level 1, then 2 at level 2, then 3 at level 3
>>> list(bfs_iterate(data))
[1, 2, 3]

BFS vs DFS¶

The main difference between BFS and DFS is the order of traversal:

>>> from coola.iterator import bfs_iterate, dfs_iterate
>>> data = [[1, 2], [3, 4]]
>>> list(dfs_iterate(data))  # DFS: depth-first
[1, 2, 3, 4]
>>> list(bfs_iterate(data))  # BFS: breadth-first
[1, 2, 3, 4]

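For this simple example the results are the same because every leaf sits at the same depth. The order differs once leaves sit at different depths: BFS yields all leaves at one depth before moving deeper, while DFS follows each branch to its end before backtracking. The bfs_iterate example under Combining with Iteration shows this: the top-level value 2 is yielded before the nested values 1 and 3.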
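To see the two orders diverge, the leaves need to sit at different depths. The following standalone sketch implements both strategies in plain Python. The helpers `children`, `dfs_leaves`, and `bfs_leaves` are hypothetical names for illustration and are not coola's implementation.

```python
from collections import deque
from typing import Any, Iterator, Optional


def children(data: Any) -> Optional[list]:
    """Return the direct children of a container, or None for a leaf."""
    if isinstance(data, dict):
        return list(data.values())
    if isinstance(data, (list, tuple)):
        return list(data)
    return None


def dfs_leaves(data: Any) -> Iterator[Any]:
    """Depth-first: finish each child subtree before moving to the next."""
    kids = children(data)
    if kids is None:
        yield data
        return
    for kid in kids:
        yield from dfs_leaves(kid)


def bfs_leaves(data: Any) -> Iterator[Any]:
    """Breadth-first: a FIFO queue processes shallower nodes first."""
    queue = deque([data])
    while queue:
        node = queue.popleft()
        kids = children(node)
        if kids is None:
            yield node
        else:
            queue.extend(kids)


data = [1, [2, [3]], 4]
print(list(dfs_leaves(data)))  # [1, 2, 3, 4]
print(list(bfs_leaves(data)))  # [1, 4, 2, 3]
```

In the breadth-first result, 4 comes before 2 because it sits at the top level, while 2 and 3 are nested one and two levels deeper.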

Type-Based Filtering¶

Basic Filtering¶

The filter_by_type function filters an iterator to yield only values of specified types:

>>> from coola.iterator import filter_by_type
>>> list(filter_by_type([1, "hello", 2, 3.14, "world", 4], int))
[1, 2, 4]

Filtering Multiple Types¶

You can filter for multiple types using a tuple:

>>> from coola.iterator import filter_by_type
>>> # Note: bool is a subclass of int
>>> list(filter_by_type([1, "hello", 2.5, True, None, [1, 2]], (int, float)))
[1, 2.5, True]
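Since bool is a subclass of int, True passes an int filter. When booleans should be left out, one option is an isinstance check with an explicit exclusion. The sketch below uses a hypothetical helper, `only_types`, with an `exclude` parameter that is an assumption for illustration and not part of coola's API.

```python
from typing import Any, Iterable, Iterator, Tuple, Type, Union

TypeSpec = Union[Type[Any], Tuple[Type[Any], ...]]


def only_types(
    values: Iterable[Any], types: TypeSpec, exclude: TypeSpec = ()
) -> Iterator[Any]:
    """Yield values that are instances of `types` but not of `exclude`."""
    for value in values:
        # isinstance(value, ()) is always False, so the default excludes nothing.
        if isinstance(value, types) and not isinstance(value, exclude):
            yield value


values = [1, "hello", 2.5, True, None, [1, 2]]
print(list(only_types(values, (int, float))))  # [1, 2.5, True]
print(list(only_types(values, (int, float), exclude=bool)))  # [1, 2.5]
```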

Combining with Iteration¶

You can combine type filtering with nested iteration:

>>> from coola.iterator import dfs_iterate, filter_by_type
>>> data = {"a": 1, "b": "hello", "c": [2, "world", 3.14]}
>>> # Get all numeric values (int and float) from nested structure
>>> list(filter_by_type(dfs_iterate(data), (int, float)))
[1, 2, 3.14]

Extract only strings:

>>> from coola.iterator import dfs_iterate, filter_by_type
>>> data = [1, "a", [2, "b", [3, "c"]]]
>>> list(filter_by_type(dfs_iterate(data), str))
['a', 'b', 'c']

Extract only integers:

>>> from coola.iterator import bfs_iterate, filter_by_type
>>> data = {"nums": [1, 2.5, 3], "text": "hello", "value": 2}
>>> list(filter_by_type(bfs_iterate(data), int))
[2, 1, 3]

Advanced Usage¶

Custom Iterators¶

For more control over how specific types are iterated, you can register custom iterators using the registry system.

DFS Custom Registry:

>>> from coola.iterator.dfs import register_iterators, IterableIterator
>>> # Register custom behavior for specific types
>>> register_iterators({list: IterableIterator()}, exist_ok=True)

BFS Custom Registry:

>>> from coola.iterator.bfs import register_child_finders, IterableChildFinder
>>> # Register custom behavior for specific types
>>> register_child_finders({list: IterableChildFinder()}, exist_ok=True)

Using Custom Registry¶

You can create and use a custom registry:

>>> from coola.iterator.dfs import IteratorRegistry, dfs_iterate
>>> from coola.iterator.dfs import DefaultIterator, IterableIterator
>>> registry = IteratorRegistry()
>>> registry.register(list, IterableIterator())
>>> registry.register(object, DefaultIterator())
>>> list(dfs_iterate([1, 2, 3], registry=registry))
[1, 2, 3]
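Registering a handler for object alongside one for list suggests that the most specific registered type wins, with object as the fallback. The sketch below shows one way such a type-dispatch registry could work, using a lookup along the method resolution order. It is a hypothetical illustration: `TypeRegistry`, `iterate_items`, and `yield_leaf` are invented names, and the lookup rule is an assumption rather than a description of coola's internals.

```python
from typing import Any, Callable, Dict, Iterator

Handler = Callable[[Any, "TypeRegistry"], Iterator[Any]]


class TypeRegistry:
    """Hypothetical registry mapping types to traversal handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[type, Handler] = {}

    def register(self, data_type: type, handler: Handler, exist_ok: bool = False) -> None:
        if data_type in self._handlers and not exist_ok:
            raise RuntimeError(f"a handler is already registered for {data_type}")
        self._handlers[data_type] = handler

    def find(self, data_type: type) -> Handler:
        # Assumption: walk the MRO so subclasses reuse their parent's handler.
        for base in data_type.__mro__:
            if base in self._handlers:
                return self._handlers[base]
        raise KeyError(f"no handler registered for {data_type}")

    def iterate(self, data: Any) -> Iterator[Any]:
        yield from self.find(type(data))(data, self)


def iterate_items(data: Any, registry: TypeRegistry) -> Iterator[Any]:
    """Container handler: recurse into each item."""
    for item in data:
        yield from registry.iterate(item)


def yield_leaf(data: Any, registry: TypeRegistry) -> Iterator[Any]:
    """Fallback handler: treat the value as a leaf."""
    yield data


registry = TypeRegistry()
registry.register(list, iterate_items)
registry.register(object, yield_leaf)
# The tuple has no handler of its own, so it falls back to object and stays whole.
print(list(registry.iterate([1, [2, (3, 4)]])))  # [1, 2, (3, 4)]
```

With this design, changing how a type is traversed only requires registering a different handler; the traversal loop itself stays untouched.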

Common Use Cases¶

Extracting All Values¶

Extract all values from a complex configuration:

>>> from coola.iterator import dfs_iterate
>>> config = {
...     "database": {"host": "localhost", "port": 5432},
...     "cache": {"enabled": True, "ttl": 3600},
... }
>>> list(dfs_iterate(config))
['localhost', 5432, True, 3600]

Counting Values¶

Count specific types in a nested structure:

>>> from coola.iterator import dfs_iterate, filter_by_type
>>> data = {
...     "scores": [95, 87, 92],
...     "names": ["Alice", "Bob", "Charlie"],
... }
>>> len(list(filter_by_type(dfs_iterate(data), int)))
3
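Counting does not require building an intermediate list: a generator expression or collections.Counter can consume the leaves directly. The sketch below uses a small stand-in traversal, `leaves`, which is a hypothetical helper and not part of coola, so that it runs on its own.

```python
from collections import Counter
from typing import Any, Iterator


def leaves(data: Any) -> Iterator[Any]:
    """Stand-in depth-first leaf traversal (illustration only)."""
    if isinstance(data, dict):
        for value in data.values():
            yield from leaves(value)
    elif isinstance(data, (list, tuple)):
        for item in data:
            yield from leaves(item)
    else:
        yield data


data = {
    "scores": [95, 87, 92],
    "names": ["Alice", "Bob", "Charlie"],
}

# Count one type without materializing a list of matches.
n_ints = sum(1 for value in leaves(data) if isinstance(value, int))
print(n_ints)  # 3

# Count every leaf type in a single pass.
counts = Counter(type(value).__name__ for value in leaves(data))
print(dict(counts))  # {'int': 3, 'str': 3}
```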

Finding All Strings¶

Find all string values in a nested structure:

>>> from coola.iterator import dfs_iterate, filter_by_type
>>> data = {
...     "user": {"name": "John", "age": 30},
...     "posts": [{"title": "Post 1", "views": 100}, {"title": "Post 2", "views": 200}],
... }
>>> list(filter_by_type(dfs_iterate(data), str))
['John', 'Post 1', 'Post 2']

Validating Data Types¶

Check if all numeric values in a structure are within a range:

>>> from coola.iterator import dfs_iterate, filter_by_type
>>> data = {"values": [10, 20, 30], "nested": {"more": [40, 50]}}
>>> numbers = list(filter_by_type(dfs_iterate(data), (int, float)))
>>> all(0 <= n <= 100 for n in numbers)
True

Design Principles¶

The coola.iterator package is designed around the following principles:

Multiple traversal strategies: Choose between DFS and BFS depending on the order in which you need the values
Generator-based: Leaves are yielded one at a time rather than collected into a list up front
Type-aware: Built-in support for common Python container types, extensible through registries
Clean API: Simple functions that compose well with other Python tools
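The practical benefit of a generator-based traversal is that a consumer can stop early without the rest of the structure being visited. The sketch below demonstrates this with itertools.islice and a stand-in traversal. `leaves` and the `seen` list are hypothetical names used only to record which values were reached; they are not part of coola.

```python
from itertools import islice
from typing import Any, Iterator, List

seen: List[Any] = []  # records every leaf the traversal actually reaches


def leaves(data: Any) -> Iterator[Any]:
    """Stand-in depth-first leaf traversal that logs each leaf it visits."""
    if isinstance(data, (list, tuple)):
        for item in data:
            yield from leaves(item)
    else:
        seen.append(data)
        yield data


# 500 pairs: [[0, 1], [2, 3], ..., [998, 999]]
data = [[i, i + 1] for i in range(0, 1000, 2)]

first_three = list(islice(leaves(data), 3))
print(first_three)  # [0, 1, 2]
print(len(seen))  # 3 (the remaining 997 leaves were never visited)
```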
