Hashing

coola.hashing

Provide deterministic hashing functions.

coola.hashing.BaseHasher

Bases: ABC, Generic[T]

Base class for type-specific hashers.

This abstract base class defines the interface for hashers that compute a hash value for a specific data type. Each concrete hasher implementation handles a specific data type and knows how to hash its nested elements recursively.

Notes

Subclasses must implement the hash method to define how their specific type should be hashed.

Example
>>> from coola.hashing import StrHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StrHasher()
>>> hasher
StrHasher()
>>> hasher.hash([1, 2, 3], registry=registry)

coola.hashing.BaseHasher.hash abstractmethod

hash(
    data: T, registry: HasherRegistry, length: int = 64
) -> str

Hash the given data structure recursively.

This method traverses the data structure and computes a hash value for it. The registry is used to resolve appropriate hashers for nested data types encountered during traversal.

Parameters:

Name Type Description Default
data T

The data structure to hash. Must be of the type T that this hasher handles.

required
registry HasherRegistry

The hasher registry used to look up hashers for nested data structures of different types.

required
length int

The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive, since each byte of the digest encodes as two hex characters.

64

Returns:

Type Description
str

A string representing the hash of the input data.

Example
>>> from coola.hashing import StrHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StrHasher()
>>> hasher.hash("Meowwww", registry=registry)

coola.hashing.DatetimeHasher

Bases: BaseHasher[date]

Hasher for datetime.date and datetime.datetime objects.

This hasher converts the object to its ISO 8601 string representation via isoformat() and then computes the hash of that string.

Since datetime.datetime is a subclass of datetime.date, both types are handled by this hasher. Their ISO representations are always distinct (e.g. '2021-01-01' vs '2021-01-01T00:00:00'), so a date and a datetime that fall on the same calendar day are hashed from different strings.

Example
>>> from datetime import date
>>> from coola.hashing import DatetimeHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = DatetimeHasher()
>>> hasher
DatetimeHasher()
>>> hasher.hash(date(2021, 1, 1), registry=registry)
'f2b4c6a9941206bb6fc3b4b9c1104d8c05264985c009e2e1c7c840aaeda00dac'
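
The difference between the two ISO forms can be checked with the standard library alone. The sketch below does not call coola: hash_text is a hypothetical stand-in that assumes a BLAKE2b digest over the UTF-8 encoded string, so its output is not expected to match the value shown above.

```python
import hashlib
from datetime import date, datetime


def hash_text(value: str, length: int = 64) -> str:
    """Hypothetical stand-in for a string hasher (assumes BLAKE2b over UTF-8)."""
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length // 2).hexdigest()


day = date(2021, 1, 1)
moment = datetime(2021, 1, 1)

# The ISO strings differ, so the inputs to the hash function differ too.
day_iso = day.isoformat()        # '2021-01-01'
moment_iso = moment.isoformat()  # '2021-01-01T00:00:00'

day_hash = hash_text(day_iso)
moment_hash = hash_text(moment_iso)
```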

coola.hashing.HasherRegistry

Registry that manages and dispatches hashers based on data type.

This registry maintains a mapping from Python types to hasher instances and uses the Method Resolution Order (MRO) for type lookup. When hashing data, it automatically selects the most specific registered hasher for the data's type, falling back to parent types or a default hasher if needed.

Parameters:

Name Type Description Default
initial_state dict[type, BaseHasher[Any]] | None

Optional initial mapping of types to hashers. If provided, the state is copied to prevent external mutations.

None

Example

Basic usage with a sequence hasher:

>>> from coola.hashing import HasherRegistry, SequenceHasher, StrHasher
>>> registry = HasherRegistry({object: StrHasher(), list: SequenceHasher()})
>>> registry
HasherRegistry(
  (state): TypeRegistry(
      (<class 'object'>): StrHasher()
      (<class 'list'>): SequenceHasher()
    )
)
>>> registry.hash([1, 2, 3])

Registering custom hashers:

>>> from coola.hashing import HasherRegistry, SequenceHasher, StrHasher
>>> registry = HasherRegistry({object: StrHasher()})
>>> registry.register(list, SequenceHasher())
>>> registry.hash([1, 2, 3])

Working with nested structures:

>>> from coola.hashing import get_default_registry
>>> registry = get_default_registry()
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> registry.hash(data)

coola.hashing.HasherRegistry.find_hasher

find_hasher(data_type: type) -> BaseHasher[Any]

Find the appropriate hasher for a given type.

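Uses the Method Resolution Order (MRO) to find the most specific registered hasher. For example, if a hasher is registered for a base class but not for one of its subclasses, the subclass uses the base-class hasher. The lookup follows the class's __mro__, so virtual subclasses of an abstract base class are not matched: list.__mro__ is (list, object) and does not contain Sequence, so a hasher registered only for Sequence is not used for list, as the example below shows.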

Parameters:

Name Type Description Default
data_type type

The Python type to find a hasher for.

required

Returns:

Type Description
BaseHasher[Any]

The most specific registered hasher for this type, resolved via MRO, or the default hasher if no match is found.

Example
>>> from collections.abc import Sequence
>>> from coola.hashing import HasherRegistry, SequenceHasher, StrHasher
>>> registry = HasherRegistry({object: StrHasher()})
>>> registry.register(Sequence, SequenceHasher())
>>> hasher = registry.find_hasher(list)
>>> hasher
StrHasher()
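
The result above is what a lookup based strictly on __mro__ would give. The sketch below illustrates this with a plain dict in place of the registry; find_by_mro and Row are hypothetical names, and the actual coola lookup may differ in its details.

```python
from collections.abc import Sequence


def find_by_mro(state: dict, data_type: type):
    """Return the value registered for the first class in the MRO that has one."""
    for base in data_type.__mro__:
        if base in state:
            return state[base]
    raise KeyError(f"no entry for {data_type!r}")


class Row(Sequence):
    """A class that explicitly inherits from Sequence."""

    def __init__(self, *items):
        self._items = items

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


state = {object: "default", Sequence: "sequence"}

list_mro = list.__mro__              # (list, object): Sequence is absent
for_list = find_by_mro(state, list)  # falls back to the entry for object
for_row = find_by_mro(state, Row)    # Sequence appears in Row.__mro__
```

Registering an entry for list itself, or for a class that inherits from Sequence directly, is what makes the sequence entry reachable in this sketch.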

coola.hashing.HasherRegistry.has_hasher

has_hasher(data_type: type) -> bool

Check if a hasher is explicitly registered for the given type.

Note that this only checks for direct registration. Even if this returns False, find_hasher may still return a hasher via MRO lookup or the default hasher.

Parameters:

Name Type Description Default
data_type type

The type to check.

required

Returns:

Type Description
bool

True if a hasher is explicitly registered for this type, False otherwise.

Example
>>> from coola.hashing import HasherRegistry, SequenceHasher
>>> registry = HasherRegistry()
>>> registry.register(list, SequenceHasher())
>>> registry.has_hasher(list)
True
>>> registry.has_hasher(tuple)
False

coola.hashing.HasherRegistry.hash

hash(data: object, length: int = 64) -> str

Hash the given data by recursively traversing its structure.

This is the main entry point for hashing. It automatically:

  1. Determines the data's type.
  2. Finds the appropriate hasher via find_hasher.
  3. Delegates to that hasher's hash method, which recursively processes any nested structures.
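
The three steps above can be sketched without coola as follows. HANDLERS, dispatch, hash_leaf, hash_list, and hash_text are hypothetical names, and the leaf hash assumes BLAKE2b over the UTF-8 encoding of str(data).

```python
import hashlib


def hash_text(value: str, length: int = 64) -> str:
    """Hypothetical leaf hash (assumes BLAKE2b over UTF-8)."""
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length // 2).hexdigest()


def hash_leaf(data, dispatch, length):
    return hash_text(str(data), length)


def hash_list(data, dispatch, length):
    # Recurse through the dispatcher so nested items pick their own handler.
    return hash_text("".join(dispatch(item, length) for item in data), length)


HANDLERS = {object: hash_leaf, list: hash_list}


def dispatch(data, length: int = 64) -> str:
    data_type = type(data)              # step 1: determine the type
    for base in data_type.__mro__:      # step 2: find a handler via the MRO
        if base in HANDLERS:
            handler = HANDLERS[base]
            break
    return handler(data, dispatch, length)  # step 3: delegate


nested = dispatch([1, [2, 3]])
flat = dispatch([1, 2, 3])
```

Because object is always present in HANDLERS, the MRO loop in this sketch always finds a handler.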

Parameters:

Name Type Description Default
data object

The data to hash. Can be a nested structure such as a list, dict, or tuple.

required
length int

The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive, since each byte of the digest encodes as two hex characters.

64

Returns:

Type Description
str

A string representing the hash of the input data.

Example
>>> from coola.hashing import get_default_registry
>>> registry = get_default_registry()
>>> registry.hash({"scores": [95, 87, 92], "name": "test"})

coola.hashing.HasherRegistry.register

register(
    data_type: type,
    hasher: BaseHasher[Any],
    exist_ok: bool = False,
) -> None

Register a hasher for a given data type.

Parameters:

Name Type Description Default
data_type type

The Python type to register (e.g., list, dict, custom classes).

required
hasher BaseHasher[Any]

The hasher instance that handles this type.

required
exist_ok bool

If False (default), raises an error if the type is already registered. If True, overwrites the existing registration silently.

False

Raises:

Type Description
RuntimeError

If the type is already registered and exist_ok is False.

Example
>>> from coola.hashing import HasherRegistry, SequenceHasher
>>> registry = HasherRegistry()
>>> registry.register(list, SequenceHasher())
>>> registry.has_hasher(list)
True
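
The exist_ok contract described above can be illustrated with a small stand-in. TinyRegistry is hypothetical and models only the behaviour documented here, using strings in place of hasher instances.

```python
class TinyRegistry:
    """Hypothetical stand-in showing the exist_ok contract only."""

    def __init__(self):
        self._state = {}

    def register(self, data_type, hasher, exist_ok=False):
        if data_type in self._state and not exist_ok:
            msg = f"a hasher is already registered for {data_type!r}"
            raise RuntimeError(msg)
        self._state[data_type] = hasher

    def has_hasher(self, data_type):
        return data_type in self._state


registry = TinyRegistry()
registry.register(list, "first")

try:
    registry.register(list, "second")  # duplicate without exist_ok
    raised = False
except RuntimeError:
    raised = True

registry.register(list, "second", exist_ok=True)  # overwrite is allowed
current = registry._state[list]
```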

coola.hashing.HasherRegistry.register_many

register_many(
    mapping: Mapping[type, BaseHasher[Any]],
    exist_ok: bool = False,
) -> None

Register multiple hashers at once.

This is a convenience method for bulk registration that internally calls register for each type-hasher pair.

Parameters:

Name Type Description Default
mapping Mapping[type, BaseHasher[Any]]

Dictionary mapping Python types to hasher instances.

required
exist_ok bool

If False (default), raises an error if any type is already registered. If True, overwrites existing registrations silently.

False

Raises:

Type Description
RuntimeError

If any type is already registered and exist_ok is False.

Example
>>> from coola.hashing import HasherRegistry, SequenceHasher, MappingHasher
>>> registry = HasherRegistry()
>>> registry.register_many(
...     {
...         list: SequenceHasher(),
...         dict: MappingHasher(),
...     }
... )
>>> registry
HasherRegistry(
  (state): TypeRegistry(
      (<class 'list'>): SequenceHasher()
      (<class 'dict'>): MappingHasher()
    )
)

coola.hashing.MappingHasher

Bases: BaseHasher[Mapping[Any, Any]]

Hasher for mapping types.

This hasher sorts the mapping by key, hashes each key and value separately via the registry, concatenates the key and value hashes per item, concatenates all per-item strings, and hashes the result.

Sorting by key ensures that mappings with the same key-value pairs but different insertion orders produce the same hash.

This hasher handles any type that is an instance of collections.abc.Mapping, including dict.

Example
>>> from collections.abc import Mapping
>>> from coola.hashing import MappingHasher, StrHasher, HasherRegistry
>>> registry = HasherRegistry({object: StrHasher(), Mapping: MappingHasher()})
>>> hasher = MappingHasher()
>>> hasher
MappingHasher()
>>> hasher.hash({"a": 1, "b": 2}, registry=registry)
'a3ecbdde9e227bcdae038eb86746b0fccb90939d8e7eeac55513423219ffa02f'
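
The sort-then-concatenate scheme described above can be sketched as follows. This is not the coola implementation: hash_mapping and hash_text are hypothetical, keys and values are hashed through str() rather than through a registry, and BLAKE2b over UTF-8 is assumed, so the digests will not match the value shown above.

```python
import hashlib


def hash_text(value: str, length: int = 64) -> str:
    """Hypothetical leaf hash (assumes BLAKE2b over UTF-8)."""
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length // 2).hexdigest()


def hash_mapping(data: dict, length: int = 64) -> str:
    parts = []
    for key in sorted(data):  # assumes the keys are mutually comparable
        key_hash = hash_text(str(key), length)
        value_hash = hash_text(str(data[key]), length)
        parts.append(key_hash + value_hash)  # one string per item
    return hash_text("".join(parts), length)


first = hash_mapping({"a": 1, "b": 2})
second = hash_mapping({"b": 2, "a": 1})   # same pairs, other insertion order
changed = hash_mapping({"a": 2, "b": 1})  # values swapped between keys
```

The first two results are equal because sorting removes the effect of insertion order, while swapping the values changes the result.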

coola.hashing.ReprHasher

Bases: BaseHasher[Any]

Hasher for objects whose repr() is a reliable canonical representation.

This hasher converts the object to its repr() string and then computes the hash of that string. It is preferable to StrHasher for numeric types (int, float, complex, bool) because repr() guarantees round-trip accuracy for floating point values, whereas str() may lose precision on some platforms.

Example
>>> from coola.hashing import ReprHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = ReprHasher()
>>> hasher
ReprHasher()
>>> hasher.hash(1234, registry=registry)
'bf1003cd5c1336387f7e4eebf72a3d9cd4fa8ab5be19825bc0e3ecd8ce1cd140'
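
The round-trip property of repr() for floats can be checked directly with built-ins. The snippet also shows a second difference that matters when the text is fed to a hash: repr() keeps the quotes on strings, whereas str() does not.

```python
value = 0.1 + 0.2

text = repr(value)           # '0.30000000000000004'
round_tripped = float(text)  # parses back to exactly the same float

# str() maps the integer 1 and the string "1" to the same text,
# while repr() keeps them apart.
same_under_str = str(1) == str("1")
same_under_repr = repr(1) == repr("1")
```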

coola.hashing.SequenceHasher

Bases: BaseHasher[Sequence[Any]]

Hasher for sequence types.

This hasher computes the hash of each item in the sequence recursively using the registry, concatenates the intermediate hash strings, and then hashes the concatenated result.

This hasher handles any type that is an instance of collections.abc.Sequence, including list, tuple, and str.
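
A minimal sketch of the hash-each-item-then-concatenate scheme is given below. hash_sequence and hash_text are hypothetical, items are hashed through str() rather than through a registry, and BLAKE2b over UTF-8 is assumed. The sketch also shows one property of joining fixed-length item hashes instead of raw strings: item boundaries are preserved.

```python
import hashlib


def hash_text(value: str, length: int = 64) -> str:
    """Hypothetical leaf hash (assumes BLAKE2b over UTF-8)."""
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length // 2).hexdigest()


def hash_sequence(items, length: int = 64) -> str:
    item_hashes = [hash_text(str(item), length) for item in items]
    return hash_text("".join(item_hashes), length)


# Joining the raw strings makes these two inputs indistinguishable ...
raw_a = "".join(str(x) for x in [1, 23])  # '123'
raw_b = "".join(str(x) for x in [12, 3])  # '123'

# ... whereas joining per-item hashes keeps them apart.
hash_a = hash_sequence([1, 23])
hash_b = hash_sequence([12, 3])

# Order matters: reversing the items changes the result.
forward = hash_sequence([1, 2, 3])
backward = hash_sequence([3, 2, 1])
```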

Example
>>> from collections.abc import Sequence
>>> from coola.hashing import SequenceHasher, StrHasher, HasherRegistry
>>> registry = HasherRegistry({object: StrHasher(), Sequence: SequenceHasher()})
>>> hasher = SequenceHasher()
>>> hasher
SequenceHasher()
>>> hasher.hash([1, 2, 3], registry=registry)

coola.hashing.StrHasher

Bases: BaseHasher[Any]

Hasher for objects whose str() is a reliable canonical representation.

This hasher converts the object to its str() string and then computes the hash of that string. Unlike ReprHasher, which uses repr(), this hasher uses str(), which is more human-readable but does not guarantee round-trip accuracy for floating point values.

Example
>>> from coola.hashing import StrHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StrHasher()
>>> hasher
StrHasher()
>>> hasher.hash("hello", registry=registry)
'324dcf027dd4a30a932c441f365a25e86b173defa4b8e58948253471b81b72cf'

coola.hashing.StringHasher

Bases: BaseHasher[str]

Hasher for string objects.

This hasher computes the hash of the string directly, without any intermediate conversion.

Example
>>> from coola.hashing import StringHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StringHasher()
>>> hasher
StringHasher()
>>> hasher.hash("Meowwwwww", registry=registry)
'1b06bfa9e842b52eaf47386798687ccd22697ed0198cfda4e0eee7e4650595f5'

coola.hashing.get_default_registry

get_default_registry() -> HasherRegistry

Return the default global hasher registry.

The registry is created on the first call and reused on all subsequent calls (singleton pattern). It is pre-configured with hashers for common Python built-in types.

Returns:

Type Description
HasherRegistry

A singleton HasherRegistry configured for common Python built-in types.

Notes

Because the registry is a singleton, modifications to it (e.g. via register_hashers) affect all future calls to this function. If you need an isolated registry, create a new HasherRegistry instance directly.

Example
>>> from coola.hashing import get_default_registry
>>> registry = get_default_registry()
>>> registry.hash("meowwwwww")
'36a34d8fd93344d3be9a68e8c797601210f7d4585e30c102f3e8fceea38192aa'
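
The consequence of the singleton pattern noted above can be illustrated with a small stand-in. get_registry is hypothetical and uses a dict in place of a HasherRegistry; it only shows that every caller receives the same object.

```python
_registry = None


def get_registry() -> dict:
    """Hypothetical lazy singleton; a dict stands in for the registry."""
    global _registry
    if _registry is None:
        _registry = {object: "default"}  # created on the first call only
    return _registry


first = get_registry()
second = get_registry()

first[list] = "sequence"  # a change made through one reference ...
visible = list in second  # ... is visible through every other one

isolated = dict(get_registry())  # an independent copy
isolated[tuple] = "sequence"
leaked = tuple in get_registry()  # the shared state is untouched
```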

coola.hashing.hash_object

hash_object(
    data: object,
    registry: HasherRegistry | None = None,
    length: int = 64,
) -> str

Compute a hash of a nested data structure.

Parameters:

Name Type Description Default
data object

The data to hash. Can be a nested structure such as a list, dict, or tuple.

required
registry HasherRegistry | None

The registry used to resolve hashers for each data type. If None, the default registry is used.

None
length int

The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive. Defaults to 64.

64

Returns:

Type Description
str

A string representing the hash of the input data.

Example
>>> from coola.hashing import hash_object
>>> hash_object({"a": 1, "b": "abc"})
'8579f51cd67c8be8fd22301d4c085e2b676c7c7d49991645a85a3c77692a1056'

coola.hashing.hash_string

hash_string(value: str, length: int = 64) -> str

Generate a hexadecimal hash of a string using BLAKE2b.

Parameters:

Name Type Description Default
value str

The input string to hash.

required
length int

The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive, since each byte of the digest encodes as two hex characters. Defaults to 64 (32-byte digest).

64

Returns:

Type Description
str

A lowercase hexadecimal string of exactly length characters.

Raises:

Type Description
ValueError

If length is not an even number between 2 and 128.

Example
>>> from coola.hashing import hash_string
>>> len(hash_string("hello", length=16))
16
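
The relation between length and the digest size can be sketched with hashlib. blake2b_hex is a hypothetical re-implementation written for illustration only: the UTF-8 encoding and the exact validation logic are assumptions, so its output may differ from hash_string.

```python
import hashlib


def blake2b_hex(value: str, length: int = 64) -> str:
    """Hypothetical re-implementation; the UTF-8 encoding is an assumption."""
    if length % 2 != 0 or not 2 <= length <= 128:
        msg = f"length must be an even number between 2 and 128, got {length}"
        raise ValueError(msg)
    # BLAKE2b accepts digest sizes from 1 to 64 bytes; each byte is two hex chars.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length // 2).hexdigest()


short = blake2b_hex("hello", length=16)  # 8-byte digest
full = blake2b_hex("hello", length=128)  # 64-byte digest, the BLAKE2b maximum
```

In this sketch the digest size is passed to BLAKE2b as a parameter, so a shorter result is a different digest and not a prefix of a longer one.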

coola.hashing.register_hashers

register_hashers(
    mapping: Mapping[type, BaseHasher[Any]],
    exist_ok: bool = False,
) -> None

Register custom hashers into the default global registry.

This function extends the singleton registry returned by get_default_registry with additional type-to-hasher mappings.

Parameters:

Name Type Description Default
mapping Mapping[type, BaseHasher[Any]]

Mapping of Python types to hasher instances.

required
exist_ok bool

If False (default), raises an error if any type is already registered. If True, overwrites existing registrations silently.

False

Raises:

Type Description
RuntimeError

If any type is already registered and exist_ok is False.

Example
>>> from coola.hashing import register_hashers, StrHasher
>>> class MyType:
...     def __init__(self, value):
...         self.value = value
...
>>> register_hashers({MyType: StrHasher()})