Hashing
coola.hashing ¶
Provide deterministic hashing functions.
coola.hashing.BaseHasher ¶
Bases: ABC, Generic[T]
Base class for type-specific hashers.
This abstract base class defines the interface for hashers that compute a hash value for a specific data type. Each concrete hasher implementation handles a specific data type and knows how to hash its nested elements recursively.
Notes
Subclasses must implement the hash method to define how
their specific type should be hashed.
Example
>>> from coola.hashing import StrHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StrHasher()
>>> hasher
StrHasher()
>>> hasher.hash([1, 2, 3], registry=registry)
coola.hashing.BaseHasher.hash abstractmethod ¶
hash(data: T, registry: HasherRegistry, length: int = 64) -> str
Hash the given data structure recursively.
This method traverses the data structure and computes a hash value for it. The registry is used to resolve appropriate hashers for nested data types encountered during traversal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | T | The data structure to hash. Must be of the type | required |
| registry | HasherRegistry | The hasher registry used to look up hashers for nested data structures of different types. | required |
| length | int | The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive, since each byte of the digest encodes as two hex characters. | 64 |
Returns:
| Type | Description |
|---|---|
| str | A string representing the hash of the input data. |
Example
>>> from coola.hashing import StrHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StrHasher()
>>> hasher.hash("Meowwww", registry=registry)
coola.hashing.DatetimeHasher ¶
Bases: BaseHasher[date]
Hasher for datetime.date and datetime.datetime objects.
This hasher converts the object to its ISO 8601 string representation
via isoformat() and then computes the hash of that string.
Since datetime.datetime is a subclass of datetime.date, both types are
handled by this hasher. Their ISO representations always differ
(e.g. '2021-01-01' vs '2021-01-01T00:00:00'), so a date and a datetime
on the same calendar day are hashed from different strings and do not
share a hash.
Example
>>> from datetime import date
>>> from coola.hashing import DatetimeHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = DatetimeHasher()
>>> hasher
DatetimeHasher()
>>> hasher.hash(date(2021, 1, 1), registry=registry)
'f2b4c6a9941206bb6fc3b4b9c1104d8c05264985c009e2e1c7c840aaeda00dac'
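The distinction between the two ISO representations can be checked with the standard library alone. The sketch below is not coola's implementation: `hash_iso` is a hypothetical helper, and hashing the UTF-8 bytes with BLAKE2b is an assumption made for illustration, so its output is not expected to match the value shown above.

```python
import hashlib
from datetime import date, datetime


def hash_iso(value: date, length: int = 64) -> str:
    """Hash a date or datetime through its ISO 8601 string (hypothetical helper)."""
    text = value.isoformat()
    # Assumption: BLAKE2b over the UTF-8 bytes, with a digest of length // 2 bytes.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=length // 2).hexdigest()


day = date(2021, 1, 1)
moment = datetime(2021, 1, 1)

print(isinstance(moment, date))  # True: datetime is a subclass of date
print(day.isoformat())           # 2021-01-01
print(moment.isoformat())        # 2021-01-01T00:00:00
print(hash_iso(day) != hash_iso(moment))  # True
```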
coola.hashing.HasherRegistry ¶
Registry that manages and dispatches hashers based on data type.
This registry maintains a mapping from Python types to hasher instances and uses the Method Resolution Order (MRO) for type lookup. When hashing data, it automatically selects the most specific registered hasher for the data's type, falling back to parent types or a default hasher if needed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| initial_state | dict[type, BaseHasher[Any]] \| None | Optional initial mapping of types to hashers. If provided, the state is copied to prevent external mutations. | None |
Example
Basic usage with a sequence hasher:
>>> from coola.hashing import HasherRegistry, SequenceHasher, StrHasher
>>> registry = HasherRegistry({object: StrHasher(), list: SequenceHasher()})
>>> registry
HasherRegistry(
(state): TypeRegistry(
(<class 'object'>): StrHasher()
(<class 'list'>): SequenceHasher()
)
)
>>> registry.hash([1, 2, 3])
Registering custom hashers:
>>> from coola.hashing import HasherRegistry, SequenceHasher, StrHasher
>>> registry = HasherRegistry({object: StrHasher()})
>>> registry.register(list, SequenceHasher())
>>> registry.hash([1, 2, 3])
Working with nested structures:
>>> from coola.hashing import get_default_registry
>>> registry = get_default_registry()
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> registry.hash(data)
coola.hashing.HasherRegistry.find_hasher ¶
find_hasher(data_type: type) -> BaseHasher[Any]
Find the appropriate hasher for a given type.
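Uses the Method Resolution Order (MRO) to find the most specific
registered hasher. For example, if a hasher is registered for a base
class but not for one of its subclasses, the subclass uses the base
class's hasher. Only classes that appear in the type's MRO are
considered: list is a virtual subclass of Sequence and does not have
Sequence in its MRO, so registering a hasher for Sequence alone does
not change the lookup for list.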
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_type | type | The Python type to find a hasher for. | required |
Returns:
| Type | Description |
|---|---|
| BaseHasher[Any] | The most specific registered hasher for this type, resolved via MRO, or the default hasher if no match is found. |
Example
>>> from collections.abc import Sequence
>>> from coola.hashing import HasherRegistry, SequenceHasher, StrHasher
>>> registry = HasherRegistry({object: StrHasher()})
>>> registry.register(Sequence, SequenceHasher())
>>> hasher = registry.find_hasher(list)
>>> hasher
StrHasher()
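The result above follows from how Python builds the MRO. The following stand-alone sketch mimics an MRO-based lookup with a plain dictionary; `resolve` and `Row` are hypothetical names, and the string values stand in for hasher instances. It is an illustration of the lookup rule, not coola's code.

```python
from collections.abc import Sequence


def resolve(data_type: type, state: dict) -> str:
    """Return the entry for the first class in the MRO that has one."""
    for base in data_type.__mro__:
        if base in state:
            return state[base]
    raise KeyError(data_type)


class Row(Sequence):
    """A real subclass of Sequence, so Sequence appears in its MRO."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


state = {object: "default", Sequence: "sequence"}

print(issubclass(list, Sequence))  # True: list is registered as a virtual subclass
print(Sequence in list.__mro__)    # False: virtual subclasses do not change the MRO
print(resolve(list, state))        # default
print(resolve(Row, state))         # sequence
```

To route built-in lists to a sequence hasher under this lookup rule, the entry has to be registered for list itself.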
coola.hashing.HasherRegistry.has_hasher ¶
has_hasher(data_type: type) -> bool
Check if a hasher is explicitly registered for the given type.
Note that this only checks for direct registration. Even if this
returns False, find_hasher may still return a hasher via
MRO lookup or the default hasher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_type | type | The type to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | |
Example
>>> from coola.hashing import HasherRegistry, SequenceHasher
>>> registry = HasherRegistry()
>>> registry.register(list, SequenceHasher())
>>> registry.has_hasher(list)
True
>>> registry.has_hasher(tuple)
False
coola.hashing.HasherRegistry.hash ¶
hash(data: object, length: int = 64) -> str
Hash the given data by recursively traversing its structure.
This is the main entry point for hashing. It automatically:
- Determines the data's type.
- Finds the appropriate hasher via find_hasher.
- Delegates to that hasher's hash method, which recursively processes any nested structures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | object | The data to hash. Can be a nested structure such as a | required |
| length | int | The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive, since each byte of the digest encodes as two hex characters. | 64 |
Returns:
| Type | Description |
|---|---|
| str | A string representing the hash of the input data. |
Example
>>> from coola.hashing import get_default_registry
>>> registry = get_default_registry()
>>> registry.hash({"scores": [95, 87, 92], "name": "test"})
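The three steps listed for hash (determine the type, find a hasher, delegate) can be sketched with a toy registry. Everything here is hypothetical: `MiniRegistry`, `digest`, `hash_text`, and `hash_list` are illustrative names, hashers are plain functions rather than BaseHasher instances, and BLAKE2b with a 32-byte digest is an assumption.

```python
import hashlib


def digest(text: str) -> str:
    """Hash a string to 64 hex characters (assumed BLAKE2b, 32-byte digest)."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()


class MiniRegistry:
    """Toy registry mapping types to hashing functions."""

    def __init__(self, state):
        self._state = dict(state)  # copy so later external changes have no effect

    def find(self, data_type):
        # Walk the MRO and return the first registered hashing function.
        for base in data_type.__mro__:
            if base in self._state:
                return self._state[base]
        raise KeyError(data_type)

    def hash(self, data):
        hasher = self.find(type(data))  # steps 1 and 2: type, then lookup
        return hasher(data, self)       # step 3: delegate, passing the registry


def hash_text(data, registry):
    return digest(str(data))


def hash_list(data, registry):
    # Recurse through the registry so nested lists are dispatched again.
    return digest("".join(registry.hash(item) for item in data))


registry = MiniRegistry({object: hash_text, list: hash_list})
nested = registry.hash([1, [2, 3]])
print(len(nested))  # 64
```

Passing the registry into each hashing function is what lets a container hasher handle elements of types it knows nothing about.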
coola.hashing.HasherRegistry.register ¶
register(
    data_type: type,
    hasher: BaseHasher[Any],
    exist_ok: bool = False,
) -> None
Register a hasher for a given data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_type | type | The Python type to register (e.g., | required |
| hasher | BaseHasher[Any] | The hasher instance that handles this type. | required |
| exist_ok | bool | If | False |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the type is already registered and |
Example
>>> from coola.hashing import HasherRegistry, SequenceHasher
>>> registry = HasherRegistry()
>>> registry.register(list, SequenceHasher())
>>> registry.has_hasher(list)
True
coola.hashing.HasherRegistry.register_many ¶
register_many(
    mapping: Mapping[type, BaseHasher[Any]],
    exist_ok: bool = False,
) -> None
Register multiple hashers at once.
This is a convenience method for bulk registration that internally
calls register for each type-hasher pair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mapping | Mapping[type, BaseHasher[Any]] | Dictionary mapping Python types to hasher instances. | required |
| exist_ok | bool | If | False |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If any type is already registered and |
Example
>>> from coola.hashing import HasherRegistry, SequenceHasher, MappingHasher
>>> registry = HasherRegistry()
>>> registry.register_many(
...     {
...         list: SequenceHasher(),
...         dict: MappingHasher(),
...     }
... )
>>> registry
HasherRegistry(
(state): TypeRegistry(
(<class 'list'>): SequenceHasher()
(<class 'dict'>): MappingHasher()
)
)
coola.hashing.MappingHasher ¶
Bases: BaseHasher[Mapping[Any, Any]]
Hasher for mapping types.
This hasher sorts the mapping by key, hashes each key and value separately via the registry, concatenates the key and value hashes per item, concatenates all per-item strings, and hashes the result.
Sorting by key ensures that mappings with the same key-value pairs but different insertion orders produce the same hash.
This hasher handles any type that is an instance of
collections.abc.Mapping, including dict.
Example
>>> from collections.abc import Mapping
>>> from coola.hashing import MappingHasher, StrHasher, HasherRegistry
>>> registry = HasherRegistry({object: StrHasher(), Mapping: MappingHasher()})
>>> hasher = MappingHasher()
>>> hasher
MappingHasher()
>>> hasher.hash({"a": 1, "b": 2}, registry=registry)
'a3ecbdde9e227bcdae038eb86746b0fccb90939d8e7eeac55513423219ffa02f'
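The effect of sorting by key can be illustrated without the library. This is a simplified sketch, not coola's implementation: `digest` and `hash_mapping` are hypothetical helpers, keys and values are hashed through str() instead of a registry, and BLAKE2b is assumed, so the output will not match the value shown above.

```python
import hashlib


def digest(text: str) -> str:
    """Hash a string to 64 hex characters (assumed BLAKE2b, 32-byte digest)."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()


def hash_mapping(data: dict) -> str:
    """Hash a mapping so that insertion order does not matter."""
    parts = []
    for key in sorted(data):  # assumes the keys are mutually comparable
        # One string per item: key hash followed by value hash.
        parts.append(digest(str(key)) + digest(str(data[key])))
    return digest("".join(parts))


first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}  # same pairs, different insertion order

print(hash_mapping(first) == hash_mapping(second))            # True
print(hash_mapping(first) == hash_mapping({"a": 2, "b": 1}))  # False
```

Note that sorted() raises TypeError for keys that cannot be compared with each other (for example a mix of int and str); how the library treats such mappings is not covered here.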
coola.hashing.ReprHasher ¶
Bases: BaseHasher[Any]
Hasher for objects whose repr() is a reliable canonical
representation.
This hasher converts the object to its repr() string and then
computes the hash of that string. It is preferable to
StrHasher for numeric types (int, float, complex,
bool) because repr() guarantees round-trip accuracy for
floating point values, whereas str() may lose precision on some
platforms.
Example
>>> from coola.hashing import ReprHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = ReprHasher()
>>> hasher
ReprHasher()
>>> hasher.hash(1234, registry=registry)
'bf1003cd5c1336387f7e4eebf72a3d9cd4fa8ab5be19825bc0e3ecd8ce1cd140'
coola.hashing.SequenceHasher ¶
Bases: BaseHasher[Sequence[Any]]
Hasher for sequence types.
This hasher computes the hash of each item in the sequence recursively using the registry, concatenates the intermediate hash strings, and then hashes the concatenated result.
This hasher handles any type that is an instance of
collections.abc.Sequence, including list, tuple, and
str.
Example
>>> from collections.abc import Sequence
>>> from coola.hashing import SequenceHasher, StrHasher, HasherRegistry
>>> registry = HasherRegistry({object: StrHasher(), Sequence: SequenceHasher()})
>>> hasher = SequenceHasher()
>>> hasher
SequenceHasher()
>>> hasher.hash([1, 2, 3], registry=registry)
coola.hashing.StrHasher ¶
Bases: BaseHasher[Any]
Hasher for objects whose str() is a reliable canonical
representation.
This hasher converts the object to its str() string and then
computes the hash of that string. Unlike ReprHasher, which uses
repr(), this hasher uses str(), which is more human-readable
but does not guarantee round-trip accuracy for floating point values.
Example
>>> from coola.hashing import StrHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StrHasher()
>>> hasher
StrHasher()
>>> hasher.hash("hello", registry=registry)
'324dcf027dd4a30a932c441f365a25e86b173defa4b8e58948253471b81b72cf'
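One practical consequence of choosing between str() and repr() can be seen with plain Python, independent of the library: str() maps the integer 1 and the string "1" to the same text, so any hash computed from str() cannot tell them apart, while repr() keeps them distinct. The snippet below only demonstrates the built-in functions.

```python
values = [1, "1", 1.0]

print([str(v) for v in values])   # ['1', '1', '1.0']
print([repr(v) for v in values])  # ['1', "'1'", '1.0']

# repr() of a float can be parsed back to exactly the same value.
x = 0.1 + 0.2
print(float(repr(x)) == x)  # True
```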
coola.hashing.StringHasher ¶
Bases: BaseHasher[str]
Hasher for string objects.
This hasher computes the hash of the string directly, without any intermediate conversion.
Example
>>> from coola.hashing import StringHasher, HasherRegistry
>>> registry = HasherRegistry()
>>> hasher = StringHasher()
>>> hasher
StringHasher()
>>> hasher.hash("Meowwwwww", registry=registry)
'1b06bfa9e842b52eaf47386798687ccd22697ed0198cfda4e0eee7e4650595f5'
coola.hashing.get_default_registry ¶
get_default_registry() -> HasherRegistry
Return the default global hasher registry.
The registry is created on the first call and reused on all subsequent calls (singleton pattern). It is pre-configured with hashers for common Python built-in types.
Returns:
| Type | Description |
|---|---|
| HasherRegistry | A singleton HasherRegistry pre-configured with hashers for common Python built-in types. |
Notes
Because the registry is a singleton, modifications to it
(e.g. via register_hashers) affect all future calls to
this function. If you need an isolated registry, create a new
HasherRegistry instance directly.
Example
>>> from coola.hashing import get_default_registry
>>> registry = get_default_registry()
>>> registry.hash("meowwwwww")
'36a34d8fd93344d3be9a68e8c797601210f7d4585e30c102f3e8fceea38192aa'
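The behaviour described in the Notes is typical of a lazily created module-level singleton. The sketch below shows the general pattern with a plain dictionary standing in for the registry; `get_registry` and `_registry` are hypothetical names, and the library may implement its singleton differently.

```python
_registry = None  # module-level cache, filled on first use


def get_registry() -> dict:
    """Create the shared registry on the first call and reuse it afterwards."""
    global _registry
    if _registry is None:
        _registry = {int: "repr", str: "string"}
    return _registry


first = get_registry()
first[bytes] = "custom"  # mutate the shared instance
second = get_registry()

print(first is second)  # True: both names refer to the same object
print(bytes in second)  # True: the change is visible to later callers

isolated = dict(get_registry())  # an independent copy
isolated.pop(bytes)
print(bytes in get_registry())   # True: the shared instance is unaffected
```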
coola.hashing.hash_object ¶
hash_object(
    data: object,
    registry: HasherRegistry | None = None,
    length: int = 64,
) -> str
Compute a hash of a nested data structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | object | The data to hash. Can be a nested structure such as a | required |
| registry | HasherRegistry \| None | The registry used to resolve hashers for each data type. If | None |
| length | int | The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive. Defaults to | 64 |
Returns:
| Type | Description |
|---|---|
| str | A string representing the hash of the input data. |
Example
>>> from coola.hashing import hash_object
>>> hash_object({"a": 1, "b": "abc"})
'8579f51cd67c8be8fd22301d4c085e2b676c7c7d49991645a85a3c77692a1056'
coola.hashing.hash_string ¶
hash_string(value: str, length: int = 64) -> str
Generate a hexadecimal hash of a string using BLAKE2b.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | The input string to hash. | required |
| length | int | The desired length of the returned hex string. Must be an even number between 2 and 128 inclusive, since each byte of the digest encodes as two hex characters. Defaults to 64 (32-byte digest). | 64 |
Returns:
| Type | Description |
|---|---|
| str | A lowercase hexadecimal string of exactly |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Example
>>> from coola.hashing import hash_string
>>> len(hash_string("hello", length=16))
16
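The relationship between length and digest size can be reproduced with hashlib. This is a sketch under assumptions, not the library's source: `blake2b_hex` is a hypothetical name, the UTF-8 encoding is assumed, and sizing the digest through `digest_size` (rather than truncating a longer digest) is one possible way to obtain the requested length.

```python
import hashlib


def blake2b_hex(value: str, length: int = 64) -> str:
    """Return `length` hex characters of a BLAKE2b hash of `value`."""
    if length % 2 != 0 or not 2 <= length <= 128:
        msg = f"length must be an even number between 2 and 128, got {length}"
        raise ValueError(msg)
    # Each digest byte becomes two hex characters, hence length // 2 bytes.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length // 2).hexdigest()


print(len(blake2b_hex("hello", length=16)))  # 16
print(len(blake2b_hex("hello")))             # 64
```

With this approach a shorter hash is not a prefix of a longer one, because BLAKE2b includes the digest size in its parameters.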
coola.hashing.register_hashers ¶
register_hashers(
    mapping: Mapping[type, BaseHasher[Any]],
    exist_ok: bool = False,
) -> None
Register custom hashers into the default global registry.
This function extends the singleton registry returned by
get_default_registry with additional type-to-hasher mappings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mapping | Mapping[type, BaseHasher[Any]] | Mapping of Python types to hasher instances. | required |
| exist_ok | bool | If | False |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If any type is already registered and |
Example
>>> from coola.hashing import register_hashers, StrHasher
>>> class MyType:
...     def __init__(self, value):
...         self.value = value
...
>>> register_hashers({MyType: StrHasher()})