Shard

iden.shard

Contain shard implementations.

iden.shard.BaseShard

Bases: Generic[T], ABC

Define the base class to implement a shard.

iden.shard.BaseShard.equal abstractmethod

equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two shards are equal or not.

Parameters:

Name Type Description Default
other Any

The object to compare with.

required
equal_nan bool

If True, then two NaNs will be considered equal.

False

Returns:

Type Description
bool

True if the two shards are equal, otherwise False.
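
The effect of equal_nan can be illustrated without the library. The sketch below uses a hypothetical helper, values_equal, which is not part of iden. It only mirrors the documented semantics for scalar floats; how a concrete shard compares nested data is not specified here.

```python
import math
from typing import Any


def values_equal(a: Any, b: Any, equal_nan: bool = False) -> bool:
    """Hypothetical helper: compare two values, optionally treating NaNs as equal."""
    both_nan = (
        isinstance(a, float)
        and isinstance(b, float)
        and math.isnan(a)
        and math.isnan(b)
    )
    if both_nan:
        # Two NaNs are equal only if the caller asks for it.
        return equal_nan
    return a == b


nan = float("nan")
strict = values_equal(nan, nan)  # default: NaN is not equal to NaN
relaxed = values_equal(nan, nan, equal_nan=True)
```

With the default equal_nan=False the comparison follows the usual floating-point rule that NaN differs from itself, which is why the flag exists.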

iden.shard.BaseShard.get_data abstractmethod

get_data() -> T

Get the data in the shard.

Returns:

Type Description
T

The data in the shard.

iden.shard.BaseShard.get_uri abstractmethod

get_uri() -> str | None

Get the Uniform Resource Identifier (URI) of the shard.

Returns:

Type Description
str | None

The Uniform Resource Identifier (URI).
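
As a rough illustration of what implementing the three abstract methods involves, the sketch below defines a stand-in abstract class with the same method names and a toy in-memory subclass. SketchShard and ListShard are hypothetical names used only for this example; they are not iden classes, and a real subclass would derive from BaseShard instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")


class SketchShard(Generic[T], ABC):
    """Stand-in for the documented interface; not iden's BaseShard."""

    @abstractmethod
    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        """Indicate if two shards are equal or not."""

    @abstractmethod
    def get_data(self) -> T:
        """Get the data in the shard."""

    @abstractmethod
    def get_uri(self) -> Optional[str]:
        """Get the URI of the shard, or None if it has no URI."""


class ListShard(SketchShard[list]):
    """Hypothetical in-memory implementation that has no URI."""

    def __init__(self, data: list) -> None:
        self._data = data

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        # equal_nan is ignored in this sketch: the data are compared with ==.
        return isinstance(other, ListShard) and self._data == other._data

    def get_data(self) -> list:
        return self._data

    def get_uri(self) -> Optional[str]:
        return None


shard = ListShard([1, 2, 3])
```

Because the base class is abstract, instantiating it directly fails; only subclasses that define all three methods can be created.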

iden.shard.FileShard

Bases: BaseShard[T]

Implement a generic shard where the data are stored in a single file.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
path Path | str

The path to the file that contains the data.

required
loader BaseLoader[T] | dict | None

The data loader or its configuration.

None

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import FileShard
>>> from iden.io import save_json, JsonLoader
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.json")
...     save_json([1, 2, 3], file)
...     uri = Path(tmpdir).joinpath("my_uri").as_uri()
...     shard = FileShard(uri=uri, path=file, loader=JsonLoader())
...     shard.get_data()
...
[1, 2, 3]

iden.shard.FileShard.path property

path: Path

The path to the file with data.

iden.shard.FileShard.from_uri classmethod

from_uri(uri: str) -> FileShard

Instantiate a shard from its URI.

Parameters:

Name Type Description Default
uri str

The URI.

required

Returns:

Type Description
FileShard

The instantiated shard.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import FileShard, create_json_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     uri = Path(tmpdir).joinpath("my_uri").as_uri()
...     _ = create_json_shard([1, 2, 3], uri=uri)
...     shard = FileShard.from_uri(uri)
...     shard
...
FileShard(uri=file:///.../my_uri)

iden.shard.FileShard.generate_uri_config classmethod

generate_uri_config(path: Path) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
path Path

The path to the file that contains the data.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import FileShard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.json")
...     FileShard.generate_uri_config(file)
...
{'kwargs': {'path': '.../data.json'},
 'loader': {'_target_': 'iden.shard.loader.FileShardLoader'}}
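
The JSON-compatibility requirement can be checked with the standard library alone. The sketch below builds a dictionary shaped like the output above (the path value is a placeholder chosen for illustration, not a value produced by iden) and verifies that it survives a JSON round trip. It also shows that a raw Path object is not JSON serializable, which is consistent with the path appearing as a string in the output.

```python
import json
from pathlib import Path

# Hypothetical config shaped like the example output; the path is a placeholder.
config = {
    "kwargs": {"path": "/tmp/data.json"},
    "loader": {"_target_": "iden.shard.loader.FileShardLoader"},
}

# A JSON-compatible config is unchanged after a dump/load round trip.
restored = json.loads(json.dumps(config))

# A Path object cannot be serialized by the json module.
try:
    json.dumps({"path": Path("/tmp/data.json")})
except TypeError:
    path_is_serializable = False
else:
    path_is_serializable = True
```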

iden.shard.InMemoryShard

Bases: BaseShard[Any]

Implement an in-memory shard.

This shard does not have a valid URI because the data are stored in memory.

Example usage:

>>> from iden.shard import InMemoryShard
>>> shard = InMemoryShard([1, 2, 3])
>>> shard.get_data()
[1, 2, 3]

iden.shard.JsonShard

Bases: FileShard[Any]

Implement a JSON shard.

The data are stored in a JSON file.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
path Path | str

Specifies the path to the JSON file.

required

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import JsonShard
>>> from iden.io import save_json
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.json")
...     save_json([1, 2, 3], file)
...     shard = JsonShard(uri="file:///data/1234456789", path=file)
...     shard.get_data()
...
[1, 2, 3]

iden.shard.JsonShard.generate_uri_config classmethod

generate_uri_config(path: Path) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
path Path

The path to the JSON file.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import JsonShard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.json")
...     JsonShard.generate_uri_config(file)
...
{'kwargs': {'path': '.../data.json'},
 'loader': {'_target_': 'iden.shard.loader.JsonShardLoader'}}

iden.shard.PickleShard

Bases: FileShard[Any]

Implement a pickle shard.

The data are stored in a pickle file.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
path Path | str

Specifies the path to the pickle file.

required

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import PickleShard
>>> from iden.io import save_pickle
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.pkl")
...     save_pickle([1, 2, 3], file)
...     shard = PickleShard(uri="file:///data/1234456789", path=file)
...     shard.get_data()
...
[1, 2, 3]

iden.shard.PickleShard.generate_uri_config classmethod

generate_uri_config(path: Path) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
path Path

The path to the pickle file.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import PickleShard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.pkl")
...     PickleShard.generate_uri_config(file)
...
{'kwargs': {'path': '.../data.pkl'},
 'loader': {'_target_': 'iden.shard.loader.PickleShardLoader'}}

iden.shard.ShardDict

Bases: BaseShard

Implement a data structure to manage a dictionary of shards.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
shards dict[str, BaseShard]

The dictionary of shards.

required

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, ShardDict
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     sd = ShardDict(uri=Path(tmpdir).joinpath("uri").as_uri(), shards=shards)
...     sd
...
ShardDict(
  (uri): file:///.../uri
  (shards):
    (train): JsonShard(uri=file:///.../shard/uri1)
    (val): JsonShard(uri=file:///.../shard/uri2)
)

iden.shard.ShardDict.from_uri classmethod

from_uri(uri: str) -> ShardDict

Instantiate a shard from its URI.

Parameters:

Name Type Description Default
uri str

The URI.

required

Returns:

Type Description
ShardDict

The instantiated shard.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import ShardDict, create_json_shard, create_shard_dict
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     uri = Path(tmpdir).joinpath("uri").as_uri()
...     _ = create_shard_dict(shards, uri=uri)
...     shard = ShardDict.from_uri(uri)
...     shard
...
ShardDict(
  (uri): file:///.../uri
  (shards):
    (train): JsonShard(uri=file:///.../shard/uri1)
    (val): JsonShard(uri=file:///.../shard/uri2)
)

iden.shard.ShardDict.generate_uri_config classmethod

generate_uri_config(shards: dict[str, BaseShard]) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
shards dict[str, BaseShard]

The shards.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import ShardDict, create_json_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     ShardDict.generate_uri_config(shards)
...
{'shards': {'train': 'file:///.../shard/uri1', 'val': 'file:///.../shard/uri2'},
 'loader': {'_target_': 'iden.shard.loader.ShardDictLoader'}}

iden.shard.ShardDict.get_shard

get_shard(shard_id: str) -> Any

Get a shard.

Parameters:

Name Type Description Default
shard_id str

The shard ID.

required

Returns:

Type Description
Any

The shard.

Raises:

Type Description
ShardNotFoundError

if the shard does not exist.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, ShardDict
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     sd = ShardDict(uri=Path(tmpdir).joinpath("main_uri").as_uri(), shards=shards)
...     sd.get_shard("train")
...
JsonShard(uri=file:///.../uri1)

iden.shard.ShardDict.get_shard_ids

get_shard_ids() -> set[str]

Get the shard IDs.

Returns:

Type Description
set[str]

The shard IDs.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, ShardDict
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     sd = ShardDict(uri=Path(tmpdir).joinpath("main_uri").as_uri(), shards=shards)
...     sorted(sd.get_shard_ids())
...
['train', 'val']

iden.shard.ShardDict.has_shard

has_shard(shard_id: str) -> bool

Indicate if the shard exists or not.

Parameters:

Name Type Description Default
shard_id str

The shard ID.

required

Returns:

Type Description
bool

True if the shard exists, otherwise False.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, ShardDict
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     sd = ShardDict(uri=Path(tmpdir).joinpath("main_uri").as_uri(), shards=shards)
...     sd.has_shard("train")
...     sd.has_shard("test")
...
True
False
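
A common pattern is to call has_shard before get_shard so that a missing ID does not raise. The sketch below shows that pattern with a stand-in container. SketchShardDict and SketchShardNotFoundError are hypothetical names for this example only; they imitate the documented method names and are not iden's ShardDict or ShardNotFoundError.

```python
class SketchShardNotFoundError(Exception):
    """Hypothetical stand-in for the error raised on a missing shard ID."""


class SketchShardDict:
    """Stand-in keyed container; not iden's ShardDict."""

    def __init__(self, shards: dict) -> None:
        self._shards = dict(shards)

    def has_shard(self, shard_id: str) -> bool:
        return shard_id in self._shards

    def get_shard(self, shard_id: str):
        if shard_id not in self._shards:
            msg = f"shard '{shard_id}' does not exist"
            raise SketchShardNotFoundError(msg)
        return self._shards[shard_id]

    def get_shard_ids(self) -> set:
        return set(self._shards)


# Plain lists stand in for shard objects here.
sd = SketchShardDict({"train": [1, 2, 3], "val": [4, 5, 6, 7]})

# Guarded lookup: fall back to None instead of raising.
test_split = sd.get_shard("test") if sd.has_shard("test") else None
train_split = sd.get_shard("train") if sd.has_shard("train") else None
```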

iden.shard.ShardTuple

Bases: BaseShard[tuple[BaseShard, ...]]

Implement a data structure to manage a tuple of shards.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
shards Iterable[BaseShard]

The tuple of shards.

required

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard
>>> from iden.shard import ShardTuple
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = [
...         create_json_shard([1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()),
...         create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     ]
...     sl = ShardTuple(uri=Path(tmpdir).joinpath("uri").as_uri(), shards=shards)
...     sl
...
ShardTuple(
  (uri): file:///.../uri
  (shards):
    (0): JsonShard(uri=file:///.../shard/uri1)
    (1): JsonShard(uri=file:///.../shard/uri2)
)

iden.shard.ShardTuple.from_uri classmethod

from_uri(uri: str) -> ShardTuple

Instantiate a shard from its URI.

Parameters:

Name Type Description Default
uri str

The URI.

required

Returns:

Type Description
ShardTuple

The instantiated shard.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import ShardTuple, create_json_shard, create_shard_tuple
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = [
...         create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     ]
...     uri = Path(tmpdir).joinpath("uri").as_uri()
...     _ = create_shard_tuple(shards, uri=uri)
...     shard = ShardTuple.from_uri(uri)
...     shard
...
ShardTuple(
  (uri): file:///.../uri
  (shards):
    (0): JsonShard(uri=file:///.../shard/uri1)
    (1): JsonShard(uri=file:///.../shard/uri2)
)

iden.shard.ShardTuple.generate_uri_config classmethod

generate_uri_config(shards: Iterable[BaseShard]) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
shards Iterable[BaseShard]

The shards.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import ShardTuple, create_json_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = [
...         create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     ]
...     ShardTuple.generate_uri_config(shards)
...
{'shards': ['file:///.../shard/uri1', 'file:///.../shard/uri2'],
 'loader': {'_target_': 'iden.shard.loader.ShardTupleLoader'}}

iden.shard.ShardTuple.get

get(index: int) -> BaseShard

Get a shard.

Parameters:

Name Type Description Default
index int

The shard index to get.

required

Returns:

Type Description
BaseShard

The shard.

Raises:

Type Description
IndexError

if the index is outside the tuple range.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard
>>> from iden.shard import ShardTuple
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = [
...         create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     ]
...     sl = ShardTuple(uri=Path(tmpdir).joinpath("main_uri").as_uri(), shards=shards)
...     sl.get(0)
...
JsonShard(uri=file:///.../uri1)

iden.shard.ShardTuple.is_sorted_by_uri

is_sorted_by_uri() -> bool

Indicate if the shards are sorted by ascending order of URIs or not.

Returns:

Type Description
bool

True if the shards are sorted by ascending order of URIs, otherwise False.
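
The check can be pictured as a pairwise comparison of consecutive URIs. The sketch below is a hypothetical helper, uris_are_sorted, that works on plain URI strings; it assumes URIs are compared as ordinary strings, which this page does not state explicitly, and it is not iden's implementation.

```python
from typing import Iterable


def uris_are_sorted(uris: Iterable[str]) -> bool:
    """Hypothetical helper: True if the URIs are in ascending order."""
    items = list(uris)
    # Compare each URI with its successor.
    return all(left <= right for left, right in zip(items, items[1:]))


ascending = uris_are_sorted(["file:///data/uri1", "file:///data/uri2"])
descending = uris_are_sorted(["file:///data/uri2", "file:///data/uri1"])
# String ordering is lexicographic, so "uri10" sorts before "uri2".
lexicographic = uris_are_sorted(["file:///data/uri10", "file:///data/uri2"])
```

Under string comparison, numeric suffixes are not ordered numerically, so zero-padded names (for example uri02, uri10) may be preferable when the order matters.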

iden.shard.TorchSafetensorsShard

Bases: FileShard[dict[str, Tensor]]

Implement a safetensors shard for torch.Tensors.

The data are stored in a safetensors file.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
path Path | str

Specifies the path to the safetensors file.

required

Raises:

Type Description
RuntimeError

if safetensors or torch is not installed.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> import torch
>>> from iden.shard import TorchSafetensorsShard
>>> from iden.io.safetensors import TorchSaver
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.safetensors")
...     TorchSaver().save({"key1": torch.ones(2, 3), "key2": torch.arange(5)}, file)
...     shard = TorchSafetensorsShard(uri="file:///data/1234456789", path=file)
...     shard.get_data()
...
{'key1': tensor([[1., 1., 1.], [1., 1., 1.]]), 'key2': tensor([0, 1, 2, 3, 4])}

iden.shard.TorchSafetensorsShard.generate_uri_config classmethod

generate_uri_config(path: Path) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
path Path

The path to the safetensors file.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import TorchSafetensorsShard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.safetensors")
...     TorchSafetensorsShard.generate_uri_config(file)
...
{'kwargs': {'path': '.../data.safetensors'},
 'loader': {'_target_': 'iden.shard.loader.TorchSafetensorsShardLoader'}}

iden.shard.TorchShard

Bases: FileShard[Any]

Implement a PyTorch shard for torch.Tensors.

The data are stored in a PyTorch file.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
path Path | str

Specifies the path to the PyTorch file.

required

Raises:

Type Description
RuntimeError

if torch is not installed.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> import torch
>>> from iden.shard import TorchShard
>>> from iden.io import TorchSaver
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.pt")
...     TorchSaver().save({"key1": torch.ones(2, 3), "key2": torch.arange(5)}, file)
...     shard = TorchShard(uri="file:///data/1234456789", path=file)
...     shard.get_data()
...
{'key1': tensor([[1., 1., 1.], [1., 1., 1.]]), 'key2': tensor([0, 1, 2, 3, 4])}

iden.shard.TorchShard.generate_uri_config classmethod

generate_uri_config(path: Path) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
path Path

The path to the PyTorch file.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import TorchShard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.pt")
...     TorchShard.generate_uri_config(file)
...
{'kwargs': {'path': '.../data.pt'},
 'loader': {'_target_': 'iden.shard.loader.TorchShardLoader'}}

iden.shard.YamlShard

Bases: FileShard[Any]

Implement a YAML shard.

The data are stored in a YAML file.

Parameters:

Name Type Description Default
uri str

The shard's URI.

required
path Path | str

Specifies the path to the YAML file.

required

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import YamlShard
>>> from iden.io import save_yaml
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.yaml")
...     save_yaml([1, 2, 3], file)
...     shard = YamlShard(uri="file:///data/1234456789", path=file)
...     shard.get_data()
...
[1, 2, 3]

iden.shard.YamlShard.generate_uri_config classmethod

generate_uri_config(path: Path) -> dict

Generate the minimal config that is used to load the shard from its URI.

The config must be compatible with the JSON format.

Parameters:

Name Type Description Default
path Path

The path to the YAML file.

required

Returns:

Type Description
dict

The minimal config to load the shard from its URI.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import YamlShard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     file = Path(tmpdir).joinpath("data.yaml")
...     YamlShard.generate_uri_config(file)
...
{'kwargs': {'path': '.../data.yaml'},
 'loader': {'_target_': 'iden.shard.loader.YamlShardLoader'}}

iden.shard.create_json_shard

create_json_shard(
    data: Any, uri: str, path: Path | None = None
) -> JsonShard

Create a JsonShard from data.

Note

It is a utility function to create a JsonShard from its data and URI. It is possible to create a JsonShard in other ways.

Parameters:

Name Type Description Default
data Any

The data to save in the JSON file.

required
uri str

The shard's URI.

required
path Path | None

The path to the JSON file. If None, a path is automatically generated from the URI.

None

Returns:

Type Description
JsonShard

The JsonShard object.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shard = create_json_shard([1, 2, 3], uri=Path(tmpdir).joinpath("my_uri").as_uri())
...     shard.get_data()
...
[1, 2, 3]

iden.shard.create_pickle_shard

create_pickle_shard(
    data: Any, uri: str, path: Path | None = None
) -> PickleShard

Create a PickleShard from data.

Note

It is a utility function to create a PickleShard from its data and URI. It is possible to create a PickleShard in other ways.

Parameters:

Name Type Description Default
data Any

The data to save in the pickle file.

required
uri str

The shard's URI.

required
path Path | None

The path to the pickle file. If None, a path is automatically generated from the URI.

None

Returns:

Type Description
PickleShard

The PickleShard object.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_pickle_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shard = create_pickle_shard([1, 2, 3], uri=Path(tmpdir).joinpath("my_uri").as_uri())
...     shard.get_data()
...
[1, 2, 3]

iden.shard.create_shard_dict

create_shard_dict(
    shards: dict[str, BaseShard], uri: str
) -> ShardDict

Create a ShardDict from a dictionary of shards.

Note

It is a utility function to create a ShardDict from its shards and URI. It is possible to create a ShardDict in other ways.

Parameters:

Name Type Description Default
shards dict[str, BaseShard]

The shards.

required
uri str

The shard's URI.

required

Returns:

Type Description
ShardDict

The ShardDict object.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import ShardDict, create_json_shard, create_shard_dict
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     shard = create_shard_dict(shards, uri=Path(tmpdir).joinpath("uri").as_uri())
...     shard
...
ShardDict(
  (uri): file:///.../uri
  (shards):
    (train): JsonShard(uri=file:///.../shard/uri1)
    (val): JsonShard(uri=file:///.../shard/uri2)
)

iden.shard.create_shard_tuple

create_shard_tuple(
    shards: Iterable[BaseShard], uri: str
) -> ShardTuple

Create a ShardTuple from an iterable of shards.

Note

It is a utility function to create a ShardTuple from its shards and URI. It is possible to create a ShardTuple in other ways.

Parameters:

Name Type Description Default
shards Iterable[BaseShard]

The shards.

required
uri str

The shard's URI.

required

Returns:

Type Description
ShardTuple

The ShardTuple object.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import ShardTuple, create_json_shard, create_shard_tuple
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = [
...         create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     ]
...     shard = create_shard_tuple(shards, uri=Path(tmpdir).joinpath("uri").as_uri())
...     shard
...
ShardTuple(
  (uri): file:///.../uri
  (shards):
    (0): JsonShard(uri=file:///.../shard/uri1)
    (1): JsonShard(uri=file:///.../shard/uri2)
)

iden.shard.create_torch_safetensors_shard

create_torch_safetensors_shard(
    data: dict[str, Tensor],
    uri: str,
    path: Path | None = None,
) -> TorchSafetensorsShard

Create a TorchSafetensorsShard from data.

Note

It is a utility function to create a TorchSafetensorsShard from its data and URI. It is possible to create a TorchSafetensorsShard in other ways.

Parameters:

Name Type Description Default
data dict[str, Tensor]

The data to save in the safetensors file.

required
uri str

The shard's URI.

required
path Path | None

The path to the safetensors file. If None, a path is automatically generated from the URI.

None

Returns:

Type Description
TorchSafetensorsShard

The TorchSafetensorsShard object.

Raises:

Type Description
RuntimeError

if safetensors or torch is not installed.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> import torch
>>> from iden.shard import create_torch_safetensors_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shard = create_torch_safetensors_shard(
...         data={"key1": torch.ones(2, 3), "key2": torch.arange(5)},
...         uri=Path(tmpdir).joinpath("my_uri").as_uri()
...     )
...     shard.get_data()
...
{'key1': tensor([[1., 1., 1.], [1., 1., 1.]]), 'key2': tensor([0, 1, 2, 3, 4])}

iden.shard.create_torch_shard

create_torch_shard(
    data: Any, uri: str, path: Path | None = None
) -> TorchShard

Create a TorchShard from data.

Note

It is a utility function to create a TorchShard from its data and URI. It is possible to create a TorchShard in other ways.

Parameters:

Name Type Description Default
data Any

The data to save in the PyTorch file.

required
uri str

The shard's URI.

required
path Path | None

The path to the PyTorch file. If None, a path is automatically generated from the URI.

None

Returns:

Type Description
TorchShard

The TorchShard object.

Raises:

Type Description
RuntimeError

if torch is not installed.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> import torch
>>> from iden.shard import create_torch_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shard = create_torch_shard(
...         data={"key1": torch.ones(2, 3), "key2": torch.arange(5)},
...         uri=Path(tmpdir).joinpath("my_uri").as_uri()
...     )
...     shard.get_data()
...
{'key1': tensor([[1., 1., 1.], [1., 1., 1.]]), 'key2': tensor([0, 1, 2, 3, 4])}

iden.shard.create_yaml_shard

create_yaml_shard(
    data: Any, uri: str, path: Path | None = None
) -> YamlShard

Create a YamlShard from data.

Note

It is a utility function to create a YamlShard from its data and URI. It is possible to create a YamlShard in other ways.

Parameters:

Name Type Description Default
data Any

The data to save in the YAML file.

required
uri str

The shard's URI.

required
path Path | None

The path to the YAML file. If None, a path is automatically generated from the URI.

None

Returns:

Type Description
YamlShard

The YamlShard object.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_yaml_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shard = create_yaml_shard([1, 2, 3], uri=Path(tmpdir).joinpath("my_uri").as_uri())
...     shard.get_data()
...
[1, 2, 3]

iden.shard.get_dict_uris

get_dict_uris(
    shards: dict[str, BaseShard]
) -> dict[str, str]

Get the dictionary of shard URIs.

Parameters:

Name Type Description Default
shards dict[str, BaseShard]

The dictionary of shards.

required

Returns:

Type Description
dict[str, str]

The dictionary of shard URIs.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, get_dict_uris
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = {
...         "train": create_json_shard(
...             [1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()
...         ),
...         "val": create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     }
...     get_dict_uris(shards)
...
{'train': 'file:///.../shard/uri1', 'val': 'file:///.../shard/uri2'}

iden.shard.get_list_uris

get_list_uris(shards: Iterable[BaseShard]) -> list[str]

Get the list of shard URIs.

Parameters:

Name Type Description Default
shards Iterable[BaseShard]

The shards.

required

Returns:

Type Description
list[str]

The list of shard URIs.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import get_list_uris, create_json_shard
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = [
...         create_json_shard([1, 2, 3], uri=Path(tmpdir).joinpath("shard/uri1").as_uri()),
...         create_json_shard(
...             [4, 5, 6, 7], uri=Path(tmpdir).joinpath("shard/uri2").as_uri()
...         ),
...     ]
...     get_list_uris(shards)
...
['file:///.../shard/uri1', 'file:///.../shard/uri2']

iden.shard.load_from_uri

load_from_uri(uri: str) -> BaseShard

Load a shard from its Uniform Resource Identifier (URI).

Parameters:

Name Type Description Default
uri str

The URI of the shard.

required

Returns:

Type Description
BaseShard

The shard associated with the URI.

Raises:

Type Description
FileNotFoundError

if the URI file does not exist.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, load_from_uri
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     uri = Path(tmpdir).joinpath("my_uri").as_uri()
...     _ = create_json_shard([1, 2, 3], uri=uri)
...     shard = load_from_uri(uri)
...     shard
...
JsonShard(uri=file:///.../my_uri)
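
Because a missing URI file leads to FileNotFoundError, a caller may want to check for the file before loading. The sketch below assumes the URI uses the file:// scheme, as in the examples on this page, and maps it back to a local path with the standard library. uri_to_path is a hypothetical helper for illustration and is not part of iden.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def uri_to_path(uri: str) -> Path:
    """Hypothetical helper: convert a file:// URI to a local path."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        msg = f"expected a file URI, got: {uri}"
        raise ValueError(msg)
    # url2pathname also decodes percent-escaped characters.
    return Path(url2pathname(parsed.path))


with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir).joinpath("my_uri")
    uri = target.as_uri()
    exists_before = uri_to_path(uri).is_file()  # nothing has been written yet
    target.write_text("{}")
    exists_after = uri_to_path(uri).is_file()
    round_trip_ok = uri_to_path(uri) == target
```

A check like this is only a convenience; catching FileNotFoundError around the load call remains the more robust option because the file can disappear between the check and the load.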

iden.shard.sort_by_uri

sort_by_uri(
    shards: Iterable[BaseShard], /, *, reverse: bool = False
) -> list[BaseShard]

Sort a sequence of shards by their URIs.

Parameters:

Name Type Description Default
shards Iterable[BaseShard]

The shards to sort.

required
reverse bool

If set to True, then the list elements are sorted as if each comparison were reversed.

False

Returns:

Type Description
list[BaseShard]

The sorted shards.

Example usage:

>>> import tempfile
>>> from pathlib import Path
>>> from iden.shard import create_json_shard, sort_by_uri
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     shards = sort_by_uri(
...         [
...             create_json_shard([1, 2, 3], uri=Path(tmpdir).joinpath("uri2").as_uri()),
...             create_json_shard([4, 5, 6, 7], uri=Path(tmpdir).joinpath("uri3").as_uri()),
...             create_json_shard([4, 5, 6, 7], uri=Path(tmpdir).joinpath("uri1").as_uri()),
...         ]
...     )
...     shards
...
[JsonShard(uri=file:///.../uri1), JsonShard(uri=file:///.../uri2), JsonShard(uri=file:///.../uri3)]
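
The ordering shown above can be reproduced with the built-in sorted function and a key based on get_uri. The sketch below uses stand-in objects that expose only get_uri; UriOnly and sort_by_uri_sketch are hypothetical names, and the code is an illustration of the idea rather than iden's implementation.

```python
from typing import Iterable


class UriOnly:
    """Stand-in object exposing only get_uri(); not an iden shard."""

    def __init__(self, uri: str) -> None:
        self._uri = uri

    def get_uri(self) -> str:
        return self._uri


def sort_by_uri_sketch(
    shards: Iterable[UriOnly], /, *, reverse: bool = False
) -> list:
    """Hypothetical re-implementation: order shards by their URI string."""
    return sorted(shards, key=lambda shard: shard.get_uri(), reverse=reverse)


items = [
    UriOnly("file:///data/uri2"),
    UriOnly("file:///data/uri3"),
    UriOnly("file:///data/uri1"),
]
ascending = [s.get_uri() for s in sort_by_uri_sketch(items)]
descending = [s.get_uri() for s in sort_by_uri_sketch(items, reverse=True)]
```

The sketch mirrors the documented signature: the shards argument is positional-only and reverse is keyword-only.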