
Ingestor

analora.ingestor

Contain data ingestors.

analora.ingestor.BaseIngestor

Bases: ABC, Generic[T]

Define the base class to implement a data ingestor.

Example usage:

>>> from analora.ingestor import Ingestor
>>> ingestor = Ingestor([1, 2, 3, 4])
>>> ingestor
Ingestor()
>>> data = ingestor.ingest()
>>> data
[1, 2, 3, 4]

analora.ingestor.BaseIngestor.equal abstractmethod

equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two ingestor objects are equal or not.

Parameters:

Name Type Description Default
other Any

The other object to compare.

required
equal_nan bool

Whether to compare NaNs as equal. If True, NaNs in both objects are considered equal.

False

Returns:

Type Description
bool

True if the two ingestors are equal, otherwise False.

Example usage:

>>> from analora.ingestor import Ingestor
>>> obj1 = Ingestor([1, 2, 3, 4])
>>> obj2 = Ingestor([1, 2, 3, 4])
>>> obj3 = Ingestor(["a", "b", "c"])
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False
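
The effect of equal_nan can be illustrated without the library. The sketch below is a standalone illustration and not analora's implementation: the helper values_equal is hypothetical and only handles flat lists of floats.

```python
import math
from typing import List


def values_equal(left: List[float], right: List[float], equal_nan: bool = False) -> bool:
    """Compare two flat lists of floats, optionally treating NaNs as equal.

    Hypothetical helper for illustration only; it is not part of analora.
    """
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if equal_nan and math.isnan(a) and math.isnan(b):
            continue  # both values are NaN and NaNs are considered equal
        if a != b:  # NaN != NaN under IEEE 754, so NaNs fail this check
            return False
    return True


# Same data on both sides, including one NaN.
strict = values_equal([1.0, float("nan")], [1.0, float("nan")])
relaxed = values_equal([1.0, float("nan")], [1.0, float("nan")], equal_nan=True)
```

With the default equal_nan=False the comparison fails because NaN never compares equal to itself. Passing equal_nan=True makes the two lists compare equal.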

analora.ingestor.BaseIngestor.ingest abstractmethod

ingest() -> T

Ingest data.

Returns:

Type Description
T

The ingested data.

Example usage:

>>> from analora.ingestor import Ingestor
>>> ingestor = Ingestor([1, 2, 3, 4])
>>> data = ingestor.ingest()
>>> data
[1, 2, 3, 4]
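
A custom ingestor has to provide both abstract methods, equal and ingest. The sketch below shows the shape of such a subclass. It assumes nothing about analora beyond the two signatures documented here: BaseIngestorSketch is a hypothetical stand-in for BaseIngestor so that the example runs on its own, and RangeIngestor is an invented example class.

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, List, TypeVar

T = TypeVar("T")


class BaseIngestorSketch(ABC, Generic[T]):
    """Stand-in for the documented interface (hypothetical, not analora's class)."""

    @abstractmethod
    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        """Indicate if two ingestors are equal."""

    @abstractmethod
    def ingest(self) -> T:
        """Ingest data."""


class RangeIngestor(BaseIngestorSketch[List[int]]):
    """Hypothetical ingestor that produces the integers 0..size-1."""

    def __init__(self, size: int) -> None:
        self._size = size

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        # equal_nan is unused here because the data never contains NaN.
        return isinstance(other, RangeIngestor) and self._size == other._size

    def ingest(self) -> List[int]:
        return list(range(self._size))


ingestor = RangeIngestor(size=4)
data = ingestor.ingest()
```

With the real library, the subclass would presumably inherit from analora.ingestor.BaseIngestor instead of the stand-in.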

analora.ingestor.Ingestor

Bases: BaseIngestor[T]

Implement a simple data ingestor.

Parameters:

Name Type Description Default
data T

The data to ingest.

required

Example usage:

>>> from analora.ingestor import Ingestor
>>> ingestor = Ingestor(data=[1, 2, 3, 4, 5])
>>> ingestor
Ingestor()
>>> data = ingestor.ingest()

analora.ingestor.MappingIngestor

Bases: BaseIngestor[dict[str, T]]

Implement an ingestor that ingests data from a mapping of ingestors.

Parameters:

Name Type Description Default
ingestors Mapping[str, BaseIngestor[T] | dict]

The mapping of ingestors or their configuration.

required

Example usage:

>>> from analora.ingestor import Ingestor, MappingIngestor
>>> ingestor = MappingIngestor(
...     {"key1": Ingestor(data=[1, 2, 3, 4, 5]), "key2": Ingestor(data="meow")}
... )
>>> ingestor
MappingIngestor(
  (key1): Ingestor()
  (key2): Ingestor()
)
>>> data = ingestor.ingest()
>>> data
{'key1': [1, 2, 3, 4, 5], 'key2': 'meow'}
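
The output above suggests that each ingestor in the mapping is called and its result is stored under the same key. A minimal standalone sketch of that idea follows. ConstantIngestor and ingest_mapping are hypothetical names used only for illustration, and the sketch does not cover ingestors given as configuration dictionaries.

```python
from typing import Any, Dict


class ConstantIngestor:
    """Hypothetical stand-in for Ingestor: returns the data it was given."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def ingest(self) -> Any:
        return self._data


def ingest_mapping(ingestors: Dict[str, ConstantIngestor]) -> Dict[str, Any]:
    """Call ingest() on each value and keep the original keys."""
    return {key: ingestor.ingest() for key, ingestor in ingestors.items()}


result = ingest_mapping(
    {"key1": ConstantIngestor([1, 2, 3, 4, 5]), "key2": ConstantIngestor("meow")}
)
```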

analora.ingestor.PickleIngestor

Bases: BaseIngestor[Any]

Implement a pickle file ingestor.

Parameters:

Name Type Description Default
path Path | str

The path to the pickle file containing the data to ingest.

required

Example usage:

>>> from analora.ingestor import PickleIngestor
>>> ingestor = PickleIngestor(path="/path/to/data.pickle")
>>> ingestor
PickleIngestor(path=/path/to/data.pickle)
>>> data = ingestor.ingest()  # doctest: +SKIP
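
Because the ingest call above is skipped, the following standalone sketch shows a full round trip using only the standard library. It assumes that ingesting a pickle file amounts to reading one object with pickle.load, which may differ from what PickleIngestor does internally. load_pickle is a hypothetical helper.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def load_pickle(path: Union[Path, str]) -> Any:
    """Hypothetical helper: read one pickled object from ``path``."""
    with Path(path).open("rb") as file:
        return pickle.load(file)


# Write a small object to a temporary pickle file, then read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "data.pickle"
    with path.open("wb") as file:
        pickle.dump({"values": [1, 2, 3]}, file)
    loaded = load_pickle(path)
```

Unpickling can execute arbitrary code, so only pickle files from trusted sources should be loaded.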

analora.ingestor.TorchIngestor

Bases: BaseIngestor[Any]

Implement a torch file ingestor.

Parameters:

Name Type Description Default
path Path | str

The path to the torch file containing the data to ingest.

required
**kwargs Any

Additional arguments passed to torch.load.

{}

Example usage:

>>> from analora.ingestor import TorchIngestor
>>> ingestor = TorchIngestor(path="/path/to/data.pt")
>>> ingestor
TorchIngestor(path=/path/to/data.pt)
>>> data = ingestor.ingest()  # doctest: +SKIP

analora.ingestor.is_ingestor_config

is_ingestor_config(config: dict) -> bool

Indicate if the input configuration is a configuration for a BaseIngestor.

This function only checks whether the value of the _target_ key is valid; it does not check the other values. If _target_ refers to a function, the function's return type hint is used to check the class.

Parameters:

Name Type Description Default
config dict

The configuration to check.

required

Returns:

Type Description
bool

True if the input configuration is a configuration for a BaseIngestor object.

Example usage:

>>> from analora.ingestor import is_ingestor_config
>>> is_ingestor_config({"_target_": "analora.ingestor.Ingestor", "data": [1, 2, 3, 4]})
True
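
The kind of check described above can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not analora's implementation: resolve_target and is_config_for are hypothetical helpers, dict stands in for BaseIngestor as the expected base class, and targets that are functions (checked through their return type hint) are not handled.

```python
import importlib
from typing import Any


def resolve_target(path: str) -> Any:
    """Import the object named by a dotted path such as 'collections.OrderedDict'."""
    module_name, _, attribute = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attribute)


def is_config_for(config: dict, base: type) -> bool:
    """Return True if config['_target_'] names a subclass of ``base``.

    Hypothetical helper; only the ``_target_`` key is inspected.
    """
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = resolve_target(target)
    except (ImportError, AttributeError, ValueError):
        return False  # the dotted path could not be resolved
    return isinstance(obj, type) and issubclass(obj, base)


# The extra key "other" is ignored by the check.
ordered = is_config_for({"_target_": "collections.OrderedDict", "other": 1}, dict)
not_a_dict = is_config_for({"_target_": "pathlib.Path"}, dict)
```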

analora.ingestor.setup_ingestor

setup_ingestor(
    ingestor: BaseIngestor | dict,
) -> BaseIngestor

Set up an ingestor.

If the input is a configuration, the ingestor is instantiated from it by using the BaseIngestor factory function.

Parameters:

Name Type Description Default
ingestor BaseIngestor | dict

An ingestor or its configuration.

required

Returns:

Type Description
BaseIngestor

An instantiated ingestor.

Example usage:

>>> from analora.ingestor import setup_ingestor
>>> ingestor = setup_ingestor(
...     {"_target_": "analora.ingestor.Ingestor", "data": [1, 2, 3, 4]}
... )
>>> ingestor
Ingestor()

analora.ingestor.polars

Contain polars DataFrame ingestors.

analora.ingestor.polars.CsvIngestor

Bases: BaseIngestor[DataFrame]

Implement a CSV ingestor.

Parameters:

Name Type Description Default
source FileSource

The source of the CSV data to ingest.

required
**kwargs Any

Additional keyword arguments for polars.scan_csv.

{}

Example usage:

>>> from analora.ingestor.polars import CsvIngestor
>>> ingestor = CsvIngestor(source="/path/to/frame.csv")
>>> ingestor
CsvIngestor(source=/path/to/frame.csv)
>>> frame = ingestor.ingest()  # doctest: +SKIP

analora.ingestor.polars.ParquetIngestor

Bases: BaseIngestor[DataFrame]

Implement a parquet ingestor.

Parameters:

Name Type Description Default
source FileSource

The source of the parquet data to ingest.

required
**kwargs Any

Additional keyword arguments for polars.read_parquet.

{}

Example usage:

>>> from analora.ingestor.polars import ParquetIngestor
>>> ingestor = ParquetIngestor(source="/path/to/frame.parquet")
>>> ingestor
ParquetIngestor(source=/path/to/frame.parquet)
>>> frame = ingestor.ingest()  # doctest: +SKIP