Ingestor
analora.ingestor
Contain data ingestors.
analora.ingestor.BaseIngestor
Bases: ABC, Generic[T]
Define the base class to implement a data ingestor.
Example usage:
>>> from analora.ingestor import Ingestor
>>> ingestor = Ingestor([1, 2, 3, 4])
>>> ingestor
Ingestor()
>>> data = ingestor.ingest()
>>> data
[1, 2, 3, 4]
analora.ingestor.BaseIngestor.equal (abstractmethod)
equal(other: Any, equal_nan: bool = False) -> bool
Indicate if two ingestor objects are equal or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Any | The other object to compare. | required |
equal_nan | bool | Whether to compare NaNs as equal. | False |
Returns:
Type | Description |
---|---|
bool | True if the two ingestors are equal, otherwise False. |
Example usage:
>>> from analora.ingestor import Ingestor
>>> obj1 = Ingestor([1, 2, 3, 4])
>>> obj2 = Ingestor([1, 2, 3, 4])
>>> obj3 = Ingestor(["a", "b", "c"])
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False
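The effect of equal_nan can be illustrated outside the library. The sketch below is not analora code: values_equal and sequences_equal are hypothetical helpers that only show why a NaN-aware comparison needs an explicit flag, given that NaN never compares equal to itself under ==.

```python
import math
from typing import Any


def values_equal(a: Any, b: Any, equal_nan: bool = False) -> bool:
    """Compare two values, optionally treating NaN as equal to NaN.

    Hypothetical helper for illustration; not part of analora.
    """
    if (
        equal_nan
        and isinstance(a, float)
        and isinstance(b, float)
        and math.isnan(a)
        and math.isnan(b)
    ):
        return True
    # Plain equality: float("nan") == float("nan") is False.
    return a == b


def sequences_equal(xs: list, ys: list, equal_nan: bool = False) -> bool:
    """Element-wise comparison of two lists using ``values_equal``."""
    return len(xs) == len(ys) and all(
        values_equal(x, y, equal_nan=equal_nan) for x, y in zip(xs, ys)
    )


left = [1.0, float("nan")]
right = [1.0, float("nan")]
print(sequences_equal(left, right))                  # False
print(sequences_equal(left, right, equal_nan=True))  # True
```

An ingestor whose data may contain NaN values could apply the same idea inside its own equal implementation.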
analora.ingestor.BaseIngestor.ingest (abstractmethod)
ingest() -> T
Ingest data.
Returns:
Type | Description |
---|---|
T | The ingested data. |
Example usage:
>>> from analora.ingestor import Ingestor
>>> ingestor = Ingestor([1, 2, 3, 4])
>>> data = ingestor.ingest()
>>> data
[1, 2, 3, 4]
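Since equal and ingest are both abstract, a custom ingestor has to implement the two of them. The sketch below shows what such a subclass might look like. To stay runnable without analora installed, it declares a local stand-in BaseIngestor that mirrors the documented signatures; in real code the base class would be imported from analora.ingestor instead. RangeIngestor is a hypothetical name used only for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class BaseIngestor(ABC, Generic[T]):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @abstractmethod
    def equal(self, other: Any, equal_nan: bool = False) -> bool: ...

    @abstractmethod
    def ingest(self) -> T: ...


class RangeIngestor(BaseIngestor[list[int]]):
    """Hypothetical ingestor that produces the integers 0..n-1."""

    def __init__(self, n: int) -> None:
        self._n = n

    def __repr__(self) -> str:
        return f"{type(self).__name__}(n={self._n})"

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        # equal_nan is unused here because integers are never NaN.
        return isinstance(other, RangeIngestor) and self._n == other._n

    def ingest(self) -> list[int]:
        return list(range(self._n))


ingestor = RangeIngestor(4)
print(ingestor)           # RangeIngestor(n=4)
print(ingestor.ingest())  # [0, 1, 2, 3]
```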
analora.ingestor.Ingestor
Bases: BaseIngestor[T]
Implement a simple data ingestor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | T | The data to ingest. | required |
Example usage:
>>> from analora.ingestor import Ingestor
>>> ingestor = Ingestor(data=[1, 2, 3, 4, 5])
>>> ingestor
Ingestor()
>>> data = ingestor.ingest()
analora.ingestor.MappingIngestor
Bases: BaseIngestor[dict[str, T]]
Implement an ingestor that ingests data from a mapping of ingestors and returns a dictionary with the same keys.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ingestors | Mapping[str, BaseIngestor[T] \| dict] | The mapping of ingestors or their configuration. | required |
Example usage:
>>> from analora.ingestor import Ingestor, MappingIngestor
>>> ingestor = MappingIngestor(
... {"key1": Ingestor(data=[1, 2, 3, 4, 5]), "key2": Ingestor(data="meow")}
... )
>>> ingestor
MappingIngestor(
(key1): Ingestor()
(key2): Ingestor()
)
>>> data = ingestor.ingest()
>>> data
{'key1': [1, 2, 3, 4, 5], 'key2': 'meow'}
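The behavior shown in this example (one ingest call per key, results collected under the same keys) can be sketched in a few lines. This is an illustration of the idea, not the library's implementation: SupportsIngest, InMemorySource, and ingest_all are hypothetical names, and the sketch does not handle configuration dictionaries.

```python
from typing import Any, Mapping, Protocol


class SupportsIngest(Protocol):
    """Anything exposing an ``ingest()`` method (hypothetical protocol)."""

    def ingest(self) -> Any: ...


class InMemorySource:
    """Hypothetical stand-in for an ingestor that returns in-memory data."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def ingest(self) -> Any:
        return self._data


def ingest_all(ingestors: Mapping[str, SupportsIngest]) -> dict[str, Any]:
    """Call ``ingest()`` on every value and keep the original keys."""
    return {key: ingestor.ingest() for key, ingestor in ingestors.items()}


data = ingest_all(
    {"key1": InMemorySource([1, 2, 3, 4, 5]), "key2": InMemorySource("meow")}
)
print(data)  # {'key1': [1, 2, 3, 4, 5], 'key2': 'meow'}
```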
analora.ingestor.PickleIngestor
Bases: BaseIngestor[Any]
Implement a pickle file ingestor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path \| str | The path to the pickle file containing the data to ingest. | required |
Example usage:
>>> from analora.ingestor import PickleIngestor
>>> ingestor = PickleIngestor(path="/path/to/data.pickle")
>>> ingestor
PickleIngestor(path=/path/to/data.pickle)
>>> data = ingestor.ingest() # doctest: +SKIP
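Because the ingest call above is skipped in the doctest, the round trip below may help show what ingesting a pickle file amounts to. It uses only the standard library and assumes the file holds a single object written with pickle.dump; load_pickle is a hypothetical helper, not the library's implementation.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path
from typing import Any


def load_pickle(path: Path | str) -> Any:
    """Load one pickled object from ``path`` (hypothetical helper)."""
    with Path(path).open("rb") as file:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        return pickle.load(file)


with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "data.pickle"
    # Write a small payload so there is something to ingest.
    with path.open("wb") as file:
        pickle.dump({"values": [1, 2, 3, 4]}, file)
    data = load_pickle(path)

print(data)  # {'values': [1, 2, 3, 4]}
```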
analora.ingestor.TorchIngestor
Bases: BaseIngestor[Any]
Implement a torch file ingestor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path \| str | The path to the torch file containing the data to ingest. | required |
**kwargs | Any | Additional keyword arguments passed to the underlying torch loading function. | {} |
Example usage:
>>> from analora.ingestor import TorchIngestor
>>> ingestor = TorchIngestor(path="/path/to/data.pt")
>>> ingestor
TorchIngestor(path=/path/to/data.pt)
>>> data = ingestor.ingest() # doctest: +SKIP
analora.ingestor.is_ingestor_config
is_ingestor_config(config: dict) -> bool
Indicate if the input configuration is a configuration for a BaseIngestor.
This function only checks if the value of the key _target_ is valid. It does not check the other values. If _target_ indicates a function, its return type hint is used to check the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | dict | The configuration to check. | required |
Returns:
Type | Description |
---|---|
bool | True if the input configuration is a configuration for a BaseIngestor, otherwise False. |
Example usage:
>>> from analora.ingestor import is_ingestor_config
>>> is_ingestor_config({"_target_": "analora.ingestor.Ingestor", "data": [1, 2, 3, 4]})
True
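The check described above (resolve _target_, then test the class, or the return type hint for a function) can be sketched with the standard library alone. This is an assumption about the general approach, not the library's code: resolve_target and is_config_for are hypothetical helpers, and standard-library classes stand in for ingestors so the block runs without analora.

```python
from __future__ import annotations

import importlib
import inspect
from typing import Any, get_type_hints


def resolve_target(target: str) -> Any:
    """Import the object named by a dotted path such as "collections.Counter"."""
    module_name, _, attribute = target.rpartition(".")
    return getattr(importlib.import_module(module_name), attribute)


def is_config_for(config: dict, cls: type) -> bool:
    """Check only the ``_target_`` entry of ``config`` against ``cls``."""
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = resolve_target(target)
    except (ImportError, AttributeError, ValueError):
        return False
    if inspect.isclass(obj):
        return issubclass(obj, cls)
    if callable(obj):
        # For a function, fall back to its declared return type hint.
        try:
            returned = get_type_hints(obj).get("return")
        except Exception:
            return False
        return inspect.isclass(returned) and issubclass(returned, cls)
    return False


print(is_config_for({"_target_": "collections.OrderedDict"}, dict))  # True
print(is_config_for({"_target_": "collections.Counter"}, list))      # False
```

As in the documented function, the other keys of the configuration are never inspected.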
analora.ingestor.setup_ingestor
setup_ingestor(
ingestor: BaseIngestor | dict,
) -> BaseIngestor
Set up an ingestor.
The ingestor is instantiated from its configuration by using the BaseIngestor factory function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ingestor | BaseIngestor \| dict | An ingestor or its configuration. | required |
Returns:
Type | Description |
---|---|
BaseIngestor | An instantiated ingestor. |
Example usage:
>>> from analora.ingestor import setup_ingestor
>>> ingestor = setup_ingestor(
... {"_target_": "analora.ingestor.Ingestor", "data": [1, 2, 3, 4]}
... )
>>> ingestor
Ingestor()
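The "object or configuration" pattern used here can be sketched generically: an already-built object passes through unchanged, while a dictionary is instantiated from its _target_ entry with the remaining keys as keyword arguments. The helper below is a hypothetical illustration that assumes this behavior; it is not the BaseIngestor factory itself, and it uses fractions.Fraction as a stand-in target so it runs without analora.

```python
from __future__ import annotations

import importlib
from typing import Any


def setup_object(obj_or_config: Any) -> Any:
    """Return the input unchanged unless it is a config dict with ``_target_``."""
    if not isinstance(obj_or_config, dict):
        return obj_or_config
    kwargs = dict(obj_or_config)  # copy so the caller's config is not mutated
    module_name, _, attribute = kwargs.pop("_target_").rpartition(".")
    factory = getattr(importlib.import_module(module_name), attribute)
    return factory(**kwargs)


config = {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}
half = setup_object(config)
print(half)                        # 1/2
print(setup_object(half) is half)  # True
```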
analora.ingestor.polars
Contain polars DataFrame ingestors.
analora.ingestor.polars.CsvIngestor
Bases: BaseIngestor[DataFrame]
Implement a CSV ingestor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | FileSource | The source of the CSV data to ingest. | required |
**kwargs | Any | Additional keyword arguments for the underlying polars CSV reader. | {} |
Example usage:
>>> from analora.ingestor.polars import CsvIngestor
>>> ingestor = CsvIngestor(source="/path/to/frame.csv")
>>> ingestor
CsvIngestor(source=/path/to/frame.csv)
>>> frame = ingestor.ingest() # doctest: +SKIP
analora.ingestor.polars.ParquetIngestor
Bases: BaseIngestor[DataFrame]
Implement a parquet ingestor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | FileSource | The source of the parquet data to ingest. | required |
**kwargs | Any | Additional keyword arguments for the underlying polars parquet reader. | {} |
Example usage:
>>> from analora.ingestor.polars import ParquetIngestor
>>> ingestor = ParquetIngestor(source="/path/to/frame.parquet")
>>> ingestor
ParquetIngestor(source=/path/to/frame.parquet)
>>> frame = ingestor.ingest() # doctest: +SKIP