
exporter

grizz.exporter

Contain DataFrame exporters.

grizz.exporter.BaseExporter

Bases: ABC

Define the base class to implement a DataFrame exporter.

Example usage:

>>> import polars as pl
>>> from grizz.exporter import ParquetExporter
>>> exporter = ParquetExporter("/path/to/frame.parquet")
>>> exporter
ParquetExporter(path=/path/to/frame.parquet)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> exporter.export(frame)  # doctest: +SKIP
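A concrete exporter subclasses BaseExporter and implements its two abstract methods, equal and export. The sketch below mirrors that contract with the standard library only. SketchExporter is a simplified local stand-in for BaseExporter, not the grizz class. A plain dict mapping column names to lists replaces the polars DataFrame, and JsonLinesExporter is a hypothetical example exporter.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchExporter(ABC):
    """Simplified stand-in for grizz.exporter.BaseExporter (assumption, not the real class)."""

    @abstractmethod
    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        """Indicate if two exporter objects are equal or not."""

    @abstractmethod
    def export(self, frame: dict[str, list]) -> None:
        """Export a frame (a dict of column lists stands in for a DataFrame)."""


class JsonLinesExporter(SketchExporter):
    """Hypothetical exporter that writes one JSON object per row."""

    def __init__(self, path: Path | str) -> None:
        self._path = Path(path)

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(path={self._path})"

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        # Equal when both objects have the same type and target the same file.
        return type(other) is type(self) and self._path == other._path

    def export(self, frame: dict[str, list]) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        columns = list(frame)
        # zip(*...) turns column lists into row tuples.
        rows = zip(*(frame[col] for col in columns))
        with self._path.open("w") as file:
            for row in rows:
                file.write(json.dumps(dict(zip(columns, row))) + "\n")
```

Instantiating SketchExporter directly raises TypeError because its methods are abstract. ABC provides the same guarantee for BaseExporter.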

grizz.exporter.BaseExporter.equal abstractmethod

equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two exporter objects are equal or not.

Parameters:

Name       Type  Default   Description
other      Any   required  The other object to compare.
equal_nan  bool  False     Whether to compare NaNs as equal. If True, NaNs in both objects are considered equal.

Returns:

Type  Description
bool  True if the two exporters are equal, otherwise False.

Example usage:

>>> from grizz.exporter import CsvExporter
>>> obj1 = CsvExporter("/path/to/frame.csv")
>>> obj2 = CsvExporter("/path/to/frame.csv")
>>> obj3 = CsvExporter("/path/to/frame2.csv")
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False
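For path-based exporters, equality plausibly reduces to comparing the exporter type, the path, and the extra keyword arguments. This page does not show the grizz implementation, so the sketch below is an assumption. PathExporter and its fill_value argument are hypothetical. In this sketch, equal_nan only matters when a keyword argument holds a float NaN, because NaN never compares equal to itself under IEEE 754.

```python
from __future__ import annotations

import math
from pathlib import Path
from typing import Any


def _values_equal(a: Any, b: Any, equal_nan: bool) -> bool:
    # NaN != NaN, so two NaNs count as equal only when equal_nan is True.
    if equal_nan and isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


class PathExporter:
    """Hypothetical path-based exporter used only to illustrate `equal`."""

    def __init__(self, path: Path | str, **kwargs: Any) -> None:
        self._path = Path(path)
        self._kwargs = kwargs

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        # Different type or different target path: never equal.
        if type(other) is not type(self) or self._path != other._path:
            return False
        if self._kwargs.keys() != other._kwargs.keys():
            return False
        # Compare the forwarded keyword arguments one by one.
        return all(
            _values_equal(value, other._kwargs[key], equal_nan)
            for key, value in self._kwargs.items()
        )
```

With this design, PathExporter("/path/to/frame.csv", fill_value=float("nan")) is not equal to an identical second instance unless equal_nan=True is passed.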

grizz.exporter.BaseExporter.export abstractmethod

export(frame: DataFrame) -> None

Export a DataFrame.

Parameters:

Name   Type       Default   Description
frame  DataFrame  required  The DataFrame to export.

Example usage:

>>> import polars as pl
>>> from grizz.exporter import ParquetExporter
>>> exporter = ParquetExporter("/path/to/frame.parquet")
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> exporter.export(frame)  # doctest: +SKIP

grizz.exporter.CsvExporter

Bases: BaseExporter

Implement a CSV DataFrame exporter.

Parameters:

Name      Type        Default   Description
path      Path | str  required  The path to the CSV file to export to.
**kwargs  Any         {}        Additional keyword arguments for polars.DataFrame.write_csv.

Example usage:

>>> import polars as pl
>>> from grizz.exporter import CsvExporter
>>> exporter = CsvExporter(path="/path/to/frame.csv")
>>> exporter
CsvExporter(path=/path/to/frame.csv)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> exporter.export(frame)  # doctest: +SKIP
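CsvExporter forwards its extra keyword arguments to polars.DataFrame.write_csv. The sketch below reproduces that forwarding pattern with the standard-library csv module instead of polars. StdlibCsvExporter is a hypothetical class. The dict-of-lists frame and the use of csv.writer as the forwarding target are assumptions made to keep the example dependency-free.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Any


class StdlibCsvExporter:
    """Hypothetical CSV exporter that forwards **kwargs to csv.writer (not grizz code)."""

    def __init__(self, path: Path | str, **kwargs: Any) -> None:
        self._path = Path(path)
        self._kwargs = kwargs  # e.g. delimiter=";" (csv.writer format parameters)

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(path={self._path})"

    def export(self, frame: dict[str, list]) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        # newline="" lets the csv module control line endings itself.
        with self._path.open("w", newline="") as file:
            writer = csv.writer(file, **self._kwargs)
            writer.writerow(list(frame))  # header row
            writer.writerows(zip(*frame.values()))  # one row per record
```

Storing the keyword arguments unchanged and passing them through at export time keeps the exporter independent of the writer's option list.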

grizz.exporter.InMemoryExporter

Bases: BaseExporter, BaseIngestor

Implement an in-memory DataFrame exporter and ingestor.

Notes

This object is both an exporter and an ingestor: export stores the DataFrame in the object, and ingest returns the stored DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.exporter import InMemoryExporter
>>> exporter = InMemoryExporter()
>>> exporter
InMemoryExporter(frame=None)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> exporter.export(frame)
>>> df = exporter.ingest()
>>> df
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┘
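The same object plays both roles: export keeps a reference to the DataFrame, and ingest hands it back. A minimal standard-library sketch of that idea follows. MemoryStore is a hypothetical name, and the dict-of-lists frame stands in for a polars DataFrame. Raising RuntimeError when nothing has been exported yet is an assumption, not documented grizz behavior.

```python
from __future__ import annotations


class MemoryStore:
    """Hypothetical object that is both an exporter and an ingestor."""

    def __init__(self, frame: dict[str, list] | None = None) -> None:
        self._frame = frame

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(frame={self._frame})"

    def export(self, frame: dict[str, list]) -> None:
        # Keep a reference to the frame; nothing is written to disk.
        self._frame = frame

    def ingest(self) -> dict[str, list]:
        if self._frame is None:
            # Assumption: fail loudly instead of returning None.
            raise RuntimeError("No frame has been exported yet")
        return self._frame
```

Because only a reference is stored, ingest returns the very object that was exported, not a copy.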

grizz.exporter.ParquetExporter

Bases: BaseExporter

Implement a Parquet DataFrame exporter.

Parameters:

Name      Type        Default   Description
path      Path | str  required  The path to the Parquet file to export to.
**kwargs  Any         {}        Additional keyword arguments for polars.DataFrame.write_parquet.

Example usage:

>>> import polars as pl
>>> from grizz.exporter import ParquetExporter
>>> exporter = ParquetExporter(path="/path/to/frame.parquet")
>>> exporter
ParquetExporter(path=/path/to/frame.parquet)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> exporter.export(frame)  # doctest: +SKIP

grizz.exporter.TransformExporter

Bases: BaseExporter

Implement an exporter that transforms the DataFrame before exporting it.

Parameters:

Name         Type                    Default   Description
transformer  BaseTransformer | dict  required  The polars.DataFrame transformer or its configuration.
exporter     BaseExporter | dict     required  The DataFrame exporter or its configuration.

Example usage:

>>> import polars as pl
>>> from grizz.exporter import TransformExporter, ParquetExporter
>>> from grizz.transformer import InplaceCast
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> exporter = TransformExporter(
...     transformer=InplaceCast(columns=["col1", "col2"], dtype=pl.Float32),
...     exporter=ParquetExporter(path="/path/to/frame.parquet"),
... )
>>> exporter
TransformExporter(
  (transformer): InplaceCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
  (exporter): ParquetExporter(path=/path/to/frame.parquet)
)
>>> exporter.export(frame)  # doctest: +SKIP
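TransformExporter composes two objects: it applies the transformer to the DataFrame, then passes the result to the wrapped exporter. The sketch below illustrates only that delegation. It assumes a plain callable as the transformer and a dict of lists in place of a polars DataFrame. TransformThenExport, ListExporter, and cast_to_float are hypothetical names, not grizz classes.

```python
from __future__ import annotations

from typing import Callable

Frame = dict[str, list]


class ListExporter:
    """Hypothetical exporter that records every exported frame in a list."""

    def __init__(self) -> None:
        self.exported: list[Frame] = []

    def export(self, frame: Frame) -> None:
        self.exported.append(frame)


class TransformThenExport:
    """Hypothetical exporter: transform the frame, then delegate to another exporter."""

    def __init__(self, transformer: Callable[[Frame], Frame], exporter: ListExporter) -> None:
        self._transformer = transformer
        self._exporter = exporter

    def export(self, frame: Frame) -> None:
        self._exporter.export(self._transformer(frame))


def cast_to_float(columns: list[str]) -> Callable[[Frame], Frame]:
    """Return a transformer that casts the given columns to float in a new dict."""

    def transform(frame: Frame) -> Frame:
        # Build a new dict so the caller's frame is left untouched.
        return {
            name: [float(v) for v in values] if name in columns else values
            for name, values in frame.items()
        }

    return transform
```

Keeping the transformer and the exporter as separate objects lets any transformer be paired with any exporter without writing a new exporter class for each combination.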

grizz.exporter.is_exporter_config

is_exporter_config(config: dict) -> bool

Indicate if the input configuration is a configuration for a BaseExporter.

This function only checks if the value of the key _target_ is valid. It does not check the other values. If _target_ indicates a function, the returned type hint is used to check the class.

Parameters:

Name    Type  Default   Description
config  dict  required  The configuration to check.

Returns:

Type  Description
bool  True if the input configuration is a configuration for a BaseExporter object.

Example usage:

>>> from grizz.exporter import is_exporter_config
>>> is_exporter_config(
...     {"_target_": "grizz.exporter.ParquetExporter", "path": "/path/to/data.parquet"}
... )
True
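The check described above (resolve _target_, then test the class, or the return type hint when the target is a function) can be sketched with importlib, inspect, and typing. This is a simplified, hypothetical version: is_config_for takes the base class as a parameter, and the real grizz function may resolve targets or handle errors differently.

```python
import importlib
import inspect
import typing
from typing import Any


def _resolve(target: str) -> Any:
    """Import the object named by a dotted path such as 'package.module.Name'."""
    module_name, _, attr = target.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def is_config_for(config: dict, base: type) -> bool:
    """Return True if config["_target_"] names a subclass of `base`
    or a function whose return type hint is a subclass of `base`."""
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = _resolve(target)
    except (ImportError, AttributeError, ValueError):
        return False  # unknown module or attribute, or an empty module name
    if inspect.isclass(obj):
        return issubclass(obj, base)
    if inspect.isfunction(obj):
        # Only the return type hint is inspected; the function is not called.
        try:
            returned = typing.get_type_hints(obj).get("return")
        except (NameError, TypeError):
            return False
        return inspect.isclass(returned) and issubclass(returned, base)
    return False
```

As in the description above, only _target_ is inspected. Other keys such as path are ignored, so a config with a valid target but wrong arguments still passes.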

grizz.exporter.setup_exporter

setup_exporter(
    exporter: BaseExporter | dict,
) -> BaseExporter

Set up an exporter.

The exporter is instantiated from its configuration by using the BaseExporter factory function.

Parameters:

Name      Type                 Default   Description
exporter  BaseExporter | dict  required  An exporter or its configuration.

Returns:

Type          Description
BaseExporter  An instantiated exporter.

Example usage:

>>> from grizz.exporter import setup_exporter
>>> exporter = setup_exporter(
...     {"_target_": "grizz.exporter.ParquetExporter", "path": "/path/to/data.parquet"}
... )
>>> exporter
ParquetExporter(path=/path/to/data.parquet)
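The description above says a configuration is turned into an exporter by a factory function. A minimal sketch of that dispatch follows. It assumes the common _target_ convention (a dotted import path plus keyword arguments) and assumes that inputs that are not dicts are returned unchanged. setup_object is a hypothetical name, and the grizz factory may validate inputs or log differently.

```python
import importlib
from typing import Any


def setup_object(obj_or_config: Any) -> Any:
    """Instantiate `obj_or_config` if it is a config dict, else return it unchanged."""
    if not isinstance(obj_or_config, dict):
        return obj_or_config  # assumption: already-built objects pass through
    config = dict(obj_or_config)  # copy so the caller's dict is not mutated
    module_name, _, attr = config.pop("_target_").rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr)
    # The remaining keys become keyword arguments of the factory.
    return factory(**config)
```

With this design, setup_object({"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2}) builds Fraction(1, 2), and passing an existing object returns that same object.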