grizz.exporter
Contain DataFrame exporters.
grizz.exporter.BaseExporter
Bases: ABC
Define the base class to implement a DataFrame exporter.
Example usage:
>>> import polars as pl
>>> from grizz.exporter import ParquetExporter
>>> exporter = ParquetExporter("/path/to/frame.parquet")
>>> exporter
ParquetExporter(path=/path/to/frame.parquet)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> exporter.export(frame) # doctest: +SKIP
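As a rough illustration of what implementing the base class involves, the following sketch defines a stand-in abstract class with the two abstract methods documented here (`equal` and `export`) and a toy subclass. `SketchBaseExporter` and `ListExporter` are hypothetical names used only for this example, and plain dictionaries stand in for DataFrames; this is not the grizz implementation.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchBaseExporter(ABC):
    """Hypothetical stand-in for the abstract exporter interface."""

    @abstractmethod
    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        """Return True if ``other`` is an equivalent exporter."""

    @abstractmethod
    def export(self, frame: Any) -> None:
        """Export ``frame`` to the destination this object represents."""


class ListExporter(SketchBaseExporter):
    """Hypothetical exporter that appends each exported frame to a list."""

    def __init__(self) -> None:
        self.exported = []

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        # Equal when the other object is the same kind of exporter
        # and holds the same exported frames.
        return isinstance(other, ListExporter) and self.exported == other.exported

    def export(self, frame: Any) -> None:
        self.exported.append(frame)


exporter = ListExporter()
exporter.export({"col1": [1, 2, 3]})
print(exporter.exported)
```

A subclass that omits either abstract method cannot be instantiated, which is the usual reason for declaring the interface with `ABC`.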
grizz.exporter.BaseExporter.equal (abstractmethod)
equal(other: Any, equal_nan: bool = False) -> bool
Indicate if two exporter objects are equal or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Any | The other object to compare. | required |
equal_nan | bool | Whether to treat NaN values as equal. | False |
Returns:
Type | Description |
---|---|
bool | True if the two exporters are equal, otherwise False. |
Example usage:
>>> import numpy as np
>>> from grizz.exporter import CsvExporter
>>> obj1 = CsvExporter("/path/to/frame.csv")
>>> obj2 = CsvExporter("/path/to/frame.csv")
>>> obj3 = CsvExporter("/path/to/frame2.csv")
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False
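The `equal_nan` flag exists because NaN does not compare equal to itself under IEEE 754 rules. The sketch below shows the general convention on plain floats; `values_equal` is a hypothetical helper, and how grizz applies the flag to an exporter's attributes is not specified on this page.

```python
import math


def values_equal(a: float, b: float, equal_nan: bool = False) -> bool:
    """Compare two floats, optionally treating NaN as equal to NaN."""
    if equal_nan and math.isnan(a) and math.isnan(b):
        return True
    return a == b


nan = float("nan")
print(values_equal(nan, nan))                  # False
print(values_equal(nan, nan, equal_nan=True))  # True
print(values_equal(1.0, 1.0))                  # True
```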
grizz.exporter.BaseExporter.export (abstractmethod)
export(frame: DataFrame) -> None
Export a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to export. | required |
Example usage:
>>> import polars as pl
>>> from grizz.exporter import ParquetExporter
>>> exporter = ParquetExporter("/path/to/frame.parquet")
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> exporter.export(frame) # doctest: +SKIP
grizz.exporter.CsvExporter
Bases: BaseExporter
Implement a CSV DataFrame exporter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path \| str | The path to the CSV file to write. | required |
**kwargs | Any | Additional keyword arguments passed to the underlying CSV writer. | {} |
Example usage:
>>> import polars as pl
>>> from grizz.exporter import CsvExporter
>>> exporter = CsvExporter(path="/path/to/frame.csv")
>>> exporter
CsvExporter(path=/path/to/frame.csv)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> exporter.export(frame) # doctest: +SKIP
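The constructor signature above (a path plus `**kwargs`) follows a common forwarding pattern: the options are stored at construction time and handed to the writer when `export` is called. The sketch below illustrates that pattern with the standard-library `csv` module and a list of dictionaries in place of a DataFrame. `SketchCsvExporter` is a hypothetical class, not the grizz implementation, and the writer that grizz calls may accept different options.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any


class SketchCsvExporter:
    """Hypothetical exporter: stores a path and forwards kwargs to the writer."""

    def __init__(self, path: "Path | str", **kwargs: Any) -> None:
        self._path = Path(path)
        self._kwargs = kwargs  # passed through unchanged at export time

    def export(self, rows: list) -> None:
        # ``rows`` is a non-empty list of dicts standing in for a DataFrame.
        with self._path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]), **self._kwargs)
            writer.writeheader()
            writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "frame.csv"
    SketchCsvExporter(path, delimiter=";").export(
        [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]
    )
    content = path.read_text()

print(content)
```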
grizz.exporter.InMemoryExporter
Bases: BaseExporter, BaseIngestor
Implement an in-memory DataFrame exporter and ingestor.
Notes
This class acts as both an exporter and an ingestor because the object stores the exported DataFrame.
Example usage:
>>> import polars as pl
>>> from grizz.exporter import InMemoryExporter
>>> exporter = InMemoryExporter()
>>> exporter
InMemoryExporter(frame=None)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> exporter.export(frame)
>>> df = exporter.ingest()
>>> df
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┘
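The export-then-ingest round trip shown above can be pictured as an object with a single storage slot. The sketch below is a minimal stand-in under stated assumptions: `SketchInMemoryExporter` is a hypothetical class, a dictionary replaces the DataFrame, and raising `RuntimeError` when nothing has been exported yet is a choice made for this example, not documented grizz behaviour.

```python
from typing import Any, Optional


class SketchInMemoryExporter:
    """Hypothetical object that is both an exporter and an ingestor."""

    def __init__(self, frame: Optional[Any] = None) -> None:
        self._frame = frame

    def __repr__(self) -> str:
        return f"{type(self).__name__}(frame={self._frame!r})"

    def export(self, frame: Any) -> None:
        # Keep a reference to the frame instead of writing it anywhere.
        self._frame = frame

    def ingest(self) -> Any:
        # Assumption of this sketch: fail loudly if nothing was exported.
        if self._frame is None:
            raise RuntimeError("No frame has been exported yet.")
        return self._frame


store = SketchInMemoryExporter()
print(store)  # SketchInMemoryExporter(frame=None)
store.export({"col1": [1, 2, 3]})
print(store.ingest())  # {'col1': [1, 2, 3]}
```

An object like this is convenient for handing a frame from one pipeline stage to the next, or for tests, since no file is written.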
grizz.exporter.ParquetExporter
Bases: BaseExporter
Implement a parquet DataFrame exporter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path \| str | The path to the parquet file to write. | required |
**kwargs | Any | Additional keyword arguments passed to the underlying parquet writer. | {} |
Example usage:
>>> import polars as pl
>>> from grizz.exporter import ParquetExporter
>>> exporter = ParquetExporter(path="/path/to/frame.parquet")
>>> exporter
ParquetExporter(path=/path/to/frame.parquet)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> exporter.export(frame) # doctest: +SKIP
grizz.exporter.TransformExporter
Bases: BaseExporter
Implement an exporter that transforms the DataFrame before exporting it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer | BaseTransformer \| dict | The DataFrame transformer or its configuration. | required |
exporter | BaseExporter \| dict | The DataFrame exporter or its configuration. | required |
Example usage:
>>> import polars as pl
>>> from grizz.exporter import TransformExporter, ParquetExporter
>>> from grizz.transformer import InplaceCast
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> exporter = TransformExporter(
... transformer=InplaceCast(columns=["col1", "col2"], dtype=pl.Float32),
... exporter=ParquetExporter(path="/path/to/frame.parquet"),
... )
>>> exporter
TransformExporter(
(transformer): InplaceCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
(exporter): ParquetExporter(path=/path/to/frame.parquet)
)
>>> exporter.export(frame) # doctest: +SKIP
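Conceptually, this class composes two objects: the frame is passed through the transformer, and the result is handed to the wrapped exporter. The sketch below shows that composition with a plain callable as the transformer and a dictionary as the frame. `SketchTransformExporter`, `Collector`, and `cast_col1_to_float` are hypothetical names for this example; the real class also accepts configurations, which the sketch leaves out.

```python
from typing import Any, Callable


class Collector:
    """Hypothetical sink exporter that records what it receives."""

    def __init__(self) -> None:
        self.frames = []

    def export(self, frame: Any) -> None:
        self.frames.append(frame)


class SketchTransformExporter:
    """Hypothetical composite: transform first, then delegate the export."""

    def __init__(self, transformer: Callable[[Any], Any], exporter: Any) -> None:
        self._transformer = transformer
        self._exporter = exporter

    def export(self, frame: Any) -> None:
        self._exporter.export(self._transformer(frame))


def cast_col1_to_float(frame: dict) -> dict:
    # Return a new dict so the caller's frame is left untouched.
    return {**frame, "col1": [float(v) for v in frame["col1"]]}


sink = Collector()
exporter = SketchTransformExporter(cast_col1_to_float, sink)
frame = {"col1": [1, 2], "col2": ["a", "b"]}
exporter.export(frame)
print(sink.frames[0])  # {'col1': [1.0, 2.0], 'col2': ['a', 'b']}
```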
grizz.exporter.is_exporter_config
is_exporter_config(config: dict) -> bool
Indicate if the input configuration is a configuration for a BaseExporter.
This function only checks if the value of the key _target_
is valid. It does not check the other values. If _target_
indicates a function, the return type hint is used to check
the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | dict | The configuration to check. | required |
Returns:
Type | Description |
---|---|
bool | True if the input configuration is a configuration for a BaseExporter, otherwise False. |
Example usage:
>>> from grizz.exporter import is_exporter_config
>>> is_exporter_config(
... {"_target_": "grizz.exporter.ParquetExporter", "path": "/path/to/data.parquet"}
... )
True
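A check of this kind can be sketched as: read `_target_`, resolve the dotted name to an object, and test whether it is a subclass of the expected base. The helper names below (`resolve_target`, `is_config_for`) are hypothetical, standard-library classes stand in for exporters, and the sketch covers only the class case, not the function case with a return type hint described above.

```python
import importlib
from typing import Any


def resolve_target(name: str) -> Any:
    """Import ``pkg.module.Attr`` and return ``Attr``."""
    module_name, _, attr = name.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def is_config_for(config: dict, base: type) -> bool:
    """Return True if ``config['_target_']`` names a subclass of ``base``."""
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = resolve_target(target)
    except (ImportError, AttributeError, ValueError):
        return False
    # Only the target is inspected; the other keys are ignored.
    return isinstance(obj, type) and issubclass(obj, base)


print(is_config_for({"_target_": "collections.OrderedDict"}, dict))  # True
print(is_config_for({"_target_": "collections.Counter"}, list))      # False
print(is_config_for({"path": "/path/to/data.parquet"}, dict))        # False
```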
grizz.exporter.setup_exporter
setup_exporter(
exporter: BaseExporter | dict,
) -> BaseExporter
Set up an exporter.
The exporter is instantiated from its configuration by using the
BaseExporter factory function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
exporter | BaseExporter \| dict | An exporter or its configuration. | required |
Returns:
Type | Description |
---|---|
BaseExporter | An instantiated exporter. |
Example usage:
>>> from grizz.exporter import setup_exporter
>>> exporter = setup_exporter(
... {"_target_": "grizz.exporter.ParquetExporter", "path": "/path/to/data.parquet"}
... )
>>> exporter
ParquetExporter(path=/path/to/data.parquet)
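The "object or configuration" pattern used here can be sketched in a few lines: an existing instance is returned unchanged, while a dictionary is turned into an object by importing the class named in `_target_` and passing the remaining keys as keyword arguments. `setup_object` is a hypothetical helper, and `datetime.timedelta` stands in for an exporter class so the example runs with the standard library alone; the actual factory used by grizz may behave differently.

```python
import importlib
from datetime import timedelta
from typing import Any


def setup_object(obj_or_config: Any) -> Any:
    """Return the object as-is, or build it from a ``_target_`` config."""
    if not isinstance(obj_or_config, dict):
        return obj_or_config
    kwargs = dict(obj_or_config)  # copy so the caller's config is not mutated
    target = kwargs.pop("_target_")
    module_name, _, attr = target.rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr)
    return factory(**kwargs)


config = {"_target_": "datetime.timedelta", "hours": 2}
built = setup_object(config)
print(built)  # 2:00:00

existing = timedelta(minutes=5)
print(setup_object(existing) is existing)  # True
```

Accepting both forms lets calling code be written once, whether the exporter was created in Python or described in a configuration file.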