transformer

grizz.transformer

Contain polars.DataFrame transformers.

grizz.transformer.BaseColumnsTransformer

Bases: BaseTransformer

Define a base class to implement transformers that apply the same transformation on multiple columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
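
The `ignore_missing` behaviour described above can be sketched in plain Python. This is only an illustration, not the grizz implementation: the helper name `check_missing_columns` and the use of `RuntimeError` are assumptions made for the example, since the exception type raised by grizz is not stated here.

```python
from __future__ import annotations

import warnings
from collections.abc import Sequence


def check_missing_columns(
    requested: Sequence[str],
    available: Sequence[str],
    ignore_missing: bool = False,
) -> tuple[str, ...]:
    """Return the requested columns that exist in ``available``.

    Hypothetical helper: raises if a column is missing and
    ``ignore_missing`` is False, otherwise warns and skips it.
    """
    missing = tuple(col for col in requested if col not in available)
    if missing and not ignore_missing:
        # The exception type is an assumption made for this sketch.
        msg = f"{len(missing)} column(s) are missing: {missing}"
        raise RuntimeError(msg)
    if missing:
        warnings.warn(f"Skipping missing column(s): {missing}", stacklevel=2)
    return tuple(col for col in requested if col in available)
```

With `ignore_missing=True`, calling the helper with `["col2", "col5"]` on a frame that has no `col5` would return `('col2',)` and emit a warning; with the default, the same call raises.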

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

grizz.transformer.BaseColumnsTransformer.find_columns

find_columns(frame: DataFrame) -> tuple[str, ...]

Find the columns to transform.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.

required

Returns:

Type Description
tuple[str, ...]

The columns to transform.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer.find_columns(frame)
('col2', 'col3')
>>> transformer = StripChars()
>>> transformer.find_columns(frame)
('col1', 'col2', 'col3', 'col4')

grizz.transformer.BaseColumnsTransformer.find_common_columns

find_common_columns(frame: DataFrame) -> tuple[str, ...]

Find the common columns, i.e. the requested columns that exist in the input DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.

required

Returns:

Type Description
tuple[str, ...]

The common columns.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = StripChars(columns=["col2", "col3", "col5"])
>>> transformer.find_common_columns(frame)
('col2', 'col3')
>>> transformer = StripChars()
>>> transformer.find_common_columns(frame)
('col1', 'col2', 'col3', 'col4')

grizz.transformer.BaseColumnsTransformer.find_missing_columns

find_missing_columns(frame: DataFrame) -> tuple[str, ...]

Find the missing columns, i.e. the requested columns that are absent from the input DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.

required

Returns:

Type Description
tuple[str, ...]

The missing columns.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = StripChars(columns=["col2", "col3", "col5"])
>>> transformer.find_missing_columns(frame)
('col5',)
>>> transformer = StripChars()
>>> transformer.find_missing_columns(frame)
()
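
The relationship between the common and the missing columns can be sketched without polars. The function `split_columns` below is a hypothetical helper, not part of grizz. It assumes that `None` means "all the available columns" (as in the examples above) and that the requested order is preserved.

```python
from __future__ import annotations

from collections.abc import Sequence


def split_columns(
    requested: Sequence[str] | None, available: Sequence[str]
) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Split the requested columns into (common, missing) columns."""
    if requested is None:
        # Nothing was requested explicitly, so nothing can be missing.
        return tuple(available), ()
    present = set(available)
    common = tuple(col for col in requested if col in present)
    missing = tuple(col for col in requested if col not in present)
    return common, missing


common, missing = split_columns(
    ["col2", "col3", "col5"], ["col1", "col2", "col3", "col4"]
)
print(common)  # ('col2', 'col3')
print(missing)  # ('col5',)
```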

grizz.transformer.BaseTransformer

Bases: ABC

Define the base class to transform a polars.DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseTransformer.transform

transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.Cast

Bases: BaseColumnsTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.CastTransformer

Bases: BaseColumnsTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.ColumnSelection

Bases: BaseColumnsTransformer

Implement a polars.DataFrame transformer to select a subset of columns.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to keep.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise a warning message is shown.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, 2, 3, 4, 5],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ 2    │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ 4    │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.ColumnSelectionTransformer

Bases: BaseColumnsTransformer

Implement a polars.DataFrame transformer to select a subset of columns.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to keep.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise a warning message is shown.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, 2, 3, 4, 5],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ 2    │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ 4    │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.ConcatColumns

Bases: BaseColumnsTransformer

Implement a transformer to concatenate columns into a new column.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to concatenate. The columns should have the same type or compatible types.

required
out_column str

The output column.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_column="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_column=col, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
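
As the output above shows, each row of the new column holds the values of the selected columns for that row. A minimal pure-Python sketch of this row-wise grouping follows; `concat_columns` is a hypothetical helper used only for illustration and does not reflect how grizz or polars build the list column.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def concat_columns(
    data: Mapping[str, Sequence[Any]], columns: Sequence[str]
) -> list[list[Any]]:
    """Build one list per row from the values of the selected columns."""
    return [list(row) for row in zip(*(data[col] for col in columns))]


data = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
}
rows = concat_columns(data, ["col1", "col2", "col3"])
print(rows[0])  # [11, 21, 31]
```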

grizz.transformer.ConcatColumnsTransformer

Bases: BaseColumnsTransformer

Implement a transformer to concatenate columns into a new column.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to concatenate. The columns should have the same type or compatible types.

required
out_column str

The output column.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_column="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_column=col, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘

grizz.transformer.DecimalCast

Bases: CastTransformer

Implement a transformer to convert columns of type decimal to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬───────────────┬───────────────┬──────┐
│ col1 ┆ col2          ┆ col3          ┆ col4 │
│ ---  ┆ ---           ┆ ---           ┆ ---  │
│ i64  ┆ decimal[38,0] ┆ decimal[38,0] ┆ str  │
╞══════╪═══════════════╪═══════════════╪══════╡
│ 1    ┆ 1             ┆ 1             ┆ a    │
│ 2    ┆ 2             ┆ 2             ┆ b    │
│ 3    ┆ 3             ┆ 3             ┆ c    │
│ 4    ┆ 4             ┆ 4             ┆ d    │
│ 5    ┆ 5             ┆ 5             ┆ e    │
└──────┴───────────────┴───────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────────────┬──────┐
│ col1 ┆ col2 ┆ col3          ┆ col4 │
│ ---  ┆ ---  ┆ ---           ┆ ---  │
│ i64  ┆ f32  ┆ decimal[38,0] ┆ str  │
╞══════╪══════╪═══════════════╪══════╡
│ 1    ┆ 1.0  ┆ 1             ┆ a    │
│ 2    ┆ 2.0  ┆ 2             ┆ b    │
│ 3    ┆ 3.0  ┆ 3             ┆ c    │
│ 4    ┆ 4.0  ┆ 4             ┆ d    │
│ 5    ┆ 5.0  ┆ 5             ┆ e    │
└──────┴──────┴───────────────┴──────┘
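
In the example above, `col1` is requested but is left unchanged because it is not a decimal column. The selection step can be sketched as a filter on the schema. The helper `find_decimal_columns` and the representation of data types as strings are assumptions made for this illustration, not the grizz implementation.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def find_decimal_columns(
    schema: Mapping[str, str], columns: Sequence[str] | None = None
) -> tuple[str, ...]:
    """Return the candidate columns whose data type is a decimal type.

    ``schema`` maps column names to data type names (strings in this sketch).
    """
    candidates = tuple(schema) if columns is None else tuple(columns)
    return tuple(
        col for col in candidates if schema.get(col, "").startswith("decimal")
    )


schema = {
    "col1": "i64",
    "col2": "decimal[38,0]",
    "col3": "decimal[38,0]",
    "col4": "str",
}
print(find_decimal_columns(schema, ["col1", "col2"]))  # ('col2',)
```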

grizz.transformer.DecimalCastTransformer

Bases: CastTransformer

Implement a transformer to convert columns of type decimal to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬───────────────┬───────────────┬──────┐
│ col1 ┆ col2          ┆ col3          ┆ col4 │
│ ---  ┆ ---           ┆ ---           ┆ ---  │
│ i64  ┆ decimal[38,0] ┆ decimal[38,0] ┆ str  │
╞══════╪═══════════════╪═══════════════╪══════╡
│ 1    ┆ 1             ┆ 1             ┆ a    │
│ 2    ┆ 2             ┆ 2             ┆ b    │
│ 3    ┆ 3             ┆ 3             ┆ c    │
│ 4    ┆ 4             ┆ 4             ┆ d    │
│ 5    ┆ 5             ┆ 5             ┆ e    │
└──────┴───────────────┴───────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────────────┬──────┐
│ col1 ┆ col2 ┆ col3          ┆ col4 │
│ ---  ┆ ---  ┆ ---           ┆ ---  │
│ i64  ┆ f32  ┆ decimal[38,0] ┆ str  │
╞══════╪══════╪═══════════════╪══════╡
│ 1    ┆ 1.0  ┆ 1             ┆ a    │
│ 2    ┆ 2.0  ┆ 2             ┆ b    │
│ 3    ┆ 3.0  ┆ 3             ┆ c    │
│ 4    ┆ 4.0  ┆ 4             ┆ d    │
│ 5    ┆ 5.0  ┆ 5             ┆ e    │
└──────┴──────┴───────────────┴──────┘

grizz.transformer.Diff

Bases: BaseTransformer

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘
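
To show the effect of `shift`, here is a small pure-Python sketch of a shifted difference. It is an illustration only: the function `shifted_diff` is hypothetical, it covers positive shifts only, and it fills the first `shift` positions with `None`, matching the `null` in the output above.

```python
from __future__ import annotations

from collections.abc import Sequence


def shifted_diff(
    values: Sequence[float | None], shift: int = 1
) -> list[float | None]:
    """Compute values[i] - values[i - shift], or None when undefined."""
    if shift < 1:
        # Only positive shifts are handled in this sketch.
        msg = f"shift must be >= 1 (received {shift})"
        raise ValueError(msg)
    out: list[float | None] = []
    for i, value in enumerate(values):
        prev = values[i - shift] if i >= shift else None
        out.append(None if value is None or prev is None else value - prev)
    return out


print(shifted_diff([1, 2, 3, 4, 5]))  # [None, 1, 1, 1, 1]
print(shifted_diff([1, 2, 3, 4, 5], shift=2))  # [None, None, 2, 2, 2]
```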

grizz.transformer.DiffTransformer

Bases: BaseTransformer

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘

grizz.transformer.DropDuplicate

Bases: BaseColumnsTransformer

Implement a transformer to drop duplicate rows.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for unique.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, ignore_missing=False, keep=first, maintain_order=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 1],
...         "col2": ["1", "2", "3", "4", "1"],
...         "col3": ["1", "2", "3", "1", "1"],
...         "col4": ["a", "a", "a", "a", "a"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
│ 1    ┆ 1    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
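
The example above checks all the columns. To illustrate how a subset of columns could change the result, the sketch below deduplicates rows in pure Python, keeping the first occurrence and preserving order (the behaviour requested with `keep="first"` and `maintain_order=True`). The helper `drop_duplicates` is hypothetical and is not how grizz or polars implement the operation.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def drop_duplicates(
    rows: Sequence[Mapping[str, Any]], columns: Sequence[str] | None = None
) -> list[dict[str, Any]]:
    """Keep the first row for each distinct combination of values."""
    seen: set[tuple[Any, ...]] = set()
    out: list[dict[str, Any]] = []
    for row in rows:
        keys = tuple(row) if columns is None else tuple(columns)
        key = tuple(row[col] for col in keys)
        if key not in seen:
            seen.add(key)
            out.append(dict(row))
    return out


rows = [
    {"col1": 1, "col2": "1", "col3": "1", "col4": "a"},
    {"col1": 2, "col2": "2", "col3": "2", "col4": "a"},
    {"col1": 3, "col2": "3", "col3": "3", "col4": "a"},
    {"col1": 4, "col2": "4", "col3": "1", "col4": "a"},
    {"col1": 1, "col2": "1", "col3": "1", "col4": "a"},
]
print(len(drop_duplicates(rows)))  # 4
print(len(drop_duplicates(rows, columns=["col3", "col4"])))  # 3
```

With all the columns, only the last row is a duplicate. With `col3` and `col4` only, the fourth row also duplicates the first one.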

grizz.transformer.DropDuplicateTransformer

Bases: BaseColumnsTransformer

Implement a transformer to drop duplicate rows.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for unique.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, ignore_missing=False, keep=first, maintain_order=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 1],
...         "col2": ["1", "2", "3", "4", "1"],
...         "col3": ["1", "2", "3", "1", "1"],
...         "col4": ["a", "a", "a", "a", "a"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
│ 1    ┆ 1    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘

grizz.transformer.DropNullColumn

Bases: BaseColumnsTransformer

Implement a transformer to remove the columns that have too many null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. None means all the columns.

None
threshold float

The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold, the column is removed. If set to 1.0, only the columns that contain only null values are removed.

1.0
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, threshold=1.0, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, 5],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ 5    ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ null │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ null │
│ null       ┆ 5    │
└────────────┴──────┘
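
The threshold rule can be sketched in pure Python on the same data. The helper `columns_to_drop` is hypothetical and only illustrates the "greater than or equal to" comparison described in the parameters. It is not the grizz implementation, and keeping empty columns is a choice made for this sketch.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def columns_to_drop(
    data: Mapping[str, Sequence[Any]], threshold: float = 1.0
) -> tuple[str, ...]:
    """Return the columns whose proportion of null values >= threshold."""
    dropped = []
    for name, values in data.items():
        if len(values) == 0:
            # The proportion is undefined for an empty column; keep it.
            continue
        proportion = sum(value is None for value in values) / len(values)
        if proportion >= threshold:
            dropped.append(name)
    return tuple(dropped)


data = {
    "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
    "col2": [1, None, 3, None, 5],
    "col3": [None, None, None, None, None],
}
print(columns_to_drop(data))  # ('col3',)
print(columns_to_drop(data, threshold=0.4))  # ('col2', 'col3')
```

Here `col1`, `col2` and `col3` have null proportions of 0.2, 0.4 and 1.0, so lowering the threshold to 0.4 also removes `col2`.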

grizz.transformer.DropNullColumnTransformer

Bases: BaseColumnsTransformer

Implement a transformer to remove the columns that have too many null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. None means all the columns.

None
threshold float

The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold, the column is removed. If set to 1.0, only the columns that contain only null values are removed.

1.0
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, threshold=1.0, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, 5],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ 5    ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ null │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ null │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.DropNullRow

Bases: BaseColumnsTransformer

Implement a transformer to drop the rows that contain only null values.

Note that a row is dropped only if all its values are null; rows with some null values are kept.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
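
The rule "drop a row only if all its values are null" can be sketched in pure Python. The helper `drop_null_rows` is hypothetical, and its handling of a column subset (only the selected columns are inspected) is an assumption about how `columns` is used, not a statement about the grizz implementation.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def drop_null_rows(
    rows: Sequence[Mapping[str, Any]], columns: Sequence[str] | None = None
) -> list[dict[str, Any]]:
    """Keep the rows that have at least one non-null value."""
    out: list[dict[str, Any]] = []
    for row in rows:
        keys = tuple(row) if columns is None else tuple(columns)
        if any(row[col] is not None for col in keys):
            out.append(dict(row))
    return out


rows = [
    {"col1": "2020-1-1", "col2": 1, "col3": None},
    {"col1": "2020-1-2", "col2": None, "col3": None},
    {"col1": "2020-1-31", "col2": 3, "col3": None},
    {"col1": "2020-12-31", "col2": None, "col3": None},
    {"col1": None, "col2": None, "col3": None},
]
print(len(drop_null_rows(rows)))  # 4
print(len(drop_null_rows(rows, columns=["col2", "col3"])))  # 2
```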

grizz.transformer.DropNullRowTransformer

Bases: BaseColumnsTransformer

Implement a transformer to drop the rows that contain only null values.

Note that a row is dropped only if all its values are null; rows with some null values are kept.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘

grizz.transformer.FillNan

Bases: BaseColumnsTransformer

Implement a transformer to fill NaN values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for fill_nan.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ 100.0 │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ null  │
│ null ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.FillNanTransformer

Bases: BaseColumnsTransformer

Implement a transformer to fill NaN values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for fill_nan.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ 100.0 │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ null  │
│ null ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.FillNull

Bases: BaseColumnsTransformer

Implement a transformer to fill null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for fill_null.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, None],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ null ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ NaN   │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ 100.0 │
│ 100  ┆ null ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.FillNullTransformer

Bases: BaseColumnsTransformer

Implement a transformer to fill null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for fill_null.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, None],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ null ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ NaN   │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ 100.0 │
│ 100  ┆ null ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.FloatCast

Bases: CastTransformer

Implement a transformer to convert columns of type float to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1.0  ┆ a    │
│ 2    ┆ 2    ┆ 2.0  ┆ b    │
│ 3    ┆ 3    ┆ 3.0  ┆ c    │
│ 4    ┆ 4    ┆ 4.0  ┆ d    │
│ 5    ┆ 5    ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FloatCastTransformer

Bases: CastTransformer

Implement a transformer to convert columns of type float to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1.0  ┆ a    │
│ 2    ┆ 2    ┆ 2.0  ┆ b    │
│ 3    ┆ 3    ┆ 3.0  ┆ c    │
│ 4    ┆ 4    ┆ 4.0  ┆ d    │
│ 5    ┆ 5    ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.Function

Bases: BaseTransformer

Implement a transformer that is a wrapper around a function to transform the DataFrame.

Parameters:

Name Type Description Default
func Callable[[DataFrame], DataFrame]

The function to transform the DataFrame.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FunctionTransformer

Bases: BaseTransformer

Implement a transformer that is a wrapper around a function to transform the DataFrame.

Parameters:

Name Type Description Default
func Callable[[DataFrame], DataFrame]

The function to transform the DataFrame.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘

grizz.transformer.IntegerCast

Bases: CastTransformer

Implement a transformer to convert columns of type integer to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1    ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2    ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3    ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4    ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.IntegerCastTransformer

Bases: CastTransformer

Implement a transformer to convert columns of type integer to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1    ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2    ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3    ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4    ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.JsonDecode

Bases: BaseColumnsTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to parse. None means all the columns.

required
dtype PolarsDataType | PythonDataType | None

The dtype to cast the extracted value to. If None, the dtype will be inferred from the JSON value.

None

Example usage:

>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), dtype=None, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘

grizz.transformer.JsonDecodeTransformer

Bases: BaseColumnsTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to parse. None means all the columns.

required
dtype PolarsDataType | PythonDataType | None

The dtype to cast the extracted value to. If None, the dtype will be inferred from the JSON value.

None

Example usage:

>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), dtype=None, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘

grizz.transformer.Replace

Bases: BaseTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace.

()
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.ReplaceStrict

Bases: BaseTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace_strict.

()
**kwargs Any

The keyword arguments to pass to replace_strict.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.ReplaceStrictTransformer

Bases: BaseTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace_strict.

()
**kwargs Any

The keyword arguments to pass to replace_strict.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.ReplaceTransformer

Bases: BaseTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace.

()
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.Sequential

Bases: BaseTransformer

Implement a polars.DataFrame transformer to apply several transformers sequentially.

Parameters:

Name Type Description Default
transformers Sequence[BaseTransformer | dict]

The transformers or their configurations.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import (
...     Sequential,
...     Cast,
... )
>>> transformer = Sequential(
...     [
...         Cast(columns=["col1"], dtype=pl.Float32),
...         Cast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialTransformer(
  (0): CastTransformer(columns=('col1',), dtype=Float32, ignore_missing=False)
  (1): CastTransformer(columns=('col2',), dtype=Int64, ignore_missing=False)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘

grizz.transformer.SequentialTransformer

Bases: BaseTransformer

Implement a polars.DataFrame transformer to apply several transformers sequentially.

Parameters:

Name Type Description Default
transformers Sequence[BaseTransformer | dict]

The transformers or their configurations.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import (
...     Sequential,
...     Cast,
... )
>>> transformer = Sequential(
...     [
...         Cast(columns=["col1"], dtype=pl.Float32),
...         Cast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialTransformer(
  (0): CastTransformer(columns=('col1',), dtype=Float32, ignore_missing=False)
  (1): CastTransformer(columns=('col2',), dtype=Int64, ignore_missing=False)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘

grizz.transformer.Sort

Bases: BaseTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to sort by.

required
*args Any

The positional arguments to pass to sort.

()
**kwargs Any

The keyword arguments to pass to sort.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘

grizz.transformer.SortColumns

Bases: BaseTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name Type Description Default
reverse bool

If False, the columns are sorted in alphabetical order; if True, they are sorted in reverse alphabetical order.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘
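
The column reordering shown above can be illustrated without polars. The following is a minimal sketch on a plain dictionary of columns, not the library's implementation; the helper name `sort_columns` is hypothetical.

```python
def sort_columns(data: dict, reverse: bool = False) -> dict:
    """Return a new mapping whose keys (column names) are sorted by name."""
    # Python dicts preserve insertion order, so inserting the keys in
    # sorted order is enough to reorder the "columns".
    return {name: data[name] for name in sorted(data, reverse=reverse)}


data = {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
ascending = sort_columns(data)
descending = sort_columns(data, reverse=True)
```

Only the column order changes; the values in each column stay aligned with their rows.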

grizz.transformer.SortColumnsTransformer

Bases: BaseTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name Type Description Default
reverse bool

If False, the columns are sorted in alphabetical order; if True, they are sorted in reverse alphabetical order.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘

grizz.transformer.SortTransformer

Bases: BaseTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to sort by.

required
*args Any

The positional arguments to pass to sort.

()
**kwargs Any

The keyword arguments to pass to sort.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘

grizz.transformer.SqlTransformer

Bases: BaseTransformer

Implement a transformer that executes a SQL query against the DataFrame.

Parameters:

Name Type Description Default
query str

The SQL query to execute.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SqlTransformer
>>> transformer = SqlTransformer(query="SELECT col1, col4 FROM self WHERE col1 > 2")
>>> transformer
SqlTransformer(
  (query): SELECT col1, col4 FROM self WHERE col1 > 2
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col4 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘

grizz.transformer.StripChars

Bases: BaseColumnsTransformer

Implement a transformer to remove leading and trailing characters.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for strip_chars.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

grizz.transformer.StripCharsTransformer

Bases: BaseColumnsTransformer

Implement a transformer to remove leading and trailing characters.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for strip_chars.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

grizz.transformer.TimeDiff

Bases: BaseTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name Type Description Default
group_cols Sequence[str]

The columns used to generate the group for each sequence.

required
time_col str

The input time column name.

required
time_diff_col str

The output time difference column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘
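
To make the computation concrete, here is a plain-Python sketch of a grouped time difference that reproduces the output above. It is an illustration under stated assumptions, not the library's implementation: the helper `time_diff` is hypothetical, rows are plain dictionaries, `shift` is assumed positive, and time steps without a predecessor are assumed to get a difference of 0, as the example output suggests.

```python
from collections import defaultdict


def time_diff(rows, group_cols, time_col, time_diff_col, shift=1):
    """Return rows ordered by group then time, each with a time difference."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in group_cols)].append(row)

    out = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda r: r[time_col])
        for i, row in enumerate(ordered):
            # Assumption: a step with no predecessor `shift` slots back gets 0.
            diff = row[time_col] - ordered[i - shift][time_col] if i >= shift else 0
            out.append({**row, time_diff_col: diff})
    return out


rows = [
    {"col": "a", "time": 1},
    {"col": "b", "time": 2},
    {"col": "a", "time": 3},
    {"col": "a", "time": 4},
    {"col": "b", "time": 5},
]
result = time_diff(rows, group_cols=["col"], time_col="time", time_diff_col="diff")
```

Each difference is computed within a group only, so the first row of group `b` is not compared with the last row of group `a`.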

grizz.transformer.TimeDiffTransformer

Bases: BaseTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name Type Description Default
group_cols Sequence[str]

The columns used to generate the group for each sequence.

required
time_col str

The input time column name.

required
time_diff_col str

The output time difference column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘

grizz.transformer.TimeToSecond

Bases: BaseTransformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name Type Description Default
in_col str

The input column with the time value to convert.

required
out_col str

The output column with the time in seconds.

required

Example usage:

>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘
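
The values in the `second` column are the number of seconds elapsed since midnight. The arithmetic can be sketched with the standard library alone; the helper `time_to_seconds` below is hypothetical and only illustrates the conversion, not how the transformer computes it.

```python
import datetime


def time_to_seconds(value: datetime.time) -> float:
    """Convert a time of day to the number of seconds since midnight."""
    return (
        value.hour * 3600
        + value.minute * 60
        + value.second
        + value.microsecond / 1_000_000
    )


times = [
    datetime.time(0, 0, 1, 890000),
    datetime.time(0, 1, 1, 890000),
    datetime.time(19, 19, 19, 890000),
]
seconds = [time_to_seconds(t) for t in times]
```

For example, `19:19:19.890` gives 19 * 3600 + 19 * 60 + 19 + 0.89 = 69559.89, which matches the last row of the output above.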

grizz.transformer.TimeToSecondTransformer

Bases: BaseTransformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name Type Description Default
in_col str

The input column with the time value to convert.

required
out_col str

The output column with the time in seconds.

required

Example usage:

>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘

grizz.transformer.ToDatetime

Bases: BaseColumnsTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
format str | None

Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to None (default), the format is inferred from the data.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"])
>>> transformer
ToDatetimeTransformer(columns=('col1',), format=None, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
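
The format string in the `format` description can be tried out with the standard library. Note that Python's `datetime.strptime` is a different implementation from the chrono crate; the sketch below only assumes that the specifiers `%Y`, `%m`, `%d`, `%H`, `%M` and `%S` mean the same thing in both, and other specifiers may differ.

```python
from datetime import datetime

# The example format from the parameter description.
FORMAT = "%Y-%m-%d %H:%M:%S"

values = ["2020-01-01 01:01:01", "2020-01-01 12:00:01", "2020-01-01 23:59:59"]
parsed = [datetime.strptime(value, FORMAT) for value in values]
```

Passing an explicit format avoids relying on inference when the input strings are ambiguous.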

grizz.transformer.ToDatetimeTransformer

Bases: BaseColumnsTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
format str | None

Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%Y-%m-%d %H:%M:%S". If set to None (default), the format is inferred from the data.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"])
>>> transformer
ToDatetimeTransformer(columns=('col1',), format=None, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘

grizz.transformer.ToTime

Bases: BaseColumnsTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
format str | None

Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to None (default), the format is inferred from the data.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeTransformer(columns=('col1',), format=%H:%M:%S, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.ToTimeTransformer

Bases: BaseColumnsTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
format str | None

Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to None (default), the format is inferred from the data.

None
ignore_missing bool

If False, an exception is raised if a column is missing, otherwise just a warning message is shown.

False
**kwargs Any

The keyword arguments for to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeTransformer(columns=('col1',), format=%H:%M:%S, ignore_missing=False)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.is_transformer_config

is_transformer_config(config: dict) -> bool

Indicate if the input configuration is a configuration for a BaseTransformer.

This function only checks whether the value of the key _target_ is valid. It does not check the other values. If _target_ indicates a function, its return type hint is used to check the class.

Parameters:

Name Type Description Default
config dict

The configuration to check.

required

Returns:

Type Description
bool

True if the input configuration is a configuration for a BaseTransformer object.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import is_transformer_config
>>> is_transformer_config(
...     {
...         "_target_": "grizz.transformer.Cast",
...         "columns": ("col1", "col3"),
...         "dtype": pl.Int32,
...     }
... )
True
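
The kind of check described above can be sketched with the standard library. This is a hypothetical illustration, not the library's code: `resolve_target` and `is_config_for` are invented names, the base class is passed explicitly, and the case where `_target_` names a function (checked through its return type hint) is left out.

```python
import importlib


def resolve_target(path: str):
    """Import the object named by a dotted path such as 'package.module.Name'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def is_config_for(config: dict, base: type) -> bool:
    """Check only that config['_target_'] resolves to a subclass of ``base``."""
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = resolve_target(target)
    except (ImportError, AttributeError, ValueError):
        return False
    # The other keys of the configuration are deliberately ignored.
    return isinstance(obj, type) and issubclass(obj, base)


valid = is_config_for({"_target_": "collections.OrderedDict", "a": 1}, dict)
wrong_base = is_config_for({"_target_": "collections.Counter"}, list)
missing = is_config_for({"columns": ("col1",)}, dict)
```

Because only `_target_` is inspected, a configuration can pass this check and still fail at instantiation time if its other values are invalid.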

grizz.transformer.setup_transformer

setup_transformer(
    transformer: BaseTransformer | dict,
) -> BaseTransformer

Set up a polars.DataFrame transformer.

The transformer is instantiated from its configuration by using the BaseTransformer factory function.

Parameters:

Name Type Description Default
transformer BaseTransformer | dict

The polars.DataFrame transformer or its configuration.

required

Returns:

Type Description
BaseTransformer

An instantiated transformer.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import setup_transformer
>>> transformer = setup_transformer(
...     {
...         "_target_": "grizz.transformer.Cast",
...         "columns": ("col1", "col3"),
...         "dtype": pl.Int32,
...     }
... )
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
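
The "object or configuration" pattern can be sketched in a few lines of standard-library Python. The helper `setup_object` is hypothetical and assumes that `_target_` is a dotted import path and that the remaining keys are keyword arguments; the actual factory may support more than this.

```python
import importlib


def setup_object(obj_or_config):
    """Return the object unchanged, or build it from a ``_target_`` configuration."""
    if not isinstance(obj_or_config, dict):
        return obj_or_config
    kwargs = dict(obj_or_config)  # copy so the caller's dict is not mutated
    module_name, _, attr = kwargs.pop("_target_").rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr)
    return factory(**kwargs)


config = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}
delta = setup_object(config)  # built from the configuration
same = setup_object(delta)  # already an object, returned as-is
```

Accepting both forms lets calling code take either a ready-made object or a serializable configuration without branching at every call site.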