transformer

Implement a transformer to compute the absolute difference between two columns.

Internally, this transformer computes: out = abs(in1 - in2)

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ 4    │
│ 2    ┆ 4    ┆ b    ┆ 2    │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘

grizz.transformer.AbsDiffHorizontalTransformer ¶

Implement a transformer to compute the absolute difference between two columns.

Internally, this transformer computes: out = abs(in1 - in2)

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ 4    │
│ 2    ┆ 4    ┆ b    ┆ 2    │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘
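
The formula `out = abs(in1 - in2)` is applied row by row. The following is a minimal plain-Python sketch of that computation, using the same values as the example above. The helper `abs_diff` is a hypothetical name used only for illustration; it is not part of grizz and does not reflect how the transformer is implemented internally.

```python
# Plain-Python sketch of out = abs(in1 - in2); not the grizz implementation.
def abs_diff(in1: list[int], in2: list[int]) -> list[int]:
    """Return the element-wise absolute difference of two equal-length columns."""
    if len(in1) != len(in2):
        raise ValueError("both columns must have the same length")
    return [abs(a - b) for a, b in zip(in1, in2)]


col1 = [1, 2, 3, 4, 5]
col2 = [5, 4, 3, 2, 1]
diff = abs_diff(col1, col2)
print(diff)  # [4, 2, 0, 2, 4]
```

The result matches the `diff` column shown in the transformer output.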

grizz.transformer.BaseArgTransformer ¶

Bases: BaseTransformer

Define a base class to implement transformers with custom arguments.

grizz.transformer.BaseArgTransformer.get_args `abstractmethod` ¶

get_args() -> dict

Get the arguments of the transformer.

Returns:

Type	Description
`dict`	The arguments of the transformer.
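
As a rough illustration of the `get_args` contract, the sketch below defines a toy abstract base class and one subclass that returns its arguments as a `dict`. All names here (`ToyArgTransformer`, `ToyScale`, `factor`) are hypothetical, and building the string representation from `get_args()` is an assumption made for the example, not a statement about how grizz does it.

```python
from abc import ABC, abstractmethod


class ToyArgTransformer(ABC):
    """Toy stand-in for a base class with custom arguments (hypothetical)."""

    @abstractmethod
    def get_args(self) -> dict:
        """Return the arguments of the transformer."""

    def __repr__(self) -> str:
        # Assumption: the representation is derived from the arguments.
        args = ", ".join(f"{key}={value!r}" for key, value in self.get_args().items())
        return f"{type(self).__name__}({args})"


class ToyScale(ToyArgTransformer):
    """Toy subclass that only stores its arguments."""

    def __init__(self, in_col: str, out_col: str, factor: float = 2.0) -> None:
        self.in_col = in_col
        self.out_col = out_col
        self.factor = factor

    def get_args(self) -> dict:
        return {"in_col": self.in_col, "out_col": self.out_col, "factor": self.factor}


toy = ToyScale(in_col="col1", out_col="col1_scaled")
print(toy)  # ToyScale(in_col='col1', out_col='col1_scaled', factor=2.0)
```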

grizz.transformer.BaseIn1Out1Transformer ¶

Bases: BaseArgTransformer

Define a base class to implement polars.DataFrame transformers that take one input column and generate one output column.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
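
The three policy values can be read as a small decision procedure. The sketch below models it in plain Python, assuming the behaviour described in the table: `'raise'` stops with an exception, `'warn'` emits a warning and continues, and `'ignore'` continues silently. The function names, the use of `RuntimeError`, and the warning messages are assumptions for illustration; the exception and warning types actually raised by grizz are not specified here.

```python
import warnings

POLICIES = ("ignore", "warn", "raise")


def _check_policy(policy: str) -> None:
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}, expected one of {POLICIES}")


def handle_existing(out_cols, frame_cols, exist_policy="raise"):
    """Return the output columns that would overwrite existing columns."""
    _check_policy(exist_policy)
    existing = [col for col in out_cols if col in frame_cols]
    if existing and exist_policy == "raise":
        raise RuntimeError(f"columns already exist: {existing}")  # hypothetical type
    if existing and exist_policy == "warn":
        warnings.warn(f"overwriting existing columns: {existing}", stacklevel=2)
    return existing


def handle_missing(in_cols, frame_cols, missing_policy="raise"):
    """Return the input columns that are present; missing ones are skipped."""
    _check_policy(missing_policy)
    missing = [col for col in in_cols if col not in frame_cols]
    if missing and missing_policy == "raise":
        raise RuntimeError(f"columns are missing: {missing}")  # hypothetical type
    if missing and missing_policy == "warn":
        warnings.warn(f"ignoring missing columns: {missing}", stacklevel=2)
    return [col for col in in_cols if col in frame_cols]


frame_cols = ["col1", "col2", "diff"]
print(handle_existing(["diff"], frame_cols, exist_policy="ignore"))  # ['diff']
print(handle_missing(["col1", "col5"], frame_cols, missing_policy="ignore"))  # ['col1']
```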

grizz.transformer.BaseIn2Out1Transformer ¶

Bases: BaseArgTransformer

Define a base class to implement polars.DataFrame transformers that take two input columns and generate one output column.

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ 4    │
│ 2    ┆ 4    ┆ b    ┆ 2    │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseInNOut1Transformer ¶

Bases: BaseInNTransformer

Define a base class to implement polars.DataFrame transformers that generate a single output column by using multiple input columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to prepare. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘

grizz.transformer.BaseInNOutNTransformer ¶

Bases: BaseInNTransformer

Define a base class to implement polars.DataFrame transformers that have N input columns and N output columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to prepare. If `None`, it processes all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘

grizz.transformer.BaseInNTransformer ¶

Bases: BaseArgTransformer

Define a base class to implement polars.DataFrame transformers that transform DataFrames by using multiple input columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to prepare. If `None`, it processes all the columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘

grizz.transformer.BaseInNTransformer.find_columns ¶

find_columns(frame: DataFrame) -> tuple[str, ...]

Find the columns to transform.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.	required

Returns:

Type	Description
`tuple[str, ...]`	The columns to transform.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = DropNullRow(columns=["col2", "col3"])
>>> transformer.find_columns(frame)
('col2', 'col3')
>>> transformer = DropNullRow()
>>> transformer.find_columns(frame)
('col1', 'col2', 'col3', 'col4')

grizz.transformer.BaseInNTransformer.find_common_columns ¶

find_common_columns(frame: DataFrame) -> tuple[str, ...]

Find the common columns between the DataFrame columns and the input columns.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.	required

Returns:

Type	Description
`tuple[str, ...]`	The common columns.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = DropNullRow(columns=["col2", "col3", "col5"])
>>> transformer.find_common_columns(frame)
('col2', 'col3')
>>> transformer = DropNullRow()
>>> transformer.find_common_columns(frame)
('col1', 'col2', 'col3', 'col4')

grizz.transformer.BaseInNTransformer.find_missing_columns ¶

find_missing_columns(frame: DataFrame) -> tuple[str, ...]

Find the missing columns.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.	required

Returns:

Type	Description
`tuple[str, ...]`	The missing columns.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = DropNullRow(columns=["col2", "col3", "col5"])
>>> transformer.find_missing_columns(frame)
('col5',)
>>> transformer = DropNullRow()
>>> transformer.find_missing_columns(frame)
()

grizz.transformer.BaseTransformer ¶

Bases: ABC

Define the base class to transform a polars.DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseTransformer.equal `abstractmethod` ¶

equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two objects are equal or not.

Parameters:

Name	Type	Description	Default
`other`	`Any`	The other object to compare.	required
`equal_nan`	`bool`	Whether to compare NaNs as equal. If `True`, NaNs in both objects will be considered equal.	`False`

Returns:

Type	Description
`bool`	`True` if the two are equal, otherwise `False`.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> obj1 = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> obj2 = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> obj3 = InplaceCast(columns=["col2", "col3"], dtype=pl.Float32)
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False

grizz.transformer.BaseTransformer.fit `abstractmethod` ¶

fit(frame: DataFrame) -> None

Fit to the data in the polars.DataFrame.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The `polars.DataFrame` to fit.	required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> transformer.fit(frame)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseTransformer.fit_transform `abstractmethod` ¶

fit_transform(frame: DataFrame) -> DataFrame

Fit to the data, then transform it.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The `polars.DataFrame` to fit and transform.	required

Returns:

Type	Description
`DataFrame`	The transformed DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseTransformer.transform `abstractmethod` ¶

transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The `polars.DataFrame` to transform.	required

Returns:

Type	Description
`DataFrame`	The transformed DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.Binarizer ¶

Implement a transformer to binarize data according to a threshold.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to binarize. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.Binarizer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Binarizer
>>> transformer = Binarizer(
...     columns=["col1", "col3"], prefix="", suffix="_out", threshold=1.5
... )
>>> transformer
BinarizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', threshold=1.5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ i64      ┆ i64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0        ┆ 1        │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0        ┆ 1        │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 1        ┆ 1        │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 1        ┆ 1        │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 1        ┆ 0        │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1        ┆ 0        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.BinarizerTransformer ¶

Implement a transformer to binarize data according to a threshold.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to binarize. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.Binarizer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Binarizer
>>> transformer = Binarizer(
...     columns=["col1", "col3"], prefix="", suffix="_out", threshold=1.5
... )
>>> transformer
BinarizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', threshold=1.5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ i64      ┆ i64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0        ┆ 1        │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0        ┆ 1        │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 1        ┆ 1        │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 1        ┆ 1        │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 1        ┆ 0        │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1        ┆ 0        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
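
The thresholding rule follows `sklearn.preprocessing.Binarizer`: values strictly greater than the threshold map to 1, and values less than or equal to it map to 0. The plain-Python sketch below applies that rule to the example columns and shows how the output names in the example can be read as prefix + column name + suffix. The helpers `binarize` and `output_name` are hypothetical names used only for illustration.

```python
def binarize(values: list[float], threshold: float = 0.0) -> list[int]:
    """Map values strictly greater than `threshold` to 1 and all others to 0."""
    return [1 if value > threshold else 0 for value in values]


def output_name(column: str, prefix: str, suffix: str) -> str:
    """Build an output column name (assumed scheme: prefix + column + suffix)."""
    return f"{prefix}{column}{suffix}"


col1 = [0, 1, 2, 3, 4, 5]
col3 = [5, 4, 3, 2, 1, 0]
print(output_name("col1", prefix="", suffix="_out"), binarize(col1, threshold=1.5))
# col1_out [0, 0, 1, 1, 1, 1]
print(output_name("col3", prefix="", suffix="_out"), binarize(col3, threshold=1.5))
# col3_out [1, 1, 1, 1, 0, 0]
```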

grizz.transformer.Cast

Implement a transformer to convert some columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i32      ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
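
How `columns`, `exclude_columns`, `prefix`, and `suffix` combine into output column names can be sketched as follows. This is an illustration inferred from the parameter descriptions and from the example output above (`col1` becomes `col1_out`), not grizz's actual implementation. `resolve_output_names` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Sequence


def resolve_output_names(
    frame_columns: Sequence[str],
    columns: Sequence[str] | None,
    prefix: str,
    suffix: str,
    exclude_columns: Sequence[str] = (),
) -> dict[str, str]:
    """Return a mapping from each selected input column to its output name."""
    # ``None`` selects every column of the frame.
    selected = list(frame_columns) if columns is None else list(columns)
    # Names in ``exclude_columns`` that are not selected are simply ignored.
    selected = [col for col in selected if col not in exclude_columns]
    return {col: f"{prefix}{col}{suffix}" for col in selected}


names = resolve_output_names(
    ["col1", "col2", "col3", "col4"], ["col1", "col3"], prefix="", suffix="_out"
)
print(names)  # {'col1': 'col1_out', 'col3': 'col3_out'}
```

With an empty `prefix` and an empty `suffix`, every output name equals its input name, which is the case where `exist_policy` decides whether the original column may be overwritten.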

grizz.transformer.CastTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i32      ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.CategoricalCast

Implement a transformer to convert a column to categorical data type.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name to cast.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.Categorical`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CategoricalCast
>>> transformer = CategoricalCast(in_col="col1", out_col="out")
>>> transformer
CategoricalCastTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ f64  ┆ cat │
╞══════╪══════╪═════╡
│ a    ┆ 1.0  ┆ a   │
│ b    ┆ 2.0  ┆ b   │
│ c    ┆ 3.0  ┆ c   │
│ d    ┆ 4.0  ┆ d   │
│ e    ┆ 5.0  ┆ e   │
└──────┴──────┴─────┘
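
The three policy values accepted by `exist_policy` and `missing_policy` behave like a small dispatch over a list of offending columns. The sketch below is a hypothetical helper, not grizz's implementation: the function name, the exception type (`RuntimeError`), and the warning category (`RuntimeWarning`) are assumptions chosen for illustration.

```python
import warnings


def check_policy(policy: str, problem_columns: list[str], message: str) -> None:
    """Apply an 'ignore' / 'warn' / 'raise' policy to a list of offending columns."""
    if policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect policy: {policy!r}")
    # Nothing to report, or the caller asked for silence.
    if not problem_columns or policy == "ignore":
        return
    full_message = f"{message}: {problem_columns}"
    if policy == "raise":
        raise RuntimeError(full_message)  # exception type is an assumption
    warnings.warn(full_message, RuntimeWarning, stacklevel=2)


frame_columns = ["col1", "col2"]
# Missing-column check for the input column, then existing-column check for the output.
check_policy("raise", [c for c in ["col1"] if c not in frame_columns], "missing columns")
check_policy("raise", [c for c in ["out"] if c in frame_columns], "existing columns")
```

Both calls return without raising here, because `col1` is present in the frame and `out` is not.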

grizz.transformer.CategoricalCastTransformer

Implement a transformer to convert a column to categorical data type.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name to cast.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.Categorical`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CategoricalCast
>>> transformer = CategoricalCast(in_col="col1", out_col="out")
>>> transformer
CategoricalCastTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ f64  ┆ cat │
╞══════╪══════╪═════╡
│ a    ┆ 1.0  ┆ a   │
│ b    ┆ 2.0  ┆ b   │
│ c    ┆ 3.0  ┆ c   │
│ d    ┆ 4.0  ┆ d   │
│ e    ┆ 5.0  ┆ e   │
└──────┴──────┴─────┘

grizz.transformer.ColumnClose

Implement a transformer to compute a column that indicates if the values of two columns are element-wise equal within a tolerance.

The output column contains True where the two columns are element-wise equal within a tolerance. Internally, this transformer computes: out = (|actual - expected| <= atol + rtol * |expected|)

Parameters:

Name	Type	Description	Default
`actual`	`str`	The actual input column name. This column must be a numeric column.	required
`expected`	`str`	The expected input column name. This column must be a numeric column.	required
`out_col`	`str`	The output column name.	required
`atol`	`float`	The absolute tolerance parameter.	`1e-08`
`rtol`	`float`	The relative tolerance parameter.	`1e-05`
`equal_nan`	`bool`	Whether to compare NaN's as equal. If `True`, NaN's in `actual` will be considered equal to NaN's in `expected` in the output column.	`False`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnClose
>>> transformer = ColumnClose(actual="col1", expected="col2", out_col="out")
>>> transformer
ColumnCloseTransformer(actual='col1', expected='col2', out_col='out', atol=1e-08, rtol=1e-05, equal_nan=False, exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘
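
The tolerance formula can be checked on scalar values with a short pure-Python function. This is a sketch of the documented inequality only: `is_close` is a hypothetical helper, and its handling of NaN follows the description of `equal_nan`. It does not cover null or infinite values, whose treatment is not specified here.

```python
import math


def is_close(
    actual: float,
    expected: float,
    atol: float = 1e-8,
    rtol: float = 1e-5,
    equal_nan: bool = False,
) -> bool:
    """Evaluate |actual - expected| <= atol + rtol * |expected| for two scalars."""
    if math.isnan(actual) or math.isnan(expected):
        # NaN only matches NaN, and only when equal_nan is enabled.
        return equal_nan and math.isnan(actual) and math.isnan(expected)
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Same values as the example above.
out = [is_close(a, e) for a, e in zip([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])]
print(out)  # [False, False, True, False, False]
```

The relative term scales with `expected` only, so the comparison is not symmetric: swapping the two columns can change the result for values that sit near the tolerance boundary.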

grizz.transformer.ColumnCloseTransformer

Implement a transformer to compute a column that indicates if the values of two columns are element-wise equal within a tolerance.

The output column contains True where the two columns are element-wise equal within a tolerance. Internally, this transformer computes: out = (|actual - expected| <= atol + rtol * |expected|)

Parameters:

Name	Type	Description	Default
`actual`	`str`	The actual input column name. This column must be a numeric column.	required
`expected`	`str`	The expected input column name. This column must be a numeric column.	required
`out_col`	`str`	The output column name.	required
`atol`	`float`	The absolute tolerance parameter.	`1e-08`
`rtol`	`float`	The relative tolerance parameter.	`1e-05`
`equal_nan`	`bool`	Whether to compare NaN's as equal. If `True`, NaN's in `actual` will be considered equal to NaN's in `expected` in the output column.	`False`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnClose
>>> transformer = ColumnClose(actual="col1", expected="col2", out_col="out")
>>> transformer
ColumnCloseTransformer(actual='col1', expected='col2', out_col='out', atol=1e-08, rtol=1e-05, equal_nan=False, exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqual
>>> transformer = ColumnEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqualMissing

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2), where null values are not propagated.

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqualMissing
>>> transformer = ColumnEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘
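
The example above contains no null values, so its output matches that of `ColumnEqual`. The two transformers differ only when nulls are present. The following pure-Python sketch assumes the semantics of polars' `Expr.eq` (a null operand yields null) and `Expr.eq_missing` (null is treated as an ordinary value, so null equals null). Whether grizz delegates to these expressions is an assumption, and both helpers are hypothetical, with `None` standing in for null.

```python
def eq(x, y):
    """Equality that propagates nulls: any None operand yields None."""
    if x is None or y is None:
        return None
    return x == y


def eq_missing(x, y):
    """Equality that does not propagate nulls: None equals None only."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


col1 = [1, None, 3, None]
col2 = [1, 2, None, None]
propagated = [eq(a, b) for a, b in zip(col1, col2)]
not_propagated = [eq_missing(a, b) for a, b in zip(col1, col2)]
print(propagated)      # [True, None, None, None]
print(not_propagated)  # [True, False, False, True]
```

Under these assumptions the non-propagating output never contains a null, which makes it safe to use directly as a filter mask.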

grizz.transformer.ColumnEqualMissingTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2), where null values are not propagated.

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqualMissing
>>> transformer = ColumnEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqual
>>> transformer = ColumnEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnGreater

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than operation between two columns (in1 > in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreater
>>> transformer = ColumnGreater(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnGreaterEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than or equal operation between two columns (in1 >= in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreaterEqual
>>> transformer = ColumnGreaterEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnGreaterEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than or equal operation between two columns (in1 >= in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreaterEqual
>>> transformer = ColumnGreaterEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘
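
The comparison transformers in this family differ only in the operator applied to each pair of values. The following is a rough plain-Python model of that behaviour, not grizz's implementation: the `COMPARATORS` mapping is a hypothetical illustration, and null handling is not modelled. It uses the same `col1` and `col2` values as the examples.

```python
import operator

# Hypothetical mapping from transformer name to the element-wise operator
# it is documented to apply (in1 OP in2). Not taken from grizz's source.
COMPARATORS = {
    "ColumnGreaterEqual": operator.ge,
    "ColumnGreater": operator.gt,
    "ColumnLowerEqual": operator.le,
    "ColumnLower": operator.lt,
    "ColumnNotEqual": operator.ne,
}

col1 = [1, 2, 3, 4, 5]
col2 = [5, 4, 3, 2, 1]

# Apply each operator row by row, as the transformers do column-wise.
results = {
    name: [op(a, b) for a, b in zip(col1, col2)]
    for name, op in COMPARATORS.items()
}

for name, values in results.items():
    print(f"{name}: {values}")
```

Each list corresponds to the `out` column shown in the matching transformer's example.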

grizz.transformer.ColumnGreaterTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than operation between two columns (in1 > in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreater
>>> transformer = ColumnGreater(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLower

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than operation between two columns (in1 < in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLower
>>> transformer = ColumnLower(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLowerEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than or equal operation between two columns (in1 <= in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLowerEqual
>>> transformer = ColumnLowerEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLowerEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than or equal operation between two columns (in1 <= in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLowerEqual
>>> transformer = ColumnLowerEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLowerTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than operation between two columns (in1 < in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLower
>>> transformer = ColumnLower(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnNotEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqual
>>> transformer = ColumnNotEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnNotEqualMissing

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2), where null values are not propagated.

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqualMissing
>>> transformer = ColumnNotEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘
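
The example above contains no null values, so its output is identical to that of the plain not-equal transformer. The sketch below models the difference in plain Python, assuming the transformer follows the null-aware semantics of polars' `ne_missing` expression (an assumption, not confirmed by this page). The helper names `not_equal` and `not_equal_missing` are hypothetical.

```python
from typing import Any, Optional


def not_equal(a: Any, b: Any) -> Optional[bool]:
    """Null-propagating comparison: any null operand yields null."""
    if a is None or b is None:
        return None
    return a != b


def not_equal_missing(a: Any, b: Any) -> bool:
    """Null-aware comparison: null is treated as an ordinary value."""
    # None != None is False, and None != 1 is True.
    return a != b


col1 = [1, None, 3, None]
col2 = [1, 2, None, None]

propagated = [not_equal(a, b) for a, b in zip(col1, col2)]
null_aware = [not_equal_missing(a, b) for a, b in zip(col1, col2)]

print(propagated)  # [False, None, None, None]
print(null_aware)  # [False, True, True, False]
```

With the null-aware variant the output column never contains nulls: two nulls compare as equal, and a null compared with a value counts as different.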

grizz.transformer.ColumnNotEqualMissingTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2), where null values are not propagated.

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqualMissing
>>> transformer = ColumnNotEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnNotEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2).

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqual
>>> transformer = ColumnNotEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnSelection

Bases: BaseInNTransformer

Implement a polars.DataFrame transformer to select a subset of columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to keep.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, 2, 3, 4, 5],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ 2    │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ 4    │
│ null       ┆ 5    │
└────────────┴──────┘
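
The interaction between `columns`, `exclude_columns`, and `missing_policy` can be sketched in plain Python. This is a minimal model under stated assumptions, not grizz's implementation: `resolve_columns` is a hypothetical helper, `columns=None` is assumed to mean "all columns", and the `KeyError` and `RuntimeWarning` types are placeholders for whatever the library actually raises.

```python
import warnings
from collections.abc import Sequence
from typing import Optional


def resolve_columns(
    frame_columns: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    exclude_columns: Sequence[str] = (),
    missing_policy: str = "raise",
) -> list[str]:
    """Return the columns to keep (hypothetical helper)."""
    # Assumption: None selects every column of the frame.
    requested = list(frame_columns) if columns is None else list(columns)
    # Excluded names that are not in the request are silently ignored.
    excluded = set(exclude_columns)
    requested = [col for col in requested if col not in excluded]

    missing = [col for col in requested if col not in frame_columns]
    if missing:
        msg = f"{len(missing)} column(s) are missing: {missing}"
        if missing_policy == "raise":
            raise KeyError(msg)  # placeholder exception type
        if missing_policy == "warn":
            warnings.warn(msg, RuntimeWarning, stacklevel=2)
    # Under 'warn' and 'ignore', the missing columns are dropped.
    return [col for col in requested if col in frame_columns]


available = ["col1", "col2", "col3"]
print(resolve_columns(available, columns=["col1", "col2"]))
print(resolve_columns(available, exclude_columns=["col3", "unknown"]))
print(resolve_columns(available, columns=["col1", "col9"], missing_policy="ignore"))
```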

grizz.transformer.ColumnSelectionTransformer

Bases: BaseInNTransformer

Implement a polars.DataFrame transformer to select a subset of columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to keep.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, 2, 3, 4, 5],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ 2    │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ 4    │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.ConcatColumns

Implement a transformer to concatenate columns into a new column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to concatenate. The columns should have the same type or compatible types. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="out")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='out', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
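
As the output above shows, "concatenate" here means a row-wise combination: each row of `out` is a list holding that row's values from the selected columns. A plain-Python model of the same result, for illustration only and not grizz's implementation, is:

```python
# The example frame represented as a mapping of column name to values.
columns = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
    "col4": ["a", "b", "c", "d", "e"],
}
selected = ["col1", "col2", "col3"]

# zip(*...) walks the selected columns in parallel, one row at a time.
out = [list(row) for row in zip(*(columns[name] for name in selected))]

print(out[0])  # [11, 21, 31]
print(out[-1])  # [15, 25, 35]
```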

grizz.transformer.ConcatColumnsTransformer

Implement a transformer to concatenate columns into a new column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to concatenate. The columns should have the same type or compatible types. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="out")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='out', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘

grizz.transformer.CopyColumn

Implement a polars.DataFrame transformer to copy a column.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name i.e. the column to copy.	required
`out_col`	`str`	The output column name i.e. the copied column.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumn
>>> transformer = CopyColumn(in_col="col1", out_col="out")
>>> transformer
CopyColumnTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1   │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2   │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3   │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4   │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5   │
└──────┴──────┴──────┴──────┴─────┘
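
The three `exist_policy` options described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not grizz's implementation: `check_existing` and `copy_column` are hypothetical helpers, the frame is modelled as a `dict` of lists, and the use of `ValueError` and the default warning category are assumptions.

```python
import warnings


def check_existing(existing, out_col, exist_policy="raise"):
    """Apply an exist_policy-style check before writing ``out_col``.

    Hypothetical helper: grizz's real exception and warning classes may differ.
    """
    if out_col not in existing:
        return
    msg = f"column '{out_col}' already exists"
    if exist_policy == "raise":
        raise ValueError(msg)
    if exist_policy == "warn":
        # The column is still overwritten by the caller after the warning.
        warnings.warn(f"{msg} and will be overwritten", stacklevel=2)
    elif exist_policy != "ignore":
        raise ValueError(f"unknown exist_policy: {exist_policy!r}")


def copy_column(data, in_col, out_col, exist_policy="raise"):
    """Return a new mapping where ``out_col`` is a copy of ``in_col``."""
    check_existing(set(data), out_col, exist_policy)
    out = dict(data)
    out[out_col] = list(data[in_col])
    return out
```

With `exist_policy="ignore"` the existing `out` column is silently replaced, while the default `'raise'` stops before anything is written.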

grizz.transformer.CopyColumnTransformer ¶

Implement a transformer to copy a column.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name i.e. the column to copy.	required
`out_col`	`str`	The output column name i.e. the copied column.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumn
>>> transformer = CopyColumn(in_col="col1", out_col="out")
>>> transformer
CopyColumnTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1   │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2   │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3   │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4   │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5   │
└──────┴──────┴──────┴──────┴─────┘

grizz.transformer.CopyColumns ¶

Implement a transformer to copy some columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to copy. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumns
>>> transformer = CopyColumns(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
CopyColumnsTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64      ┆ str      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
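
How `columns`, `exclude_columns`, `prefix` and `suffix` combine into output names can be sketched as follows. The helper `resolve_output_names` is hypothetical, and the rule "output name = prefix + input name + suffix" is an assumption inferred from the example above (`col1` becomes `col1_out`).

```python
def resolve_output_names(frame_columns, columns, prefix, suffix, exclude_columns=()):
    """Map each selected input column to its output column name.

    Hypothetical helper for illustration only.
    """
    # ``None`` means "all the columns of the frame".
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    # Excluded names that are not in the selection are simply ignored.
    return {col: f"{prefix}{col}{suffix}" for col in selected if col not in excluded}
```

For the frame above, `resolve_output_names(["col1", "col2", "col3", "col4"], ["col1", "col3"], "", "_out")` gives `{"col1": "col1_out", "col3": "col3_out"}`.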

grizz.transformer.CopyColumnsTransformer ¶

Implement a transformer to copy some columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to copy. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumns
>>> transformer = CopyColumns(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
CopyColumnsTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64      ┆ str      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.DecimalCast ¶

Bases: CastTransformer

Implement a transformer to convert decimal columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────────────┬──────────────┬──────┬──────────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 ┆ col2_out │
│ ---  ┆ ---          ┆ ---          ┆ ---  ┆ ---      │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  ┆ f32      │
╞══════╪══════════════╪══════════════╪══════╪══════════╡
│ 1    ┆ 1            ┆ 1            ┆ a    ┆ 1.0      │
│ 2    ┆ 2            ┆ 2            ┆ b    ┆ 2.0      │
│ 3    ┆ 3            ┆ 3            ┆ c    ┆ 3.0      │
│ 4    ┆ 4            ┆ 4            ┆ d    ┆ 4.0      │
│ 5    ┆ 5            ┆ 5            ┆ e    ┆ 5.0      │
└──────┴──────────────┴──────────────┴──────┴──────────┘
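
In the example above, `col1` is selected but has type `Int64`, so no `col1_out` column is produced: only the selected columns that are decimal are converted. A plain-Python sketch of that filtering step is shown below. The helper `decimal_cast` and the string-based `schema` mapping are hypothetical stand-ins for a polars frame and its dtypes, not grizz's implementation.

```python
from decimal import Decimal


def decimal_cast(data, schema, columns, cast, prefix="", suffix=""):
    """Cast the selected columns whose schema entry is "decimal".

    Hypothetical helper: ``schema`` maps column names to a type label.
    """
    out = dict(data)
    for col in columns:
        if schema[col] != "decimal":
            continue  # selected but not decimal: left untouched
        out[f"{prefix}{col}{suffix}"] = [None if v is None else cast(v) for v in data[col]]
    return out


data = {"col1": [1, 2], "col2": [Decimal("1"), Decimal("2")]}
schema = {"col1": "int", "col2": "decimal"}
result = decimal_cast(data, schema, ["col1", "col2"], float, suffix="_out")
```

Here `result` gains a single new column, `col2_out`, holding `[1.0, 2.0]`.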

grizz.transformer.DecimalCastTransformer ¶

Bases: CastTransformer

Implement a transformer to convert decimal columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────────────┬──────────────┬──────┬──────────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 ┆ col2_out │
│ ---  ┆ ---          ┆ ---          ┆ ---  ┆ ---      │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  ┆ f32      │
╞══════╪══════════════╪══════════════╪══════╪══════════╡
│ 1    ┆ 1            ┆ 1            ┆ a    ┆ 1.0      │
│ 2    ┆ 2            ┆ 2            ┆ b    ┆ 2.0      │
│ 3    ┆ 3            ┆ 3            ┆ c    ┆ 3.0      │
│ 4    ┆ 4            ┆ 4            ┆ d    ┆ 4.0      │
│ 5    ┆ 5            ┆ 5            ┆ e    ┆ 5.0      │
└──────┴──────────────┴──────────────┴──────┴──────────┘

grizz.transformer.Diff ¶

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`shift`	`int`	The number of slots to shift.	`1`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col='col1', out_col='diff', exist_policy='raise', missing_policy='raise', shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘
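
The role of `shift` is easier to see in a plain-Python sketch of the computation. The function `discrete_diff` is a hypothetical helper, and it assumes a positive `shift` and that missing values propagate as `None`. It mirrors the output shown above, where the first `shift` rows have no predecessor and are therefore null.

```python
def discrete_diff(values, shift=1):
    """Compute out[i] = values[i] - values[i - shift].

    The first ``shift`` items have no item to subtract, so they are ``None``.
    Hypothetical helper; assumes ``shift`` is a positive integer.
    """
    if shift < 1:
        raise ValueError("this sketch only supports shift >= 1")
    out = []
    for i, current in enumerate(values):
        previous = values[i - shift] if i >= shift else None
        if current is None or previous is None:
            out.append(None)
        else:
            out.append(current - previous)
    return out
```

With the default `shift=1`, `discrete_diff([1, 2, 3, 4, 5])` reproduces the `diff` column above: `[None, 1, 1, 1, 1]`.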

grizz.transformer.DiffHorizontal ¶

Implement a transformer to compute the difference between two columns.

Internally, this transformer computes: out = in1 - in2

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DiffHorizontal
>>> transformer = DiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
DiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ -4   │
│ 2    ┆ 4    ┆ b    ┆ -2   │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘
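
The formula `out = in1 - in2` is applied row by row, so the sign of the result depends on the argument order. A minimal plain-Python sketch follows. The function `diff_horizontal` is a hypothetical helper working on lists, and propagating missing values as `None` is an assumption.

```python
def diff_horizontal(in1, in2):
    """Compute the row-wise difference in1 - in2 of two equal-length columns.

    Hypothetical helper for illustration only.
    """
    if len(in1) != len(in2):
        raise ValueError("both columns must have the same length")
    return [None if a is None or b is None else a - b for a, b in zip(in1, in2)]
```

Swapping the two inputs flips the sign of every row: `diff_horizontal([1, 2], [5, 4])` is `[-4, -2]`, while `diff_horizontal([5, 4], [1, 2])` is `[4, 2]`.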

grizz.transformer.DiffHorizontalTransformer ¶

Implement a transformer to compute the difference between two columns.

Internally, this transformer computes: out = in1 - in2

Parameters:

Name	Type	Description	Default
`in1_col`	`str`	The first input column name.	required
`in2_col`	`str`	The second input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DiffHorizontal
>>> transformer = DiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
DiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ -4   │
│ 2    ┆ 4    ┆ b    ┆ -2   │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘

grizz.transformer.DiffTransformer ¶

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`shift`	`int`	The number of slots to shift.	`1`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col='col1', out_col='diff', exist_policy='raise', missing_policy='raise', shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘

grizz.transformer.DropDuplicate ¶

Bases: BaseInNTransformer

Implement a transformer to drop duplicate rows.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. If set to `None` (default), use all columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `unique`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, exclude_columns=(), missing_policy='raise', keep='first', maintain_order=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 1],
...         "col2": ["1", "2", "3", "4", "1"],
...         "col3": ["1", "2", "3", "1", "1"],
...         "col4": ["a", "a", "a", "a", "a"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
│ 1    ┆ 1    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
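
The effect of `columns` together with `keep="first"` and `maintain_order=True` can be sketched in plain Python. The function `drop_duplicates` is a hypothetical helper operating on a list of row dictionaries. It illustrates the semantics only and is not how grizz or polars implement it.

```python
def drop_duplicates(rows, columns=None):
    """Keep the first row for each distinct combination of ``columns``.

    ``columns=None`` compares all the columns. The input order is preserved.
    Hypothetical helper for illustration only.
    """
    seen = set()
    out = []
    for row in rows:
        names = list(row) if columns is None else columns
        key = tuple(row[name] for name in names)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


rows = [
    {"col1": 1, "col2": "1", "col3": "1", "col4": "a"},
    {"col1": 2, "col2": "2", "col3": "2", "col4": "a"},
    {"col1": 3, "col2": "3", "col3": "3", "col4": "a"},
    {"col1": 4, "col2": "4", "col3": "1", "col4": "a"},
    {"col1": 1, "col2": "1", "col3": "1", "col4": "a"},
]
```

Comparing all columns keeps four rows, as in the example above. Restricting the check to `columns=["col3"]` keeps only the first three rows, because the last two repeat the `col3` value of the first.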

grizz.transformer.DropDuplicateTransformer ¶

Bases: BaseInNTransformer

Implement a transformer to drop duplicate rows.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. If set to `None` (default), use all columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `unique`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, exclude_columns=(), missing_policy='raise', keep='first', maintain_order=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 1],
...         "col2": ["1", "2", "3", "4", "1"],
...         "col3": ["1", "2", "3", "1", "1"],
...         "col4": ["a", "a", "a", "a", "a"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
│ 1    ┆ 1    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘

grizz.transformer.DropNanColumn ¶

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many NaN values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. `None` means all the columns.	`None`
`threshold`	`float`	The maximum proportion of NaN values for a column to be kept. If the proportion of NaN values is greater than or equal to this threshold, the column is removed. If set to `1.0`, only the columns that contain nothing but NaN values are removed.	`1.0`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `drop`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanColumn
>>> transformer = DropNanColumn()
>>> transformer
DropNanColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), 5.0],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ 5.0  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ f64  ┆ f64  │
╞══════╪══════╡
│ 1.0  ┆ 1.0  │
│ 2.0  ┆ NaN  │
│ 3.0  ┆ 3.0  │
│ 4.0  ┆ NaN  │
│ NaN  ┆ 5.0  │
└──────┴──────┘
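
The `threshold` rule can be sketched in plain Python: a column is removed when its proportion of NaN values is greater than or equal to the threshold. The function `drop_nan_columns` is a hypothetical helper working on a `dict` of float lists. Keeping empty columns is an assumption of this sketch.

```python
import math


def drop_nan_columns(data, threshold=1.0):
    """Drop the columns whose NaN proportion is >= ``threshold``.

    Hypothetical helper; expects every value to be a float.
    """
    out = {}
    for name, values in data.items():
        # Assumption: an empty column has a NaN proportion of 0 and is kept.
        ratio = sum(math.isnan(v) for v in values) / len(values) if values else 0.0
        if ratio < threshold:
            out[name] = values
    return out


nan = float("nan")
data = {
    "col1": [1.0, 2.0, 3.0, 4.0, nan],
    "col2": [1.0, nan, 3.0, nan, 5.0],
    "col3": [nan, nan, nan, nan, nan],
}
```

With the default `threshold=1.0` only `col3` is removed, matching the output above. Lowering the threshold to `0.4` would also remove `col2`, since two of its five values are NaN.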

grizz.transformer.DropNanColumnTransformer ¶

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many NaN values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. `None` means all the columns.	`None`
`threshold`	`float`	The maximum proportion of NaN values for a column to be kept. If the proportion of NaN values is greater than or equal to this threshold, the column is removed. If set to `1.0`, only the columns that contain nothing but NaN values are removed.	`1.0`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `drop`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanColumn
>>> transformer = DropNanColumn()
>>> transformer
DropNanColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), 5.0],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ 5.0  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ f64  ┆ f64  │
╞══════╪══════╡
│ 1.0  ┆ 1.0  │
│ 2.0  ┆ NaN  │
│ 3.0  ┆ 3.0  │
│ 4.0  ┆ NaN  │
│ NaN  ┆ 5.0  │
└──────┴──────┘
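
The `threshold` rule can be sketched in plain Python. This is a minimal illustration of the "greater than or equal" comparison described in the parameter table, not the grizz implementation; the helper `columns_to_drop` is hypothetical and works on plain lists instead of a polars frame.

```python
import math


def columns_to_drop(data: dict, threshold: float = 1.0) -> list:
    """Return the names of the columns whose NaN proportion is >= threshold."""
    dropped = []
    for name, values in data.items():
        # Proportion of NaN values in the column.
        # Treating an empty column as 0.0 is an assumption of this sketch.
        ratio = sum(math.isnan(v) for v in values) / len(values) if values else 0.0
        if ratio >= threshold:
            dropped.append(name)
    return dropped


nan = float("nan")
data = {
    "col1": [1.0, 2.0, 3.0, 4.0, nan],  # 1 NaN out of 5 values
    "col2": [1.0, nan, 3.0, nan, 5.0],  # 2 NaN out of 5 values
    "col3": [nan, nan, nan, nan, nan],  # 5 NaN out of 5 values
}
default_drop = columns_to_drop(data)  # threshold=1.0: only fully-NaN columns
lower_drop = columns_to_drop(data, threshold=0.4)  # the boundary value is included
```

With the default threshold only `col3` is selected, which matches the example above. With `threshold=0.4`, `col2` is selected as well because its proportion is equal to the threshold.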

grizz.transformer.DropNanRow

Bases: BaseInNTransformer

Implement a transformer to drop the rows where all the values are NaN.

A row is dropped only if every checked value in the row is NaN. Rows that contain at least one non-NaN value are kept.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. If set to `None` (default), use all columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanRow
>>> transformer = DropNanRow()
>>> transformer
DropNanRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), float("nan")],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
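
The row rule can be sketched in plain Python. This is an illustration under assumptions, not the grizz implementation: the helper `keep_row` is hypothetical, and restricting the check to a subset of columns is assumed to work as the `columns` description suggests.

```python
from __future__ import annotations

import math


def keep_row(row: dict[str, float], columns: list[str] | None = None) -> bool:
    """Return True unless every checked value in the row is NaN."""
    names = list(row) if columns is None else columns
    return not all(math.isnan(row[name]) for name in names)


nan = float("nan")
rows = [
    {"col1": 1.0, "col2": 1.0, "col3": nan},
    {"col1": 2.0, "col2": nan, "col3": nan},
    {"col1": nan, "col2": nan, "col3": nan},
]
# Check all the columns: only the last row is made up entirely of NaN values.
kept = [row for row in rows if keep_row(row)]
# Check only col2 and col3: the second row is now entirely NaN too.
kept_subset = [row for row in rows if keep_row(row, columns=["col2", "col3"])]
```

A partially-NaN row survives when all the columns are checked, but it can be dropped once the check is narrowed to the columns where it only has NaN values.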

grizz.transformer.DropNanRowTransformer

Bases: BaseInNTransformer

Implement a transformer to drop the rows where all the values are NaN.

A row is dropped only if every checked value in the row is NaN. Rows that contain at least one non-NaN value are kept.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. If set to `None` (default), use all columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanRow
>>> transformer = DropNanRow()
>>> transformer
DropNanRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), float("nan")],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
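
The interplay of `columns`, `exclude_columns` and `missing_policy` can be sketched as follows. This is a plain-Python illustration of the behaviour described in the parameter table, not the grizz implementation. The helper `resolve_columns`, the exception and warning types, and the order of the steps are assumptions.

```python
from __future__ import annotations

import warnings
from collections.abc import Sequence


def resolve_columns(
    frame_columns: Sequence[str],
    columns: Sequence[str] | None = None,
    exclude_columns: Sequence[str] = (),
    missing_policy: str = "raise",
) -> list[str]:
    """Return the columns to process (hypothetical helper, not part of grizz)."""
    if missing_policy not in ("ignore", "warn", "raise"):
        raise ValueError(f"Incorrect missing_policy: {missing_policy!r}")
    # `None` means all the columns of the frame.
    selected = list(frame_columns) if columns is None else list(columns)
    # Excluded names are removed silently, even if they are not in the frame.
    selected = [col for col in selected if col not in exclude_columns]
    missing = [col for col in selected if col not in frame_columns]
    if missing and missing_policy == "raise":
        raise KeyError(f"missing columns: {missing}")  # exception type is an assumption
    if missing and missing_policy == "warn":
        warnings.warn(f"missing columns are ignored: {missing}", RuntimeWarning, stacklevel=2)
    # With 'warn' and 'ignore', the missing columns are skipped.
    return [col for col in selected if col in frame_columns]


frame_columns = ["col1", "col2", "col3"]
all_but_col3 = resolve_columns(frame_columns, exclude_columns=["col3", "unknown"])
ignored = resolve_columns(frame_columns, columns=["col1", "col9"], missing_policy="ignore")
```

Here `all_but_col3` contains `col1` and `col2`, and `ignored` contains only `col1`. Calling the helper with `columns=["col9"]` and the default policy raises an exception.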

grizz.transformer.DropNullColumn

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many null values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. `None` means all the columns.	`None`
`threshold`	`float`	The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold, the column is removed. If set to `1.0`, only the columns that contain only null values are removed.	`1.0`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `drop`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, 5],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ 5    ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ null │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ null │
│ null       ┆ 5    │
└────────────┴──────┘
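
Null and NaN are different things in polars: null marks a missing value, while NaN is a regular floating-point value. The null-based transformers therefore presumably count only nulls, and the NaN-based ones only NaN values. The distinction can be illustrated in plain Python, using `None` as a stand-in for null:

```python
import math

# One NaN and one null (None) in the same float column.
values = [1.2, float("nan"), 3.2, None, 5.2]

# A null is a missing value.
null_count = sum(v is None for v in values)
# A NaN is a float value, so it is not counted as null.
nan_count = sum(v is not None and math.isnan(v) for v in values)
```

Both counts are 1 here, so a column like this one has a null proportion of 0.2 and a NaN proportion of 0.2.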

grizz.transformer.DropNullColumnTransformer

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many null values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. `None` means all the columns.	`None`
`threshold`	`float`	The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold, the column is removed. If set to `1.0`, only the columns that contain only null values are removed.	`1.0`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `drop`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, 5],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ 5    ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ null │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ null │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.DropNullRow

Bases: BaseInNTransformer

Implement a transformer to drop the rows where all the values are null.

A row is dropped only if every checked value in the row is null. Rows that contain at least one non-null value are kept.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. If set to `None` (default), use all columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘

grizz.transformer.DropNullRowTransformer

Bases: BaseInNTransformer

Implement a transformer to drop the rows where all the values are null.

A row is dropped only if every checked value in the row is null. Rows that contain at least one non-null value are kept.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to check. If set to `None` (default), use all columns.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘

grizz.transformer.Equal

Bases: BaseComparatorTransformer

Implement a transformer that checks whether the values of each column are equal to a target value.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Equal
>>> transformer = Equal(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.EqualMissing

Bases: BaseComparatorTransformer

Implement a transformer that checks whether the values of each column are equal to a target value, without propagating null values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import EqualMissing
>>> transformer = EqualMissing(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
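
The example above contains no null values, so it does not show how this transformer differs from `Equal`. The sketch below illustrates the difference in plain Python, with `None` standing in for null. It assumes that the two transformers follow the semantics of the polars `eq` and `eq_missing` expressions, which the description suggests but does not state. The helper functions are hypothetical.

```python
from __future__ import annotations


def eq(value: int | None, target: int | None) -> bool | None:
    """Equality that propagates null: any null operand gives a null result."""
    if value is None or target is None:
        return None
    return value == target


def eq_missing(value: int | None, target: int | None) -> bool:
    """Equality that does not propagate null: null is only equal to null."""
    if value is None or target is None:
        return value is None and target is None
    return value == target


column = [1, 3, None]
propagated = [eq(v, 3) for v in column]
not_propagated = [eq_missing(v, 3) for v in column]
```

Under this assumption, the null entry stays null with the propagating comparison and becomes `False` with the non-propagating one, so the output column of the second variant never contains null.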

grizz.transformer.EqualMissingTransformer

Bases: BaseComparatorTransformer

Implement a transformer that checks whether the values of each column are equal to a target value, without propagating null values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import EqualMissing
>>> transformer = EqualMissing(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.EqualTransformer

Bases: BaseComparatorTransformer

Implement a transformer that checks whether the values of each column are equal to a target value.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Equal
>>> transformer = Equal(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
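
The output column names and the `exist_policy` options can be sketched in plain Python. The naming rule (prefix, then input name, then suffix) is inferred from the example above. The helpers `output_names` and `check_existing`, as well as the exception and warning types, are assumptions and not the grizz implementation.

```python
import warnings


def output_names(columns, prefix, suffix):
    """Build the output column names as prefix + input name + suffix."""
    return [f"{prefix}{col}{suffix}" for col in columns]


def check_existing(frame_columns, new_columns, exist_policy="raise"):
    """Apply the exist policy and return the columns that get overwritten."""
    if exist_policy not in ("ignore", "warn", "raise"):
        raise ValueError(f"Incorrect exist_policy: {exist_policy!r}")
    existing = [col for col in new_columns if col in frame_columns]
    if existing and exist_policy == "raise":
        raise RuntimeError(f"columns already exist: {existing}")  # type is an assumption
    if existing and exist_policy == "warn":
        warnings.warn(f"columns are overwritten: {existing}", RuntimeWarning, stacklevel=2)
    # With 'warn' and 'ignore', the existing columns are overwritten.
    return existing


names = output_names(["col1", "col3"], prefix="", suffix="_out")
# A frame that already has a column named like one of the outputs.
frame_columns = ["col1", "col2", "col3", "col4", "col1_out"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    overwritten = check_existing(frame_columns, names, exist_policy="warn")
```

With `exist_policy="warn"`, one warning is recorded and `col1_out` is reported as overwritten. The default `'raise'` policy stops at the same collision with an exception.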

grizz.transformer.FillNan

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to fill. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_nan`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 100.0    │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ null     │
│ null ┆ NaN  ┆ null ┆ 5.2  ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┘
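
In the example above, `col1` is requested but is not a float column, so no `col1_out` column is created, and the null value in `col4` is left untouched. A plain-Python sketch of this behaviour, with `None` standing in for null, is given below. The helpers `is_float_column` and `fill_nan` are hypothetical and only mimic the result shown above. They are not the grizz implementation.

```python
import math


def is_float_column(values):
    """Treat a column as float if all its non-null values are Python floats."""
    return all(v is None or isinstance(v, float) for v in values)


def fill_nan(data, columns, value, prefix="", suffix="_out"):
    """Add a filled copy of each float column; other columns are skipped."""
    out = dict(data)
    for name in columns:
        values = data[name]
        if not is_float_column(values):
            continue  # non-float columns are ignored
        # Replace NaN by the fill value; null (None) entries are kept as-is.
        out[f"{prefix}{name}{suffix}"] = [
            float(value) if v is not None and math.isnan(v) else v for v in values
        ]
    return out


data = {
    "col1": [1, 2, 3, 4, None],
    "col4": [1.2, float("nan"), 3.2, None, 5.2],
}
out = fill_nan(data, columns=["col1", "col4"], value=100)
```

The result has a `col4_out` entry where only the NaN was replaced by `100.0`, and no `col1_out` entry.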

grizz.transformer.FillNanTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to fill. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_nan`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 100.0    │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ null     │
│ null ┆ NaN  ┆ null ┆ 5.2  ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┘
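
In the output above, only the NaN entry of `col4` is replaced; the null entry is left untouched. The distinction can be sketched in plain Python, with `None` standing in for a polars null. This is an illustration only: `fill_nan_values` is a hypothetical helper, not part of grizz or polars.

```python
import math


def fill_nan_values(values, value):
    """Replace float NaN entries by ``value`` and leave None (null) entries untouched."""
    return [
        value if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


# Same values as ``col4`` in the example above.
col4 = [1.2, float("nan"), 3.2, None, 5.2]
filled = fill_nan_values(col4, 100.0)
print(filled)  # [1.2, 100.0, 3.2, None, 5.2]
```

NaN is a regular floating-point value, whereas null marks a missing entry, which is why the two are handled by separate transformers.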

grizz.transformer.FillNull ¶

Implement a transformer to fill null values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to process. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_null`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, None],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ null ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ i64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1        ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 2        ┆ NaN      │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3        ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ 4        ┆ 100.0    │
│ null ┆ null ┆ null ┆ 5.2  ┆ 100      ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
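
The example above writes its results to `col1_out` and `col4_out`, and `exist_policy` decides what happens when such a name is already taken. The following is a minimal sketch of that logic in plain Python. It assumes the output name is built as `prefix + column + suffix`, which matches the example but is not confirmed by the documentation; `resolve_output_columns` and the exception types are hypothetical and may differ from what grizz actually uses.

```python
import warnings


def resolve_output_columns(columns, existing, prefix, suffix, exist_policy="raise"):
    """Build the output column names and apply the policy to name collisions."""
    # Assumption: output names are prefix + column + suffix.
    out_names = [f"{prefix}{col}{suffix}" for col in columns]
    collisions = [name for name in out_names if name in existing]
    if collisions:
        if exist_policy == "raise":
            # Hypothetical exception type; the library may use its own class.
            raise RuntimeError(f"columns already exist: {collisions}")
        if exist_policy == "warn":
            warnings.warn(f"columns are overwritten: {collisions}", stacklevel=2)
        elif exist_policy != "ignore":
            raise ValueError(f"unknown exist_policy: {exist_policy!r}")
    return out_names


existing = ["col1", "col2", "col3", "col4"]
names = resolve_output_columns(["col1", "col4"], existing, prefix="", suffix="_out")
print(names)  # ['col1_out', 'col4_out']
```

With an empty prefix and an empty suffix, every output name collides with its input column, so the default `'raise'` policy would reject the call in this sketch.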

grizz.transformer.FillNullTransformer ¶

Implement a transformer to fill null values.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to process. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_null`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, None],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ null ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ i64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1        ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 2        ┆ NaN      │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3        ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ 4        ┆ 100.0    │
│ null ┆ null ┆ null ┆ 5.2  ┆ 100      ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.FilterCardinality ¶

Bases: BaseInNTransformer

Implement a transformer to filter based on the cardinality (i.e. number of unique values) in each column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to use to filter based on the number of unique values. If `None`, it processes all the columns of type string.	`None`
`n_min`	`int`	The minimal cardinality (included).	`0`
`n_max`	`int`	The maximal cardinality (excluded).	`float('inf')`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FilterCardinality
>>> transformer = FilterCardinality(columns=["col1", "col2", "col3"], n_min=2, n_max=5)
>>> transformer
FilterCardinalityTransformer(columns=('col1', 'col2', 'col3'), exclude_columns=(), missing_policy='raise', n_min=2, n_max=5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1, 1, 1, 1, 1],
...         "col3": ["a", "b", "c", "a", "b"],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ a    ┆ 1.2  │
│ 2    ┆ 1    ┆ b    ┆ NaN  │
│ 3    ┆ 1    ┆ c    ┆ 3.2  │
│ 4    ┆ 1    ┆ a    ┆ null │
│ 5    ┆ 1    ┆ b    ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col3 ┆ col4 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.2  │
│ b    ┆ NaN  │
│ c    ┆ 3.2  │
│ a    ┆ null │
│ b    ┆ 5.2  │
└──────┴──────┘
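
The result above can be reproduced by hand: a checked column is dropped when its number of unique values falls outside the half-open range `[n_min, n_max)`, and columns that are not checked are kept as they are. The sketch below works on a plain dictionary of lists rather than a polars DataFrame; `columns_to_drop` is a hypothetical helper, and the way polars counts null values toward the cardinality is not modelled here.

```python
def columns_to_drop(data, columns, n_min=0, n_max=float("inf")):
    """Return the checked columns whose cardinality is outside [n_min, n_max)."""
    drop = []
    for col in columns:
        cardinality = len(set(data[col]))
        # n_min is included, n_max is excluded.
        if not (n_min <= cardinality < n_max):
            drop.append(col)
    return drop


data = {
    "col1": [1, 2, 3, 4, 5],
    "col2": [1, 1, 1, 1, 1],
    "col3": ["a", "b", "c", "a", "b"],
    "col4": [1.2, float("nan"), 3.2, None, 5.2],
}
dropped = columns_to_drop(data, ["col1", "col2", "col3"], n_min=2, n_max=5)
kept = [col for col in data if col not in dropped]
print(dropped)  # ['col1', 'col2']
print(kept)  # ['col3', 'col4']
```

Here `col1` has 5 unique values (excluded because `n_max=5` is exclusive), `col2` has 1 (below `n_min=2`), and `col3` has 3, so only `col3` passes the check; `col4` is not in `columns` and is therefore never tested.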

grizz.transformer.FilterCardinalityTransformer ¶

Bases: BaseInNTransformer

Implement a transformer to filter based on the cardinality (i.e. number of unique values) in each column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to use to filter based on the number of unique values. If `None`, it processes all the columns of type string.	`None`
`n_min`	`int`	The minimal cardinality (included).	`0`
`n_max`	`int`	The maximal cardinality (excluded).	`float('inf')`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FilterCardinality
>>> transformer = FilterCardinality(columns=["col1", "col2", "col3"], n_min=2, n_max=5)
>>> transformer
FilterCardinalityTransformer(columns=('col1', 'col2', 'col3'), exclude_columns=(), missing_policy='raise', n_min=2, n_max=5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1, 1, 1, 1, 1],
...         "col3": ["a", "b", "c", "a", "b"],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ a    ┆ 1.2  │
│ 2    ┆ 1    ┆ b    ┆ NaN  │
│ 3    ┆ 1    ┆ c    ┆ 3.2  │
│ 4    ┆ 1    ┆ a    ┆ null │
│ 5    ┆ 1    ┆ b    ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col3 ┆ col4 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.2  │
│ b    ┆ NaN  │
│ c    ┆ 3.2  │
│ a    ┆ null │
│ b    ┆ 5.2  │
└──────┴──────┘
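
The three `missing_policy` options described in the parameter table can be sketched in plain Python as follows. This is an illustration under assumptions: `resolve_missing_columns` is a hypothetical helper, and the exception and warning types are placeholders rather than the classes grizz actually uses.

```python
import warnings


def resolve_missing_columns(columns, available, missing_policy="raise"):
    """Return the requested columns that exist, applying the policy to the others."""
    missing = [col for col in columns if col not in available]
    if missing:
        if missing_policy == "raise":
            # Hypothetical exception type; the library may use its own class.
            raise RuntimeError(f"columns are missing: {missing}")
        if missing_policy == "warn":
            warnings.warn(f"missing columns are ignored: {missing}", stacklevel=2)
        elif missing_policy != "ignore":
            raise ValueError(f"unknown missing_policy: {missing_policy!r}")
    # Missing columns are skipped for 'warn' and 'ignore'.
    return [col for col in columns if col in available]


found = resolve_missing_columns(
    ["col1", "col9"], ["col1", "col2"], missing_policy="ignore"
)
print(found)  # ['col1']
```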

grizz.transformer.FirstRow ¶

Bases: BaseArgTransformer

Implement a transformer that selects the first n rows.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FirstRow
>>> transformer = FirstRow(n=3)
>>> transformer
FirstRowTransformer(n=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FirstRowTransformer ¶

Bases: BaseArgTransformer

Implement a transformer that selects the first n rows.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FirstRow
>>> transformer = FirstRow(n=3)
>>> transformer
FirstRowTransformer(n=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FloatCast ¶

Bases: CastTransformer

Implement a transformer to convert float columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(
...     columns=["col1", "col2"], dtype=pl.Int32, prefix="", suffix="_out"
... )
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ f64  ┆ str  ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1        │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2        │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3        │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4        │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┘

grizz.transformer.FloatCastTransformer ¶

Bases: CastTransformer

Implement a transformer to convert float columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(
...     columns=["col1", "col2"], dtype=pl.Int32, prefix="", suffix="_out"
... )
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ f64  ┆ str  ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1        │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2        │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3        │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4        │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┘

grizz.transformer.Function ¶

Bases: BaseArgTransformer

Implement a transformer that is a wrapper around a function to transform the DataFrame.

Parameters:

Name	Type	Description	Default
`func`	`Callable[[DataFrame], DataFrame]`	The function to transform the DataFrame.	required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FunctionTransformer ¶

Bases: BaseArgTransformer

Implement a transformer that is a wrapper around a function to transform the DataFrame.

Parameters:

Name	Type	Description	Default
`func`	`Callable[[DataFrame], DataFrame]`	The function to transform the DataFrame.	required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘

grizz.transformer.Greater ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the greater than operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Greater
>>> transformer = Greater(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
GreaterTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.GreaterEqual ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the greater than or equal operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import GreaterEqual
>>> transformer = GreaterEqual(
...     columns=["col1", "col3"], target=4.2, prefix="", suffix="_out"
... )
>>> transformer
GreaterEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
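
With `target=4.2` and integer columns, no value equals the target, so the greater-than and greater-than-or-equal comparisons give the same result. The two operations differ only when a value equals the target, as the plain-Python sketch below shows with a hypothetical target of `4`; it uses the standard `operator` module rather than grizz.

```python
import operator

col1 = [1, 2, 3, 4, 5]
target = 4  # hypothetical target chosen so that one value equals it

# Element-wise comparisons against the target.
greater = [operator.gt(value, target) for value in col1]
greater_equal = [operator.ge(value, target) for value in col1]

print(greater)  # [False, False, False, False, True]
print(greater_equal)  # [False, False, False, True, True]
```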

grizz.transformer.GreaterEqualTransformer ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the greater than or equal operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import GreaterEqual
>>> transformer = GreaterEqual(
...     columns=["col1", "col3"], target=4.2, prefix="", suffix="_out"
... )
>>> transformer
GreaterEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.GreaterTransformer ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the greater than operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Greater
>>> transformer = Greater(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
GreaterTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
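
The naming and overwrite rules described in the parameter table can be modelled in plain Python. This is a minimal sketch, not the grizz implementation: `output_name`, `check_existing`, and `greater` are hypothetical helpers, and the `RuntimeError` exception type is an assumption.

```python
import warnings


def output_name(column: str, prefix: str, suffix: str) -> str:
    """Build the output column name from the input column name."""
    return f"{prefix}{column}{suffix}"


def check_existing(existing: list[str], new: list[str], exist_policy: str = "raise") -> None:
    """Apply an exist_policy-style check (hypothetical helper, not grizz code)."""
    if exist_policy not in ("ignore", "warn", "raise"):
        raise ValueError(f"Unknown exist_policy: {exist_policy!r}")
    clashes = [name for name in new if name in existing]
    if not clashes or exist_policy == "ignore":
        return
    message = f"{len(clashes)} column(s) already exist: {clashes}"
    if exist_policy == "warn":
        # The existing columns would be overwritten after the warning.
        warnings.warn(message, stacklevel=2)
        return
    raise RuntimeError(message)  # the exception type is an assumption


def greater(values: list[float], target: float) -> list[bool]:
    """Element-wise strict greater-than comparison against a scalar target."""
    return [value > target for value in values]


names = [output_name(col, prefix="", suffix="_out") for col in ["col1", "col3"]]
check_existing(["col1", "col2", "col3", "col4"], names)  # no clash, nothing raised
col1_out = greater([1, 2, 3, 4, 5], target=4.2)
```

With the values from the example above, only the last row of `col1` is greater than `4.2`, which matches the `col1_out` column.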

grizz.transformer.InplaceCast ¶

Bases: CastTransformer

Implement a transformer to convert some columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
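
The interaction between `columns`, `exclude_columns`, and `missing_policy` can be modelled in plain Python. This is a minimal sketch under stated assumptions, not the grizz implementation: `resolve_columns` is a hypothetical helper and the `KeyError` exception type is an assumption.

```python
import warnings


def resolve_columns(frame_columns, columns=None, exclude_columns=(), missing_policy="raise"):
    """Return the columns to transform (hypothetical helper, not grizz code)."""
    if missing_policy not in ("ignore", "warn", "raise"):
        raise ValueError(f"Unknown missing_policy: {missing_policy!r}")
    # None means all the columns of the frame.
    requested = list(frame_columns) if columns is None else list(columns)
    # Excluded names that are not requested are silently ignored.
    requested = [col for col in requested if col not in exclude_columns]
    missing = [col for col in requested if col not in frame_columns]
    if missing and missing_policy != "ignore":
        message = f"{len(missing)} column(s) are missing: {missing}"
        if missing_policy == "raise":
            raise KeyError(message)  # the exception type is an assumption
        warnings.warn(message, stacklevel=2)
    # Missing columns are dropped for the 'warn' and 'ignore' policies.
    return [col for col in requested if col in frame_columns]


frame_columns = ["col1", "col2", "col3", "col4"]
selected = resolve_columns(frame_columns, columns=["col1", "col3"])
all_but_col4 = resolve_columns(frame_columns, exclude_columns=["col4"])
```

Passing `missing_policy="ignore"` with a name such as `"col5"` returns only the columns that are present.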

grizz.transformer.InplaceCastTransformer ¶

Bases: CastTransformer

Implement a transformer to convert some columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceCategoricalCast ¶

Bases: CategoricalCastTransformer

Implement a transformer to convert a column to categorical data type.

InplaceCategoricalCastTransformer is a specific implementation of CategoricalCastTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name to cast.	required
`**kwargs`	`Any`	Additional arguments passed to `polars.Categorical`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCategoricalCast
>>> transformer = InplaceCategoricalCast(col="col1")
>>> transformer
InplaceCategoricalCastTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ cat  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘

grizz.transformer.InplaceCategoricalCastTransformer ¶

Bases: CategoricalCastTransformer

Implement a transformer to convert a column to categorical data type.

InplaceCategoricalCastTransformer is a specific implementation of CategoricalCastTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name to cast.	required
`**kwargs`	`Any`	Additional arguments passed to `polars.Categorical`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCategoricalCast
>>> transformer = InplaceCategoricalCast(col="col1")
>>> transformer
InplaceCategoricalCastTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ cat  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘

grizz.transformer.InplaceDecimalCast ¶

Implement a transformer to convert decimal columns to a new data type.

InplaceDecimalCastTransformer is a specific implementation of DecimalCastTransformer that performs the transformation in-place. Columns that are not of decimal type are left unchanged.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceDecimalCast
>>> transformer = InplaceDecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceDecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3         ┆ col4 │
│ ---  ┆ ---  ┆ ---          ┆ ---  │
│ i64  ┆ f32  ┆ decimal[*,0] ┆ str  │
╞══════╪══════╪══════════════╪══════╡
│ 1    ┆ 1.0  ┆ 1            ┆ a    │
│ 2    ┆ 2.0  ┆ 2            ┆ b    │
│ 3    ┆ 3.0  ┆ 3            ┆ c    │
│ 4    ┆ 4.0  ┆ 4            ┆ d    │
│ 5    ┆ 5.0  ┆ 5            ┆ e    │
└──────┴──────┴──────────────┴──────┘
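
In the example above, `col1` is listed in `columns` but is not converted because it is not a decimal column. A plain-Python model of this type-filtered cast is sketched below. It is not the grizz implementation: `cast_matching` is a hypothetical helper, a table is represented as a dict of lists, and null values are not handled.

```python
from decimal import Decimal


def cast_matching(table, columns, source_type, convert):
    """Convert only the requested columns whose values all have the source type."""
    out = dict(table)
    for name in columns:
        values = table[name]
        if all(isinstance(value, source_type) for value in values):
            out[name] = [convert(value) for value in values]
    return out


table = {
    "col1": [1, 2, 3],
    "col2": [Decimal(1), Decimal(2), Decimal(3)],
    "col3": [Decimal(1), Decimal(2), Decimal(3)],
}
out = cast_matching(table, columns=["col1", "col2"], source_type=Decimal, convert=float)
```

Here `col2` becomes a list of floats, `col1` keeps its integers because it does not match the source type, and `col3` is untouched because it was not requested.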

grizz.transformer.InplaceDecimalCastTransformer ¶

Implement a transformer to convert decimal columns to a new data type.

InplaceDecimalCastTransformer is a specific implementation of DecimalCastTransformer that performs the transformation in-place. Columns that are not of decimal type are left unchanged.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceDecimalCast
>>> transformer = InplaceDecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceDecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3         ┆ col4 │
│ ---  ┆ ---  ┆ ---          ┆ ---  │
│ i64  ┆ f32  ┆ decimal[*,0] ┆ str  │
╞══════╪══════╪══════════════╪══════╡
│ 1    ┆ 1.0  ┆ 1            ┆ a    │
│ 2    ┆ 2.0  ┆ 2            ┆ b    │
│ 3    ┆ 3.0  ┆ 3            ┆ c    │
│ 4    ┆ 4.0  ┆ 4            ┆ d    │
│ 5    ┆ 5.0  ┆ 5            ┆ e    │
└──────┴──────┴──────────────┴──────┘

grizz.transformer.InplaceFillNan ¶

Bases: FillNanTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float. InplaceFillNanTransformer is a specific implementation of FillNanTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns in which to fill NaN values. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_nan`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNan
>>> transformer = InplaceFillNan(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ 100.0 │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ null  │
│ null ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘
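
The output above shows that only NaN is replaced: the null in `col4` and the integer column `col1` are unchanged. A plain-Python model of that rule is sketched below, with `None` standing in for null. `fill_nan` here is a hypothetical helper, not the grizz or polars function.

```python
import math


def fill_nan(values, value):
    """Replace float NaN entries and leave None and non-float entries as they are."""
    return [
        float(value) if isinstance(item, float) and math.isnan(item) else item
        for item in values
    ]


col1 = [1, 2, 3, 4, None]
col4 = [1.2, float("nan"), 3.2, None, 5.2]
filled_col1 = fill_nan(col1, 100)
filled_col4 = fill_nan(col4, 100)
```

`filled_col4` holds `100.0` in the second position and still holds `None` in the fourth, while `filled_col1` is identical to `col1`.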

grizz.transformer.InplaceFillNanTransformer ¶

Bases: FillNanTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float. InplaceFillNanTransformer is a specific implementation of FillNanTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns in which to fill NaN values. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_nan`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNan
>>> transformer = InplaceFillNan(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ 100.0 │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ null  │
│ null ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceFillNull ¶

Bases: FillNullTransformer

Implement a transformer to fill null values.

InplaceFillNullTransformer is a specific implementation of FillNullTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns in which to fill null values. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_null`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNull
>>> transformer = InplaceFillNull(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ NaN   │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ 100.0 │
│ 100  ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘
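
The output above shows the opposite rule to NaN filling: null values are replaced in both the integer column `col1` and the float column `col4`, while the NaN in `col4` is kept. A plain-Python model is sketched below, with `None` standing in for null. `fill_null` here is a hypothetical helper, not the grizz or polars function.

```python
import math


def fill_null(values, value):
    """Replace None entries and leave every other entry, including NaN, as it is."""
    return [value if item is None else item for item in values]


col1 = [1, 2, 3, 4, None]
col4 = [1.2, float("nan"), 3.2, None, 5.2]
filled_col1 = fill_null(col1, 100)
filled_col4 = fill_null(col4, 100)
nan_is_kept = math.isnan(filled_col4[1])
```

Unlike the polars output, this sketch does not coerce the fill value to the column data type, so the filled entry of `filled_col4` is the integer `100` rather than `100.0`.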

grizz.transformer.InplaceFillNullTransformer ¶

Bases: FillNullTransformer

Implement a transformer to fill null values.

InplaceFillNullTransformer is a specific implementation of FillNullTransformer that performs the transformation in-place.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns in which to fill null values. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `fill_null`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNull
>>> transformer = InplaceFillNull(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ NaN   │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ 100.0 │
│ 100  ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceFloatCast ¶

Implement a transformer to convert float columns to a new data type.

InplaceFloatCastTransformer is a specific implementation of FloatCastTransformer that performs the transformation in-place. Columns that are not of float type are left unchanged.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFloatCast
>>> transformer = InplaceFloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
InplaceFloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1.0  ┆ a    │
│ 2    ┆ 2    ┆ 2.0  ┆ b    │
│ 3    ┆ 3    ┆ 3.0  ┆ c    │
│ 4    ┆ 4    ┆ 4.0  ┆ d    │
│ 5    ┆ 5    ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceFloatCastTransformer ¶

Implement a transformer to convert float columns to a new data type.

InplaceFloatCastTransformer is a specific implementation of FloatCastTransformer that performs the transformation in-place. Columns that are not of float type are left unchanged.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFloatCast
>>> transformer = InplaceFloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
InplaceFloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1.0  ┆ a    │
│ 2    ┆ 2    ┆ 2.0  ┆ b    │
│ 3    ┆ 3    ┆ 3.0  ┆ c    │
│ 4    ┆ 4    ┆ 4.0  ┆ d    │
│ 5    ┆ 5    ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceIntegerCast ¶

Implement a transformer to convert integer columns to a new data type.

InplaceIntegerCastTransformer is a specific implementation of IntegerCastTransformer that performs the transformation in-place. Columns that are not of integer type are left unchanged.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceIntegerCast
>>> transformer = InplaceIntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceIntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1    ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2    ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3    ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4    ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceIntegerCastTransformer ¶

Implement a transformer to convert integer columns to a new data type.

InplaceIntegerCastTransformer is a specific implementation of IntegerCastTransformer that performs the transformation in-place. Columns that are not of integer type are left unchanged.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceIntegerCast
>>> transformer = InplaceIntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceIntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1    ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2    ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3    ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4    ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
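
In the example above, `col2` is requested but keeps its `f64` type, because only integer columns are converted. The selection step can be pictured with the pure-Python sketch below. It is illustrative only: `select_integer_columns`, `INTEGER_DTYPES`, and the string dtype names are hypothetical and are not part of grizz or polars.

```python
# Hypothetical sketch of the column selection; not grizz's implementation.
INTEGER_DTYPES = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64"}


def select_integer_columns(schema, columns=None, exclude_columns=()):
    """Return the requested columns whose dtype is an integer type."""
    # `None` means all the columns, as in the parameter table.
    requested = list(schema) if columns is None else list(columns)
    return [
        name
        for name in requested
        if name not in exclude_columns and schema.get(name) in INTEGER_DTYPES
    ]


schema = {"col1": "i64", "col2": "f64", "col3": "i64", "col4": "str"}
selected = select_integer_columns(schema, columns=["col1", "col2"])
print(selected)  # ['col1']
```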

grizz.transformer.InplaceJsonDecode

Bases: JsonDecodeTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to parse. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.Expr.str.json_decode`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceJsonDecode
>>> transformer = InplaceJsonDecode(columns=["col1", "col3"])
>>> transformer
InplaceJsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
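
As a rough per-value analogue, the decoding of `col1` can be reproduced with the standard-library `json` module. This is only a sketch of the idea: the transformer itself works column-wise through polars and also assigns a list dtype to the result, which plain Python lists do not capture.

```python
import json

# Valid JSON strings taken from `col1` in the example above.
raw = ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"]

# Parse each string cell into a Python list of integers.
decoded = [json.loads(value) for value in raw]
print(decoded)  # [[1, 2], [2], [1, 2, 3], [4, 5], [5, 4]]
```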

grizz.transformer.InplaceJsonDecodeTransformer

Bases: JsonDecodeTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to parse. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.Expr.str.json_decode`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceJsonDecode
>>> transformer = InplaceJsonDecode(columns=["col1", "col3"])
>>> transformer
InplaceJsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
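
The three `missing_policy` options described in the parameter table can be sketched in plain Python as follows. The helper `resolve_columns` is hypothetical, and the exception and warning types (`KeyError`, `RuntimeWarning`) are assumptions made for illustration; grizz may use different ones.

```python
import warnings


def resolve_columns(requested, available, missing_policy="raise"):
    """Return the requested columns that exist, applying the policy to the rest."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown missing_policy: {missing_policy!r}")
    missing = [name for name in requested if name not in available]
    if missing:
        message = f"{len(missing)} column(s) are missing: {missing}"
        if missing_policy == "raise":
            # Assumption: the real library may raise a different exception type.
            raise KeyError(message)
        if missing_policy == "warn":
            warnings.warn(message, RuntimeWarning, stacklevel=2)
    # With 'warn' and 'ignore', the missing columns are simply skipped.
    return [name for name in requested if name in available]


available = ["col1", "col2", "col3", "col4"]
print(resolve_columns(["col1", "col5"], available, missing_policy="ignore"))  # ['col1']
```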

grizz.transformer.InplaceLabelEncoder

Bases: LabelEncoderTransformer

Implement a transformer to encode the labels in a given column.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name to transform.	required
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceLabelEncoder
>>> transformer = InplaceLabelEncoder(col="col1")
>>> transformer
InplaceLabelEncoderTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 0    ┆ 1    │
│ 1    ┆ 2    │
│ 2    ┆ 3    │
│ 3    ┆ 4    │
│ 4    ┆ 5    │
└──────┴──────┘
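
The fit and transform steps of a label encoder can be sketched in plain Python. This is a minimal illustration, assuming the integer codes are assigned in sorted label order; the source does not state the ordering grizz uses, and for this example sorted order and first-appearance order give the same result.

```python
labels = ["a", "b", "c", "d", "e"]

# "fit": assign one integer code per distinct label (sorted order is an assumption).
mapping = {label: code for code, label in enumerate(sorted(set(labels)))}

# "transform": replace each label by its code.
encoded = [mapping[label] for label in labels]
print(encoded)  # [0, 1, 2, 3, 4]
```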

grizz.transformer.InplaceLabelEncoderTransformer

Bases: LabelEncoderTransformer

Implement a transformer to encode the labels in a given column.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name to transform.	required
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceLabelEncoder
>>> transformer = InplaceLabelEncoder(col="col1")
>>> transformer
InplaceLabelEncoderTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 0    ┆ 1    │
│ 1    ┆ 2    │
│ 2    ┆ 3    │
│ 3    ┆ 4    │
│ 4    ┆ 5    │
└──────┴──────┘

grizz.transformer.InplaceNumericCast

Implement a transformer to convert numeric columns to a new data type.

Like `InplaceCastTransformer`, this transformer performs the transformation in-place: the converted columns overwrite the original columns. Only the numeric columns among the selected columns are converted.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceNumericCast
>>> transformer = InplaceNumericCast(columns=["col1", "col3"], dtype=pl.Float32)
>>> transformer
InplaceNumericCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f32  ┆ f32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceNumericCastTransformer

Implement a transformer to convert numeric columns to a new data type.

Like `InplaceCastTransformer`, this transformer performs the transformation in-place: the converted columns overwrite the original columns. Only the numeric columns among the selected columns are converted.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceNumericCast
>>> transformer = InplaceNumericCast(columns=["col1", "col3"], dtype=pl.Float32)
>>> transformer
InplaceNumericCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f32  ┆ f32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplacePowerTransformer

Bases: PowerTransformer

Implement a transformer to apply a power transform featurewise to make data more Gaussian-like.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to transform. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.PowerTransformer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplacePowerTransformer
>>> transformer = InplacePowerTransformer(columns=["col1", "col3"])
>>> transformer
InplacePowerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1      ┆ col2 ┆ col3      ┆ col4 │
│ ---       ┆ ---  ┆ ---       ┆ ---  │
│ f64       ┆ str  ┆ f64       ┆ str  │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.567837 ┆ 0    ┆ -1.695398 ┆ a    │
│ -0.836194 ┆ 1    ┆ -0.740367 ┆ b    │
│ -0.210053 ┆ 2    ┆ -0.117399 ┆ c    │
│ 0.356111  ┆ 3    ┆ 0.402585  ┆ d    │
│ 0.881486  ┆ 4    ┆ 0.864187  ┆ e    │
│ 1.376486  ┆ 5    ┆ 1.286392  ┆ f    │
└───────────┴──────┴───────────┴──────┘

grizz.transformer.InplaceQuantileTransformer

Bases: QuantileTransformer

Implement a transformer to apply the quantile transformation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.QuantileTransformer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceQuantileTransformer
>>> transformer = InplaceQuantileTransformer(columns=["col1", "col3"])
>>> transformer
InplaceQuantileTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0.0  ┆ 0    ┆ 0.0  ┆ a    │
│ 0.2  ┆ 1    ┆ 0.2  ┆ b    │
│ 0.4  ┆ 2    ┆ 0.4  ┆ c    │
│ 0.6  ┆ 3    ┆ 0.6  ┆ d    │
│ 0.8  ┆ 4    ┆ 0.8  ┆ e    │
│ 1.0  ┆ 5    ┆ 1.0  ┆ f    │
└──────┴──────┴──────┴──────┘
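
The values in the output above can be checked by hand. With a uniform output distribution and a small column of distinct values, the quantile transformation reduces to mapping each value's rank linearly onto [0, 1]. The sketch below relies on that simplification; it is not how `sklearn.preprocessing.QuantileTransformer` is implemented in general, since the real estimator interpolates between stored quantiles.

```python
values = [0, 10, 20, 30, 40, 50]

# Rank of each value in the sorted data (assumes all values are distinct).
rank = {value: index for index, value in enumerate(sorted(values))}
n = len(values)

# Map the ranks linearly onto [0, 1].
transformed = [rank[value] / (n - 1) for value in values]
print(transformed)  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```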

grizz.transformer.InplaceReplace

Bases: ReplaceTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name.	required
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplace
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
>>> transformer
InplaceReplaceTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3}, default=None)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceReplaceStrict

Bases: ReplaceStrictTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name.	required
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace_strict`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplaceStrict
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
InplaceReplaceStrictTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3}, default=None
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceReplaceStrictTransformer

Bases: ReplaceStrictTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name.	required
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace_strict`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplaceStrict
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
InplaceReplaceStrictTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3}, default=None
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceReplaceTransformer

Bases: ReplaceTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name	Type	Description	Default
`col`	`str`	The column name.	required
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplace
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
>>> transformer
InplaceReplaceTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3}, default=None)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceRobustScaler

Bases: RobustScalerTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.RobustScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceRobustScaler
>>> transformer = InplaceRobustScaler(columns=["col1", "col3"])
>>> transformer
InplaceRobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ -1.0 ┆ 0    ┆ -1.0 ┆ a    │
│ -0.6 ┆ 1    ┆ -0.6 ┆ b    │
│ -0.2 ┆ 2    ┆ -0.2 ┆ c    │
│ 0.2  ┆ 3    ┆ 0.2  ┆ d    │
│ 0.6  ┆ 4    ┆ 0.6  ┆ e    │
│ 1.0  ┆ 5    ┆ 1.0  ┆ f    │
└──────┴──────┴──────┴──────┘
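
The output for `col1` can be reproduced by hand with the default settings of `sklearn.preprocessing.RobustScaler`, which centers on the median and scales by the interquartile range (25th to 75th percentile). The following standard-library sketch shows the arithmetic; it is a worked check of the numbers, not the library's code.

```python
import statistics

values = [0, 1, 2, 3, 4, 5]

median = statistics.median(values)  # 2.5
# Quartiles with linear interpolation between data points ("inclusive" method).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # 3.75 - 1.25 = 2.5

# Center on the median and scale by the interquartile range.
scaled = [(value - median) / iqr for value in values]
print(scaled)  # [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
```

Because `col3` is `col1` multiplied by 10, its median and interquartile range are also 10 times larger, which is why both columns give the same scaled values.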

grizz.transformer.InplaceRobustScalerTransformer

Bases: RobustScalerTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.RobustScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceRobustScaler
>>> transformer = InplaceRobustScaler(columns=["col1", "col3"])
>>> transformer
InplaceRobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ -1.0 ┆ 0    ┆ -1.0 ┆ a    │
│ -0.6 ┆ 1    ┆ -0.6 ┆ b    │
│ -0.2 ┆ 2    ┆ -0.2 ┆ c    │
│ 0.2  ┆ 3    ┆ 0.2  ┆ d    │
│ 0.6  ┆ 4    ┆ 0.6  ┆ e    │
│ 1.0  ┆ 5    ┆ 1.0  ┆ f    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceStandardScaler

Bases: StandardScalerTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.StandardScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStandardScaler
>>> transformer = InplaceStandardScaler(columns=["col1", "col3"])
>>> transformer
InplaceStandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1      ┆ col2 ┆ col3      ┆ col4 │
│ ---       ┆ ---  ┆ ---       ┆ ---  │
│ f64       ┆ str  ┆ f64       ┆ str  │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.414214 ┆ 1    ┆ -1.414214 ┆ a    │
│ -0.707107 ┆ 2    ┆ -0.707107 ┆ b    │
│ 0.0       ┆ 3    ┆ 0.0       ┆ c    │
│ 0.707107  ┆ 4    ┆ 0.707107  ┆ d    │
│ 1.414214  ┆ 5    ┆ 1.414214  ┆ e    │
└───────────┴──────┴───────────┴──────┘
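
The values in the output above can be checked by hand. The sketch below uses only the standard library and assumes the population standard deviation (`ddof=0`), which is what `sklearn.preprocessing.StandardScaler` uses. The `standardize` helper is illustrative and is not part of grizz.

```python
from statistics import fmean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Remove the mean and scale to unit variance (population std)."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (ddof=0)
    return [(v - mean) / std for v in values]


# Same inputs as col1 and col3 in the example, rounded to 6 decimals.
col1 = [round(v, 6) for v in standardize([1, 2, 3, 4, 5])]
col3 = [round(v, 6) for v in standardize([10, 20, 30, 40, 50])]
print(col1)
print(col3)
```

Both columns standardize to the same values because `col3` is `col1` multiplied by 10, and standardization removes any constant scale factor.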

grizz.transformer.InplaceStandardScalerTransformer

Bases: StandardScalerTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.StandardScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStandardScaler
>>> transformer = InplaceStandardScaler(columns=["col1", "col3"])
>>> transformer
InplaceStandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1      ┆ col2 ┆ col3      ┆ col4 │
│ ---       ┆ ---  ┆ ---       ┆ ---  │
│ f64       ┆ str  ┆ f64       ┆ str  │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.414214 ┆ 1    ┆ -1.414214 ┆ a    │
│ -0.707107 ┆ 2    ┆ -0.707107 ┆ b    │
│ 0.0       ┆ 3    ┆ 0.0       ┆ c    │
│ 0.707107  ┆ 4    ┆ 0.707107  ┆ d    │
│ 1.414214  ┆ 5    ┆ 1.414214  ┆ e    │
└───────────┴──────┴───────────┴──────┘
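
The `propagate_nulls` option from the parameter table can be pictured with a plain-Python sketch. It assumes that missing values are skipped when the mean and standard deviation are estimated, which is how `StandardScaler` treats NaNs. The function name `standardize_with_nulls` is hypothetical and is not part of grizz.

```python
import math
from statistics import fmean, pstdev


def standardize_with_nulls(values, propagate_nulls=True):
    """Standardize values, skipping None when estimating the statistics."""
    present = [v for v in values if v is not None]
    mean, std = fmean(present), pstdev(present)
    # None stays None when propagated; otherwise it becomes NaN.
    fill = None if propagate_nulls else math.nan
    return [fill if v is None else (v - mean) / std for v in values]


kept = standardize_with_nulls([1, None, 3])
filled = standardize_with_nulls([1, None, 3], propagate_nulls=False)
print(kept)    # [-1.0, None, 1.0]
print(filled)  # [-1.0, nan, 1.0]
```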

grizz.transformer.InplaceStringToDatetime

Bases: StringToDatetimeTransformer

Implement a transformer to convert some string columns to a polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Datetime`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToDatetime
>>> transformer = InplaceStringToDatetime(columns=["col1"])
>>> transformer
InplaceStringToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
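
As a plain-Python analogy of the example above, the sketch below parses only the selected column and leaves the others as strings. It uses a dictionary of lists in place of a polars DataFrame, and the `FORMAT` string is an assumption matching the layout of the example data. It does not reproduce how grizz performs the conversion internally.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the example strings

frame = {
    "col1": ["2020-01-01 01:01:01", "2020-01-01 23:59:59"],
    "col2": ["1", "2"],
    "col3": ["2020-01-01 11:11:11", "2020-05-01 23:59:59"],
}
selected = ["col1"]  # plays the role of the `columns` argument

# Parse the selected columns; every other column is passed through unchanged.
out = {
    name: (
        [datetime.strptime(v, FORMAT) for v in values]
        if name in selected
        else values
    )
    for name, values in frame.items()
}
print(out["col1"][0])  # 2020-01-01 01:01:01
print(out["col3"][0])  # still a string
```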

grizz.transformer.InplaceStringToDatetimeTransformer

Bases: StringToDatetimeTransformer

Implement a transformer to convert some string columns to a polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Datetime`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToDatetime
>>> transformer = InplaceStringToDatetime(columns=["col1"])
>>> transformer
InplaceStringToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘

grizz.transformer.InplaceStringToTime

Bases: StringToTimeTransformer

Implement a transformer to convert some string columns to a polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Time`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `polars.Expr.str.to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToTime
>>> transformer = InplaceStringToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceStringToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
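
In the example above, `format="%H:%M:%S"` is an extra keyword argument forwarded through `**kwargs`. The sketch below shows what that format string means for individual values, using the standard library as a stand-in. The `%H`, `%M`, and `%S` directives have the same meaning in Python's `strptime`, but this is an analogy and not the polars parser itself.

```python
from datetime import datetime, time

FORMAT = "%H:%M:%S"  # hours (00-23), minutes, seconds

raw = ["01:01:01", "12:00:01", "23:59:59"]
# strptime returns a datetime; .time() keeps only the time-of-day part.
parsed = [datetime.strptime(value, FORMAT).time() for value in raw]
print(parsed)
```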

grizz.transformer.InplaceStringToTimeTransformer

Bases: StringToTimeTransformer

Implement a transformer to convert some string columns to a polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Time`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `polars.Expr.str.to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToTime
>>> transformer = InplaceStringToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceStringToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.InplaceStripChars

Bases: StripCharsTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to prepare. If `None`, it processes all the columns of type string.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `strip_chars`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStripChars
>>> transformer = InplaceStripChars(columns=["col2", "col3"])
>>> transformer
InplaceStripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceStripCharsTransformer

Bases: StripCharsTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to prepare. If `None`, it processes all the columns of type string.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `strip_chars`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStripChars
>>> transformer = InplaceStripChars(columns=["col2", "col3"])
>>> transformer
InplaceStripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceToDatetime

Bases: ToDatetimeTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Datetime`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToDatetime
>>> transformer = InplaceToDatetime(columns=["col1"])
>>> transformer
InplaceToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
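
The three `missing_policy` options described in the parameter table can be sketched as follows. The helper name `check_missing_columns`, the exception type, and the message wording are assumptions made for illustration; grizz may use different ones.

```python
import warnings


def check_missing_columns(frame_columns, requested, missing_policy="raise"):
    """Return the requested columns that exist, applying the missing policy."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown missing_policy: {missing_policy!r}")
    missing = [col for col in requested if col not in frame_columns]
    if missing and missing_policy == "raise":
        # The exception type is an assumption of this sketch.
        raise KeyError(f"Missing columns: {missing}")
    if missing and missing_policy == "warn":
        warnings.warn(f"Missing columns are ignored: {missing}", stacklevel=2)
    # With 'warn' and 'ignore', the missing columns are simply skipped.
    return [col for col in requested if col in frame_columns]


found = check_missing_columns(
    ["col1", "col2"], ["col1", "col9"], missing_policy="ignore"
)
print(found)  # ['col1']
```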

grizz.transformer.InplaceToDatetimeTransformer

Bases: ToDatetimeTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Datetime`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToDatetime
>>> transformer = InplaceToDatetime(columns=["col1"])
>>> transformer
InplaceToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘

grizz.transformer.InplaceToTime

Bases: ToTimeTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Time`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToTime
>>> transformer = InplaceToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
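
How `columns` and `exclude_columns` combine, as described in the parameter table, can be sketched in a few lines. The helper `resolve_columns` is hypothetical and only illustrates the documented rules: `None` selects every column, and excluded names that do not exist are silently ignored.

```python
def resolve_columns(frame_columns, columns=None, exclude_columns=()):
    """Return the columns to process after applying the exclusion list."""
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    # Names in `excluded` that are not selected have no effect.
    return [col for col in selected if col not in excluded]


frame_columns = ["col1", "col2", "col3"]
all_but_col2 = resolve_columns(frame_columns, exclude_columns=["col2", "col9"])
only_col1 = resolve_columns(frame_columns, columns=["col1"])
print(all_but_col2)  # ['col1', 'col3']
print(only_col1)     # ['col1']
```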

grizz.transformer.InplaceToTimeTransformer

Bases: ToTimeTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert to `polars.Time`. `None` means all the columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToTime
>>> transformer = InplaceToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.IntegerCast

Bases: CastTransformer

Implement a transformer to convert integer columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ i64  ┆ str  ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2    ┆ b    ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3    ┆ c    ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4    ┆ d    ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5    ┆ e    ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┘
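
In the output above, only `col1_out` is created even though `col2` was also requested, because `col2` is a `Float64` column rather than an integer column. The selection step can be sketched in plain Python as follows. This is a minimal sketch and not the grizz implementation: the helper `select_integer_columns`, the string dtype names, and the `prefix + name + suffix` naming rule are assumptions made for illustration.

```python
# Hypothetical set of dtype names treated as integer types in this sketch.
INTEGER_DTYPES = {"Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"}


def select_integer_columns(schema: dict[str, str], columns: list[str]) -> list[str]:
    """Return the requested columns that have an integer dtype in ``schema``."""
    return [col for col in columns if schema.get(col) in INTEGER_DTYPES]


schema = {"col1": "Int64", "col2": "Float64", "col3": "Int64", "col4": "String"}
selected = select_integer_columns(schema, columns=["col1", "col2"])

# Assumed naming rule for the output columns: prefix + name + suffix.
prefix, suffix = "", "_out"
out_names = [prefix + col + suffix for col in selected]
print(out_names)
```

Only `col1` survives the filter, so a single output column is produced.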

grizz.transformer.IntegerCastTransformer ¶

Bases: CastTransformer

Implement a transformer to convert integer columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ i64  ┆ str  ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2    ┆ b    ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3    ┆ c    ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4    ┆ d    ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5    ┆ e    ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┘
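
The `exist_policy` and `missing_policy` parameters share the same three options. The plain-Python sketch below shows one way such a policy could be applied. The function `check_policy`, the exception and warning types, and the message format are hypothetical; the source does not say which types grizz uses.

```python
import warnings


def check_policy(policy: str, problem_columns: list[str], message: str) -> None:
    """Apply an 'ignore' / 'warn' / 'raise' policy to a list of problematic columns."""
    if policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect policy: {policy!r}")
    if not problem_columns or policy == "ignore":
        return  # nothing to report, or reporting is disabled
    text = f"{message}: {problem_columns}"
    if policy == "raise":
        # Assumption: a generic exception type stands in for the library's own.
        raise RuntimeError(text)
    warnings.warn(text, RuntimeWarning, stacklevel=2)


frame_columns = ["col1", "col2", "col3", "col4"]
requested = ["col1", "col5"]
missing = [col for col in requested if col not in frame_columns]

# 'ignore' is silent, 'warn' emits a warning, 'raise' stops the transformation.
check_policy("ignore", missing, "missing columns")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_policy("warn", missing, "missing columns")
print(len(caught))
```

With `'warn'` and `'ignore'` the transformation would continue; only `'raise'` interrupts it.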

grizz.transformer.JsonDecode ¶

Implement a transformer to parse string values as JSON.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to parse. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.Expr.str.json_decode`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌───────────┬──────┬─────────────────┬──────┬───────────┬─────────────────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 ┆ col1_out  ┆ col3_out        │
│ ---       ┆ ---  ┆ ---             ┆ ---  ┆ ---       ┆ ---             │
│ str       ┆ str  ┆ str             ┆ str  ┆ list[i64] ┆ list[str]       │
╞═══════════╪══════╪═════════════════╪══════╪═══════════╪═════════════════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    ┆ [1, 2]    ┆ ["1", "2"]      │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    ┆ [2]       ┆ ["2"]           │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    ┆ [1, 2, 3] ┆ ["1", "2", "3"] │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    ┆ [4, 5]    ┆ ["4", "5"]      │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    ┆ [5, 4]    ┆ ["5", "4"]      │
└───────────┴──────┴─────────────────┴──────┴───────────┴─────────────────┘
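
The per-value effect on `col1` can be approximated with the standard-library `json` module. This is only an analogy for what `polars.Expr.str.json_decode` does on a whole column. Note that `json.loads` follows strict JSON and rejects the single-quoted strings used in `col3`.

```python
import json

col1 = ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"]

# Parse each string value as JSON, producing one Python list per row.
col1_out = [json.loads(value) for value in col1]
print(col1_out)

# Strict JSON requires double quotes around strings.
try:
    json.loads("['1', '2']")
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False
print(strict_ok)
```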

grizz.transformer.JsonDecodeTransformer ¶

Implement a transformer to parse string values as JSON.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to parse. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.Expr.str.json_decode`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌───────────┬──────┬─────────────────┬──────┬───────────┬─────────────────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 ┆ col1_out  ┆ col3_out        │
│ ---       ┆ ---  ┆ ---             ┆ ---  ┆ ---       ┆ ---             │
│ str       ┆ str  ┆ str             ┆ str  ┆ list[i64] ┆ list[str]       │
╞═══════════╪══════╪═════════════════╪══════╪═══════════╪═════════════════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    ┆ [1, 2]    ┆ ["1", "2"]      │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    ┆ [2]       ┆ ["2"]           │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    ┆ [1, 2, 3] ┆ ["1", "2", "3"] │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    ┆ [4, 5]    ┆ ["4", "5"]      │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    ┆ [5, 4]    ┆ ["5", "4"]      │
└───────────┴──────┴─────────────────┴──────┴───────────┴─────────────────┘

grizz.transformer.LabelEncoder ¶

Implement a transformer to encode the labels in a given column.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name, i.e. the column with the labels to encode.	required
`out_col`	`str`	The output column name, i.e. the column with the encoded labels.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LabelEncoder
>>> transformer = LabelEncoder(in_col="col1", out_col="out")
>>> transformer
LabelEncoderTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ i64 │
╞══════╪══════╪═════╡
│ a    ┆ 1    ┆ 0   │
│ b    ┆ 2    ┆ 1   │
│ c    ┆ 3    ┆ 2   │
│ d    ┆ 4    ┆ 3   │
│ e    ┆ 5    ┆ 4   │
└──────┴──────┴─────┘
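
The example uses `fit_transform` because the encoder first has to learn which labels exist before it can replace them with integers. A minimal plain-Python sketch of these two steps follows. The functions `fit_label_mapping` and `encode_labels` are hypothetical, and numbering the labels in sorted order is an assumption that happens to match the output above; the source does not state how grizz orders the labels.

```python
def fit_label_mapping(values: list[str]) -> dict[str, int]:
    """Learn a label -> integer mapping (assumption: sorted order of unique labels)."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


def encode_labels(values: list[str], mapping: dict[str, int]) -> list[int]:
    """Replace each label by its integer code."""
    return [mapping[label] for label in values]


col1 = ["a", "b", "c", "d", "e"]
mapping = fit_label_mapping(col1)  # "fit" step
out = encode_labels(col1, mapping)  # "transform" step
print(out)
```

Once fitted, the same mapping can be reused on new data that contains only known labels, for example `encode_labels(["c", "a"], mapping)`.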

grizz.transformer.LabelEncoderTransformer ¶

Implement a transformer to encode the labels in a given column.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name, i.e. the column with the labels to encode.	required
`out_col`	`str`	The output column name, i.e. the column with the encoded labels.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LabelEncoder
>>> transformer = LabelEncoder(in_col="col1", out_col="out")
>>> transformer
LabelEncoderTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ i64 │
╞══════╪══════╪═════╡
│ a    ┆ 1    ┆ 0   │
│ b    ┆ 2    ┆ 1   │
│ c    ┆ 3    ┆ 2   │
│ d    ┆ 4    ┆ 3   │
│ e    ┆ 5    ┆ 4   │
└──────┴──────┴─────┘

grizz.transformer.Lower ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than operation (`column < target`).

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Lower
>>> transformer = Lower(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
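
The boolean columns above correspond to an element-wise comparison of each value with `target`. A plain-Python sketch of the same computation, using lists in place of polars columns:

```python
col1 = [1, 2, 3, 4, 5]
col3 = [10, 20, 30, 40, 50]
target = 4.2

# Element-wise "lower than" comparison against the target value.
col1_out = [value < target for value in col1]
col3_out = [value < target for value in col3]
print(col1_out)
print(col3_out)
```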

grizz.transformer.LowerEqual ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than or equal operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LowerEqual
>>> transformer = LowerEqual(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
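
With `target=4.2`, no value in the example equals the target, so a strict and a non-strict comparison give the same result. The two operations differ only when a value is equal to the target, as this plain-Python sketch with a hypothetical `target=4` illustrates:

```python
col1 = [1, 2, 3, 4, 5]
target = 4  # hypothetical target chosen to coincide with a value in the column

lower = [value < target for value in col1]  # strict comparison
lower_equal = [value <= target for value in col1]  # non-strict comparison
print(lower)
print(lower_equal)
```

The row whose value is `4` is `False` for the strict comparison and `True` for the non-strict one.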

grizz.transformer.LowerEqualTransformer ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than or equal operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LowerEqual
>>> transformer = LowerEqual(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.LowerTransformer ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than operation (`column < target`).

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Lower
>>> transformer = Lower(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.MaxAbsScaler ¶

Implement a transformer to scale columns by the maximum absolute value of each column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxAbsScaler
>>> transformer = MaxAbsScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MaxAbsScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
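
Each output value above is the input value divided by the largest absolute value in its column (5 for `col1`, 50 for `col3`). The plain-Python sketch below reproduces that arithmetic. The function `max_abs_scale` is hypothetical, and both skipping `None` values during the fit and the handling of `propagate_nulls` are assumptions based on the parameter description, not the grizz implementation.

```python
import math


def max_abs_scale(values, propagate_nulls=True):
    """Scale values by the maximum absolute value, skipping None when fitting."""
    # "fit" step: maximum absolute value over the non-null entries
    scale = max(abs(v) for v in values if v is not None)
    # "transform" step: divide each value by the learned scale
    fill = None if propagate_nulls else math.nan
    return [fill if v is None else v / scale for v in values]


print(max_abs_scale([1, 2, 3, 4, 5]))
print(max_abs_scale([10, None, -50]))
```

Because the scale is an absolute value, negative inputs keep their sign and the scaled values lie in the range [-1, 1].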

grizz.transformer.MaxAbsScalerTransformer ¶

Implement a transformer to scale columns by the maximum absolute value of each column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxAbsScaler
>>> transformer = MaxAbsScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MaxAbsScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
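
The scaled values in the example above can be checked by hand. The following is a minimal pure-Python sketch of max-abs scaling, not the grizz implementation. The helper name `max_abs_scale` is hypothetical, and keeping `None` entries as `None` is an assumption meant to mirror `propagate_nulls=True`.

```python
def max_abs_scale(values):
    """Divide each value by the largest absolute value in the column."""
    # None entries are skipped when computing the scale (assumption).
    scale = max((abs(v) for v in values if v is not None), default=0.0)
    if scale == 0:
        # Avoid a division by zero for all-zero or all-None columns.
        scale = 1.0
    return [None if v is None else v / scale for v in values]


col1_out = max_abs_scale([1, 2, 3, 4, 5])
col3_out = max_abs_scale([10, 20, 30, 40, 50])
print(col1_out)
print(col3_out)
```

Both columns map to the same scaled values because `col3` is `col1` multiplied by a constant.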

grizz.transformer.MaxHorizontal ¶

Implement a transformer to get the maximum value horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns over which to compute the maximum value horizontally. The columns should be compatible. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column name.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxHorizontal
>>> transformer = MaxHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MaxHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 9   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 5   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 8   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 9   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 9   │
└──────┴──────┴──────┴──────┴─────┘
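
To make "horizontally" concrete: the maximum is taken across the selected columns within each row, not down a column. The sketch below reproduces the `col` values from the example with plain Python lists. The helper `max_horizontal_rows` is a hypothetical name used only for illustration.

```python
def max_horizontal_rows(frame, columns):
    """Return the row-wise maximum across the given columns."""
    # zip(*...) walks the selected columns in lockstep, one row at a time.
    return [max(row) for row in zip(*(frame[name] for name in columns))]


frame = {
    "col1": [9, 5, 4, 9, 6],
    "col2": [8, 0, 1, 8, 9],
    "col3": [0, 4, 8, 7, 0],
}
col = max_horizontal_rows(frame, ["col1", "col2", "col3"])
print(col)
```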

grizz.transformer.MaxHorizontalTransformer ¶

Implement a transformer to get the maximum value horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns over which to compute the maximum value horizontally. The columns should be compatible. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column name.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxHorizontal
>>> transformer = MaxHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MaxHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 9   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 5   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 8   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 9   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 9   │
└──────┴──────┴──────┴──────┴─────┘
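
The three `exist_policy` options described in the parameter table can be pictured as a small decision function. This is a sketch under stated assumptions, not the library's code: `check_existing` is a hypothetical helper, and `ValueError` and `RuntimeWarning` are stand-ins for whatever exception and warning classes the library actually uses.

```python
import warnings


def check_existing(frame_columns, out_columns, exist_policy="raise"):
    """Return the output columns that would overwrite existing ones."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"unknown exist_policy: {exist_policy!r}")
    existing = [name for name in out_columns if name in frame_columns]
    if existing and exist_policy == "raise":
        # Stand-in exception type (assumption).
        raise ValueError(f"columns already exist: {existing}")
    if existing and exist_policy == "warn":
        # Stand-in warning category (assumption).
        warnings.warn(
            f"overwriting existing columns: {existing}",
            RuntimeWarning,
            stacklevel=2,
        )
    # With 'ignore', or after warning, the columns are simply overwritten.
    return existing


overwritten = check_existing(["col1", "col2", "col"], ["col"], exist_policy="ignore")
print(overwritten)
```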

grizz.transformer.MeanHorizontal ¶

Implement a transformer to get the mean value horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns over which to compute the mean value horizontally. The columns should be compatible. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column name.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.mean_horizontal`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MeanHorizontal
>>> transformer = MeanHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MeanHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 21.0 │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 22.0 │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 23.0 │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 24.0 │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 25.0 │
└──────┴──────┴──────┴──────┴──────┘
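
The `f64` output type in the example follows from the arithmetic: each row's values are summed and divided by the number of selected columns. A minimal sketch with plain Python lists, using the hypothetical helper name `mean_horizontal_rows`:

```python
def mean_horizontal_rows(frame, columns):
    """Return the row-wise mean across the given columns."""
    rows = zip(*(frame[name] for name in columns))
    # True division always yields a float, even for integer inputs.
    return [sum(row) / len(row) for row in rows]


frame = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
}
col = mean_horizontal_rows(frame, ["col1", "col2", "col3"])
print(col)
```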

grizz.transformer.MeanHorizontalTransformer ¶

Implement a transformer to get the mean value horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns over which to compute the mean value horizontally. The columns should be compatible. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column name.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.mean_horizontal`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MeanHorizontal
>>> transformer = MeanHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MeanHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 21.0 │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 22.0 │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 23.0 │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 24.0 │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 25.0 │
└──────┴──────┴──────┴──────┴──────┘
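
Unlike `exist_policy`, which governs overwriting, `missing_policy` decides what happens when a requested input column is absent from the frame. The sketch below illustrates the documented behaviour only. `keep_present` is a hypothetical helper, and `KeyError` and `RuntimeWarning` are stand-ins for the library's own exception and warning classes.

```python
import warnings


def keep_present(frame_columns, columns, missing_policy="raise"):
    """Return the requested columns that are present in the frame."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"unknown missing_policy: {missing_policy!r}")
    missing = [name for name in columns if name not in frame_columns]
    if missing and missing_policy == "raise":
        # Stand-in exception type (assumption).
        raise KeyError(f"missing columns: {missing}")
    if missing and missing_policy == "warn":
        # Stand-in warning category (assumption).
        warnings.warn(f"ignoring missing columns: {missing}", RuntimeWarning, stacklevel=2)
    # Missing columns are dropped from the selection; the rest are processed.
    return [name for name in columns if name in frame_columns]


kept = keep_present(
    ["col1", "col2", "col4"], ["col1", "col2", "col3"], missing_policy="ignore"
)
print(kept)
```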

grizz.transformer.MinHorizontal ¶

Implement a transformer to get the minimum value horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns over which to compute the minimum value horizontally. The columns should be compatible. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column name.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinHorizontal
>>> transformer = MinHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MinHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 0   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 0   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 1   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 7   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 0   │
└──────┴──────┴──────┴──────┴─────┘
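
The row-wise minimum can be sketched the same way with plain Python lists. The helper `min_horizontal_rows` is hypothetical. Skipping `None` entries within a row, and returning `None` only when every entry is missing, is an assumption about null handling that the text above does not state.

```python
def min_horizontal_rows(frame, columns):
    """Return the row-wise minimum across the given columns."""
    result = []
    for row in zip(*(frame[name] for name in columns)):
        # Assumption: missing values are skipped rather than propagated.
        present = [value for value in row if value is not None]
        result.append(min(present) if present else None)
    return result


frame = {
    "col1": [9, 5, 4, 9, 6],
    "col2": [8, 0, 1, 8, 9],
    "col3": [0, 4, 8, 7, 0],
}
col = min_horizontal_rows(frame, ["col1", "col2", "col3"])
print(col)
```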

grizz.transformer.MinHorizontalTransformer ¶

Implement a transformer to get the minimum value horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns over which to compute the minimum value horizontally. The columns should be compatible. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column name.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinHorizontal
>>> transformer = MinHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MinHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 0   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 0   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 1   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 7   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 0   │
└──────┴──────┴──────┴──────┴─────┘

grizz.transformer.MinMaxScaler ¶

Implement a transformer to scale each column to a given range.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.MinMaxScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinMaxScaler
>>> transformer = MinMaxScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MinMaxScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
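
The "given range" is easiest to see in the formula itself. The sketch below follows the min-max formula documented for `sklearn.preprocessing.MinMaxScaler`, whose `feature_range` argument defaults to `(0, 1)`. The helper `min_max_scale` is a hypothetical pure-Python stand-in, not the code the transformer runs.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> low and max -> high."""
    low, high = feature_range
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # Constant column: avoid a division by zero.
        span = 1.0
    return [(v - vmin) / span * (high - low) + low for v in values]


col1_out = min_max_scale([0, 1, 2, 3, 4, 5])
col3_out = min_max_scale([0, 10, 20, 30, 40, 50])
centered = min_max_scale([0, 5, 10], feature_range=(-1.0, 1.0))
print(col1_out)
print(centered)
```

Since the transformer forwards `**kwargs` to `sklearn.preprocessing.MinMaxScaler`, a different target range would presumably be requested through `feature_range`.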

grizz.transformer.MinMaxScalerTransformer ¶

Implement a transformer to scale each column to a given range.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.MinMaxScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinMaxScaler
>>> transformer = MinMaxScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MinMaxScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
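
Two bookkeeping steps are shared by the scaling transformers: choosing which columns to process from `columns` and `exclude_columns`, and naming the outputs from `prefix` and `suffix`. The sketch below shows one plausible reading of those parameters. Both helper names are hypothetical, and the naming rule `prefix + name + suffix` is inferred from the example output (`col1_out`, `col3_out`), not stated in the parameter table.

```python
def select_columns(frame_columns, columns=None, exclude_columns=()):
    """Resolve the columns to process."""
    # None means "all the columns of the frame".
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    # Excluded names that do not exist are silently ignored.
    return tuple(name for name in selected if name not in excluded)


def output_names(columns, prefix, suffix):
    """Build output column names (assumed rule: prefix + name + suffix)."""
    return [f"{prefix}{name}{suffix}" for name in columns]


frame_columns = ["col1", "col2", "col3", "col4"]
selected = select_columns(frame_columns, columns=None, exclude_columns=["col2", "col4", "nope"])
names = output_names(selected, prefix="", suffix="_out")
print(selected)
print(names)
```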

grizz.transformer.Normalizer ¶

Implement a transformer to normalize data points individually to unit norm.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.Normalizer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Normalizer
>>> transformer = Normalizer(columns=["col1", "col3"], prefix="", suffix="_norm")
>>> transformer
NormalizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_norm')
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_norm ┆ col3_norm │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0.0       ┆ 1.0       │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0.242536  ┆ 0.970143  │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 0.5547    ┆ 0.83205   │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 0.83205   ┆ 0.5547    │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 0.970143  ┆ 0.242536  │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1.0       ┆ 0.0       │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
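
"Individually" means each row is rescaled by its own norm, computed across the selected columns, which is why the two output columns in the example depend on each other. The sketch below reproduces those numbers assuming the L2 norm, the default of `sklearn.preprocessing.Normalizer`. The helper name `l2_normalize_rows` is hypothetical.

```python
import math


def l2_normalize_rows(rows):
    """Scale each row so that its Euclidean (L2) norm equals 1."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(value * value for value in row))
        if norm == 0:
            # All-zero rows are left unchanged.
            norm = 1.0
        normalized.append(tuple(value / norm for value in row))
    return normalized


col1 = [0, 1, 2, 3, 4, 5]
col3 = [5, 4, 3, 2, 1, 0]
normalized = l2_normalize_rows(list(zip(col1, col3)))
for row in normalized:
    print(row)
```

For the second row, the norm is sqrt(1² + 4²) = sqrt(17), so the outputs are 1/sqrt(17) ≈ 0.242536 and 4/sqrt(17) ≈ 0.970143, matching the table.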

grizz.transformer.NormalizerTransformer ¶

Implement a transformer to normalize data points individually to unit norm.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.Normalizer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Normalizer
>>> transformer = Normalizer(columns=["col1", "col3"], prefix="", suffix="_norm")
>>> transformer
NormalizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_norm')
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_norm ┆ col3_norm │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0.0       ┆ 1.0       │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0.242536  ┆ 0.970143  │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 0.5547    ┆ 0.83205   │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 0.83205   ┆ 0.5547    │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 0.970143  ┆ 0.242536  │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1.0       ┆ 0.0       │
└──────┴──────┴──────┴──────┴───────────┴───────────┘

grizz.transformer.NotEqual ¶

Bases: BaseComparatorTransformer

Implement a transformer that computes the element-wise not-equal comparison between each column and a target value.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqual
>>> transformer = NotEqual(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
NotEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
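
The `exist_policy` and `missing_policy` parameters described in the table above share the same three options. The sketch below shows one way such a policy check could behave. It is only an illustration of the documented behavior: the helper names (`apply_policy`, `check_existing_columns`, `check_missing_columns`) and the use of `RuntimeError` and `RuntimeWarning` are assumptions, not the grizz implementation.

```python
import warnings


def apply_policy(policy: str, message: str) -> None:
    """Raise, warn, or stay silent depending on the policy (hypothetical helper)."""
    if policy == "raise":
        raise RuntimeError(message)  # exception type is an assumption
    if policy == "warn":
        warnings.warn(message, RuntimeWarning, stacklevel=2)
    elif policy != "ignore":
        raise ValueError(f"unknown policy: {policy!r}")


def check_existing_columns(frame_columns, out_columns, exist_policy="raise"):
    """Return the output columns that already exist in the frame."""
    existing = [c for c in out_columns if c in frame_columns]
    if existing:
        apply_policy(exist_policy, f"columns already exist: {existing}")
    return existing  # with 'warn' or 'ignore', these would be overwritten


def check_missing_columns(frame_columns, in_columns, missing_policy="raise"):
    """Return the input columns that are present, i.e. the ones to process."""
    missing = [c for c in in_columns if c not in frame_columns]
    if missing:
        apply_policy(missing_policy, f"columns are missing: {missing}")
    return [c for c in in_columns if c in frame_columns]


frame_columns = ["col1", "col2", "col3", "col4"]
# With 'ignore', the missing column is skipped silently.
print(check_missing_columns(frame_columns, ["col1", "col5"], "ignore"))
```

With the default `'raise'` policy, the same call would stop with an exception instead of skipping `col5`.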

grizz.transformer.NotEqualMissing ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the not equal operation where null values are not propagated.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqualMissing
>>> transformer = NotEqualMissing(
...     columns=["col1", "col3"], target=3, prefix="", suffix="_out"
... )
>>> transformer
NotEqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
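
The example above contains no null values, so it produces the same output as `NotEqual`. The two transformers differ only when a column contains nulls. The plain-Python sketch below models a null as `None` and assumes the transformers follow the semantics of the polars `ne` and `ne_missing` expressions; the function names are hypothetical.

```python
def not_equal(value, target):
    """Null-propagating comparison: a null on either side gives a null result."""
    if value is None or target is None:
        return None
    return value != target


def not_equal_missing(value, target):
    """Null-aware comparison: None is treated as an ordinary value,
    so the result is always a boolean."""
    return value != target


values = [1, None, 3]
print([not_equal(v, 3) for v in values])          # [True, None, False]
print([not_equal_missing(v, 3) for v in values])  # [True, True, False]
```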

grizz.transformer.NotEqualMissingTransformer ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the not equal operation where null values are not propagated.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqualMissing
>>> transformer = NotEqualMissing(
...     columns=["col1", "col3"], target=3, prefix="", suffix="_out"
... )
>>> transformer
NotEqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.NotEqualTransformer ¶

Bases: BaseComparatorTransformer

Implements a transformer that computes the not equal operation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to compare. `None` means all the columns.	required
`target`	`Any`	The target value to compare with.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqual
>>> transformer = NotEqual(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
NotEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
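
The `columns`, `exclude_columns`, `prefix`, and `suffix` parameters together determine which columns are processed and how the output columns are named. The sketch below reproduces the naming seen in the example (`col1` becomes `col1_out`). The helper names `select_columns` and `output_name` are hypothetical, and the logic is inferred from the parameter descriptions rather than taken from the grizz source.

```python
def select_columns(frame_columns, columns=None, exclude_columns=()):
    """Resolve the columns to process.

    ``columns=None`` means all the frame columns; names listed in
    ``exclude_columns`` are dropped, and unknown names are ignored.
    """
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    return [c for c in selected if c not in excluded]


def output_name(column, prefix, suffix):
    """Build the output column name from the input column name."""
    return f"{prefix}{column}{suffix}"


frame_columns = ["col1", "col2", "col3", "col4"]
selected = select_columns(frame_columns, columns=["col1", "col3"])
print([output_name(c, prefix="", suffix="_out") for c in selected])
```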

grizz.transformer.NumericCast ¶

Bases: CastTransformer

Implement a transformer to convert numeric columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NumericCast
>>> transformer = NumericCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
NumericCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f32  ┆ f64  ┆ str  ┆ f32      ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1.0      ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2.0      ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3.0      ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4.0      ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.NumericCastTransformer ¶

Bases: CastTransformer

Implement a transformer to convert numeric columns to a new data type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to convert. `None` means all the columns.	required
`dtype`	`type[DataType]`	The target data type.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `cast`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NumericCast
>>> transformer = NumericCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
NumericCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f32  ┆ f64  ┆ str  ┆ f32      ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1.0      ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2.0      ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3.0      ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4.0      ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.OrdinalEncoder ¶

Implement a transformer to convert each column to ordinal integers.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to encode. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.OrdinalEncoder`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import OrdinalEncoder
>>> transformer = OrdinalEncoder(columns=["col1", "col2"], prefix="", suffix="_out")
>>> transformer
OrdinalEncoderTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["a", "b", "c", "d", "e", "f"],
...         "col3": [0, 10, 20, 30, 40, 50],
...     }
... )
>>> frame
shape: (6, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 0    ┆ a    ┆ 0    │
│ 1    ┆ b    ┆ 10   │
│ 2    ┆ c    ┆ 20   │
│ 3    ┆ d    ┆ 30   │
│ 4    ┆ e    ┆ 40   │
│ 5    ┆ f    ┆ 50   │
└──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 5)
┌──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ a    ┆ 0    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ b    ┆ 10   ┆ 1.0      ┆ 1.0      │
│ 2    ┆ c    ┆ 20   ┆ 2.0      ┆ 2.0      │
│ 3    ┆ d    ┆ 30   ┆ 3.0      ┆ 3.0      │
│ 4    ┆ e    ┆ 40   ┆ 4.0      ┆ 4.0      │
│ 5    ┆ f    ┆ 50   ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────────┴──────────┘
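
The output above matches the default behavior of `sklearn.preprocessing.OrdinalEncoder`, which sorts the unique values of each column and returns their positions as floats. The plain-Python sketch below illustrates that mapping for a single column. The function names are hypothetical, and null handling (`propagate_nulls`) is left out.

```python
def fit_ordinal(values):
    """Learn a mapping from each unique value to its rank in sorted order."""
    categories = sorted(set(values))
    return {category: float(index) for index, category in enumerate(categories)}


def transform_ordinal(values, mapping):
    """Replace each value by its learned ordinal code."""
    return [mapping[v] for v in values]


col2 = ["a", "b", "c", "d", "e", "f"]
mapping = fit_ordinal(col2)
print(transform_ordinal(col2, mapping))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

Because the mapping is learned during the fit step, the example calls `fit_transform` rather than `transform`.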

grizz.transformer.OrdinalEncoderTransformer ¶

Implement a transformer to convert each column to ordinal integers.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to encode. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.OrdinalEncoder`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import OrdinalEncoder
>>> transformer = OrdinalEncoder(columns=["col1", "col2"], prefix="", suffix="_out")
>>> transformer
OrdinalEncoderTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["a", "b", "c", "d", "e", "f"],
...         "col3": [0, 10, 20, 30, 40, 50],
...     }
... )
>>> frame
shape: (6, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 0    ┆ a    ┆ 0    │
│ 1    ┆ b    ┆ 10   │
│ 2    ┆ c    ┆ 20   │
│ 3    ┆ d    ┆ 30   │
│ 4    ┆ e    ┆ 40   │
│ 5    ┆ f    ┆ 50   │
└──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 5)
┌──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ a    ┆ 0    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ b    ┆ 10   ┆ 1.0      ┆ 1.0      │
│ 2    ┆ c    ┆ 20   ┆ 2.0      ┆ 2.0      │
│ 3    ┆ d    ┆ 30   ┆ 3.0      ┆ 3.0      │
│ 4    ┆ e    ┆ 40   ┆ 4.0      ┆ 4.0      │
│ 5    ┆ f    ┆ 50   ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.PowerTransformer ¶

Implement a transformer to apply a power transform featurewise to make data more Gaussian-like.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.PowerTransformer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import PowerTransformer
>>> transformer = PowerTransformer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
PowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out  ┆ col3_out  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ -1.567837 ┆ -1.695398 │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ -0.836194 ┆ -0.740367 │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ -0.210053 ┆ -0.117399 │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.356111  ┆ 0.402585  │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.881486  ┆ 0.864187  │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.376486  ┆ 1.286392  │
└──────┴──────┴──────┴──────┴───────────┴───────────┘

grizz.transformer.QuantileTransformer ¶

Implement a transformer to apply the quantile transformation.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.QuantileTransformer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import QuantileTransformer
>>> transformer = QuantileTransformer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
QuantileTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
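
In the output above, each value is mapped to its position in the sorted column, scaled to the range [0, 1]. The sketch below reproduces these numbers for a column of distinct values. It is a simplification: it assumes a uniform output distribution and one quantile per sample, and it does not interpolate between quantiles as `sklearn.preprocessing.QuantileTransformer` does for unseen values. The function name is hypothetical.

```python
def empirical_quantiles(values):
    """Map each distinct value to rank / (n - 1), where rank is its
    zero-based position in sorted order. Requires at least two values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered.index(v) / last for v in values]


col3 = [0, 10, 20, 30, 40, 50]
print(empirical_quantiles(col3))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```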

grizz.transformer.Replace ¶

Replace the values in a column by the values in a mapping.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3})
>>> transformer
ReplaceTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
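
The two outputs above differ only in how values without a mapping entry are handled. The following is a minimal plain-Python sketch of that behavior, not the grizz implementation; the function name `replace_values` and the `_KEEP` sentinel are hypothetical. Unlike a polars column, a Python list has no single dtype, so the replaced values stay integers here, whereas the first output above shows them in a `str` column.

```python
_KEEP = object()  # hypothetical sentinel meaning "no default was given"


def replace_values(values, mapping, default=_KEEP):
    """Map each value through ``mapping``; treat unmapped values per ``default``."""
    out = []
    for value in values:
        if value in mapping:
            out.append(mapping[value])
        elif default is _KEEP:
            out.append(value)  # no default: keep the original value
        else:
            out.append(default)  # default given: use it for unmapped values
    return out


old = ["a", "b", "c", "d", "e"]
mapping = {"a": 1, "b": 2, "c": 3}
kept = replace_values(old, mapping)
defaulted = replace_values(old, mapping, default=None)
```

Here `kept` retains `"d"` and `"e"`, while `defaulted` holds `None` in those positions, mirroring the `null` entries in the second output.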

grizz.transformer.ReplaceStrict

Replace the values in a column with the values in a mapping.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace_strict`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
...     in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
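
The examples above always supply either a complete mapping or a `default`. The sketch below illustrates the strict variant in plain Python, under the assumption that `ReplaceStrict` follows polars' `replace_strict`, which raises an error when a value has no replacement and no default is given. The function name, the sentinel, and the use of `ValueError` are hypothetical stand-ins, not the library's API.

```python
_MISSING = object()  # hypothetical sentinel meaning "no default was given"


def replace_strict_values(values, mapping, default=_MISSING):
    """Map every value through ``mapping``; fail on unmapped values without a default."""
    out = []
    for value in values:
        if value in mapping:
            out.append(mapping[value])
        elif default is _MISSING:
            # Stand-in error type; the real library may raise something else.
            raise ValueError(f"no replacement defined for value {value!r}")
        else:
            out.append(default)
    return out


old = ["a", "b", "c", "d", "e"]
full = replace_strict_values(old, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
partial = replace_strict_values(old, {"a": 1, "b": 2, "c": 3}, default=None)
```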

grizz.transformer.ReplaceStrictTransformer

Replace the values in a column with the values in a mapping.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace_strict`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
...     in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.ReplaceTransformer

Replace the values in a column with the values in a mapping.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column name.	required
`out_col`	`str`	The output column name.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `replace`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3})
>>> transformer
ReplaceTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
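
The three `exist_policy` options listed in the parameter table can be pictured with a small plain-Python sketch. It is an illustration of the documented behavior only: the function name `check_existing_columns` and the choice of `RuntimeError` are hypothetical, and the exception and warning types used by the library may differ. `missing_policy` can be read the same way, with "missing" in place of "already exists".

```python
import warnings


def check_existing_columns(new_columns, frame_columns, exist_policy="raise"):
    """Return the new columns that already exist, applying ``exist_policy``."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"unknown exist_policy: {exist_policy!r}")
    existing = [col for col in new_columns if col in frame_columns]
    if existing and exist_policy == "raise":
        # Stand-in exception type for the sketch.
        raise RuntimeError(f"columns already exist: {existing}")
    if existing and exist_policy == "warn":
        warnings.warn(
            f"columns already exist and will be overwritten: {existing}",
            stacklevel=2,
        )
    # With 'ignore' (or after warning) the caller overwrites these columns.
    return existing


overwritten = check_existing_columns(["new"], ["old", "new"], exist_policy="ignore")
```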

grizz.transformer.RobustScaler

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.RobustScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import RobustScaler
>>> transformer = RobustScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
RobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ -1.0     ┆ -1.0     │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ -0.6     ┆ -0.6     │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ -0.2     ┆ -0.2     │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.2      ┆ 0.2      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.6      ┆ 0.6      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
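
The scaled values above are consistent with centering on the median and dividing by the interquartile range, which is what `sklearn.preprocessing.RobustScaler` does with its default settings. The sketch below reproduces the numbers with the standard library only; it assumes the default 25th to 75th percentile range with linear interpolation, and the function name `robust_scale` is hypothetical.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR using linearly interpolated quartiles."""
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(value - median) / iqr for value in values]


col1_out = robust_scale([0, 1, 2, 3, 4, 5])
col3_out = robust_scale([0, 10, 20, 30, 40, 50])
```

For `col1` the median is 2.5 and the interquartile range is 3.75 - 1.25 = 2.5, so 0 maps to -1.0 and 5 maps to 1.0. `col3` is `col1` multiplied by 10, so it yields the same scaled values.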

grizz.transformer.RobustScalerTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.RobustScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import RobustScaler
>>> transformer = RobustScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
RobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ -1.0     ┆ -1.0     │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ -0.6     ┆ -0.6     │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ -0.2     ┆ -0.2     │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.2      ┆ 0.2      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.6      ┆ 0.6      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.Sequential

Bases: BaseTransformer

Implement a polars.DataFrame transformer that applies several transformers sequentially.

Parameters:

Name	Type	Description	Default
`transformers`	`Sequence[BaseTransformer \| dict]`	The transformers or their configurations.	required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sequential, InplaceCast
>>> transformer = Sequential(
...     [
...         InplaceCast(columns=["col1"], dtype=pl.Float32),
...         InplaceCast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialTransformer(
  (0): InplaceCastTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', dtype=Float32)
  (1): InplaceCastTransformer(columns=('col2',), exclude_columns=(), missing_policy='raise', dtype=Int64)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
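
Conceptually, applying transformers sequentially means feeding the output of each step into the next one. The following plain-Python sketch shows that chaining pattern on a dictionary of lists; it is not the grizz implementation, and `cast_column` and `apply_sequentially` are hypothetical helper names.

```python
from functools import reduce


def cast_column(name, caster):
    """Build a step that returns a copy of the frame with one column cast."""

    def step(frame):
        return {**frame, name: [caster(value) for value in frame[name]]}

    return step


def apply_sequentially(frame, steps):
    """Apply each step to the result of the previous one, in order."""
    return reduce(lambda current, step: step(current), steps, frame)


frame = {"col1": [1, 2, 3, 4, 5], "col2": ["1", "2", "3", "4", "5"]}
steps = [cast_column("col1", float), cast_column("col2", int)]
out = apply_sequentially(frame, steps)
```

Each step returns a new dictionary, so the input `frame` is left unchanged.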

grizz.transformer.SequentialTransformer

Bases: BaseTransformer

Implement a polars.DataFrame transformer that applies several transformers sequentially.

Parameters:

Name	Type	Description	Default
`transformers`	`Sequence[BaseTransformer \| dict]`	The transformers or their configurations.	required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sequential, InplaceCast
>>> transformer = Sequential(
...     [
...         InplaceCast(columns=["col1"], dtype=pl.Float32),
...         InplaceCast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialTransformer(
  (0): InplaceCastTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', dtype=Float32)
  (1): InplaceCastTransformer(columns=('col2',), exclude_columns=(), missing_policy='raise', dtype=Int64)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘

grizz.transformer.ShrinkMemory

Bases: BaseArgTransformer

Implement a transformer that shrinks DataFrame memory usage.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ShrinkMemory
>>> transformer = ShrinkMemory()
>>> transformer
ShrinkMemoryTransformer()
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.ShrinkMemoryTransformer

Bases: BaseArgTransformer

Implement a transformer that shrinks DataFrame memory usage.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ShrinkMemory
>>> transformer = ShrinkMemory()
>>> transformer
ShrinkMemoryTransformer()
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.SimpleImputer

Implement a transformer to impute missing values with simple strategies.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to impute. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.impute.SimpleImputer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SimpleImputer
>>> transformer = SimpleImputer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
SimpleImputerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, None, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [float("nan"), 10, 20, 30, 40, None],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    │
│ 1    ┆ 1    ┆ 10.0 ┆ b    │
│ null ┆ 2    ┆ 20.0 ┆ c    │
│ 3    ┆ 3    ┆ 30.0 ┆ d    │
│ 4    ┆ 4    ┆ 40.0 ┆ e    │
│ 5    ┆ 5    ┆ null ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ f64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    ┆ 0.0      ┆ 25.0     │
│ 1    ┆ 1    ┆ 10.0 ┆ b    ┆ 1.0      ┆ 10.0     │
│ null ┆ 2    ┆ 20.0 ┆ c    ┆ null     ┆ 20.0     │
│ 3    ┆ 3    ┆ 30.0 ┆ d    ┆ 3.0      ┆ 30.0     │
│ 4    ┆ 4    ┆ 40.0 ┆ e    ┆ 4.0      ┆ 40.0     │
│ 5    ┆ 5    ┆ null ┆ f    ┆ 5.0      ┆ null     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
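
In the output above, the NaN in `col3` becomes 25.0 (the mean of 10, 20, 30 and 40), while the null entries stay null. The sketch below reproduces that result in plain Python. It assumes the default `sklearn.impute.SimpleImputer` strategy (mean imputation of NaN values) together with `propagate_nulls=True`; the function name `impute_mean` is hypothetical, and `None` stands in for a polars null.

```python
import math
import statistics


def impute_mean(values):
    """Replace NaN by the mean of the observed values; keep None (null) as is."""
    observed = [
        float(value)
        for value in values
        if value is not None and not math.isnan(value)
    ]
    mean = statistics.fmean(observed)
    out = []
    for value in values:
        if value is None:
            out.append(None)  # nulls are propagated, not imputed
        elif math.isnan(value):
            out.append(mean)  # NaN is the missing-value marker being imputed
        else:
            out.append(float(value))
    return out


col1_out = impute_mean([0, 1, None, 3, 4, 5])
col3_out = impute_mean([float("nan"), 10, 20, 30, 40, None])
```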

grizz.transformer.SimpleImputerTransformer

Implement a transformer to impute missing values with simple strategies.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to impute. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.impute.SimpleImputer`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SimpleImputer
>>> transformer = SimpleImputer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
SimpleImputerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, None, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [float("nan"), 10, 20, 30, 40, None],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    │
│ 1    ┆ 1    ┆ 10.0 ┆ b    │
│ null ┆ 2    ┆ 20.0 ┆ c    │
│ 3    ┆ 3    ┆ 30.0 ┆ d    │
│ 4    ┆ 4    ┆ 40.0 ┆ e    │
│ 5    ┆ 5    ┆ null ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ f64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    ┆ 0.0      ┆ 25.0     │
│ 1    ┆ 1    ┆ 10.0 ┆ b    ┆ 1.0      ┆ 10.0     │
│ null ┆ 2    ┆ 20.0 ┆ c    ┆ null     ┆ 20.0     │
│ 3    ┆ 3    ┆ 30.0 ┆ d    ┆ 3.0      ┆ 30.0     │
│ 4    ┆ 4    ┆ 40.0 ┆ e    ┆ 4.0      ┆ 40.0     │
│ 5    ┆ 5    ┆ null ┆ f    ┆ 5.0      ┆ null     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.Sort

Bases: BaseInNTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to use to sort the rows.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `sort`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘
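
Sorting by `["col3", "col1"]` orders the rows by `col3` first and uses `col1` only to break ties. The plain-Python sketch below shows that lexicographic ordering on a list of row dictionaries. Placing null values first is an assumption based on polars' default `nulls_last=False`; the example above has no ties in `col3`, so it does not exercise that case. The function name `sort_rows` is hypothetical.

```python
def sort_rows(rows, columns):
    """Sort rows by the given columns in order, with None placed first."""

    def key(row):
        # (False, None) sorts before (True, value), so None never gets
        # compared directly with a non-None value.
        return tuple((row[col] is not None, row[col]) for col in columns)

    return sorted(rows, key=key)


rows = [
    {"col1": 1, "col2": 6.0, "col3": "a"},
    {"col1": 2, "col2": 5.0, "col3": "c"},
    {"col1": None, "col2": 4.0, "col3": "b"},
]
out = sort_rows(rows, ["col3", "col1"])
```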

grizz.transformer.SortColumns

Bases: BaseArgTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name	Type	Description	Default
`reverse`	`bool`	If `False`, the columns are sorted in alphabetical order. If `True`, they are sorted in reverse alphabetical order.	`False`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘
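
Sorting columns by name reorders the columns without touching the row order, as the output above shows. A minimal plain-Python sketch of this, using a dictionary of lists as a stand-in for the DataFrame (the function name `sort_columns` is hypothetical):

```python
def sort_columns(frame, reverse=False):
    """Return a copy of the frame with its columns ordered by name."""
    return {name: frame[name] for name in sorted(frame, reverse=reverse)}


frame = {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
ascending = sort_columns(frame)
descending = sort_columns(frame, reverse=True)
```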

grizz.transformer.SortColumnsTransformer

Bases: BaseArgTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name	Type	Description	Default
`reverse`	`bool`	If `False`, the columns are sorted in alphabetical order. If `True`, they are sorted in reverse alphabetical order.	`False`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘

grizz.transformer.SortTransformer

Bases: BaseInNTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to use to sort the rows.	`None`
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments to pass to `sort`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘
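
The three `missing_policy` options described above can be sketched in plain Python. This is only an illustration of the documented behaviour: `resolve_columns` is a hypothetical helper, not part of grizz, and the exception type it raises is an assumption.

```python
import warnings


def resolve_columns(requested, available, missing_policy="raise"):
    """Return the requested columns that exist, applying the missing policy.

    Hypothetical helper for illustration; not part of grizz.
    """
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown missing_policy: {missing_policy!r}")
    missing = [col for col in requested if col not in available]
    if missing:
        msg = f"{len(missing)} column(s) are missing: {missing}"
        if missing_policy == "raise":
            raise KeyError(msg)  # the exception type is an assumption
        if missing_policy == "warn":
            warnings.warn(msg, stacklevel=2)
    # Missing columns are skipped under 'warn' and 'ignore'.
    return [col for col in requested if col in available]


present = resolve_columns(
    ["col3", "colX"], ["col1", "col2", "col3"], missing_policy="ignore"
)
```

With `missing_policy="ignore"`, the unknown column `colX` is dropped silently and only `col3` is kept.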

grizz.transformer.SqlTransformer

Bases: BaseArgTransformer

Implement a transformer that executes a SQL query against the DataFrame.

Parameters:

Name	Type	Description	Default
`query`	`str`	The SQL query to execute.	required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SqlTransformer
>>> transformer = SqlTransformer(query="SELECT col1, col4 FROM self WHERE col1 > 2")
>>> transformer
SqlTransformer(
  (query): SELECT col1, col4 FROM self WHERE col1 > 2
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col4 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
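
In the example above, the table name `self` appears to refer to the input DataFrame. To show what the query selects without depending on polars, the sketch below runs the same SQL text with the standard-library `sqlite3` module as a stand-in engine. This is an assumption made for illustration only; the SQL dialect used by the transformer may differ from SQLite.

```python
import sqlite3

# Stand-in for the input frame: a table named "self" with the two
# columns that the query reads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE self (col1 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO self VALUES (?, ?)",
    zip([1, 2, 3, 4, 5], ["a", "b", "c", "d", "e"]),
)

# Same query text as in the example above.
rows = conn.execute("SELECT col1, col4 FROM self WHERE col1 > 2").fetchall()
conn.close()
```

The query keeps the rows where `col1` is greater than 2 and projects them onto `col1` and `col4`.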

grizz.transformer.StandardScaler

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.StandardScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StandardScaler
>>> transformer = StandardScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
StandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out  ┆ col3_out  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ -1.414214 ┆ -1.414214 │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ -0.707107 ┆ -0.707107 │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.0       ┆ 0.0       │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.707107  ┆ 0.707107  │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.414214  ┆ 1.414214  │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
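
The values in `col1_out` and `col3_out` can be reproduced by hand with the standard library. The sketch below assumes the scaling is `(x - mean) / std` with the population standard deviation, which is what `sklearn.preprocessing.StandardScaler` uses; `standardize` is a hypothetical helper, not part of grizz.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Scale values to zero mean and unit variance (population std)."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (divides by n)
    return [(v - mean) / std for v in values]


# Rounded to six decimals to match the display precision used above.
col1_out = [round(v, 6) for v in standardize([1, 2, 3, 4, 5])]
col3_out = [round(v, 6) for v in standardize([10, 20, 30, 40, 50])]
```

Both columns produce the same scaled values because `col3` is `col1` multiplied by 10, and standardization removes the scale. The sketch does not handle a constant column, where the standard deviation is zero.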

grizz.transformer.StandardScalerTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to scale. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`propagate_nulls`	`bool`	If set to `True`, the `None` values are propagated after the transformation. If `False`, the `None` values are replaced by NaNs.	`True`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `sklearn.preprocessing.StandardScaler`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StandardScaler
>>> transformer = StandardScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
StandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out  ┆ col3_out  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ -1.414214 ┆ -1.414214 │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ -0.707107 ┆ -0.707107 │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.0       ┆ 0.0       │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.707107  ┆ 0.707107  │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.414214  ┆ 1.414214  │
└──────┴──────┴──────┴──────┴───────────┴───────────┘

grizz.transformer.StringToDatetime

Implement a transformer to convert some string columns to polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The string columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToDatetime
>>> transformer = StringToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
StringToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
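
For a single value, the conversion shown above corresponds to parsing a string into a datetime object. A minimal standard-library sketch follows; the explicit format string is an assumption chosen to match the sample data, since the example above does not pass a format.

```python
from datetime import datetime

col1 = [
    "2020-01-01 01:01:01",
    "2020-01-01 02:02:02",
    "2020-01-01 12:00:01",
    "2020-01-01 18:18:18",
    "2020-01-01 23:59:59",
]

# Parse each string with an explicit format (assumed from the sample values).
col1_out = [datetime.strptime(value, "%Y-%m-%d %H:%M:%S") for value in col1]
```

Each element of `col1_out` is a `datetime.datetime` instance rather than a string, which mirrors the `str` to `datetime[μs]` change in the output frame.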

grizz.transformer.StringToDatetimeTransformer

Implement a transformer to convert some string columns to polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The string columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToDatetime
>>> transformer = StringToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
StringToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘

grizz.transformer.StringToTime

Implement a transformer to convert some string columns to polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The string columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `polars.Expr.str.to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToTime
>>> transformer = StringToTime(
...     columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out"
... )
>>> transformer
StringToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
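
The `format="%H:%M:%S"` argument in the example above uses directives that Python's own `strptime` also understands, so the per-value conversion can be sketched with the standard library. This is an illustration only and does not use polars.

```python
from datetime import datetime, time

col1 = ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"]

# Parse with the same format string as the example, then keep the time part.
col1_out = [datetime.strptime(value, "%H:%M:%S").time() for value in col1]

# A value that does not match the format cannot be parsed.
try:
    datetime.strptime("1pm", "%H:%M:%S")
    parsed_bad_value = True
except ValueError:
    parsed_bad_value = False
```

Here `strptime` raises `ValueError` for a string that does not match the format; how the transformer reports such values is not covered by this sketch.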

grizz.transformer.StringToTimeTransformer

Implement a transformer to convert some string columns to polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The string columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `polars.Expr.str.to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToTime
>>> transformer = StringToTime(
...     columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out"
... )
>>> transformer
StringToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘

grizz.transformer.StripChars

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to prepare. If `None`, it processes all the columns of type string.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `strip_chars`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"], prefix="", suffix="_out")
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬───────┬───────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  ┆ col2_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---   ┆ ---   ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str   ┆ str   ┆ str      ┆ str      │
╞══════╪══════╪═══════╪═══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ a     ┆ a     ┆ 1        ┆ a        │
│ 2    ┆ 2    ┆  b    ┆  b    ┆ 2        ┆ b        │
│ 3    ┆ 3    ┆   c   ┆   c   ┆ 3        ┆ c        │
│ 4    ┆ 4    ┆ d     ┆ d     ┆ 4        ┆ d        │
│ 5    ┆ 5    ┆ e     ┆ e     ┆ 5        ┆ e        │
└──────┴──────┴───────┴───────┴──────────┴──────────┘
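
The effect on `col3` in the example above matches what Python's `str.strip` does when called without arguments: leading and trailing whitespace is removed and inner characters are untouched. A minimal sketch of that per-value behaviour, using plain lists instead of a DataFrame:

```python
col3 = ["a ", " b", "  c  ", "d", "e"]

# With no argument, strip() removes leading and trailing whitespace.
col3_out = [value.strip() for value in col3]

# With an argument, strip() removes any of the given characters from
# both ends; characters inside the string are kept.
trimmed = "xx-a-b-xx".strip("x-")
```

Values with nothing to strip, such as `"d"` and `"e"`, are returned unchanged.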

grizz.transformer.StripCharsTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to prepare. If `None`, it processes all the columns of type string.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `strip_chars`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"], prefix="", suffix="_out")
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬───────┬───────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  ┆ col2_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---   ┆ ---   ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str   ┆ str   ┆ str      ┆ str      │
╞══════╪══════╪═══════╪═══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ a     ┆ a     ┆ 1        ┆ a        │
│ 2    ┆ 2    ┆  b    ┆  b    ┆ 2        ┆ b        │
│ 3    ┆ 3    ┆   c   ┆   c   ┆ 3        ┆ c        │
│ 4    ┆ 4    ┆ d     ┆ d     ┆ 4        ┆ d        │
│ 5    ┆ 5    ┆ e     ┆ e     ┆ 5        ┆ e        │
└──────┴──────┴───────┴───────┴──────────┴──────────┘
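
The output names in the example above (`col2_out`, `col3_out`) are consistent with building each name as prefix + input name + suffix, and `exist_policy` governs what happens when such a name is already taken. The sketch below illustrates both ideas in plain Python. `output_names` and `check_existing` are hypothetical helpers, not part of grizz, and the exception type is an assumption.

```python
import warnings


def output_names(columns, prefix, suffix):
    """Build output column names as prefix + input name + suffix."""
    return [f"{prefix}{col}{suffix}" for col in columns]


def check_existing(out_cols, frame_cols, exist_policy="raise"):
    """Return the output columns that already exist, applying the policy.

    Hypothetical helper for illustration; not part of grizz.
    """
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown exist_policy: {exist_policy!r}")
    existing = [col for col in out_cols if col in frame_cols]
    if existing:
        msg = f"{len(existing)} column(s) already exist: {existing}"
        if exist_policy == "raise":
            raise ValueError(msg)  # the exception type is an assumption
        if exist_policy == "warn":
            warnings.warn(msg, stacklevel=2)
    # Under 'warn' and 'ignore' these columns would be overwritten.
    return existing


out_cols = output_names(["col2", "col3"], prefix="", suffix="_out")
clashes = check_existing(out_cols, ["col1", "col2", "col3", "col4"])
```

With an empty prefix and an empty suffix, the output names would equal the input names, so every output column would clash with an existing one and the default `'raise'` policy would apply.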

grizz.transformer.SumHorizontal

Implement a transformer to sum all values horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to sum. The columns should have compatible data types. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.sum_horizontal`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SumHorizontal
>>> transformer = SumHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
SumHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 63  │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 66  │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 69  │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 72  │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 75  │
└──────┴──────┴──────┴──────┴─────┘
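
"Horizontally" here means row by row: each output value is the sum of the selected columns within one row. The values of `col` in the example above can be checked with a short plain-Python sketch that uses lists in place of DataFrame columns.

```python
col1 = [11, 12, 13, 14, 15]
col2 = [21, 22, 23, 24, 25]
col3 = [31, 32, 33, 34, 35]

# zip() walks the columns in parallel, yielding one tuple per row.
col = [sum(row) for row in zip(col1, col2, col3)]
```

`col4` is not part of the sum because it is not listed in `columns`.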

grizz.transformer.SumHorizontalTransformer

Implement a transformer to sum all values horizontally across columns and store the result in a column.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] \| None`	The columns to sum. The columns should have compatible data types. If `None`, it processes all the columns.	required
`out_col`	`str`	The output column.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	Additional arguments passed to `polars.sum_horizontal`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SumHorizontal
>>> transformer = SumHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
SumHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 63  │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 66  │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 69  │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 72  │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 75  │
└──────┴──────┴──────┴──────┴─────┘
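
As a rough plain-Python illustration of what the example above computes, the sketch below sums the selected columns row by row. It is not the grizz implementation, which forwards to `polars.sum_horizontal` according to the parameter table. The helper name `sum_horizontal_rows` and the dict-of-lists layout are assumptions made for illustration only.

```python
from collections.abc import Sequence


def sum_horizontal_rows(
    data: dict[str, list[int]],
    columns: Sequence[str],
    exclude_columns: Sequence[str] = (),
) -> list[int]:
    """Sum the selected columns row by row (hypothetical helper)."""
    # Drop excluded columns; excluded names that are not selected are ignored.
    selected = [col for col in columns if col not in exclude_columns]
    # zip(*...) walks the selected columns in lockstep, one row at a time.
    return [sum(row) for row in zip(*(data[col] for col in selected))]


data = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
}
result = sum_horizontal_rows(data, columns=["col1", "col2", "col3"])
print(result)  # [63, 66, 69, 72, 75]
```

The printed values match the `col` column in the output frame above.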

grizz.transformer.TimeDiff

Bases: BaseArgTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name	Type	Description	Default
`group_cols`	`Sequence[str]`	The columns used to generate the group for each sequence.	required
`time_col`	`str`	The input time column name.	required
`time_diff_col`	`str`	The output time difference column name.	required
`shift`	`int`	The number of slots to shift.	`1`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col='time', time_diff_col='diff', shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘
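
The grouping logic behind the example above can be sketched in plain Python. This is an illustration under stated assumptions, not the grizz implementation: the helper name `time_diff_by_group` is hypothetical, the values are assumed to be sorted by time within each group, and steps that have no earlier value `shift` positions back are assumed to get `0` (which matches the first row of each group in the example for `shift=1`).

```python
from collections import defaultdict


def time_diff_by_group(
    groups: list[str], times: list[int], shift: int = 1
) -> dict[str, list[tuple[int, int]]]:
    """Return (time, diff) pairs per group (hypothetical helper)."""
    by_group: dict[str, list[int]] = defaultdict(list)
    for group, time in zip(groups, times):
        by_group[group].append(time)

    out: dict[str, list[tuple[int, int]]] = {}
    for group, values in by_group.items():
        values.sort()  # assumption: differences are taken in time order
        # Assumption: the first `shift` steps of a group have no reference
        # value, so their difference is set to 0.
        out[group] = [
            (t, (t - values[i - shift]) if i >= shift else 0)
            for i, t in enumerate(values)
        ]
    return out


result = time_diff_by_group(["a", "b", "a", "a", "b"], [1, 2, 3, 4, 5])
print(result["a"])  # [(1, 0), (3, 2), (4, 1)]
print(result["b"])  # [(2, 0), (5, 3)]
```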

grizz.transformer.TimeDiffTransformer

Bases: BaseArgTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name	Type	Description	Default
`group_cols`	`Sequence[str]`	The columns used to generate the group for each sequence.	required
`time_col`	`str`	The input time column name.	required
`time_diff_col`	`str`	The output time difference column name.	required
`shift`	`int`	The number of slots to shift.	`1`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col='time', time_diff_col='diff', shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘

grizz.transformer.TimeToSecond

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column with the time value to convert.	required
`out_col`	`str`	The output column with the time in seconds.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col='time', out_col='second', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘
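
The conversion shown above amounts to counting the seconds elapsed since midnight. A minimal standard-library sketch, assuming that interpretation (the helper name `time_to_seconds` is hypothetical and this is not the grizz implementation):

```python
import datetime


def time_to_seconds(value: datetime.time) -> float:
    """Convert a time of day to seconds since midnight (hypothetical helper)."""
    return (
        value.hour * 3600
        + value.minute * 60
        + value.second
        + value.microsecond / 1_000_000
    )


times = [
    datetime.time(0, 0, 1, 890000),
    datetime.time(0, 1, 1, 890000),
    datetime.time(1, 1, 1, 890000),
    datetime.time(0, 19, 19, 890000),
    datetime.time(19, 19, 19, 890000),
]
seconds = [time_to_seconds(t) for t in times]
# Up to floating-point rounding, these are the values of the `second`
# column above: 1.89, 61.89, 3661.89, 1159.89, 69559.89
print(seconds)
```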

grizz.transformer.TimeToSecondTransformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name	Type	Description	Default
`in_col`	`str`	The input column with the time value to convert.	required
`out_col`	`str`	The output column with the time in seconds.	required
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`

Example usage:

>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col='time', out_col='second', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘
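
The three `exist_policy` options described in the parameter table can be sketched as a small check run before the output column is written. This is only an illustration of the documented behaviour: the helper name `check_existing_columns`, the exception type, and the warning category are assumptions, since the types grizz actually uses are not shown here.

```python
import warnings
from collections.abc import Sequence


def check_existing_columns(
    new_columns: Sequence[str],
    frame_columns: Sequence[str],
    exist_policy: str = "raise",
) -> None:
    """Apply an exist policy to the output columns (hypothetical helper)."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown exist_policy: {exist_policy!r}")
    existing = [col for col in new_columns if col in frame_columns]
    if not existing or exist_policy == "ignore":
        return  # 'ignore': existing columns are overwritten silently
    if exist_policy == "raise":
        # Assumption: RuntimeError stands in for the real exception type.
        raise RuntimeError(f"{len(existing)} column(s) already exist: {existing}")
    # 'warn': report the clash, then let the caller overwrite the columns.
    warnings.warn(
        f"{len(existing)} column(s) already exist and will be overwritten: {existing}",
        RuntimeWarning,
        stacklevel=2,
    )


# No clash: 'second' is not yet a column of the frame, so nothing happens.
check_existing_columns(["second"], ["time", "col"])
```

With `exist_policy="raise"` and a frame that already has a `second` column, the same call would raise instead of returning.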

grizz.transformer.ToDatetime

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
ToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
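
In the example above, the converted values land in a new column whose name is the input name wrapped in `prefix` and `suffix` (`col1` becomes `col1_out`), while the input column is left unchanged. A plain-Python sketch of that naming rule, assuming a dict-of-lists layout; the helper name `convert_to_datetime` and its `fmt` parameter are hypothetical and not part of grizz:

```python
from datetime import datetime


def convert_to_datetime(
    data: dict[str, list[str]],
    columns: list[str],
    prefix: str,
    suffix: str,
    fmt: str = "%Y-%m-%d %H:%M:%S",
) -> dict[str, list]:
    """Parse string columns into new prefixed/suffixed columns (hypothetical helper)."""
    out: dict[str, list] = dict(data)  # keep the input columns untouched
    for col in columns:
        # The output name is the input name wrapped in prefix and suffix.
        out[f"{prefix}{col}{suffix}"] = [datetime.strptime(v, fmt) for v in data[col]]
    return out


data = {
    "col1": ["2020-01-01 01:01:01", "2020-01-01 23:59:59"],
    "col2": ["1", "2"],
}
out = convert_to_datetime(data, columns=["col1"], prefix="", suffix="_out")
print(list(out))  # ['col1', 'col2', 'col1_out']
```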

grizz.transformer.ToDatetimeTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_datetime`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
ToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
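
The interplay of `columns`, `exclude_columns`, and `missing_policy` described in the parameter table can be sketched as a column-resolution step. This is a hedged illustration of the documented behaviour, not grizz code: the helper name `resolve_columns`, the exception type, and the warning category are assumptions.

```python
import warnings
from collections.abc import Sequence
from typing import Optional


def resolve_columns(
    columns: Optional[Sequence[str]],
    frame_columns: Sequence[str],
    exclude_columns: Sequence[str] = (),
    missing_policy: str = "raise",
) -> tuple[str, ...]:
    """Return the columns to process (hypothetical helper)."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown missing_policy: {missing_policy!r}")
    # `None` means all the columns of the frame.
    requested = list(frame_columns) if columns is None else list(columns)
    # Excluded names that are not in the selection are simply ignored.
    requested = [col for col in requested if col not in exclude_columns]
    missing = [col for col in requested if col not in frame_columns]
    if missing and missing_policy == "raise":
        # Assumption: RuntimeError stands in for the real exception type.
        raise RuntimeError(f"{len(missing)} column(s) are missing: {missing}")
    if missing and missing_policy == "warn":
        warnings.warn(
            f"{len(missing)} column(s) are missing and will be ignored: {missing}",
            RuntimeWarning,
            stacklevel=2,
        )
    # For 'warn' and 'ignore', missing columns are dropped from the selection.
    return tuple(col for col in requested if col in frame_columns)


frame_columns = ["col1", "col2", "col3"]
print(resolve_columns(None, frame_columns, exclude_columns=["col2"]))
# ('col1', 'col3')
print(resolve_columns(["col1", "col4"], frame_columns, missing_policy="ignore"))
# ('col1',)
```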

grizz.transformer.ToTime

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name	Type	Description	Default
`columns`	`Sequence[str] | None`	The columns to convert. `None` means all the columns.	required
`prefix`	`str`	The column name prefix for the output columns.	required
`suffix`	`str`	The column name suffix for the output columns.	required
`exclude_columns`	`Sequence[str]`	The columns to exclude from the input `columns`. If any column is not found, it will be ignored during the filtering process.	`()`
`exist_policy`	`str`	The policy on how to handle existing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column already exists. If `'warn'`, a warning is raised if at least one column already exists and the existing columns are overwritten. If `'ignore'`, the existing columns are overwritten and no warning message appears.	`'raise'`
`missing_policy`	`str`	The policy on how to handle missing columns. The following options are available: `'ignore'`, `'warn'`, and `'raise'`. If `'raise'`, an exception is raised if at least one column is missing. If `'warn'`, a warning is raised if at least one column is missing and the missing columns are ignored. If `'ignore'`, the missing columns are ignored and no warning message appears.	`'raise'`
`**kwargs`	`Any`	The keyword arguments for `to_time`.	`{}`

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out")
>>> transformer
ToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
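
The `format="%H:%M:%S"` argument in the example above describes how each string is read as hours, minutes, and seconds. As a standard-library analogy (not the polars parser that grizz relies on), the same directives can be used with `datetime.strptime`; the helper name `parse_times` is hypothetical.

```python
from datetime import datetime, time


def parse_times(values: list[str], fmt: str = "%H:%M:%S") -> list[time]:
    """Parse time-of-day strings with a strptime format (hypothetical helper)."""
    # strptime returns a datetime; .time() keeps only the time-of-day part.
    return [datetime.strptime(value, fmt).time() for value in values]


parsed = parse_times(["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"])
print(parsed[0])  # 01:01:01
print(parsed[-1].hour, parsed[-1].minute, parsed[-1].second)  # 23 59 59
```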

grizz.transformer.ToTimeTransformer