transformer

grizz.transformer

Contain polars.DataFrame transformers.

grizz.transformer.AbsDiffHorizontal

Bases: BaseIn2Out1Transformer

Implement a transformer to compute the absolute difference between two columns.

Internally, this transformer computes: out = abs(in1 - in2)

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ 4    │
│ 2    ┆ 4    ┆ b    ┆ 2    │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘
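
The formula can be checked by hand against the example above. The sketch below uses plain Python lists as stand-ins for the two input columns; it only illustrates the arithmetic and is not how the transformer is implemented.

```python
# Plain-Python illustration of out = abs(in1 - in2).
# The lists stand in for the "col1" and "col2" columns of the example.
col1 = [1, 2, 3, 4, 5]
col2 = [5, 4, 3, 2, 1]

# Element-wise absolute difference, one value per row.
diff = [abs(a - b) for a, b in zip(col1, col2)]
print(diff)  # [4, 2, 0, 2, 4]
```

The values match the "diff" column in the output frame above.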

grizz.transformer.AbsDiffHorizontalTransformer

Bases: BaseIn2Out1Transformer

Implement a transformer to compute the absolute difference between two columns.

Internally, this transformer computes: out = abs(in1 - in2)

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ 4    │
│ 2    ┆ 4    ┆ b    ┆ 2    │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseArgTransformer

Bases: BaseTransformer

Define a base class to implement transformers with custom arguments.

grizz.transformer.BaseArgTransformer._fit_data abstractmethod

_fit_data(frame: DataFrame) -> None

Fit to the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to fit.

required

grizz.transformer.BaseArgTransformer._transform_data abstractmethod

_transform_data(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

grizz.transformer.BaseArgTransformer.get_args abstractmethod

get_args() -> dict

Get the arguments of the transformer.

Returns:

Type Description
dict

The arguments of the transformer.
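
The representations shown in the examples on this page list the constructor arguments. One plausible use of get_args is to build that representation. The sketch below is a hypothetical illustration: ToyArgTransformer and ToyAbsDiff are made-up names, and the actual base class may work differently.

```python
from abc import ABC, abstractmethod


class ToyArgTransformer(ABC):
    """Hypothetical stand-in for a base class with custom arguments."""

    def __repr__(self) -> str:
        # Build the representation from the arguments exposed by get_args().
        args = ", ".join(f"{key}={value!r}" for key, value in self.get_args().items())
        return f"{type(self).__name__}({args})"

    @abstractmethod
    def get_args(self) -> dict:
        """Return the arguments of the transformer."""


class ToyAbsDiff(ToyArgTransformer):
    """Hypothetical subclass that only stores its column names."""

    def __init__(self, in1_col: str, in2_col: str, out_col: str) -> None:
        self._in1_col = in1_col
        self._in2_col = in2_col
        self._out_col = out_col

    def get_args(self) -> dict:
        return {
            "in1_col": self._in1_col,
            "in2_col": self._in2_col,
            "out_col": self._out_col,
        }


transformer = ToyAbsDiff(in1_col="col1", in2_col="col2", out_col="diff")
print(transformer)  # ToyAbsDiff(in1_col='col1', in2_col='col2', out_col='diff')
```

Because get_args is abstract, ToyArgTransformer itself cannot be instantiated; each subclass decides which arguments it exposes.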

grizz.transformer.BaseIn1Out1Transformer

Bases: BaseArgTransformer

Define a base class to implement polars.DataFrame transformers that take one input column and generate one output column.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

grizz.transformer.BaseIn1Out1Transformer._check_input_column

_check_input_column(frame: DataFrame) -> None

Check if the input column is missing.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required
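
The three missing_policy options described above can be sketched as follows. This is a minimal illustration under stated assumptions: check_missing_columns is a hypothetical helper, and RuntimeError and RuntimeWarning are stand-ins because this page does not name the exception and warning types actually used.

```python
import warnings


def check_missing_columns(frame_columns, required, missing_policy="raise"):
    """Hypothetical helper: apply a missing-column policy and return the missing names."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect missing_policy: {missing_policy!r}")
    missing = [col for col in required if col not in frame_columns]
    if not missing or missing_policy == "ignore":
        # Nothing is missing, or missing columns are silently ignored.
        return missing
    message = f"{len(missing)} column(s) are missing: {missing}"
    if missing_policy == "raise":
        raise RuntimeError(message)  # stand-in exception type
    warnings.warn(message, RuntimeWarning, stacklevel=2)  # stand-in warning type
    return missing


columns = ["col1", "col2", "col3"]
print(check_missing_columns(columns, ["col1"]))  # []
print(check_missing_columns(columns, ["col5"], missing_policy="ignore"))  # ['col5']
```

With the default 'raise' policy, the second call would raise instead of returning.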

grizz.transformer.BaseIn1Out1Transformer._check_output_column

_check_output_column(frame: DataFrame) -> None

Check if the output column already exists.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required
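
The exist_policy options can be sketched in the same spirit. This is a hypothetical illustration: check_existing_column is a made-up helper, and RuntimeError and RuntimeWarning are stand-ins for whatever types the library actually uses.

```python
import warnings


def check_existing_column(frame_columns, out_col, exist_policy="raise"):
    """Hypothetical helper: return True if out_col would overwrite an existing column."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect exist_policy: {exist_policy!r}")
    if out_col not in frame_columns:
        return False
    message = f"column {out_col!r} already exists and would be overwritten"
    if exist_policy == "raise":
        raise RuntimeError(message)  # stand-in exception type
    if exist_policy == "warn":
        warnings.warn(message, RuntimeWarning, stacklevel=2)  # stand-in warning type
    # For 'warn' and 'ignore', the existing column is overwritten.
    return True


columns = ["col1", "col2", "diff"]
print(check_existing_column(columns, "out"))  # False
print(check_existing_column(columns, "diff", exist_policy="ignore"))  # True
```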

grizz.transformer.BaseIn1Out1Transformer._fit abstractmethod

_fit(frame: DataFrame) -> DataFrame

Fit to the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to fit.

required

grizz.transformer.BaseIn1Out1Transformer._transform abstractmethod

_transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

grizz.transformer.BaseIn2Out1Transformer

Bases: BaseArgTransformer

Define a base class to implement polars.DataFrame transformers that take two input columns and generate one output column.

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ 4    │
│ 2    ┆ 4    ┆ b    ┆ 2    │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseIn2Out1Transformer._check_input_columns

_check_input_columns(frame: DataFrame) -> None

Check if any of the input columns is missing.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required

grizz.transformer.BaseIn2Out1Transformer._check_output_column

_check_output_column(frame: DataFrame) -> None

Check if the output column already exists.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required

grizz.transformer.BaseIn2Out1Transformer._fit abstractmethod

_fit(frame: DataFrame) -> DataFrame

Fit to the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to fit.

required

grizz.transformer.BaseIn2Out1Transformer._transform abstractmethod

_transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

grizz.transformer.BaseInNOut1Transformer

Bases: BaseInNTransformer

Define a base class to implement polars.DataFrame transformers that generate a single output column by using multiple input columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
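
The N-inputs-to-one-output pattern in the example above can be reproduced with plain Python. The sketch below uses a dict of lists as a stand-in for the frame; it only mirrors the row-wise result and says nothing about the actual implementation.

```python
# Dict of lists standing in for the example frame (only the numeric columns).
data = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
}
selected = ["col1", "col2", "col3"]

# Combine the selected columns row by row into a single output column.
col = [list(row) for row in zip(*(data[name] for name in selected))]
print(col[0])  # [11, 21, 31]
print(col[-1])  # [15, 25, 35]
```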

grizz.transformer.BaseInNOut1Transformer._check_output_column

_check_output_column(frame: DataFrame) -> None

Check if the output column already exists.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required

grizz.transformer.BaseInNOutNTransformer

Bases: BaseInNTransformer

Define a base class to implement polars.DataFrame transformers that have N input columns and N output columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
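
For transformers with N output columns, the prefix and suffix parameters determine the output column names. The sketch below assumes each output name is built as prefix + input name + suffix, which is consistent with the col1_out and col3_out columns produced by the Binarizer example on this page. output_names is a hypothetical helper.

```python
def output_names(columns, prefix, suffix):
    """Hypothetical helper: derive one output column name per input column."""
    return [f"{prefix}{col}{suffix}" for col in columns]


print(output_names(["col1", "col3"], prefix="", suffix="_out"))
# ['col1_out', 'col3_out']
print(output_names(["col1", "col3"], prefix="bin_", suffix=""))
# ['bin_col1', 'bin_col3']
```

Under this assumption, an empty prefix and an empty suffix make the output names collide with the input names, which is where exist_policy becomes relevant.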

grizz.transformer.BaseInNOutNTransformer._check_output_column

_check_output_column(frame: DataFrame) -> None

Check if the output column already exists.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required

grizz.transformer.BaseInNTransformer

Bases: BaseArgTransformer

Define a base class to implement polars.DataFrame transformers that transform DataFrames by using multiple input columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘

grizz.transformer.BaseInNTransformer._check_input_columns

_check_input_columns(frame: DataFrame) -> None

Check if some input columns are missing.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame to check.

required

grizz.transformer.BaseInNTransformer._fit abstractmethod

_fit(frame: DataFrame) -> DataFrame

Fit to the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to fit.

required

grizz.transformer.BaseInNTransformer._transform abstractmethod

_transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

grizz.transformer.BaseInNTransformer.find_columns

find_columns(frame: DataFrame) -> tuple[str, ...]

Find the columns to transform.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.

required

Returns:

Type Description
tuple[str, ...]

The columns to transform.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = DropNullRow(columns=["col2", "col3"])
>>> transformer.find_columns(frame)
('col2', 'col3')
>>> transformer = DropNullRow()
>>> transformer.find_columns(frame)
('col1', 'col2', 'col3', 'col4')
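
The selection logic can be approximated in plain Python. The sketch below is an assumption-laden illustration: find_columns here is a standalone function over column names, and applying exclude_columns at this step is a guess based on the parameter description.

```python
def find_columns(frame_columns, columns=None, exclude_columns=()):
    """Hypothetical sketch: pick the columns to transform."""
    # columns=None means "use every column of the frame".
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    # Excluded names that are not present are simply ignored.
    return tuple(col for col in selected if col not in excluded)


frame_columns = ["col1", "col2", "col3", "col4"]
print(find_columns(frame_columns, columns=["col2", "col3"]))
# ('col2', 'col3')
print(find_columns(frame_columns))
# ('col1', 'col2', 'col3', 'col4')
print(find_columns(frame_columns, exclude_columns=["col4", "col9"]))
# ('col1', 'col2', 'col3')
```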

grizz.transformer.BaseInNTransformer.find_common_columns

find_common_columns(frame: DataFrame) -> tuple[str, ...]

Find the common columns between the DataFrame columns and the input columns.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.

required

Returns:

Type Description
tuple[str, ...]

The common columns.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = DropNullRow(columns=["col2", "col3", "col5"])
>>> transformer.find_common_columns(frame)
('col2', 'col3')
>>> transformer = DropNullRow()
>>> transformer.find_common_columns(frame)
('col1', 'col2', 'col3', 'col4')

grizz.transformer.BaseInNTransformer.find_missing_columns

find_missing_columns(frame: DataFrame) -> tuple[str, ...]

Find the missing columns.

Parameters:

Name Type Description Default
frame DataFrame

The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame.

required

Returns:

Type Description
tuple[str, ...]

The missing columns.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> transformer = DropNullRow(columns=["col2", "col3", "col5"])
>>> transformer.find_missing_columns(frame)
('col5',)
>>> transformer = DropNullRow()
>>> transformer.find_missing_columns(frame)
()
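
The common and missing columns partition the requested columns by whether they are present in the frame. The sketch below shows that relationship with standalone, hypothetical functions over column names; it is not the library's code.

```python
def find_common_columns(frame_columns, columns):
    """Hypothetical sketch: requested columns that are present in the frame."""
    present = set(frame_columns)
    return tuple(col for col in columns if col in present)


def find_missing_columns(frame_columns, columns):
    """Hypothetical sketch: requested columns that are absent from the frame."""
    present = set(frame_columns)
    return tuple(col for col in columns if col not in present)


frame_columns = ["col1", "col2", "col3", "col4"]
requested = ["col2", "col3", "col5"]
print(find_common_columns(frame_columns, requested))  # ('col2', 'col3')
print(find_missing_columns(frame_columns, requested))  # ('col5',)
```

Every requested column lands in exactly one of the two tuples.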

grizz.transformer.BaseTransformer

Bases: ABC

Define the base class to transform a polars.DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseTransformer.equal abstractmethod

equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two objects are equal or not.

Parameters:

Name Type Description Default
other Any

The other object to compare.

required
equal_nan bool

Whether to compare NaNs as equal. If True, NaNs in both objects are considered equal.

False

Returns:

Type Description
bool

True if the two are equal, otherwise False.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> obj1 = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> obj2 = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> obj3 = InplaceCast(columns=["col2", "col3"], dtype=pl.Float32)
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False
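
The equal_nan flag matters because NaN does not compare equal to itself in Python. The sketch below shows one way an argument-by-argument comparison could honor the flag. args_equal and values_equal are hypothetical helpers; the library's own comparison may be implemented differently.

```python
import math


def values_equal(a, b, equal_nan=False):
    """Hypothetical helper: compare two values, optionally treating NaNs as equal."""
    if equal_nan and isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def args_equal(args1, args2, equal_nan=False):
    """Hypothetical helper: compare two argument dictionaries key by key."""
    if args1.keys() != args2.keys():
        return False
    return all(values_equal(args1[key], args2[key], equal_nan) for key in args1)


args_a = {"columns": ("col1", "col3"), "threshold": float("nan")}
args_b = {"columns": ("col1", "col3"), "threshold": float("nan")}
print(args_equal(args_a, args_b))  # False
print(args_equal(args_a, args_b, equal_nan=True))  # True
```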

grizz.transformer.BaseTransformer.fit abstractmethod

fit(frame: DataFrame) -> None

Fit to the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to fit.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> transformer.fit(frame)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.BaseTransformer.fit_transform abstractmethod

fit_transform(frame: DataFrame) -> DataFrame

Fit to the data, then transform it.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to fit and transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
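
Conceptually, fit_transform is a fit followed by a transform on the same data. The sketch below illustrates this with a hypothetical stateful transformer, ToyCenterer, that works on a plain list rather than a polars.DataFrame.

```python
class ToyCenterer:
    """Hypothetical stateful transformer: subtracts the mean learned in fit()."""

    def __init__(self) -> None:
        self._mean = None

    def fit(self, values) -> None:
        # Learn the state (here, the mean) from the data.
        self._mean = sum(values) / len(values)

    def transform(self, values):
        if self._mean is None:
            raise RuntimeError("fit() must be called before transform()")
        return [value - self._mean for value in values]

    def fit_transform(self, values):
        # Fit to the data, then transform the same data.
        self.fit(values)
        return self.transform(values)


values = [1, 2, 3, 4, 5]
print(ToyCenterer().fit_transform(values))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

Calling fit and then transform on a fresh ToyCenterer gives the same result as fit_transform.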

grizz.transformer.BaseTransformer.transform abstractmethod

transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.Binarizer

Bases: BaseInNOutNTransformer

Implement a transformer to binarize data according to a threshold.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to binarize. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.Binarizer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Binarizer
>>> transformer = Binarizer(
...     columns=["col1", "col3"], prefix="", suffix="_out", threshold=1.5
... )
>>> transformer
BinarizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', threshold=1.5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ i64      ┆ i64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0        ┆ 1        │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0        ┆ 1        │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 1        ┆ 1        │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 1        ┆ 1        │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 1        ┆ 0        │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1        ┆ 0        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.BinarizerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to binarize data according to a threshold.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to binarize. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.Binarizer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Binarizer
>>> transformer = Binarizer(
...     columns=["col1", "col3"], prefix="", suffix="_out", threshold=1.5
... )
>>> transformer
BinarizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', threshold=1.5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ i64      ┆ i64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0        ┆ 1        │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0        ┆ 1        │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 1        ┆ 1        │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 1        ┆ 1        │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 1        ┆ 0        │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1        ┆ 0        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.Cast

Bases: BaseInNOutNTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i32      ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.CastTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i32      ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.CategoricalCast

Bases: BaseIn1Out1Transformer

Implement a transformer to convert a column to categorical data type.

Parameters:

Name Type Description Default
in_col str

The input column name to cast.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.Categorical.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CategoricalCast
>>> transformer = CategoricalCast(in_col="col1", out_col="out")
>>> transformer
CategoricalCastTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ f64  ┆ cat │
╞══════╪══════╪═════╡
│ a    ┆ 1.0  ┆ a   │
│ b    ┆ 2.0  ┆ b   │
│ c    ┆ 3.0  ┆ c   │
│ d    ┆ 4.0  ┆ d   │
│ e    ┆ 5.0  ┆ e   │
└──────┴──────┴─────┘

grizz.transformer.CategoricalCastTransformer

Bases: BaseIn1Out1Transformer

Implement a transformer to convert a column to categorical data type.

Parameters:

Name Type Description Default
in_col str

The input column name to cast.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.Categorical.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CategoricalCast
>>> transformer = CategoricalCast(in_col="col1", out_col="out")
>>> transformer
CategoricalCastTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ f64  ┆ cat │
╞══════╪══════╪═════╡
│ a    ┆ 1.0  ┆ a   │
│ b    ┆ 2.0  ┆ b   │
│ c    ┆ 3.0  ┆ c   │
│ d    ┆ 4.0  ┆ d   │
│ e    ┆ 5.0  ┆ e   │
└──────┴──────┴─────┘

grizz.transformer.ColumnClose

Bases: BaseIn2Out1Transformer

Implement a transformer to compute a column that indicates if the values of two columns are element-wise equal within a tolerance.

The output column contains True where the two columns are element-wise equal within a tolerance. Internally, this transformer computes: out = (|actual - expected| <= atol + rtol * |expected|)

Parameters:

Name Type Description Default
actual str

The actual input column name. This column must be a numeric column.

required
expected str

The expected input column name. This column must be a numeric column.

required
out_col str

The output column name.

required
atol float

The absolute tolerance parameter.

1e-08
rtol float

The relative tolerance parameter.

1e-05
equal_nan bool

Whether to compare NaNs as equal. If True, NaNs in actual are considered equal to NaNs in expected in the output column.

False
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnClose
>>> transformer = ColumnClose(actual="col1", expected="col2", out_col="out")
>>> transformer
ColumnCloseTransformer(actual='col1', expected='col2', out_col='out', atol=1e-08, rtol=1e-05, equal_nan=False, exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnCloseTransformer

Bases: BaseIn2Out1Transformer

Implement a transformer to compute a column that indicates if the values of two columns are element-wise equal within a tolerance.

The output column contains True where the two columns are element-wise equal within a tolerance. Internally, this transformer computes: out = (|actual - expected| <= atol + rtol * |expected|)

Parameters:

Name Type Description Default
actual str

The actual input column name. This column must be a numeric column.

required
expected str

The expected input column name. This column must be a numeric column.

required
out_col str

The output column name.

required
atol float

The absolute tolerance parameter.

1e-08
rtol float

The relative tolerance parameter.

1e-05
equal_nan bool

Whether to compare NaNs as equal. If True, NaNs in actual are considered equal to NaNs in expected in the output column.

False
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnClose
>>> transformer = ColumnClose(actual="col1", expected="col2", out_col="out")
>>> transformer
ColumnCloseTransformer(actual='col1', expected='col2', out_col='out', atol=1e-08, rtol=1e-05, equal_nan=False, exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqual
>>> transformer = ColumnEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqualMissing

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2), where null values are not propagated.

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqualMissing
>>> transformer = ColumnEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqualMissingTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2), where null values are not propagated.

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqualMissing
>>> transformer = ColumnEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the equal operation between two columns (in1 == in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnEqual
>>> transformer = ColumnEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnGreater

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than operation between two columns (in1 > in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreater
>>> transformer = ColumnGreater(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnGreaterEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than or equal operation between two columns (in1 >= in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreaterEqual
>>> transformer = ColumnGreaterEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘
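
For readers comparing this class with ColumnGreater, the only row that changes is the tie. The sketch below reproduces both comparisons on the example data with plain Python lists rather than polars, purely to make the difference visible; it is not how the transformers are implemented.

```python
col1 = [1, 2, 3, 4, 5]
col2 = [5, 4, 3, 2, 1]

# Row-wise comparisons, one result per row.
greater = [a > b for a, b in zip(col1, col2)]
greater_equal = [a >= b for a, b in zip(col1, col2)]

# Row indices where the two results disagree: only the row where col1 == col2.
differs = [i for i, (g, ge) in enumerate(zip(greater, greater_equal)) if g != ge]

print(greater)        # [False, False, False, True, True]
print(greater_equal)  # [False, False, True, True, True]
print(differs)        # [2]
```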

grizz.transformer.ColumnGreaterEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than or equal operation between two columns (in1 >= in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreaterEqual
>>> transformer = ColumnGreaterEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnGreaterTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the greater than operation between two columns (in1 > in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnGreater
>>> transformer = ColumnGreater(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ false │
│ 2    ┆ 4    ┆ b    ┆ false │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLower

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than operation between two columns (in1 < in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLower
>>> transformer = ColumnLower(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLowerEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than or equal operation between two columns (in1 <= in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLowerEqual
>>> transformer = ColumnLowerEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLowerEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than or equal operation between two columns (in1 <= in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLowerEqual
>>> transformer = ColumnLowerEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ true  │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnLowerTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the lower than operation between two columns (in1 < in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnLower
>>> transformer = ColumnLower(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ false │
│ 5    ┆ 1    ┆ e    ┆ false │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnNotEqual

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqual
>>> transformer = ColumnNotEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnNotEqualMissing

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2), where null values are not propagated.

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqualMissing
>>> transformer = ColumnNotEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘
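
The example above contains no null values, so its output is identical to that of ColumnNotEqual. The sketch below uses plain Python, with `None` standing in for null, to show what "null values are not propagated" is usually taken to mean. It assumes the transformer follows the semantics of polars' `ne` and `ne_missing` expressions; the functions `ne` and `ne_missing` here are illustrative stand-ins, not grizz or polars code.

```python
def ne(a, b):
    """Not-equal with null propagation: any null operand yields null."""
    if a is None or b is None:
        return None
    return a != b


def ne_missing(a, b):
    """Not-equal where null is treated as an ordinary value.

    Two nulls compare as equal; a null and a non-null compare as different.
    """
    if a is None or b is None:
        return (a is None) != (b is None)
    return a != b


col1 = [1, None, 3, None]
col2 = [1, 2, None, None]

propagated = [ne(a, b) for a, b in zip(col1, col2)]
not_propagated = [ne_missing(a, b) for a, b in zip(col1, col2)]

print(propagated)      # [False, None, None, None]
print(not_propagated)  # [False, True, True, False]
```

Under this assumption, the output column of the "missing" variant never contains nulls, which can matter when the result is later used as a filter mask.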

grizz.transformer.ColumnNotEqualMissingTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2), where null values are not propagated.

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqualMissing
>>> transformer = ColumnNotEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnNotEqualTransformer

Bases: BaseColumnComparatorTransformer

Implement a transformer that computes the not equal operation between two columns (in1 != in2).

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqual
>>> transformer = ColumnNotEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out   │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ i64  ┆ str  ┆ bool  │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 5    ┆ a    ┆ true  │
│ 2    ┆ 4    ┆ b    ┆ true  │
│ 3    ┆ 3    ┆ c    ┆ false │
│ 4    ┆ 2    ┆ d    ┆ true  │
│ 5    ┆ 1    ┆ e    ┆ true  │
└──────┴──────┴──────┴───────┘

grizz.transformer.ColumnSelection

Bases: BaseInNTransformer

Implement a polars.DataFrame transformer to select a subset of columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to keep.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, 2, 3, 4, 5],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ 2    │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ 4    │
│ null       ┆ 5    │
└────────────┴──────┘
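
The interaction between columns, exclude_columns, and missing_policy can be hard to picture from the parameter table alone. The following plain-Python sketch shows one plausible way the column list could be resolved. The helper `resolve_columns`, the use of `KeyError`, and the treatment of `columns=None` as "all columns" are assumptions made for this illustration and are not confirmed by the text above.

```python
import warnings


def resolve_columns(frame_columns, columns=None, exclude_columns=(), missing_policy="raise"):
    """Hypothetical helper: compute which columns a selection would keep."""
    if missing_policy not in ("ignore", "warn", "raise"):
        raise ValueError(f"unknown missing_policy: {missing_policy!r}")
    # Assumption: None means "start from every column in the frame".
    requested = list(frame_columns) if columns is None else list(columns)
    # Excluded names that are not requested are simply ignored.
    requested = [c for c in requested if c not in exclude_columns]
    missing = [c for c in requested if c not in frame_columns]
    if missing and missing_policy == "raise":
        # Assumption: a generic exception stands in for the library's own type.
        raise KeyError(f"missing columns: {missing}")
    if missing and missing_policy == "warn":
        warnings.warn(f"missing columns are ignored: {missing}", stacklevel=2)
    # Under 'warn' and 'ignore', missing columns are dropped from the result.
    return [c for c in requested if c in frame_columns]


frame_columns = ["col1", "col2", "col3"]
print(resolve_columns(frame_columns, columns=["col1", "col2"]))
# ['col1', 'col2']
print(resolve_columns(frame_columns, columns=["col1", "col9"], missing_policy="ignore"))
# ['col1']
```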

grizz.transformer.ColumnSelectionTransformer

Bases: BaseInNTransformer

Implement a polars.DataFrame transformer to select a subset of columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to keep.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, 2, 3, 4, 5],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ 2    │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ 4    │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.ConcatColumns

Bases: BaseInNOut1Transformer

Implement a transformer to concatenate columns into a new column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to concatenate. The columns should have the same type or compatible types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="out")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='out', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
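
The output above shows that "concatenate" here means gathering the selected values of each row into one list, not appending columns end to end. As a minimal illustration of that row-wise behaviour, the sketch below rebuilds the `out` column from the example data using plain Python lists; it does not use polars and is not the transformer's implementation.

```python
frame = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
    "col4": ["a", "b", "c", "d", "e"],
}
columns = ["col1", "col2", "col3"]

# zip(*...) walks the selected columns in parallel, yielding one tuple per row.
out = [list(row) for row in zip(*(frame[name] for name in columns))]

print(out[0])  # [11, 21, 31]
print(out[4])  # [15, 25, 35]
```

The order of the names in `columns` determines the order of the values inside each list, and `col4` is left out because it was not selected.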

grizz.transformer.ConcatColumnsTransformer

Bases: BaseInNOut1Transformer

Implement a transformer to concatenate columns into a new column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to concatenate. The columns should have the same type or compatible types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="out")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='out', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out          │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---          │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ list[i64]    │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ [11, 21, 31] │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ [12, 22, 32] │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ [13, 23, 33] │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ [14, 24, 34] │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
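
The row-wise concatenation shown above can be sketched in plain Python. This is only an illustration of the semantics, assuming lists stand in for polars columns; `concat_columns` is a hypothetical helper, not part of grizz.

```python
# Plain-Python sketch: lists stand in for polars columns (not grizz's code).
def concat_columns(data: dict, columns: list) -> list:
    """Return one list per row holding the values of ``columns`` in order."""
    return [list(row) for row in zip(*(data[col] for col in columns))]


data = {
    "col1": [11, 12, 13],
    "col2": [21, 22, 23],
    "col3": [31, 32, 33],
}
out = concat_columns(data, ["col1", "col2", "col3"])
print(out)  # [[11, 21, 31], [12, 22, 32], [13, 23, 33]]
```

Each inner list corresponds to one cell of the `out` column in the example output.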

grizz.transformer.CopyColumn

Bases: BaseIn1Out1Transformer

Implement a transformer to copy a column.

Parameters:

Name Type Description Default
in_col str

The input column name i.e. the column to copy.

required
out_col str

The output column name i.e. the copied column.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumn
>>> transformer = CopyColumn(in_col="col1", out_col="out")
>>> transformer
CopyColumnTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1   │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2   │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3   │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4   │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5   │
└──────┴──────┴──────┴──────┴─────┘
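
The three `exist_policy` options can be read as the following plain-Python sketch. It is an assumption-laden illustration rather than grizz's implementation: `check_existing` is a hypothetical helper, and `RuntimeError` is a placeholder because the exception type is not stated in this section.

```python
import warnings


def check_existing(out_col: str, frame_columns: list, exist_policy: str = "raise") -> None:
    """Apply the documented policy when ``out_col`` is already in the frame."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"unknown exist_policy: {exist_policy!r}")
    if out_col not in frame_columns:
        return  # no conflict, nothing to do
    msg = f"column {out_col!r} already exists and would be overwritten"
    if exist_policy == "raise":
        # Placeholder exception type (assumption, not grizz's actual class).
        raise RuntimeError(msg)
    if exist_policy == "warn":
        warnings.warn(msg, stacklevel=2)
    # "ignore": fall through silently; the caller overwrites the column.


check_existing("out", ["col1", "col2"])  # no conflict
check_existing("out", ["col1", "out"], exist_policy="ignore")  # silent overwrite
```

With the default `'raise'`, calling the helper on a frame that already has an `out` column stops before anything is overwritten.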

grizz.transformer.CopyColumnTransformer

Bases: BaseIn1Out1Transformer

Implement a transformer to copy a column.

Parameters:

Name Type Description Default
in_col str

The input column name i.e. the column to copy.

required
out_col str

The output column name i.e. the copied column.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumn
>>> transformer = CopyColumn(in_col="col1", out_col="out")
>>> transformer
CopyColumnTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1   │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2   │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3   │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4   │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5   │
└──────┴──────┴──────┴──────┴─────┘

grizz.transformer.CopyColumns

Bases: BaseInNOutNTransformer

Implement a transformer to copy some columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to copy. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumns
>>> transformer = CopyColumns(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
CopyColumnsTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64      ┆ str      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
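
The output column names in the example (`col1_out`, `col3_out`) are consistent with wrapping each input name in the prefix and suffix. A minimal sketch of that naming rule, assuming plain string concatenation; `output_names` is a hypothetical helper, not a grizz function:

```python
def output_names(columns: list, prefix: str = "", suffix: str = "") -> dict:
    """Map each input column name to its assumed output name."""
    return {col: f"{prefix}{col}{suffix}" for col in columns}


names = output_names(["col1", "col3"], prefix="", suffix="_out")
print(names)  # {'col1': 'col1_out', 'col3': 'col3_out'}
```

Under this rule, an empty prefix and an empty suffix would map every column onto itself, which is where `exist_policy` becomes relevant.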

grizz.transformer.CopyColumnsTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to copy some columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to copy. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import CopyColumns
>>> transformer = CopyColumns(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
CopyColumnsTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str  ┆ str  ┆ i64      ┆ str      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 1    ┆ a    ┆ 1        ┆ 1        │
│ 2    ┆ 2    ┆ 2    ┆ b    ┆ 2        ┆ 2        │
│ 3    ┆ 3    ┆ 3    ┆ c    ┆ 3        ┆ 3        │
│ 4    ┆ 4    ┆ 4    ┆ d    ┆ 4        ┆ 4        │
│ 5    ┆ 5    ┆ 5    ┆ e    ┆ 5        ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
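
The interaction between `columns` and `exclude_columns` described in the parameter table can be sketched as below. This is a plain-Python reading of the documented behaviour, not grizz's code; `resolve_columns` is a hypothetical name.

```python
def resolve_columns(frame_columns: list, columns=None, exclude_columns=()) -> list:
    """Return the columns to process, in their original order."""
    # None means "all the columns of the frame".
    selected = list(frame_columns) if columns is None else list(columns)
    # Excluded names that do not appear in the selection are simply ignored.
    excluded = set(exclude_columns)
    return [col for col in selected if col not in excluded]


frame_columns = ["col1", "col2", "col3", "col4"]
print(resolve_columns(frame_columns, exclude_columns=["col4", "unknown"]))
# ['col1', 'col2', 'col3']
```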

grizz.transformer.DecimalCast

Bases: CastTransformer

Implement a transformer to convert decimal columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────────────┬──────────────┬──────┬──────────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 ┆ col2_out │
│ ---  ┆ ---          ┆ ---          ┆ ---  ┆ ---      │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  ┆ f32      │
╞══════╪══════════════╪══════════════╪══════╪══════════╡
│ 1    ┆ 1            ┆ 1            ┆ a    ┆ 1.0      │
│ 2    ┆ 2            ┆ 2            ┆ b    ┆ 2.0      │
│ 3    ┆ 3            ┆ 3            ┆ c    ┆ 3.0      │
│ 4    ┆ 4            ┆ 4            ┆ d    ┆ 4.0      │
│ 5    ┆ 5            ┆ 5            ┆ e    ┆ 5.0      │
└──────┴──────────────┴──────────────┴──────┴──────────┘
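
The example lists `col1` and `col2` in `columns`, yet only `col2_out` appears in the output, which suggests that the non-decimal column is skipped. A rough sketch of that filtering step, assuming the schema is available as plain dtype labels; `select_decimal_columns` and the string schema are illustrative and not grizz's API:

```python
def select_decimal_columns(schema: dict, columns: list) -> list:
    """Keep only the requested columns whose dtype label marks a decimal."""
    return [col for col in columns if schema[col].startswith("decimal")]


# Dtype labels copied from the frame printed in the example above.
schema = {
    "col1": "i64",
    "col2": "decimal[*,0]",
    "col3": "decimal[*,0]",
    "col4": "str",
}
print(select_decimal_columns(schema, ["col1", "col2"]))  # ['col2']
```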

grizz.transformer.DecimalCastTransformer

Bases: CastTransformer

Implement a transformer to convert decimal columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────────────┬──────────────┬──────┬──────────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 ┆ col2_out │
│ ---  ┆ ---          ┆ ---          ┆ ---  ┆ ---      │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  ┆ f32      │
╞══════╪══════════════╪══════════════╪══════╪══════════╡
│ 1    ┆ 1            ┆ 1            ┆ a    ┆ 1.0      │
│ 2    ┆ 2            ┆ 2            ┆ b    ┆ 2.0      │
│ 3    ┆ 3            ┆ 3            ┆ c    ┆ 3.0      │
│ 4    ┆ 4            ┆ 4            ┆ d    ┆ 4.0      │
│ 5    ┆ 5            ┆ 5            ┆ e    ┆ 5.0      │
└──────┴──────────────┴──────────────┴──────┴──────────┘

grizz.transformer.Diff

Bases: BaseIn1Out1Transformer

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
shift int

The number of slots to shift.

1
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col='col1', out_col='diff', exist_policy='raise', missing_policy='raise', shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘
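
The effect of `shift` can be illustrated with a small plain-Python sketch of a discrete difference, where each value is compared with the one `shift` slots earlier. It assumes `shift >= 1` for simplicity, uses `None` in place of `null`, and is not grizz's implementation; `diff` here is a hypothetical helper.

```python
from typing import List, Optional


def diff(values: List[int], shift: int = 1) -> List[Optional[int]]:
    """Return values[i] - values[i - shift], with None for the first ``shift`` items."""
    if shift < 1:
        raise ValueError("this sketch only handles shift >= 1")
    return [
        None if i < shift else values[i] - values[i - shift]
        for i in range(len(values))
    ]


print(diff([1, 2, 3, 4, 5]))  # [None, 1, 1, 1, 1]
print(diff([1, 2, 3, 4, 5], shift=2))  # [None, None, 2, 2, 2]
```

The first call reproduces the `diff` column of the example; the second shows that a larger shift produces more leading missing values.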

grizz.transformer.DiffHorizontal

Bases: BaseIn2Out1Transformer

Implement a transformer to compute the difference between two columns.

Internally, this transformer computes: out = in1 - in2

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DiffHorizontal
>>> transformer = DiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
DiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ -4   │
│ 2    ┆ 4    ┆ b    ┆ -2   │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘
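
The `missing_policy` options can be read as a filter over the requested input columns. The sketch below is one possible plain-Python interpretation, not grizz's code: `available_columns` is a hypothetical helper and `RuntimeError` is a placeholder for whatever exception the library actually raises.

```python
import warnings


def available_columns(columns: list, frame_columns: list, missing_policy: str = "raise") -> list:
    """Return the requested columns that exist, applying the documented policy."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"unknown missing_policy: {missing_policy!r}")
    missing = [col for col in columns if col not in frame_columns]
    if missing:
        msg = f"missing columns: {missing}"
        if missing_policy == "raise":
            # Placeholder exception type (assumption).
            raise RuntimeError(msg)
        if missing_policy == "warn":
            warnings.warn(msg, stacklevel=2)
    # With "warn" or "ignore", the missing columns are dropped from the selection.
    return [col for col in columns if col in frame_columns]


print(available_columns(["col1", "col9"], ["col1", "col2", "col3"], missing_policy="ignore"))
# ['col1']
```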

grizz.transformer.DiffHorizontalTransformer

Bases: BaseIn2Out1Transformer

Implement a transformer to compute the difference between two columns.

Internally, this transformer computes: out = in1 - in2

Parameters:

Name Type Description Default
in1_col str

The first input column name.

required
in2_col str

The second input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DiffHorizontal
>>> transformer = DiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
DiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [5, 4, 3, 2, 1],
...         "col3": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    │
│ 2    ┆ 4    ┆ b    │
│ 3    ┆ 3    ┆ c    │
│ 4    ┆ 2    ┆ d    │
│ 5    ┆ 1    ┆ e    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 5    ┆ a    ┆ -4   │
│ 2    ┆ 4    ┆ b    ┆ -2   │
│ 3    ┆ 3    ┆ c    ┆ 0    │
│ 4    ┆ 2    ┆ d    ┆ 2    │
│ 5    ┆ 1    ┆ e    ┆ 4    │
└──────┴──────┴──────┴──────┘

grizz.transformer.DiffTransformer

Bases: BaseIn1Out1Transformer

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
shift int

The number of slots to shift.

1
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col='col1', out_col='diff', exist_policy='raise', missing_policy='raise', shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘

grizz.transformer.DropDuplicate

Bases: BaseInNTransformer

Implement a transformer to drop duplicate rows.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for unique.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, exclude_columns=(), missing_policy='raise', keep='first', maintain_order=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 1],
...         "col2": ["1", "2", "3", "4", "1"],
...         "col3": ["1", "2", "3", "1", "1"],
...         "col4": ["a", "a", "a", "a", "a"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
│ 1    ┆ 1    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
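
The behaviour of `keep="first"` combined with `maintain_order=True` in the example can be sketched in plain Python, with tuples standing in for rows. This only illustrates the semantics and is not how grizz or polars implement it; `drop_duplicate_rows` is a hypothetical helper.

```python
def drop_duplicate_rows(rows: list) -> list:
    """Keep the first occurrence of each row and preserve the input order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows = [
    (1, "1", "1", "a"),
    (2, "2", "2", "a"),
    (3, "3", "3", "a"),
    (4, "4", "1", "a"),
    (1, "1", "1", "a"),  # duplicate of the first row
]
deduplicated = drop_duplicate_rows(rows)
print(len(deduplicated))  # 4
```

Only the last row is dropped, because it matches the first row on every column.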

grizz.transformer.DropDuplicateTransformer

Bases: BaseInNTransformer

Implement a transformer to drop duplicate rows.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for unique.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, exclude_columns=(), missing_policy='raise', keep='first', maintain_order=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 1],
...         "col2": ["1", "2", "3", "4", "1"],
...         "col3": ["1", "2", "3", "1", "1"],
...         "col4": ["a", "a", "a", "a", "a"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
│ 1    ┆ 1    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ a    │
│ 3    ┆ 3    ┆ 3    ┆ a    │
│ 4    ┆ 4    ┆ 1    ┆ a    │
└──────┴──────┴──────┴──────┘
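
The effect of keep="first" with maintain_order=True can be pictured with a small plain-Python sketch. It is only an illustration of the rule, not how grizz or polars implement it; the helper name drop_duplicate_rows and the list-of-dicts representation are assumptions made for this example.

```python
from __future__ import annotations

from collections.abc import Sequence


def drop_duplicate_rows(
    rows: list[dict], columns: Sequence[str] | None = None
) -> list[dict]:
    """Keep the first occurrence of each row and preserve the input order.

    Hypothetical helper: ``columns`` lists the keys used to compare rows,
    and ``None`` means that all the keys are used.
    """
    seen = set()
    out = []
    for row in rows:
        keys = list(row) if columns is None else columns
        signature = tuple(row[key] for key in keys)
        if signature not in seen:  # first time this combination appears
            seen.add(signature)
            out.append(row)
    return out


rows = [
    {"col1": 1, "col3": "1"},
    {"col1": 2, "col3": "2"},
    {"col1": 3, "col3": "3"},
    {"col1": 4, "col3": "1"},
    {"col1": 1, "col3": "1"},
]
deduplicated = drop_duplicate_rows(rows)  # compares all the keys
by_col3 = drop_duplicate_rows(rows, columns=["col3"])  # compares col3 only
```

When all the keys are compared, only the last row is a duplicate. Restricting the comparison to col3 also removes the fourth row, because its col3 value already appeared in the first row.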

grizz.transformer.DropNanColumn

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many NaN values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. None means all the columns.

None
threshold float

The maximum proportion of NaN values allowed to keep a column. If the proportion of NaN values is greater than or equal to this threshold, the column is removed. If set to 1.0, only the columns that contain only NaN values are removed.

1.0
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for drop.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanColumn
>>> transformer = DropNanColumn()
>>> transformer
DropNanColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), 5.0],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ 5.0  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ f64  ┆ f64  │
╞══════╪══════╡
│ 1.0  ┆ 1.0  │
│ 2.0  ┆ NaN  │
│ 3.0  ┆ 3.0  │
│ 4.0  ┆ NaN  │
│ NaN  ┆ 5.0  │
└──────┴──────┘
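
The threshold rule described above can be sketched in plain Python. This is a minimal illustration of the comparison, assuming the column is removed when its NaN proportion is greater than or equal to the threshold; the helper name nan_columns_to_drop and the dict-of-lists representation are hypothetical and are not part of grizz.

```python
from __future__ import annotations

import math


def nan_columns_to_drop(
    data: dict[str, list[float]], threshold: float = 1.0
) -> list[str]:
    """Return the names of the columns whose NaN proportion reaches the threshold."""
    dropped = []
    for name, values in data.items():
        if not values:  # an empty column has no proportion to compare
            continue
        nan_ratio = sum(math.isnan(v) for v in values) / len(values)
        if nan_ratio >= threshold:
            dropped.append(name)
    return dropped


nan = float("nan")
data = {
    "col1": [1.0, 2.0, 3.0, 4.0, nan],  # 1 NaN out of 5 values
    "col2": [1.0, nan, 3.0, nan, 5.0],  # 2 NaN out of 5 values
    "col3": [nan, nan, nan, nan, nan],  # 5 NaN out of 5 values
}
default_drop = nan_columns_to_drop(data)  # threshold=1.0
loose_drop = nan_columns_to_drop(data, threshold=0.4)
```

With the default threshold, only col3 is selected for removal, which matches the example above. Lowering the threshold to 0.4 also selects col2 in this sketch.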

grizz.transformer.DropNanColumnTransformer

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many NaN values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. None means all the columns.

None
threshold float

The maximum proportion of NaN values allowed to keep a column. If the proportion of NaN values is greater than or equal to this threshold, the column is removed. If set to 1.0, only the columns that contain only NaN values are removed.

1.0
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for drop.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanColumn
>>> transformer = DropNanColumn()
>>> transformer
DropNanColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), 5.0],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ 5.0  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ f64  ┆ f64  │
╞══════╪══════╡
│ 1.0  ┆ 1.0  │
│ 2.0  ┆ NaN  │
│ 3.0  ┆ 3.0  │
│ 4.0  ┆ NaN  │
│ NaN  ┆ 5.0  │
└──────┴──────┘

grizz.transformer.DropNanRow

Bases: BaseInNTransformer

Implement a transformer to drop the rows that contain only NaN values.

Note that a row is dropped only if all its values are NaN. A row that contains at least one non-NaN value is kept.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanRow
>>> transformer = DropNanRow()
>>> transformer
DropNanRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), float("nan")],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
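
The row rule can be sketched in a few lines of plain Python: a row is removed only when every value is NaN. The helper name drop_all_nan_rows and the list-of-lists representation are assumptions made for this illustration, not the grizz implementation.

```python
from __future__ import annotations

import math


def drop_all_nan_rows(rows: list[list[float]]) -> list[list[float]]:
    """Keep the rows that contain at least one non-NaN value."""
    return [row for row in rows if not all(math.isnan(v) for v in row)]


nan = float("nan")
rows = [
    [1.0, 1.0, nan],
    [2.0, nan, nan],
    [3.0, 3.0, nan],
    [4.0, nan, nan],
    [nan, nan, nan],  # the only row where all the values are NaN
]
kept = drop_all_nan_rows(rows)
```

The second and fourth rows contain NaN values but are kept because their first value is a regular number.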

grizz.transformer.DropNanRowTransformer

Bases: BaseInNTransformer

Implement a transformer to drop the rows that contain only NaN values.

Note that a row is dropped only if all its values are NaN. A row that contains at least one non-NaN value is kept.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNanRow
>>> transformer = DropNanRow()
>>> transformer
DropNanRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
...         "col2": [1.0, float("nan"), 3.0, float("nan"), float("nan")],
...         "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
...     }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
│ NaN  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ f64  ┆ f64  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ NaN  │
│ 2.0  ┆ NaN  ┆ NaN  │
│ 3.0  ┆ 3.0  ┆ NaN  │
│ 4.0  ┆ NaN  ┆ NaN  │
└──────┴──────┴──────┘

grizz.transformer.DropNullColumn

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. None means all the columns.

None
threshold float

The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold, the column is removed. If set to 1.0, only the columns that contain only null values are removed.

1.0
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for drop.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, 5],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ 5    ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ null │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ null │
│ null       ┆ 5    │
└────────────┴──────┘
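
The three missing_policy options can be pictured with the following plain-Python sketch. It only illustrates the documented behaviour; the helper name resolve_columns is hypothetical, and KeyError is a stand-in for whatever exception grizz actually raises.

```python
from __future__ import annotations

import warnings
from collections.abc import Sequence


def resolve_columns(
    requested: Sequence[str],
    available: Sequence[str],
    missing_policy: str = "raise",
) -> list[str]:
    """Return the requested columns that exist, applying the missing policy."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect missing_policy: {missing_policy!r}")
    missing = [col for col in requested if col not in available]
    if missing and missing_policy == "raise":
        # Stand-in exception type, assumed for this sketch.
        raise KeyError(f"Missing columns: {missing}")
    if missing and missing_policy == "warn":
        warnings.warn(f"Missing columns are ignored: {missing}", stacklevel=2)
    # With 'warn' and 'ignore', the missing columns are silently skipped.
    return [col for col in requested if col in available]


found = resolve_columns(
    ["col1", "col9"], available=["col1", "col2", "col3"], missing_policy="ignore"
)
```

In this sketch, 'warn' and 'ignore' return the same columns and differ only in whether a warning is emitted.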

grizz.transformer.DropNullColumnTransformer

Bases: BaseInNTransformer

Implement a transformer to remove the columns that have too many null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. None means all the columns.

None
threshold float

The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold, the column is removed. If set to 1.0, only the columns that contain only null values are removed.

1.0
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for drop.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, 5],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ 5    ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1       ┆ col2 │
│ ---        ┆ ---  │
│ str        ┆ i64  │
╞════════════╪══════╡
│ 2020-1-1   ┆ 1    │
│ 2020-1-2   ┆ null │
│ 2020-1-31  ┆ 3    │
│ 2020-12-31 ┆ null │
│ null       ┆ 5    │
└────────────┴──────┘

grizz.transformer.DropNullRow

Bases: BaseInNTransformer

Implement a transformer to drop the rows that contain only null values.

Note that a row is dropped only if all its values are null. A row that contains at least one non-null value is kept.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
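
The way columns and exclude_columns narrow the set of checked columns can be sketched in plain Python. This is an illustration under stated assumptions, not the grizz implementation: the helper names select_columns and drop_all_null_rows are hypothetical, rows are represented as dictionaries, and the sketch assumes that a row is dropped when all the checked values are null.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


def select_columns(
    frame_columns: Iterable[str],
    columns: Sequence[str] | None = None,
    exclude_columns: Sequence[str] = (),
) -> list[str]:
    """Return the columns to check; unknown names in exclude_columns have no effect."""
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    return [col for col in selected if col not in excluded]


def drop_all_null_rows(
    rows: list[dict],
    columns: Sequence[str] | None = None,
    exclude_columns: Sequence[str] = (),
) -> list[dict]:
    """Drop the rows where all the checked values are None."""
    if not rows:
        return []
    checked = select_columns(rows[0].keys(), columns, exclude_columns)
    if not checked:  # nothing to check, so nothing is dropped
        return list(rows)
    return [row for row in rows if not all(row[col] is None for col in checked)]


rows = [
    {"col1": "2020-1-1", "col2": 1, "col3": None},
    {"col1": "2020-1-2", "col2": None, "col3": None},
    {"col1": "2020-1-31", "col2": 3, "col3": None},
    {"col1": "2020-12-31", "col2": None, "col3": None},
    {"col1": None, "col2": None, "col3": None},
]
default_kept = drop_all_null_rows(rows)
narrow_kept = drop_all_null_rows(rows, exclude_columns=["col1", "not_a_column"])
```

With all the columns checked, only the last row is removed. Once col1 is excluded, the second and fourth rows are also removed in this sketch because their col2 and col3 values are all null.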

grizz.transformer.DropNullRowTransformer

Bases: BaseInNTransformer

Implement a transformer to drop the rows that contain only null values.

Note that a row is dropped only if all its values are null. A row that contains at least one non-null value is kept.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to check. If set to None (default), use all columns.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
...         "col2": [1, None, 3, None, None],
...         "col3": [None, None, None, None, None],
...     }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null       ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1       ┆ col2 ┆ col3 │
│ ---        ┆ ---  ┆ ---  │
│ str        ┆ i64  ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1   ┆ 1    ┆ null │
│ 2020-1-2   ┆ null ┆ null │
│ 2020-1-31  ┆ 3    ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘

grizz.transformer.Equal

Bases: BaseComparatorTransformer

Implements a transformer that computes the equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Equal
>>> transformer = Equal(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
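
The output column names in the example above follow the pattern prefix + column + suffix, and exist_policy decides what happens when such a name is already present. The following plain-Python sketch illustrates both points; the helper names output_names and check_existing are hypothetical, and RuntimeError is a stand-in for whatever exception grizz actually raises.

```python
from __future__ import annotations

import warnings
from collections.abc import Sequence


def output_names(columns: Sequence[str], prefix: str, suffix: str) -> list[str]:
    """Build the output column names from the input column names."""
    return [f"{prefix}{col}{suffix}" for col in columns]


def check_existing(
    out_names: Sequence[str],
    frame_columns: Sequence[str],
    exist_policy: str = "raise",
) -> None:
    """Apply the exist policy to the output names already present in the frame."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect exist_policy: {exist_policy!r}")
    existing = [name for name in out_names if name in frame_columns]
    if not existing or exist_policy == "ignore":
        return  # nothing to report; existing columns would be overwritten
    if exist_policy == "warn":
        warnings.warn(f"Columns will be overwritten: {existing}", stacklevel=2)
        return
    # Stand-in exception type, assumed for this sketch.
    raise RuntimeError(f"Columns already exist: {existing}")


names = output_names(["col1", "col3"], prefix="", suffix="_out")
check_existing(names, frame_columns=["col1", "col2", "col3", "col4"])
```

With an empty prefix and an empty suffix, the output names equal the input names, so the default exist_policy='raise' would reject the transformation in this sketch.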

grizz.transformer.EqualMissing

Bases: BaseComparatorTransformer

Implements a transformer that computes the equal operation where null values are not propagated.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import EqualMissing
>>> transformer = EqualMissing(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
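
The example above contains no null values, so it produces the same output as Equal. The difference appears only when nulls are present. The sketch below illustrates it in plain Python, under the assumption that the two transformers follow the semantics of the polars eq and eq_missing expressions; the function names equal and equal_missing are hypothetical.

```python
from __future__ import annotations

from typing import Any


def equal(value: Any, target: Any) -> bool | None:
    """Equality where a null operand propagates to the result."""
    if value is None or target is None:
        return None
    return value == target


def equal_missing(value: Any, target: Any) -> bool:
    """Equality where null is treated as a regular value (null == null is true)."""
    return value == target


values = [1, None, 3]
propagated = [equal(v, 3) for v in values]
not_propagated = [equal_missing(v, 3) for v in values]
null_vs_null = equal_missing(None, None)
```

In this sketch, comparing a null value with 3 gives null under equal and false under equal_missing, so the output of equal_missing never contains null.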

grizz.transformer.EqualMissingTransformer

Bases: BaseComparatorTransformer

Implements a transformer that computes the equal operation where null values are not propagated.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import EqualMissing
>>> transformer = EqualMissing(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.EqualTransformer

Bases: BaseComparatorTransformer

Implements a transformer that computes the equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Equal
>>> transformer = Equal(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.FillNan

Bases: BaseInNOutNTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_nan.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 100.0    │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ null     │
│ null ┆ NaN  ┆ null ┆ 5.2  ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┘

grizz.transformer.FillNanTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_nan.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 100.0    │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ null     │
│ null ┆ NaN  ┆ null ┆ 5.2  ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┘

grizz.transformer.FillNull

Bases: BaseInNOutNTransformer

Implement a transformer to fill null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to process. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_null.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, None],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ null ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ i64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1        ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 2        ┆ NaN      │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3        ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ 4        ┆ 100.0    │
│ null ┆ null ┆ null ┆ 5.2  ┆ 100      ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
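
In the example above, the null entries of col1 and col4 are replaced while the NaN in col4 is kept, which is the mirror image of the NaN-filling transformer. A minimal plain-Python sketch of that rule follows. `fill_null_values` is a hypothetical helper used only for illustration, and `None` stands in for null.

```python
import math


def fill_null_values(values, value):
    """Replace None (null) entries with ``value``; NaN is a regular float and is kept."""
    return [value if v is None else v for v in values]


col1_out = fill_null_values([1, 2, 3, 4, None], 100)
col4_out = fill_null_values([1.2, float("nan"), 3.2, None, 5.2], 100.0)
print(col1_out)  # [1, 2, 3, 4, 100]
print(col4_out[3], math.isnan(col4_out[1]))  # 100.0 True
```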

grizz.transformer.FillNullTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to fill null values.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to process. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_null.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, None],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ null ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col4_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f64  ┆ str  ┆ f64  ┆ i64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  ┆ 1        ┆ 1.2      │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  ┆ 2        ┆ NaN      │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  ┆ 3        ┆ 3.2      │
│ 4    ┆ 4.2  ┆ d    ┆ null ┆ 4        ┆ 100.0    │
│ null ┆ null ┆ null ┆ 5.2  ┆ 100      ┆ 5.2      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
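
Many transformers in this section share the exist_policy parameter described above. The sketch below shows one way the three options could be handled when the output column names are computed. It is an assumption-laden illustration: `check_existing_columns` is a hypothetical helper, and the exception and warning types are placeholders that may differ from what grizz actually uses.

```python
import warnings


def check_existing_columns(existing, new, exist_policy="raise"):
    """Return the output columns that already exist, applying ``exist_policy``."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Unknown exist_policy: {exist_policy!r}")
    clashes = [col for col in new if col in existing]
    if clashes and exist_policy != "ignore":
        msg = f"{len(clashes)} column(s) already exist: {clashes}"
        if exist_policy == "raise":
            # Assumption: the exception type used by grizz may differ.
            raise ValueError(msg)
        warnings.warn(msg, RuntimeWarning, stacklevel=2)
    return clashes


existing = ["col1", "col2", "col3", "col4", "col4_out"]

# 'ignore': the clash is returned silently and the column would be overwritten.
ignored = check_existing_columns(existing, ["col1_out", "col4_out"], "ignore")
print(ignored)  # ['col4_out']

# 'raise': the same clash stops the transformation.
try:
    check_existing_columns(existing, ["col1_out", "col4_out"], "raise")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```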

grizz.transformer.FilterCardinality

Bases: BaseInNTransformer

Implement a transformer to filter based on the cardinality (i.e. number of unique values) in each column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to use to filter based on the number of unique values. If None, it processes all the columns of type string.

None
n_min int

The minimal cardinality (included).

0
n_max int

The maximal cardinality (excluded).

float('inf')
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FilterCardinality
>>> transformer = FilterCardinality(columns=["col1", "col2", "col3"], n_min=2, n_max=5)
>>> transformer
FilterCardinalityTransformer(columns=('col1', 'col2', 'col3'), exclude_columns=(), missing_policy='raise', n_min=2, n_max=5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1, 1, 1, 1, 1],
...         "col3": ["a", "b", "c", "a", "b"],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ a    ┆ 1.2  │
│ 2    ┆ 1    ┆ b    ┆ NaN  │
│ 3    ┆ 1    ┆ c    ┆ 3.2  │
│ 4    ┆ 1    ┆ a    ┆ null │
│ 5    ┆ 1    ┆ b    ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col3 ┆ col4 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.2  │
│ b    ┆ NaN  │
│ c    ┆ 3.2  │
│ a    ┆ null │
│ b    ┆ 5.2  │
└──────┴──────┘
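
The bounds are half-open: a listed column is kept only if its number of unique values satisfies n_min <= cardinality < n_max, and columns that are not listed are left untouched. The plain-Python sketch below reproduces the example above on a dictionary of lists. `filter_by_cardinality` is a hypothetical helper, and counting unique values with `set` is a simplification of how a DataFrame library counts them.

```python
def filter_by_cardinality(data, columns, n_min=0, n_max=float("inf")):
    """Drop the listed columns whose cardinality is outside [n_min, n_max)."""
    out = {}
    for name, values in data.items():
        if name in columns and not (n_min <= len(set(values)) < n_max):
            continue  # cardinality out of range: drop the column
        out[name] = values
    return out


data = {
    "col1": [1, 2, 3, 4, 5],  # 5 unique values: dropped because 5 is not < 5
    "col2": [1, 1, 1, 1, 1],  # 1 unique value: dropped because 1 < 2
    "col3": ["a", "b", "c", "a", "b"],  # 3 unique values: kept
    "col4": [1.2, float("nan"), 3.2, None, 5.2],  # not listed: kept as-is
}
kept = filter_by_cardinality(data, columns=["col1", "col2", "col3"], n_min=2, n_max=5)
print(list(kept))  # ['col3', 'col4']
```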

grizz.transformer.FilterCardinalityTransformer

Bases: BaseInNTransformer

Implement a transformer to filter based on the cardinality (i.e. number of unique values) in each column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to use to filter based on the number of unique values. If None, it processes all the columns of type string.

None
n_min int

The minimal cardinality (included).

0
n_max int

The maximal cardinality (excluded).

float('inf')
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FilterCardinality
>>> transformer = FilterCardinality(columns=["col1", "col2", "col3"], n_min=2, n_max=5)
>>> transformer
FilterCardinalityTransformer(columns=('col1', 'col2', 'col3'), exclude_columns=(), missing_policy='raise', n_min=2, n_max=5)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1, 1, 1, 1, 1],
...         "col3": ["a", "b", "c", "a", "b"],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ a    ┆ 1.2  │
│ 2    ┆ 1    ┆ b    ┆ NaN  │
│ 3    ┆ 1    ┆ c    ┆ 3.2  │
│ 4    ┆ 1    ┆ a    ┆ null │
│ 5    ┆ 1    ┆ b    ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col3 ┆ col4 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.2  │
│ b    ┆ NaN  │
│ c    ┆ 3.2  │
│ a    ┆ null │
│ b    ┆ 5.2  │
└──────┴──────┘

grizz.transformer.FirstRow

Bases: BaseArgTransformer

Implement a transformer that selects the first n rows.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FirstRow
>>> transformer = FirstRow(n=3)
>>> transformer
FirstRowTransformer(n=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FirstRowTransformer

Bases: BaseArgTransformer

Implement a transformer that selects the first n rows.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FirstRow
>>> transformer = FirstRow(n=3)
>>> transformer
FirstRowTransformer(n=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
└──────┴──────┴──────┴──────┘

grizz.transformer.FloatCast

Bases: CastTransformer

Implement a transformer to convert float columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(
...     columns=["col1", "col2"], dtype=pl.Int32, prefix="", suffix="_out"
... )
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ f64  ┆ str  ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1        │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2        │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3        │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4        │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┘
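
Note that col1 is listed in columns but is not converted in the example above, because only float columns are cast; the output name is built as prefix + name + suffix. A rough plain-Python sketch of this selection and naming logic is given below. `float_cast` is a hypothetical helper, and inferring the column type from its values is a simplification (a DataFrame library reads the type from the schema instead).

```python
def float_cast(data, columns, cast, prefix="", suffix=""):
    """Cast the float columns among ``columns`` and store them under new names."""
    out = dict(data)
    for name in columns:
        values = data[name]
        non_null = [v for v in values if v is not None]
        # Simplification: a column counts as float if all non-null values are floats.
        if not non_null or not all(isinstance(v, float) for v in non_null):
            continue
        out[f"{prefix}{name}{suffix}"] = [None if v is None else cast(v) for v in values]
    return out


data = {
    "col1": [1, 2, 3, 4, 5],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col4": ["a", "b", "c", "d", "e"],
}
out = float_cast(data, columns=["col1", "col2"], cast=int, suffix="_out")
print(list(out))  # ['col1', 'col2', 'col3', 'col4', 'col2_out']
print(out["col2_out"])  # [1, 2, 3, 4, 5]
```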

grizz.transformer.FloatCastTransformer

Bases: CastTransformer

Implement a transformer to convert float columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(
...     columns=["col1", "col2"], dtype=pl.Int32, prefix="", suffix="_out"
... )
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ f64  ┆ str  ┆ i32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1        │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2        │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3        │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4        │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5        │
└──────┴──────┴──────┴──────┴──────────┘

grizz.transformer.Function

Bases: BaseArgTransformer

Implement a transformer that is a wrapper around a function to transform the DataFrame.

Parameters:

Name Type Description Default
func Callable[[DataFrame], DataFrame]

The function to transform the DataFrame.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘
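
The wrapper pattern itself is small: the transformer stores the callable and delegates `transform` to it. The sketch below illustrates the idea on a list of row dictionaries rather than a DataFrame. `FunctionWrapper` is a hypothetical class written for this illustration and is not the grizz implementation.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class FunctionWrapper(Generic[T]):
    """Wrap a function so it can be used through a ``transform`` method."""

    def __init__(self, func: Callable[[T], T]) -> None:
        self._func = func

    def __repr__(self) -> str:
        return f"{type(self).__name__}(func={self._func})"

    def transform(self, frame: T) -> T:
        # Delegate the whole transformation to the wrapped function.
        return self._func(frame)


rows = [
    {"col1": 1, "col4": "a"},
    {"col1": 2, "col4": "b"},
    {"col1": 4, "col4": "d"},
]
keep_2_and_4 = FunctionWrapper(
    lambda frame: [row for row in frame if row["col1"] in {2, 4}]
)
out = keep_2_and_4.transform(rows)
print(out)  # [{'col1': 2, 'col4': 'b'}, {'col1': 4, 'col4': 'd'}]
```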

grizz.transformer.FunctionTransformer

Bases: BaseArgTransformer

Implement a transformer that is a wrapper around a function to transform the DataFrame.

Parameters:

Name Type Description Default
func Callable[[DataFrame], DataFrame]

The function to transform the DataFrame.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘

grizz.transformer.Greater

Bases: BaseComparatorTransformer

Implement a transformer that computes the greater than operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Greater
>>> transformer = Greater(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
GreaterTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
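
Each output column holds the element-wise result of value > target, which is why only the last row of col1 is true for target=4.2 while every row of col3 is true. A minimal plain-Python sketch follows. `greater` is a hypothetical helper, and propagating `None` for null inputs is an assumption about how missing values are treated.

```python
def greater(values, target):
    """Return the element-wise result of ``value > target``."""
    # Assumption: a null input yields a null output instead of a boolean.
    return [None if v is None else v > target for v in values]


col1_out = greater([1, 2, 3, 4, 5], 4.2)
col3_out = greater([10, 20, 30, 40, 50], 4.2)
print(col1_out)  # [False, False, False, False, True]
print(col3_out)  # [True, True, True, True, True]
```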

grizz.transformer.GreaterEqual

Bases: BaseComparatorTransformer

Implement a transformer that computes the greater than or equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import GreaterEqual
>>> transformer = GreaterEqual(
...     columns=["col1", "col3"], target=4.2, prefix="", suffix="_out"
... )
>>> transformer
GreaterEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
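
With target=4.2 and integer columns, the result above is identical to the strict greater than comparison, because no value equals the target. The two operations differ only when a value is equal to the target, as the plain-Python sketch below shows with target=4. `compare` is a hypothetical helper used only for this illustration.

```python
import operator


def compare(values, target, op):
    """Apply the comparison ``op(value, target)`` to each value."""
    return [op(v, target) for v in values]


col1 = [1, 2, 3, 4, 5]
gt = compare(col1, 4, operator.gt)  # strict comparison
ge = compare(col1, 4, operator.ge)  # the value equal to the target is included
print(gt)  # [False, False, False, False, True]
print(ge)  # [False, False, False, True, True]
```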

grizz.transformer.GreaterEqualTransformer

Bases: BaseComparatorTransformer

Implement a transformer that computes the greater than or equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import GreaterEqual
>>> transformer = GreaterEqual(
...     columns=["col1", "col3"], target=4.2, prefix="", suffix="_out"
... )
>>> transformer
GreaterEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
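The example above suggests that each output column is named prefix + input column name + suffix and holds the element-wise result of `value >= target`. The following is a minimal pure-Python sketch of that behavior, not the grizz implementation: `greater_equal_columns` is a hypothetical helper, it works on a plain dict of lists instead of a polars.DataFrame, and it does not handle null values.

```python
from collections.abc import Sequence
from typing import Any


def greater_equal_columns(
    data: dict[str, list[Any]],
    columns: Sequence[str],
    target: Any,
    prefix: str,
    suffix: str,
) -> dict[str, list[Any]]:
    """Hypothetical helper: add one boolean column per input column."""
    out = dict(data)  # shallow copy so the input mapping is left untouched
    for col in columns:
        # Output name is prefix + input name + suffix, e.g. "col1" -> "col1_out".
        out[f"{prefix}{col}{suffix}"] = [value >= target for value in data[col]]
    return out


data = {"col1": [1, 2, 3, 4, 5], "col3": [10, 20, 30, 40, 50]}
result = greater_equal_columns(
    data, ["col1", "col3"], target=4.2, prefix="", suffix="_out"
)
print(result["col1_out"])  # [False, False, False, False, True]
print(result["col3_out"])  # [True, True, True, True, True]
```

The printed lists match the col1_out and col3_out columns of the output frame shown above.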

grizz.transformer.GreaterTransformer

Bases: BaseComparatorTransformer

Implement a transformer that computes the greater-than comparison.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Greater
>>> transformer = Greater(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
GreaterTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ false    ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ false    ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ false    ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
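The three exist_policy options described in the parameter table can be sketched as a small check that runs before the output columns are written. This is an illustration under stated assumptions, not the library's code: `check_existing_columns` is a hypothetical helper, and the exception and warning types (RuntimeError, RuntimeWarning) are placeholders, since the documentation above does not say which types grizz uses.

```python
import warnings
from collections.abc import Sequence


def check_existing_columns(
    out_columns: Sequence[str],
    frame_columns: Sequence[str],
    exist_policy: str = "raise",
) -> None:
    """Hypothetical helper mirroring the documented exist_policy options."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect exist_policy: {exist_policy!r}")
    existing = [col for col in out_columns if col in frame_columns]
    if not existing or exist_policy == "ignore":
        return  # no collision, or collisions are overwritten silently
    msg = f"{len(existing)} column(s) already exist and would be overwritten: {existing}"
    if exist_policy == "raise":
        # Placeholder exception type (assumption, see the note above).
        raise RuntimeError(msg)
    # Remaining case is "warn": report the collision and let the caller overwrite.
    warnings.warn(msg, RuntimeWarning, stacklevel=2)


# No collision: "col1_out" is not yet a column of the frame.
check_existing_columns(["col1_out"], ["col1", "col2"], exist_policy="raise")
# Collision, but 'ignore' lets the column be overwritten without any message.
check_existing_columns(["col1"], ["col1", "col2"], exist_policy="ignore")
```

With exist_policy='raise', the second call would raise instead of returning.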

grizz.transformer.InplaceCast

Bases: CastTransformer

Implement a transformer to convert some columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
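The interaction between columns, exclude_columns, and missing_policy can be sketched as a column-resolution step that runs before the cast. This is a rough pure-Python illustration, not the grizz implementation: `resolve_columns` is a hypothetical helper, the order of the exclusion and missing-column checks is an assumption, and LookupError and RuntimeWarning are placeholder types.

```python
import warnings
from collections.abc import Sequence
from typing import Optional


def resolve_columns(
    frame_columns: Sequence[str],
    columns: Optional[Sequence[str]],
    exclude_columns: Sequence[str] = (),
    missing_policy: str = "raise",
) -> list[str]:
    """Hypothetical helper: return the columns that would be transformed."""
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect missing_policy: {missing_policy!r}")
    # None selects every column of the frame.
    selected = list(frame_columns) if columns is None else list(columns)
    # Excluded names are dropped; excluded names that were never selected are ignored.
    excluded = set(exclude_columns)
    selected = [col for col in selected if col not in excluded]
    missing = [col for col in selected if col not in frame_columns]
    if missing:
        msg = f"{len(missing)} column(s) are missing in the frame: {missing}"
        if missing_policy == "raise":
            raise LookupError(msg)  # placeholder exception type (assumption)
        if missing_policy == "warn":
            warnings.warn(msg, RuntimeWarning, stacklevel=2)
    # Missing columns are skipped for 'warn' and 'ignore'.
    return [col for col in selected if col in frame_columns]


frame_columns = ["col1", "col2", "col3", "col4"]
print(resolve_columns(frame_columns, ["col1", "col3"]))
print(resolve_columns(frame_columns, None, exclude_columns=["col4", "unknown"]))
```

The first call keeps the two requested columns; the second selects every column except col4, and the unknown excluded name has no effect.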

grizz.transformer.InplaceCastTransformer

Bases: CastTransformer

Implement a transformer to convert some columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceCategoricalCast

Bases: CategoricalCastTransformer

Implement a transformer to convert a column to categorical data type.

InplaceCategoricalCastTransformer is a specific implementation of CategoricalCastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
col str

The column name to cast.

required
**kwargs Any

Additional arguments passed to polars.Categorical.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCategoricalCast
>>> transformer = InplaceCategoricalCast(col="col1")
>>> transformer
InplaceCategoricalCastTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ cat  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘

grizz.transformer.InplaceCategoricalCastTransformer

Bases: CategoricalCastTransformer

Implement a transformer to convert a column to categorical data type.

InplaceCategoricalCastTransformer is a specific implementation of CategoricalCastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
col str

The column name to cast.

required
**kwargs Any

Additional arguments passed to polars.Categorical.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceCategoricalCast
>>> transformer = InplaceCategoricalCast(col="col1")
>>> transformer
InplaceCategoricalCastTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...     },
...     schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ cat  ┆ f64  │
╞══════╪══════╡
│ a    ┆ 1.0  │
│ b    ┆ 2.0  │
│ c    ┆ 3.0  │
│ d    ┆ 4.0  │
│ e    ┆ 5.0  │
└──────┴──────┘

grizz.transformer.InplaceDecimalCast

Bases: InplaceCastTransformer

Implement a transformer to convert decimal columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceDecimalCast
>>> transformer = InplaceDecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceDecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3         ┆ col4 │
│ ---  ┆ ---  ┆ ---          ┆ ---  │
│ i64  ┆ f32  ┆ decimal[*,0] ┆ str  │
╞══════╪══════╪══════════════╪══════╡
│ 1    ┆ 1.0  ┆ 1            ┆ a    │
│ 2    ┆ 2.0  ┆ 2            ┆ b    │
│ 3    ┆ 3.0  ┆ 3            ┆ c    │
│ 4    ┆ 4.0  ┆ 4            ┆ d    │
│ 5    ┆ 5.0  ┆ 5            ┆ e    │
└──────┴──────┴──────────────┴──────┘

grizz.transformer.InplaceDecimalCastTransformer

Bases: InplaceCastTransformer

Implement a transformer to convert decimal columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceDecimalCast
>>> transformer = InplaceDecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceDecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Decimal,
...         "col3": pl.Decimal,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2         ┆ col3         ┆ col4 │
│ ---  ┆ ---          ┆ ---          ┆ ---  │
│ i64  ┆ decimal[*,0] ┆ decimal[*,0] ┆ str  │
╞══════╪══════════════╪══════════════╪══════╡
│ 1    ┆ 1            ┆ 1            ┆ a    │
│ 2    ┆ 2            ┆ 2            ┆ b    │
│ 3    ┆ 3            ┆ 3            ┆ c    │
│ 4    ┆ 4            ┆ 4            ┆ d    │
│ 5    ┆ 5            ┆ 5            ┆ e    │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3         ┆ col4 │
│ ---  ┆ ---  ┆ ---          ┆ ---  │
│ i64  ┆ f32  ┆ decimal[*,0] ┆ str  │
╞══════╪══════╪══════════════╪══════╡
│ 1    ┆ 1.0  ┆ 1            ┆ a    │
│ 2    ┆ 2.0  ┆ 2            ┆ b    │
│ 3    ┆ 3.0  ┆ 3            ┆ c    │
│ 4    ┆ 4.0  ┆ 4            ┆ d    │
│ 5    ┆ 5.0  ┆ 5            ┆ e    │
└──────┴──────┴──────────────┴──────┘

grizz.transformer.InplaceFillNan

Bases: FillNanTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float. InplaceFillNanTransformer is a specific implementation of FillNanTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns in which to fill NaN values. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_nan.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNan
>>> transformer = InplaceFillNan(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ 100.0 │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ null  │
│ null ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘
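In the output above, the NaN in col4 is replaced while the null in the same column is left alone, because NaN is a floating-point value and null marks a missing entry. The distinction can be sketched in plain Python, with `None` standing in for null. The helpers `fill_nan` and `fill_null` below are hypothetical stand-ins written for this illustration; they are not the polars or grizz functions.

```python
import math
from typing import Optional


def fill_nan(values: list[Optional[float]], value: float) -> list[Optional[float]]:
    """Replace NaN entries; None (null) entries are left as they are."""
    return [value if v is not None and math.isnan(v) else v for v in values]


def fill_null(values: list[Optional[float]], value: float) -> list[Optional[float]]:
    """Replace None (null) entries; NaN entries are left as they are."""
    return [value if v is None else v for v in values]


col4 = [1.2, float("nan"), 3.2, None, 5.2]
print(fill_nan(col4, 100.0))  # [1.2, 100.0, 3.2, None, 5.2]
print(fill_null(col4, 100.0))  # the None becomes 100.0, the NaN is kept
```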

grizz.transformer.InplaceFillNanTransformer

Bases: FillNanTransformer

Implement a transformer to fill NaN values.

This transformer ignores the columns that are not of type float. InplaceFillNanTransformer is a specific implementation of FillNanTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns in which to fill NaN values. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_nan.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNan
>>> transformer = InplaceFillNan(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ 100.0 │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ null  │
│ null ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceFillNull

Bases: FillNullTransformer

Implement a transformer to fill null values.

InplaceFillNullTransformer is a specific implementation of FillNullTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns in which to fill null values. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_null.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNull
>>> transformer = InplaceFillNull(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ NaN   │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ 100.0 │
│ 100  ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceFillNullTransformer

Bases: FillNullTransformer

Implement a transformer to fill null values.

InplaceFillNullTransformer is a specific implementation of FillNullTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns in which to fill null values. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for fill_null.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFillNull
>>> transformer = InplaceFillNull(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, None],
...         "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
...         "col3": ["a", "b", "c", "d", None],
...         "col4": [1.2, float("nan"), 3.2, None, 5.2],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2  │
│ 2    ┆ 2.2  ┆ b    ┆ NaN  │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2  │
│ 4    ┆ 4.2  ┆ d    ┆ null │
│ null ┆ NaN  ┆ null ┆ 5.2  │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ f64  ┆ str  ┆ f64   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1.2  ┆ a    ┆ 1.2   │
│ 2    ┆ 2.2  ┆ b    ┆ NaN   │
│ 3    ┆ 3.2  ┆ c    ┆ 3.2   │
│ 4    ┆ 4.2  ┆ d    ┆ 100.0 │
│ 100  ┆ NaN  ┆ null ┆ 5.2   │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceFloatCast

Bases: InplaceCastTransformer

Implement a transformer to convert float columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFloatCast
>>> transformer = InplaceFloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
InplaceFloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1.0  ┆ a    │
│ 2    ┆ 2    ┆ 2.0  ┆ b    │
│ 3    ┆ 3    ┆ 3.0  ┆ c    │
│ 4    ┆ 4    ┆ 4.0  ┆ d    │
│ 5    ┆ 5    ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
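In the example above, col1 is listed in columns but keeps its i64 type, because only float columns among the requested ones are converted. A minimal sketch of that selection step follows, assuming a schema represented as a plain dict of dtype strings. The helper `select_float_columns` is hypothetical and is not how grizz is implemented.

```python
from collections.abc import Sequence


def select_float_columns(schema: dict[str, str], columns: Sequence[str]) -> list[str]:
    """Hypothetical helper: keep only the requested columns that have a float dtype."""
    float_dtypes = {"f32", "f64"}
    return [col for col in columns if schema.get(col) in float_dtypes]


schema = {"col1": "i64", "col2": "f64", "col3": "f64", "col4": "str"}
to_cast = select_float_columns(schema, ["col1", "col2"])
# Only the selected float columns change dtype; every other column is untouched.
new_schema = {col: ("i32" if col in to_cast else dtype) for col, dtype in schema.items()}
print(to_cast)  # ['col2']
print(new_schema)  # {'col1': 'i64', 'col2': 'i32', 'col3': 'f64', 'col4': 'str'}
```

col3 is a float column but was not requested, so it stays f64, which matches the output frame above.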

grizz.transformer.InplaceFloatCastTransformer

Bases: InplaceCastTransformer

Implement a transformer to convert float columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceFloatCast
>>> transformer = InplaceFloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
InplaceFloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1.0  ┆ a    │
│ 2    ┆ 2    ┆ 2.0  ┆ b    │
│ 3    ┆ 3    ┆ 3.0  ┆ c    │
│ 4    ┆ 4    ┆ 4.0  ┆ d    │
│ 5    ┆ 5    ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceIntegerCast

Bases: InplaceCastTransformer

Implement a transformer to convert integer columns to a new data type.

InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceIntegerCast
>>> transformer = InplaceIntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceIntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1    ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2    ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3    ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4    ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceIntegerCastTransformer

Bases: InplaceCastTransformer

Implement a transformer to convert integer columns to a new data type.

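This transformer builds on InplaceCastTransformer, a specific implementation of CastTransformer that performs the transformation in place. Only the integer columns among the selected columns are converted; columns of other types are left unchanged, as col2 in the example below shows.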

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceIntegerCast
>>> transformer = InplaceIntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceIntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1    ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2    ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3    ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4    ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceJsonDecode

Bases: JsonDecodeTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to parse. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.Expr.str.json_decode.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceJsonDecode
>>> transformer = InplaceJsonDecode(columns=["col1", "col3"])
>>> transformer
InplaceJsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
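
As a rough illustration of what parsing string values as JSON means for a single column, the sketch below uses Python's standard json module instead of polars. The helper name decode_json_values is hypothetical, and passing None through unchanged is an assumption. Python's json module only accepts double-quoted strings, so the sketch uses values written like col1 above.

```python
import json


def decode_json_values(values):
    """Parse each string as JSON; None entries are passed through (assumption)."""
    return [None if value is None else json.loads(value) for value in values]


decoded = decode_json_values(["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"])
print(decoded)
```

Each input string becomes a Python list of integers, which corresponds to the list[i64] column in the output above.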

grizz.transformer.InplaceJsonDecodeTransformer

Bases: JsonDecodeTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to parse. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.Expr.str.json_decode.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceJsonDecode
>>> transformer = InplaceJsonDecode(columns=["col1", "col3"])
>>> transformer
InplaceJsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘

grizz.transformer.InplaceLabelEncoder

Bases: LabelEncoderTransformer

Implement a transformer to encode the labels in a given column of a polars.DataFrame.

Parameters:

Name Type Description Default
col str

The name of the column to transform.

required
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceLabelEncoder
>>> transformer = InplaceLabelEncoder(col="col1")
>>> transformer
InplaceLabelEncoderTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 0    ┆ 1    │
│ 1    ┆ 2    │
│ 2    ┆ 3    │
│ 3    ┆ 4    │
│ 4    ┆ 5    │
└──────┴──────┘
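
The fit and transform steps of label encoding can be sketched in plain Python. This is a minimal sketch, assuming labels receive integer codes in sorted order (as sklearn's LabelEncoder does); this section does not confirm the ordering grizz uses, and the example above is consistent with sorted order as well as with order of first appearance. The function names are hypothetical.

```python
def fit_label_mapping(labels):
    """Learn a label -> integer code mapping (assumption: codes follow sorted order)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode_labels(labels, mapping):
    """Replace each label by its integer code; unseen labels raise KeyError."""
    return [mapping[label] for label in labels]


mapping = fit_label_mapping(["a", "b", "c", "d", "e"])
codes = encode_labels(["a", "b", "c", "d", "e"], mapping)
print(codes)
```

Splitting the work into a fit step and an encode step mirrors the fit_transform call in the example: the mapping is learned once and can then be reused on other data.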

grizz.transformer.InplaceLabelEncoderTransformer

Bases: LabelEncoderTransformer

Implement a transformer to encode the labels in a given column of a polars.DataFrame.

Parameters:

Name Type Description Default
col str

The name of the column to transform.

required
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceLabelEncoder
>>> transformer = InplaceLabelEncoder(col="col1")
>>> transformer
InplaceLabelEncoderTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 0    ┆ 1    │
│ 1    ┆ 2    │
│ 2    ┆ 3    │
│ 3    ┆ 4    │
│ 4    ┆ 5    │
└──────┴──────┘

grizz.transformer.InplaceNumericCast

Bases: InplaceCastTransformer

Implement a transformer to convert numeric columns to a new data type.

This transformer builds on InplaceCastTransformer, a specific implementation of CastTransformer that performs the transformation in place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}
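
The way columns and exclude_columns combine, as described above, can be sketched in plain Python. The helper resolve_columns is hypothetical and only mirrors the parameter descriptions; it is not grizz's implementation.

```python
def resolve_columns(frame_columns, columns=None, exclude_columns=()):
    """Return the columns to process, in order.

    None selects every column of the frame. Names listed in exclude_columns
    that are not among the selected columns are silently ignored.
    """
    selected = list(frame_columns) if columns is None else list(columns)
    excluded = set(exclude_columns)
    return [col for col in selected if col not in excluded]


frame_columns = ["col1", "col2", "col3", "col4"]
print(resolve_columns(frame_columns, columns=None, exclude_columns=["col4", "col9"]))
print(resolve_columns(frame_columns, columns=["col1", "col3"]))
```

In the first call, col9 does not exist and has no effect, which matches the statement that unknown names are ignored during the filtering process.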

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceNumericCast
>>> transformer = InplaceNumericCast(columns=["col1", "col3"], dtype=pl.Float32)
>>> transformer
InplaceNumericCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f32  ┆ f32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplaceNumericCastTransformer

Bases: InplaceCastTransformer

Implement a transformer to convert numeric columns to a new data type.

This transformer builds on InplaceCastTransformer, a specific implementation of CastTransformer that performs the transformation in place.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceNumericCast
>>> transformer = InplaceNumericCast(columns=["col1", "col3"], dtype=pl.Float32)
>>> transformer
InplaceNumericCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f32  ┆ f32  ┆ f32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1.0  ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2.0  ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3.0  ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4.0  ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5.0  ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.InplacePowerTransformer

Bases: PowerTransformer

Implement a transformer to apply a power transform featurewise to make data more Gaussian-like.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to transform. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.PowerTransformer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplacePowerTransformer
>>> transformer = InplacePowerTransformer(columns=["col1", "col3"])
>>> transformer
InplacePowerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1      ┆ col2 ┆ col3      ┆ col4 │
│ ---       ┆ ---  ┆ ---       ┆ ---  │
│ f64       ┆ str  ┆ f64       ┆ str  │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.567837 ┆ 0    ┆ -1.695398 ┆ a    │
│ -0.836194 ┆ 1    ┆ -0.740367 ┆ b    │
│ -0.210053 ┆ 2    ┆ -0.117399 ┆ c    │
│ 0.356111  ┆ 3    ┆ 0.402585  ┆ d    │
│ 0.881486  ┆ 4    ┆ 0.864187  ┆ e    │
│ 1.376486  ┆ 5    ┆ 1.286392  ┆ f    │
└───────────┴──────┴───────────┴──────┘
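
For orientation, the sketch below shows the Yeo-Johnson transform for a given exponent, which is the default method of sklearn.preprocessing.PowerTransformer. It assumes no other method is passed through **kwargs. It does not reproduce the numbers above: sklearn estimates the exponent per column by maximum likelihood and then standardizes the result, and neither step is shown here. The function name and the lmbda parameter are illustrative.

```python
import math


def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson transform to one value for a fixed exponent lmbda."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)  # limit of the power form when lmbda -> 0
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)  # limit of the power form when lmbda -> 2
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


# With lmbda = 1 the transform is the identity.
print([yeo_johnson(value, 1.0) for value in [-3.0, 0.0, 3.0]])
```

Exponents below 1 compress large positive values and exponents above 1 stretch them, which is how the transform pulls a skewed column toward a more symmetric shape.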

grizz.transformer.InplaceQuantileTransformer

Bases: QuantileTransformer

Implement a transformer to apply the quantile transformation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.QuantileTransformer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceQuantileTransformer
>>> transformer = InplaceQuantileTransformer(columns=["col1", "col3"])
>>> transformer
InplaceQuantileTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0.0  ┆ 0    ┆ 0.0  ┆ a    │
│ 0.2  ┆ 1    ┆ 0.2  ┆ b    │
│ 0.4  ┆ 2    ┆ 0.4  ┆ c    │
│ 0.6  ┆ 3    ┆ 0.6  ┆ d    │
│ 0.8  ┆ 4    ┆ 0.8  ┆ e    │
│ 1.0  ┆ 5    ┆ 1.0  ┆ f    │
└──────┴──────┴──────┴──────┘
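
The output above can be reproduced by a rank-based sketch in plain Python. It is a simplification under stated assumptions: all values in the column are distinct, and the number of quantiles equals the number of rows. In general sklearn's QuantileTransformer interpolates between quantiles. The function name is hypothetical.

```python
def uniform_quantile_transform(values):
    """Map distinct values to evenly spaced positions in [0, 1] by rank."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("At least two values are required")
    if len(set(ordered)) != len(ordered):
        raise ValueError("This sketch only handles distinct values")
    rank = {value: index for index, value in enumerate(ordered)}
    return [rank[value] / (len(ordered) - 1) for value in values]


print(uniform_quantile_transform([0, 1, 2, 3, 4, 5]))
print(uniform_quantile_transform([0, 10, 20, 30, 40, 50]))
```

Both columns map to the same result because the transformation depends only on the rank of each value, not on its scale.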

grizz.transformer.InplaceReplace

Bases: ReplaceTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
col str

The column name.

required
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplace
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
>>> transformer
InplaceReplaceTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3}, default=None)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceReplaceStrict

Bases: ReplaceStrictTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
col str

The column name.

required
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace_strict.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplaceStrict
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
InplaceReplaceStrictTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3}, default=None
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘
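
The difference between the non-strict and strict replacement can be sketched in plain Python. The sketch assumes the transformers follow the semantics of polars.Expr.replace and polars.Expr.replace_strict: the non-strict version keeps values that are absent from the mapping, while the strict version requires every value to be mapped unless a default is given. Data type handling is not modelled, and the function names are hypothetical.

```python
_NO_DEFAULT = object()  # sentinel so that default=None can mean "fill with None"


def replace_values(values, mapping):
    """Non-strict: values without a mapping entry are kept unchanged."""
    return [mapping.get(value, value) for value in values]


def replace_values_strict(values, mapping, default=_NO_DEFAULT):
    """Strict: unmapped values raise unless a default is provided."""
    out = []
    for value in values:
        if value in mapping:
            out.append(mapping[value])
        elif default is _NO_DEFAULT:
            raise ValueError(f"No replacement found for value {value!r}")
        else:
            out.append(default)
    return out


values = ["a", "b", "c", "d", "e"]
print(replace_values(values, {"a": 1, "b": 2, "c": 3}))
print(replace_values_strict(values, {"a": 1, "b": 2, "c": 3}, default=None))
```

The sentinel object distinguishes "no default given" from default=None, which is why the second example above yields null for the unmapped values instead of failing.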

grizz.transformer.InplaceReplaceStrictTransformer

Bases: ReplaceStrictTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
col str

The column name.

required
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace_strict.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplaceStrict
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
InplaceReplaceStrictTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplaceStrict(
...     col="col", old={"a": 1, "b": 2, "c": 3}, default=None
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceReplaceTransformer

Bases: ReplaceTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
col str

The column name.

required
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceReplace
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
>>> transformer
InplaceReplaceTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
└─────┘
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3}, default=None)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col  │
│ ---  │
│ i64  │
╞══════╡
│ 1    │
│ 2    │
│ 3    │
│ null │
│ null │
└──────┘

grizz.transformer.InplaceRobustScaler

Bases: RobustScalerTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.RobustScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceRobustScaler
>>> transformer = InplaceRobustScaler(columns=["col1", "col3"])
>>> transformer
InplaceRobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ -1.0 ┆ 0    ┆ -1.0 ┆ a    │
│ -0.6 ┆ 1    ┆ -0.6 ┆ b    │
│ -0.2 ┆ 2    ┆ -0.2 ┆ c    │
│ 0.2  ┆ 3    ┆ 0.2  ┆ d    │
│ 0.6  ┆ 4    ┆ 0.6  ┆ e    │
│ 1.0  ┆ 5    ┆ 1.0  ┆ f    │
└──────┴──────┴──────┴──────┘
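
The numbers above follow from centering on the median and dividing by the interquartile range, which is what sklearn.preprocessing.RobustScaler does with its default settings. The sketch below reproduces this in plain Python for a single column; the function name is hypothetical, and the sketch assumes the default 25th to 75th percentile range with linear interpolation between data points.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / (Q3 - Q1)."""
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("The interquartile range is zero")
    return [(value - median) / iqr for value in values]


print(robust_scale([0, 1, 2, 3, 4, 5]))
```

For the column [0, 1, 2, 3, 4, 5] the median is 2.5 and the quartiles are 1.25 and 3.75, so every value is shifted by 2.5 and divided by 2.5. Because the median and quartiles barely move when a few extreme values are added, the scaling is less sensitive to outliers than mean and variance scaling.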

grizz.transformer.InplaceRobustScalerTransformer

Bases: RobustScalerTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.RobustScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceRobustScaler
>>> transformer = InplaceRobustScaler(columns=["col1", "col3"])
>>> transformer
InplaceRobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ f64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ -1.0 ┆ 0    ┆ -1.0 ┆ a    │
│ -0.6 ┆ 1    ┆ -0.6 ┆ b    │
│ -0.2 ┆ 2    ┆ -0.2 ┆ c    │
│ 0.2  ┆ 3    ┆ 0.2  ┆ d    │
│ 0.6  ┆ 4    ┆ 0.6  ┆ e    │
│ 1.0  ┆ 5    ┆ 1.0  ┆ f    │
└──────┴──────┴──────┴──────┘
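
The scaled values above are consistent with the default behaviour of sklearn.preprocessing.RobustScaler, which subtracts the median and divides by the interquartile range. The following is a minimal standard-library sketch of that arithmetic, not the grizz implementation; the helper name robust_scale is hypothetical.

```python
import statistics


def robust_scale(values: list[float]) -> list[float]:
    """Center on the median and divide by the interquartile range (Q3 - Q1)."""
    # method="inclusive" interpolates linearly between data points,
    # which matches the linear percentile interpolation assumed here.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


scaled = robust_scale([0, 1, 2, 3, 4, 5])
print([round(v, 1) for v in scaled])  # [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
```

For col1, the median is 2.5 and the interquartile range is 3.75 - 1.25 = 2.5, so the first value becomes (0 - 2.5) / 2.5 = -1.0. Applying the same function to col3 gives identical results because its values are col1 multiplied by 10.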

grizz.transformer.InplaceStandardScaler

Bases: StandardScalerTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.StandardScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStandardScaler
>>> transformer = InplaceStandardScaler(columns=["col1", "col3"])
>>> transformer
InplaceStandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1      ┆ col2 ┆ col3      ┆ col4 │
│ ---       ┆ ---  ┆ ---       ┆ ---  │
│ f64       ┆ str  ┆ f64       ┆ str  │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.414214 ┆ 1    ┆ -1.414214 ┆ a    │
│ -0.707107 ┆ 2    ┆ -0.707107 ┆ b    │
│ 0.0       ┆ 3    ┆ 0.0       ┆ c    │
│ 0.707107  ┆ 4    ┆ 0.707107  ┆ d    │
│ 1.414214  ┆ 5    ┆ 1.414214  ┆ e    │
└───────────┴──────┴───────────┴──────┘
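
The output above can be reproduced by hand: sklearn.preprocessing.StandardScaler subtracts the mean and divides by the population standard deviation (no Bessel correction). The sketch below uses only the standard library to show that arithmetic; the helper name standard_scale is hypothetical and is not part of grizz.

```python
import statistics


def standard_scale(values: list[float]) -> list[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    # pstdev divides by n (not n - 1), i.e. the population standard deviation.
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


scaled = standard_scale([1, 2, 3, 4, 5])
print([round(v, 6) for v in scaled])  # [-1.414214, -0.707107, 0.0, 0.707107, 1.414214]
```

For col1, the mean is 3 and the population standard deviation is sqrt(2), so the first value becomes (1 - 3) / sqrt(2), which is approximately -1.414214.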

grizz.transformer.InplaceStandardScalerTransformer

Bases: StandardScalerTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.StandardScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStandardScaler
>>> transformer = InplaceStandardScaler(columns=["col1", "col3"])
>>> transformer
InplaceStandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1      ┆ col2 ┆ col3      ┆ col4 │
│ ---       ┆ ---  ┆ ---       ┆ ---  │
│ f64       ┆ str  ┆ f64       ┆ str  │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.414214 ┆ 1    ┆ -1.414214 ┆ a    │
│ -0.707107 ┆ 2    ┆ -0.707107 ┆ b    │
│ 0.0       ┆ 3    ┆ 0.0       ┆ c    │
│ 0.707107  ┆ 4    ┆ 0.707107  ┆ d    │
│ 1.414214  ┆ 5    ┆ 1.414214  ┆ e    │
└───────────┴──────┴───────────┴──────┘

grizz.transformer.InplaceStringToDatetime

Bases: StringToDatetimeTransformer

Implement a transformer to convert some string columns to polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert to polars.Datetime. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToDatetime
>>> transformer = InplaceStringToDatetime(columns=["col1"])
>>> transformer
InplaceStringToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
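
As a rough illustration of what the example above does, the sketch below models the frame as a plain dictionary of lists and parses only the selected columns, leaving the others untouched. It uses the standard library rather than polars, and the variable names are hypothetical; the real transformer delegates the parsing to polars.

```python
from datetime import datetime

# A reduced copy of the example frame, modelled as a dict of column lists.
frame = {
    "col1": ["2020-01-01 01:01:01", "2020-01-01 23:59:59"],
    "col2": ["1", "2"],
    "col3": ["2020-01-01 11:11:11", "2020-02-01 12:12:12"],
}
columns = ["col1"]  # only these columns are converted in place

out = {
    name: [datetime.fromisoformat(v) for v in values] if name in columns else values
    for name, values in frame.items()
}

print(out["col1"][0])  # 2020-01-01 01:01:01
print(type(out["col3"][0]).__name__)  # str
```

As in the example output, col3 keeps its string type even though its values look like datetimes, because it was not listed in columns.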

grizz.transformer.InplaceStringToDatetimeTransformer

Bases: StringToDatetimeTransformer

Implement a transformer to convert some string columns to polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert to polars.Datetime. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToDatetime
>>> transformer = InplaceStringToDatetime(columns=["col1"])
>>> transformer
InplaceStringToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘

grizz.transformer.InplaceStringToTime

Bases: StringToTimeTransformer

Implement a transformer to convert some string columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToTime
>>> transformer = InplaceStringToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceStringToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
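
The format keyword in the example above uses strftime-style directives: %H is the hour (00-23), %M the minute, and %S the second. The sketch below shows how such a format maps strings to time values using Python's standard library; it is an analogy only, and the helper name parse_time is hypothetical.

```python
from datetime import datetime, time


def parse_time(value: str, fmt: str = "%H:%M:%S") -> time:
    """Parse a string into a time object using a strftime-style format."""
    return datetime.strptime(value, fmt).time()


values = ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"]
times = [parse_time(v) for v in values]
print(times[2])  # 12:00:01
```

A string that does not match the format, such as "12h00", raises a ValueError in this sketch; how the transformer reports such values depends on the keyword arguments forwarded to polars.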

grizz.transformer.InplaceStringToTimeTransformer

Bases: StringToTimeTransformer

Implement a transformer to convert some string columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStringToTime
>>> transformer = InplaceStringToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceStringToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.InplaceStripChars

Bases: StripCharsTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for strip_chars.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStripChars
>>> transformer = InplaceStripChars(columns=["col2", "col3"])
>>> transformer
InplaceStripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘
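
The behaviour shown above (strip the selected string columns, leave everything else unchanged) can be sketched with plain Python. The function strip_columns below is hypothetical and models the frame as a dictionary of lists; it assumes the default of stripping whitespace when no characters are given.

```python
def strip_columns(frame: dict[str, list], columns: list[str]) -> dict[str, list]:
    """Strip leading/trailing whitespace in the selected string columns."""
    out = {}
    for name, values in frame.items():
        is_string = all(isinstance(v, str) for v in values)
        if name in columns and is_string:
            out[name] = [v.strip() for v in values]
        else:
            # Unselected columns and non-string columns are left untouched.
            out[name] = values
    return out


frame = {
    "col1": [1, 2, 3, 4, 5],
    "col2": ["1", "2", "3", "4", "5"],
    "col3": ["a ", " b", "  c  ", "d", "e"],
    "col4": ["a ", " b", "  c  ", "d", "e"],
}
out = strip_columns(frame, columns=["col2", "col3"])
print(out["col3"])  # ['a', 'b', 'c', 'd', 'e']
print(out["col4"])  # ['a ', ' b', '  c  ', 'd', 'e']
```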

grizz.transformer.InplaceStripCharsTransformer

Bases: StripCharsTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for strip_chars.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceStripChars
>>> transformer = InplaceStripChars(columns=["col2", "col3"])
>>> transformer
InplaceStripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

grizz.transformer.InplaceToDatetime

Bases: ToDatetimeTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert to polars.Datetime. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToDatetime
>>> transformer = InplaceToDatetime(columns=["col1"])
>>> transformer
InplaceToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
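
The three missing_policy options described in the parameter list can be summarised with a small sketch. The function resolve_columns is hypothetical, and the exception and warning classes are placeholders; grizz may use its own types.

```python
import warnings


def resolve_columns(
    requested: list[str], available: list[str], missing_policy: str = "raise"
) -> list[str]:
    """Return the requested columns that exist, applying the missing policy."""
    if missing_policy not in ("ignore", "warn", "raise"):
        raise ValueError(f"unknown missing_policy: {missing_policy!r}")
    missing = [col for col in requested if col not in available]
    if missing and missing_policy == "raise":
        # Placeholder exception type, chosen for illustration only.
        raise KeyError(f"missing columns: {missing}")
    if missing and missing_policy == "warn":
        warnings.warn(f"missing columns are ignored: {missing}", stacklevel=2)
    # With 'warn' and 'ignore', the missing columns are simply skipped.
    return [col for col in requested if col in available]


print(resolve_columns(["col1", "col9"], ["col1", "col2"], missing_policy="ignore"))  # ['col1']
```

With 'raise', the same call fails before any column is processed; with 'warn', it returns the same list as 'ignore' but also emits a warning.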

grizz.transformer.InplaceToDatetimeTransformer

Bases: ToDatetimeTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert to polars.Datetime. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToDatetime
>>> transformer = InplaceToDatetime(columns=["col1"])
>>> transformer
InplaceToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ datetime[μs]        ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘

grizz.transformer.InplaceToTime

Bases: ToTimeTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToTime
>>> transformer = InplaceToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.InplaceToTimeTransformer

Bases: ToTimeTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import InplaceToTime
>>> transformer = InplaceToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

grizz.transformer.IntegerCast

Bases: CastTransformer

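Implement a transformer to convert integer columns to a new data type. Selected columns that are not of an integer type are left unconverted, as the example below shows for col2.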

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ i64  ┆ str  ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2    ┆ b    ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3    ┆ c    ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4    ┆ d    ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5    ┆ e    ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┘
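
In the output above, only col1_out is created even though col2 was also requested, which suggests that non-integer columns are skipped rather than converted. The plain-Python sketch below illustrates that filtering logic under this assumption. The function name cast_integer_columns and its arguments are hypothetical, and the code is not how grizz implements the transformer.

```python
from collections.abc import Sequence


def cast_integer_columns(
    data: dict[str, list],
    columns: Sequence[str],
    cast: type,
    prefix: str = "",
    suffix: str = "",
) -> dict[str, list]:
    """Cast only the integer-valued columns among ``columns`` (illustrative)."""
    out = dict(data)  # shallow copy so the input mapping is left untouched
    for name in columns:
        values = data[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_integer = all(
            isinstance(v, int) and not isinstance(v, bool) for v in values
        )
        if not is_integer:
            continue  # assumption: non-integer columns are skipped
        out[f"{prefix}{name}{suffix}"] = [cast(v) for v in values]
    return out


data = {"col1": [1, 2, 3], "col2": [1.0, 2.0, 3.0], "col3": [1, 2, 3]}
result = cast_integer_columns(
    data, columns=["col1", "col2"], cast=float, suffix="_out"
)
print(list(result))  # ['col1', 'col2', 'col3', 'col1_out']
```

As in the documented example, col2 holds floats and therefore produces no col2_out column.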

grizz.transformer.IntegerCastTransformer

Bases: CastTransformer

Implement a transformer to convert integer columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1, 2, 3, 4, 5],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float64,
...         "col3": pl.Int64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    │
│ 2    ┆ 2.0  ┆ 2    ┆ b    │
│ 3    ┆ 3.0  ┆ 3    ┆ c    │
│ 4    ┆ 4.0  ┆ 4    ┆ d    │
│ 5    ┆ 5.0  ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      │
│ i64  ┆ f64  ┆ i64  ┆ str  ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1    ┆ 1.0  ┆ 1    ┆ a    ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2    ┆ b    ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3    ┆ c    ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4    ┆ d    ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5    ┆ e    ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┘

grizz.transformer.JsonDecode

Bases: BaseInNOutNTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to parse. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.Expr.str.json_decode.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌───────────┬──────┬─────────────────┬──────┬───────────┬─────────────────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 ┆ col1_out  ┆ col3_out        │
│ ---       ┆ ---  ┆ ---             ┆ ---  ┆ ---       ┆ ---             │
│ str       ┆ str  ┆ str             ┆ str  ┆ list[i64] ┆ list[str]       │
╞═══════════╪══════╪═════════════════╪══════╪═══════════╪═════════════════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    ┆ [1, 2]    ┆ ["1", "2"]      │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    ┆ [2]       ┆ ["2"]           │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    ┆ [1, 2, 3] ┆ ["1", "2", "3"] │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    ┆ [4, 5]    ┆ ["4", "5"]      │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    ┆ [5, 4]    ┆ ["5", "4"]      │
└───────────┴──────┴─────────────────┴──────┴───────────┴─────────────────┘
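
Note that the col3 values above use single quotes, which strict JSON does not allow. How grizz handles them is not stated in this section. The sketch below uses only the standard library json module to show the difference, and its quote-swapping fallback is an assumption made for illustration. The function name decode_json_strings is hypothetical.

```python
import json


def decode_json_strings(values: list[str]) -> list:
    """Parse each string as JSON, tolerating single-quoted items (sketch)."""
    decoded = []
    for value in values:
        try:
            decoded.append(json.loads(value))
        except json.JSONDecodeError:
            # Assumption: fall back to swapping the quote characters. This
            # breaks on values that contain apostrophes, so it is only a sketch.
            decoded.append(json.loads(value.replace("'", '"')))
    return decoded


print(decode_json_strings(["[1, 2]", "[2]"]))  # [[1, 2], [2]]
print(decode_json_strings(["['1', '2']"]))  # [['1', '2']]
```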

grizz.transformer.JsonDecodeTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to parse. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.Expr.str.json_decode.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌───────────┬──────┬─────────────────┬──────┬───────────┬─────────────────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 ┆ col1_out  ┆ col3_out        │
│ ---       ┆ ---  ┆ ---             ┆ ---  ┆ ---       ┆ ---             │
│ str       ┆ str  ┆ str             ┆ str  ┆ list[i64] ┆ list[str]       │
╞═══════════╪══════╪═════════════════╪══════╪═══════════╪═════════════════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    ┆ [1, 2]    ┆ ["1", "2"]      │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    ┆ [2]       ┆ ["2"]           │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    ┆ [1, 2, 3] ┆ ["1", "2", "3"] │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    ┆ [4, 5]    ┆ ["4", "5"]      │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    ┆ [5, 4]    ┆ ["5", "4"]      │
└───────────┴──────┴─────────────────┴──────┴───────────┴─────────────────┘

grizz.transformer.LabelEncoder

Bases: BaseIn1Out1Transformer

Implement a transformer to encode the labels in a given column.

Parameters:

Name Type Description Default
in_col str

The input column name, i.e. the column with the labels to encode.

required
out_col str

The output column name, i.e. the column with the encoded labels.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
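
The three exist_policy options described above can be pictured with a small standard-library sketch. The helper name check_existing_columns is hypothetical, and RuntimeError is a placeholder because the exception type that grizz raises is not stated in this section.

```python
import warnings

VALID_POLICIES = ("ignore", "warn", "raise")


def check_existing_columns(
    existing: list[str], new: list[str], exist_policy: str = "raise"
) -> list[str]:
    """Return the new column names that collide, applying the policy (sketch)."""
    if exist_policy not in VALID_POLICIES:
        raise ValueError(f"unknown exist_policy: {exist_policy!r}")
    collisions = [name for name in new if name in existing]
    if not collisions or exist_policy == "ignore":
        return collisions  # 'ignore': overwrite silently
    message = f"columns already exist: {collisions}"
    if exist_policy == "warn":
        warnings.warn(message, stacklevel=2)  # 'warn': overwrite with a warning
        return collisions
    # 'raise': placeholder exception type, not necessarily what grizz raises
    raise RuntimeError(message)


print(check_existing_columns(["col1", "out"], ["out"], exist_policy="ignore"))
# ['out']
```

A missing_policy check could be written symmetrically, collecting the requested columns that are absent from the frame instead of the ones that already exist.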

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LabelEncoder
>>> transformer = LabelEncoder(in_col="col1", out_col="out")
>>> transformer
LabelEncoderTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ i64 │
╞══════╪══════╪═════╡
│ a    ┆ 1    ┆ 0   │
│ b    ┆ 2    ┆ 1   │
│ c    ┆ 3    ┆ 2   │
│ d    ┆ 4    ┆ 3   │
│ e    ┆ 5    ┆ 4   │
└──────┴──────┴─────┘
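
The fit-then-transform pattern used above can be sketched in plain Python. SimpleLabelEncoder is a hypothetical stand-in, not the grizz class, and assigning codes in order of first appearance is an assumption: the example output is equally consistent with sorted order.

```python
class SimpleLabelEncoder:
    """Hypothetical plain-Python stand-in, not the grizz class."""

    def __init__(self) -> None:
        self.mapping: dict[str, int] = {}

    def fit(self, labels: list[str]) -> "SimpleLabelEncoder":
        # Assumption: codes follow the order of first appearance.
        for label in labels:
            self.mapping.setdefault(label, len(self.mapping))
        return self

    def transform(self, labels: list[str]) -> list[int]:
        # Labels unseen during fit raise KeyError in this sketch.
        return [self.mapping[label] for label in labels]


encoder = SimpleLabelEncoder().fit(["a", "b", "c", "d", "e"])
print(encoder.transform(["a", "b", "c", "d", "e"]))  # [0, 1, 2, 3, 4]
print(encoder.transform(["e", "a"]))  # [4, 0]
```

Keeping the mapping on the fitted object is what lets the same codes be reused on later frames.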

grizz.transformer.LabelEncoderTransformer

Bases: BaseIn1Out1Transformer

Implement a transformer to encode the labels in a given column.

Parameters:

Name Type Description Default
in_col str

The input column name, i.e. the column with the labels to encode.

required
out_col str

The output column name, i.e. the column with the encoded labels.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LabelEncoder
>>> transformer = LabelEncoder(in_col="col1", out_col="out")
>>> transformer
LabelEncoderTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["a", "b", "c", "d", "e"],
...         "col2": ["1", "2", "3", "4", "5"],
...     }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ 1    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ d    ┆ 4    │
│ e    ┆ 5    │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ i64 │
╞══════╪══════╪═════╡
│ a    ┆ 1    ┆ 0   │
│ b    ┆ 2    ┆ 1   │
│ c    ┆ 3    ┆ 2   │
│ d    ┆ 4    ┆ 3   │
│ e    ┆ 5    ┆ 4   │
└──────┴──────┴─────┘

grizz.transformer.Lower

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than operation, i.e. tests whether each value is strictly lower than the target.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Lower
>>> transformer = Lower(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.LowerEqual

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than or equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LowerEqual
>>> transformer = LowerEqual(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
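
With target=4.2 and integer inputs, the lower than and lower than or equal comparisons give identical results, so the example above does not show how the two differ. The plain-Python sketch below, which uses the standard operator module rather than grizz, repeats the comparison with a target of 4, a value that appears in the column. The compare helper is hypothetical.

```python
import operator
from collections.abc import Callable


def compare(values: list[int], target: float, op: Callable) -> list[bool]:
    """Apply a binary comparison between each value and the target."""
    return [op(v, target) for v in values]


values = [1, 2, 3, 4, 5]

# Target not present in the data: strict and non-strict comparisons agree.
lower_42 = compare(values, 4.2, operator.lt)
lower_equal_42 = compare(values, 4.2, operator.le)
print(lower_42 == lower_equal_42)  # True

# Target present in the data: the two differ on the equal value.
lower_4 = compare(values, 4, operator.lt)
lower_equal_4 = compare(values, 4, operator.le)
print(lower_4)  # [True, True, True, False, False]
print(lower_equal_4)  # [True, True, True, True, False]
```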

grizz.transformer.LowerEqualTransformer

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than or equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import LowerEqual
>>> transformer = LowerEqual(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.LowerTransformer

Bases: BaseComparatorTransformer

Implements a transformer that computes the lower than operation, i.e. tests whether each value is strictly lower than the target.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Lower
>>> transformer = Lower(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ false    │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ false    │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ true     ┆ false    │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ false    │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ false    ┆ false    │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.MaxAbsScaler

Bases: BaseInNOutNTransformer

Implement a transformer to scale columns by the maximum absolute value of each column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxAbsScaler
>>> transformer = MaxAbsScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MaxAbsScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
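
The scaling shown above amounts to dividing each value by the largest absolute value in its column. The plain-Python sketch below illustrates that arithmetic together with the propagate_nulls behaviour described in the parameters. The function name max_abs_scale is hypothetical, and leaving an all-zero or empty column unscaled is an assumption, not documented grizz behaviour.

```python
import math
from typing import Optional


def max_abs_scale(
    values: list[Optional[float]], propagate_nulls: bool = True
) -> list[Optional[float]]:
    """Scale values by the column's maximum absolute value (sketch)."""
    present = [v for v in values if v is not None]
    # Assumption: an all-zero or empty column is left unscaled (scale of 1).
    scale = max((abs(v) for v in present), default=0) or 1
    # None is kept when propagating nulls, otherwise replaced by NaN.
    missing = None if propagate_nulls else math.nan
    return [missing if v is None else v / scale for v in values]


print(max_abs_scale([1, 2, 3, 4, 5]))  # [0.2, 0.4, 0.6, 0.8, 1.0]
print(max_abs_scale([-4, None, 2]))  # [-1.0, None, 0.5]
```

Because the divisor is the maximum absolute value, every scaled value falls in the range [-1, 1].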

grizz.transformer.MaxAbsScalerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to scale columns by the maximum absolute value of each column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
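
The three exist_policy options described above can be read as the following control flow. This is a minimal sketch, not grizz's implementation: the helper name check_existing_columns and the use of ValueError are assumptions made for illustration, and the library may raise a different exception type.

```python
import warnings


def check_existing_columns(
    existing: set[str], outputs: list[str], exist_policy: str = "raise"
) -> None:
    """Hypothetical helper mirroring the documented exist_policy options."""
    clashes = [col for col in outputs if col in existing]
    if not clashes or exist_policy == "ignore":
        # Nothing to report: any clashing column is overwritten silently.
        return
    if exist_policy == "warn":
        # Report the clash, then let the caller overwrite the columns.
        warnings.warn(f"columns already exist and are overwritten: {clashes}", stacklevel=2)
        return
    if exist_policy == "raise":
        # ValueError is an assumption; the library may use its own exception.
        raise ValueError(f"columns already exist: {clashes}")
    raise ValueError(f"unknown exist_policy: {exist_policy!r}")


# No clash: 'col1_out' is not yet in the frame, so nothing happens.
check_existing_columns({"col1", "col3"}, ["col1_out"], exist_policy="raise")
```

The missing_policy option follows the same pattern, except that missing input columns are skipped instead of overwritten.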

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxAbsScaler
>>> transformer = MaxAbsScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MaxAbsScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
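
The arithmetic behind the output above can be reproduced in plain Python. This is a sketch of max-abs scaling under stated assumptions, not the transformer's actual code: the function name max_abs_scale is hypothetical, None values are assumed to be skipped when the scale is computed, and an all-zero column is assumed to be left unchanged.

```python
import math
from typing import Optional


def max_abs_scale(
    values: list[Optional[float]], propagate_nulls: bool = True
) -> list[Optional[float]]:
    """Divide each value by the largest absolute value in the column."""
    present = [abs(v) for v in values if v is not None]
    scale = max(present) if present else 1.0
    if scale == 0:
        scale = 1.0  # assumption: an all-zero column is left unchanged
    # propagate_nulls=True keeps None; False replaces it with NaN.
    fill = None if propagate_nulls else math.nan
    return [fill if v is None else v / scale for v in values]


print(max_abs_scale([1, 2, 3, 4, 5]))       # [0.2, 0.4, 0.6, 0.8, 1.0]
print(max_abs_scale([10, 20, 30, 40, 50]))  # [0.2, 0.4, 0.6, 0.8, 1.0]
```

Both columns map to the same values because each is divided by its own maximum absolute value (5 and 50 respectively).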

grizz.transformer.MaxHorizontal

Bases: BaseInNOut1Transformer

Implement a transformer to get the maximum value horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns over which to compute the maximum value horizontally. The columns should have compatible data types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxHorizontal
>>> transformer = MaxHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MaxHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 9   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 5   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 8   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 9   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 9   │
└──────┴──────┴──────┴──────┴─────┘
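
"Horizontally" means the reduction runs across the selected columns within each row, not down a column. A plain-Python sketch of that row-wise maximum, using the same data as the example above (the function name max_horizontal is hypothetical and this is not the transformer's implementation):

```python
def max_horizontal(*columns: list[int]) -> list[int]:
    """Return the largest value of each row across the given columns."""
    # zip(*columns) yields one tuple per row: (col1[i], col2[i], col3[i]).
    return [max(row) for row in zip(*columns)]


col1 = [9, 5, 4, 9, 6]
col2 = [8, 0, 1, 8, 9]
col3 = [0, 4, 8, 7, 0]
print(max_horizontal(col1, col2, col3))  # [9, 5, 8, 9, 9]
```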

grizz.transformer.MaxHorizontalTransformer

Bases: BaseInNOut1Transformer

Implement a transformer to get the maximum value horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns over which to compute the maximum value horizontally. The columns should have compatible data types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MaxHorizontal
>>> transformer = MaxHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MaxHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 9   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 5   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 8   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 9   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 9   │
└──────┴──────┴──────┴──────┴─────┘

grizz.transformer.MeanHorizontal

Bases: BaseInNOut1Transformer

Implement a transformer to get the mean value horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns over which to compute the mean value horizontally. The columns should have compatible data types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.mean_horizontal.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MeanHorizontal
>>> transformer = MeanHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MeanHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 21.0 │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 22.0 │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 23.0 │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 24.0 │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 25.0 │
└──────┴──────┴──────┴──────┴──────┘
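
The output column is f64 even though the inputs are i64, because a mean is a division. The row-wise arithmetic can be sketched in plain Python as follows. The function name mean_horizontal is hypothetical, and skipping None values is an assumption of this sketch: the actual null handling is determined by polars.mean_horizontal and any **kwargs forwarded to it.

```python
from typing import Optional


def mean_horizontal(*columns: list[Optional[float]]) -> list[Optional[float]]:
    """Return the mean of each row across the given columns."""
    out: list[Optional[float]] = []
    for row in zip(*columns):
        # Assumption: None entries are skipped; an all-None row yields None.
        present = [v for v in row if v is not None]
        out.append(sum(present) / len(present) if present else None)
    return out


col1 = [11, 12, 13, 14, 15]
col2 = [21, 22, 23, 24, 25]
col3 = [31, 32, 33, 34, 35]
print(mean_horizontal(col1, col2, col3))  # [21.0, 22.0, 23.0, 24.0, 25.0]
```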

grizz.transformer.MeanHorizontalTransformer

Bases: BaseInNOut1Transformer

Implement a transformer to get the mean value horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns over which to compute the mean value horizontally. The columns should have compatible data types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.mean_horizontal.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MeanHorizontal
>>> transformer = MeanHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MeanHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ f64  │
╞══════╪══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 21.0 │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 22.0 │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 23.0 │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 24.0 │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 25.0 │
└──────┴──────┴──────┴──────┴──────┘

grizz.transformer.MinHorizontal

Bases: BaseInNOut1Transformer

Implement a transformer to get the minimum value horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns over which to compute the minimum value horizontally. The columns should have compatible data types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
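
The interaction between columns and exclude_columns described above can be sketched as a small selection step. The helper name select_columns is hypothetical and the library's internal logic may differ; the sketch only illustrates the documented behavior that None selects every column and that unknown names in exclude_columns are ignored.

```python
from collections.abc import Sequence
from typing import Optional


def select_columns(
    frame_columns: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    exclude_columns: Sequence[str] = (),
) -> list[str]:
    """Hypothetical helper: resolve the input columns of a transformer."""
    # None means "all the columns of the frame".
    selected = list(frame_columns) if columns is None else list(columns)
    # Names in exclude_columns that are not selected simply have no effect.
    excluded = set(exclude_columns)
    return [col for col in selected if col not in excluded]


frame_columns = ["col1", "col2", "col3", "col4"]
print(select_columns(frame_columns, None, ["col4", "unknown"]))
# ['col1', 'col2', 'col3']
```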

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinHorizontal
>>> transformer = MinHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MinHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 0   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 0   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 1   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 7   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 0   │
└──────┴──────┴──────┴──────┴─────┘

grizz.transformer.MinHorizontalTransformer

Bases: BaseInNOut1Transformer

Implement a transformer to get the minimum value horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns over which to compute the minimum value horizontally. The columns should have compatible data types. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinHorizontal
>>> transformer = MinHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MinHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [9, 5, 4, 9, 6],
...         "col2": [8, 0, 1, 8, 9],
...         "col3": [0, 4, 8, 7, 0],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 9    ┆ 8    ┆ 0    ┆ a    │
│ 5    ┆ 0    ┆ 4    ┆ b    │
│ 4    ┆ 1    ┆ 8    ┆ c    │
│ 9    ┆ 8    ┆ 7    ┆ d    │
│ 6    ┆ 9    ┆ 0    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9    ┆ 8    ┆ 0    ┆ a    ┆ 0   │
│ 5    ┆ 0    ┆ 4    ┆ b    ┆ 0   │
│ 4    ┆ 1    ┆ 8    ┆ c    ┆ 1   │
│ 9    ┆ 8    ┆ 7    ┆ d    ┆ 7   │
│ 6    ┆ 9    ┆ 0    ┆ e    ┆ 0   │
└──────┴──────┴──────┴──────┴─────┘

grizz.transformer.MinMaxScaler

Bases: BaseInNOutNTransformer

Implement a transformer to scale each column to a given range.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.MinMaxScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinMaxScaler
>>> transformer = MinMaxScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MinMaxScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
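
The values above follow from the standard min-max formula, x_scaled = (x - min) / (max - min) * (high - low) + low, applied per column with the default range (0, 1). Below is a plain-Python sketch of that formula, not the transformer's code: the function name min_max_scale is hypothetical, its feature_range argument mirrors the parameter of the same name in sklearn.preprocessing.MinMaxScaler, and mapping a constant column to the lower bound is an assumption.

```python
def min_max_scale(
    values: list[float], feature_range: tuple[float, float] = (0.0, 1.0)
) -> list[float]:
    """Linearly map a column so its min and max hit the range bounds."""
    lo, hi = min(values), max(values)
    # Assumption: a constant column (hi == lo) maps to the lower bound.
    span = (hi - lo) or 1.0
    low, high = feature_range
    return [low + (v - lo) / span * (high - low) for v in values]


print(min_max_scale([0, 1, 2, 3, 4, 5]))       # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(min_max_scale([0, 10, 20, 30, 40, 50]))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```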

grizz.transformer.MinMaxScalerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to scale each column to a given range.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.MinMaxScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import MinMaxScaler
>>> transformer = MinMaxScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MinMaxScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘

>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.Normalizer

Bases: BaseInNOutNTransformer

Implement a transformer to normalize data points individually to unit norm.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.Normalizer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Normalizer
>>> transformer = Normalizer(columns=["col1", "col3"], prefix="", suffix="_norm")
>>> transformer
NormalizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_norm')
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_norm ┆ col3_norm │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0.0       ┆ 1.0       │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0.242536  ┆ 0.970143  │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 0.5547    ┆ 0.83205   │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 0.83205   ┆ 0.5547    │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 0.970143  ┆ 0.242536  │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1.0       ┆ 0.0       │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
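
Unlike the scalers, normalization works per row: each data point (here the pair of values from col1 and col3) is divided by its own norm. The numbers above are consistent with the L2 norm, for example 1 / sqrt(1² + 4²) ≈ 0.242536. A plain-Python sketch of that computation, assuming the L2 norm and leaving all-zero rows unchanged (the function name l2_normalize_rows is hypothetical and this is not the transformer's implementation):

```python
import math


def l2_normalize_rows(*columns: list[float]) -> list[list[float]]:
    """Scale each row to unit L2 norm and return the result column-wise."""
    normalized = []
    for row in zip(*columns):
        # Assumption: an all-zero row has norm 0 and is left unchanged.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        normalized.append(tuple(v / norm for v in row))
    # Transpose back from rows to columns.
    return [list(col) for col in zip(*normalized)]


col1_norm, col3_norm = l2_normalize_rows([0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0])
print([round(v, 6) for v in col1_norm])
# [0.0, 0.242536, 0.5547, 0.83205, 0.970143, 1.0]
```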

grizz.transformer.NormalizerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to normalize data points individually to unit norm.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.Normalizer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Normalizer
>>> transformer = Normalizer(columns=["col1", "col3"], prefix="", suffix="_norm")
>>> transformer
NormalizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_norm')
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [5, 4, 3, 2, 1, 0],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 5    ┆ a    │
│ 1    ┆ 1    ┆ 4    ┆ b    │
│ 2    ┆ 2    ┆ 3    ┆ c    │
│ 3    ┆ 3    ┆ 2    ┆ d    │
│ 4    ┆ 4    ┆ 1    ┆ e    │
│ 5    ┆ 5    ┆ 0    ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_norm ┆ col3_norm │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0    ┆ 0    ┆ 5    ┆ a    ┆ 0.0       ┆ 1.0       │
│ 1    ┆ 1    ┆ 4    ┆ b    ┆ 0.242536  ┆ 0.970143  │
│ 2    ┆ 2    ┆ 3    ┆ c    ┆ 0.5547    ┆ 0.83205   │
│ 3    ┆ 3    ┆ 2    ┆ d    ┆ 0.83205   ┆ 0.5547    │
│ 4    ┆ 4    ┆ 1    ┆ e    ┆ 0.970143  ┆ 0.242536  │
│ 5    ┆ 5    ┆ 0    ┆ f    ┆ 1.0       ┆ 0.0       │
└──────┴──────┴──────┴──────┴───────────┴───────────┘

grizz.transformer.NotEqual

Bases: BaseComparatorTransformer

Implement a transformer that computes the not-equal comparison between each selected column and a target value.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqual
>>> transformer = NotEqual(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
NotEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
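
As a rough illustration of how the output columns in the example above are named and filled, the sketch below works on a plain dictionary of lists instead of a polars.DataFrame. The function `not_equal_columns` is hypothetical, and null handling is deliberately left out.

```python
def not_equal_columns(data, columns, target, prefix="", suffix=""):
    """Add one boolean column per input column, named prefix + column + suffix.

    Hypothetical helper: `data` maps column names to lists of values.
    """
    out = dict(data)  # shallow copy, the input mapping is not modified
    for col in columns:
        out[f"{prefix}{col}{suffix}"] = [value != target for value in data[col]]
    return out


data = {"col1": [1, 2, 3, 4, 5], "col3": [10, 20, 30, 40, 50]}
out = not_equal_columns(data, ["col1", "col3"], target=3, suffix="_out")
print(out["col1_out"])
print(out["col3_out"])
```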

grizz.transformer.NotEqualMissing

Bases: BaseComparatorTransformer

Implements a transformer that computes the not equal operation where null values are not propagated.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqualMissing
>>> transformer = NotEqualMissing(
...     columns=["col1", "col3"], target=3, prefix="", suffix="_out"
... )
>>> transformer
NotEqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
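
The example above contains no null values, so it gives the same output as NotEqual. The difference only shows up when nulls are present. The sketch below models the two behaviours on plain Python values, with None standing in for null; the function names mirror the polars expressions `ne` and `ne_missing`, but whether grizz relies on them internally is an assumption.

```python
def ne(value, target):
    """Null-propagating comparison: a null operand gives a null result."""
    if value is None or target is None:
        return None
    return value != target


def ne_missing(value, target):
    """Comparison where null is treated as an ordinary value."""
    return value != target


values = [1, None, 3]
propagated = [ne(v, 3) for v in values]
not_propagated = [ne_missing(v, 3) for v in values]
print(propagated)
print(not_propagated)
```

With null propagation the second result is None, while the non-propagating variant reports that None is different from 3.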

grizz.transformer.NotEqualMissingTransformer

Bases: BaseComparatorTransformer

Implements a transformer that computes the not equal operation where null values are not propagated.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqualMissing
>>> transformer = NotEqualMissing(
...     columns=["col1", "col3"], target=3, prefix="", suffix="_out"
... )
>>> transformer
NotEqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
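
The three exist_policy options listed in the parameters can be pictured with the following sketch. It is only an illustration of the documented behaviour: the function name `check_existing_columns`, the exception type, and the warning category are all assumptions, not the library's actual choices.

```python
import warnings


def check_existing_columns(existing, new, exist_policy="raise"):
    """Return the output columns that already exist, applying the policy.

    Hypothetical helper; the exception and warning types are assumptions.
    """
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect exist_policy: {exist_policy!r}")
    overlap = [col for col in new if col in existing]
    if overlap and exist_policy == "raise":
        raise ValueError(f"{len(overlap)} column(s) already exist: {overlap}")
    if overlap and exist_policy == "warn":
        warnings.warn(
            f"{len(overlap)} column(s) already exist and will be overwritten: {overlap}",
            RuntimeWarning,
            stacklevel=2,
        )
    # With 'ignore' (or after the warning) the caller overwrites the columns.
    return overlap


overwritten = check_existing_columns(
    existing=["col1", "col1_out"], new=["col1_out", "col3_out"], exist_policy="ignore"
)
print(overwritten)
```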

grizz.transformer.NotEqualTransformer

Bases: BaseComparatorTransformer

Implements a transformer that computes the not equal operation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to compare. None means all the columns.

required
target Any

The target value to compare with.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NotEqual
>>> transformer = NotEqual(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
NotEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ bool     ┆ bool     │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ true     ┆ true     │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ true     ┆ true     │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ false    ┆ true     │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ true     ┆ true     │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ true     ┆ true     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
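
The missing_policy options listed in the parameters can be sketched in a similar way. The helper `resolve_columns` below is hypothetical, as are the exception type and the warning category; it only illustrates that missing columns either stop the transformation or are dropped from the list of columns to process.

```python
import warnings


def resolve_columns(available, requested, missing_policy="raise"):
    """Return the requested columns that are present, applying the policy.

    Hypothetical helper; the exception and warning types are assumptions.
    """
    if missing_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"Incorrect missing_policy: {missing_policy!r}")
    missing = [col for col in requested if col not in available]
    if missing and missing_policy == "raise":
        raise ValueError(f"{len(missing)} column(s) are missing: {missing}")
    if missing and missing_policy == "warn":
        warnings.warn(
            f"{len(missing)} column(s) are missing and will be ignored: {missing}",
            RuntimeWarning,
            stacklevel=2,
        )
    # Missing columns are ignored; only the columns that are present remain.
    return [col for col in requested if col in available]


present = resolve_columns(
    available=["col1", "col2", "col3", "col4"],
    requested=["col1", "col5"],
    missing_policy="ignore",
)
print(present)
```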

grizz.transformer.NumericCast

Bases: CastTransformer

Implement a transformer to convert numeric columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NumericCast
>>> transformer = NumericCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
NumericCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f32  ┆ f64  ┆ str  ┆ f32      ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1.0      ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2.0      ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3.0      ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4.0      ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
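
The values in the example above are exactly representable in 32 bits, so the cast does not change them. As a side note, the sketch below uses the standard library to show what a conversion to a 32-bit float does to a value that is not exactly representable. The helper `to_float32` is hypothetical and is not part of grizz or polars.

```python
import struct


def to_float32(value):
    """Round a Python number to the nearest 32-bit float.

    Hypothetical helper: packs the value as an IEEE 754 single-precision
    float and reads it back as a Python float.
    """
    return struct.unpack("f", struct.pack("f", float(value)))[0]


exact = [to_float32(v) for v in [1, 2, 3, 4, 5]]
rounded = to_float32(0.1)
print(exact)
print(rounded == 0.1)
```

Small integers survive the conversion unchanged, whereas 0.1 is rounded to a slightly different value because it has fewer significant bits available.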

grizz.transformer.NumericCastTransformer

Bases: CastTransformer

Implement a transformer to convert numeric columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
dtype type[DataType]

The target data type.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for cast.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import NumericCast
>>> transformer = NumericCast(
...     columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
NumericCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
...         "col4": ["a", "b", "c", "d", "e"],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Float32,
...         "col3": pl.Float64,
...         "col4": pl.String,
...     },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f32  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ f32  ┆ f64  ┆ str  ┆ f32      ┆ f32      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1    ┆ 1.0  ┆ 1.0  ┆ a    ┆ 1.0      ┆ 1.0      │
│ 2    ┆ 2.0  ┆ 2.0  ┆ b    ┆ 2.0      ┆ 2.0      │
│ 3    ┆ 3.0  ┆ 3.0  ┆ c    ┆ 3.0      ┆ 3.0      │
│ 4    ┆ 4.0  ┆ 4.0  ┆ d    ┆ 4.0      ┆ 4.0      │
│ 5    ┆ 5.0  ┆ 5.0  ┆ e    ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.OrdinalEncoder

Bases: BaseInNOutNTransformer

Implement a transformer to convert each column to ordinal integers.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to encode. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.OrdinalEncoder.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import OrdinalEncoder
>>> transformer = OrdinalEncoder(columns=["col1", "col2"], prefix="", suffix="_out")
>>> transformer
OrdinalEncoderTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["a", "b", "c", "d", "e", "f"],
...         "col3": [0, 10, 20, 30, 40, 50],
...     }
... )
>>> frame
shape: (6, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 0    ┆ a    ┆ 0    │
│ 1    ┆ b    ┆ 10   │
│ 2    ┆ c    ┆ 20   │
│ 3    ┆ d    ┆ 30   │
│ 4    ┆ e    ┆ 40   │
│ 5    ┆ f    ┆ 50   │
└──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 5)
┌──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ a    ┆ 0    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ b    ┆ 10   ┆ 1.0      ┆ 1.0      │
│ 2    ┆ c    ┆ 20   ┆ 2.0      ┆ 2.0      │
│ 3    ┆ d    ┆ 30   ┆ 3.0      ┆ 3.0      │
│ 4    ┆ e    ┆ 40   ┆ 4.0      ┆ 4.0      │
│ 5    ┆ f    ┆ 50   ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────────┴──────────┘
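
A minimal sketch of the encoding shown above, assuming the default behaviour of sklearn.preprocessing.OrdinalEncoder: the categories of each column are its sorted unique values, and each value is replaced by the index of its category as a float. The helper `ordinal_encode` is hypothetical and handles a single column with no nulls.

```python
def ordinal_encode(values):
    """Map each value to the position of its category in sorted order.

    Hypothetical helper for one column without null values.
    """
    categories = sorted(set(values))
    index = {category: float(i) for i, category in enumerate(categories)}
    return [index[value] for value in values]


col2_out = ordinal_encode(["a", "b", "c", "d", "e", "f"])
repeated = ordinal_encode(["b", "a", "b"])
print(col2_out)
print(repeated)
```

The encoding depends on the sorted order of the categories rather than on their order of appearance, so "b" maps to 1.0 even when it appears first.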

grizz.transformer.OrdinalEncoderTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to convert each column to ordinal integers.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to encode. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.OrdinalEncoder.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import OrdinalEncoder
>>> transformer = OrdinalEncoder(columns=["col1", "col2"], prefix="", suffix="_out")
>>> transformer
OrdinalEncoderTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["a", "b", "c", "d", "e", "f"],
...         "col3": [0, 10, 20, 30, 40, 50],
...     }
... )
>>> frame
shape: (6, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 0    ┆ a    ┆ 0    │
│ 1    ┆ b    ┆ 10   │
│ 2    ┆ c    ┆ 20   │
│ 3    ┆ d    ┆ 30   │
│ 4    ┆ e    ┆ 40   │
│ 5    ┆ f    ┆ 50   │
└──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 5)
┌──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out ┆ col2_out │
│ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ a    ┆ 0    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ b    ┆ 10   ┆ 1.0      ┆ 1.0      │
│ 2    ┆ c    ┆ 20   ┆ 2.0      ┆ 2.0      │
│ 3    ┆ d    ┆ 30   ┆ 3.0      ┆ 3.0      │
│ 4    ┆ e    ┆ 40   ┆ 4.0      ┆ 4.0      │
│ 5    ┆ f    ┆ 50   ┆ 5.0      ┆ 5.0      │
└──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.PowerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to apply a power transform featurewise to make data more Gaussian-like.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to transform. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.PowerTransformer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import PowerTransformer
>>> transformer = PowerTransformer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
PowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out  ┆ col3_out  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ -1.567837 ┆ -1.695398 │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ -0.836194 ┆ -0.740367 │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ -0.210053 ┆ -0.117399 │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.356111  ┆ 0.402585  │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.881486  ┆ 0.864187  │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.376486  ┆ 1.286392  │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
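
With its default settings, sklearn.preprocessing.PowerTransformer applies the Yeo-Johnson transform and then standardizes the result to zero mean and unit variance. The sketch below shows these two steps for a fixed exponent. The helper names are hypothetical, and the exponent is passed in by hand, whereas scikit-learn estimates it per column by maximum likelihood, so this sketch is not expected to reproduce the numbers in the example above.

```python
import math


def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson transform to one value for a given exponent."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1) ** lmbda - 1) / lmbda
    if abs(lmbda - 2) < 1e-12:
        return -math.log1p(-x)
    return -((-x + 1) ** (2 - lmbda) - 1) / (2 - lmbda)


def standardize(values):
    """Center to zero mean and scale to unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


# lmbda=0.5 is an arbitrary value chosen for illustration.
transformed = standardize([yeo_johnson(v, 0.5) for v in [0, 1, 2, 3, 4, 5]])
print(transformed)
```

An exponent of 1 leaves the values unchanged, so in that case only the standardization step has an effect.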

grizz.transformer.QuantileTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to apply the quantile transformation.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to transform. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.QuantileTransformer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import QuantileTransformer
>>> transformer = QuantileTransformer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
QuantileTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ 0.0      ┆ 0.0      │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ 0.2      ┆ 0.2      │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ 0.4      ┆ 0.4      │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.6      ┆ 0.6      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.8      ┆ 0.8      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
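
The output above maps each value to its position in the empirical distribution of its column, scaled to the range [0, 1]. The following simplified sketch reproduces this for a column of distinct values. The helper `uniform_quantile_transform` is hypothetical; scikit-learn interpolates between estimated quantiles, which this sketch does not attempt to model.

```python
def uniform_quantile_transform(values):
    """Map distinct values to evenly spaced quantiles in [0, 1].

    Hypothetical helper: assumes at least two values and no duplicates.
    """
    ordered = sorted(values)
    rank = {value: i for i, value in enumerate(ordered)}
    return [rank[value] / (len(ordered) - 1) for value in values]


col3_out = uniform_quantile_transform([0, 10, 20, 30, 40, 50])
print(col3_out)
```

Only the rank of each value matters, which is why col1 and col3 produce identical outputs in the example even though their scales differ.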

grizz.transformer.Replace

Bases: BaseIn1Out1Transformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3})
>>> transformer
ReplaceTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
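
The two outputs above differ only in how unmapped values are handled. The sketch below reproduces that logic in plain Python. replace_values is a hypothetical helper, not part of grizz or polars, and it does not reproduce polars dtype handling: in the first example above the output column keeps the str dtype, whereas the sketch keeps the integers as they are.

```python
_KEEP = object()  # sentinel meaning "no default was given"


def replace_values(values, mapping, default=_KEEP):
    """Map each value through ``mapping``.

    Without a default, unmapped values are kept unchanged.
    With a default, unmapped values are replaced by it.
    """
    if default is _KEEP:
        return [mapping.get(value, value) for value in values]
    return [mapping.get(value, default) for value in values]


old = ["a", "b", "c", "d", "e"]
mapping = {"a": 1, "b": 2, "c": 3}

kept = replace_values(old, mapping)  # "d" and "e" pass through
nulled = replace_values(old, mapping, default=None)  # "d" and "e" become None
```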

grizz.transformer.ReplaceStrict

Bases: BaseIn1Out1Transformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace_strict.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
...     in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
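
Assuming ReplaceStrict forwards to polars' Expr.replace_strict, the difference from Replace is that an unmapped value is an error unless a default is supplied. The plain-Python sketch below shows that rule. replace_strict_values is a hypothetical helper, ValueError stands in for the error polars raises, and null handling is left out.

```python
_NO_DEFAULT = object()  # sentinel meaning "no default was given"


def replace_strict_values(values, mapping, default=_NO_DEFAULT):
    """Map every value through ``mapping``; unmapped values need a default."""
    unmapped = [value for value in values if value not in mapping]
    if unmapped and default is _NO_DEFAULT:
        raise ValueError(f"incomplete mapping, unmapped values: {unmapped}")
    return [mapping.get(value, default) for value in values]


old = ["a", "b", "c", "d", "e"]

# A complete mapping replaces every value.
full = replace_strict_values(old, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})

# An incomplete mapping is accepted only when a default is given.
partial = replace_strict_values(old, {"a": 1, "b": 2, "c": 3}, default=None)

# Without a default, the incomplete mapping is rejected.
try:
    replace_strict_values(old, {"a": 1, "b": 2, "c": 3})
except ValueError as exc:
    error = str(exc)
```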

grizz.transformer.ReplaceStrictTransformer

Bases: BaseIn1Out1Transformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace_strict.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
...     in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.ReplaceTransformer

Bases: BaseIn1Out1Transformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3})
>>> transformer
ReplaceTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     in_col="old",
...     out_col="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

grizz.transformer.RobustScaler

Bases: BaseInNOutNTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.RobustScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import RobustScaler
>>> transformer = RobustScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
RobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ -1.0     ┆ -1.0     │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ -0.6     ┆ -0.6     │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ -0.2     ┆ -0.2     │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.2      ┆ 0.2      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.6      ┆ 0.6      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
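
The values above are consistent with the default settings of sklearn.preprocessing.RobustScaler: each column is centered on its median and divided by its interquartile range (25th to 75th percentile). The sketch below recomputes them with the standard library only. robust_scale is a hypothetical helper, and the "inclusive" quantile method is used because it interpolates linearly between data points, which is assumed to match what scikit-learn computes for this data.

```python
import statistics


def robust_scale(values: list[float]) -> list[float]:
    """Scale values as (value - median) / (Q3 - Q1)."""
    median = statistics.median(values)
    # "inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(value - median) / iqr for value in values]


col1_out = robust_scale([0, 1, 2, 3, 4, 5])  # median 2.5, IQR 2.5
col3_out = robust_scale([0, 10, 20, 30, 40, 50])  # median 25, IQR 25
```

Both columns map to the same scaled values because col3 is col1 multiplied by 10, and the scaling removes that factor.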

grizz.transformer.RobustScalerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to scale each column using statistics that are robust to outliers.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.RobustScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import RobustScaler
>>> transformer = RobustScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
RobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 2, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [0, 10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ a    │
│ 1    ┆ 1    ┆ 10   ┆ b    │
│ 2    ┆ 2    ┆ 20   ┆ c    │
│ 3    ┆ 3    ┆ 30   ┆ d    │
│ 4    ┆ 4    ┆ 40   ┆ e    │
│ 5    ┆ 5    ┆ 50   ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ 0    ┆ a    ┆ -1.0     ┆ -1.0     │
│ 1    ┆ 1    ┆ 10   ┆ b    ┆ -0.6     ┆ -0.6     │
│ 2    ┆ 2    ┆ 20   ┆ c    ┆ -0.2     ┆ -0.2     │
│ 3    ┆ 3    ┆ 30   ┆ d    ┆ 0.2      ┆ 0.2      │
│ 4    ┆ 4    ┆ 40   ┆ e    ┆ 0.6      ┆ 0.6      │
│ 5    ┆ 5    ┆ 50   ┆ f    ┆ 1.0      ┆ 1.0      │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.Sequential

Bases: BaseTransformer

Implement a polars.DataFrame transformer that applies several transformers sequentially.

Parameters:

Name Type Description Default
transformers Sequence[BaseTransformer | dict]

The transformers or their configurations.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sequential, InplaceCast
>>> transformer = Sequential(
...     [
...         InplaceCast(columns=["col1"], dtype=pl.Float32),
...         InplaceCast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialTransformer(
  (0): InplaceCastTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', dtype=Float32)
  (1): InplaceCastTransformer(columns=('col2',), exclude_columns=(), missing_policy='raise', dtype=Int64)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
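
The composition shown above can be sketched without polars: each step receives the output of the previous one. In the sketch below, plain functions on a dict of lists stand in for transformers and frames. Frame, Step, cast_column, and sequential are all hypothetical names for illustration, not grizz APIs.

```python
from functools import reduce
from typing import Callable

Frame = dict[str, list]
Step = Callable[[Frame], Frame]


def cast_column(column: str, dtype: type) -> Step:
    """Return a step that casts one column and leaves the others untouched."""

    def step(frame: Frame) -> Frame:
        return {
            name: [dtype(value) for value in values] if name == column else values
            for name, values in frame.items()
        }

    return step


def sequential(steps: list[Step]) -> Step:
    """Chain the steps so that each one receives the previous step's output."""

    def run(frame: Frame) -> Frame:
        return reduce(lambda current, step: step(current), steps, frame)

    return run


transform = sequential([cast_column("col1", float), cast_column("col2", int)])
out = transform({"col1": [1, 2, 3], "col2": ["1", "2", "3"], "col3": ["a ", " b", "c"]})
```

Order matters in general: a later step sees the columns as modified by the earlier ones.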

grizz.transformer.SequentialTransformer

Bases: BaseTransformer

Implement a polars.DataFrame transformer that applies several transformers sequentially.

Parameters:

Name Type Description Default
transformers Sequence[BaseTransformer | dict]

The transformers or their configurations.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sequential, InplaceCast
>>> transformer = Sequential(
...     [
...         InplaceCast(columns=["col1"], dtype=pl.Float32),
...         InplaceCast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialTransformer(
  (0): InplaceCastTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', dtype=Float32)
  (1): InplaceCastTransformer(columns=('col2',), exclude_columns=(), missing_policy='raise', dtype=Int64)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘

grizz.transformer.ShrinkMemory

Bases: BaseArgTransformer

Implement a transformer that shrinks DataFrame memory usage.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ShrinkMemory
>>> transformer = ShrinkMemory()
>>> transformer
ShrinkMemoryTransformer()
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.ShrinkMemoryTransformer

Bases: BaseArgTransformer

Implement a transformer that shrinks DataFrame memory usage.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ShrinkMemory
>>> transformer = ShrinkMemory()
>>> transformer
ShrinkMemoryTransformer()
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

grizz.transformer.SimpleImputer

Bases: BaseInNOutNTransformer

Implement a transformer to impute missing values with simple strategies.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to impute. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
**kwargs Any

Additional arguments passed to sklearn.impute.SimpleImputer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SimpleImputer
>>> transformer = SimpleImputer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
SimpleImputerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, None, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [float("nan"), 10, 20, 30, 40, None],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    │
│ 1    ┆ 1    ┆ 10.0 ┆ b    │
│ null ┆ 2    ┆ 20.0 ┆ c    │
│ 3    ┆ 3    ┆ 30.0 ┆ d    │
│ 4    ┆ 4    ┆ 40.0 ┆ e    │
│ 5    ┆ 5    ┆ null ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ f64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    ┆ 0.0      ┆ 25.0     │
│ 1    ┆ 1    ┆ 10.0 ┆ b    ┆ 1.0      ┆ 10.0     │
│ null ┆ 2    ┆ 20.0 ┆ c    ┆ null     ┆ 20.0     │
│ 3    ┆ 3    ┆ 30.0 ┆ d    ┆ 3.0      ┆ 30.0     │
│ 4    ┆ 4    ┆ 40.0 ┆ e    ┆ 4.0      ┆ 40.0     │
│ 5    ┆ 5    ┆ null ┆ f    ┆ 5.0      ┆ null     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
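
In the output above, the NaN in col3 becomes 25.0, the mean of the observed values 10, 20, 30, and 40. This matches the default strategy="mean" of sklearn.impute.SimpleImputer. The null rows stay null because propagate_nulls=True. The sketch below reproduces that behavior in plain Python. impute_mean is a hypothetical helper, None plays the role of a polars null, and excluding nulls from the mean is an assumption that agrees with this example.

```python
import math


def impute_mean(values):
    """Replace NaN by the mean of the observed values and keep None as is.

    Mirrors propagate_nulls=True: nulls (None) are restored after imputation.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    mean = sum(observed) / len(observed)
    out = []
    for value in values:
        if value is None:
            out.append(None)  # null is propagated
        elif math.isnan(value):
            out.append(mean)  # NaN is imputed
        else:
            out.append(float(value))
    return out


col1_out = impute_mean([0, 1, None, 3, 4, 5])
col3_out = impute_mean([float("nan"), 10, 20, 30, 40, None])
```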

grizz.transformer.SimpleImputerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to impute missing values with simple strategies.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to impute. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
**kwargs Any

Additional arguments passed to sklearn.impute.SimpleImputer.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SimpleImputer
>>> transformer = SimpleImputer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
SimpleImputerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, None, 3, 4, 5],
...         "col2": ["0", "1", "2", "3", "4", "5"],
...         "col3": [float("nan"), 10, 20, 30, 40, None],
...         "col4": ["a", "b", "c", "d", "e", "f"],
...     }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ f64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    │
│ 1    ┆ 1    ┆ 10.0 ┆ b    │
│ null ┆ 2    ┆ 20.0 ┆ c    │
│ 3    ┆ 3    ┆ 30.0 ┆ d    │
│ 4    ┆ 4    ┆ 40.0 ┆ e    │
│ 5    ┆ 5    ┆ null ┆ f    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ f64  ┆ str  ┆ f64      ┆ f64      │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0    ┆ 0    ┆ NaN  ┆ a    ┆ 0.0      ┆ 25.0     │
│ 1    ┆ 1    ┆ 10.0 ┆ b    ┆ 1.0      ┆ 10.0     │
│ null ┆ 2    ┆ 20.0 ┆ c    ┆ null     ┆ 20.0     │
│ 3    ┆ 3    ┆ 30.0 ┆ d    ┆ 3.0      ┆ 30.0     │
│ 4    ┆ 4    ┆ 40.0 ┆ e    ┆ 4.0      ┆ 40.0     │
│ 5    ┆ 5    ┆ null ┆ f    ┆ 5.0      ┆ null     │
└──────┴──────┴──────┴──────┴──────────┴──────────┘

grizz.transformer.Sort

Bases: BaseInNTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to use to sort the rows.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to sort.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘
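
In the example above, the rows are ordered by col3 first, and col1 would only break ties. A plain-Python sketch of that multi-column ordering follows. sort_rows is a hypothetical helper, and placing nulls first is an assumption based on the polars default; it does not affect this example because col3 has no duplicates.

```python
def sort_rows(rows, columns):
    """Sort rows by the given columns, in priority order, with None first."""

    def key(row):
        # (False, None) sorts before (True, value), so None comes first
        # and is never compared with a real value.
        return tuple((row[col] is not None, row[col]) for col in columns)

    return sorted(rows, key=key)


rows = [
    {"col1": 1, "col2": 6.0, "col3": "a"},
    {"col1": 2, "col2": 5.0, "col3": "c"},
    {"col1": None, "col2": 4.0, "col3": "b"},
]
out = sort_rows(rows, columns=["col3", "col1"])
```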

grizz.transformer.SortColumns

Bases: BaseArgTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name Type Description Default
reverse bool

If set to False, the columns are sorted in alphabetical order. If set to True, they are sorted in reverse alphabetical order.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘
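
Only the column order changes in the output above; the rows are left untouched. A minimal sketch on a dict of lists follows. sort_columns is a hypothetical helper, and the behavior shown for reverse=True is an assumption based on the parameter name.

```python
def sort_columns(frame: dict, reverse: bool = False) -> dict:
    """Return a new mapping with the columns ordered by name."""
    return {name: frame[name] for name in sorted(frame, reverse=reverse)}


frame = {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
ascending = sort_columns(frame)
descending = sort_columns(frame, reverse=True)
```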

grizz.transformer.SortColumnsTransformer

Bases: BaseArgTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name Type Description Default
reverse bool

If set to False, the columns are sorted in alphabetical order. If set to True, they are sorted in reverse alphabetical order.

False

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘

grizz.transformer.SortTransformer

Bases: BaseInNTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to use to sort the rows.

None
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments to pass to sort.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘
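
To make the multi-column ordering above concrete, here is a minimal plain-Python sketch that sorts rows by `col3` and then `col1`. It does not use polars. `null_first` is a hypothetical helper that orders `None` before other values; this matters only as a tie-breaker, because the example data has no ties in `col3`.

```python
rows = [
    {"col1": 1, "col2": 6.0, "col3": "a"},
    {"col1": 2, "col2": 5.0, "col3": "c"},
    {"col1": None, "col2": 4.0, "col3": "b"},
]


def null_first(value):
    # Wrap a value so that None compares before any real value.
    return (value is not None, value if value is not None else 0)


# Sort by col3 first, then by col1 to break ties.
out = sorted(rows, key=lambda row: (null_first(row["col3"]), null_first(row["col1"])))
print([row["col3"] for row in out])  # ['a', 'b', 'c']
print([row["col1"] for row in out])  # [1, None, 2]
```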

grizz.transformer.SqlTransformer

Bases: BaseArgTransformer

Implement a transformer that executes a SQL query against the DataFrame.

Parameters:

Name Type Description Default
query str

The SQL query to execute.

required

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SqlTransformer
>>> transformer = SqlTransformer(query="SELECT col1, col4 FROM self WHERE col1 > 2")
>>> transformer
SqlTransformer(
  (query): SELECT col1, col4 FROM self WHERE col1 > 2
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col4 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
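
For readers who want to check what the query itself selects, the sketch below runs the same SQL text with the standard-library `sqlite3` module. This is a stand-in chosen for illustration, not the engine the transformer uses; the table is named `self` only so that the query string can be reused unchanged.

```python
import sqlite3

rows = [
    (1, "1", "1", "a"),
    (2, "2", "2", "b"),
    (3, "3", "3", "c"),
    (4, "4", "4", "d"),
    (5, "5", "5", "e"),
]

conn = sqlite3.connect(":memory:")
# The table name mirrors the "self" reference used in the query above.
conn.execute("CREATE TABLE self (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO self VALUES (?, ?, ?, ?)", rows)
result = conn.execute("SELECT col1, col4 FROM self WHERE col1 > 2").fetchall()
conn.close()
print(result)
```

The query keeps two of the four columns and the three rows where `col1` is greater than 2, which matches the shape `(3, 2)` in the output above.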

grizz.transformer.StandardScaler

Bases: BaseInNOutNTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.StandardScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StandardScaler
>>> transformer = StandardScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
StandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out  ┆ col3_out  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ -1.414214 ┆ -1.414214 │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ -0.707107 ┆ -0.707107 │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.0       ┆ 0.0       │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.707107  ┆ 0.707107  │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.414214  ┆ 1.414214  │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
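
The values in `col1_out` and `col3_out` can be reproduced by hand. The sketch below uses only the standard library and assumes the population standard deviation (dividing by n), which is what sklearn.preprocessing.StandardScaler uses; `standardize` is a hypothetical helper, not part of grizz.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std: divides by n, not n - 1
    return [(value - mean) / std for value in values]


print([round(v, 6) for v in standardize([1, 2, 3, 4, 5])])
# [-1.414214, -0.707107, 0.0, 0.707107, 1.414214]
print([round(v, 6) for v in standardize([10, 20, 30, 40, 50])])
# [-1.414214, -0.707107, 0.0, 0.707107, 1.414214]
```

Both columns give the same standardized values because `col3` is `col1` multiplied by 10, and standardization removes differences in scale.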

grizz.transformer.StandardScalerTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to standardize each column by removing the mean and scaling to unit variance.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to scale. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
propagate_nulls bool

If set to True, the None values are propagated after the transformation. If False, the None values are replaced by NaNs.

True
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to sklearn.preprocessing.StandardScaler.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StandardScaler
>>> transformer = StandardScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
StandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [10, 20, 30, 40, 50],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 10   ┆ a    │
│ 2    ┆ 2    ┆ 20   ┆ b    │
│ 3    ┆ 3    ┆ 30   ┆ c    │
│ 4    ┆ 4    ┆ 40   ┆ d    │
│ 5    ┆ 5    ┆ 50   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out  ┆ col3_out  │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---       ┆ ---       │
│ i64  ┆ str  ┆ i64  ┆ str  ┆ f64       ┆ f64       │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 1    ┆ 1    ┆ 10   ┆ a    ┆ -1.414214 ┆ -1.414214 │
│ 2    ┆ 2    ┆ 20   ┆ b    ┆ -0.707107 ┆ -0.707107 │
│ 3    ┆ 3    ┆ 30   ┆ c    ┆ 0.0       ┆ 0.0       │
│ 4    ┆ 4    ┆ 40   ┆ d    ┆ 0.707107  ┆ 0.707107  │
│ 5    ┆ 5    ┆ 50   ┆ e    ┆ 1.414214  ┆ 1.414214  │
└──────┴──────┴──────┴──────┴───────────┴───────────┘

grizz.transformer.StringToDatetime

Bases: BaseInNOutNTransformer

Implement a transformer to convert some string columns to polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.to_datetime.

{}
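
The `exist_policy` options above apply to the output columns built from `prefix` and `suffix`. A minimal sketch of that check, assuming a hypothetical helper `check_existing` and placeholder exception and warning types (grizz may use different ones):

```python
import warnings


def check_existing(out_cols, existing_cols, exist_policy="raise"):
    """Return the output columns that clash with existing ones, applying the policy."""
    if exist_policy not in {"ignore", "warn", "raise"}:
        raise ValueError(f"unknown policy: {exist_policy!r}")
    clashes = [col for col in out_cols if col in existing_cols]
    if clashes and exist_policy == "raise":
        # Assumption: the real exception type may differ.
        raise ValueError(f"column(s) already exist: {clashes}")
    if clashes and exist_policy == "warn":
        warnings.warn(f"column(s) will be overwritten: {clashes}", stacklevel=2)
    # With 'warn' and 'ignore', the caller goes on to overwrite the clashes.
    return clashes


# suffix="_out" gives a new name, so nothing clashes under the default policy.
print(check_existing(["col1_out"], ["col1", "col2", "col3"]))
# An empty prefix and suffix would reuse "col1", which needs 'warn' or 'ignore'.
print(check_existing(["col1"], ["col1", "col2", "col3"], exist_policy="ignore"))
```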

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToDatetime
>>> transformer = StringToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
StringToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
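
The conversion applied to each value of `col1` can be pictured with the standard library. This is a sketch of the per-value parsing only, not of the polars implementation; it relies on `datetime.fromisoformat`, which accepts the `YYYY-MM-DD HH:MM:SS` layout used in the example.

```python
from datetime import datetime

values = [
    "2020-01-01 01:01:01",
    "2020-01-01 02:02:02",
    "2020-01-01 12:00:01",
    "2020-01-01 18:18:18",
    "2020-01-01 23:59:59",
]

# Parse each string into a datetime object; the original strings are kept as-is.
parsed = [datetime.fromisoformat(value) for value in values]
print(parsed[0])
print(type(parsed[0]).__name__)
```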

grizz.transformer.StringToDatetimeTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to convert some string columns to polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToDatetime
>>> transformer = StringToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
StringToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘

grizz.transformer.StringToTime

Bases: BaseInNOutNTransformer

Implement a transformer to convert some string columns to polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToTime
>>> transformer = StringToTime(
...     columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out"
... )
>>> transformer
StringToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
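
The `format="%H:%M:%S"` argument in the example uses strftime-style directives. As a rough standard-library analogue of what happens to each value (not the polars implementation), the same format string can be passed to `datetime.strptime`:

```python
from datetime import datetime

values = ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"]

# strptime returns a datetime; .time() keeps only the time-of-day part.
parsed = [datetime.strptime(value, "%H:%M:%S").time() for value in values]
print(parsed[0])
print(parsed[-1])
```

A value that does not match the format, such as `"25:00:00"`, makes `strptime` raise a `ValueError` in this sketch.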

grizz.transformer.StringToTimeTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to convert some string columns to polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StringToTime
>>> transformer = StringToTime(
...     columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out"
... )
>>> transformer
StringToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘

grizz.transformer.StripChars

Bases: BaseInNOutNTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for polars.Expr.str.strip_chars.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"], prefix="", suffix="_out")
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬───────┬───────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  ┆ col2_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---   ┆ ---   ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str   ┆ str   ┆ str      ┆ str      │
╞══════╪══════╪═══════╪═══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ a     ┆ a     ┆ 1        ┆ a        │
│ 2    ┆ 2    ┆  b    ┆  b    ┆ 2        ┆ b        │
│ 3    ┆ 3    ┆   c   ┆   c   ┆ 3        ┆ c        │
│ 4    ┆ 4    ┆ d     ┆ d     ┆ 4        ┆ d        │
│ 5    ┆ 5    ┆ e     ┆ e     ┆ 5        ┆ e        │
└──────┴──────┴───────┴───────┴──────────┴──────────┘
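
The table above hides the whitespace differences, so a small plain-Python sketch may help. It assumes the default behaviour of stripping whitespace and mimics the rule that non-string columns are ignored; `strip_string_columns` is a hypothetical helper, not a grizz function.

```python
def strip_string_columns(data, columns, suffix="_out"):
    """Add a stripped copy of each string column; skip non-string columns."""
    out = dict(data)
    for name in columns:
        values = data[name]
        # Columns that are not made of strings are ignored.
        if all(isinstance(value, str) for value in values):
            out[f"{name}{suffix}"] = [value.strip() for value in values]
    return out


data = {
    "col1": [1, 2, 3, 4, 5],
    "col3": ["a ", " b", "  c  ", "d", "e"],
}
out = strip_string_columns(data, columns=["col1", "col3"])
print(out["col3_out"])    # ['a', 'b', 'c', 'd', 'e']
print("col1_out" in out)  # False
```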

grizz.transformer.StripCharsTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to remove leading and trailing characters.

This transformer ignores the columns that are not of type string.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to prepare. If None, it processes all the columns of type string.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for strip_chars.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"], prefix="", suffix="_out")
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬───────┬───────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  ┆ col2_out ┆ col3_out │
│ ---  ┆ ---  ┆ ---   ┆ ---   ┆ ---      ┆ ---      │
│ i64  ┆ str  ┆ str   ┆ str   ┆ str      ┆ str      │
╞══════╪══════╪═══════╪═══════╪══════════╪══════════╡
│ 1    ┆ 1    ┆ a     ┆ a     ┆ 1        ┆ a        │
│ 2    ┆ 2    ┆  b    ┆  b    ┆ 2        ┆ b        │
│ 3    ┆ 3    ┆   c   ┆   c   ┆ 3        ┆ c        │
│ 4    ┆ 4    ┆ d     ┆ d     ┆ 4        ┆ d        │
│ 5    ┆ 5    ┆ e     ┆ e     ┆ 5        ┆ e        │
└──────┴──────┴───────┴───────┴──────────┴──────────┘

grizz.transformer.SumHorizontal

Bases: BaseInNOut1Transformer

Implement a transformer to sum all values horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to sum. The columns should be compatible. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.sum_horizontal.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SumHorizontal
>>> transformer = SumHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
SumHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 63  │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 66  │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 69  │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 72  │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 75  │
└──────┴──────┴──────┴──────┴─────┘
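
The `col` values above are plain row-wise sums of the three input columns. A minimal standard-library sketch of that arithmetic, independent of polars:

```python
col1 = [11, 12, 13, 14, 15]
col2 = [21, 22, 23, 24, 25]
col3 = [31, 32, 33, 34, 35]

# zip walks the columns in parallel, yielding one tuple per row.
row_sums = [sum(row) for row in zip(col1, col2, col3)]
print(row_sums)  # [63, 66, 69, 72, 75]
```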

grizz.transformer.SumHorizontalTransformer

Bases: BaseInNOut1Transformer

Implement a transformer to sum all values horizontally across columns and store the result in a column.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to sum. The columns should be compatible. If None, it processes all the columns.

required
out_col str

The output column.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

Additional arguments passed to polars.sum_horizontal.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import SumHorizontal
>>> transformer = SumHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
SumHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "col1": [11, 12, 13, 14, 15],
...         "col2": [21, 22, 23, 24, 25],
...         "col3": [31, 32, 33, 34, 35],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 11   ┆ 21   ┆ 31   ┆ a    │
│ 12   ┆ 22   ┆ 32   ┆ b    │
│ 13   ┆ 23   ┆ 33   ┆ c    │
│ 14   ┆ 24   ┆ 34   ┆ d    │
│ 15   ┆ 25   ┆ 35   ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ str  ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 11   ┆ 21   ┆ 31   ┆ a    ┆ 63  │
│ 12   ┆ 22   ┆ 32   ┆ b    ┆ 66  │
│ 13   ┆ 23   ┆ 33   ┆ c    ┆ 69  │
│ 14   ┆ 24   ┆ 34   ┆ d    ┆ 72  │
│ 15   ┆ 25   ┆ 35   ┆ e    ┆ 75  │
└──────┴──────┴──────┴──────┴─────┘
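
To make the row-wise semantics explicit, here is a minimal plain-Python sketch of a horizontal sum over a dict of columns. It does not use grizz or polars; the helper name sum_horizontal and the dict-of-lists layout are assumptions made for illustration.

```python
def sum_horizontal(data, columns, out_col):
    """Return a copy of ``data`` with ``out_col`` holding the row-wise sum of ``columns``."""
    # zip(*...) turns the selected columns into one tuple per row.
    rows = zip(*(data[col] for col in columns))
    return {**data, out_col: [sum(row) for row in rows]}


data = {
    "col1": [11, 12, 13, 14, 15],
    "col2": [21, 22, 23, 24, 25],
    "col3": [31, 32, 33, 34, 35],
    "col4": ["a", "b", "c", "d", "e"],
}
out = sum_horizontal(data, columns=["col1", "col2", "col3"], out_col="col")
print(out["col"])  # [63, 66, 69, 72, 75]
```

The string column col4 is left out of the sum because it is not listed in columns, which matches the example above.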

grizz.transformer.TimeDiff

Bases: BaseArgTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name Type Description Default
group_cols Sequence[str]

The columns used to generate the group for each sequence.

required
time_col str

The input time column name.

required
time_diff_col str

The output time difference column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col='time', time_diff_col='diff', shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘
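
The output above is ordered by group and then by time, with a difference of 0 for the first step of each group. A plain-Python sketch that reproduces this result is shown below. It is an illustration only: the helper name is hypothetical, and filling the first shift rows of each group with 0 is an assumption inferred from the example, not a documented guarantee.

```python
from collections import defaultdict


def time_diff(rows, group_col, time_col, diff_col, shift=1):
    """Compute per-group time differences on a list of row dicts (``shift`` >= 1)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    out = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda r: r[time_col])
        for i, row in enumerate(ordered):
            # Assumption: rows without a predecessor `shift` steps back get 0.
            prev = ordered[i - shift][time_col] if i >= shift else row[time_col]
            out.append({**row, diff_col: row[time_col] - prev})
    return out


rows = [
    {"col": "a", "time": 1},
    {"col": "b", "time": 2},
    {"col": "a", "time": 3},
    {"col": "a", "time": 4},
    {"col": "b", "time": 5},
]
for row in time_diff(rows, group_col="col", time_col="time", diff_col="diff"):
    print(row)
```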

grizz.transformer.TimeDiffTransformer

Bases: BaseArgTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name Type Description Default
group_cols Sequence[str]

The columns used to generate the group for each sequence.

required
time_col str

The input time column name.

required
time_diff_col str

The output time difference column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col='time', time_diff_col='diff', shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘

grizz.transformer.TimeToSecond

Bases: BaseIn1Out1Transformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name Type Description Default
in_col str

The input column with the time value to convert.

required
out_col str

The output column with the time in seconds.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col='time', out_col='second', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘
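
The values in the second column follow from the usual conversion hours * 3600 + minutes * 60 + seconds + microseconds / 1e6. A short standard-library sketch of that arithmetic (the helper name is hypothetical, and this is not how grizz computes it internally):

```python
import datetime


def time_to_second(value: datetime.time) -> float:
    """Convert a time of day to the number of seconds since midnight."""
    return (
        value.hour * 3600
        + value.minute * 60
        + value.second
        + value.microsecond / 1_000_000
    )


times = [
    datetime.time(0, 0, 1, 890000),
    datetime.time(0, 1, 1, 890000),
    datetime.time(1, 1, 1, 890000),
    datetime.time(0, 19, 19, 890000),
    datetime.time(19, 19, 19, 890000),
]
# Rounded to 2 decimals to hide floating-point noise.
print([round(time_to_second(t), 2) for t in times])
# [1.89, 61.89, 3661.89, 1159.89, 69559.89]
```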

grizz.transformer.TimeToSecondTransformer

Bases: BaseIn1Out1Transformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name Type Description Default
in_col str

The input column with the time value to convert.

required
out_col str

The output column with the time in seconds.

required
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'

Example usage:

>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col='time', out_col='second', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘

grizz.transformer.ToDatetime

Bases: BaseInNOutNTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
ToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
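
The example shows that the input column is kept and the converted values are written to a new column named prefix + column + suffix. A plain-Python sketch of that naming and conversion pattern follows. The helper name, the dict-of-lists layout, and the use of datetime.strptime with an explicit format are assumptions for illustration; the transformer itself works on polars columns.

```python
import datetime


def to_datetime(data, columns, prefix, suffix, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of ``data`` with parsed copies of ``columns`` added."""
    out = dict(data)
    for col in columns:
        # Output name follows the documented prefix/suffix convention.
        out[f"{prefix}{col}{suffix}"] = [
            datetime.datetime.strptime(value, fmt) for value in data[col]
        ]
    return out


data = {
    "col1": ["2020-01-01 01:01:01", "2020-01-01 23:59:59"],
    "col2": ["1", "2"],
}
out = to_datetime(data, columns=["col1"], prefix="", suffix="_out")
print(list(out))  # ['col1', 'col2', 'col1_out']
print(out["col1_out"][0])  # 2020-01-01 01:01:01
```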

grizz.transformer.ToDatetimeTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to convert some columns to a polars.Datetime type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_datetime.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
ToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
...     {
...         "col1": [
...             "2020-01-01 01:01:01",
...             "2020-01-01 02:02:02",
...             "2020-01-01 12:00:01",
...             "2020-01-01 18:18:18",
...             "2020-01-01 23:59:59",
...         ],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": [
...             "2020-01-01 11:11:11",
...             "2020-02-01 12:12:12",
...             "2020-03-01 13:13:13",
...             "2020-04-01 08:08:08",
...             "2020-05-01 23:59:59",
...         ],
...     },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                │
│ ---                 ┆ ---  ┆ ---                 │
│ str                 ┆ str  ┆ str                 │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1                ┆ col2 ┆ col3                ┆ col1_out            │
│ ---                 ┆ ---  ┆ ---                 ┆ ---                 │
│ str                 ┆ str  ┆ str                 ┆ datetime[μs]        │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1    ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2    ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3    ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4    ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5    ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘

grizz.transformer.ToTime

Bases: BaseInNOutNTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out")
>>> transformer
ToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
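
The format keyword in the example uses the directives %H, %M and %S. As a quick way to check what such a format string accepts, the standard library can parse the same values. This sketch uses Python's datetime.strptime as a stand-in; that polars interprets every directive identically is an assumption, although these three simple ones are common to both.

```python
import datetime

FORMAT = "%H:%M:%S"  # same format string as in the example above

values = ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"]
# strptime returns a datetime; .time() keeps only the time-of-day part.
parsed = [datetime.datetime.strptime(value, FORMAT).time() for value in values]
print(parsed[0])  # 01:01:01
print(parsed[-1].hour, parsed[-1].minute, parsed[-1].second)  # 23 59 59
```

A value that does not match the format, such as "1h01", makes strptime raise a ValueError, which is a cheap way to validate a format before applying it to a whole column.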

grizz.transformer.ToTimeTransformer

Bases: BaseInNOutNTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str] | None

The columns to convert. None means all the columns.

required
prefix str

The column name prefix for the output columns.

required
suffix str

The column name suffix for the output columns.

required
exclude_columns Sequence[str]

The columns to exclude from the input columns. If any column is not found, it will be ignored during the filtering process.

()
exist_policy str

The policy on how to handle existing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column already exists. If 'warn', a warning is raised if at least one column already exists and the existing columns are overwritten. If 'ignore', the existing columns are overwritten and no warning message appears.

'raise'
missing_policy str

The policy on how to handle missing columns. The following options are available: 'ignore', 'warn', and 'raise'. If 'raise', an exception is raised if at least one column is missing. If 'warn', a warning is raised if at least one column is missing and the missing columns are ignored. If 'ignore', the missing columns are ignored and no warning message appears.

'raise'
**kwargs Any

The keyword arguments for to_time.

{}

Example usage:

>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out")
>>> transformer
ToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1     ┆ col2 ┆ col3     ┆ col1_out │
│ ---      ┆ ---  ┆ ---      ┆ ---      │
│ str      ┆ str  ┆ str      ┆ time     │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘

grizz.transformer.is_transformer_config

is_transformer_config(config: dict) -> bool

Indicate if the input configuration is a configuration for a BaseTransformer.

This function only checks whether the value of the key _target_ is valid. It does not check the other values. If _target_ refers to a function, the function's return type hint is used to check the class.

Parameters:

Name Type Description Default
config dict

The configuration to check.

required

Returns:

Type Description
bool

True if the input configuration is a configuration for a BaseTransformer object.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import is_transformer_config
>>> is_transformer_config(
...     {
...         "_target_": "grizz.transformer.InplaceCast",
...         "columns": ("col1", "col3"),
...         "dtype": pl.Int32,
...     }
... )
True
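
The check described above (resolve _target_, then test the class, or the return type hint when the target is a function) can be sketched with the standard library. This is not the grizz implementation: the helper names are hypothetical, and dict stands in for BaseTransformer so that the sketch runs without grizz installed.

```python
import importlib
import inspect
import typing
from collections import OrderedDict


def resolve_target(path):
    """Import the object designated by a dotted path such as 'pkg.module.Name'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def is_subclass_target(target, base_cls):
    """Check a class directly, or a function through its return type hint."""
    if inspect.isfunction(target):
        target = typing.get_type_hints(target).get("return")
    return inspect.isclass(target) and issubclass(target, base_cls)


def is_config_for(config, base_cls):
    """Only the ``_target_`` key is inspected; other keys are not validated."""
    return is_subclass_target(resolve_target(config["_target_"]), base_cls)


def make_ordered() -> OrderedDict:  # function target with a return type hint
    return OrderedDict()


print(is_config_for({"_target_": "collections.OrderedDict"}, dict))  # True
print(is_config_for({"_target_": "collections.deque"}, dict))  # False
print(is_subclass_target(make_ordered, dict))  # True
```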

grizz.transformer.setup_transformer

setup_transformer(
    transformer: BaseTransformer | dict,
) -> BaseTransformer

Set up a polars.DataFrame transformer.

The transformer is instantiated from its configuration by using the BaseTransformer factory function.

Parameters:

Name Type Description Default
transformer BaseTransformer | dict

Specifies a polars.DataFrame transformer or its configuration.

required

Returns:

Type Description
BaseTransformer

An instantiated transformer.

Example usage:

>>> import polars as pl
>>> from grizz.transformer import setup_transformer
>>> transformer = setup_transformer(
...     {
...         "_target_": "grizz.transformer.InplaceCast",
...         "columns": ("col1", "col3"),
...         "dtype": pl.Int32,
...     }
... )
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
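
The "object or configuration" pattern behind setup_transformer can be sketched in a few lines: an already instantiated object is returned unchanged, while a dict is turned into an object by importing its _target_ and passing the remaining keys as keyword arguments. The helper below is a hypothetical stand-in that uses importlib directly and datetime.timedelta as the target; the factory that grizz actually uses may behave differently.

```python
import datetime
import importlib


def setup_object(obj_or_config):
    """Return the object as-is, or build it from a ``_target_`` configuration."""
    if not isinstance(obj_or_config, dict):
        return obj_or_config  # already instantiated
    config = dict(obj_or_config)  # copy so the caller's dict is not mutated
    module_name, _, attr = config.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    return target(**config)


delta = setup_object({"_target_": "datetime.timedelta", "seconds": 90})
print(delta)  # 0:01:30
print(setup_object(delta) is delta)  # True
```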