transformer
grizz.transformer
Contain polars.DataFrame transformers.
grizz.transformer.AbsDiffHorizontal
Bases: BaseIn2Out1Transformer
Implement a transformer to compute the absolute difference between two columns.
Internally, this transformer computes: out = abs(in1 - in2)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a ┆ 4 │
│ 2 ┆ 4 ┆ b ┆ 2 │
│ 3 ┆ 3 ┆ c ┆ 0 │
│ 4 ┆ 2 ┆ d ┆ 2 │
│ 5 ┆ 1 ┆ e ┆ 4 │
└──────┴──────┴──────┴──────┘
grizz.transformer.AbsDiffHorizontalTransformer
Bases: BaseIn2Out1Transformer
Implement a transformer to compute the absolute difference between two columns.
Internally, this transformer computes: out = abs(in1 - in2)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a ┆ 4 │
│ 2 ┆ 4 ┆ b ┆ 2 │
│ 3 ┆ 3 ┆ c ┆ 0 │
│ 4 ┆ 2 ┆ d ┆ 2 │
│ 5 ┆ 1 ┆ e ┆ 4 │
└──────┴──────┴──────┴──────┘
grizz.transformer.BaseArgTransformer
Bases: BaseTransformer
Define a base class to implement transformers with custom arguments.
grizz.transformer.BaseArgTransformer._fit_data (abstractmethod)
_fit_data(frame: DataFrame) -> None
Fit to the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to fit. | required |
grizz.transformer.BaseArgTransformer._transform_data (abstractmethod)
_transform_data(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
grizz.transformer.BaseArgTransformer.get_args (abstractmethod)
get_args() -> dict
Get the arguments of the transformer.
Returns:
Type | Description |
---|---|
dict | The arguments of the transformer. |
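To illustrate how these three abstract methods could fit together, here is a hypothetical sketch in plain Python. It assumes that the public fit/transform methods delegate to _fit_data/_transform_data and that get_args feeds the repr (consistent with reprs such as AbsDiffHorizontalTransformer(in1_col='col1', ...) on this page), but it is not grizz's source. A dict of lists stands in for polars.DataFrame so the sketch runs without polars; ArgTransformerSketch and AddConstant are made-up names.

```python
from abc import ABC, abstractmethod
from typing import Any

Frame = dict[str, list[Any]]  # stand-in for polars.DataFrame in this sketch


class ArgTransformerSketch(ABC):
    """Hypothetical stand-in for BaseArgTransformer."""

    def fit(self, frame: Frame) -> None:
        self._fit_data(frame)  # public API delegates to the abstract hook

    def transform(self, frame: Frame) -> Frame:
        return self._transform_data(frame)

    def fit_transform(self, frame: Frame) -> Frame:
        self.fit(frame)
        return self.transform(frame)

    def __repr__(self) -> str:
        # The repr is built from get_args(), so subclasses only declare their arguments.
        args = ", ".join(f"{key}={value!r}" for key, value in self.get_args().items())
        return f"{type(self).__name__}({args})"

    @abstractmethod
    def get_args(self) -> dict: ...

    @abstractmethod
    def _fit_data(self, frame: Frame) -> None: ...

    @abstractmethod
    def _transform_data(self, frame: Frame) -> Frame: ...


class AddConstant(ArgTransformerSketch):
    """Hypothetical subclass: add a constant to one column."""

    def __init__(self, col: str, value: int) -> None:
        self._col = col
        self._value = value

    def get_args(self) -> dict:
        return {"col": self._col, "value": self._value}

    def _fit_data(self, frame: Frame) -> None:
        pass  # stateless: nothing to learn from the data

    def _transform_data(self, frame: Frame) -> Frame:
        out = dict(frame)  # shallow copy so the input is not modified
        out[self._col] = [x + self._value for x in frame[self._col]]
        return out


t = AddConstant(col="a", value=10)
print(t)  # AddConstant(col='a', value=10)
print(t.fit_transform({"a": [1, 2], "b": [3, 4]}))  # {'a': [11, 12], 'b': [3, 4]}
```

Keeping the hooks abstract means a subclass cannot be instantiated until it defines all three, which is the usual reason for this template-method layout.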
grizz.transformer.BaseIn1Out1Transformer
Bases: BaseArgTransformer
Define a base class to implement polars.DataFrame transformers that take one input column and generate one output column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
grizz.transformer.BaseIn1Out1Transformer._check_input_column
_check_input_column(frame: DataFrame) -> None
Check if the input column is missing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
grizz.transformer.BaseIn1Out1Transformer._check_output_column
_check_output_column(frame: DataFrame) -> None
Check if the output column already exists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
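The two checks above pair a column test with a policy. The sketch below shows one way that pairing could work, operating on a list of column names instead of a polars.DataFrame. Only 'raise' (the default) is confirmed on this page; the 'warn' and 'ignore' options, the use of ValueError and RuntimeWarning, and all function names here are assumptions for illustration.

```python
import warnings
from collections.abc import Sequence

# Assumption: only 'raise' is confirmed by the documentation; 'warn' and
# 'ignore' are illustrative extra options for this sketch.
POLICIES = ("ignore", "warn", "raise")


def _apply_policy(columns: tuple[str, ...], policy: str, problem: str) -> None:
    """Raise, warn, or do nothing about ``columns`` depending on ``policy``."""
    if policy not in POLICIES:
        raise ValueError(f"Incorrect policy {policy!r}, expected one of {POLICIES}")
    if not columns or policy == "ignore":
        return
    message = f"{len(columns)} column(s) {problem}: {list(columns)}"
    if policy == "raise":
        raise ValueError(message)
    warnings.warn(message, RuntimeWarning, stacklevel=3)


def check_input_column(
    in_col: str, frame_columns: Sequence[str], missing_policy: str = "raise"
) -> None:
    """Hypothetical counterpart of _check_input_column: is in_col missing?"""
    missing = () if in_col in frame_columns else (in_col,)
    _apply_policy(missing, missing_policy, "missing")


def check_output_column(
    out_col: str, frame_columns: Sequence[str], exist_policy: str = "raise"
) -> None:
    """Hypothetical counterpart of _check_output_column: does out_col exist?"""
    existing = (out_col,) if out_col in frame_columns else ()
    _apply_policy(existing, exist_policy, "already exist")


columns = ["col1", "col2", "col3"]
check_input_column("col1", columns)  # present: nothing happens
check_output_column("diff", columns)  # absent: nothing happens
check_output_column("col3", columns, exist_policy="warn")  # emits a RuntimeWarning
```

Validating the policy string before looking at the columns means a typo in the policy is reported even when no column is missing or duplicated.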
grizz.transformer.BaseIn1Out1Transformer._fit (abstractmethod)
_fit(frame: DataFrame) -> DataFrame
Fit to the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to fit. | required |
grizz.transformer.BaseIn1Out1Transformer._transform (abstractmethod)
_transform(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
grizz.transformer.BaseIn2Out1Transformer
Bases: BaseArgTransformer
Define a base class to implement polars.DataFrame transformers that take two input columns and generate one output column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import AbsDiffHorizontal
>>> transformer = AbsDiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
AbsDiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a ┆ 4 │
│ 2 ┆ 4 ┆ b ┆ 2 │
│ 3 ┆ 3 ┆ c ┆ 0 │
│ 4 ┆ 2 ┆ d ┆ 2 │
│ 5 ┆ 1 ┆ e ┆ 4 │
└──────┴──────┴──────┴──────┘
grizz.transformer.BaseIn2Out1Transformer._check_input_columns
_check_input_columns(frame: DataFrame) -> None
Check if any of the input columns is missing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
grizz.transformer.BaseIn2Out1Transformer._check_output_column
_check_output_column(frame: DataFrame) -> None
Check if the output column already exists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
grizz.transformer.BaseIn2Out1Transformer._fit (abstractmethod)
_fit(frame: DataFrame) -> DataFrame
Fit to the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to fit. | required |
grizz.transformer.BaseIn2Out1Transformer._transform (abstractmethod)
_transform(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
grizz.transformer.BaseInNOut1Transformer
Bases: BaseInNTransformer
Define a base class to implement polars.DataFrame transformers that generate a single output column by using multiple input columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | required |
out_col | str | The output column. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ list[i64] │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ [11, 21, 31] │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ [12, 22, 32] │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ [13, 23, 33] │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ [14, 24, 34] │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
grizz.transformer.BaseInNOut1Transformer._check_output_column
_check_output_column(frame: DataFrame) -> None
Check if the output column already exists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
grizz.transformer.BaseInNOutNTransformer
Bases: BaseInNTransformer
Define a base class to implement polars.DataFrame transformers that have N input columns and N output columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i32 ┆ i32 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
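For N-to-N transformers, each input column yields one output column whose name is built from prefix and suffix; in the Cast and Binarizer examples on this page, prefix="" and suffix="_out" turn col1 into col1_out. The hypothetical helper below sketches that naming rule and flags output names that would collide with existing columns, which is the situation exist_policy governs. The name output_column_plan and its exact behavior are assumptions, not part of grizz.

```python
from collections.abc import Sequence


def output_column_plan(
    columns: Sequence[str], frame_columns: Sequence[str], prefix: str, suffix: str
) -> tuple[dict[str, str], tuple[str, ...]]:
    """Map each input column to prefix + name + suffix and report collisions."""
    mapping = {col: f"{prefix}{col}{suffix}" for col in columns}
    # Output names that already exist in the frame: exist_policy decides what happens.
    existing = tuple(out for out in mapping.values() if out in frame_columns)
    return mapping, existing


frame_columns = ["col1", "col2", "col3", "col4"]
mapping, existing = output_column_plan(["col1", "col3"], frame_columns, prefix="", suffix="_out")
print(mapping)  # {'col1': 'col1_out', 'col3': 'col3_out'}
print(existing)  # ()
```

With prefix="" and suffix="", every output name equals its input name, so in this sketch every output collides with an existing column and the decision is left entirely to exist_policy.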
grizz.transformer.BaseInNOutNTransformer._check_output_column
_check_output_column(frame: DataFrame) -> None
Check if the output column already exists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
grizz.transformer.BaseInNTransformer
Bases: BaseArgTransformer
Define a base class to implement polars.DataFrame transformers that transform DataFrames by using multiple input columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, None],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
grizz.transformer.BaseInNTransformer._check_input_columns
_check_input_columns(frame: DataFrame) -> None
Check if some input columns are missing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to check. | required |
grizz.transformer.BaseInNTransformer._fit (abstractmethod)
_fit(frame: DataFrame) -> DataFrame
Fit to the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to fit. | required |
grizz.transformer.BaseInNTransformer._transform (abstractmethod)
_transform(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
grizz.transformer.BaseInNTransformer.find_columns
find_columns(frame: DataFrame) -> tuple[str, ...]
Find the columns to transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame. | required |
Returns:
Type | Description |
---|---|
tuple[str, ...] | The columns to transform. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> transformer = DropNullRow(columns=["col2", "col3"])
>>> transformer.find_columns(frame)
('col2', 'col3')
>>> transformer = DropNullRow()
>>> transformer.find_columns(frame)
('col1', 'col2', 'col3', 'col4')
grizz.transformer.BaseInNTransformer.find_common_columns
find_common_columns(frame: DataFrame) -> tuple[str, ...]
Find the common columns between the DataFrame columns and the input columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame. | required |
Returns:
Type | Description |
---|---|
tuple[str, ...] | The common columns. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> transformer = DropNullRow(columns=["col2", "col3", "col5"])
>>> transformer.find_common_columns(frame)
('col2', 'col3')
>>> transformer = DropNullRow()
>>> transformer.find_common_columns(frame)
('col1', 'col2', 'col3', 'col4')
grizz.transformer.BaseInNTransformer.find_missing_columns
find_missing_columns(frame: DataFrame) -> tuple[str, ...]
Find the missing columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame. | required |
Returns:
Type | Description |
---|---|
tuple[str, ...] | The missing columns. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> transformer = DropNullRow(columns=["col2", "col3", "col5"])
>>> transformer.find_missing_columns(frame)
('col5',)
>>> transformer = DropNullRow()
>>> transformer.find_missing_columns(frame)
()
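The three find_* methods differ only in how the requested columns are intersected with the DataFrame's columns. The plain-Python sketch below reproduces the outputs of the three examples above from column names alone. It takes the frame's column list instead of a polars.DataFrame, and applying exclude_columns inside find_columns is an assumption; the function signatures are hypothetical.

```python
from collections.abc import Sequence
from typing import Optional


def find_columns(
    frame_columns: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    exclude_columns: Sequence[str] = (),
) -> tuple[str, ...]:
    """Requested columns (all frame columns if None), minus the exclusions."""
    selected = frame_columns if columns is None else columns
    return tuple(col for col in selected if col not in exclude_columns)


def find_common_columns(
    frame_columns: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    exclude_columns: Sequence[str] = (),
) -> tuple[str, ...]:
    """Requested columns that are present in the frame (order preserved)."""
    present = set(frame_columns)
    return tuple(c for c in find_columns(frame_columns, columns, exclude_columns) if c in present)


def find_missing_columns(
    frame_columns: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    exclude_columns: Sequence[str] = (),
) -> tuple[str, ...]:
    """Requested columns that are absent from the frame (order preserved)."""
    present = set(frame_columns)
    return tuple(c for c in find_columns(frame_columns, columns, exclude_columns) if c not in present)


frame_columns = ["col1", "col2", "col3", "col4"]
print(find_columns(frame_columns, ["col2", "col3"]))  # ('col2', 'col3')
print(find_columns(frame_columns))  # ('col1', 'col2', 'col3', 'col4')
print(find_common_columns(frame_columns, ["col2", "col3", "col5"]))  # ('col2', 'col3')
print(find_missing_columns(frame_columns, ["col2", "col3", "col5"]))  # ('col5',)
print(find_missing_columns(frame_columns))  # ()
```

When columns is None, the requested set is the frame's own column list, so nothing can be missing; this matches the empty tuple returned by DropNullRow().find_missing_columns(frame) above.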
grizz.transformer.BaseTransformer
Bases: ABC
Define the base class to transform a polars.DataFrame.
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.BaseTransformer.equal (abstractmethod)
equal(other: Any, equal_nan: bool = False) -> bool
Indicate if two objects are equal or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Any | The other object to compare. | required |
equal_nan | bool | Whether to compare NaNs as equal. If True, NaNs are considered equal. | False |
Returns:
Type | Description |
---|---|
bool | True if the two objects are equal, otherwise False. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> obj1 = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> obj2 = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> obj3 = InplaceCast(columns=["col2", "col3"], dtype=pl.Float32)
>>> obj1.equal(obj2)
True
>>> obj1.equal(obj3)
False
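One plausible way to implement equal for argument-based transformers is to compare the transformer types and their get_args() dictionaries, with equal_nan controlling whether NaN values match. The sketch below is hypothetical (EqualByArgs is not a grizz class, and dtypes are represented as plain strings) and handles only floats, tuples and lists explicitly.

```python
import math
from typing import Any


def _values_equal(x: Any, y: Any, equal_nan: bool) -> bool:
    # NaN never equals NaN under ==, so it needs an explicit opt-in.
    if equal_nan and isinstance(x, float) and isinstance(y, float):
        if math.isnan(x) and math.isnan(y):
            return True
    if isinstance(x, (list, tuple)) and isinstance(y, (list, tuple)):
        return (
            type(x) is type(y)
            and len(x) == len(y)
            and all(_values_equal(a, b, equal_nan) for a, b in zip(x, y))
        )
    return x == y


class EqualByArgs:
    """Hypothetical transformer whose identity is fully described by its arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self._args = kwargs

    def get_args(self) -> dict:
        return dict(self._args)

    def equal(self, other: Any, equal_nan: bool = False) -> bool:
        if type(other) is not type(self):
            return False
        mine, theirs = self.get_args(), other.get_args()
        return mine.keys() == theirs.keys() and all(
            _values_equal(mine[key], theirs[key], equal_nan) for key in mine
        )


obj1 = EqualByArgs(columns=("col1", "col3"), dtype="Int32")
obj2 = EqualByArgs(columns=("col1", "col3"), dtype="Int32")
obj3 = EqualByArgs(columns=("col2", "col3"), dtype="Float32")
print(obj1.equal(obj2), obj1.equal(obj3))  # True False
nan1 = EqualByArgs(threshold=float("nan"))
nan2 = EqualByArgs(threshold=float("nan"))
print(nan1.equal(nan2), nan1.equal(nan2, equal_nan=True))  # False True
```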
grizz.transformer.BaseTransformer.fit (abstractmethod)
fit(frame: DataFrame) -> None
Fit to the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to fit. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> transformer.fit(frame)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.BaseTransformer.fit_transform (abstractmethod)
fit_transform(frame: DataFrame) -> DataFrame
Fit to the data, then transform it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to fit and transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
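The split between fit and transform matters for transformers that learn something from the data. InplaceCast does not appear to need fitted state (its transform example above works without calling fit). The hypothetical min-max scaler below (plain Python on lists, not a grizz class) shows the usual contract: fit stores statistics, transform reuses them, and fit_transform is fit followed by transform.

```python
from collections.abc import Sequence
from typing import Optional


class MinMaxScalerSketch:
    """Hypothetical stateful transformer: rescale values to [0, 1] using the
    minimum and maximum seen during fit()."""

    def __init__(self) -> None:
        self.min_: Optional[float] = None
        self.max_: Optional[float] = None

    def fit(self, values: Sequence[float]) -> None:
        # Learn the statistics once; transform() reuses them on any data.
        self.min_, self.max_ = min(values), max(values)

    def transform(self, values: Sequence[float]) -> list[float]:
        if self.min_ is None or self.max_ is None:
            raise RuntimeError("call fit() before transform()")
        span = (self.max_ - self.min_) or 1.0  # guard against constant data
        return [(v - self.min_) / span for v in values]

    def fit_transform(self, values: Sequence[float]) -> list[float]:
        self.fit(values)
        return self.transform(values)


scaler = MinMaxScalerSketch()
print(scaler.fit_transform([0, 5, 10]))  # [0.0, 0.5, 1.0]
print(scaler.transform([20]))  # [2.0] (uses the fitted min/max, not the new data)
```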
grizz.transformer.BaseTransformer.transform (abstractmethod)
transform(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.Binarizer
Bases: BaseInNOutNTransformer
Implement a transformer to binarize data according to a threshold.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to binarize. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Binarizer
>>> transformer = Binarizer(
... columns=["col1", "col3"], prefix="", suffix="_out", threshold=1.5
... )
>>> transformer
BinarizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', threshold=1.5)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [5, 4, 3, 2, 1, 0],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 5 ┆ a │
│ 1 ┆ 1 ┆ 4 ┆ b │
│ 2 ┆ 2 ┆ 3 ┆ c │
│ 3 ┆ 3 ┆ 2 ┆ d │
│ 4 ┆ 4 ┆ 1 ┆ e │
│ 5 ┆ 5 ┆ 0 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 5 ┆ a ┆ 0 ┆ 1 │
│ 1 ┆ 1 ┆ 4 ┆ b ┆ 0 ┆ 1 │
│ 2 ┆ 2 ┆ 3 ┆ c ┆ 1 ┆ 1 │
│ 3 ┆ 3 ┆ 2 ┆ d ┆ 1 ┆ 1 │
│ 4 ┆ 4 ┆ 1 ┆ e ┆ 1 ┆ 0 │
│ 5 ┆ 5 ┆ 0 ┆ f ┆ 1 ┆ 0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.BinarizerTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to binarize data according to a threshold.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to binarize. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Binarizer
>>> transformer = Binarizer(
... columns=["col1", "col3"], prefix="", suffix="_out", threshold=1.5
... )
>>> transformer
BinarizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', threshold=1.5)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [5, 4, 3, 2, 1, 0],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 5 ┆ a │
│ 1 ┆ 1 ┆ 4 ┆ b │
│ 2 ┆ 2 ┆ 3 ┆ c │
│ 3 ┆ 3 ┆ 2 ┆ d │
│ 4 ┆ 4 ┆ 1 ┆ e │
│ 5 ┆ 5 ┆ 0 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 5 ┆ a ┆ 0 ┆ 1 │
│ 1 ┆ 1 ┆ 4 ┆ b ┆ 0 ┆ 1 │
│ 2 ┆ 2 ┆ 3 ┆ c ┆ 1 ┆ 1 │
│ 3 ┆ 3 ┆ 2 ┆ d ┆ 1 ┆ 1 │
│ 4 ┆ 4 ┆ 1 ┆ e ┆ 1 ┆ 0 │
│ 5 ┆ 5 ┆ 0 ┆ f ┆ 1 ┆ 0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.Cast
Bases: BaseInNOutNTransformer
Implement a transformer to convert some columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i32 ┆ i32 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.CastTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to convert some columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32, prefix="", suffix="_out")
>>> transformer
CastTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i32 ┆ i32 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.CategoricalCast
Bases: BaseIn1Out1Transformer
Implement a transformer to convert a column to the categorical data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name to cast. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import CategoricalCast
>>> transformer = CategoricalCast(in_col="col1", out_col="out")
>>> transformer
CategoricalCastTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... },
... schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.0 │
│ b ┆ 2.0 │
│ c ┆ 3.0 │
│ d ┆ 4.0 │
│ e ┆ 5.0 │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ cat │
╞══════╪══════╪═════╡
│ a ┆ 1.0 ┆ a │
│ b ┆ 2.0 ┆ b │
│ c ┆ 3.0 ┆ c │
│ d ┆ 4.0 ┆ d │
│ e ┆ 5.0 ┆ e │
└──────┴──────┴─────┘
grizz.transformer.CategoricalCastTransformer
Bases: BaseIn1Out1Transformer
Implement a transformer to convert a column to the categorical data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name to cast. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import CategoricalCast
>>> transformer = CategoricalCast(in_col="col1", out_col="out")
>>> transformer
CategoricalCastTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... },
... schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.0 │
│ b ┆ 2.0 │
│ c ┆ 3.0 │
│ d ┆ 4.0 │
│ e ┆ 5.0 │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ cat │
╞══════╪══════╪═════╡
│ a ┆ 1.0 ┆ a │
│ b ┆ 2.0 ┆ b │
│ c ┆ 3.0 ┆ c │
│ d ┆ 4.0 ┆ d │
│ e ┆ 5.0 ┆ e │
└──────┴──────┴─────┘
grizz.transformer.ColumnClose
Bases: BaseIn2Out1Transformer
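Implement a transformer to compute a column that indicates whether the values of two columns are element-wise equal within a tolerance.
The output column contains True where the two columns are element-wise equal within the tolerance. Internally, this transformer computes:
out = (|actual - expected| <= atol + rtol * |expected|)
Because the tolerance is scaled by |expected|, the test is not symmetric: swapping actual and expected can change the result near the threshold.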
Parameters:
Name | Type | Description | Default |
---|---|---|---|
actual | str | The actual input column name. This column must be a numeric column. | required |
expected | str | The expected input column name. This column must be a numeric column. | required |
out_col | str | The output column name. | required |
atol | float | The absolute tolerance parameter. | 1e-08 |
rtol | float | The relative tolerance parameter. | 1e-05 |
equal_nan | bool | Whether to compare NaNs as equal. If | False |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnClose
>>> transformer = ColumnClose(actual="col1", expected="col2", out_col="out")
>>> transformer
ColumnCloseTransformer(actual='col1', expected='col2', out_col='out', atol=1e-08, rtol=1e-05, equal_nan=False, exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnCloseTransformer
Bases: BaseIn2Out1Transformer
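Implement a transformer to compute a column that indicates whether the values of two columns are element-wise equal within a tolerance.
The output column contains True where the two columns are element-wise equal within the tolerance. Internally, this transformer computes:
out = (|actual - expected| <= atol + rtol * |expected|)
Because the tolerance is scaled by |expected|, the test is not symmetric: swapping actual and expected can change the result near the threshold.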
Parameters:
Name | Type | Description | Default |
---|---|---|---|
actual | str | The actual input column name. This column must be a numeric column. | required |
expected | str | The expected input column name. This column must be a numeric column. | required |
out_col | str | The output column name. | required |
atol | float | The absolute tolerance parameter. | 1e-08 |
rtol | float | The relative tolerance parameter. | 1e-05 |
equal_nan | bool | Whether to compare NaNs as equal. If | False |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnClose
>>> transformer = ColumnClose(actual="col1", expected="col2", out_col="out")
>>> transformer
ColumnCloseTransformer(actual='col1', expected='col2', out_col='out', atol=1e-08, rtol=1e-05, equal_nan=False, exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnEqual
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise equality comparison between two columns (in1 == in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnEqual
>>> transformer = ColumnEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnEqualMissing
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise equality comparison between two columns (in1 == in2), where null values are not propagated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnEqualMissing
>>> transformer = ColumnEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnEqualMissingTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise equality comparison between two columns (in1 == in2), where null values are not propagated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnEqualMissing
>>> transformer = ColumnEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnEqualTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise equality comparison between two columns (in1 == in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnEqual
>>> transformer = ColumnEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnGreater
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise greater than comparison between two columns (in1 > in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnGreater
>>> transformer = ColumnGreater(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnGreaterEqual
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise greater than or equal comparison between two columns (in1 >= in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnGreaterEqual
>>> transformer = ColumnGreaterEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnGreaterEqualTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise greater than or equal comparison between two columns (in1 >= in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnGreaterEqual
>>> transformer = ColumnGreaterEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnGreaterTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise greater than comparison between two columns (in1 > in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnGreater
>>> transformer = ColumnGreater(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnGreaterTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ false │
│ 2 ┆ 4 ┆ b ┆ false │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnLower
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise less than comparison between two columns (in1 < in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnLower
>>> transformer = ColumnLower(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnLowerEqual
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise less than or equal comparison between two columns (in1 <= in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnLowerEqual
>>> transformer = ColumnLowerEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnLowerEqualTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise less than or equal comparison between two columns (in1 <= in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnLowerEqual
>>> transformer = ColumnLowerEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ true │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnLowerTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise less than comparison between two columns (in1 < in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnLower
>>> transformer = ColumnLower(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnLowerTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ false │
│ 5 ┆ 1 ┆ e ┆ false │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnNotEqual
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise not equal comparison between two columns (in1 != in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqual
>>> transformer = ColumnNotEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnNotEqualMissing
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise not equal comparison between two columns (in1 != in2), where null values are not propagated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqualMissing
>>> transformer = ColumnNotEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnNotEqualMissingTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the element-wise not equal comparison between two columns (in1 != in2), where null values are not propagated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqualMissing
>>> transformer = ColumnNotEqualMissing(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualMissingTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
grizz.transformer.ColumnNotEqualTransformer
Bases: BaseColumnComparatorTransformer
Implement a transformer that computes the not-equal operation between two columns (in1 != in2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnNotEqual
>>> transformer = ColumnNotEqual(in1_col="col1", in2_col="col2", out_col="out")
>>> transformer
ColumnNotEqualTransformer(in1_col='col1', in2_col='col2', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ bool │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 5 ┆ a ┆ true │
│ 2 ┆ 4 ┆ b ┆ true │
│ 3 ┆ 3 ┆ c ┆ false │
│ 4 ┆ 2 ┆ d ┆ true │
│ 5 ┆ 1 ┆ e ┆ true │
└──────┴──────┴──────┴───────┘
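A plain not-equal comparison usually propagates nulls instead of treating them as values. A minimal plain-Python sketch of that behavior, assuming the transformer follows polars' Expr.ne semantics, where any null operand yields a null result; ne is a hypothetical helper and lists with None stand in for polars columns.

```python
from typing import List, Optional


def ne(a: Optional[object], b: Optional[object]) -> Optional[bool]:
    """Null-propagating "not equal": a None operand gives a None result."""
    if a is None or b is None:
        return None
    return a != b


col1: List[Optional[int]] = [1, 2, None, None]
col2: List[Optional[int]] = [5, 2, 3, None]
out = [ne(a, b) for a, b in zip(col1, col2)]
print(out)  # [True, False, None, None]
```

On null-free data such as the example above, this gives the same result as the null-aware variant (ColumnNotEqualMissing); the two differ only on rows where at least one side is null.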
grizz.transformer.ColumnSelection
Bases: BaseInNTransformer
Implement a polars.DataFrame transformer to select a subset of columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to keep. | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, 2, 3, 4, 5],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ 2 │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ 4 │
│ null ┆ 5 │
└────────────┴──────┘
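The reference above does not spell out how columns, exclude_columns, and missing_policy interact. The sketch below is one plausible model in plain Python, with a dict of lists standing in for a polars.DataFrame. select_columns is hypothetical, and three of its behaviors are assumptions that the source does not confirm: None selects every column, excluded names are removed afterwards, and any policy other than 'raise' silently skips missing columns (grizz may, for example, warn instead).

```python
from typing import Dict, List, Optional, Sequence

Frame = Dict[str, List[object]]


def select_columns(
    frame: Frame,
    columns: Optional[Sequence[str]] = None,
    exclude_columns: Sequence[str] = (),
    missing_policy: str = "raise",
) -> Frame:
    # Assumption: None means "start from every column of the frame".
    wanted = list(frame) if columns is None else list(columns)
    missing = [col for col in wanted if col not in frame]
    if missing and missing_policy == "raise":
        raise ValueError(f"missing columns: {missing}")
    # Keep existing, non-excluded columns in the requested order.
    kept = [col for col in wanted if col in frame and col not in exclude_columns]
    return {col: frame[col] for col in kept}


frame: Frame = {"col1": ["2020-1-1", None], "col2": [1, 5], "col3": ["a", "e"]}
print(list(select_columns(frame, columns=["col1", "col2"])))  # ['col1', 'col2']
print(list(select_columns(frame, exclude_columns=["col3"])))  # ['col1', 'col2']
```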
grizz.transformer.ColumnSelectionTransformer
Bases: BaseInNTransformer
Implement a polars.DataFrame transformer to select a subset of columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to keep. | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, 2, 3, 4, 5],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ 2 │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ 4 │
│ null ┆ 5 │
└────────────┴──────┘
grizz.transformer.ConcatColumns
Bases: BaseInNOut1Transformer
Implement a transformer to concatenate columns into a new column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to concatenate. The columns should have the same type or compatible types. If | required |
out_col | str | The output column. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="out")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='out', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ list[i64] │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ [11, 21, 31] │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ [12, 22, 32] │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ [13, 23, 33] │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ [14, 24, 34] │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
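Each output cell is the list of the selected columns' values in that row, so the selected columns are read in lockstep. Below is a minimal plain-Python sketch of that row-wise packing, with a dict of lists standing in for the frame. concat_columns is hypothetical, and the check on out_col is an assumption modeled on exist_policy='raise'. In polars, the same result would typically come from pl.concat_list.

```python
from typing import Dict, List, Sequence

Frame = Dict[str, List[object]]


def concat_columns(frame: Frame, columns: Sequence[str], out_col: str) -> Frame:
    if out_col in frame:
        # Assumed to mirror exist_policy='raise'.
        raise ValueError(f"column {out_col!r} already exists")
    # zip(*...) walks the selected columns together, yielding one row at a time.
    packed = [list(row) for row in zip(*(frame[col] for col in columns))]
    return {**frame, out_col: packed}


frame: Frame = {
    "col1": [11, 12],
    "col2": [21, 22],
    "col3": [31, 32],
    "col4": ["a", "b"],
}
out = concat_columns(frame, ["col1", "col2", "col3"], out_col="out")
print(out["out"])  # [[11, 21, 31], [12, 22, 32]]
```

The inner list type follows from the inputs, which is why the columns should share a compatible type (list[i64] in the example above).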
grizz.transformer.ConcatColumnsTransformer
Bases: BaseInNOut1Transformer
Implement a transformer to concatenate columns into a new column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to concatenate. The columns should have the same type or compatible types. If | required |
out_col | str | The output column. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_col="out")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_col='out', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ list[i64] │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ [11, 21, 31] │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ [12, 22, 32] │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ [13, 23, 33] │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ [14, 24, 34] │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
grizz.transformer.CopyColumn
Bases: BaseIn1Out1Transformer
Implement a polars.DataFrame transformer to copy a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name, i.e. the column to copy. | required |
out_col | str | The output column name, i.e. the copied column. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import CopyColumn
>>> transformer = CopyColumn(in_col="col1", out_col="out")
>>> transformer
CopyColumnTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 │
└──────┴──────┴──────┴──────┴─────┘
grizz.transformer.CopyColumnTransformer
Bases: BaseIn1Out1Transformer
Implement a polars.DataFrame transformer to copy a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name, i.e. the column to copy. | required |
out_col | str | The output column name, i.e. the copied column. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import CopyColumn
>>> transformer = CopyColumn(in_col="col1", out_col="out")
>>> transformer
CopyColumnTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 │
└──────┴──────┴──────┴──────┴─────┘
grizz.transformer.CopyColumns
Bases: BaseInNOutNTransformer
Implement a transformer to copy some columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to copy. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import CopyColumns
>>> transformer = CopyColumns(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
CopyColumnsTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
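In the example, col1 and col3 become col1_out and col3_out, so each output name appears to be built from prefix, the input name, and suffix. Below is a plain-Python sketch of that naming rule, assuming the name is simply prefix + name + suffix. copy_columns is hypothetical and only models exist_policy='raise'. A consequence worth noting under this model: with prefix="" and suffix="", every output name equals its input name, so the copy collides with an existing column.

```python
from typing import Dict, List, Sequence

Frame = Dict[str, List[object]]


def copy_columns(
    frame: Frame, columns: Sequence[str], prefix: str = "", suffix: str = ""
) -> Frame:
    out = dict(frame)  # new dict, so the input frame is not modified
    for col in columns:
        new_name = f"{prefix}{col}{suffix}"
        if new_name in out:
            # Assumed to mirror exist_policy='raise'.
            raise ValueError(f"column {new_name!r} already exists")
        out[new_name] = list(frame[col])  # copy the values, not the list object
    return out


frame: Frame = {"col1": [1, 2], "col2": ["1", "2"], "col3": ["1", "2"]}
out = copy_columns(frame, ["col1", "col3"], prefix="", suffix="_out")
print(list(out))  # ['col1', 'col2', 'col3', 'col1_out', 'col3_out']
```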
grizz.transformer.CopyColumnsTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to copy some columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to copy. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import CopyColumns
>>> transformer = CopyColumns(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
CopyColumnsTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3 ┆ 3 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4 ┆ 4 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5 ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.DecimalCast
Bases: CastTransformer
Implement a transformer to convert decimal columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(
... columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Decimal,
... "col3": pl.Decimal,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,0] ┆ decimal[*,0] ┆ str │
╞══════╪══════════════╪══════════════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────────────┬──────────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,0] ┆ decimal[*,0] ┆ str ┆ f32 │
╞══════╪══════════════╪══════════════╪══════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1.0 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2.0 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3.0 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4.0 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5.0 │
└──────┴──────────────┴──────────────┴──────┴──────────┘
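In the example, columns lists col1 and col2, yet only col2_out is created. col1 is an Int64 column and is skipped, and col3 is a decimal column that was not selected. The sketch below models that filtering in plain Python, assuming the transformer only converts selected columns whose dtype is decimal. The schema dict of dtype labels and decimal_cast are hypothetical stand-ins for polars dtypes. Python floats are 64-bit, so this models a cast to a 64-bit float rather than the pl.Float32 used above.

```python
from decimal import Decimal
from typing import Dict, List, Sequence

Frame = Dict[str, List[object]]


def decimal_cast(
    frame: Frame,
    schema: Dict[str, str],
    columns: Sequence[str],
    prefix: str = "",
    suffix: str = "",
) -> Frame:
    out = dict(frame)
    for col in columns:
        if schema[col] != "decimal":
            continue  # non-decimal columns are left alone
        # Convert each Decimal to a float, keeping nulls as None.
        out[f"{prefix}{col}{suffix}"] = [None if v is None else float(v) for v in frame[col]]
    return out


frame: Frame = {
    "col1": [1, 2],
    "col2": [Decimal("1"), Decimal("2")],
    "col3": [Decimal("1"), Decimal("2")],
}
schema = {"col1": "int64", "col2": "decimal", "col3": "decimal"}
out = decimal_cast(frame, schema, ["col1", "col2"], suffix="_out")
print(list(out))  # ['col1', 'col2', 'col3', 'col2_out']
print(out["col2_out"])  # [1.0, 2.0]
```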
grizz.transformer.DecimalCastTransformer
Bases: CastTransformer
Implement a transformer to convert decimal columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(
... columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Decimal,
... "col3": pl.Decimal,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,0] ┆ decimal[*,0] ┆ str │
╞══════╪══════════════╪══════════════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────────────┬──────────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,0] ┆ decimal[*,0] ┆ str ┆ f32 │
╞══════╪══════════════╪══════════════╪══════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ a ┆ 1.0 │
│ 2 ┆ 2 ┆ 2 ┆ b ┆ 2.0 │
│ 3 ┆ 3 ┆ 3 ┆ c ┆ 3.0 │
│ 4 ┆ 4 ┆ 4 ┆ d ┆ 4.0 │
│ 5 ┆ 5 ┆ 5 ┆ e ┆ 5.0 │
└──────┴──────────────┴──────────────┴──────┴──────────┘
grizz.transformer.Diff
Bases: BaseIn1Out1Transformer
Implement a transformer to compute the first discrete difference between shifted items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
shift | int | The number of slots to shift. | 1 |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col='col1', out_col='diff', exist_policy='raise', missing_policy='raise', shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 1 ┆ a │
│ 2 ┆ b │
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ null │
│ 2 ┆ b ┆ 1 │
│ 3 ┆ c ┆ 1 │
│ 4 ┆ d ┆ 1 │
│ 5 ┆ e ┆ 1 │
└──────┴──────┴──────┘
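The first discrete difference subtracts from each item the item shift positions earlier, so the first shift rows have no partner and become null. Below is a plain-Python sketch of that rule, assuming the transformer behaves like polars' Expr.diff (that is, x - x.shift(n)). diff is a hypothetical helper. Under that model, a negative shift subtracts a later item instead, and the nulls move to the end.

```python
from typing import List, Optional


def diff(values: List[Optional[int]], shift: int = 1) -> List[Optional[int]]:
    out: List[Optional[int]] = []
    for i, value in enumerate(values):
        j = i - shift  # position of the item to subtract
        if 0 <= j < len(values) and value is not None and values[j] is not None:
            out.append(value - values[j])
        else:
            out.append(None)  # no partner item, or a null operand
    return out


print(diff([1, 2, 3, 4, 5]))  # [None, 1, 1, 1, 1]
print(diff([1, 2, 4, 7, 11], shift=2))  # [None, None, 3, 5, 7]
print(diff([1, 2, 4, 7, 11], shift=-1))  # [-1, -2, -3, -4, None]
```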
grizz.transformer.DiffHorizontal
Bases: BaseIn2Out1Transformer
Implement a transformer to compute the difference between two columns.
Internally, this transformer computes: out = in1 - in2
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DiffHorizontal
>>> transformer = DiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
DiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a ┆ -4 │
│ 2 ┆ 4 ┆ b ┆ -2 │
│ 3 ┆ 3 ┆ c ┆ 0 │
│ 4 ┆ 2 ┆ d ┆ 2 │
│ 5 ┆ 1 ┆ e ┆ 4 │
└──────┴──────┴──────┴──────┘
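Because out = in1 - in2 is a signed difference, column order matters. Swapping in1_col and in2_col flips every sign, while the absolute difference computed by AbsDiffHorizontal is unaffected. Below is a short plain-Python sketch of both, with lists standing in for columns and None for nulls. The helper names are hypothetical, and null propagation is assumed to follow the usual polars arithmetic rule that a null operand gives a null result.

```python
from typing import List, Optional

Column = List[Optional[int]]


def diff_horizontal(in1: Column, in2: Column) -> Column:
    # Row-wise in1 - in2; a null in either column gives a null result.
    return [None if a is None or b is None else a - b for a, b in zip(in1, in2)]


def abs_diff_horizontal(in1: Column, in2: Column) -> Column:
    # Row-wise abs(in1 - in2), with the same null handling.
    return [None if d is None else abs(d) for d in diff_horizontal(in1, in2)]


col1: Column = [1, 2, 3, None]
col2: Column = [5, 4, 3, 2]
print(diff_horizontal(col1, col2))  # [-4, -2, 0, None]
print(diff_horizontal(col2, col1))  # [4, 2, 0, None]
print(abs_diff_horizontal(col1, col2))  # [4, 2, 0, None]
```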
grizz.transformer.DiffHorizontalTransformer
Bases: BaseIn2Out1Transformer
Implement a transformer to compute the difference between two columns.
Internally, this transformer computes: out = in1 - in2
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in1_col | str | The first input column name. | required |
in2_col | str | The second input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DiffHorizontal
>>> transformer = DiffHorizontal(in1_col="col1", in2_col="col2", out_col="diff")
>>> transformer
DiffHorizontalTransformer(in1_col='col1', in2_col='col2', out_col='diff', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [5, 4, 3, 2, 1],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a │
│ 2 ┆ 4 ┆ b │
│ 3 ┆ 3 ┆ c │
│ 4 ┆ 2 ┆ d │
│ 5 ┆ 1 ┆ e │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ diff │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 5 ┆ a ┆ -4 │
│ 2 ┆ 4 ┆ b ┆ -2 │
│ 3 ┆ 3 ┆ c ┆ 0 │
│ 4 ┆ 2 ┆ d ┆ 2 │
│ 5 ┆ 1 ┆ e ┆ 4 │
└──────┴──────┴──────┴──────┘
grizz.transformer.DiffTransformer
Bases: BaseIn1Out1Transformer
Implement a transformer to compute the first discrete difference between shifted items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
shift | int | The number of slots to shift. | 1 |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col='col1', out_col='diff', exist_policy='raise', missing_policy='raise', shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 1 ┆ a │
│ 2 ┆ b │
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ null │
│ 2 ┆ b ┆ 1 │
│ 3 ┆ c ┆ 1 │
│ 4 ┆ d ┆ 1 │
│ 5 ┆ e ┆ 1 │
└──────┴──────┴──────┘
grizz.transformer.DropDuplicate
Bases: BaseInNTransformer
Implement a transformer to drop duplicate rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, exclude_columns=(), missing_policy='raise', keep='first', maintain_order=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 1],
... "col2": ["1", "2", "3", "4", "1"],
... "col3": ["1", "2", "3", "1", "1"],
... "col4": ["a", "a", "a", "a", "a"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
│ 1 ┆ 1 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
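The repr shows keep='first' and maintain_order=True, which are parameters of polars.DataFrame.unique, so the extra keyword arguments appear to be forwarded there. The sketch below models keep='first' and keep='last' with the original row order preserved, in plain Python with each row as a dict. drop_duplicates is hypothetical. It treats columns=None as "compare whole rows" and does not model the other keep values polars accepts ('any' and 'none').

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def drop_duplicates(
    rows: List[Row], columns: Optional[Sequence[str]] = None, keep: str = "first"
) -> List[Row]:
    if keep not in ("first", "last"):
        raise ValueError(f"unsupported keep value: {keep!r}")
    # For keep="last", scan from the end so the last occurrence is seen first.
    ordered = rows if keep == "first" else rows[::-1]
    seen = set()
    kept: List[Row] = []
    for row in ordered:
        names = columns if columns is not None else list(row)
        key = tuple(row[name] for name in names)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Restore the original row order.
    return kept if keep == "first" else kept[::-1]


data = {
    "col1": [1, 2, 3, 4, 1],
    "col2": ["1", "2", "3", "4", "1"],
    "col3": ["1", "2", "3", "1", "1"],
    "col4": ["a", "a", "a", "a", "a"],
}
rows = [dict(zip(data, values)) for values in zip(*data.values())]
print([row["col1"] for row in drop_duplicates(rows)])  # [1, 2, 3, 4]
print([row["col1"] for row in drop_duplicates(rows, keep="last")])  # [2, 3, 4, 1]
print([row["col1"] for row in drop_duplicates(rows, columns=["col3"])])  # [1, 2, 3]
```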
grizz.transformer.DropDuplicateTransformer
Bases: BaseInNTransformer
Implement a transformer to drop duplicate rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, exclude_columns=(), missing_policy='raise', keep='first', maintain_order=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 1],
... "col2": ["1", "2", "3", "4", "1"],
... "col3": ["1", "2", "3", "1", "1"],
... "col4": ["a", "a", "a", "a", "a"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
│ 1 ┆ 1 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
grizz.transformer.DropNanColumn
Bases: BaseInNTransformer
Implement a transformer to remove the columns that have too many NaN values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. | None |
threshold | float | The threshold on the proportion of NaN values. If the proportion of NaN values in a column is greater than or equal to this threshold, the column is removed. If set to | 1.0 |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNanColumn
>>> transformer = DropNanColumn()
>>> transformer
DropNanColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
... {
... "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
... "col2": [1.0, float("nan"), 3.0, float("nan"), 5.0],
... "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ NaN │
│ 2.0 ┆ NaN ┆ NaN │
│ 3.0 ┆ 3.0 ┆ NaN │
│ 4.0 ┆ NaN ┆ NaN │
│ NaN ┆ 5.0 ┆ NaN │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞══════╪══════╡
│ 1.0 ┆ 1.0 │
│ 2.0 ┆ NaN │
│ 3.0 ┆ 3.0 │
│ 4.0 ┆ NaN │
│ NaN ┆ 5.0 │
└──────┴──────┘
grizz.transformer.DropNanColumnTransformer ¶
Bases: BaseInNTransformer
Implement a transformer to remove the columns that have too many NaN values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check for NaN values. | None |
threshold | float | The threshold on the proportion of NaN values used to decide which columns to keep. If the proportion of NaN values in a column is greater than or equal to this threshold, the column is removed. If set to | 1.0 |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNanColumn
>>> transformer = DropNanColumn()
>>> transformer
DropNanColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
... {
... "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
... "col2": [1.0, float("nan"), 3.0, float("nan"), 5.0],
... "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ NaN │
│ 2.0 ┆ NaN ┆ NaN │
│ 3.0 ┆ 3.0 ┆ NaN │
│ 4.0 ┆ NaN ┆ NaN │
│ NaN ┆ 5.0 ┆ NaN │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞══════╪══════╡
│ 1.0 ┆ 1.0 │
│ 2.0 ┆ NaN │
│ 3.0 ┆ 3.0 │
│ 4.0 ┆ NaN │
│ NaN ┆ 5.0 │
└──────┴──────┘
grizz.transformer.DropNanRow ¶
Bases: BaseInNTransformer
Implement a transformer to drop rows that contain NaN values.
Note that a row is dropped only if all of its values are NaN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNanRow
>>> transformer = DropNanRow()
>>> transformer
DropNanRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
... "col2": [1.0, float("nan"), 3.0, float("nan"), float("nan")],
... "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ NaN │
│ 2.0 ┆ NaN ┆ NaN │
│ 3.0 ┆ 3.0 ┆ NaN │
│ 4.0 ┆ NaN ┆ NaN │
│ NaN ┆ NaN ┆ NaN │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ NaN │
│ 2.0 ┆ NaN ┆ NaN │
│ 3.0 ┆ 3.0 ┆ NaN │
│ 4.0 ┆ NaN ┆ NaN │
└──────┴──────┴──────┘
grizz.transformer.DropNanRowTransformer ¶
Bases: BaseInNTransformer
Implement a transformer to drop rows that contain NaN values.
Note that a row is dropped only if all of its values are NaN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNanRow
>>> transformer = DropNanRow()
>>> transformer
DropNanRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1.0, 2.0, 3.0, 4.0, float("nan")],
... "col2": [1.0, float("nan"), 3.0, float("nan"), float("nan")],
... "col3": [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
... }
... )
>>> frame
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ NaN │
│ 2.0 ┆ NaN ┆ NaN │
│ 3.0 ┆ 3.0 ┆ NaN │
│ 4.0 ┆ NaN ┆ NaN │
│ NaN ┆ NaN ┆ NaN │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ NaN │
│ 2.0 ┆ NaN ┆ NaN │
│ 3.0 ┆ 3.0 ┆ NaN │
│ 4.0 ┆ NaN ┆ NaN │
└──────┴──────┴──────┘
grizz.transformer.DropNullColumn ¶
Bases: BaseInNTransformer
Implement a transformer to remove the columns that have too many null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check for null values. | None |
threshold | float | The threshold on the proportion of null values used to decide which columns to keep. If the proportion of null values in a column is greater than or equal to this threshold, the column is removed. If set to | 1.0 |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, 5],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ 5 ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ null │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ null │
│ null ┆ 5 │
└────────────┴──────┘
grizz.transformer.DropNullColumnTransformer ¶
Bases: BaseInNTransformer
Implement a transformer to remove the columns that have too many null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check for null values. | None |
threshold | float | The threshold on the proportion of null values used to decide which columns to keep. If the proportion of null values in a column is greater than or equal to this threshold, the column is removed. If set to | 1.0 |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, exclude_columns=(), missing_policy='raise', threshold=1.0)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, 5],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ 5 ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ null │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ null │
│ null ┆ 5 │
└────────────┴──────┘
grizz.transformer.DropNullRow ¶
Bases: BaseInNTransformer
Implement a transformer to drop rows that contain null values.
Note that a row is dropped only if all of its values are null.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, None],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
grizz.transformer.DropNullRowTransformer ¶
Bases: BaseInNTransformer
Implement a transformer to drop all rows that contain null values.
Note that all the values in the row need to be null to drop the row.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, None],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
grizz.transformer.Equal ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the equal operation, i.e., whether each value in the selected columns is equal to target.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Equal
>>> transformer = Equal(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.EqualMissing ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the equal operation where null values are not propagated, i.e., comparing a null value does not produce null.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import EqualMissing
>>> transformer = EqualMissing(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.EqualMissingTransformer ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the equal operation where null values are not propagated, i.e., comparing a null value does not produce null.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import EqualMissing
>>> transformer = EqualMissing(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.EqualTransformer ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the equal operation, i.e., whether each value in the selected columns is equal to target.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Equal
>>> transformer = Equal(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
EqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.FillNan ¶
Bases: BaseInNOutNTransformer
Implement a transformer to fill NaN values.
This transformer ignores the columns that are not of type float.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to fill. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col4_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN ┆ 100.0 │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 ┆ 5.2 │
└──────┴──────┴──────┴──────┴──────────┘
grizz.transformer.FillNanTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to fill NaN values.
This transformer ignores the columns that are not of type float.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to fill. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col4_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN ┆ 100.0 │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 ┆ 5.2 │
└──────┴──────┴──────┴──────┴──────────┘
grizz.transformer.FillNull ¶
Bases: BaseInNOutNTransformer
Implement a transformer to fill null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to fill. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, None],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ null ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col4_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 ┆ i64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 ┆ 1 ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN ┆ 2 ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 ┆ 3 ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null ┆ 4 ┆ 100.0 │
│ null ┆ null ┆ null ┆ 5.2 ┆ 100 ┆ 5.2 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.FillNullTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to fill null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to fill. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], prefix="", suffix="_out", value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, None],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ null ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col4_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 ┆ i64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 ┆ 1 ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN ┆ 2 ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 ┆ 3 ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null ┆ 4 ┆ 100.0 │
│ null ┆ null ┆ null ┆ 5.2 ┆ 100 ┆ 5.2 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.FilterCardinality ¶
Bases: BaseInNTransformer
Implement a transformer to filter columns based on their cardinality (i.e., the number of unique values). A selected column is kept only if n_min <= cardinality < n_max; otherwise it is removed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to use to filter based on the number of unique values. If | None |
n_min | int | The minimal cardinality (included). | 0 |
n_max | int | The maximal cardinality (excluded). | float('inf') |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FilterCardinality
>>> transformer = FilterCardinality(columns=["col1", "col2", "col3"], n_min=2, n_max=5)
>>> transformer
FilterCardinalityTransformer(columns=('col1', 'col2', 'col3'), exclude_columns=(), missing_policy='raise', n_min=2, n_max=5)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1, 1, 1, 1, 1],
... "col3": ["a", "b", "c", "a", "b"],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ a ┆ 1.2 │
│ 2 ┆ 1 ┆ b ┆ NaN │
│ 3 ┆ 1 ┆ c ┆ 3.2 │
│ 4 ┆ 1 ┆ a ┆ null │
│ 5 ┆ 1 ┆ b ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col3 ┆ col4 │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.2 │
│ b ┆ NaN │
│ c ┆ 3.2 │
│ a ┆ null │
│ b ┆ 5.2 │
└──────┴──────┘
grizz.transformer.FilterCardinalityTransformer ¶
Bases: BaseInNTransformer
Implement a transformer to filter columns based on their cardinality (i.e., the number of unique values). A selected column is kept only if n_min <= cardinality < n_max; otherwise it is removed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to use to filter based on the number of unique values. If | None |
n_min | int | The minimal cardinality (included). | 0 |
n_max | int | The maximal cardinality (excluded). | float('inf') |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FilterCardinality
>>> transformer = FilterCardinality(columns=["col1", "col2", "col3"], n_min=2, n_max=5)
>>> transformer
FilterCardinalityTransformer(columns=('col1', 'col2', 'col3'), exclude_columns=(), missing_policy='raise', n_min=2, n_max=5)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1, 1, 1, 1, 1],
... "col3": ["a", "b", "c", "a", "b"],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ a ┆ 1.2 │
│ 2 ┆ 1 ┆ b ┆ NaN │
│ 3 ┆ 1 ┆ c ┆ 3.2 │
│ 4 ┆ 1 ┆ a ┆ null │
│ 5 ┆ 1 ┆ b ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col3 ┆ col4 │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.2 │
│ b ┆ NaN │
│ c ┆ 3.2 │
│ a ┆ null │
│ b ┆ 5.2 │
└──────┴──────┘
grizz.transformer.FirstRow ¶
Bases: BaseArgTransformer
Implement a transformer that selects the first n rows.
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FirstRow
>>> transformer = FirstRow(n=3)
>>> transformer
FirstRowTransformer(n=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
└──────┴──────┴──────┴──────┘
grizz.transformer.FirstRowTransformer ¶
Bases: BaseArgTransformer
Implement a transformer that selects the first n rows.
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FirstRow
>>> transformer = FirstRow(n=3)
>>> transformer
FirstRowTransformer(n=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
└──────┴──────┴──────┴──────┘
grizz.transformer.FloatCast ¶
Bases: CastTransformer
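Implement a transformer to convert float columns to a new data type. Columns that are not of a float type are ignored (e.g. col1 in the example below). Each converted column is stored in a new column named prefix + column name + suffix.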
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(
... columns=["col1", "col2"], dtype=pl.Int32, prefix="", suffix="_out"
... )
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str ┆ i32 │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a ┆ 1 │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b ┆ 2 │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c ┆ 3 │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d ┆ 4 │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┘
grizz.transformer.FloatCastTransformer ¶
Bases: CastTransformer
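Implement a transformer to convert float columns to a new data type. Columns that are not of a float type are ignored (e.g. col1 in the example below). Each converted column is stored in a new column named prefix + column name + suffix.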
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(
... columns=["col1", "col2"], dtype=pl.Int32, prefix="", suffix="_out"
... )
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str ┆ i32 │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a ┆ 1 │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b ┆ 2 │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c ┆ 3 │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d ┆ 4 │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e ┆ 5 │
└──────┴──────┴──────┴──────┴──────────┘
grizz.transformer.Function ¶
Bases: BaseArgTransformer
Implement a transformer that wraps a function that takes a DataFrame and returns the transformed DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[DataFrame], DataFrame] | The function to transform the DataFrame. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
... func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 4 ┆ 4 ┆ 4 ┆ d │
└──────┴──────┴──────┴──────┘
grizz.transformer.FunctionTransformer ¶
Bases: BaseArgTransformer
Implement a transformer that wraps a function that takes a DataFrame and returns the transformed DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[DataFrame], DataFrame] | The function to transform the DataFrame. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
... func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 4 ┆ 4 ┆ 4 ┆ d │
└──────┴──────┴──────┴──────┘
grizz.transformer.Greater ¶
Bases: BaseComparatorTransformer
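Implement a transformer that computes the greater-than comparison (column > target) for each selected column. Each boolean result is stored in a new column named prefix + column name + suffix.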
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Greater
>>> transformer = Greater(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
GreaterTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ true │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ true │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ false ┆ true │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ true │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ true ┆ true │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.GreaterEqual ¶
Bases: BaseComparatorTransformer
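Implement a transformer that computes the greater-than-or-equal comparison (column >= target) for each selected column. Each boolean result is stored in a new column named prefix + column name + suffix.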
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import GreaterEqual
>>> transformer = GreaterEqual(
... columns=["col1", "col3"], target=4.2, prefix="", suffix="_out"
... )
>>> transformer
GreaterEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ true │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ true │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ false ┆ true │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ true │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ true ┆ true │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.GreaterEqualTransformer ¶
Bases: BaseComparatorTransformer
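Implement a transformer that computes the greater-than-or-equal comparison (column >= target) for each selected column. Each boolean result is stored in a new column named prefix + column name + suffix.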
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import GreaterEqual
>>> transformer = GreaterEqual(
... columns=["col1", "col3"], target=4.2, prefix="", suffix="_out"
... )
>>> transformer
GreaterEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ true │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ true │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ false ┆ true │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ true │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ true ┆ true │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.GreaterTransformer ¶
Bases: BaseComparatorTransformer
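Implement a transformer that computes the greater-than comparison (column > target) for each selected column. Each boolean result is stored in a new column named prefix + column name + suffix.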
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Greater
>>> transformer = Greater(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
GreaterTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ false ┆ true │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ false ┆ true │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ false ┆ true │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ false ┆ true │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ true ┆ true │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.InplaceCast ¶
Bases: CastTransformer
Implement a transformer to convert some columns to a new data type.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceCastTransformer ¶
Bases: CastTransformer
Implement a transformer to convert some columns to a new data type.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCast
>>> transformer = InplaceCast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceCategoricalCast ¶
Bases: CategoricalCastTransformer
Implement a transformer to convert a column to the categorical data type. InplaceCategoricalCastTransformer is a specific implementation of CategoricalCastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name to cast. | required |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCategoricalCast
>>> transformer = InplaceCategoricalCast(col="col1")
>>> transformer
InplaceCategoricalCastTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... },
... schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.0 │
│ b ┆ 2.0 │
│ c ┆ 3.0 │
│ d ┆ 4.0 │
│ e ┆ 5.0 │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ cat ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.0 │
│ b ┆ 2.0 │
│ c ┆ 3.0 │
│ d ┆ 4.0 │
│ e ┆ 5.0 │
└──────┴──────┘
grizz.transformer.InplaceCategoricalCastTransformer ¶
Bases: CategoricalCastTransformer
Implement a transformer to convert a column to the categorical data type. InplaceCategoricalCastTransformer is a specific implementation of CategoricalCastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name to cast. | required |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceCategoricalCast
>>> transformer = InplaceCategoricalCast(col="col1")
>>> transformer
InplaceCategoricalCastTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... },
... schema={"col1": pl.String, "col2": pl.Float64},
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.0 │
│ b ┆ 2.0 │
│ c ┆ 3.0 │
│ d ┆ 4.0 │
│ e ┆ 5.0 │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ cat ┆ f64 │
╞══════╪══════╡
│ a ┆ 1.0 │
│ b ┆ 2.0 │
│ c ┆ 3.0 │
│ d ┆ 4.0 │
│ e ┆ 5.0 │
└──────┴──────┘
grizz.transformer.InplaceDecimalCast ¶
Bases: InplaceCastTransformer
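Implement a transformer to convert decimal columns to a new data type. Columns that are not of a decimal type are ignored (e.g. col1 in the example below). InplaceDecimalCastTransformer is a specific implementation of InplaceCastTransformer, so the transformation is performed in-place.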
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceDecimalCast
>>> transformer = InplaceDecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceDecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Decimal,
... "col3": pl.Decimal,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,0] ┆ decimal[*,0] ┆ str │
╞══════╪══════════════╪══════════════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ decimal[*,0] ┆ str │
╞══════╪══════╪══════════════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────────────┴──────┘
grizz.transformer.InplaceDecimalCastTransformer ¶
Bases: InplaceCastTransformer
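Implement a transformer to convert decimal columns to a new data type. Columns that are not of a decimal type are ignored (e.g. col1 in the example below). InplaceDecimalCastTransformer is a specific implementation of InplaceCastTransformer, so the transformation is performed in-place.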
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceDecimalCast
>>> transformer = InplaceDecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceDecimalCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Decimal,
... "col3": pl.Decimal,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────────────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[*,0] ┆ decimal[*,0] ┆ str │
╞══════╪══════════════╪══════════════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────────────┴──────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ decimal[*,0] ┆ str │
╞══════╪══════╪══════════════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────────────┴──────┘
grizz.transformer.InplaceFillNan ¶
Bases: FillNanTransformer
Implement a transformer to fill NaN values.
This transformer ignores the columns that are not of type float. InplaceFillNanTransformer is a specific implementation of FillNanTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns in which to fill NaN values. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceFillNan
>>> transformer = InplaceFillNan(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ 100.0 │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
grizz.transformer.InplaceFillNanTransformer ¶
Bases: FillNanTransformer
Implement a transformer to fill NaN values.
This transformer ignores the columns that are not of type float. InplaceFillNanTransformer is a specific implementation of FillNanTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns in which to fill NaN values. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceFillNan
>>> transformer = InplaceFillNan(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNanTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ 100.0 │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
grizz.transformer.InplaceFillNull ¶
Bases: FillNullTransformer
Implement a transformer to fill null values.
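Unlike InplaceFillNanTransformer, this transformer is not restricted to float columns. In the example below, the null value in the integer column col1 is also filled. InplaceFillNullTransformer is a specific implementation of FillNullTransformer that performs the transformation in-place.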
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns in which to fill null values. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceFillNull
>>> transformer = InplaceFillNull(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ 100.0 │
│ 100 ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
grizz.transformer.InplaceFillNullTransformer ¶
Bases: FillNullTransformer
Implement a transformer to fill null values.
InplaceFillNullTransformer is a specific implementation of FillNullTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to fill. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceFillNull
>>> transformer = InplaceFillNull(columns=["col1", "col4"], value=100)
>>> transformer
InplaceFillNullTransformer(columns=('col1', 'col4'), exclude_columns=(), missing_policy='raise', value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ 100.0 │
│ 100 ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
grizz.transformer.InplaceFloatCast ¶
Bases: InplaceCastTransformer
Implement a transformer to convert float columns to a new data type. Selected columns that are not of a float type are left unchanged.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceFloatCast
>>> transformer = InplaceFloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
InplaceFloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1.0 ┆ a │
│ 2 ┆ 2 ┆ 2.0 ┆ b │
│ 3 ┆ 3 ┆ 3.0 ┆ c │
│ 4 ┆ 4 ┆ 4.0 ┆ d │
│ 5 ┆ 5 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceFloatCastTransformer ¶
Bases: InplaceCastTransformer
Implement a transformer to convert float columns to a new data type. Selected columns that are not of a float type are left unchanged.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceFloatCast
>>> transformer = InplaceFloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
InplaceFloatCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1.0 ┆ a │
│ 2 ┆ 2 ┆ 2.0 ┆ b │
│ 3 ┆ 3 ┆ 3.0 ┆ c │
│ 4 ┆ 4 ┆ 4.0 ┆ d │
│ 5 ┆ 5 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceIntegerCast ¶
Bases: InplaceCastTransformer
Implement a transformer to convert integer columns to a new data type. Selected columns that are not of an integer type are left unchanged.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceIntegerCast
>>> transformer = InplaceIntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceIntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1, 2, 3, 4, 5],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Int64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ 1 ┆ a │
│ 2.0 ┆ 2.0 ┆ 2 ┆ b │
│ 3.0 ┆ 3.0 ┆ 3 ┆ c │
│ 4.0 ┆ 4.0 ┆ 4 ┆ d │
│ 5.0 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceIntegerCastTransformer ¶
Bases: InplaceCastTransformer
Implement a transformer to convert integer columns to a new data type. Selected columns that are not of an integer type are left unchanged.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceIntegerCast
>>> transformer = InplaceIntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
InplaceIntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1, 2, 3, 4, 5],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Int64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ 1 ┆ a │
│ 2.0 ┆ 2.0 ┆ 2 ┆ b │
│ 3.0 ┆ 3.0 ┆ 3 ┆ c │
│ 4.0 ┆ 4.0 ┆ 4 ┆ d │
│ 5.0 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceJsonDecode ¶
Bases: JsonDecodeTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to parse. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceJsonDecode
>>> transformer = InplaceJsonDecode(columns=["col1", "col3"])
>>> transformer
InplaceJsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ str ┆ list[str] ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ["1", "2"] ┆ a │
│ [2] ┆ 2 ┆ ["2"] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ["1", "2", "3"] ┆ c │
│ [4, 5] ┆ 4 ┆ ["4", "5"] ┆ d │
│ [5, 4] ┆ 5 ┆ ["5", "4"] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
grizz.transformer.InplaceJsonDecodeTransformer ¶
Bases: JsonDecodeTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to parse. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceJsonDecode
>>> transformer = InplaceJsonDecode(columns=["col1", "col3"])
>>> transformer
InplaceJsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ str ┆ list[str] ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ["1", "2"] ┆ a │
│ [2] ┆ 2 ┆ ["2"] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ["1", "2", "3"] ┆ c │
│ [4, 5] ┆ 4 ┆ ["4", "5"] ┆ d │
│ [5, 4] ┆ 5 ┆ ["5", "4"] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
grizz.transformer.InplaceLabelEncoder ¶
Bases: LabelEncoderTransformer
Implement a polars.DataFrame transformer to encode the labels in a given column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name to transform. | required |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceLabelEncoder
>>> transformer = InplaceLabelEncoder(col="col1")
>>> transformer
InplaceLabelEncoderTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": ["1", "2", "3", "4", "5"],
... }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 0 ┆ 1 │
│ 1 ┆ 2 │
│ 2 ┆ 3 │
│ 3 ┆ 4 │
│ 4 ┆ 5 │
└──────┴──────┘
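The encoding in this example can be sketched in pure Python, assuming (as with scikit-learn's `LabelEncoder`) that codes follow the sorted order of the distinct labels. `label_encode` is a hypothetical helper, not part of grizz.

```python
from __future__ import annotations


def label_encode(values: list[str]) -> tuple[list[int], list[str]]:
    # Hypothetical helper: the sorted distinct labels form the vocabulary,
    # and each label is encoded as its position in that vocabulary.
    classes = sorted(set(values))
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[value] for value in values], classes


codes, classes = label_encode(["a", "b", "c", "d", "e"])
```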
grizz.transformer.InplaceLabelEncoderTransformer ¶
Bases: LabelEncoderTransformer
Implement a polars.DataFrame transformer to encode the labels in a given column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name to transform. | required |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceLabelEncoder
>>> transformer = InplaceLabelEncoder(col="col1")
>>> transformer
InplaceLabelEncoderTransformer(col='col1', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": ["1", "2", "3", "4", "5"],
... }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 0 ┆ 1 │
│ 1 ┆ 2 │
│ 2 ┆ 3 │
│ 3 ┆ 4 │
│ 4 ┆ 5 │
└──────┴──────┘
grizz.transformer.InplaceNumericCast ¶
Bases: InplaceCastTransformer
Implement a transformer to convert numeric columns to a new data type.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceNumericCast
>>> transformer = InplaceNumericCast(columns=["col1", "col3"], dtype=pl.Float32)
>>> transformer
InplaceNumericCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float32,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f32 ┆ f32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2.0 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3.0 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4.0 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5.0 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplaceNumericCastTransformer ¶
Bases: InplaceCastTransformer
Implement a transformer to convert numeric columns to a new data type.
InplaceCastTransformer is a specific implementation of CastTransformer that performs the transformation in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceNumericCast
>>> transformer = InplaceNumericCast(columns=["col1", "col3"], dtype=pl.Float32)
>>> transformer
InplaceNumericCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float32,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f32 ┆ f32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2.0 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3.0 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4.0 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5.0 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.InplacePowerTransformer ¶
Bases: PowerTransformer
Implement a transformer to apply a power transform featurewise to make data more Gaussian-like.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to transform. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplacePowerTransformer
>>> transformer = InplacePowerTransformer(columns=["col1", "col3"])
>>> transformer
InplacePowerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 ┆ str │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.567837 ┆ 0 ┆ -1.695398 ┆ a │
│ -0.836194 ┆ 1 ┆ -0.740367 ┆ b │
│ -0.210053 ┆ 2 ┆ -0.117399 ┆ c │
│ 0.356111 ┆ 3 ┆ 0.402585 ┆ d │
│ 0.881486 ┆ 4 ┆ 0.864187 ┆ e │
│ 1.376486 ┆ 5 ┆ 1.286392 ┆ f │
└───────────┴──────┴───────────┴──────┘
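The numbers above come from a Yeo-Johnson power transform, assuming this transformer wraps scikit-learn's `PowerTransformer` with its defaults (`method="yeo-johnson"`, `standardize=True`). That class estimates one λ per column by maximum likelihood and then standardizes the result. The hypothetical helper below only shows the per-value formula for a fixed λ, so it does not reproduce the table on its own.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Hypothetical helper: Yeo-Johnson transform of one value for a fixed lambda."""
    if x >= 0:
        if lmbda != 0:
            return ((x + 1) ** lmbda - 1) / lmbda
        return math.log1p(x)  # limit of the formula above as lambda -> 0
    if lmbda != 2:
        return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)
    return -math.log1p(-x)  # limit of the negative branch as lambda -> 2
```

With λ = 1 the transform is the identity, which is a quick sanity check on the branches.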
grizz.transformer.InplaceQuantileTransformer ¶
Bases: QuantileTransformer
Implement a transformer to apply the quantile transformation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceQuantileTransformer
>>> transformer = InplaceQuantileTransformer(columns=["col1", "col3"])
>>> transformer
InplaceQuantileTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0.0 ┆ 0 ┆ 0.0 ┆ a │
│ 0.2 ┆ 1 ┆ 0.2 ┆ b │
│ 0.4 ┆ 2 ┆ 0.4 ┆ c │
│ 0.6 ┆ 3 ┆ 0.6 ┆ d │
│ 0.8 ┆ 4 ┆ 0.8 ┆ e │
│ 1.0 ┆ 5 ┆ 1.0 ┆ f │
└──────┴──────┴──────┴──────┘
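Assuming this transformer wraps scikit-learn's `QuantileTransformer` with uniform output and at least as many quantiles as rows, each distinct value maps to its rank divided by n - 1, which matches the table above. A pure-Python sketch of that special case (`uniform_quantiles` is a hypothetical name):

```python
from __future__ import annotations

from bisect import bisect_left


def uniform_quantiles(values: list[float]) -> list[float]:
    # Hypothetical helper: rank of each value among the sorted values, rescaled to [0, 1].
    # Assumes distinct values; tied values would all receive the rank of the first tie.
    ordered = sorted(values)
    last = len(ordered) - 1
    if last <= 0:
        return [0.0] * len(values)
    return [bisect_left(ordered, v) / last for v in values]


result = uniform_quantiles([0, 10, 20, 30, 40, 50])
```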
grizz.transformer.InplaceReplace ¶
Bases: ReplaceTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name. | required |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceReplace
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
>>> transformer
InplaceReplaceTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
└─────┘
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3}, default=None)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 3 │
│ null │
│ null │
└──────┘
grizz.transformer.InplaceReplaceStrict ¶
Bases: ReplaceStrictTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name. | required |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceReplaceStrict
>>> transformer = InplaceReplaceStrict(
... col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
InplaceReplaceStrictTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
└─────┘
>>> transformer = InplaceReplaceStrict(
... col="col", old={"a": 1, "b": 2, "c": 3}, default=None
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 3 │
│ null │
│ null │
└──────┘
grizz.transformer.InplaceReplaceStrictTransformer ¶
Bases: ReplaceStrictTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name. | required |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceReplaceStrict
>>> transformer = InplaceReplaceStrict(
... col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
InplaceReplaceStrictTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
└─────┘
>>> transformer = InplaceReplaceStrict(
... col="col", old={"a": 1, "b": 2, "c": 3}, default=None
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 3 │
│ null │
│ null │
└──────┘
grizz.transformer.InplaceReplaceTransformer ¶
Bases: ReplaceTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The column name. | required |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceReplace
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
>>> transformer
InplaceReplaceTransformer(col='col', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
└─────┘
>>> transformer = InplaceReplace(col="col", old={"a": 1, "b": 2, "c": 3}, default=None)
>>> out = transformer.transform(frame)
>>> out
shape: (5, 1)
┌──────┐
│ col │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 3 │
│ null │
│ null │
└──────┘
grizz.transformer.InplaceRobustScaler ¶
Bases: RobustScalerTransformer
Implement a transformer to scale each column using statistics that are robust to outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceRobustScaler
>>> transformer = InplaceRobustScaler(columns=["col1", "col3"])
>>> transformer
InplaceRobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ -1.0 ┆ 0 ┆ -1.0 ┆ a │
│ -0.6 ┆ 1 ┆ -0.6 ┆ b │
│ -0.2 ┆ 2 ┆ -0.2 ┆ c │
│ 0.2 ┆ 3 ┆ 0.2 ┆ d │
│ 0.6 ┆ 4 ┆ 0.6 ┆ e │
│ 1.0 ┆ 5 ┆ 1.0 ┆ f │
└──────┴──────┴──────┴──────┘
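Assuming the transformer follows scikit-learn's `RobustScaler` defaults (centre on the median, scale by the range between the 25th and 75th percentiles, with linear interpolation), the table can be reproduced in pure Python. `robust_scale` is a hypothetical helper:

```python
from __future__ import annotations

import statistics


def robust_scale(values: list[float]) -> list[float]:
    # Hypothetical helper: centre on the median, scale by the interquartile range.
    # method="inclusive" interpolates linearly between data points, like numpy's default percentile.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid dividing by zero for constant columns
    return [(v - median) / iqr for v in values]


col1 = robust_scale([0, 1, 2, 3, 4, 5])
col3 = robust_scale([0, 10, 20, 30, 40, 50])
```

Both columns give the same scaled values because robust scaling is invariant to multiplying a column by a positive constant.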
grizz.transformer.InplaceRobustScalerTransformer ¶
Bases: RobustScalerTransformer
Implement a transformer to scale each column using statistics that are robust to outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceRobustScaler
>>> transformer = InplaceRobustScaler(columns=["col1", "col3"])
>>> transformer
InplaceRobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ -1.0 ┆ 0 ┆ -1.0 ┆ a │
│ -0.6 ┆ 1 ┆ -0.6 ┆ b │
│ -0.2 ┆ 2 ┆ -0.2 ┆ c │
│ 0.2 ┆ 3 ┆ 0.2 ┆ d │
│ 0.6 ┆ 4 ┆ 0.6 ┆ e │
│ 1.0 ┆ 5 ┆ 1.0 ┆ f │
└──────┴──────┴──────┴──────┘
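The scaled values above can be reproduced by hand. The dependency-free sketch below illustrates the idea and is not grizz's implementation. It assumes the scaler centers each column on its median and divides by the interquartile range (25th to 75th percentile), with quartiles computed by linear interpolation as in NumPy's default. Under those assumptions it gives the same numbers as col1 and col3 in the example.

```python
import statistics


def robust_scale(values: list[float]) -> list[float]:
    """Center on the median and divide by the interquartile range (IQR).

    Assumption: quartiles use linear interpolation between data points,
    which is what statistics.quantiles(method="inclusive") computes.
    """
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


print(robust_scale([0, 1, 2, 3, 4, 5]))  # [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
print(robust_scale([0, 10, 20, 30, 40, 50]))  # [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
```

The median and the IQR are less sensitive to a few extreme values than the mean and standard deviation are. A constant column would have an IQR of zero, and this sketch does not guard against that case.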
grizz.transformer.InplaceStandardScaler ¶
Bases: StandardScalerTransformer
Implement a transformer to standardize each column by removing the mean and scaling to unit variance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStandardScaler
>>> transformer = InplaceStandardScaler(columns=["col1", "col3"])
>>> transformer
InplaceStandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 ┆ str │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.414214 ┆ 1 ┆ -1.414214 ┆ a │
│ -0.707107 ┆ 2 ┆ -0.707107 ┆ b │
│ 0.0 ┆ 3 ┆ 0.0 ┆ c │
│ 0.707107 ┆ 4 ┆ 0.707107 ┆ d │
│ 1.414214 ┆ 5 ┆ 1.414214 ┆ e │
└───────────┴──────┴───────────┴──────┘
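For an input of 1 to 5, the printed values are ±1.414214 and ±0.707107. These values indicate division by the population standard deviation (ddof=0, the convention scikit-learn's StandardScaler uses). A sample standard deviation would give ±1.264911 instead. The dependency-free sketch below shows both variants. It illustrates the arithmetic and is not grizz's code.

```python
import statistics


def standard_scale(values: list[float], sample: bool = False) -> list[float]:
    """Subtract the mean, then divide by the standard deviation.

    sample=False uses the population standard deviation (ddof=0);
    sample=True uses the sample standard deviation (ddof=1).
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(v - mean) / std for v in values]


values = [1, 2, 3, 4, 5]
print([round(x, 6) for x in standard_scale(values)])
# [-1.414214, -0.707107, 0.0, 0.707107, 1.414214]
print([round(x, 6) for x in standard_scale(values, sample=True)])
# [-1.264911, -0.632456, 0.0, 0.632456, 1.264911]
```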
grizz.transformer.InplaceStandardScalerTransformer ¶
Bases: StandardScalerTransformer
Implement a transformer to standardize each column by removing the mean and scaling to unit variance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStandardScaler
>>> transformer = InplaceStandardScaler(columns=["col1", "col3"])
>>> transformer
InplaceStandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬───────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ f64 ┆ str │
╞═══════════╪══════╪═══════════╪══════╡
│ -1.414214 ┆ 1 ┆ -1.414214 ┆ a │
│ -0.707107 ┆ 2 ┆ -0.707107 ┆ b │
│ 0.0 ┆ 3 ┆ 0.0 ┆ c │
│ 0.707107 ┆ 4 ┆ 0.707107 ┆ d │
│ 1.414214 ┆ 5 ┆ 1.414214 ┆ e │
└───────────┴──────┴───────────┴──────┘
grizz.transformer.InplaceStringToDatetime ¶
Bases: StringToDatetimeTransformer
Implement a transformer to convert some string columns to the polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert to polars.Datetime. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStringToDatetime
>>> transformer = InplaceStringToDatetime(columns=["col1"])
>>> transformer
InplaceStringToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
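Only the selected column changes type. col3 also holds datetime-like strings, but it stays str because it is not listed in columns. The sketch below mimics this column-wise behaviour with the standard library on a dict of lists. It is an illustration under stated assumptions, not grizz's code. The format string is an assumption, since the example above passes no format. In polars itself, the corresponding expression would be Expr.str.to_datetime.

```python
from datetime import datetime


def strings_to_datetime(
    frame: dict[str, list[str]],
    columns: list[str],
    fmt: str = "%Y-%m-%d %H:%M:%S",  # assumed format; polars can infer it instead
) -> dict[str, list]:
    """Return a new frame where only `columns` are parsed; other columns are copied."""
    return {
        name: [datetime.strptime(v, fmt) for v in values] if name in columns else list(values)
        for name, values in frame.items()
    }


frame = {
    "col1": ["2020-01-01 01:01:01", "2020-01-01 23:59:59"],
    "col2": ["1", "5"],
    "col3": ["2020-01-01 11:11:11", "2020-05-01 23:59:59"],
}
out = strings_to_datetime(frame, columns=["col1"])
print(out["col1"][0])  # 2020-01-01 01:01:01 (now a datetime object)
print(out["col3"][0])  # still the original string
```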
grizz.transformer.InplaceStringToDatetimeTransformer ¶
Bases: StringToDatetimeTransformer
Implement a transformer to convert some string columns to the polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert to polars.Datetime. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStringToDatetime
>>> transformer = InplaceStringToDatetime(columns=["col1"])
>>> transformer
InplaceStringToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
grizz.transformer.InplaceStringToTime ¶
Bases: StringToTimeTransformer
Implement a transformer to convert some string columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The string columns to convert. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStringToTime
>>> transformer = InplaceStringToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceStringToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
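The format="%H:%M:%S" value in this example uses the standard strftime-style directives for hour, minute, and second. As a quick check that is independent of grizz and polars, the sketch below parses the col1 strings with the same format using Python's standard library. A string that does not match the format raises ValueError.

```python
from datetime import datetime, time


def parse_times(values: list[str], fmt: str = "%H:%M:%S") -> list[time]:
    """Parse each string with `fmt` and keep only the time-of-day part."""
    return [datetime.strptime(v, fmt).time() for v in values]


times = parse_times(["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"])
print(times[-1])  # 23:59:59

try:
    parse_times(["24:00:00"])  # hour 24 does not match %H (00-23)
except ValueError as exc:
    print("rejected:", exc)
```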
grizz.transformer.InplaceStringToTimeTransformer ¶
Bases: StringToTimeTransformer
Implement a transformer to convert some string columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The string columns to convert. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStringToTime
>>> transformer = InplaceStringToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceStringToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
grizz.transformer.InplaceStripChars ¶
Bases: StripCharsTransformer
Implement a transformer to remove leading and trailing characters.
This transformer ignores the columns that are not of type string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStripChars
>>> transformer = InplaceStripChars(columns=["col2", "col3"])
>>> transformer
InplaceStripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
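In the output, col3 is stripped. col4 holds the same padded strings but keeps its whitespace because it was not selected. col2 is selected but has nothing to strip. The sketch below reproduces this behaviour on a dict of lists using str.strip. Following the description above, it also skips non-string columns. It is an illustration, not grizz's implementation. The polars counterpart would be Expr.str.strip_chars.

```python
from typing import Optional


def strip_chars(
    frame: dict[str, list], columns: list[str], characters: Optional[str] = None
) -> dict[str, list]:
    """Strip leading/trailing `characters` (whitespace if None) in selected string columns."""
    out = {}
    for name, values in frame.items():
        is_string_column = all(isinstance(v, str) for v in values)
        if name in columns and is_string_column:
            out[name] = [v.strip(characters) for v in values]
        else:
            # Unselected or non-string columns are copied unchanged.
            out[name] = list(values)
    return out


frame = {
    "col1": [1, 2, 3, 4, 5],
    "col2": ["1", "2", "3", "4", "5"],
    "col3": ["a ", " b", " c ", "d", "e"],
    "col4": ["a ", " b", " c ", "d", "e"],
}
out = strip_chars(frame, columns=["col1", "col2", "col3"])  # col1 is int, so it is skipped
print(out["col3"])  # ['a', 'b', 'c', 'd', 'e']
print(out["col4"])  # ['a ', ' b', ' c ', 'd', 'e']
```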
grizz.transformer.InplaceStripCharsTransformer ¶
Bases: StripCharsTransformer
Implement a transformer to remove leading and trailing characters.
This transformer ignores the columns that are not of type string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceStripChars
>>> transformer = InplaceStripChars(columns=["col2", "col3"])
>>> transformer
InplaceStripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
grizz.transformer.InplaceToDatetime ¶
Bases: ToDatetimeTransformer
Implement a transformer to convert some columns to the polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert to polars.Datetime. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceToDatetime
>>> transformer = InplaceToDatetime(columns=["col1"])
>>> transformer
InplaceToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
grizz.transformer.InplaceToDatetimeTransformer ¶
Bases: ToDatetimeTransformer
Implement a transformer to convert some columns to the polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert to polars.Datetime. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceToDatetime
>>> transformer = InplaceToDatetime(columns=["col1"])
>>> transformer
InplaceToDatetimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
grizz.transformer.InplaceToTime ¶
Bases: ToTimeTransformer
Implement a transformer to convert some columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceToTime
>>> transformer = InplaceToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
grizz.transformer.InplaceToTimeTransformer ¶
Bases: ToTimeTransformer
Implement a transformer to convert some columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import InplaceToTime
>>> transformer = InplaceToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
InplaceToTimeTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
grizz.transformer.IntegerCast ¶
Bases: CastTransformer
Implement a transformer to convert integer columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(
... columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1, 2, 3, 4, 5],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Int64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str ┆ f32 │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a ┆ 1.0 │
│ 2 ┆ 2.0 ┆ 2 ┆ b ┆ 2.0 │
│ 3 ┆ 3.0 ┆ 3 ┆ c ┆ 3.0 │
│ 4 ┆ 4.0 ┆ 4 ┆ d ┆ 4.0 │
│ 5 ┆ 5.0 ┆ 5 ┆ e ┆ 5.0 │
└──────┴──────┴──────┴──────┴──────────┘
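Both col1 and col2 are listed, but only col1 gets a col1_out output. col2 is Float64, so it is not an integer column and is left alone. The sketch below shows this filtering step on a dict of lists, treating a column as integer when every value is a Python int. It is an assumption-laden illustration rather than grizz's code, and Python's float stands in for pl.Float32. In polars, polars.selectors.integer() is one way to select integer-typed columns.

```python
def integer_cast(
    frame: dict[str, list],
    columns: list[str],
    dtype: type = float,
    prefix: str = "",
    suffix: str = "_out",
) -> dict[str, list]:
    """Add a converted copy of each selected integer column; other columns are ignored."""
    out = dict(frame)  # input columns are kept as they are
    for name in columns:
        values = frame[name]  # a missing column raises KeyError here
        # bool is a subclass of int in Python, so exclude it explicitly.
        if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
            out[f"{prefix}{name}{suffix}"] = [dtype(v) for v in values]
    return out


frame = {
    "col1": [1, 2, 3, 4, 5],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col3": [1, 2, 3, 4, 5],
    "col4": ["a", "b", "c", "d", "e"],
}
out = integer_cast(frame, columns=["col1", "col2"])
print(list(out))  # ['col1', 'col2', 'col3', 'col4', 'col1_out']
```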
grizz.transformer.IntegerCastTransformer ¶
Bases: CastTransformer
Implement a transformer to convert integer columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(
... columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1, 2, 3, 4, 5],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Int64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str ┆ f32 │
╞══════╪══════╪══════╪══════╪══════════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a ┆ 1.0 │
│ 2 ┆ 2.0 ┆ 2 ┆ b ┆ 2.0 │
│ 3 ┆ 3.0 ┆ 3 ┆ c ┆ 3.0 │
│ 4 ┆ 4.0 ┆ 4 ┆ d ┆ 4.0 │
│ 5 ┆ 5.0 ┆ 5 ┆ e ┆ 5.0 │
└──────┴──────┴──────┴──────┴──────────┘
grizz.transformer.JsonDecode ¶
Bases: BaseInNOutNTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to parse. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌───────────┬──────┬─────────────────┬──────┬───────────┬─────────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ list[i64] ┆ list[str] │
╞═══════════╪══════╪═════════════════╪══════╪═══════════╪═════════════════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a ┆ [1, 2] ┆ ["1", "2"] │
│ [2] ┆ 2 ┆ ['2'] ┆ b ┆ [2] ┆ ["2"] │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c ┆ [1, 2, 3] ┆ ["1", "2", "3"] │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d ┆ [4, 5] ┆ ["4", "5"] │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e ┆ [5, 4] ┆ ["5", "4"] │
└───────────┴──────┴─────────────────┴──────┴───────────┴─────────────────┘
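Strictly speaking, col3 values such as ['1', '2'] are not valid JSON, because JSON strings must be delimited by double quotes. The output nevertheless shows them decoded to list[str]. This suggests the values are normalized before parsing, but the parameter description does not say how. The standard-library sketch below assumes a naive replacement of single quotes by double quotes. This reproduces the output of this example, but it would mangle strings that themselves contain apostrophes.

```python
import json


def json_decode(values: list[str]) -> list:
    """Parse each string as JSON after a naive ' -> " replacement (an assumption)."""
    return [json.loads(v.replace("'", '"')) for v in values]


print(json_decode(["[1, 2]", "[2]", "[1, 2, 3]"]))  # [[1, 2], [2], [1, 2, 3]]
print(json_decode(["['1', '2']", "['2']"]))  # [['1', '2'], ['2']]

try:
    json.loads("['1', '2']")  # without normalization, strict JSON rejects this
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc.msg)
```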
grizz.transformer.JsonDecodeTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to parse. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌───────────┬──────┬─────────────────┬──────┬───────────┬─────────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ list[i64] ┆ list[str] │
╞═══════════╪══════╪═════════════════╪══════╪═══════════╪═════════════════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a ┆ [1, 2] ┆ ["1", "2"] │
│ [2] ┆ 2 ┆ ['2'] ┆ b ┆ [2] ┆ ["2"] │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c ┆ [1, 2, 3] ┆ ["1", "2", "3"] │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d ┆ [4, 5] ┆ ["4", "5"] │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e ┆ [5, 4] ┆ ["5", "4"] │
└───────────┴──────┴─────────────────┴──────┴───────────┴─────────────────┘
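As a rough illustration of the per-value parsing, the sketch below applies Python's standard json module to the example's string columns; it is not grizz's implementation. Strict JSON only accepts double-quoted strings, so col3 does not parse as-is. The output above suggests the transformer tolerates single quotes, but the mechanism is not documented here; the decode_lenient helper is hypothetical and simply swaps quote characters, which would break on values containing apostrophes.

```python
import json


def decode_strict(values: list[str]) -> list:
    # Parse each string as JSON; raises json.JSONDecodeError on invalid input.
    return [json.loads(value) for value in values]


def decode_lenient(values: list[str]) -> list:
    # Hypothetical helper: swap single quotes for double quotes before parsing.
    # Naive by design: it breaks on strings that contain apostrophes.
    return [json.loads(value.replace("'", '"')) for value in values]


col1 = ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"]
col3 = ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"]

print(decode_strict(col1))  # [[1, 2], [2], [1, 2, 3], [4, 5], [5, 4]]
print(decode_lenient(col3))  # [['1', '2'], ['2'], ['1', '2', '3'], ['4', '5'], ['5', '4']]

try:
    decode_strict(col3)  # single-quoted strings are not valid JSON
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc.msg)
```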
grizz.transformer.LabelEncoder
Bases: BaseIn1Out1Transformer
Implement a polars.DataFrame transformer to encode the labels in a given column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name, i.e. the column with the labels to encode. | required |
out_col | str | The output column name, i.e. the column with the encoded labels. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import LabelEncoder
>>> transformer = LabelEncoder(in_col="col1", out_col="out")
>>> transformer
LabelEncoderTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": ["1", "2", "3", "4", "5"],
... }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞══════╪══════╪═════╡
│ a ┆ 1 ┆ 0 │
│ b ┆ 2 ┆ 1 │
│ c ┆ 3 ┆ 2 │
│ d ┆ 4 ┆ 3 │
│ e ┆ 5 ┆ 4 │
└──────┴──────┴─────┘
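The encoding can be pictured as a fitted mapping from each distinct label to an integer code. The plain-Python sketch below assumes codes follow the sorted order of the distinct labels (the convention of scikit-learn's LabelEncoder); the example above cannot confirm this because its labels are already sorted. The function names are hypothetical. Splitting fit from transform shows why the example calls fit_transform: the mapping is learned from the data before it is applied.

```python
def fit_label_mapping(labels: list[str]) -> dict[str, int]:
    # Assumption: codes follow the sorted order of the distinct labels.
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode_labels(labels: list[str], mapping: dict[str, int]) -> list[int]:
    # Apply a fitted mapping; raises KeyError for labels unseen during fitting.
    return [mapping[label] for label in labels]


mapping = fit_label_mapping(["a", "b", "c", "d", "e"])
print(encode_labels(["a", "b", "c", "d", "e"], mapping))  # [0, 1, 2, 3, 4]
print(encode_labels(["e", "a", "e"], mapping))  # [4, 0, 4]
```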
grizz.transformer.LabelEncoderTransformer
Bases: BaseIn1Out1Transformer
Implement a polars.DataFrame transformer to encode the labels in a given column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name, i.e. the column with the labels to encode. | required |
out_col | str | The output column name, i.e. the column with the encoded labels. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import LabelEncoder
>>> transformer = LabelEncoder(in_col="col1", out_col="out")
>>> transformer
LabelEncoderTransformer(in_col='col1', out_col='out', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": ["a", "b", "c", "d", "e"],
... "col2": ["1", "2", "3", "4", "5"],
... }
... )
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬─────┐
│ col1 ┆ col2 ┆ out │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞══════╪══════╪═════╡
│ a ┆ 1 ┆ 0 │
│ b ┆ 2 ┆ 1 │
│ c ┆ 3 ┆ 2 │
│ d ┆ 4 ┆ 3 │
│ e ┆ 5 ┆ 4 │
└──────┴──────┴─────┘
grizz.transformer.Lower
Bases: BaseComparatorTransformer
Implement a transformer that computes the lower than operation, i.e. value < target for each value in the selected columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Lower
>>> transformer = Lower(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ true ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ true ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ true ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
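Per value, the lower operation is a strict value < target test. A plain-Python sketch that reproduces the example's two output columns (the helper name is hypothetical, not part of grizz):

```python
def lower(values: list[float], target: float) -> list[bool]:
    # Strict comparison: True where the value is lower than the target.
    return [value < target for value in values]


print(lower([1, 2, 3, 4, 5], 4.2))  # [True, True, True, True, False]
print(lower([10, 20, 30, 40, 50], 4.2))  # [False, False, False, False, False]
```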
grizz.transformer.LowerEqual
Bases: BaseComparatorTransformer
Implement a transformer that computes the lower than or equal operation, i.e. value <= target for each value in the selected columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import LowerEqual
>>> transformer = LowerEqual(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ true ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ true ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ true ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
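Because the target 4.2 in the example matches no value, LowerEqual and Lower produce identical outputs there. The plain-Python sketch below (hypothetical helper names) uses an integer target to show the boundary case where <= and < differ:

```python
def lower(values: list[float], target: float) -> list[bool]:
    # Strict comparison: value < target.
    return [value < target for value in values]


def lower_equal(values: list[float], target: float) -> list[bool]:
    # Inclusive comparison: value <= target.
    return [value <= target for value in values]


col1 = [1, 2, 3, 4, 5]
print(lower(col1, 4))  # [True, True, True, False, False]
print(lower_equal(col1, 4))  # [True, True, True, True, False]
print(lower_equal(col1, 4.2) == lower(col1, 4.2))  # True
```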
grizz.transformer.LowerEqualTransformer
Bases: BaseComparatorTransformer
Implement a transformer that computes the lower than or equal operation, i.e. value <= target for each value in the selected columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import LowerEqual
>>> transformer = LowerEqual(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ true ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ true ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ true ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.LowerTransformer
Bases: BaseComparatorTransformer
Implement a transformer that computes the lower than operation, i.e. value < target for each value in the selected columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Lower
>>> transformer = Lower(columns=["col1", "col3"], target=4.2, prefix="", suffix="_out")
>>> transformer
LowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=4.2)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ true ┆ false │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ true ┆ false │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ true ┆ false │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ true ┆ false │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ false ┆ false │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.MaxAbsScaler
Bases: BaseInNOutNTransformer
Implement a transformer to scale columns by the maximum absolute value of each column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MaxAbsScaler
>>> transformer = MaxAbsScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MaxAbsScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ 0.2 ┆ 0.2 │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ 0.4 ┆ 0.4 │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ 0.6 ┆ 0.6 │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ 0.8 ┆ 0.8 │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
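Each scaled value is value / max(abs(column)), so outputs fall in [-1, 1] and signs are preserved. A plain-Python sketch of that arithmetic for one column; the helper name is hypothetical, the all-zero guard is an assumption (it mirrors scikit-learn's MaxAbsScaler, which divides such columns by 1), and null handling via propagate_nulls is not modeled.

```python
def max_abs_scale(values: list[float]) -> list[float]:
    # Divide each value by the largest absolute value in the column.
    max_abs = max(abs(value) for value in values)
    # Assumption: an all-zero column is divided by 1 instead of 0.
    scale = max_abs if max_abs != 0 else 1
    return [value / scale for value in values]


print(max_abs_scale([1, 2, 3, 4, 5]))  # [0.2, 0.4, 0.6, 0.8, 1.0]
print(max_abs_scale([-4, 2, 1]))  # [-1.0, 0.5, 0.25]
```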
grizz.transformer.MaxAbsScalerTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to scale columns by the maximum absolute value of each column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MaxAbsScaler
>>> transformer = MaxAbsScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MaxAbsScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ 0.2 ┆ 0.2 │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ 0.4 ┆ 0.4 │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ 0.6 ┆ 0.6 │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ 0.8 ┆ 0.8 │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.MaxHorizontal
Bases: BaseInNOut1Transformer
Implement a transformer to get the maximum value horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns used to compute the maximum value horizontally. The columns should be compatible. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MaxHorizontal
>>> transformer = MaxHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MaxHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [9, 5, 4, 9, 6],
... "col2": [8, 0, 1, 8, 9],
... "col3": [0, 4, 8, 7, 0],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 9 ┆ 8 ┆ 0 ┆ a │
│ 5 ┆ 0 ┆ 4 ┆ b │
│ 4 ┆ 1 ┆ 8 ┆ c │
│ 9 ┆ 8 ┆ 7 ┆ d │
│ 6 ┆ 9 ┆ 0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9 ┆ 8 ┆ 0 ┆ a ┆ 9 │
│ 5 ┆ 0 ┆ 4 ┆ b ┆ 5 │
│ 4 ┆ 1 ┆ 8 ┆ c ┆ 8 │
│ 9 ┆ 8 ┆ 7 ┆ d ┆ 9 │
│ 6 ┆ 9 ┆ 0 ┆ e ┆ 9 │
└──────┴──────┴──────┴──────┴─────┘
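Horizontally means row-wise: for each row, the maximum is taken across the selected columns. A plain-Python sketch over equally long lists (hypothetical helper; nulls are not handled, and zip silently stops at the shortest column):

```python
def max_horizontal(*columns: list[int]) -> list[int]:
    # zip(*columns) yields one tuple per row; keep the largest value of each row.
    return [max(row) for row in zip(*columns)]


col1 = [9, 5, 4, 9, 6]
col2 = [8, 0, 1, 8, 9]
col3 = [0, 4, 8, 7, 0]
print(max_horizontal(col1, col2, col3))  # [9, 5, 8, 9, 9]
```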
grizz.transformer.MaxHorizontalTransformer
Bases: BaseInNOut1Transformer
Implement a transformer to get the maximum value horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns used to compute the maximum value horizontally. The columns should be compatible. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MaxHorizontal
>>> transformer = MaxHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MaxHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [9, 5, 4, 9, 6],
... "col2": [8, 0, 1, 8, 9],
... "col3": [0, 4, 8, 7, 0],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 9 ┆ 8 ┆ 0 ┆ a │
│ 5 ┆ 0 ┆ 4 ┆ b │
│ 4 ┆ 1 ┆ 8 ┆ c │
│ 9 ┆ 8 ┆ 7 ┆ d │
│ 6 ┆ 9 ┆ 0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9 ┆ 8 ┆ 0 ┆ a ┆ 9 │
│ 5 ┆ 0 ┆ 4 ┆ b ┆ 5 │
│ 4 ┆ 1 ┆ 8 ┆ c ┆ 8 │
│ 9 ┆ 8 ┆ 7 ┆ d ┆ 9 │
│ 6 ┆ 9 ┆ 0 ┆ e ┆ 9 │
└──────┴──────┴──────┴──────┴─────┘
grizz.transformer.MeanHorizontal
Bases: BaseInNOut1Transformer
Implement a transformer to get the mean value horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns used to compute the mean value horizontally. The columns should be compatible. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MeanHorizontal
>>> transformer = MeanHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MeanHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ 21.0 │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ 22.0 │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ 23.0 │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ 24.0 │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ 25.0 │
└──────┴──────┴──────┴──────┴──────┘
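The row-wise mean divides each row's sum by the number of selected columns, so integer inputs yield floating-point results, consistent with the f64 output column above. A plain-Python sketch (hypothetical helper; nulls are not handled):

```python
def mean_horizontal(*columns: list[int]) -> list[float]:
    # Average the values of each row; true division always yields a float.
    return [sum(row) / len(row) for row in zip(*columns)]


col1 = [11, 12, 13, 14, 15]
col2 = [21, 22, 23, 24, 25]
col3 = [31, 32, 33, 34, 35]
print(mean_horizontal(col1, col2, col3))  # [21.0, 22.0, 23.0, 24.0, 25.0]
```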
grizz.transformer.MeanHorizontalTransformer
Bases: BaseInNOut1Transformer
Implement a transformer to get the mean value horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns used to compute the mean value horizontally. The columns should be compatible. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MeanHorizontal
>>> transformer = MeanHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MeanHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ 21.0 │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ 22.0 │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ 23.0 │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ 24.0 │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ 25.0 │
└──────┴──────┴──────┴──────┴──────┘
grizz.transformer.MinHorizontal
Bases: BaseInNOut1Transformer
Implement a transformer to get the minimum value horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns used to compute the minimum value horizontally. The columns should be compatible. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MinHorizontal
>>> transformer = MinHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MinHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [9, 5, 4, 9, 6],
... "col2": [8, 0, 1, 8, 9],
... "col3": [0, 4, 8, 7, 0],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 9 ┆ 8 ┆ 0 ┆ a │
│ 5 ┆ 0 ┆ 4 ┆ b │
│ 4 ┆ 1 ┆ 8 ┆ c │
│ 9 ┆ 8 ┆ 7 ┆ d │
│ 6 ┆ 9 ┆ 0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9 ┆ 8 ┆ 0 ┆ a ┆ 0 │
│ 5 ┆ 0 ┆ 4 ┆ b ┆ 0 │
│ 4 ┆ 1 ┆ 8 ┆ c ┆ 1 │
│ 9 ┆ 8 ┆ 7 ┆ d ┆ 7 │
│ 6 ┆ 9 ┆ 0 ┆ e ┆ 0 │
└──────┴──────┴──────┴──────┴─────┘
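The row-wise minimum follows the same pattern as MaxHorizontal, with min as the per-row reducer. The plain-Python sketch below (hypothetical helper, not part of grizz) takes the reducer as an argument, so one row loop covers both operations; nulls are not handled.

```python
from typing import Callable, Sequence


def reduce_horizontal(reducer: Callable[[Sequence[int]], int], *columns: list[int]) -> list[int]:
    # Apply the reducer to each row formed by zipping the columns together.
    return [reducer(row) for row in zip(*columns)]


col1 = [9, 5, 4, 9, 6]
col2 = [8, 0, 1, 8, 9]
col3 = [0, 4, 8, 7, 0]
print(reduce_horizontal(min, col1, col2, col3))  # [0, 0, 1, 7, 0]
print(reduce_horizontal(max, col1, col2, col3))  # [9, 5, 8, 9, 9]
```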
grizz.transformer.MinHorizontalTransformer
Bases: BaseInNOut1Transformer
Implement a transformer to get the minimum value horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns used to compute the minimum value horizontally. The columns should be compatible. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MinHorizontal
>>> transformer = MinHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
MinHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [9, 5, 4, 9, 6],
... "col2": [8, 0, 1, 8, 9],
... "col3": [0, 4, 8, 7, 0],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 9 ┆ 8 ┆ 0 ┆ a │
│ 5 ┆ 0 ┆ 4 ┆ b │
│ 4 ┆ 1 ┆ 8 ┆ c │
│ 9 ┆ 8 ┆ 7 ┆ d │
│ 6 ┆ 9 ┆ 0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 9 ┆ 8 ┆ 0 ┆ a ┆ 0 │
│ 5 ┆ 0 ┆ 4 ┆ b ┆ 0 │
│ 4 ┆ 1 ┆ 8 ┆ c ┆ 1 │
│ 9 ┆ 8 ┆ 7 ┆ d ┆ 7 │
│ 6 ┆ 9 ┆ 0 ┆ e ┆ 0 │
└──────┴──────┴──────┴──────┴─────┘
grizz.transformer.MinMaxScaler
Bases: BaseInNOutNTransformer
Implement a transformer to scale each column to a given range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MinMaxScaler
>>> transformer = MinMaxScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MinMaxScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 0 ┆ a ┆ 0.0 ┆ 0.0 │
│ 1 ┆ 1 ┆ 10 ┆ b ┆ 0.2 ┆ 0.2 │
│ 2 ┆ 2 ┆ 20 ┆ c ┆ 0.4 ┆ 0.4 │
│ 3 ┆ 3 ┆ 30 ┆ d ┆ 0.6 ┆ 0.6 │
│ 4 ┆ 4 ┆ 40 ┆ e ┆ 0.8 ┆ 0.8 │
│ 5 ┆ 5 ┆ 50 ┆ f ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
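Min-max scaling maps each column linearly so that its minimum lands on the low end of the target range and its maximum on the high end. The parameter table lists no range argument, so the example presumably relies on a [0, 1] default; the feature_range argument in this plain-Python sketch is hypothetical, as is the helper name, and the constant-column guard is an assumption that mirrors scikit-learn's MinMaxScaler. Null handling is not modeled.

```python
def min_max_scale(
    values: list[float], feature_range: tuple[float, float] = (0.0, 1.0)
) -> list[float]:
    # Hypothetical feature_range: the (low, high) interval to map the column onto.
    low, high = feature_range
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # Assumption: a constant column maps to the low end of the range.
        return [low for _ in values]
    return [low + (value - vmin) / span * (high - low) for value in values]


print(min_max_scale([0, 1, 2, 3, 4, 5]))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(min_max_scale([0, 5, 10], feature_range=(-1.0, 1.0)))  # [-1.0, 0.0, 1.0]
```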
grizz.transformer.MinMaxScalerTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to scale each column to a given range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import MinMaxScaler
>>> transformer = MinMaxScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
MinMaxScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 0 ┆ a ┆ 0.0 ┆ 0.0 │
│ 1 ┆ 1 ┆ 10 ┆ b ┆ 0.2 ┆ 0.2 │
│ 2 ┆ 2 ┆ 20 ┆ c ┆ 0.4 ┆ 0.4 │
│ 3 ┆ 3 ┆ 30 ┆ d ┆ 0.6 ┆ 0.6 │
│ 4 ┆ 4 ┆ 40 ┆ e ┆ 0.8 ┆ 0.8 │
│ 5 ┆ 5 ┆ 50 ┆ f ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
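The scaling in the example above is the usual linear min-max mapping, x' = (x - min) / (max - min), for the [0, 1] output range shown. The pure-Python sketch below reproduces the col3 output; the function name `min_max_scale` and its `feature_min`/`feature_max` parameters are hypothetical and not part of grizz, and the sketch does not model null propagation or extra keyword arguments.

```python
def min_max_scale(values, feature_min=0.0, feature_max=1.0):
    """Linearly map min(values) to feature_min and max(values) to feature_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: avoid dividing by zero and map every value to feature_min.
        return [float(feature_min) for _ in values]
    width = feature_max - feature_min
    return [feature_min + (v - lo) / span * width for v in values]


# col3 from the example frame.
print(min_max_scale([0, 10, 20, 30, 40, 50]))
```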
grizz.transformer.Normalizer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to normalize each data point (row) individually to unit norm across the selected columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to normalize. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Normalizer
>>> transformer = Normalizer(columns=["col1", "col3"], prefix="", suffix="_norm")
>>> transformer
NormalizerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_norm')
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [5, 4, 3, 2, 1, 0],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 5 ┆ a │
│ 1 ┆ 1 ┆ 4 ┆ b │
│ 2 ┆ 2 ┆ 3 ┆ c │
│ 3 ┆ 3 ┆ 2 ┆ d │
│ 4 ┆ 4 ┆ 1 ┆ e │
│ 5 ┆ 5 ┆ 0 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_norm ┆ col3_norm │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0 ┆ 0 ┆ 5 ┆ a ┆ 0.0 ┆ 1.0 │
│ 1 ┆ 1 ┆ 4 ┆ b ┆ 0.242536 ┆ 0.970143 │
│ 2 ┆ 2 ┆ 3 ┆ c ┆ 0.5547 ┆ 0.83205 │
│ 3 ┆ 3 ┆ 2 ┆ d ┆ 0.83205 ┆ 0.5547 │
│ 4 ┆ 4 ┆ 1 ┆ e ┆ 0.970143 ┆ 0.242536 │
│ 5 ┆ 5 ┆ 0 ┆ f ┆ 1.0 ┆ 0.0 │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
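The output above is consistent with L2 (Euclidean) normalization of each row across the selected columns: in the second row, (1, 4) has norm √17 ≈ 4.1231, giving 1/4.1231 ≈ 0.242536 and 4/4.1231 ≈ 0.970143. A pure-Python sketch of that computation follows; `l2_normalize_rows` is a hypothetical helper, and leaving all-zero rows at zero is an assumption of the sketch.

```python
import math


def l2_normalize_rows(rows):
    """Scale each row (data point) to unit Euclidean norm; all-zero rows stay zero."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        if norm == 0:
            normalized.append([0.0 for _ in row])  # avoid division by zero
        else:
            normalized.append([v / norm for v in row])
    return normalized


# Rows built from col1 and col3 of the example frame.
rows = list(zip([0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]))
for r in l2_normalize_rows(rows):
    print([round(v, 6) for v in r])
```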
grizz.transformer.NormalizerTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to normalize data points individually to unit norm.
Parameters and example usage are identical to those of grizz.transformer.Normalizer above, which is an alias of this class.
grizz.transformer.NotEqual ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the not-equal comparison between each selected column and a target value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import NotEqual
>>> transformer = NotEqual(columns=["col1", "col3"], target=3, prefix="", suffix="_out")
>>> transformer
NotEqualTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ true ┆ true │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ true ┆ true │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ false ┆ true │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ true ┆ true │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ true ┆ true │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.NotEqualMissing ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the not-equal comparison between each selected column and a target value, where null values are not propagated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to compare. | required |
target | Any | The target value to compare with. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import NotEqualMissing
>>> transformer = NotEqualMissing(
... columns=["col1", "col3"], target=3, prefix="", suffix="_out"
... )
>>> transformer
NotEqualMissingTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', target=3)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ bool ┆ bool │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ true ┆ true │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ true ┆ true │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ false ┆ true │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ true ┆ true │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ true ┆ true │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
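The example frame contains no nulls, so NotEqual and NotEqualMissing produce identical outputs here; the difference only shows up on null inputs. The sketch below models it in pure Python with None standing in for null, assuming the two transformers follow the same distinction as polars `Expr.ne` (a null operand gives null) and `Expr.ne_missing` (null treated as a regular value). The helper names `ne` and `ne_missing` are illustrative only.

```python
from typing import Any, Optional


def ne(value: Optional[Any], target: Any) -> Optional[bool]:
    """Null-propagating '!=': a null operand gives a null result."""
    if value is None or target is None:
        return None
    return value != target


def ne_missing(value: Optional[Any], target: Any) -> bool:
    """'!=' that treats null as an ordinary value, so the result is never null."""
    if value is None or target is None:
        # Two nulls compare equal; a null and a non-null value differ.
        return not (value is None and target is None)
    return value != target


column = [1, 2, None, 3]
print([ne(v, 3) for v in column])          # [True, True, None, False]
print([ne_missing(v, 3) for v in column])  # [True, True, True, False]
```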
grizz.transformer.NotEqualMissingTransformer ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the not-equal comparison between each selected column and a target value, where null values are not propagated.
Parameters and example usage are identical to those of grizz.transformer.NotEqualMissing above, which is an alias of this class.
grizz.transformer.NotEqualTransformer ¶
Bases: BaseComparatorTransformer
Implement a transformer that computes the not-equal comparison between each selected column and a target value.
Parameters and example usage are identical to those of grizz.transformer.NotEqual above, which is an alias of this class.
grizz.transformer.NumericCast ¶
Bases: CastTransformer
Implement a transformer to convert numeric columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import NumericCast
>>> transformer = NumericCast(
... columns=["col1", "col2"], dtype=pl.Float32, prefix="", suffix="_out"
... )
>>> transformer
NumericCastTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', dtype=Float32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float32,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col2_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ f64 ┆ str ┆ f32 ┆ f32 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a ┆ 1.0 ┆ 1.0 │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b ┆ 2.0 ┆ 2.0 │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c ┆ 3.0 ┆ 3.0 │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d ┆ 4.0 ┆ 4.0 │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e ┆ 5.0 ┆ 5.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.NumericCastTransformer ¶
Bases: CastTransformer
Implement a transformer to convert numeric columns to a new data type.
Parameters and example usage are identical to those of grizz.transformer.NumericCast above, which is an alias of this class.
grizz.transformer.OrdinalEncoder ¶
Bases: BaseInNOutNTransformer
Implement a transformer to convert each column to ordinal integers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to encode. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import OrdinalEncoder
>>> transformer = OrdinalEncoder(columns=["col1", "col2"], prefix="", suffix="_out")
>>> transformer
OrdinalEncoderTransformer(columns=('col1', 'col2'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["a", "b", "c", "d", "e", "f"],
... "col3": [0, 10, 20, 30, 40, 50],
... }
... )
>>> frame
shape: (6, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 0 ┆ a ┆ 0 │
│ 1 ┆ b ┆ 10 │
│ 2 ┆ c ┆ 20 │
│ 3 ┆ d ┆ 30 │
│ 4 ┆ e ┆ 40 │
│ 5 ┆ f ┆ 50 │
└──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 5)
┌──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out ┆ col2_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ a ┆ 0 ┆ 0.0 ┆ 0.0 │
│ 1 ┆ b ┆ 10 ┆ 1.0 ┆ 1.0 │
│ 2 ┆ c ┆ 20 ┆ 2.0 ┆ 2.0 │
│ 3 ┆ d ┆ 30 ┆ 3.0 ┆ 3.0 │
│ 4 ┆ e ┆ 40 ┆ 4.0 ┆ 4.0 │
│ 5 ┆ f ┆ 50 ┆ 5.0 ┆ 5.0 │
└──────┴──────┴──────┴──────────┴──────────┘
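In the example, both the integer column and the string column are mapped to 0.0 through 5.0 in sorted order, and the codes come out as f64. The pure-Python sketch below shows that idea: each distinct value seen during fitting is mapped to its index among the sorted distinct values. Sorting the categories mirrors scikit-learn's OrdinalEncoder with categories='auto'; whether grizz configures it that way is an assumption, as are the hypothetical helper names `fit_ordinal` and `transform_ordinal`.

```python
def fit_ordinal(values):
    """Learn a value -> code mapping from the sorted distinct values."""
    categories = sorted(set(values))
    # Codes are floats to match the f64 output columns in the example.
    return {category: float(code) for code, category in enumerate(categories)}


def transform_ordinal(values, mapping):
    """Encode values; a value unseen during fitting raises KeyError in this sketch."""
    return [mapping[v] for v in values]


col2 = ["a", "b", "c", "d", "e", "f"]
mapping = fit_ordinal(col2)
print(transform_ordinal(["c", "a", "f"], mapping))  # [2.0, 0.0, 5.0]
```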
grizz.transformer.OrdinalEncoderTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to convert each column to ordinal integers.
Parameters and example usage are identical to those of grizz.transformer.OrdinalEncoder above, which is an alias of this class.
grizz.transformer.PowerTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to apply a power transform featurewise to make data more Gaussian-like.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to transform. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import PowerTransformer
>>> transformer = PowerTransformer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
PowerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 0 ┆ 0 ┆ 0 ┆ a ┆ -1.567837 ┆ -1.695398 │
│ 1 ┆ 1 ┆ 10 ┆ b ┆ -0.836194 ┆ -0.740367 │
│ 2 ┆ 2 ┆ 20 ┆ c ┆ -0.210053 ┆ -0.117399 │
│ 3 ┆ 3 ┆ 30 ┆ d ┆ 0.356111 ┆ 0.402585 │
│ 4 ┆ 4 ┆ 40 ┆ e ┆ 0.881486 ┆ 0.864187 │
│ 5 ┆ 5 ┆ 50 ┆ f ┆ 1.376486 ┆ 1.286392 │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
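The transformed columns above have (up to rounding) zero mean, which is consistent with a Yeo-Johnson power transform followed by standardization, the defaults of scikit-learn's PowerTransformer. Assuming grizz delegates to that estimator, the sketch below shows the Yeo-Johnson formula for a fixed λ plus the standardization step. It deliberately omits the per-column maximum-likelihood fit of λ, so it will not reproduce the exact numbers above; `yeo_johnson` and `standardize` are hypothetical helper names.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Yeo-Johnson transform of one value for a given lambda."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1) ** lmbda - 1) / lmbda
    if abs(lmbda - 2) < 1e-12:
        return -math.log1p(-x)
    return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)


def standardize(values):
    """Shift to zero mean and scale to unit population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant input: nothing to scale
    return [(v - mean) / std for v in values]


# lambda = 0.5 is an arbitrary illustrative value, not a fitted one.
transformed = standardize([yeo_johnson(x, 0.5) for x in [0, 1, 2, 3, 4, 5]])
print([round(v, 4) for v in transformed])
```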
grizz.transformer.QuantileTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to apply the quantile transformation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to transform. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import QuantileTransformer
>>> transformer = QuantileTransformer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
QuantileTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 0 ┆ a ┆ 0.0 ┆ 0.0 │
│ 1 ┆ 1 ┆ 10 ┆ b ┆ 0.2 ┆ 0.2 │
│ 2 ┆ 2 ┆ 20 ┆ c ┆ 0.4 ┆ 0.4 │
│ 3 ┆ 3 ┆ 30 ┆ d ┆ 0.6 ┆ 0.6 │
│ 4 ┆ 4 ┆ 40 ┆ e ┆ 0.8 ┆ 0.8 │
│ 5 ┆ 5 ┆ 50 ┆ f ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
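With six distinct values, the output above equals rank / (n - 1), that is 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, which is what mapping each value through the empirical CDF onto a uniform [0, 1] output gives. The pure-Python sketch below (hypothetical helper `quantile_transform`) interpolates linearly between the sorted reference values. It ignores the tie handling, the n_quantiles subsampling and the optional normal output distribution that scikit-learn's QuantileTransformer supports, so treat it as an illustration rather than grizz's implementation.

```python
from bisect import bisect_left


def quantile_transform(x, reference):
    """Map x to [0, 1] from its interpolated position in the sorted reference values."""
    n = len(reference)
    if x <= reference[0]:
        return 0.0
    if x >= reference[-1]:
        return 1.0
    i = bisect_left(reference, x)  # reference[i - 1] < x <= reference[i]
    lo, hi = reference[i - 1], reference[i]
    position = (i - 1) + (x - lo) / (hi - lo)
    return position / (n - 1)


reference = sorted([0, 10, 20, 30, 40, 50])  # col3 of the example, used as the fitted data
print([quantile_transform(x, reference) for x in reference])
print(quantile_transform(15, reference))  # halfway between 0.2 and 0.4
```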
grizz.transformer.Replace ¶
Bases: BaseIn1Out1Transformer
Replace the values in a column by the values in a mapping. Values absent from the mapping are kept unchanged unless a default is provided.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3})
>>> transformer
ReplaceTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ d │
│ e ┆ e │
└─────┴─────┘
>>> transformer = Replace(
... in_col="old",
... out_col="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
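The two examples show the non-strict behaviour: without default, unmapped values ("d", "e") pass through and the output keeps the input's str type, so the mapped integers appear as "1", "2", "3"; with default=None, unmapped values become null and the column takes the integer type of the replacements. The pure-Python sketch below reproduces both outputs, with None standing in for null and a string input column assumed; `replace_values` is a hypothetical helper name.

```python
_NO_DEFAULT = object()  # sentinel meaning "no default was passed"


def replace_values(values, mapping, default=_NO_DEFAULT):
    """Sketch of non-strict replacement on a column of strings."""
    if default is _NO_DEFAULT:
        # Unmapped values pass through; the result keeps the input's str type.
        return [str(mapping[v]) if v in mapping else v for v in values]
    # With a default, unmapped values become the default and mapped values keep their own type.
    return [mapping.get(v, default) for v in values]


old = ["a", "b", "c", "d", "e"]
print(replace_values(old, {"a": 1, "b": 2, "c": 3}))                # ['1', '2', '3', 'd', 'e']
print(replace_values(old, {"a": 1, "b": 2, "c": 3}, default=None))  # [1, 2, 3, None, None]
```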
grizz.transformer.ReplaceStrict ¶
Bases: BaseIn1Out1Transformer
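Replace the values in a column by the values in a mapping. Unlike Replace, the output column takes the data type of the replacement values (i64 in the examples below), and values absent from the mapping are set to default when one is provided.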
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
... in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└─────┴─────┘
>>> transformer = ReplaceStrict(
... in_col="old",
... out_col="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
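The examples cover a complete mapping and a partial mapping with default=None, but not a partial mapping without a default. The sketch below assumes ReplaceStrict follows polars `Expr.replace_strict`, which raises an error when a value has no replacement and no default is given. The helper name `replace_strict_values` and the use of ValueError (polars raises its own exception type) are choices made for this sketch only.

```python
_NO_DEFAULT = object()  # sentinel meaning "no default was passed"


def replace_strict_values(values, mapping, default=_NO_DEFAULT):
    """Sketch of strict replacement: every value must be mapped unless a default is given."""
    out = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])
        elif default is not _NO_DEFAULT:
            out.append(default)
        else:
            raise ValueError(f"no replacement for {v!r} and no default given")
    return out


old = ["a", "b", "c", "d", "e"]
print(replace_strict_values(old, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}))  # [1, 2, 3, 4, 5]
print(replace_strict_values(old, {"a": 1, "b": 2, "c": 3}, default=None))    # [1, 2, 3, None, None]
try:
    replace_strict_values(old, {"a": 1, "b": 2, "c": 3})
except ValueError as err:
    print("error:", err)  # error: no replacement for 'd' and no default given
```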
grizz.transformer.ReplaceStrictTransformer ¶
Bases: BaseIn1Out1Transformer
Replace the values in a column by the values in a mapping.
Parameters and example usage are identical to those of grizz.transformer.ReplaceStrict above, which is an alias of this class.
grizz.transformer.ReplaceTransformer ¶
Bases: BaseIn1Out1Transformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(in_col="old", out_col="new", old={"a": 1, "b": 2, "c": 3})
>>> transformer
ReplaceTransformer(in_col='old', out_col='new', exist_policy='raise', missing_policy='raise', old={'a': 1, 'b': 2, 'c': 3})
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ d │
│ e ┆ e │
└─────┴─────┘
>>> transformer = Replace(
... in_col="old",
... out_col="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
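The two replace transformers differ in how they treat values that are missing from `old`: `Replace` keeps them, while `ReplaceStrict` needs a mapping for every value unless `default` is given. The pure-Python sketch below illustrates those semantics on a plain list. It is not grizz's implementation; the helpers `replace` and `replace_strict` are hypothetical and only mirror polars' `Expr.replace` / `Expr.replace_strict`. Raising on an unmapped value when no default is given is an assumption based on polars' `replace_strict`. Unlike polars, the sketch does not cast results back to the input dtype, which is why the first `Replace` example above shows `1`, `2`, `3` in a `str` column.

```python
from collections.abc import Hashable, Mapping, Sequence
from typing import Any

# Sentinel meaning "no default was supplied" (None is a valid default).
_NO_DEFAULT = object()


def replace(values: Sequence[Any], old: Mapping[Hashable, Any]) -> list[Any]:
    """Non-strict: values missing from ``old`` are kept unchanged."""
    return [old.get(v, v) for v in values]


def replace_strict(
    values: Sequence[Any], old: Mapping[Hashable, Any], default: Any = _NO_DEFAULT
) -> list[Any]:
    """Strict: every value must be in ``old`` unless ``default`` is given."""
    out = []
    for v in values:
        if v in old:
            out.append(old[v])
        elif default is not _NO_DEFAULT:
            out.append(default)
        else:
            raise KeyError(f"{v!r} is not in the mapping and no default was given")
    return out


column = ["a", "b", "c", "d", "e"]
print(replace(column, {"a": 1, "b": 2, "c": 3}))  # [1, 2, 3, 'd', 'e']
print(replace_strict(column, {"a": 1, "b": 2, "c": 3}, default=None))  # [1, 2, 3, None, None]
```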
grizz.transformer.RobustScaler ¶
Bases: BaseInNOutNTransformer
Implement a transformer to scale each column using statistics that are robust to outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import RobustScaler
>>> transformer = RobustScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
RobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 0 ┆ a ┆ -1.0 ┆ -1.0 │
│ 1 ┆ 1 ┆ 10 ┆ b ┆ -0.6 ┆ -0.6 │
│ 2 ┆ 2 ┆ 20 ┆ c ┆ -0.2 ┆ -0.2 │
│ 3 ┆ 3 ┆ 30 ┆ d ┆ 0.2 ┆ 0.2 │
│ 4 ┆ 4 ┆ 40 ┆ e ┆ 0.6 ┆ 0.6 │
│ 5 ┆ 5 ┆ 50 ┆ f ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
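Robust scaling subtracts the median and divides by the interquartile range (IQR). The sketch below is a pure-Python version of that formula and reproduces the `col1_out` and `col3_out` columns above: for `col1` the median is 2.5 and the IQR is 3.75 - 1.25 = 2.5, so the first row is (0 - 2.5) / 2.5 = -1.0. It assumes the transformer follows scikit-learn's `RobustScaler` defaults (25th/75th percentiles with linear interpolation). That is an assumption, though the example output is consistent with it. `robust_scale` is a hypothetical helper, not part of grizz.

```python
import statistics
from collections.abc import Sequence


def robust_scale(values: Sequence[float]) -> list[float]:
    """Center on the median and divide by the interquartile range (IQR)."""
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule as NumPy's default percentile computation.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # constant column: avoid dividing by zero
    return [(v - median) / iqr for v in values]


print(robust_scale([0, 1, 2, 3, 4, 5]))  # [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
```

Using the median and IQR instead of the mean and standard deviation means a single extreme value barely moves the scaling parameters.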
grizz.transformer.RobustScalerTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to scale each column using statistics that are robust to outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import RobustScaler
>>> transformer = RobustScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
RobustScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 2, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [0, 10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ 0 ┆ a │
│ 1 ┆ 1 ┆ 10 ┆ b │
│ 2 ┆ 2 ┆ 20 ┆ c │
│ 3 ┆ 3 ┆ 30 ┆ d │
│ 4 ┆ 4 ┆ 40 ┆ e │
│ 5 ┆ 5 ┆ 50 ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 0 ┆ a ┆ -1.0 ┆ -1.0 │
│ 1 ┆ 1 ┆ 10 ┆ b ┆ -0.6 ┆ -0.6 │
│ 2 ┆ 2 ┆ 20 ┆ c ┆ -0.2 ┆ -0.2 │
│ 3 ┆ 3 ┆ 30 ┆ d ┆ 0.2 ┆ 0.2 │
│ 4 ┆ 4 ┆ 40 ┆ e ┆ 0.6 ┆ 0.6 │
│ 5 ┆ 5 ┆ 50 ┆ f ┆ 1.0 ┆ 1.0 │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.Sequential ¶
Bases: BaseTransformer
Implement a polars.DataFrame transformer that applies several transformers sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformers | Sequence[BaseTransformer \| dict] | The transformers or their configurations. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Sequential, InplaceCast
>>> transformer = Sequential(
... [
... InplaceCast(columns=["col1"], dtype=pl.Float32),
... InplaceCast(columns=["col2"], dtype=pl.Int64),
... ]
... )
>>> transformer
SequentialTransformer(
(0): InplaceCastTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', dtype=Float32)
(1): InplaceCastTransformer(columns=('col2',), exclude_columns=(), missing_policy='raise', dtype=Int64)
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1.0 ┆ 1 ┆ a ┆ a │
│ 2.0 ┆ 2 ┆ b ┆ b │
│ 3.0 ┆ 3 ┆ c ┆ c │
│ 4.0 ┆ 4 ┆ d ┆ d │
│ 5.0 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
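At its core, `Sequential` is function composition: each transformer receives the frame returned by the previous one, so the order of the list matters. The sketch below uses a dict of lists as a stand-in for a DataFrame. The helpers `apply_sequentially` and `cast_column` are hypothetical, and the sketch skips the dict-configuration path that `Sequential` also accepts.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

Frame = TypeVar("Frame")


def apply_sequentially(frame: Frame, steps: Sequence[Callable[[Frame], Frame]]) -> Frame:
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        frame = step(frame)
    return frame


def cast_column(name: str, fn: Callable) -> Callable[[dict], dict]:
    """Hypothetical step: return a copy of the frame with one column converted."""
    return lambda frame: {**frame, name: [fn(v) for v in frame[name]]}


frame = {"col1": [1, 2, 3], "col2": ["1", "2", "3"]}
out = apply_sequentially(frame, [cast_column("col1", float), cast_column("col2", int)])
print(out)  # {'col1': [1.0, 2.0, 3.0], 'col2': [1, 2, 3]}
```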
grizz.transformer.SequentialTransformer ¶
Bases: BaseTransformer
Implement a polars.DataFrame transformer that applies several transformers sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformers | Sequence[BaseTransformer \| dict] | The transformers or their configurations. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Sequential, InplaceCast
>>> transformer = Sequential(
... [
... InplaceCast(columns=["col1"], dtype=pl.Float32),
... InplaceCast(columns=["col2"], dtype=pl.Int64),
... ]
... )
>>> transformer
SequentialTransformer(
(0): InplaceCastTransformer(columns=('col1',), exclude_columns=(), missing_policy='raise', dtype=Float32)
(1): InplaceCastTransformer(columns=('col2',), exclude_columns=(), missing_policy='raise', dtype=Int64)
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1.0 ┆ 1 ┆ a ┆ a │
│ 2.0 ┆ 2 ┆ b ┆ b │
│ 3.0 ┆ 3 ┆ c ┆ c │
│ 4.0 ┆ 4 ┆ d ┆ d │
│ 5.0 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
grizz.transformer.ShrinkMemory ¶
Bases: BaseArgTransformer
Implement a transformer that shrinks DataFrame memory usage.
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ShrinkMemory
>>> transformer = ShrinkMemory()
>>> transformer
ShrinkMemoryTransformer()
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.ShrinkMemoryTransformer ¶
Bases: BaseArgTransformer
Implement a transformer that shrinks DataFrame memory usage.
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ShrinkMemory
>>> transformer = ShrinkMemory()
>>> transformer
ShrinkMemoryTransformer()
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.SimpleImputer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to impute missing values with simple strategies.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to impute. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
propagate_nulls | bool | If set to | True |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SimpleImputer
>>> transformer = SimpleImputer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
SimpleImputerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, None, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [float("nan"), 10, 20, 30, 40, None],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ NaN ┆ a │
│ 1 ┆ 1 ┆ 10.0 ┆ b │
│ null ┆ 2 ┆ 20.0 ┆ c │
│ 3 ┆ 3 ┆ 30.0 ┆ d │
│ 4 ┆ 4 ┆ 40.0 ┆ e │
│ 5 ┆ 5 ┆ null ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ NaN ┆ a ┆ 0.0 ┆ 25.0 │
│ 1 ┆ 1 ┆ 10.0 ┆ b ┆ 1.0 ┆ 10.0 │
│ null ┆ 2 ┆ 20.0 ┆ c ┆ null ┆ 20.0 │
│ 3 ┆ 3 ┆ 30.0 ┆ d ┆ 3.0 ┆ 30.0 │
│ 4 ┆ 4 ┆ 40.0 ┆ e ┆ 4.0 ┆ 40.0 │
│ 5 ┆ 5 ┆ null ┆ f ┆ 5.0 ┆ null │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
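In the example above, the NaN in `col3` becomes 25.0, which is the mean of the observed values 10, 20, 30 and 40. The nulls in `col1` and `col3` are kept because `propagate_nulls=True`. The pure-Python sketch below reproduces that behavior, treating `None` as a polars null. It assumes the mean strategy (scikit-learn's `SimpleImputer` default); this section does not say which backend or strategy grizz uses. `mean_impute` is a hypothetical helper.

```python
import math
from collections.abc import Sequence
from typing import Optional


def mean_impute(
    values: Sequence[Optional[float]], propagate_nulls: bool = True
) -> list[Optional[float]]:
    """Fill NaN (and None, unless nulls are propagated) with the mean of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = sum(observed) / len(observed)
    out: list[Optional[float]] = []
    for v in values:
        if v is None:
            # Null positions stay null when propagate_nulls is True.
            out.append(None if propagate_nulls else fill)
        elif math.isnan(v):
            out.append(fill)
        else:
            out.append(float(v))
    return out


print(mean_impute([float("nan"), 10, 20, 30, 40, None]))  # [25.0, 10.0, 20.0, 30.0, 40.0, None]
```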
grizz.transformer.SimpleImputerTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to impute missing values with simple strategies.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to impute. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
propagate_nulls | bool | If set to | True |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SimpleImputer
>>> transformer = SimpleImputer(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
SimpleImputerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, None, 3, 4, 5],
... "col2": ["0", "1", "2", "3", "4", "5"],
... "col3": [float("nan"), 10, 20, 30, 40, None],
... "col4": ["a", "b", "c", "d", "e", "f"],
... }
... )
>>> frame
shape: (6, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 0 ┆ 0 ┆ NaN ┆ a │
│ 1 ┆ 1 ┆ 10.0 ┆ b │
│ null ┆ 2 ┆ 20.0 ┆ c │
│ 3 ┆ 3 ┆ 30.0 ┆ d │
│ 4 ┆ 4 ┆ 40.0 ┆ e │
│ 5 ┆ 5 ┆ null ┆ f │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (6, 6)
┌──────┬──────┬──────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ f64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ NaN ┆ a ┆ 0.0 ┆ 25.0 │
│ 1 ┆ 1 ┆ 10.0 ┆ b ┆ 1.0 ┆ 10.0 │
│ null ┆ 2 ┆ 20.0 ┆ c ┆ null ┆ 20.0 │
│ 3 ┆ 3 ┆ 30.0 ┆ d ┆ 3.0 ┆ 30.0 │
│ 4 ┆ 4 ┆ 40.0 ┆ e ┆ 4.0 ┆ 40.0 │
│ 5 ┆ 5 ┆ null ┆ f ┆ 5.0 ┆ null │
└──────┴──────┴──────┴──────┴──────────┴──────────┘
grizz.transformer.Sort ¶
Bases: BaseInNTransformer
Implement a transformer to sort the DataFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to use to sort the rows. | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴──────┴──────┘
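Sorting by `["col3", "col1"]` orders rows by `col3` first and only uses `col1` to break ties. The pure-Python sketch below shows that multi-key ordering over a list of row dicts. It assumes polars' default of placing nulls first (`nulls_last=False`), and `sort_rows` is a hypothetical helper.

```python
from collections.abc import Sequence
from typing import Any


def sort_rows(rows: Sequence[dict[str, Any]], by: Sequence[str]) -> list[dict[str, Any]]:
    """Sort rows lexicographically by the columns in ``by``, placing None first."""

    def key(row: dict[str, Any]) -> tuple:
        # (False, 0) sorts before (True, value), so None comes first and is
        # never compared directly with a real value.
        return tuple((False, 0) if row[c] is None else (True, row[c]) for c in by)

    return sorted(rows, key=key)  # sorted() is stable, so full ties keep input order


rows = [
    {"col1": 1, "col2": 6.0, "col3": "a"},
    {"col1": 2, "col2": 5.0, "col3": "c"},
    {"col1": None, "col2": 4.0, "col3": "b"},
]
print([r["col3"] for r in sort_rows(rows, ["col3", "col1"])])  # ['a', 'b', 'c']
```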
grizz.transformer.SortColumns ¶
Bases: BaseArgTransformer
Implement a transformer to sort the DataFrame columns by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reverse | bool | If set to | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
... {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞══════╪══════╪══════╡
│ a ┆ 1 ┆ 6.0 │
│ c ┆ 2 ┆ 5.0 │
│ b ┆ null ┆ 4.0 │
└──────┴──────┴──────┘
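Sorting columns by name only reorders them; the values in each column are untouched. The sketch below uses a dict of lists as a stand-in for a DataFrame and relies on Python dicts preserving insertion order. It assumes `reverse=True` gives descending name order, because the parameter's description above is truncated. `sort_columns` is a hypothetical helper.

```python
def sort_columns(frame: dict[str, list], reverse: bool = False) -> dict[str, list]:
    """Return a copy of the frame whose columns are ordered by name."""
    # Dicts preserve insertion order (Python 3.7+), so rebuilding the dict in
    # sorted key order reorders the columns without copying their values.
    return {name: frame[name] for name in sorted(frame, reverse=reverse)}


frame = {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
print(list(sort_columns(frame)))  # ['col1', 'col2', 'col3']
```

The names are compared as plain strings, so in this sketch `col10` sorts before `col2`.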
grizz.transformer.SortColumnsTransformer ¶
Bases: BaseArgTransformer
Implement a transformer to sort the DataFrame columns by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reverse | bool | If set to | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
... {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞══════╪══════╪══════╡
│ a ┆ 1 ┆ 6.0 │
│ c ┆ 2 ┆ 5.0 │
│ b ┆ null ┆ 4.0 │
└──────┴──────┴──────┘
grizz.transformer.SortTransformer ¶
Bases: BaseInNTransformer
Implement a transformer to sort the DataFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to use to sort the rows. | None |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'), exclude_columns=(), missing_policy='raise')
>>> frame = pl.DataFrame(
... {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴──────┴──────┘
grizz.transformer.SqlTransformer ¶
Bases: BaseArgTransformer
Implement a transformer that executes a SQL query against the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | The SQL query to execute. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SqlTransformer
>>> transformer = SqlTransformer(query="SELECT col1, col4 FROM self WHERE col1 > 2")
>>> transformer
SqlTransformer(
(query): SELECT col1, col4 FROM self WHERE col1 > 2
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col4 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
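The query refers to the input frame as `self`, which matches the default table name polars uses in `DataFrame.sql`. To show the same query outside polars, the sketch below loads the rows into an in-memory SQLite table named `self` using Python's standard `sqlite3` module. This is only an analogy: polars runs its own SQL engine, and the two dialects differ. `query_rows` is a hypothetical helper.

```python
import sqlite3
from collections.abc import Sequence
from typing import Any


def query_rows(
    columns: Sequence[str], rows: Sequence[tuple], query: str, table: str = "self"
) -> list[tuple[Any, ...]]:
    """Load rows into an in-memory SQLite table and run ``query`` against it."""
    con = sqlite3.connect(":memory:")
    try:
        # Identifiers are interpolated, so only pass trusted column/table names.
        col_defs = ", ".join(f'"{c}"' for c in columns)
        con.execute(f'CREATE TABLE "{table}" ({col_defs})')
        placeholders = ", ".join("?" for _ in columns)
        con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


columns = ["col1", "col2", "col3", "col4"]
rows = [
    (1, "1", "1", "a"),
    (2, "2", "2", "b"),
    (3, "3", "3", "c"),
    (4, "4", "4", "d"),
    (5, "5", "5", "e"),
]
print(query_rows(columns, rows, "SELECT col1, col4 FROM self WHERE col1 > 2"))
```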
grizz.transformer.StandardScaler ¶
Bases: BaseInNOutNTransformer
Implement a transformer to standardize each column by removing the mean and scaling to unit variance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StandardScaler
>>> transformer = StandardScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
StandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ -1.414214 ┆ -1.414214 │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ -0.707107 ┆ -0.707107 │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ 0.0 ┆ 0.0 │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ 0.707107 ┆ 0.707107 │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ 1.414214 ┆ 1.414214 │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
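The values of ±1.414214 above correspond to the population standard deviation (ddof=0). For 1..5 the mean is 3 and the standard deviation is √2, so (1 - 3)/√2 ≈ -1.414214. With the sample standard deviation (ddof=1), the first value would instead be about -1.264911. The pure-Python sketch below implements that z-score formula. `standard_scale` is a hypothetical helper, and its zero-variance guard follows scikit-learn's behavior as an assumption.

```python
import statistics
from collections.abc import Sequence


def standard_scale(values: Sequence[float]) -> list[float]:
    """Subtract the mean and divide by the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        std = 1.0  # constant column: avoid dividing by zero
    return [(v - mean) / std for v in values]


print([round(z, 6) for z in standard_scale([1, 2, 3, 4, 5])])
# [-1.414214, -0.707107, 0.0, 0.707107, 1.414214]
```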
grizz.transformer.StandardScalerTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to standardize each column by removing the mean and scaling to unit variance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to scale. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
propagate_nulls | bool | If set to | True |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StandardScaler
>>> transformer = StandardScaler(columns=["col1", "col3"], prefix="", suffix="_out")
>>> transformer
StandardScalerTransformer(columns=('col1', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', propagate_nulls=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [10, 20, 30, 40, 50],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 10 ┆ a │
│ 2 ┆ 2 ┆ 20 ┆ b │
│ 3 ┆ 3 ┆ 30 ┆ c │
│ 4 ┆ 4 ┆ 40 ┆ d │
│ 5 ┆ 5 ┆ 50 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.fit_transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬──────┬──────┬───────────┬───────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col1_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞══════╪══════╪══════╪══════╪═══════════╪═══════════╡
│ 1 ┆ 1 ┆ 10 ┆ a ┆ -1.414214 ┆ -1.414214 │
│ 2 ┆ 2 ┆ 20 ┆ b ┆ -0.707107 ┆ -0.707107 │
│ 3 ┆ 3 ┆ 30 ┆ c ┆ 0.0 ┆ 0.0 │
│ 4 ┆ 4 ┆ 40 ┆ d ┆ 0.707107 ┆ 0.707107 │
│ 5 ┆ 5 ┆ 50 ┆ e ┆ 1.414214 ┆ 1.414214 │
└──────┴──────┴──────┴──────┴───────────┴───────────┘
grizz.transformer.StringToDatetime ¶
Bases: BaseInNOutNTransformer
Implement a transformer to convert some string columns to polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The string columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StringToDatetime
>>> transformer = StringToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
StringToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ datetime[μs] │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
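The output column name is built from `prefix`, the input name and `suffix`. With `prefix=""` and `suffix="_out"`, `col1` becomes `col1_out`, and the input column is kept. The pure-Python sketch below follows that pattern using `datetime.strptime`. The explicit format string is an assumption: the example above passes no format, so polars presumably infers it. `strings_to_datetime` is a hypothetical helper.

```python
from collections.abc import Sequence
from datetime import datetime
from typing import Optional


def strings_to_datetime(
    frame: dict[str, list[Optional[str]]],
    columns: Sequence[str],
    prefix: str = "",
    suffix: str = "_out",
    fmt: str = "%Y-%m-%d %H:%M:%S",
) -> dict[str, list]:
    """Parse each selected column and store it under prefix + name + suffix."""
    out = dict(frame)  # shallow copy: the input columns are kept as they are
    for col in columns:
        out[f"{prefix}{col}{suffix}"] = [
            None if v is None else datetime.strptime(v, fmt) for v in frame[col]
        ]
    return out


frame = {"col1": ["2020-01-01 01:01:01", None], "col2": ["1", "2"]}
out = strings_to_datetime(frame, columns=["col1"])
print(list(out))  # ['col1', 'col2', 'col1_out']
print(out["col1_out"][0])  # 2020-01-01 01:01:01
```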
grizz.transformer.StringToDatetimeTransformer ¶
Bases: BaseInNOutNTransformer
Implement a transformer to convert some string columns to polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The string columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StringToDatetime
>>> transformer = StringToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
StringToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ datetime[μs] │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
grizz.transformer.StringToTime ¶
Bases: BaseInNOutNTransformer
Implement a transformer to convert some string columns to polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The string columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StringToTime
>>> transformer = StringToTime(
... columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out"
... )
>>> transformer
StringToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ time │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
grizz.transformer.StringToTimeTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to convert some string columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The string columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StringToTime
>>> transformer = StringToTime(
... columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out"
... )
>>> transformer
StringToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ time │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
grizz.transformer.StripChars
Bases: BaseInNOutNTransformer
Implement a transformer to remove leading and trailing characters.
This transformer ignores the columns that are not of type string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"], prefix="", suffix="_out")
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬───────┬───────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ a ┆ a ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ b ┆ b ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ c ┆ c ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ d ┆ d ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ e ┆ e ┆ 5 ┆ e │
└──────┴──────┴───────┴───────┴──────────┴──────────┘
grizz.transformer.StripCharsTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to remove leading and trailing characters.
This transformer ignores the columns that are not of type string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"], prefix="", suffix="_out")
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 6)
┌──────┬──────┬───────┬───────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col2_out ┆ col3_out │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╪══════════╪══════════╡
│ 1 ┆ 1 ┆ a ┆ a ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ b ┆ b ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ c ┆ c ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ d ┆ d ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ e ┆ e ┆ 5 ┆ e │
└──────┴──────┴───────┴───────┴──────────┴──────────┘
grizz.transformer.SumHorizontal
Bases: BaseInNOut1Transformer
Implement a transformer to sum all values horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to sum. The columns should have compatible data types. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SumHorizontal
>>> transformer = SumHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
SumHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ 63 │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ 66 │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ 69 │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ 72 │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ 75 │
└──────┴──────┴──────┴──────┴─────┘
grizz.transformer.SumHorizontalTransformer
Bases: BaseInNOut1Transformer
Implement a transformer to sum all values horizontally across columns and store the result in a column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to sum. The columns should have compatible data types. If | required |
out_col | str | The output column name. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | Additional arguments passed to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SumHorizontal
>>> transformer = SumHorizontal(columns=["col1", "col2", "col3"], out_col="col")
>>> transformer
SumHorizontalTransformer(columns=('col1', 'col2', 'col3'), out_col='col', exclude_columns=(), exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬─────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ 63 │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ 66 │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ 69 │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ 72 │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ 75 │
└──────┴──────┴──────┴──────┴─────┘
grizz.transformer.TimeDiff
Bases: BaseArgTransformer
Implement a transformer to compute the time difference between consecutive time steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_cols | Sequence[str] | The columns used to generate the group for each sequence. | required |
time_col | str | The input time column name. | required |
time_diff_col | str | The output time difference column name. | required |
shift | int | The number of slots (time steps) to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col='time', time_diff_col='diff', shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ a ┆ 3 │
│ a ┆ 4 │
│ b ┆ 5 │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 3 ┆ 2 │
│ a ┆ 4 ┆ 1 │
│ b ┆ 2 ┆ 0 │
│ b ┆ 5 ┆ 3 │
└─────┴──────┴──────┘
grizz.transformer.TimeDiffTransformer
Bases: BaseArgTransformer
Implement a transformer to compute the time difference between consecutive time steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_cols | Sequence[str] | The columns used to generate the group for each sequence. | required |
time_col | str | The input time column name. | required |
time_diff_col | str | The output time difference column name. | required |
shift | int | The number of slots (time steps) to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col='time', time_diff_col='diff', shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ a ┆ 3 │
│ a ┆ 4 │
│ b ┆ 5 │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 3 ┆ 2 │
│ a ┆ 4 ┆ 1 │
│ b ┆ 2 ┆ 0 │
│ b ┆ 5 ┆ 3 │
└─────┴──────┴──────┘
grizz.transformer.TimeToSecond
Bases: BaseIn1Out1Transformer
Implement a transformer to convert a column with time values to seconds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column with the time value to convert. | required |
out_col | str | The output column with the time in seconds. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col='time', out_col='second', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "time": [
... datetime.time(0, 0, 1, 890000),
... datetime.time(0, 1, 1, 890000),
... datetime.time(1, 1, 1, 890000),
... datetime.time(0, 19, 19, 890000),
... datetime.time(19, 19, 19, 890000),
... ],
... "col": ["a", "b", "c", "d", "e"],
... },
... schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time ┆ col │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a │
│ 00:01:01.890 ┆ b │
│ 01:01:01.890 ┆ c │
│ 00:19:19.890 ┆ d │
│ 19:19:19.890 ┆ e │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time ┆ col ┆ second │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ f64 │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a ┆ 1.89 │
│ 00:01:01.890 ┆ b ┆ 61.89 │
│ 01:01:01.890 ┆ c ┆ 3661.89 │
│ 00:19:19.890 ┆ d ┆ 1159.89 │
│ 19:19:19.890 ┆ e ┆ 69559.89 │
└──────────────┴─────┴──────────┘
grizz.transformer.TimeToSecondTransformer
Bases: BaseIn1Out1Transformer
Implement a transformer to convert a column with time values to seconds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column with the time value to convert. | required |
out_col | str | The output column with the time in seconds. | required |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
Example usage:
>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col='time', out_col='second', exist_policy='raise', missing_policy='raise')
>>> frame = pl.DataFrame(
... {
... "time": [
... datetime.time(0, 0, 1, 890000),
... datetime.time(0, 1, 1, 890000),
... datetime.time(1, 1, 1, 890000),
... datetime.time(0, 19, 19, 890000),
... datetime.time(19, 19, 19, 890000),
... ],
... "col": ["a", "b", "c", "d", "e"],
... },
... schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time ┆ col │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a │
│ 00:01:01.890 ┆ b │
│ 01:01:01.890 ┆ c │
│ 00:19:19.890 ┆ d │
│ 19:19:19.890 ┆ e │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time ┆ col ┆ second │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ f64 │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a ┆ 1.89 │
│ 00:01:01.890 ┆ b ┆ 61.89 │
│ 01:01:01.890 ┆ c ┆ 3661.89 │
│ 00:19:19.890 ┆ d ┆ 1159.89 │
│ 19:19:19.890 ┆ e ┆ 69559.89 │
└──────────────┴─────┴──────────┘
grizz.transformer.ToDatetime
Bases: BaseInNOutNTransformer
Implement a transformer to convert some columns to the polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
ToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ datetime[μs] │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
grizz.transformer.ToDatetimeTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to convert some columns to the polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"], prefix="", suffix="_out")
>>> transformer
ToDatetimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out')
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌─────────────────────┬──────┬─────────────────────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ datetime[μs] │
╞═════════════════════╪══════╪═════════════════════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 ┆ 2020-01-01 01:01:01 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 ┆ 2020-01-01 02:02:02 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 ┆ 2020-01-01 12:00:01 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 ┆ 2020-01-01 18:18:18 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 ┆ 2020-01-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┴─────────────────────┘
grizz.transformer.ToTime
Bases: BaseInNOutNTransformer
Implement a transformer to convert some columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out")
>>> transformer
ToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ time │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
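The format argument in the example uses strftime-style directives (%H hours, %M minutes, %S seconds). A stdlib-only sketch of what that pattern accepts, using datetime.strptime as a stand-in for the polars parser; the two agree on these basic directives, but polars does its own parsing, so edge cases may differ.

```python
import datetime

TIME_FORMAT = "%H:%M:%S"  # same pattern as in the example above

values = ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"]
# strptime returns a datetime; .time() keeps only the time-of-day part.
parsed = [datetime.datetime.strptime(value, TIME_FORMAT).time() for value in values]
print(parsed[0])  # 01:01:01

# %H only accepts hours 00-23, so an out-of-range hour is rejected.
try:
    datetime.datetime.strptime("24:00:00", TIME_FORMAT)
except ValueError as exc:
    print("rejected:", exc)
```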
grizz.transformer.ToTimeTransformer
Bases: BaseInNOutNTransformer
Implement a transformer to convert some columns to the polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
prefix | str | The column name prefix for the output columns. | required |
suffix | str | The column name suffix for the output columns. | required |
exclude_columns | Sequence[str] | The columns to exclude from the input columns. | () |
exist_policy | str | The policy on how to handle existing columns. The following options are available: | 'raise' |
missing_policy | str | The policy on how to handle missing columns. The following options are available: | 'raise' |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S", prefix="", suffix="_out")
>>> transformer
ToTimeTransformer(columns=('col1',), exclude_columns=(), exist_policy='raise', missing_policy='raise', prefix='', suffix='_out', format='%H:%M:%S')
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────────┬──────┬──────────┬──────────┐
│ col1 ┆ col2 ┆ col3 ┆ col1_out │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ time │
╞══════════╪══════╪══════════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 ┆ 23:59:59 │
└──────────┴──────┴──────────┴──────────┘
grizz.transformer.is_transformer_config
is_transformer_config(config: dict) -> bool
Indicate if the input configuration is a configuration for a BaseTransformer.
This function only checks if the value of the key _target_
is valid. It does not check the other values. If _target_
indicates a function, the returned type hint is used to check
the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | dict | The configuration to check. | required |
Returns:
Type | Description |
---|---|
bool | True if the input configuration is a configuration for a BaseTransformer, otherwise False. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import is_transformer_config
>>> is_transformer_config(
... {
... "_target_": "grizz.transformer.InplaceCast",
... "columns": ("col1", "col3"),
... "dtype": pl.Int32,
... }
... )
True
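The rule described above (only _target_ is inspected; a class is checked directly, and a function is checked through its return type hint) can be sketched in pure Python. This is an illustration under that reading, not grizz's code, which may rely on a different resolver; resolve_target and is_config_for are hypothetical helpers, tested here with standard-library classes.

```python
import importlib
import typing
from typing import Any


def resolve_target(path: str) -> Any:
    # Split "package.module.Name" into a module path and an attribute name.
    module_name, _, attr_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def is_config_for(config: dict, base: type) -> bool:
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = resolve_target(target)
    except (ImportError, AttributeError, ValueError):
        # Unknown module or attribute, or a name without a module part.
        return False
    if isinstance(obj, type):
        return issubclass(obj, base)
    if callable(obj):
        # A factory function: fall back to its return type hint, if any.
        try:
            hint = typing.get_type_hints(obj).get("return")
        except (TypeError, NameError):
            return False
        return isinstance(hint, type) and issubclass(hint, base)
    return False


print(is_config_for({"_target_": "collections.OrderedDict"}, dict))  # True
print(is_config_for({"_target_": "collections.Counter", "x": 1}, list))  # False
```

As in the documented function, keys other than _target_ (such as "x" above) are ignored.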
grizz.transformer.setup_transformer
setup_transformer(
    transformer: BaseTransformer | dict,
) -> BaseTransformer
Set up a polars.DataFrame transformer.
The transformer is instantiated from its configuration by using the BaseTransformer factory function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer | BaseTransformer \| dict | Specifies a | required |
Returns:
Type | Description |
---|---|
BaseTransformer | An instantiated transformer. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import setup_transformer
>>> transformer = setup_transformer(
... {
... "_target_": "grizz.transformer.InplaceCast",
... "columns": ("col1", "col3"),
... "dtype": pl.Int32,
... }
... )
>>> transformer
InplaceCastTransformer(columns=('col1', 'col3'), exclude_columns=(), missing_policy='raise', dtype=Int32)
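setup_transformer accepts either a ready-made transformer or a configuration dict. The dispatch can be sketched in pure Python as below; setup_object and resolve_target are hypothetical helpers, and the documented function instantiates objects through the BaseTransformer factory rather than this hand-rolled import logic.

```python
import importlib
from typing import Any


def resolve_target(path: str) -> Any:
    # Split "package.module.Name" into a module path and an attribute name.
    module_name, _, attr_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def setup_object(obj_or_config: Any) -> Any:
    # A dict is treated as a configuration: "_target_" names the class or
    # factory, and every other key is passed as a keyword argument.
    if isinstance(obj_or_config, dict):
        config = dict(obj_or_config)  # copy so the caller's dict is not modified
        factory = resolve_target(config.pop("_target_"))
        return factory(**config)
    # Anything else is assumed to be an already instantiated object.
    return obj_or_config


counter = setup_object({"_target_": "collections.Counter", "a": 2, "b": 1})
print(counter)  # Counter({'a': 2, 'b': 1})
```

Returning non-dict inputs unchanged lets callers pass either form, which mirrors the BaseTransformer | dict parameter type above.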