transformer
grizz.transformer ¶
Contain polars.DataFrame transformers.
grizz.transformer.BaseColumnsTransformer ¶
Bases: BaseTransformer
Define a base class to implement transformers that apply the same transformation on multiple columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
grizz.transformer.BaseColumnsTransformer.find_columns ¶
find_columns(frame: DataFrame) -> tuple[str, ...]
Find the columns to transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame. | required |
Returns:
Type | Description |
---|---|
tuple[str, ...] | The columns to transform. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer.find_columns(frame)
('col2', 'col3')
>>> transformer = StripChars()
>>> transformer.find_columns(frame)
('col1', 'col2', 'col3', 'col4')
grizz.transformer.BaseColumnsTransformer.find_common_columns ¶
find_common_columns(frame: DataFrame) -> tuple[str, ...]
Find the common columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame. | required |
Returns:
Type | Description |
---|---|
tuple[str, ...] | The common columns. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> transformer = StripChars(columns=["col2", "col3", "col5"])
>>> transformer.find_common_columns(frame)
('col2', 'col3')
>>> transformer = StripChars()
>>> transformer.find_common_columns(frame)
('col1', 'col2', 'col3', 'col4')
grizz.transformer.BaseColumnsTransformer.find_missing_columns ¶
find_missing_columns(frame: DataFrame) -> tuple[str, ...]
Find the missing columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. Sometimes the columns to transform are found by analyzing the input DataFrame. | required |
Returns:
Type | Description |
---|---|
tuple[str, ...] | The missing columns. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> transformer = StripChars(columns=["col2", "col3", "col5"])
>>> transformer.find_missing_columns(frame)
('col5',)
>>> transformer = StripChars()
>>> transformer.find_missing_columns(frame)
()
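The relationship between find_common_columns and find_missing_columns can be sketched in plain Python. This is only an illustration of the behavior shown in the examples above, not the library's implementation: the helper name split_columns and its arguments are hypothetical, and preserving the requested order is an assumption.

```python
from __future__ import annotations

from collections.abc import Sequence


def split_columns(
    requested: Sequence[str] | None, available: Sequence[str]
) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Return the (common, missing) column names.

    Hypothetical helper: ``requested=None`` is treated as
    "all available columns", mirroring the examples above.
    """
    if requested is None:
        requested = available
    present = set(available)
    # Requested columns that exist in the frame.
    common = tuple(col for col in requested if col in present)
    # Requested columns that the frame does not have.
    missing = tuple(col for col in requested if col not in present)
    return common, missing


frame_columns = ["col1", "col2", "col3", "col4"]
print(split_columns(["col2", "col3", "col5"], frame_columns))
# (('col2', 'col3'), ('col5',))
print(split_columns(None, frame_columns))
# (('col1', 'col2', 'col3', 'col4'), ())
```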
grizz.transformer.BaseTransformer ¶
Bases: ABC
Define the base class to transform a polars.DataFrame.
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.BaseTransformer.transform ¶
transform(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to transform. | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.Cast ¶
Bases: BaseColumnsTransformer
Implement a transformer to convert some columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.CastTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to convert some columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.ColumnSelection ¶
Bases: BaseColumnsTransformer
Implement a polars.DataFrame transformer to select a subset of columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to keep. | required |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, 2, 3, 4, 5],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ 2 │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ 4 │
│ null ┆ 5 │
└────────────┴──────┘
grizz.transformer.ColumnSelectionTransformer ¶
Bases: BaseColumnsTransformer
Implement a polars.DataFrame transformer to select a subset of columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to keep. | required |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ColumnSelection
>>> transformer = ColumnSelection(columns=["col1", "col2"])
>>> transformer
ColumnSelectionTransformer(columns=('col1', 'col2'), ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, 2, 3, 4, 5],
... "col3": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ 2 │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ 4 │
│ null ┆ 5 │
└────────────┴──────┘
grizz.transformer.ConcatColumns ¶
Bases: BaseColumnsTransformer
Implement a transformer to concatenate columns into a new column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to concatenate. The columns should have the same type or compatible types. | required |
out_column | str | The output column. | required |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_column="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_column=col, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ list[i64] │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ [11, 21, 31] │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ [12, 22, 32] │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ [13, 23, 33] │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ [14, 24, 34] │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
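The row-wise concatenation shown above can be sketched without polars. This is a minimal illustration, assuming the frame is represented as a plain dict of equal-length lists; concat_columns is a hypothetical helper and not part of grizz.

```python
from __future__ import annotations


def concat_columns(
    data: dict[str, list], columns: list[str], out_column: str
) -> dict[str, list]:
    """Add ``out_column`` holding, for each row, the list of values
    taken from ``columns`` (hypothetical helper for illustration)."""
    # zip(*...) walks the selected columns row by row.
    rows = zip(*(data[col] for col in columns))
    out = dict(data)  # shallow copy so the input mapping is not modified
    out[out_column] = [list(row) for row in rows]
    return out


data = {
    "col1": [11, 12, 13],
    "col2": [21, 22, 23],
    "col3": [31, 32, 33],
    "col4": ["a", "b", "c"],
}
print(concat_columns(data, ["col1", "col2", "col3"], "col")["col"])
# [[11, 21, 31], [12, 22, 32], [13, 23, 33]]
```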
grizz.transformer.ConcatColumnsTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to concatenate columns into a new column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to concatenate. The columns should have the same type or compatible types. | required |
out_column | str | The output column. | required |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ConcatColumns
>>> transformer = ConcatColumns(columns=["col1", "col2", "col3"], out_column="col")
>>> transformer
ConcatColumnsTransformer(columns=('col1', 'col2', 'col3'), out_column=col, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [11, 12, 13, 14, 15],
... "col2": [21, 22, 23, 24, 25],
... "col3": [31, 32, 33, 34, 35],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 11 ┆ 21 ┆ 31 ┆ a │
│ 12 ┆ 22 ┆ 32 ┆ b │
│ 13 ┆ 23 ┆ 33 ┆ c │
│ 14 ┆ 24 ┆ 34 ┆ d │
│ 15 ┆ 25 ┆ 35 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 5)
┌──────┬──────┬──────┬──────┬──────────────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ list[i64] │
╞══════╪══════╪══════╪══════╪══════════════╡
│ 11 ┆ 21 ┆ 31 ┆ a ┆ [11, 21, 31] │
│ 12 ┆ 22 ┆ 32 ┆ b ┆ [12, 22, 32] │
│ 13 ┆ 23 ┆ 33 ┆ c ┆ [13, 23, 33] │
│ 14 ┆ 24 ┆ 34 ┆ d ┆ [14, 24, 34] │
│ 15 ┆ 25 ┆ 35 ┆ e ┆ [15, 25, 35] │
└──────┴──────┴──────┴──────┴──────────────┘
grizz.transformer.DecimalCast ¶
Bases: CastTransformer
Implement a transformer to convert columns of type decimal to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Decimal,
... "col3": pl.Decimal,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬───────────────┬───────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[38,0] ┆ decimal[38,0] ┆ str │
╞══════╪═══════════════╪═══════════════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴───────────────┴───────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ decimal[38,0] ┆ str │
╞══════╪══════╪═══════════════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴───────────────┴──────┘
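In the example above, col1 is listed in columns but keeps its i64 type, which suggests that only the requested columns whose type is decimal are converted. A minimal sketch of that filtering step in plain Python, assuming a schema given as a hypothetical mapping from column name to a dtype string; columns_to_cast is an illustrative helper, not part of grizz.

```python
from __future__ import annotations


def columns_to_cast(
    schema: dict[str, str], columns: list[str], source_dtype: str = "decimal"
) -> tuple[str, ...]:
    """Keep only the requested columns whose dtype matches ``source_dtype``.

    The dtype strings are placeholders chosen for this sketch.
    """
    return tuple(col for col in columns if schema.get(col) == source_dtype)


schema = {"col1": "i64", "col2": "decimal", "col3": "decimal", "col4": "str"}
print(columns_to_cast(schema, ["col1", "col2"]))
# ('col2',)
```

Here col3 is decimal but is not requested, and col1 is requested but is not decimal, so only col2 is selected for conversion.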
grizz.transformer.DecimalCastTransformer ¶
Bases: CastTransformer
Implement a transformer to convert columns of type decimal to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DecimalCast
>>> transformer = DecimalCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
DecimalCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Decimal,
... "col3": pl.Decimal,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬───────────────┬───────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ decimal[38,0] ┆ decimal[38,0] ┆ str │
╞══════╪═══════════════╪═══════════════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴───────────────┴───────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ decimal[38,0] ┆ str │
╞══════╪══════╪═══════════════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴───────────────┴──────┘
grizz.transformer.Diff ¶
Bases: BaseTransformer
Implement a transformer to compute the first discrete difference between shifted items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 1 ┆ a │
│ 2 ┆ b │
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ null │
│ 2 ┆ b ┆ 1 │
│ 3 ┆ c ┆ 1 │
│ 4 ┆ d ┆ 1 │
│ 5 ┆ e ┆ 1 │
└──────┴──────┴──────┘
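The shifted difference can be written out in plain Python to make the role of shift explicit. This sketch assumes a positive shift and uses None where no earlier item exists, matching the null in the first row above; shifted_diff is a hypothetical helper, not the library's code.

```python
from __future__ import annotations


def shifted_diff(values: list[int], shift: int = 1) -> list[int | None]:
    """Return values[i] - values[i - shift], or None for the first
    ``shift`` items, which have no earlier item to compare against."""
    return [
        None if i < shift else values[i] - values[i - shift]
        for i in range(len(values))
    ]


print(shifted_diff([1, 2, 3, 4, 5]))
# [None, 1, 1, 1, 1]
print(shifted_diff([1, 2, 3, 4, 5], shift=2))
# [None, None, 2, 2, 2]
```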
grizz.transformer.DiffTransformer ¶
Bases: BaseTransformer
Implement a transformer to compute the first discrete difference between shifted items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 1 ┆ a │
│ 2 ┆ b │
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ null │
│ 2 ┆ b ┆ 1 │
│ 3 ┆ c ┆ 1 │
│ 4 ┆ d ┆ 1 │
│ 5 ┆ e ┆ 1 │
└──────┴──────┴──────┘
grizz.transformer.DropDuplicate ¶
Bases: BaseColumnsTransformer
Implement a transformer to drop duplicate rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, ignore_missing=False, keep=first, maintain_order=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 1],
... "col2": ["1", "2", "3", "4", "1"],
... "col3": ["1", "2", "3", "1", "1"],
... "col4": ["a", "a", "a", "a", "a"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
│ 1 ┆ 1 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
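The effect of keep="first" with maintain_order=True in the example above can be sketched on a list of row dicts. This is a minimal illustration under the assumption that columns=None means every column is compared; drop_duplicates is a hypothetical helper, not the library's implementation.

```python
from __future__ import annotations


def drop_duplicates(
    rows: list[dict], columns: list[str] | None = None
) -> list[dict]:
    """Keep the first occurrence of each row, preserving input order.

    Rows are compared on ``columns`` or, if None, on all their keys.
    """
    seen = set()
    out = []
    for row in rows:
        keys = columns if columns is not None else list(row)
        key = tuple(row[col] for col in keys)
        if key not in seen:  # first time this combination appears
            seen.add(key)
            out.append(row)
    return out


rows = [
    {"col1": 1, "col2": "1", "col3": "1", "col4": "a"},
    {"col1": 2, "col2": "2", "col3": "2", "col4": "a"},
    {"col1": 3, "col2": "3", "col3": "3", "col4": "a"},
    {"col1": 4, "col2": "4", "col3": "1", "col4": "a"},
    {"col1": 1, "col2": "1", "col3": "1", "col4": "a"},
]
print(len(drop_duplicates(rows)))
# 4
```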
grizz.transformer.DropDuplicateTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to drop duplicate rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropDuplicate
>>> transformer = DropDuplicate(keep="first", maintain_order=True)
>>> transformer
DropDuplicateTransformer(columns=None, ignore_missing=False, keep=first, maintain_order=True)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 1],
... "col2": ["1", "2", "3", "4", "1"],
... "col3": ["1", "2", "3", "1", "1"],
... "col4": ["a", "a", "a", "a", "a"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
│ 1 ┆ 1 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ a │
│ 3 ┆ 3 ┆ 3 ┆ a │
│ 4 ┆ 4 ┆ 1 ┆ a │
└──────┴──────┴──────┴──────┘
grizz.transformer.DropNullColumn ¶
Bases: BaseColumnsTransformer
Implement a transformer to remove the columns that have too many null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. | None |
threshold | float | The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold value, the column is removed. If set to | 1.0 |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, threshold=1.0, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, 5],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ 5 ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ null │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ null │
│ null ┆ 5 │
└────────────┴──────┘
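The threshold rule described above (a column is removed when its proportion of null values is greater than or equal to the threshold) can be sketched in plain Python. The frame is assumed to be a dict of lists with None standing for null; drop_null_columns is a hypothetical helper, and treating an empty column as having no nulls is an assumption of this sketch.

```python
from __future__ import annotations


def drop_null_columns(
    data: dict[str, list], threshold: float = 1.0
) -> dict[str, list]:
    """Drop the columns whose fraction of None values is >= ``threshold``."""
    out = {}
    for name, values in data.items():
        # Assumption: an empty column counts as having no nulls.
        fraction = sum(v is None for v in values) / len(values) if values else 0.0
        if fraction < threshold:
            out[name] = values
    return out


data = {
    "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],  # 20% null
    "col2": [1, None, 3, None, 5],  # 40% null
    "col3": [None, None, None, None, None],  # 100% null
}
print(list(drop_null_columns(data)))
# ['col1', 'col2']
print(list(drop_null_columns(data, threshold=0.4)))
# ['col1']
```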
grizz.transformer.DropNullColumnTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to remove the columns that have too many null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. | None |
threshold | float | The maximum proportion of null values allowed to keep a column. If the proportion of null values is greater than or equal to this threshold value, the column is removed. If set to | 1.0 |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullColumn
>>> transformer = DropNullColumn()
>>> transformer
DropNullColumnTransformer(columns=None, threshold=1.0, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, 5],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ 5 ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌────────────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════════╪══════╡
│ 2020-1-1 ┆ 1 │
│ 2020-1-2 ┆ null │
│ 2020-1-31 ┆ 3 │
│ 2020-12-31 ┆ null │
│ null ┆ 5 │
└────────────┴──────┘
grizz.transformer.DropNullRow ¶
Bases: BaseColumnsTransformer
Implement a transformer to drop the rows in which every value is null.
A row that has at least one non-null value is kept.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, None],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
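The "all values must be null" rule can be sketched on a list of row dicts, with None standing for null. This is an illustration only: drop_null_rows is a hypothetical helper, and restricting the check to columns when it is given is an assumption based on the parameter description.

```python
from __future__ import annotations


def drop_null_rows(
    rows: list[dict], columns: list[str] | None = None
) -> list[dict]:
    """Drop a row only if every checked value in it is None."""
    out = []
    for row in rows:
        keys = columns if columns is not None else list(row)
        if not all(row[col] is None for col in keys):
            out.append(row)  # at least one non-null value: keep the row
    return out


rows = [
    {"col1": "2020-1-1", "col2": 1, "col3": None},
    {"col1": "2020-1-2", "col2": None, "col3": None},
    {"col1": None, "col2": None, "col3": None},
]
print(len(drop_null_rows(rows)))
# 2
```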
grizz.transformer.DropNullRowTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to drop the rows in which every value is null.
A row that has at least one non-null value is kept.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to check. If set to | None |
ignore_missing | bool | If | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import DropNullRow
>>> transformer = DropNullRow()
>>> transformer
DropNullRowTransformer(columns=None, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["2020-1-1", "2020-1-2", "2020-1-31", "2020-12-31", None],
... "col2": [1, None, 3, None, None],
... "col3": [None, None, None, None, None],
... }
... )
>>> frame
shape: (5, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
│ null ┆ null ┆ null │
└────────────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (4, 3)
┌────────────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ null │
╞════════════╪══════╪══════╡
│ 2020-1-1 ┆ 1 ┆ null │
│ 2020-1-2 ┆ null ┆ null │
│ 2020-1-31 ┆ 3 ┆ null │
│ 2020-12-31 ┆ null ┆ null │
└────────────┴──────┴──────┘
grizz.transformer.FillNan ¶
Bases: BaseColumnsTransformer
Implement a transformer to fill NaN values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ 100.0 │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
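In the output above only the NaN in col4 is replaced, while the null values in col1 and col4 are left alone. A plain-Python sketch of that distinction, using None for null and float("nan") for NaN; fill_nan is a hypothetical helper written for this illustration.

```python
from __future__ import annotations

import math


def fill_nan(values: list, value: float) -> list:
    """Replace float NaN entries by ``value``; None (null) is untouched."""
    return [
        value if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


print(fill_nan([1.2, float("nan"), 3.2, None, 5.2], 100.0))
# [1.2, 100.0, 3.2, None, 5.2]
print(fill_nan([1, 2, 3, 4, None], 100.0))
# [1, 2, 3, 4, None]
```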
grizz.transformer.FillNanTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to fill NaN values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNan
>>> transformer = FillNan(columns=["col1", "col4"], value=100)
>>> transformer
FillNanTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, float("nan")],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ 100.0 │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ NaN ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
grizz.transformer.FillNull ¶
Bases: BaseColumnsTransformer
Implement a transformer to fill null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, None],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ null ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ 100.0 │
│ 100 ┆ null ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
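The FillNan and FillNull examples treat NaN and null as two different kinds of missing value: filling one leaves the other untouched. The following plain-Python sketch mirrors that distinction on a list, using `float("nan")` for NaN and `None` for null. The helper names `fill_nan` and `fill_null` are hypothetical and are not part of grizz or polars.

```python
import math


def fill_nan(values, value):
    """Replace float NaN entries and leave None (null) entries untouched."""
    return [value if isinstance(v, float) and math.isnan(v) else v for v in values]


def fill_null(values, value):
    """Replace None (null) entries and leave float NaN entries untouched."""
    return [value if v is None else v for v in values]


# Same data as "col4" in the examples above.
col4 = [1.2, float("nan"), 3.2, None, 5.2]
nan_filled = fill_nan(col4, 100.0)
null_filled = fill_null(col4, 100.0)
```

Here `nan_filled` keeps the `None` entry and `null_filled` keeps the NaN entry, which matches the `col4` column in the two outputs above.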
grizz.transformer.FillNullTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to fill null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FillNull
>>> transformer = FillNull(columns=["col1", "col4"], value=100)
>>> transformer
FillNullTransformer(columns=('col1', 'col4'), ignore_missing=False, value=100)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, None],
... "col2": [1.2, 2.2, 3.2, 4.2, None],
... "col3": ["a", "b", "c", "d", None],
... "col4": [1.2, float("nan"), 3.2, None, 5.2],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ null │
│ null ┆ null ┆ null ┆ 5.2 │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ f64 │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1.2 ┆ a ┆ 1.2 │
│ 2 ┆ 2.2 ┆ b ┆ NaN │
│ 3 ┆ 3.2 ┆ c ┆ 3.2 │
│ 4 ┆ 4.2 ┆ d ┆ 100.0 │
│ 100 ┆ null ┆ null ┆ 5.2 │
└──────┴──────┴──────┴───────┘
grizz.transformer.FloatCast ¶
Bases: CastTransformer
Implement a transformer to convert columns of type float to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1.0 ┆ a │
│ 2 ┆ 2 ┆ 2.0 ┆ b │
│ 3 ┆ 3 ┆ 3.0 ┆ c │
│ 4 ┆ 4 ┆ 4.0 ┆ d │
│ 5 ┆ 5 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
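In the output above, only `col2` changes: `col1` is listed in `columns` but is not a float column, and `col3` is a float column but is not listed. A minimal plain-Python sketch of that selection rule follows. It models a frame as a dict of lists, and the helper `cast_float_columns` is a hypothetical name, not the grizz implementation.

```python
def cast_float_columns(frame, columns, caster):
    """Apply ``caster`` only to listed columns whose values are all floats."""
    out = {}
    for name, values in frame.items():
        is_float = all(isinstance(v, float) for v in values)
        if name in columns and is_float:
            out[name] = [caster(v) for v in values]
        else:
            # Columns that are not selected or not float are copied as-is.
            out[name] = list(values)
    return out


frame = {
    "col1": [1, 2, 3],
    "col2": [1.0, 2.0, 3.0],
    "col3": [1.0, 2.0, 3.0],
    "col4": ["a", "b", "c"],
}
out = cast_float_columns(frame, columns=["col1", "col2"], caster=int)
```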
grizz.transformer.FloatCastTransformer ¶
Bases: CastTransformer
Implement a transformer to convert columns of type float to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FloatCast
>>> transformer = FloatCast(columns=["col1", "col2"], dtype=pl.Int32)
>>> transformer
FloatCastTransformer(columns=('col1', 'col2'), dtype=Int32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Float64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1.0 ┆ a │
│ 2 ┆ 2.0 ┆ 2.0 ┆ b │
│ 3 ┆ 3.0 ┆ 3.0 ┆ c │
│ 4 ┆ 4.0 ┆ 4.0 ┆ d │
│ 5 ┆ 5.0 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i32 ┆ f64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1.0 ┆ a │
│ 2 ┆ 2 ┆ 2.0 ┆ b │
│ 3 ┆ 3 ┆ 3.0 ┆ c │
│ 4 ┆ 4 ┆ 4.0 ┆ d │
│ 5 ┆ 5 ┆ 5.0 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.Function ¶
Bases: BaseTransformer
Implement a transformer that is a wrapper around a function to transform the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[DataFrame], DataFrame] | The function to transform the DataFrame. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
... func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 4 ┆ 4 ┆ 4 ┆ d │
└──────┴──────┴──────┴──────┘
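The wrapper pattern described above can be sketched without polars. The class below is a hypothetical stand-in (`FunctionWrapper` is not a grizz name) that stores a callable and delegates `transform` to it, with a frame modelled as a list of row dicts.

```python
from typing import Callable

Rows = list[dict]


class FunctionWrapper:
    """Hypothetical wrapper that delegates ``transform`` to a stored function."""

    def __init__(self, func: Callable[[Rows], Rows]) -> None:
        self._func = func

    def transform(self, frame: Rows) -> Rows:
        return self._func(frame)


# Keep only the rows whose "col1" value is 2 or 4, as in the example above.
wrapper = FunctionWrapper(lambda rows: [r for r in rows if r["col1"] in {2, 4}])
rows = [{"col1": i, "col4": c} for i, c in zip(range(1, 6), "abcde")]
out = wrapper.transform(rows)
```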
grizz.transformer.FunctionTransformer ¶
Bases: BaseTransformer
Implement a transformer that is a wrapper around a function to transform the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[DataFrame], DataFrame] | The function to transform the DataFrame. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import FunctionTransformer
>>> transformer = FunctionTransformer(
... func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 4 ┆ 4 ┆ 4 ┆ d │
└──────┴──────┴──────┴──────┘
grizz.transformer.IntegerCast ¶
Bases: CastTransformer
Implement a transformer to convert columns of type integer to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1, 2, 3, 4, 5],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Int64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ 1 ┆ a │
│ 2.0 ┆ 2.0 ┆ 2 ┆ b │
│ 3.0 ┆ 3.0 ┆ 3 ┆ c │
│ 4.0 ┆ 4.0 ┆ 4 ┆ d │
│ 5.0 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.IntegerCastTransformer ¶
Bases: CastTransformer
Implement a transformer to convert columns of type integer to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import IntegerCast
>>> transformer = IntegerCast(columns=["col1", "col2"], dtype=pl.Float32)
>>> transformer
IntegerCastTransformer(columns=('col1', 'col2'), dtype=Float32, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": [1.0, 2.0, 3.0, 4.0, 5.0],
... "col3": [1, 2, 3, 4, 5],
... "col4": ["a", "b", "c", "d", "e"],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Float64,
... "col3": pl.Int64,
... "col4": pl.String,
... },
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1.0 ┆ 1 ┆ a │
│ 2 ┆ 2.0 ┆ 2 ┆ b │
│ 3 ┆ 3.0 ┆ 3 ┆ c │
│ 4 ┆ 4.0 ┆ 4 ┆ d │
│ 5 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ f64 ┆ i64 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1.0 ┆ 1.0 ┆ 1 ┆ a │
│ 2.0 ┆ 2.0 ┆ 2 ┆ b │
│ 3.0 ┆ 3.0 ┆ 3 ┆ c │
│ 4.0 ┆ 4.0 ┆ 4 ┆ d │
│ 5.0 ┆ 5.0 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
grizz.transformer.JsonDecode ¶
Bases: BaseColumnsTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to parse. | required |
dtype | PolarsDataType \| PythonDataType \| None | The dtype to cast the extracted value to. If | None |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), dtype=None, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ str ┆ list[str] ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ["1", "2"] ┆ a │
│ [2] ┆ 2 ┆ ["2"] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ["1", "2", "3"] ┆ c │
│ [4, 5] ┆ 4 ┆ ["4", "5"] ┆ d │
│ [5, 4] ┆ 5 ┆ ["5", "4"] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
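As a rough illustration of what parsing string values as JSON means, the sketch below decodes the `col1` strings with Python's standard `json` module. This is a plain-Python analogy and not how grizz or polars performs the decoding. Python's `json` module follows the JSON specification, so string elements need double quotes there.

```python
import json

# Same strings as "col1" in the example above.
col1 = ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"]
decoded = [json.loads(s) for s in col1]

# A JSON array of strings, written with double quotes.
decoded_strings = json.loads('["1", "2"]')
```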
grizz.transformer.JsonDecodeTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to parse. | required |
dtype | PolarsDataType \| PythonDataType \| None | The dtype to cast the extracted value to. If | None |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeTransformer(columns=('col1', 'col3'), dtype=None, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ str ┆ list[str] ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ["1", "2"] ┆ a │
│ [2] ┆ 2 ┆ ["2"] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ["1", "2", "3"] ┆ c │
│ [4, 5] ┆ 4 ┆ ["4", "5"] ┆ d │
│ [5, 4] ┆ 5 ┆ ["5", "4"] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
grizz.transformer.Replace ¶
Bases: BaseTransformer
Replace the values in a column with the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ d │
│ e ┆ e │
└─────┴─────┘
>>> transformer = Replace(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
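The two outputs above differ only in how unmapped values are handled: without `default` they are kept, and with `default=None` they become null. The plain-Python sketch below reproduces that rule on a list. The helper `replace_values` and the sentinel `_KEEP` are hypothetical names, and the sketch does not reproduce the dtype handling visible in the first output, where the new column stays `str`.

```python
_KEEP = object()  # Sentinel meaning "no default given: keep the original value".


def replace_values(values, mapping, default=_KEEP):
    """Map each value through ``mapping``; unmapped values fall back as configured."""
    if default is _KEEP:
        return [mapping.get(v, v) for v in values]
    return [mapping.get(v, default) for v in values]


old = ["a", "b", "c", "d", "e"]
mapping = {"a": 1, "b": 2, "c": 3}
kept = replace_values(old, mapping)
nulled = replace_values(old, mapping, default=None)
```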
grizz.transformer.ReplaceStrict ¶
Bases: BaseTransformer
Replace the values in a column with the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└─────┴─────┘
>>> transformer = ReplaceStrict(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
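Both examples above either map every value or supply a `default`. The sketch below assumes the strict variant behaves like polars' `replace_strict`, which raises when a non-null value has no mapping entry and no default is given. That assumption is not stated on this page. The helper `replace_strict_sketch` is a hypothetical plain-Python stand-in, and `ValueError` is an arbitrary choice of exception for the sketch.

```python
_MISSING = object()  # Sentinel meaning "no default given".


def replace_strict_sketch(values, mapping, default=_MISSING):
    """Map every value; fail on an unmapped value unless a default is given."""
    out = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])
        elif default is _MISSING:
            raise ValueError(f"no replacement for value {v!r}")
        else:
            out.append(default)
    return out


old = ["a", "b", "c", "d", "e"]
full = replace_strict_sketch(old, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
partial = replace_strict_sketch(old, {"a": 1, "b": 2, "c": 3}, default=None)

# Without a default, an incomplete mapping is rejected.
try:
    replace_strict_sketch(old, {"a": 1, "b": 2, "c": 3})
    raised = False
except ValueError:
    raised = True
```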
grizz.transformer.ReplaceStrictTransformer ¶
Bases: BaseTransformer
Replace the values in a column with the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ReplaceStrict
>>> transformer = ReplaceStrict(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└─────┴─────┘
>>> transformer = ReplaceStrict(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
grizz.transformer.ReplaceTransformer ¶
Bases: BaseTransformer
Replace the values in a column with the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Replace
>>> transformer = Replace(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ d │
│ e ┆ e │
└─────┴─────┘
>>> transformer = Replace(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
grizz.transformer.Sequential ¶
Bases: BaseTransformer
Implement a polars.DataFrame transformer that applies several transformers sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformers | Sequence[BaseTransformer \| dict] | The transformers or their configurations. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import (
... Sequential,
... Cast,
... )
>>> transformer = Sequential(
... [
... Cast(columns=["col1"], dtype=pl.Float32),
... Cast(columns=["col2"], dtype=pl.Int64),
... ]
... )
>>> transformer
SequentialTransformer(
(0): CastTransformer(columns=('col1',), dtype=Float32, ignore_missing=False)
(1): CastTransformer(columns=('col2',), dtype=Int64, ignore_missing=False)
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1.0 ┆ 1 ┆ a ┆ a │
│ 2.0 ┆ 2 ┆ b ┆ b │
│ 3.0 ┆ 3 ┆ c ┆ c │
│ 4.0 ┆ 4 ┆ d ┆ d │
│ 5.0 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
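Sequential application means the output of each step becomes the input of the next. A minimal plain-Python sketch of that idea follows, with each step modelled as a function on a dict of lists. The helper `apply_sequentially` is a hypothetical name and not the grizz implementation.

```python
from functools import reduce


def apply_sequentially(frame, steps):
    """Feed ``frame`` through each step in order and return the final result."""
    return reduce(lambda current, step: step(current), steps, frame)


# Two steps that mirror the casts in the example above.
steps = [
    lambda f: {**f, "col1": [float(v) for v in f["col1"]]},
    lambda f: {**f, "col2": [int(v) for v in f["col2"]]},
]
frame = {"col1": [1, 2, 3], "col2": ["1", "2", "3"]}
out = apply_sequentially(frame, steps)
```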
grizz.transformer.SequentialTransformer ¶
Bases: BaseTransformer
Implement a polars.DataFrame transformer that applies several transformers sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformers | Sequence[BaseTransformer \| dict] | The transformers or their configurations. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import (
... Sequential,
... Cast,
... )
>>> transformer = Sequential(
... [
... Cast(columns=["col1"], dtype=pl.Float32),
... Cast(columns=["col2"], dtype=pl.Int64),
... ]
... )
>>> transformer
SequentialTransformer(
(0): CastTransformer(columns=('col1',), dtype=Float32, ignore_missing=False)
(1): CastTransformer(columns=('col2',), dtype=Int64, ignore_missing=False)
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1.0 ┆ 1 ┆ a ┆ a │
│ 2.0 ┆ 2 ┆ b ┆ b │
│ 3.0 ┆ 3 ┆ c ┆ c │
│ 4.0 ┆ 4 ┆ d ┆ d │
│ 5.0 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
grizz.transformer.Sort ¶
Bases: BaseTransformer
Implement a transformer to sort the DataFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to sort by. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
... {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴──────┴──────┘
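Sorting by several columns compares rows on the first column and uses the later columns only to break ties. The plain-Python sketch below shows this on a list of row dicts. Placing null values first is an assumption made for the sketch, and `sort_rows` is a hypothetical helper name.

```python
def sort_rows(rows, columns):
    """Sort row dicts by the given columns, with None ordered before any value."""

    def key(row):
        # (False, None) sorts before (True, value), so None never meets "<".
        return tuple((row[c] is not None, row[c]) for c in columns)

    return sorted(rows, key=key)


rows = [
    {"col1": 1, "col2": 6.0, "col3": "a"},
    {"col1": 2, "col2": 5.0, "col3": "c"},
    {"col1": None, "col2": 4.0, "col3": "b"},
]
out = sort_rows(rows, ["col3", "col1"])
```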
grizz.transformer.SortColumns ¶
Bases: BaseTransformer
Implement a transformer to sort the DataFrame columns by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reverse | bool | If set to | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
... {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞══════╪══════╪══════╡
│ a ┆ 1 ┆ 6.0 │
│ c ┆ 2 ┆ 5.0 │
│ b ┆ null ┆ 4.0 │
└──────┴──────┴──────┘
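Sorting columns by name only reorders the columns; row order and values stay the same. A plain-Python sketch on a dict of lists follows. The helper `sort_columns` is hypothetical, and treating `reverse=True` as descending name order is an assumption.

```python
def sort_columns(frame, reverse=False):
    """Return a new dict whose keys (column names) are in sorted order."""
    return {name: frame[name] for name in sorted(frame, reverse=reverse)}


frame = {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
out = sort_columns(frame)
out_reversed = sort_columns(frame, reverse=True)
```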
grizz.transformer.SortColumnsTransformer ¶
Bases: BaseTransformer
Implement a transformer to sort the DataFrame columns by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reverse | bool | If set to | False |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsTransformer(reverse=False)
>>> frame = pl.DataFrame(
... {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞══════╪══════╪══════╡
│ a ┆ 1 ┆ 6.0 │
│ c ┆ 2 ┆ 5.0 │
│ b ┆ null ┆ 4.0 │
└──────┴──────┴──────┘
grizz.transformer.SortTransformer ¶
Bases: BaseTransformer
Implement a transformer to sort the DataFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to sort by. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
... {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴──────┴──────┘
grizz.transformer.SqlTransformer ¶
Bases: BaseTransformer
Implement a transformer that executes a SQL query against the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | The SQL query to execute. | required |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import SqlTransformer
>>> transformer = SqlTransformer(query="SELECT col1, col4 FROM self WHERE col1 > 2")
>>> transformer
SqlTransformer(
(query): SELECT col1, col4 FROM self WHERE col1 > 2
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col4 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
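To see what the query in the example selects, independent of polars, the sketch below runs the same statement with Python's standard `sqlite3` module. This swaps in SQLite as the engine purely for illustration. The table is named `self` only to match the query above, and it is quoted to keep it unambiguous.

```python
import sqlite3

# Same rows as the frame in the example above.
rows = [
    (1, "1", "1", "a"),
    (2, "2", "2", "b"),
    (3, "3", "3", "c"),
    (4, "4", "4", "d"),
    (5, "5", "5", "e"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "self" (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)')
conn.executemany('INSERT INTO "self" VALUES (?, ?, ?, ?)', rows)
result = conn.execute('SELECT col1, col4 FROM "self" WHERE col1 > 2').fetchall()
conn.close()
```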
grizz.transformer.StripChars ¶
Bases: BaseColumnsTransformer
Implement a transformer to remove leading and trailing characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
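In the output above, `col3` loses its surrounding whitespace while `col4`, which is not listed in `columns`, keeps it. The plain-Python sketch below shows the same effect with `str.strip`, which removes leading and trailing whitespace when called without arguments. It is only an analogy for the column operation.

```python
# Same strings as "col3" and "col4" in the example above.
col3 = ["a ", " b", " c ", "d", "e"]
col4 = ["a ", " b", " c ", "d", "e"]

# Only the selected column is stripped.
col3_stripped = [s.strip() for s in col3]
```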
grizz.transformer.StripCharsTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to remove leading and trailing characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to prepare. If | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsTransformer(columns=('col2', 'col3'), ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
grizz.transformer.TimeDiff ¶
Bases: BaseTransformer
Implement a transformer to compute the time difference between consecutive time steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_cols | Sequence[str] | The columns used to generate the group for each sequence. | required |
time_col | str | The input time column name. | required |
time_diff_col | str | The output time difference column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ a ┆ 3 │
│ a ┆ 4 │
│ b ┆ 5 │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 3 ┆ 2 │
│ a ┆ 4 ┆ 1 │
│ b ┆ 2 ┆ 0 │
│ b ┆ 5 ┆ 3 │
└─────┴──────┴──────┘
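To make "time difference between consecutive time steps" concrete, here is a minimal pure-Python sketch that reproduces the numbers in the example above. It is not grizz's implementation: the helper name `time_diff_by_group` is hypothetical, and using 0 for the first step(s) of each group is an assumption read off the example output.

```python
from collections import defaultdict


def time_diff_by_group(rows, shift=1):
    """Return (group, time, diff) tuples sorted by group, then by time.

    Hypothetical helper for illustration only; it is not part of grizz.
    ``shift`` is assumed to be a positive integer.
    """
    groups = defaultdict(list)
    for group, time in rows:
        groups[group].append(time)

    out = []
    for group in sorted(groups):
        times = sorted(groups[group])
        for i, time in enumerate(times):
            # The first `shift` steps of a group have no predecessor;
            # 0 is used here because the example output shows 0.
            diff = time - times[i - shift] if i >= shift else 0
            out.append((group, time, diff))
    return out


rows = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
result = time_diff_by_group(rows)
```

With the rows above, `result` holds the same `(col, time, diff)` triples as the output frame, in the same order.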
grizz.transformer.TimeDiffTransformer ¶
Bases: BaseTransformer
Implement a transformer to compute the time difference between consecutive time steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_cols | Sequence[str] | The columns used to generate the group for each sequence. | required |
time_col | str | The input time column name. | required |
time_diff_col | str | The output time difference column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ a ┆ 3 │
│ a ┆ 4 │
│ b ┆ 5 │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 3 ┆ 2 │
│ a ┆ 4 ┆ 1 │
│ b ┆ 2 ┆ 0 │
│ b ┆ 5 ┆ 3 │
└─────┴──────┴──────┘
grizz.transformer.TimeToSecond ¶
Bases: BaseTransformer
Implement a transformer to convert a column with time values to seconds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column with the time value to convert. | required |
out_col | str | The output column with the time in seconds. | required |
Example usage:
>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
... {
... "time": [
... datetime.time(0, 0, 1, 890000),
... datetime.time(0, 1, 1, 890000),
... datetime.time(1, 1, 1, 890000),
... datetime.time(0, 19, 19, 890000),
... datetime.time(19, 19, 19, 890000),
... ],
... "col": ["a", "b", "c", "d", "e"],
... },
... schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time ┆ col │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a │
│ 00:01:01.890 ┆ b │
│ 01:01:01.890 ┆ c │
│ 00:19:19.890 ┆ d │
│ 19:19:19.890 ┆ e │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time ┆ col ┆ second │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ f64 │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a ┆ 1.89 │
│ 00:01:01.890 ┆ b ┆ 61.89 │
│ 01:01:01.890 ┆ c ┆ 3661.89 │
│ 00:19:19.890 ┆ d ┆ 1159.89 │
│ 19:19:19.890 ┆ e ┆ 69559.89 │
└──────────────┴─────┴──────────┘
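The values in the second column are the number of seconds elapsed since midnight. The arithmetic can be checked with the standard library alone; the helper name `time_to_seconds` below is hypothetical and only mirrors the numbers in the example, not how grizz computes them.

```python
import datetime


def time_to_seconds(t: datetime.time) -> float:
    """Convert a time of day to the number of seconds since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1_000_000


times = [
    datetime.time(0, 0, 1, 890000),
    datetime.time(0, 1, 1, 890000),
    datetime.time(19, 19, 19, 890000),
]
seconds = [time_to_seconds(t) for t in times]
```

For instance, 19:19:19.890 is 19 * 3600 + 19 * 60 + 19 + 0.89 seconds, which agrees with the last row of the output frame up to floating-point rounding.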
grizz.transformer.TimeToSecondTransformer ¶
Bases: BaseTransformer
Implement a transformer to convert a column with time values to seconds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column with the time value to convert. | required |
out_col | str | The output column with the time in seconds. | required |
Example usage:
>>> import datetime
>>> import polars as pl
>>> from grizz.transformer import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
... {
... "time": [
... datetime.time(0, 0, 1, 890000),
... datetime.time(0, 1, 1, 890000),
... datetime.time(1, 1, 1, 890000),
... datetime.time(0, 19, 19, 890000),
... datetime.time(19, 19, 19, 890000),
... ],
... "col": ["a", "b", "c", "d", "e"],
... },
... schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time ┆ col │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a │
│ 00:01:01.890 ┆ b │
│ 01:01:01.890 ┆ c │
│ 00:19:19.890 ┆ d │
│ 19:19:19.890 ┆ e │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time ┆ col ┆ second │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ f64 │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a ┆ 1.89 │
│ 00:01:01.890 ┆ b ┆ 61.89 │
│ 01:01:01.890 ┆ c ┆ 3661.89 │
│ 00:19:19.890 ┆ d ┆ 1159.89 │
│ 19:19:19.890 ┆ e ┆ 69559.89 │
└──────────────┴─────┴──────────┘
grizz.transformer.ToDatetime ¶
Bases: BaseColumnsTransformer
Implement a transformer to convert some columns to a polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
format | str \| None | Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"])
>>> transformer
ToDatetimeTransformer(columns=('col1',), format=None, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
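The format parameter takes strftime-style specifiers. For the common ones (%Y, %m, %d, %H, %M, %S), Python's `datetime.strptime` accepts the same syntax, so it can serve as a quick sanity check of a format string before passing it to the transformer. This is only an approximation: the chrono and Python specifications are not identical, and the format string below is an assumed match for the example strings, not something the example itself passes.

```python
from datetime import datetime

# Assumed format for strings such as "2020-01-01 01:01:01".
fmt = "%Y-%m-%d %H:%M:%S"

values = ["2020-01-01 01:01:01", "2020-01-01 23:59:59"]

# strptime raises ValueError if a string does not match the format.
parsed = [datetime.strptime(value, fmt) for value in values]
```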
grizz.transformer.ToDatetimeTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to convert some columns to a polars.Datetime type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
format | str \| None | Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToDatetime
>>> transformer = ToDatetime(columns=["col1"])
>>> transformer
ToDatetimeTransformer(columns=('col1',), format=None, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": [
... "2020-01-01 01:01:01",
... "2020-01-01 02:02:02",
... "2020-01-01 12:00:01",
... "2020-01-01 18:18:18",
... "2020-01-01 23:59:59",
... ],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": [
... "2020-01-01 11:11:11",
... "2020-02-01 12:12:12",
... "2020-03-01 13:13:13",
... "2020-04-01 08:08:08",
... "2020-05-01 23:59:59",
... ],
... },
... )
>>> frame
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────────────────────┬──────┬─────────────────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ str │
╞═════════════════════╪══════╪═════════════════════╡
│ 2020-01-01 01:01:01 ┆ 1 ┆ 2020-01-01 11:11:11 │
│ 2020-01-01 02:02:02 ┆ 2 ┆ 2020-02-01 12:12:12 │
│ 2020-01-01 12:00:01 ┆ 3 ┆ 2020-03-01 13:13:13 │
│ 2020-01-01 18:18:18 ┆ 4 ┆ 2020-04-01 08:08:08 │
│ 2020-01-01 23:59:59 ┆ 5 ┆ 2020-05-01 23:59:59 │
└─────────────────────┴──────┴─────────────────────┘
grizz.transformer.ToTime ¶
Bases: BaseColumnsTransformer
Implement a transformer to convert some columns to a polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
format | str \| None | Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeTransformer(columns=('col1',), format=%H:%M:%S, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
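A format string such as "%H:%M:%S" only matches values with exactly that layout. The sketch below uses the standard library to show which strings a given format accepts. The helper `parse_time` is hypothetical, and returning None on a mismatch is a choice made for this illustration; how the transformer itself reacts to a mismatch is not specified here.

```python
from datetime import datetime, time
from typing import Optional


def parse_time(value: str, fmt: str = "%H:%M:%S") -> Optional[time]:
    """Parse a time string, returning None if it does not match the format."""
    try:
        return datetime.strptime(value, fmt).time()
    except ValueError:
        return None


ok = parse_time("18:18:18")        # matches "%H:%M:%S"
short = parse_time("18:18")        # seconds are missing, so no match
other = parse_time("18:18", "%H:%M")  # matches once the format is adjusted
```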
grizz.transformer.ToTimeTransformer ¶
Bases: BaseColumnsTransformer
Implement a transformer to convert some columns to a polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] \| None | The columns to convert. | required |
format | str \| None | Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: | None |
ignore_missing | bool | If | False |
**kwargs | Any | The keyword arguments for | {} |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeTransformer(columns=('col1',), format=%H:%M:%S, ignore_missing=False)
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
grizz.transformer.is_transformer_config ¶
is_transformer_config(config: dict) -> bool
Indicate if the input configuration is a configuration for a BaseTransformer.
This function only checks if the value of the key _target_ is valid. It does not check the other values. If _target_ indicates a function, its return type hint is used to check the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | dict | The configuration to check. | required |
Returns:
Type | Description |
---|---|
bool | True if the input configuration is a configuration for a BaseTransformer, otherwise False. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import is_transformer_config
>>> is_transformer_config(
... {
... "_target_": "grizz.transformer.Cast",
... "columns": ("col1", "col3"),
... "dtype": pl.Int32,
... }
... )
True
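The check described above can be sketched with the standard library: resolve the dotted path stored under `_target_`, then test the resolved class (or, for a function, its return type hint) against a base class. This is a simplified illustration under stated assumptions, not grizz's implementation; `resolve_target` and `is_config_for` are hypothetical names, and stdlib classes stand in for transformers.

```python
import importlib
import inspect
import typing


def resolve_target(path: str):
    """Import and return the object named by a dotted path."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def is_config_for(config: dict, base: type) -> bool:
    """Return True if ``config["_target_"]`` resolves to ``base`` or a subclass.

    Only the ``_target_`` key is inspected; other keys are ignored.
    """
    target = config.get("_target_")
    if not isinstance(target, str):
        return False
    try:
        obj = resolve_target(target)
    except (ImportError, AttributeError, ValueError):
        return False
    if inspect.isclass(obj):
        return issubclass(obj, base)
    if callable(obj):
        # For a function, fall back to its return type hint.
        try:
            returned = typing.get_type_hints(obj).get("return")
        except (NameError, TypeError):
            return False
        return inspect.isclass(returned) and issubclass(returned, base)
    return False


is_dict_config = is_config_for({"_target_": "collections.OrderedDict"}, dict)
not_dict_config = is_config_for({"_target_": "fractions.Fraction"}, dict)
```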
grizz.transformer.setup_transformer ¶
setup_transformer(
transformer: BaseTransformer | dict,
) -> BaseTransformer
Set up a polars.DataFrame transformer.
The transformer is instantiated from its configuration by using the BaseTransformer factory function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer | BaseTransformer \| dict | Specifies a transformer or its configuration. | required |
Returns:
Type | Description |
---|---|
BaseTransformer | An instantiated transformer. |
Example usage:
>>> import polars as pl
>>> from grizz.transformer import setup_transformer
>>> transformer = setup_transformer(
... {
... "_target_": "grizz.transformer.Cast",
... "columns": ("col1", "col3"),
... "dtype": pl.Int32,
... }
... )
>>> transformer
CastTransformer(columns=('col1', 'col3'), dtype=Int32, ignore_missing=False)
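The "object or configuration" pattern can be sketched generically: an already-built object is returned unchanged, while a dict is instantiated by importing the class named under `_target_` and passing the remaining keys as keyword arguments. The helper `setup_object` is hypothetical and uses a stdlib class as a stand-in; grizz delegates to the BaseTransformer factory function, whose details may differ.

```python
import importlib
from fractions import Fraction
from typing import Any


def setup_object(obj_or_config: Any) -> Any:
    """Return the object as is, or build it from a ``_target_`` configuration."""
    if not isinstance(obj_or_config, dict):
        return obj_or_config
    kwargs = dict(obj_or_config)  # copy so the caller's config is not mutated
    module_name, _, attr = kwargs.pop("_target_").rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr)
    return factory(**kwargs)


config = {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 3}
third = setup_object(config)  # instantiated from the configuration
same = setup_object(third)    # already an object, returned unchanged
```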