arctix.transformer

arctix.transformer.dataframe

Contain DataFrame transformers.

arctix.transformer.dataframe.BaseDataFrameTransformer

Bases: ABC

Define the base class to transform a polars.DataFrame.

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
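
The base class only fixes the interface: a subclass supplies transform. The following is a minimal plain-Python sketch of that pattern, not the arctix implementation. It uses a dict of lists as a stand-in for polars.DataFrame, and the names BaseTransformerSketch and CastSketch are hypothetical.

```python
from abc import ABC, abstractmethod

# Stand-in for a DataFrame: a mapping from column name to a list of values.
Frame = dict[str, list]


class BaseTransformerSketch(ABC):
    """Hypothetical base class: subclasses only implement ``transform``."""

    @abstractmethod
    def transform(self, frame: Frame) -> Frame:
        """Return a transformed copy of ``frame``."""


class CastSketch(BaseTransformerSketch):
    """Hypothetical subclass that converts selected columns with ``dtype``."""

    def __init__(self, columns: list[str], dtype: type) -> None:
        self._columns = tuple(columns)
        self._dtype = dtype

    def transform(self, frame: Frame) -> Frame:
        # Convert only the selected columns; copy the others unchanged.
        return {
            name: [self._dtype(v) for v in values] if name in self._columns else list(values)
            for name, values in frame.items()
        }


frame = {"col1": [1, 2, 3], "col2": ["1", "2", "3"], "col3": ["1", "2", "3"]}
out = CastSketch(columns=["col1", "col3"], dtype=int).transform(frame)
print(out)
```

The input mapping is left untouched; the sketch returns a new mapping, which mirrors how the example above assigns the result to out.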

arctix.transformer.dataframe.BaseDataFrameTransformer.transform

transform(frame: DataFrame) -> DataFrame

Transform the data in the polars.DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

Specifies the polars.DataFrame to transform.

required

Returns:

Type Description
DataFrame

The transformed DataFrame.

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

arctix.transformer.dataframe.Cast

Bases: BaseDataFrameTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to convert.

required
dtype type[DataType]

The target data type.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

arctix.transformer.dataframe.CastDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to convert some columns to a new data type.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to convert.

required
dtype type[DataType]

The target data type.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int32)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i32  ┆ str  ┆ i32  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ 1    ┆ a    │
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 3    ┆ 3    ┆ 3    ┆ c    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
│ 5    ┆ 5    ┆ 5    ┆ e    │
└──────┴──────┴──────┴──────┘

arctix.transformer.dataframe.Diff

Bases: BaseDataFrameTransformer

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffDataFrameTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘
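
To make the role of shift concrete, here is a plain-Python sketch of a shifted difference. It is an illustration under the assumption that each output item is the current value minus the value shift positions earlier, with a missing value where no earlier item exists; shifted_diff is a hypothetical helper, not part of arctix.

```python
from typing import Optional


def shifted_diff(values: list[int], shift: int = 1) -> list[Optional[int]]:
    """Return values[i] - values[i - shift], with None where no earlier item exists."""
    if shift < 1:
        raise ValueError("this sketch only handles shift >= 1")
    return [
        None if i < shift else values[i] - values[i - shift]
        for i in range(len(values))
    ]


col1 = [1, 2, 3, 4, 5]
diff_1 = shifted_diff(col1)           # same values as the "diff" column above
diff_2 = shifted_diff(col1, shift=2)  # two leading None entries
print(diff_1)
print(diff_2)
```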

arctix.transformer.dataframe.DiffDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to compute the first discrete difference between shifted items.

Parameters:

Name Type Description Default
in_col str

The input column name.

required
out_col str

The output column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffDataFrameTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ i64  ┆ str  │
╞══════╪══════╡
│ 1    ┆ a    │
│ 2    ┆ b    │
│ 3    ┆ c    │
│ 4    ┆ d    │
│ 5    ┆ e    │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ a    ┆ null │
│ 2    ┆ b    ┆ 1    │
│ 3    ┆ c    ┆ 1    │
│ 4    ┆ d    ┆ 1    │
│ 5    ┆ e    ┆ 1    │
└──────┴──────┴──────┘

arctix.transformer.dataframe.Function

Bases: BaseDataFrameTransformer

Implement a transformer that wraps a function to transform the DataFrame.

Parameters:

Name Type Description Default
func Callable[[DataFrame], DataFrame]

The function to transform the DataFrame.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import FunctionDataFrameTransformer
>>> transformer = FunctionDataFrameTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionDataFrameTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘
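
The wrapper idea can be sketched without polars: the transformer stores a callable and transform delegates to it. This is an assumption about the general shape, not the arctix source; FunctionSketch and keep_rows_2_and_4 are hypothetical names, and a dict of lists stands in for the DataFrame.

```python
from typing import Callable

Frame = dict[str, list]


class FunctionSketch:
    """Hypothetical wrapper: ``transform`` simply delegates to ``func``."""

    def __init__(self, func: Callable[[Frame], Frame]) -> None:
        self._func = func

    def transform(self, frame: Frame) -> Frame:
        return self._func(frame)


def keep_rows_2_and_4(frame: Frame) -> Frame:
    # Keep the rows whose col1 value is 2 or 4, across every column.
    keep = [i for i, v in enumerate(frame["col1"]) if v in {2, 4}]
    return {name: [values[i] for i in keep] for name, values in frame.items()}


frame = {"col1": [1, 2, 3, 4, 5], "col4": ["a", "b", "c", "d", "e"]}
out = FunctionSketch(keep_rows_2_and_4).transform(frame)
print(out)
```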

arctix.transformer.dataframe.FunctionDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer that wraps a function to transform the DataFrame.

Parameters:

Name Type Description Default
func Callable[[DataFrame], DataFrame]

The function to transform the DataFrame.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import FunctionDataFrameTransformer
>>> transformer = FunctionDataFrameTransformer(
...     func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionDataFrameTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["1", "2", "3", "4", "5"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╪══════╡
│ 2    ┆ 2    ┆ 2    ┆ b    │
│ 4    ┆ 4    ┆ 4    ┆ d    │
└──────┴──────┴──────┴──────┘

arctix.transformer.dataframe.IndexToToken

Bases: ReplaceStrictDataFrameTransformer

Replace the indices in a column with the corresponding tokens from a vocabulary.

Parameters:

Name Type Description Default
vocab Vocabulary

The vocabulary which contains the index to token mapping.

required
index_column str

The column name which contains the input indices.

required
token_column str

The column name which contains the output tokens.

required
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import IndexToToken
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_index_to_token()
('b', 'a', 'c', 'd')
>>> transformer = IndexToToken(
...     vocab=vocab,
...     index_column="col",
...     token_column="token",
... )
>>> transformer
IndexToTokenDataFrameTransformer(orig_column=col, final_column=token)
>>> frame = pl.DataFrame({"col": [1, 0, 2, 3, 1]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 0   │
│ 2   │
│ 3   │
│ 1   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ token │
│ --- ┆ ---   │
│ i64 ┆ str   │
╞═════╪═══════╡
│ 1   ┆ a     │
│ 0   ┆ b     │
│ 2   ┆ c     │
│ 3   ┆ d     │
│ 1   ┆ a     │
└─────┴───────┘
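
The lookup in the example above can be reproduced in plain Python. The sketch below hard-codes the tuple returned by vocab.get_index_to_token() and treats an unknown index as an error. That strictness is an assumption based on the ReplaceStrictDataFrameTransformer base class, and indices_to_tokens is a hypothetical helper.

```python
index_to_token = ("b", "a", "c", "d")  # position in the tuple is the index


def indices_to_tokens(indices: list[int], vocab: tuple[str, ...]) -> list[str]:
    """Look up each index; fail loudly on an index the vocabulary lacks."""
    out = []
    for index in indices:
        if not 0 <= index < len(vocab):
            raise KeyError(f"index {index} is not in the vocabulary")
        out.append(vocab[index])
    return out


tokens = indices_to_tokens([1, 0, 2, 3, 1], index_to_token)
print(tokens)
```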

arctix.transformer.dataframe.IndexToTokenDataFrameTransformer

Bases: ReplaceStrictDataFrameTransformer

Replace the indices in a column with the corresponding tokens from a vocabulary.

Parameters:

Name Type Description Default
vocab Vocabulary

The vocabulary which contains the index to token mapping.

required
index_column str

The column name which contains the input indices.

required
token_column str

The column name which contains the output tokens.

required
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import IndexToToken
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_index_to_token()
('b', 'a', 'c', 'd')
>>> transformer = IndexToToken(
...     vocab=vocab,
...     index_column="col",
...     token_column="token",
... )
>>> transformer
IndexToTokenDataFrameTransformer(orig_column=col, final_column=token)
>>> frame = pl.DataFrame({"col": [1, 0, 2, 3, 1]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 0   │
│ 2   │
│ 3   │
│ 1   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ token │
│ --- ┆ ---   │
│ i64 ┆ str   │
╞═════╪═══════╡
│ 1   ┆ a     │
│ 0   ┆ b     │
│ 2   ┆ c     │
│ 3   ┆ d     │
│ 1   ┆ a     │
└─────┴───────┘

arctix.transformer.dataframe.JsonDecode

Bases: BaseDataFrameTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to parse.

required
dtype PolarsDataType | PythonDataType | None

The dtype to cast the extracted value to. If None, the dtype will be inferred from the JSON value.

None

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeDataFrameTransformer(columns=('col1', 'col3'), dtype=None)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
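
For comparison only, the same kind of decoding can be tried with Python's standard json module. This is not how JsonDecode is implemented; it just shows what parsing a JSON string means, and that the standard library parser is stricter about quoting than the col3 values in the example above.

```python
import json

col1 = ["[1, 2]", "[2]", "[1, 2, 3]"]
decoded = [json.loads(value) for value in col1]
print(decoded)

# Standard JSON strings use double quotes, so the json module rejects
# the single-quoted form.
try:
    json.loads("['1', '2']")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```

If the data may be consumed by other JSON parsers, writing the strings with double quotes is the safer choice.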

arctix.transformer.dataframe.JsonDecodeDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to parse string values as JSON.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to parse.

required
dtype PolarsDataType | PythonDataType | None

The dtype to cast the extracted value to. If None, the dtype will be inferred from the JSON value.

None

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeDataFrameTransformer(columns=('col1', 'col3'), dtype=None)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
...         "col4": ["a", "b", "c", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ str       ┆ str  ┆ str             ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ['1', '2']      ┆ a    │
│ [2]       ┆ 2    ┆ ['2']           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ['1', '2', '3'] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ['4', '5']      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ['5', '4']      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1      ┆ col2 ┆ col3            ┆ col4 │
│ ---       ┆ ---  ┆ ---             ┆ ---  │
│ list[i64] ┆ str  ┆ list[str]       ┆ str  │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2]    ┆ 1    ┆ ["1", "2"]      ┆ a    │
│ [2]       ┆ 2    ┆ ["2"]           ┆ b    │
│ [1, 2, 3] ┆ 3    ┆ ["1", "2", "3"] ┆ c    │
│ [4, 5]    ┆ 4    ┆ ["4", "5"]      ┆ d    │
│ [5, 4]    ┆ 5    ┆ ["5", "4"]      ┆ e    │
└───────────┴──────┴─────────────────┴──────┘

arctix.transformer.dataframe.Replace

Bases: BaseDataFrameTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace.

()
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Replace
>>> transformer = Replace(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
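
The two calls above differ only in default. A plain-Python sketch of that difference, assuming unmapped values are kept when no default is given and replaced by the default otherwise; replace_values is a hypothetical helper. Unlike the polars output above, the sketch does not unify the result to a single dtype.

```python
_KEEP = object()  # sentinel meaning "no default given: keep the original value"


def replace_values(values, mapping, default=_KEEP):
    """Map values through ``mapping``; handle unmapped values per ``default``."""
    if default is _KEEP:
        return [mapping.get(v, v) for v in values]
    return [mapping.get(v, default) for v in values]


old = ["a", "b", "c", "d", "e"]
mapping = {"a": 1, "b": 2, "c": 3}
kept = replace_values(old, mapping)
nulled = replace_values(old, mapping, default=None)
print(kept)
print(nulled)
```

A sentinel is used instead of default=None so that None itself can be passed as a meaningful default, as in the second call above.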

arctix.transformer.dataframe.ReplaceDataFrameTransformer

Bases: BaseDataFrameTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace.

()
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Replace
>>> transformer = Replace(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ d   │
│ e   ┆ e   │
└─────┴─────┘
>>> transformer = Replace(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

arctix.transformer.dataframe.ReplaceStrict

Bases: BaseDataFrameTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace.

()
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import ReplaceStrict
>>> transformer = ReplaceStrict(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘
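
Both examples above either map every value or pass a default. The sketch below illustrates the strict case that remains: a value with no mapping and no default. It assumes the transformer follows the behavior of polars replace_strict, which raises an error in that situation; replace_strict_sketch is a hypothetical helper and ValueError is a stand-in for the polars exception.

```python
_MISSING = object()  # sentinel meaning "no default given"


def replace_strict_sketch(values, mapping, default=_MISSING):
    """Map every value; unmapped values need a default or raise an error."""
    out = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])
        elif default is _MISSING:
            raise ValueError(f"no replacement for value {v!r}")
        else:
            out.append(default)
    return out


old = ["a", "b", "c", "d", "e"]
partial = {"a": 1, "b": 2, "c": 3}
with_default = replace_strict_sketch(old, partial, default=None)

try:
    replace_strict_sketch(old, partial)  # "d" has no mapping and no default
    raised = False
except ValueError:
    raised = True
print(with_default, raised)
```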

arctix.transformer.dataframe.ReplaceStrictDataFrameTransformer

Bases: BaseDataFrameTransformer

Replace the values in a column by the values in a mapping.

Parameters:

Name Type Description Default
orig_column str

The original column name.

required
final_column str

The final column name.

required
*args Any

The positional arguments to pass to replace.

()
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import ReplaceStrict
>>> transformer = ReplaceStrict(
...     orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ e   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ b   ┆ 2   │
│ c   ┆ 3   │
│ d   ┆ 4   │
│ e   ┆ 5   │
└─────┴─────┘
>>> transformer = ReplaceStrict(
...     orig_column="old",
...     final_column="new",
...     old={"a": 1, "b": 2, "c": 3},
...     default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new  │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ c   ┆ 3    │
│ d   ┆ null │
│ e   ┆ null │
└─────┴──────┘

arctix.transformer.dataframe.Sequential

Bases: BaseDataFrameTransformer

Implement a polars.DataFrame transformer to apply several transformers sequentially.

Parameters:

Name Type Description Default
transformers Sequence[BaseDataFrameTransformer | dict]

The transformers or their configurations.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import (
...     Sequential,
...     Cast,
... )
>>> transformer = Sequential(
...     [
...         Cast(columns=["col1"], dtype=pl.Float32),
...         Cast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialDataFrameTransformer(
  (0): CastDataFrameTransformer(columns=('col1',), dtype=Float32)
  (1): CastDataFrameTransformer(columns=('col2',), dtype=Int64)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
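
Sequential application amounts to feeding each step's output into the next step. A minimal plain-Python sketch of that idea, with plain functions standing in for transformers and a dict of lists standing in for the DataFrame; all names are hypothetical.

```python
from functools import reduce
from typing import Callable

Frame = dict[str, list]
Step = Callable[[Frame], Frame]


def apply_sequentially(steps: list[Step], frame: Frame) -> Frame:
    """Feed the output of each step into the next one, in list order."""
    return reduce(lambda current, step: step(current), steps, frame)


def col1_to_float(frame: Frame) -> Frame:
    return {**frame, "col1": [float(v) for v in frame["col1"]]}


def col2_to_int(frame: Frame) -> Frame:
    return {**frame, "col2": [int(v) for v in frame["col2"]]}


frame = {"col1": [1, 2, 3], "col2": ["1", "2", "3"]}
out = apply_sequentially([col1_to_float, col2_to_int], frame)
print(out)
```

Order matters whenever a later step depends on the dtype or values produced by an earlier one.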

arctix.transformer.dataframe.SequentialDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a polars.DataFrame transformer to apply several transformers sequentially.

Parameters:

Name Type Description Default
transformers Sequence[BaseDataFrameTransformer | dict]

The transformers or their configurations.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import (
...     Sequential,
...     Cast,
... )
>>> transformer = Sequential(
...     [
...         Cast(columns=["col1"], dtype=pl.Float32),
...         Cast(columns=["col2"], dtype=pl.Int64),
...     ]
... )
>>> transformer
SequentialDataFrameTransformer(
  (0): CastDataFrameTransformer(columns=('col1',), dtype=Float32)
  (1): CastDataFrameTransformer(columns=('col2',), dtype=Int64)
)
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ f32  ┆ i64  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1.0  ┆ 1    ┆ a     ┆ a     │
│ 2.0  ┆ 2    ┆  b    ┆  b    │
│ 3.0  ┆ 3    ┆   c   ┆   c   │
│ 4.0  ┆ 4    ┆ d     ┆ d     │
│ 5.0  ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘

arctix.transformer.dataframe.Sort

Bases: BaseDataFrameTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to sort by.

required
*args Any

The positional arguments to pass to sort.

()
**kwargs Any

The keyword arguments to pass to sort.

{}

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortDataFrameTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘
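
The column order in columns sets the sort priority: rows are ordered by the first column, and later columns only break ties. A plain-Python sketch of that rule, using a list of row dicts; sort_rows is a hypothetical helper, and placing None first is a choice made in this sketch.

```python
rows = [
    {"col1": 1, "col2": 6.0, "col3": "a"},
    {"col1": 2, "col2": 5.0, "col3": "c"},
    {"col1": None, "col2": 4.0, "col3": "b"},
]


def sort_rows(rows, columns):
    """Sort rows by ``columns`` in priority order, placing None first."""

    def key(row):
        # (False, 0) sorts before (True, value), so None never gets compared
        # with a real value.
        return tuple((row[c] is not None, 0 if row[c] is None else row[c]) for c in columns)

    return sorted(rows, key=key)


out = sort_rows(rows, ["col3", "col1"])
print([row["col3"] for row in out])
print([row["col1"] for row in out])
```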

arctix.transformer.dataframe.SortColumns

Bases: BaseDataFrameTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name Type Description Default
reverse bool

If False, the columns are sorted in alphabetical order; if True, in reverse alphabetical order.

False

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsDataFrameTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘
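
Only the column order changes; the rows stay as they were. A plain-Python sketch of the reordering, assuming reverse=True yields reverse alphabetical order; sort_columns is a hypothetical helper operating on a dict of lists.

```python
def sort_columns(frame: dict, reverse: bool = False) -> dict:
    """Return a new mapping whose keys are in (reverse) alphabetical order."""
    return {name: frame[name] for name in sorted(frame, reverse=reverse)}


frame = {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
ascending = list(sort_columns(frame))
descending = list(sort_columns(frame, reverse=True))
print(ascending)
print(descending)
```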

arctix.transformer.dataframe.SortColumnsDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to sort the DataFrame columns by name.

Parameters:

Name Type Description Default
reverse bool

If False, the columns are sorted in alphabetical order; if True, in reverse alphabetical order.

False

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsDataFrameTransformer(reverse=False)
>>> frame = pl.DataFrame(
...     {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ i64  ┆ f64  │
╞══════╪══════╪══════╡
│ a    ┆ 1    ┆ 6.0  │
│ c    ┆ 2    ┆ 5.0  │
│ b    ┆ null ┆ 4.0  │
└──────┴──────┴──────┘

arctix.transformer.dataframe.SortDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to sort the DataFrame by the given columns.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to sort by.

required
*args Any

The positional arguments to pass to sort.

()
**kwargs Any

The keyword arguments to pass to sort.

{}

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortDataFrameTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
...     {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ 2    ┆ 5.0  ┆ c    │
│ null ┆ 4.0  ┆ b    │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ 6.0  ┆ a    │
│ null ┆ 4.0  ┆ b    │
│ 2    ┆ 5.0  ┆ c    │
└──────┴──────┴──────┘

arctix.transformer.dataframe.StripChars

Bases: BaseDataFrameTransformer

Implement a transformer to remove leading and trailing characters.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to strip.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsDataFrameTransformer(columns=('col2', 'col3'))
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘
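
The effect shown above can be approximated in plain Python, which may help when checking expected values by hand. This is only a sketch of the observable behaviour, not the library's implementation: the helper name `strip_columns` is hypothetical, and it assumes that only whitespace is stripped, as in the example output.

```python
def strip_columns(data: dict[str, list], columns: list[str]) -> dict[str, list]:
    """Strip leading/trailing whitespace from the string values of `columns`.

    Columns that are not listed are returned unchanged.
    """
    selected = set(columns)
    return {
        name: [
            value.strip() if name in selected and isinstance(value, str) else value
            for value in values
        ]
        for name, values in data.items()
    }


data = {
    "col1": [1, 2, 3],
    "col3": ["a ", " b", "  c  "],
    "col4": ["a ", " b", "  c  "],
}
stripped = strip_columns(data, columns=["col3"])
```

As in the transformer example, `col4` keeps its surrounding spaces because it is not in `columns`.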

arctix.transformer.dataframe.StripCharsDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to remove leading and trailing characters.

Parameters:

Name Type Description Default
columns Sequence[str]

The string columns to strip.

required

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsDataFrameTransformer(columns=('col2', 'col3'))
>>> frame = pl.DataFrame(
...     {
...         "col1": [1, 2, 3, 4, 5],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["a ", " b", "  c  ", "d", "e"],
...         "col4": ["a ", " b", "  c  ", "d", "e"],
...     }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3  ┆ col4  │
│ ---  ┆ ---  ┆ ---   ┆ ---   │
│ i64  ┆ str  ┆ str   ┆ str   │
╞══════╪══════╪═══════╪═══════╡
│ 1    ┆ 1    ┆ a     ┆ a     │
│ 2    ┆ 2    ┆  b    ┆  b    │
│ 3    ┆ 3    ┆   c   ┆   c   │
│ 4    ┆ 4    ┆ d     ┆ d     │
│ 5    ┆ 5    ┆ e     ┆ e     │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4  │
│ ---  ┆ ---  ┆ ---  ┆ ---   │
│ i64  ┆ str  ┆ str  ┆ str   │
╞══════╪══════╪══════╪═══════╡
│ 1    ┆ 1    ┆ a    ┆ a     │
│ 2    ┆ 2    ┆ b    ┆  b    │
│ 3    ┆ 3    ┆ c    ┆   c   │
│ 4    ┆ 4    ┆ d    ┆ d     │
│ 5    ┆ 5    ┆ e    ┆ e     │
└──────┴──────┴──────┴───────┘

arctix.transformer.dataframe.TimeDiff

Bases: BaseDataFrameTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name Type Description Default
group_cols Sequence[str]

The columns used to generate the group for each sequence.

required
time_col str

The input time column name.

required
time_diff_col str

The output time difference column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffDataFrameTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘
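
To make the output above easier to verify, here is a plain-Python sketch of the same computation. It is inferred from the example output rather than from the library source: it assumes rows are ordered by group and then by time, and that a row with no earlier row `shift` steps back gets a difference of 0. The helper name `time_diff_by_group` is hypothetical.

```python
from collections import defaultdict


def time_diff_by_group(rows, shift=1):
    """Return (group, time, diff) tuples ordered by group, then by time.

    Assumption: a row without a predecessor `shift` steps back in its
    group gets a diff of 0, mirroring the example output.
    """
    groups = defaultdict(list)
    for group, time in rows:
        groups[group].append(time)

    out = []
    for group in sorted(groups):
        times = sorted(groups[group])
        for i, time in enumerate(times):
            diff = time - times[i - shift] if i >= shift else 0
            out.append((group, time, diff))
    return out


rows = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
result = time_diff_by_group(rows)
```

With `shift=1`, group `a` yields differences 0, 2, 1 and group `b` yields 0, 3, which matches the `diff` column above.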

arctix.transformer.dataframe.TimeDiffDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to compute the time difference between consecutive time steps.

Parameters:

Name Type Description Default
group_cols Sequence[str]

The columns used to generate the group for each sequence.

required
time_col str

The input time column name.

required
time_diff_col str

The output time difference column name.

required
shift int

The number of slots to shift.

1

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffDataFrameTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ ---  │
│ str ┆ i64  │
╞═════╪══════╡
│ a   ┆ 1    │
│ b   ┆ 2    │
│ a   ┆ 3    │
│ a   ┆ 4    │
│ b   ┆ 5    │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 0    │
│ a   ┆ 3    ┆ 2    │
│ a   ┆ 4    ┆ 1    │
│ b   ┆ 2    ┆ 0    │
│ b   ┆ 5    ┆ 3    │
└─────┴──────┴──────┘

arctix.transformer.dataframe.TimeToSecond

Bases: BaseDataFrameTransformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name Type Description Default
in_col str

The input column with the time value to convert.

required
out_col str

The output column with the time in seconds.

required

Example usage:

>>> import datetime
>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondDataFrameTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘
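
The values in the `second` column are the number of seconds elapsed since midnight. The following standard-library sketch reproduces that arithmetic for a single value; the helper name `time_to_seconds` is hypothetical and is not part of arctix.

```python
import datetime


def time_to_seconds(value: datetime.time) -> float:
    """Convert a time of day to the number of seconds since midnight."""
    return (
        value.hour * 3600
        + value.minute * 60
        + value.second
        + value.microsecond / 1_000_000
    )


seconds = [
    time_to_seconds(datetime.time(0, 0, 1, 890000)),
    time_to_seconds(datetime.time(1, 1, 1, 890000)),
    time_to_seconds(datetime.time(19, 19, 19, 890000)),
]
```

For example, 19:19:19.890 is 19 * 3600 + 19 * 60 + 19 + 0.89 seconds, which is approximately 69559.89.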

arctix.transformer.dataframe.TimeToSecondDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to convert a column with time values to seconds.

Parameters:

Name Type Description Default
in_col str

The input column with the time value to convert.

required
out_col str

The output column with the time in seconds.

required

Example usage:

>>> import datetime
>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondDataFrameTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
...     {
...         "time": [
...             datetime.time(0, 0, 1, 890000),
...             datetime.time(0, 1, 1, 890000),
...             datetime.time(1, 1, 1, 890000),
...             datetime.time(0, 19, 19, 890000),
...             datetime.time(19, 19, 19, 890000),
...         ],
...         "col": ["a", "b", "c", "d", "e"],
...     },
...     schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time         ┆ col │
│ ---          ┆ --- │
│ time         ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a   │
│ 00:01:01.890 ┆ b   │
│ 01:01:01.890 ┆ c   │
│ 00:19:19.890 ┆ d   │
│ 19:19:19.890 ┆ e   │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time         ┆ col ┆ second   │
│ ---          ┆ --- ┆ ---      │
│ time         ┆ str ┆ f64      │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a   ┆ 1.89     │
│ 00:01:01.890 ┆ b   ┆ 61.89    │
│ 01:01:01.890 ┆ c   ┆ 3661.89  │
│ 00:19:19.890 ┆ d   ┆ 1159.89  │
│ 19:19:19.890 ┆ e   ┆ 69559.89 │
└──────────────┴─────┴──────────┘

arctix.transformer.dataframe.ToTime

Bases: BaseDataFrameTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to convert.

required
format str | None

Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to None (default), the format is inferred from the data.

None

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeDataFrameTransformer(columns=('col1',), format=%H:%M:%S)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
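
The `%H`, `%M` and `%S` specifiers used in this example have the same meaning in Python's `datetime.strptime`, so the format string can be sanity-checked on a few sample values before it is passed to the transformer. This is a standard-library sketch only; other chrono specifiers are not guaranteed to behave identically in Python, and `parse_time` is a hypothetical helper.

```python
import datetime


def parse_time(value: str, fmt: str = "%H:%M:%S") -> datetime.time:
    """Parse a string into a time of day; raise ValueError on mismatch."""
    return datetime.datetime.strptime(value, fmt).time()


times = [parse_time(v) for v in ["01:01:01", "12:00:01", "23:59:59"]]

# A value that does not match the format raises ValueError.
try:
    parse_time("25:00:00")
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```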

arctix.transformer.dataframe.ToTimeDataFrameTransformer

Bases: BaseDataFrameTransformer

Implement a transformer to convert some columns to a polars.Time type.

Parameters:

Name Type Description Default
columns Sequence[str]

The columns to convert.

required
format str | None

Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: "%H:%M:%S". If set to None (default), the format is inferred from the data.

None

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeDataFrameTransformer(columns=('col1',), format=%H:%M:%S)
>>> frame = pl.DataFrame(
...     {
...         "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...         "col2": ["1", "2", "3", "4", "5"],
...         "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
...     }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ str      ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1     ┆ col2 ┆ col3     │
│ ---      ┆ ---  ┆ ---      │
│ time     ┆ str  ┆ str      │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1    ┆ 01:01:01 │
│ 02:02:02 ┆ 2    ┆ 02:02:02 │
│ 12:00:01 ┆ 3    ┆ 12:00:01 │
│ 18:18:18 ┆ 4    ┆ 18:18:18 │
│ 23:59:59 ┆ 5    ┆ 23:59:59 │
└──────────┴──────┴──────────┘

arctix.transformer.dataframe.TokenToIndex

Bases: ReplaceStrictDataFrameTransformer

Map the tokens in a column to their indices by using a vocabulary.

Parameters:

Name Type Description Default
vocab Vocabulary

The vocabulary which contains the token to index mapping.

required
token_column str

The column name which contains the input tokens.

required
index_column str

The column name which contains the output indices.

required
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import TokenToIndex
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_token_to_index()
{'b': 0, 'a': 1, 'c': 2, 'd': 3}
>>> transformer = TokenToIndex(vocab=vocab, token_column="col", index_column="index")
>>> transformer
TokenToIndexDataFrameTransformer(orig_column=col, final_column=index)
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "a"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ a   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ index │
│ --- ┆ ---   │
│ str ┆ i64   │
╞═════╪═══════╡
│ a   ┆ 1     │
│ b   ┆ 0     │
│ c   ┆ 2     │
│ d   ┆ 3     │
│ a   ┆ 1     │
└─────┴───────┘
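
The lookup performed in this example can be sketched with a plain dictionary. The sketch assumes that indices follow the insertion order of the counter, as the `get_token_to_index()` output above suggests, and that an unknown token is an error, as the name of the `ReplaceStrictDataFrameTransformer` base class implies. The helper name `tokens_to_indices` is hypothetical.

```python
from collections import Counter

counter = Counter({"b": 3, "a": 1, "c": 2, "d": 4})
# Assumption: indices follow the counter's insertion order.
token_to_index = {token: index for index, token in enumerate(counter)}


def tokens_to_indices(tokens, mapping):
    """Map each token to its index; raise KeyError for an unknown token."""
    return [mapping[token] for token in tokens]


indices = tokens_to_indices(["a", "b", "c", "d", "a"], token_to_index)
```

Here `indices` holds the same values as the `index` column above.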

arctix.transformer.dataframe.TokenToIndexDataFrameTransformer

Bases: ReplaceStrictDataFrameTransformer

Map the tokens in a column to their indices by using a vocabulary.

Parameters:

Name Type Description Default
vocab Vocabulary

The vocabulary which contains the token to index mapping.

required
token_column str

The column name which contains the input tokens.

required
index_column str

The column name which contains the output indices.

required
**kwargs Any

The keyword arguments to pass to replace.

{}

Example usage:

>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import TokenToIndex
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_token_to_index()
{'b': 0, 'a': 1, 'c': 2, 'd': 3}
>>> transformer = TokenToIndex(vocab=vocab, token_column="col", index_column="index")
>>> transformer
TokenToIndexDataFrameTransformer(orig_column=col, final_column=index)
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "a"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a   │
│ b   │
│ c   │
│ d   │
│ a   │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ index │
│ --- ┆ ---   │
│ str ┆ i64   │
╞═════╪═══════╡
│ a   ┆ 1     │
│ b   ┆ 0     │
│ c   ┆ 2     │
│ d   ┆ 3     │
│ a   ┆ 1     │
└─────┴───────┘

arctix.transformer.dataframe.is_dataframe_transformer_config

is_dataframe_transformer_config(config: dict) -> bool

Indicate if the input configuration is a configuration for a BaseDataFrameTransformer.

This function only checks whether the value of the _target_ key is valid. It does not check the other values. If _target_ refers to a function, the function's return type hint is used to check the class.

Parameters:

Name Type Description Default
config dict

Specifies the configuration to check.

required

Returns:

Type Description
bool

True if the input configuration is a configuration for a BaseDataFrameTransformer object.

Example usage:

>>> from arctix.transformer.dataframe import is_dataframe_transformer_config
>>> is_dataframe_transformer_config(
...     {"_target_": "arctix.transformer.dataframe.Cast", "columns": ["col1", "col3"]}
... )
True
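
The idea of validating only the `_target_` key can be illustrated with the standard library. The sketch below is not how arctix implements the check: it resolves a dotted path with `importlib` and tests it against a base class, and it does not handle the case where `_target_` refers to a function. The helper name `target_is_subclass` is hypothetical, and `dict` stands in for `BaseDataFrameTransformer` so that the example runs without arctix installed.

```python
import importlib
import inspect


def target_is_subclass(config: dict, base: type) -> bool:
    """Return True if config["_target_"] names a class deriving from base.

    Only the `_target_` key is inspected; all other keys are ignored.
    """
    target = config.get("_target_")
    if not isinstance(target, str) or "." not in target:
        return False
    module_name, _, attr = target.rpartition(".")
    try:
        obj = getattr(importlib.import_module(module_name), attr)
    except (ImportError, AttributeError, ValueError):
        return False
    return inspect.isclass(obj) and issubclass(obj, base)


# The extra "maxsize" key is not validated, only `_target_` is.
ok = target_is_subclass({"_target_": "collections.OrderedDict", "maxsize": 3}, dict)
wrong_base = target_is_subclass({"_target_": "collections.Counter"}, list)
missing = target_is_subclass({"columns": ["col1"]}, dict)
```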

arctix.transformer.dataframe.setup_dataframe_transformer

setup_dataframe_transformer(
    transformer: BaseDataFrameTransformer | dict,
) -> BaseDataFrameTransformer

Set up a polars.DataFrame transformer.

The transformer is instantiated from its configuration by using the BaseDataFrameTransformer factory function.

Parameters:

Name Type Description Default
transformer BaseDataFrameTransformer | dict

Specifies a polars.DataFrame transformer or its configuration.

required

Returns:

Type Description
BaseDataFrameTransformer

An instantiated transformer.

Example usage:

>>> import polars as pl
>>> from arctix.transformer.dataframe import setup_dataframe_transformer
>>> transformer = setup_dataframe_transformer(
...     {
...         "_target_": "arctix.transformer.dataframe.Cast",
...         "columns": ["col1", "col3"],
...         "dtype": pl.Int64,
...     }
... )
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int64)