arctix.transformer
arctix.transformer.dataframe
Contain DataFrame transformers.
arctix.transformer.dataframe.BaseDataFrameTransformer
Bases: ABC
Define the base class to transform a polars.DataFrame.
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
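The base-class pattern can be sketched without polars. This is a minimal illustration, assuming that `transform` is the only method a subclass has to implement; `Frame`, `BaseTransformerSketch`, and `UppercaseSketch` are hypothetical names used only here and are not part of arctix.

```python
from abc import ABC, abstractmethod

# Plain-Python stand-in for a frame: one dict per row (an assumption of this sketch).
Frame = list[dict[str, object]]


class BaseTransformerSketch(ABC):
    """Hypothetical stand-in exposing a single abstract ``transform`` method."""

    @abstractmethod
    def transform(self, frame: Frame) -> Frame:
        """Return a transformed copy of ``frame``."""


class UppercaseSketch(BaseTransformerSketch):
    """Hypothetical subclass that upper-cases the values of one column."""

    def __init__(self, column: str) -> None:
        self._column = column

    def transform(self, frame: Frame) -> Frame:
        # Build new rows instead of mutating the input rows.
        return [{**row, self._column: str(row[self._column]).upper()} for row in frame]


rows = [{"col1": 1, "col4": "a"}, {"col1": 2, "col4": "b"}]
out = UppercaseSketch(column="col4").transform(rows)
print(out)
```

Because the base class is abstract, instantiating `BaseTransformerSketch` directly raises a `TypeError`; only concrete subclasses can be used.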
arctix.transformer.dataframe.BaseDataFrameTransformer.transform
transform(frame: DataFrame) -> DataFrame
Transform the data in the polars.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | Specifies the | required |
Returns:
Type | Description |
---|---|
DataFrame | The transformed DataFrame. |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
arctix.transformer.dataframe.Cast
Bases: BaseDataFrameTransformer
Implement a transformer to convert some columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
arctix.transformer.dataframe.CastDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer to convert some columns to a new data type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to convert. | required |
dtype | type[DataType] | The target data type. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Cast
>>> transformer = Cast(columns=["col1", "col3"], dtype=pl.Int32)
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int32)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 ┆ str │
╞══════╪══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 ┆ a │
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 3 ┆ 3 ┆ 3 ┆ c │
│ 4 ┆ 4 ┆ 4 ┆ d │
│ 5 ┆ 5 ┆ 5 ┆ e │
└──────┴──────┴──────┴──────┘
arctix.transformer.dataframe.Diff
Bases: BaseDataFrameTransformer
Implement a transformer to compute the first discrete difference between shifted items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffDataFrameTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 1 ┆ a │
│ 2 ┆ b │
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ null │
│ 2 ┆ b ┆ 1 │
│ 3 ┆ c ┆ 1 │
│ 4 ┆ d ┆ 1 │
│ 5 ┆ e ┆ 1 │
└──────┴──────┴──────┘
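To make the effect of `shift` concrete, here is a plain-Python sketch of a shifted difference. It assumes a positive `shift` and uses `None` where the table above shows `null`; `shifted_diff` is a hypothetical helper, not the arctix implementation.

```python
from typing import Optional


def shifted_diff(values: list[int], shift: int = 1) -> list[Optional[int]]:
    """Return values[i] - values[i - shift], or None when no earlier item exists."""
    return [
        values[i] - values[i - shift] if i >= shift else None
        for i in range(len(values))
    ]


print(shifted_diff([1, 2, 3, 4, 5]))  # [None, 1, 1, 1, 1]
print(shifted_diff([1, 2, 3, 4, 5], shift=2))  # [None, None, 2, 2, 2]
```

With `shift=1` the result matches the `diff` column shown above; with `shift=2` each item is compared with the item two rows earlier.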
arctix.transformer.dataframe.DiffDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer to compute the first discrete difference between shifted items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column name. | required |
out_col | str | The output column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Diff
>>> transformer = Diff(in_col="col1", out_col="diff")
>>> transformer
DiffDataFrameTransformer(in_col=col1, out_col=diff, shift=1)
>>> frame = pl.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ --- ┆ --- │
│ i64 ┆ str │
╞══════╪══════╡
│ 1 ┆ a │
│ 2 ┆ b │
│ 3 ┆ c │
│ 4 ┆ d │
│ 5 ┆ e │
└──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ diff │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ a ┆ null │
│ 2 ┆ b ┆ 1 │
│ 3 ┆ c ┆ 1 │
│ 4 ┆ d ┆ 1 │
│ 5 ┆ e ┆ 1 │
└──────┴──────┴──────┘
arctix.transformer.dataframe.Function
Bases: BaseDataFrameTransformer
Implement a transformer that is a wrapper around a function to transform the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[DataFrame], DataFrame] | The function to transform the DataFrame. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import FunctionDataFrameTransformer
>>> transformer = FunctionDataFrameTransformer(
... func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionDataFrameTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 4 ┆ 4 ┆ 4 ┆ d │
└──────┴──────┴──────┴──────┘
arctix.transformer.dataframe.FunctionDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer that is a wrapper around a function to transform the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable[[DataFrame], DataFrame] | The function to transform the DataFrame. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import FunctionDataFrameTransformer
>>> transformer = FunctionDataFrameTransformer(
... func=lambda frame: frame.filter(pl.col("col1").is_in({2, 4}))
... )
>>> transformer
FunctionDataFrameTransformer(func=<function <lambda> at 0x...>)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["1", "2", "3", "4", "5"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> out = transformer.transform(frame)
>>> out
shape: (2, 4)
┌──────┬──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪══════╡
│ 2 ┆ 2 ┆ 2 ┆ b │
│ 4 ┆ 4 ┆ 4 ┆ d │
└──────┴──────┴──────┴──────┘
arctix.transformer.dataframe.IndexToToken
Bases: ReplaceStrictDataFrameTransformer
Replace the indices in a column by the corresponding tokens from a vocabulary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab | Vocabulary | The vocabulary which contains the index to token mapping. | required |
index_column | str | The column name which contains the input indices. | required |
token_column | str | The column name which contains the output tokens. | required |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import IndexToToken
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_index_to_token()
('b', 'a', 'c', 'd')
>>> transformer = IndexToToken(
... vocab=vocab,
... index_column="col",
... token_column="token",
... )
>>> transformer
IndexToTokenDataFrameTransformer(orig_column=col, final_column=token)
>>> frame = pl.DataFrame({"col": [1, 0, 2, 3, 1]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 0 │
│ 2 │
│ 3 │
│ 1 │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ token │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═══════╡
│ 1 ┆ a │
│ 0 ┆ b │
│ 2 ┆ c │
│ 3 ┆ d │
│ 1 ┆ a │
└─────┴───────┘
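The lookup performed in this example can be reproduced with a plain tuple. This is only a sketch of the mapping step: it reuses the tuple printed by `vocab.get_index_to_token()` above and does not model the DataFrame or the vocabulary class.

```python
# Index-to-token tuple, copied from the output of vocab.get_index_to_token() above.
index_to_token = ("b", "a", "c", "d")

# Values of the "col" column in the example frame.
indices = [1, 0, 2, 3, 1]

# Each index selects the token stored at that position.
tokens = [index_to_token[i] for i in indices]
print(tokens)  # ['a', 'b', 'c', 'd', 'a']
```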
arctix.transformer.dataframe.IndexToTokenDataFrameTransformer
Bases: ReplaceStrictDataFrameTransformer
Replace the indices in a column by the corresponding tokens from a vocabulary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab | Vocabulary | The vocabulary which contains the index to token mapping. | required |
index_column | str | The column name which contains the input indices. | required |
token_column | str | The column name which contains the output tokens. | required |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import IndexToToken
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_index_to_token()
('b', 'a', 'c', 'd')
>>> transformer = IndexToToken(
... vocab=vocab,
... index_column="col",
... token_column="token",
... )
>>> transformer
IndexToTokenDataFrameTransformer(orig_column=col, final_column=token)
>>> frame = pl.DataFrame({"col": [1, 0, 2, 3, 1]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 0 │
│ 2 │
│ 3 │
│ 1 │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ token │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═══════╡
│ 1 ┆ a │
│ 0 ┆ b │
│ 2 ┆ c │
│ 3 ┆ d │
│ 1 ┆ a │
└─────┴───────┘
arctix.transformer.dataframe.JsonDecode
Bases: BaseDataFrameTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to parse. | required |
dtype | PolarsDataType \| PythonDataType \| None | The dtype to cast the extracted value to. If | None |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeDataFrameTransformer(columns=('col1', 'col3'), dtype=None)
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ str ┆ list[str] ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ["1", "2"] ┆ a │
│ [2] ┆ 2 ┆ ["2"] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ["1", "2", "3"] ┆ c │
│ [4, 5] ┆ 4 ┆ ["4", "5"] ┆ d │
│ [5, 4] ┆ 5 ┆ ["5", "4"] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
arctix.transformer.dataframe.JsonDecodeDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer to parse string values as JSON.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to parse. | required |
dtype | PolarsDataType \| PythonDataType \| None | The dtype to cast the extracted value to. If | None |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import JsonDecode
>>> transformer = JsonDecode(columns=["col1", "col3"])
>>> transformer
JsonDecodeDataFrameTransformer(columns=('col1', 'col3'), dtype=None)
>>> frame = pl.DataFrame(
... {
... "col1": ["[1, 2]", "[2]", "[1, 2, 3]", "[4, 5]", "[5, 4]"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["['1', '2']", "['2']", "['1', '2', '3']", "['4', '5']", "['5', '4']"],
... "col4": ["a", "b", "c", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ['1', '2'] ┆ a │
│ [2] ┆ 2 ┆ ['2'] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ['1', '2', '3'] ┆ c │
│ [4, 5] ┆ 4 ┆ ['4', '5'] ┆ d │
│ [5, 4] ┆ 5 ┆ ['5', '4'] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌───────────┬──────┬─────────────────┬──────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ list[i64] ┆ str ┆ list[str] ┆ str │
╞═══════════╪══════╪═════════════════╪══════╡
│ [1, 2] ┆ 1 ┆ ["1", "2"] ┆ a │
│ [2] ┆ 2 ┆ ["2"] ┆ b │
│ [1, 2, 3] ┆ 3 ┆ ["1", "2", "3"] ┆ c │
│ [4, 5] ┆ 4 ┆ ["4", "5"] ┆ d │
│ [5, 4] ┆ 5 ┆ ["5", "4"] ┆ e │
└───────────┴──────┴─────────────────┴──────┘
arctix.transformer.dataframe.Replace
Bases: BaseDataFrameTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Replace
>>> transformer = Replace(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ d │
│ e ┆ e │
└─────┴─────┘
>>> transformer = Replace(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
arctix.transformer.dataframe.ReplaceDataFrameTransformer
Bases: BaseDataFrameTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Replace
>>> transformer = Replace(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3}
... )
>>> transformer
ReplaceDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ d │
│ e ┆ e │
└─────┴─────┘
>>> transformer = Replace(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
arctix.transformer.dataframe.ReplaceStrict
Bases: BaseDataFrameTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import ReplaceStrict
>>> transformer = ReplaceStrict(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└─────┴─────┘
>>> transformer = ReplaceStrict(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
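Replace and ReplaceStrict share the same description, so the difference is easiest to see in a small plain-Python sketch. It rests on an assumption that this page does not state: that the strict variant follows polars' `replace_strict`, which rejects values that have no mapping unless a default is supplied. The helper names and the `ValueError` are hypothetical, and column dtypes are not modelled.

```python
_NO_DEFAULT = object()  # sentinel: distinguishes "no default given" from default=None


def replace_values(values, mapping):
    """Non-strict: values missing from the mapping are kept unchanged."""
    return [mapping.get(value, value) for value in values]


def replace_values_strict(values, mapping, default=_NO_DEFAULT):
    """Strict: every value must be mapped, unless a default is supplied."""
    out = []
    for value in values:
        if value in mapping:
            out.append(mapping[value])
        elif default is not _NO_DEFAULT:
            out.append(default)
        else:
            # Assumed behaviour: an unmapped value is an error in strict mode.
            raise ValueError(f"no replacement for {value!r}")
    return out


old = ["a", "b", "c", "d", "e"]
mapping = {"a": 1, "b": 2, "c": 3}
print(replace_values(old, mapping))  # [1, 2, 3, 'd', 'e']
print(replace_values_strict(old, mapping, default=None))  # [1, 2, 3, None, None]
```

Calling `replace_values_strict(old, mapping)` without a default raises in this sketch, which is why the first ReplaceStrict example above maps all five values.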
arctix.transformer.dataframe.ReplaceStrictDataFrameTransformer
Bases: BaseDataFrameTransformer
Replace the values in a column by the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orig_column | str | The original column name. | required |
final_column | str | The final column name. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import ReplaceStrict
>>> transformer = ReplaceStrict(
... orig_column="old", final_column="new", old={"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
... )
>>> transformer
ReplaceStrictDataFrameTransformer(orig_column=old, final_column=new)
>>> frame = pl.DataFrame({"old": ["a", "b", "c", "d", "e"]})
>>> frame
shape: (5, 1)
┌─────┐
│ old │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ e │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬─────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ 4 │
│ e ┆ 5 │
└─────┴─────┘
>>> transformer = ReplaceStrict(
... orig_column="old",
... final_column="new",
... old={"a": 1, "b": 2, "c": 3},
... default=None,
... )
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬──────┐
│ old ┆ new │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ c ┆ 3 │
│ d ┆ null │
│ e ┆ null │
└─────┴──────┘
arctix.transformer.dataframe.Sequential
Bases: BaseDataFrameTransformer
Implement a polars.DataFrame transformer to apply several transformers sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformers | Sequence[BaseDataFrameTransformer \| dict] | The transformers or their configurations. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import (
... Sequential,
... Cast,
... )
>>> transformer = Sequential(
... [
... Cast(columns=["col1"], dtype=pl.Float32),
... Cast(columns=["col2"], dtype=pl.Int64),
... ]
... )
>>> transformer
SequentialDataFrameTransformer(
(0): CastDataFrameTransformer(columns=('col1',), dtype=Float32)
(1): CastDataFrameTransformer(columns=('col2',), dtype=Int64)
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1.0 ┆ 1 ┆ a ┆ a │
│ 2.0 ┆ 2 ┆ b ┆ b │
│ 3.0 ┆ 3 ┆ c ┆ c │
│ 4.0 ┆ 4 ┆ d ┆ d │
│ 5.0 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
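Sequential application amounts to feeding the output of each step into the next one. The sketch below shows that idea with plain functions over a list of row dicts; `apply_sequentially`, `to_float`, and `to_int` are hypothetical names, and the dict-based configurations accepted by `transformers` are not covered.

```python
from functools import reduce
from typing import Callable

Rows = list[dict[str, object]]
Step = Callable[[Rows], Rows]


def apply_sequentially(steps: list[Step], rows: Rows) -> Rows:
    """Feed the output of each step into the next one, in list order."""
    return reduce(lambda acc, step: step(acc), steps, rows)


def to_float(rows: Rows) -> Rows:
    # Stand-in for casting col1 to a float type.
    return [{**row, "col1": float(row["col1"])} for row in rows]


def to_int(rows: Rows) -> Rows:
    # Stand-in for casting col2 from str to an integer type.
    return [{**row, "col2": int(row["col2"])} for row in rows]


out = apply_sequentially([to_float, to_int], [{"col1": 1, "col2": "1"}])
print(out)  # [{'col1': 1.0, 'col2': 1}]
```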
arctix.transformer.dataframe.SequentialDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a polars.DataFrame transformer to apply several transformers sequentially.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformers | Sequence[BaseDataFrameTransformer \| dict] | The transformers or their configurations. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import (
... Sequential,
... Cast,
... )
>>> transformer = Sequential(
... [
... Cast(columns=["col1"], dtype=pl.Float32),
... Cast(columns=["col2"], dtype=pl.Int64),
... ]
... )
>>> transformer
SequentialDataFrameTransformer(
(0): CastDataFrameTransformer(columns=('col1',), dtype=Float32)
(1): CastDataFrameTransformer(columns=('col2',), dtype=Int64)
)
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f32 ┆ i64 ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1.0 ┆ 1 ┆ a ┆ a │
│ 2.0 ┆ 2 ┆ b ┆ b │
│ 3.0 ┆ 3 ┆ c ┆ c │
│ 4.0 ┆ 4 ┆ d ┆ d │
│ 5.0 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
arctix.transformer.dataframe.Sort
Bases: BaseDataFrameTransformer
Implement a transformer to sort the DataFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to sort by. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortDataFrameTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
... {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴──────┴──────┘
arctix.transformer.dataframe.SortColumns
Bases: BaseDataFrameTransformer
Implement a transformer to sort the DataFrame columns by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reverse | bool | If set to | False |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsDataFrameTransformer(reverse=False)
>>> frame = pl.DataFrame(
... {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞══════╪══════╪══════╡
│ a ┆ 1 ┆ 6.0 │
│ c ┆ 2 ┆ 5.0 │
│ b ┆ null ┆ 4.0 │
└──────┴──────┴──────┘
arctix.transformer.dataframe.SortColumnsDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer to sort the DataFrame columns by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reverse | bool | If set to | False |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import SortColumns
>>> transformer = SortColumns()
>>> transformer
SortColumnsDataFrameTransformer(reverse=False)
>>> frame = pl.DataFrame(
... {"col2": [1, 2, None], "col3": [6.0, 5.0, 4.0], "col1": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col2 ┆ col3 ┆ col1 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞══════╪══════╪══════╡
│ a ┆ 1 ┆ 6.0 │
│ c ┆ 2 ┆ 5.0 │
│ b ┆ null ┆ 4.0 │
└──────┴──────┴──────┘
arctix.transformer.dataframe.SortDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer to sort the DataFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to sort by. | required |
*args | Any | The positional arguments to pass to | () |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import Sort
>>> transformer = Sort(columns=["col3", "col1"])
>>> transformer
SortDataFrameTransformer(columns=('col3', 'col1'))
>>> frame = pl.DataFrame(
... {"col1": [1, 2, None], "col2": [6.0, 5.0, 4.0], "col3": ["a", "c", "b"]}
... )
>>> frame
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 5.0 ┆ c │
│ null ┆ 4.0 ┆ b │
└──────┴──────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (3, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪══════╪══════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴──────┴──────┘
arctix.transformer.dataframe.StripChars
Bases: BaseDataFrameTransformer
Implement a transformer to remove leading and trailing characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to strip. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsDataFrameTransformer(columns=('col2', 'col3'))
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
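Leading and trailing spaces are hard to see in printed tables, so the same values are shown here as Python string literals. This sketch assumes that the characters removed are whitespace, as in the example above, and uses `str.strip` as a stand-in for the column operation.

```python
# Values of col3 in the example frame, with their original padding.
col3 = ["a ", " b", " c ", "d", "e"]

# str.strip() with no argument removes leading and trailing whitespace.
stripped = [value.strip() for value in col3]
print(stripped)  # ['a', 'b', 'c', 'd', 'e']
```

Only the columns passed to `columns` are affected, which is why `col4` keeps its padding in the output above.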
arctix.transformer.dataframe.StripCharsDataFrameTransformer
Bases: BaseDataFrameTransformer
Implement a transformer to remove leading and trailing characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Sequence[str] | The columns to strip. | required |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import StripChars
>>> transformer = StripChars(columns=["col2", "col3"])
>>> transformer
StripCharsDataFrameTransformer(columns=('col2', 'col3'))
>>> frame = pl.DataFrame(
... {
... "col1": [1, 2, 3, 4, 5],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["a ", " b", " c ", "d", "e"],
... "col4": ["a ", " b", " c ", "d", "e"],
... }
... )
>>> frame
shape: (5, 4)
┌──────┬──────┬───────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴───────┴───────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 4)
┌──────┬──────┬──────┬───────┐
│ col1 ┆ col2 ┆ col3 ┆ col4 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪══════╪══════╪═══════╡
│ 1 ┆ 1 ┆ a ┆ a │
│ 2 ┆ 2 ┆ b ┆ b │
│ 3 ┆ 3 ┆ c ┆ c │
│ 4 ┆ 4 ┆ d ┆ d │
│ 5 ┆ 5 ┆ e ┆ e │
└──────┴──────┴──────┴───────┘
arctix.transformer.dataframe.TimeDiff
Bases: BaseDataFrameTransformer
Implement a transformer to compute the time difference between consecutive time steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_cols | Sequence[str] | The columns used to generate the group for each sequence. | required |
time_col | str | The input time column name. | required |
time_diff_col | str | The output time difference column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffDataFrameTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ a ┆ 3 │
│ a ┆ 4 │
│ b ┆ 5 │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 3 ┆ 2 │
│ a ┆ 4 ┆ 1 │
│ b ┆ 2 ┆ 0 │
│ b ┆ 5 ┆ 3 │
└─────┴──────┴──────┘
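The per-group computation can be sketched in plain Python. This is an illustration of the behaviour inferred from the example output (rows sorted by group and time, and a difference of 0 where no earlier row exists), not the library's implementation; `time_diff` is a hypothetical helper.

```python
from collections import defaultdict


def time_diff(rows: list[tuple[str, int]], shift: int = 1) -> list[tuple[str, int, int]]:
    """Return (group, time, diff) tuples sorted by group, then by time.

    Assumption taken from the example output: the first ``shift`` rows
    of each group have no earlier row to compare with, so their
    difference is set to 0.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for group, time in rows:
        groups[group].append(time)

    out = []
    for group in sorted(groups):
        times = sorted(groups[group])
        for i, time in enumerate(times):
            # Compare with the row ``shift`` positions earlier in the same group.
            diff = time - times[i - shift] if i >= shift else 0
            out.append((group, time, diff))
    return out


rows = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
result = time_diff(rows)
for row in result:
    print(row)
```

With `shift=2`, each row would instead be compared with the row two steps earlier in its group.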
arctix.transformer.dataframe.TimeDiffDataFrameTransformer ¶
Bases: BaseDataFrameTransformer
Implement a transformer to compute the time difference between consecutive time steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
group_cols | Sequence[str] | The columns used to generate the group for each sequence. | required |
time_col | str | The input time column name. | required |
time_diff_col | str | The output time difference column name. | required |
shift | int | The number of slots to shift. | 1 |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeDiff
>>> transformer = TimeDiff(group_cols=["col"], time_col="time", time_diff_col="diff")
>>> transformer
TimeDiffDataFrameTransformer(group_cols=['col'], time_col=time, time_diff_col=diff, shift=1)
>>> frame = pl.DataFrame({"col": ["a", "b", "a", "a", "b"], "time": [1, 2, 3, 4, 5]})
>>> frame
shape: (5, 2)
┌─────┬──────┐
│ col ┆ time │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════╡
│ a ┆ 1 │
│ b ┆ 2 │
│ a ┆ 3 │
│ a ┆ 4 │
│ b ┆ 5 │
└─────┴──────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌─────┬──────┬──────┐
│ col ┆ time ┆ diff │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪══════╪══════╡
│ a ┆ 1 ┆ 0 │
│ a ┆ 3 ┆ 2 │
│ a ┆ 4 ┆ 1 │
│ b ┆ 2 ┆ 0 │
│ b ┆ 5 ┆ 3 │
└─────┴──────┴──────┘
arctix.transformer.dataframe.TimeToSecond ¶
Bases: BaseDataFrameTransformer
Implement a transformer to convert a column with time values to seconds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column with the time value to convert. | required |
out_col | str | The output column with the time in seconds. | required |
Example usage:
>>> import datetime
>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondDataFrameTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
... {
... "time": [
... datetime.time(0, 0, 1, 890000),
... datetime.time(0, 1, 1, 890000),
... datetime.time(1, 1, 1, 890000),
... datetime.time(0, 19, 19, 890000),
... datetime.time(19, 19, 19, 890000),
... ],
... "col": ["a", "b", "c", "d", "e"],
... },
... schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time ┆ col │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a │
│ 00:01:01.890 ┆ b │
│ 01:01:01.890 ┆ c │
│ 00:19:19.890 ┆ d │
│ 19:19:19.890 ┆ e │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time ┆ col ┆ second │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ f64 │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a ┆ 1.89 │
│ 00:01:01.890 ┆ b ┆ 61.89 │
│ 01:01:01.890 ┆ c ┆ 3661.89 │
│ 00:19:19.890 ┆ d ┆ 1159.89 │
│ 19:19:19.890 ┆ e ┆ 69559.89 │
└──────────────┴─────┴──────────┘
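The values in the `second` column follow from simple arithmetic on the time components. The sketch below reproduces that arithmetic with the standard library only; `time_to_second` is a hypothetical helper used for illustration and is not how arctix computes the column.

```python
import datetime


def time_to_second(value: datetime.time) -> float:
    """Convert a time of day to the number of seconds since midnight."""
    return (
        value.hour * 3600
        + value.minute * 60
        + value.second
        + value.microsecond / 1_000_000
    )


# Last row of the example: 19 h, 19 min, 19 s and 890000 microseconds.
seconds = time_to_second(datetime.time(19, 19, 19, 890000))
print(seconds)  # close to 69559.89, up to floating-point rounding
```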
arctix.transformer.dataframe.TimeToSecondDataFrameTransformer ¶
Bases: BaseDataFrameTransformer
Implement a transformer to convert a column with time values to seconds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_col | str | The input column with the time value to convert. | required |
out_col | str | The output column with the time in seconds. | required |
Example usage:
>>> import datetime
>>> import polars as pl
>>> from arctix.transformer.dataframe import TimeToSecond
>>> transformer = TimeToSecond(in_col="time", out_col="second")
>>> transformer
TimeToSecondDataFrameTransformer(in_col=time, out_col=second)
>>> frame = pl.DataFrame(
... {
... "time": [
... datetime.time(0, 0, 1, 890000),
... datetime.time(0, 1, 1, 890000),
... datetime.time(1, 1, 1, 890000),
... datetime.time(0, 19, 19, 890000),
... datetime.time(19, 19, 19, 890000),
... ],
... "col": ["a", "b", "c", "d", "e"],
... },
... schema={"time": pl.Time, "col": pl.String},
... )
>>> frame
shape: (5, 2)
┌──────────────┬─────┐
│ time ┆ col │
│ --- ┆ --- │
│ time ┆ str │
╞══════════════╪═════╡
│ 00:00:01.890 ┆ a │
│ 00:01:01.890 ┆ b │
│ 01:01:01.890 ┆ c │
│ 00:19:19.890 ┆ d │
│ 19:19:19.890 ┆ e │
└──────────────┴─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────────┬─────┬──────────┐
│ time ┆ col ┆ second │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ f64 │
╞══════════════╪═════╪══════════╡
│ 00:00:01.890 ┆ a ┆ 1.89 │
│ 00:01:01.890 ┆ b ┆ 61.89 │
│ 01:01:01.890 ┆ c ┆ 3661.89 │
│ 00:19:19.890 ┆ d ┆ 1159.89 │
│ 19:19:19.890 ┆ e ┆ 69559.89 │
└──────────────┴─────┴──────────┘
arctix.transformer.dataframe.ToTime ¶
Bases: BaseDataFrameTransformer
Implement a transformer to convert some columns to a polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns
|
Sequence[str]
|
The columns to convert. |
required |
format
|
str | None
|
Format to use for conversion. Refer to the
chrono crate documentation
for the full specification. Example: |
None
|
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeDataFrameTransformer(columns=('col1',), format=%H:%M:%S)
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
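To show what the `"%H:%M:%S"` format describes, the sketch below parses one of the example strings with Python's `datetime.strptime`. This is only a stand-in: the transformer relies on the chrono format specification, and the sketch assumes nothing beyond the `%H`, `%M` and `%S` directives, which mean hour, minute and second in both.

```python
import datetime

# Parse a string from "col1" with the same format as the example.
parsed = datetime.datetime.strptime("18:18:18", "%H:%M:%S").time()
print(parsed)

# A string that does not match the format is rejected.
try:
    datetime.datetime.strptime("18-18-18", "%H:%M:%S")
    matched = True
except ValueError:
    matched = False
print(matched)
```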
arctix.transformer.dataframe.ToTimeDataFrameTransformer ¶
Bases: BaseDataFrameTransformer
Implement a transformer to convert some columns to a polars.Time type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns
|
Sequence[str]
|
The columns to convert. |
required |
format
|
str | None
|
Format to use for conversion. Refer to the
chrono crate documentation
for the full specification. Example: |
None
|
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import ToTime
>>> transformer = ToTime(columns=["col1"], format="%H:%M:%S")
>>> transformer
ToTimeDataFrameTransformer(columns=('col1',), format=%H:%M:%S)
>>> frame = pl.DataFrame(
... {
... "col1": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... "col2": ["1", "2", "3", "4", "5"],
... "col3": ["01:01:01", "02:02:02", "12:00:01", "18:18:18", "23:59:59"],
... }
... )
>>> frame
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 3)
┌──────────┬──────┬──────────┐
│ col1 ┆ col2 ┆ col3 │
│ --- ┆ --- ┆ --- │
│ time ┆ str ┆ str │
╞══════════╪══════╪══════════╡
│ 01:01:01 ┆ 1 ┆ 01:01:01 │
│ 02:02:02 ┆ 2 ┆ 02:02:02 │
│ 12:00:01 ┆ 3 ┆ 12:00:01 │
│ 18:18:18 ┆ 4 ┆ 18:18:18 │
│ 23:59:59 ┆ 5 ┆ 23:59:59 │
└──────────┴──────┴──────────┘
arctix.transformer.dataframe.TokenToIndex ¶
Bases: ReplaceStrictDataFrameTransformer
Replace the values in a column with the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab | Vocabulary | The vocabulary which contains the token to index mapping. | required |
token_column | str | The column name which contains the input tokens. | required |
index_column | str | The column name which contains the output indices. | required |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import TokenToIndex
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_token_to_index()
{'b': 0, 'a': 1, 'c': 2, 'd': 3}
>>> transformer = TokenToIndex(vocab=vocab, token_column="col", index_column="index")
>>> transformer
TokenToIndexDataFrameTransformer(orig_column=col, final_column=index)
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "a"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ a │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ index │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═══════╡
│ a ┆ 1 │
│ b ┆ 0 │
│ c ┆ 2 │
│ d ┆ 3 │
│ a ┆ 1 │
└─────┴───────┘
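The lookup itself is a plain dictionary mapping. The sketch below reproduces the example with the token-to-index mapping shown by `vocab.get_token_to_index()`. It assumes that an unknown token is an error, which the "strict" base class name suggests but this section does not state; `tokens_to_indices` is a hypothetical helper.

```python
def tokens_to_indices(tokens: list[str], token_to_index: dict[str, int]) -> list[int]:
    """Map each token to its index; a token missing from the mapping raises KeyError."""
    return [token_to_index[token] for token in tokens]


token_to_index = {"b": 0, "a": 1, "c": 2, "d": 3}
indices = tokens_to_indices(["a", "b", "c", "d", "a"], token_to_index)
print(indices)
```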
arctix.transformer.dataframe.TokenToIndexDataFrameTransformer ¶
Bases: ReplaceStrictDataFrameTransformer
Replace the values in a column with the values in a mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab | Vocabulary | The vocabulary which contains the token to index mapping. | required |
token_column | str | The column name which contains the input tokens. | required |
index_column | str | The column name which contains the output indices. | required |
**kwargs | Any | The keyword arguments to pass to | {} |
Example usage:
>>> from collections import Counter
>>> import polars as pl
>>> from arctix.transformer.dataframe import TokenToIndex
>>> from arctix.utils.vocab import Vocabulary
>>> vocab = Vocabulary(Counter({"b": 3, "a": 1, "c": 2, "d": 4}))
>>> vocab.get_token_to_index()
{'b': 0, 'a': 1, 'c': 2, 'd': 3}
>>> transformer = TokenToIndex(vocab=vocab, token_column="col", index_column="index")
>>> transformer
TokenToIndexDataFrameTransformer(orig_column=col, final_column=index)
>>> frame = pl.DataFrame({"col": ["a", "b", "c", "d", "a"]})
>>> frame
shape: (5, 1)
┌─────┐
│ col │
│ --- │
│ str │
╞═════╡
│ a │
│ b │
│ c │
│ d │
│ a │
└─────┘
>>> out = transformer.transform(frame)
>>> out
shape: (5, 2)
┌─────┬───────┐
│ col ┆ index │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═══════╡
│ a ┆ 1 │
│ b ┆ 0 │
│ c ┆ 2 │
│ d ┆ 3 │
│ a ┆ 1 │
└─────┴───────┘
arctix.transformer.dataframe.is_dataframe_transformer_config ¶
is_dataframe_transformer_config(config: dict) -> bool
Indicate if the input configuration is a configuration for a BaseDataFrameTransformer.
This function only checks if the value of the key _target_ is valid. It does not check the other values. If _target_ indicates a function, the returned type hint is used to check the class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | dict | Specifies the configuration to check. | required |
Returns:
Type | Description |
---|---|
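bool | True if the input configuration is a configuration for a BaseDataFrameTransformer, otherwise False. |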
Example usage:
>>> from arctix.transformer.dataframe import is_dataframe_transformer_config
>>> is_dataframe_transformer_config(
... {"_target_": "arctix.transformer.dataframe.Cast", "columns": ["col1", "col3"]}
... )
True
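The idea of checking only the `_target_` key can be sketched with the standard library. The helpers `resolve_target` and `is_config_for` below are hypothetical and use standard-library classes in place of transformers; they cover only the case where `_target_` names a class, not the function case described above, and they are not the library's implementation.

```python
import importlib


def resolve_target(path: str) -> object:
    """Import the object named by a dotted path such as 'collections.OrderedDict'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def is_config_for(config: dict, base: type) -> bool:
    """Check whether ``_target_`` names a subclass of ``base``; other keys are ignored."""
    target = resolve_target(config["_target_"])
    return isinstance(target, type) and issubclass(target, base)


# The extra key "anything" is never inspected.
print(is_config_for({"_target_": "collections.OrderedDict", "anything": 1}, dict))
print(is_config_for({"_target_": "collections.Counter"}, list))
```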
arctix.transformer.dataframe.setup_dataframe_transformer ¶
setup_dataframe_transformer(
    transformer: BaseDataFrameTransformer | dict,
) -> BaseDataFrameTransformer
Set up a polars.DataFrame transformer.
The transformer is instantiated from its configuration by using the BaseDataFrameTransformer factory function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer
|
BaseDataFrameTransformer | dict
|
Specifies a |
required |
Returns:
Type | Description |
---|---|
BaseDataFrameTransformer | An instantiated transformer. |
Example usage:
>>> import polars as pl
>>> from arctix.transformer.dataframe import setup_dataframe_transformer
>>> transformer = setup_dataframe_transformer(
... {
... "_target_": "arctix.transformer.dataframe.Cast",
... "columns": ["col1", "col3"],
... "dtype": pl.Int64,
... }
... )
>>> transformer
CastDataFrameTransformer(columns=('col1', 'col3'), dtype=Int64)