arkas.state¶

Contain states.

arkas.state.AccuracyState ¶

Bases: BaseArgState

Implement the accuracy state.

Parameters:

Name	Type	Description	Default
`y_true`	`ndarray`	The ground truth target labels. This input must be an array of shape `(n_samples,)` where the values are in `{0, ..., n_classes-1}`.	required
`y_pred`	`ndarray`	The predicted labels. This input must be an array of shape `(n_samples,)` where the values are in `{0, ..., n_classes-1}`.	required
`y_true_name`	`str`	The name associated with the ground truth target labels.	required
`y_pred_name`	`str`	The name associated with the predicted labels.	required
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state
AccuracyState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', nan_policy='propagate')
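
The three `nan_policy` options can be illustrated with a minimal pure-Python sketch. It assumes the conventional meaning of these names: `'omit'` drops samples that contain NaN, `'propagate'` lets NaN reach the result, and `'raise'` fails as soon as a NaN is found. `accuracy_with_nan_policy` is a hypothetical helper written for this illustration and is not part of arkas.

```python
import math


def accuracy_with_nan_policy(y_true, y_pred, nan_policy="propagate"):
    """Hypothetical helper (not part of arkas) illustrating the three policies."""
    if nan_policy not in {"omit", "propagate", "raise"}:
        raise ValueError(f"Incorrect nan_policy: {nan_policy!r}")
    pairs = list(zip(y_true, y_pred))
    has_nan = any(math.isnan(t) or math.isnan(p) for t, p in pairs)
    if has_nan and nan_policy == "raise":
        raise ValueError("input arrays contain NaN values")
    if has_nan and nan_policy == "propagate":
        # Assumed behaviour: a NaN in the inputs makes the metric NaN.
        return math.nan
    # 'omit' (or no NaN at all): keep only the pairs without NaN.
    pairs = [(t, p) for t, p in pairs if not (math.isnan(t) or math.isnan(p))]
    if not pairs:
        return math.nan
    return sum(t == p for t, p in pairs) / len(pairs)


y_true = [1.0, 0.0, 0.0, 1.0, 1.0]
y_pred = [1.0, 0.0, 1.0, 1.0, math.nan]
acc_omit = accuracy_with_nan_policy(y_true, y_pred, nan_policy="omit")
acc_propagate = accuracy_with_nan_policy(y_true, y_pred, nan_policy="propagate")
```

With `'omit'`, the last sample is discarded and the accuracy is computed on the remaining four; with `'propagate'`, the result is NaN.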

arkas.state.BaseArgState ¶

Bases: BaseState

Define a base class to manage arbitrary keyword arguments.

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

arkas.state.BaseArgState.get_arg ¶

get_arg(name: str, default: Any = None) -> Any

Get a given argument from the state.

Parameters:

Name	Type	Description	Default
`name`	`str`	The argument name to get.	required
`default`	`Any`	The default value to return if the argument is missing.	`None`

Returns:

Type	Description
`Any`	The argument value or the default value.

Example usage:

>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = DataFrameState(frame, column="col3")
>>> state.get_arg("column")
'col3'

arkas.state.BaseArgState.get_args ¶

get_args() -> dict

Get a dictionary with all the arguments of the state.

Returns:

Type	Description
`dict`	The dictionary with all the arguments.

Example usage:

>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = DataFrameState(frame, column="col3")
>>> args = state.get_args()
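
The lookup semantics of `get_arg` and `get_args` can be sketched with a small stand-in class. `ArgStore` is hypothetical and only mirrors the documented signatures; whether arkas returns a copy from `get_args`, and which arguments it includes, is an assumption here.

```python
from typing import Any


class ArgStore:
    """Hypothetical stand-in for a state that stores keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self._kwargs = kwargs

    def get_arg(self, name: str, default: Any = None) -> Any:
        # Return the stored value, or the default if the name is missing.
        return self._kwargs.get(name, default)

    def get_args(self) -> dict:
        # Assumption: return a shallow copy so callers cannot mutate the store.
        return dict(self._kwargs)


store = ArgStore(column="col3")
column = store.get_arg("column")
missing = store.get_arg("missing", default=-1)
args = store.get_args()
```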

arkas.state.BaseState ¶

Bases: ABC

Define the base class to implement a state.

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state
AccuracyState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', nan_policy='propagate')

arkas.state.BaseState.clone `abstractmethod` ¶

clone(deep: bool = True) -> Self

Return a copy of the state.

Parameters:

Name	Type	Description	Default
`deep`	`bool`	If `True`, it returns a deep copy of the state, otherwise it returns a shallow copy.	`True`

Returns:

Type	Description
`Self`	A copy of the state.

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> cloned_state = state.clone()
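
The difference between a deep and a shallow copy can be sketched with the standard `copy` module. `ToyState` is a hypothetical class used only for illustration; it is not how arkas necessarily implements `clone`.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical state holding a mutable list of values."""

    values: list = field(default_factory=list)

    def clone(self, deep: bool = True) -> "ToyState":
        # A deep copy duplicates the nested data; a shallow copy shares it.
        return copy.deepcopy(self) if deep else copy.copy(self)


state = ToyState(values=[1, 0, 0, 1, 1])
deep_clone = state.clone()
shallow_clone = state.clone(deep=False)

# Mutating the original affects the shallow clone but not the deep clone.
state.values[0] = 42
```

After the mutation, `deep_clone.values[0]` is still `1`, whereas `shallow_clone.values[0]` is `42` because the shallow clone shares the same list.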

arkas.state.BaseState.equal `abstractmethod` ¶

equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two states are equal or not.

Parameters:

Name	Type	Description	Default
`other`	`Any`	The other state to compare.	required
`equal_nan`	`bool`	Whether to compare NaN's as equal. If `True`, NaN's in both objects will be considered equal.	`False`

Returns:

Type	Description
`bool`	`True` if the two states are equal, otherwise `False`.

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state1 = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state2 = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state3 = AccuracyState(
...     y_true=np.array([1, 0, 0, 0, 0]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state1.equal(state2)
True
>>> state1.equal(state3)
False
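
The effect of `equal_nan` can be sketched in plain Python. Since NaN is never equal to itself under IEEE 754 comparison, two otherwise identical sequences that contain NaN compare as different unless NaN values are explicitly treated as equal. `values_equal` is a hypothetical helper, not an arkas function.

```python
import math


def values_equal(a, b, equal_nan=False):
    """Hypothetical element-wise comparison of two sequences of floats."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        both_nan = math.isnan(x) and math.isnan(y)
        if x != y and not (equal_nan and both_nan):
            return False
    return True


first = [1.0, 0.0, math.nan]
second = [1.0, 0.0, math.nan]
strict = values_equal(first, second)  # NaN != NaN, so this is False
lenient = values_equal(first, second, equal_nan=True)  # NaN pairs are accepted
```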

arkas.state.ColumnCooccurrenceState ¶

Bases: BaseState

Implement the column co-occurrence state.

Parameters:

Name	Type	Description	Default
`matrix`	`ndarray`	The co-occurrence matrix.	required
`columns`	`Sequence[str]`	The column names.	required
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`

Example usage:

>>> import numpy as np
>>> from arkas.state import ColumnCooccurrenceState
>>> state = ColumnCooccurrenceState(matrix=np.ones((3, 3)), columns=["a", "b", "c"])
>>> state
ColumnCooccurrenceState(matrix=(3, 3), figure_config=MatplotlibFigureConfig())

arkas.state.ColumnCooccurrenceState.from_dataframe `classmethod` ¶

from_dataframe(
    frame: DataFrame,
    ignore_self: bool = False,
    figure_config: BaseFigureConfig | None = None,
) -> ColumnCooccurrenceState

Instantiate a ColumnCooccurrenceState object from a DataFrame.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The DataFrame to analyze.	required
`ignore_self`	`bool`	If `True`, the diagonal of the co-occurrence matrix (a.k.a. self-co-occurrence) is set to 0.	`False`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`

Returns:

Type	Description
`ColumnCooccurrenceState`	The instantiated state.

Example usage:

>>> import polars as pl
>>> from arkas.state import ColumnCooccurrenceState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [0, 0, 0, 0, 1, 1, 1],
...     }
... )
>>> state = ColumnCooccurrenceState.from_dataframe(frame)
>>> state
ColumnCooccurrenceState(matrix=(3, 3), figure_config=MatplotlibFigureConfig())
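
For intuition, a co-occurrence matrix over binary columns can be sketched in plain Python. The sketch assumes the usual definition, where entry `(i, j)` counts the rows in which columns `i` and `j` are both 1; `cooccurrence_matrix` is a hypothetical helper and may differ from what arkas computes internally.

```python
def cooccurrence_matrix(columns, ignore_self=False):
    """Hypothetical helper: count rows where two binary columns are both 1."""
    names = list(columns)
    matrix = [
        [sum(a * b for a, b in zip(columns[n1], columns[n2])) for n2 in names]
        for n1 in names
    ]
    if ignore_self:
        # Zero the diagonal, i.e. drop the self-co-occurrence counts.
        for i in range(len(names)):
            matrix[i][i] = 0
    return matrix


data = {
    "col1": [0, 1, 1, 0, 0, 1, 0],
    "col2": [0, 1, 0, 1, 0, 1, 0],
    "col3": [0, 0, 0, 0, 1, 1, 1],
}
full = cooccurrence_matrix(data)
# full == [[3, 2, 1], [2, 3, 1], [1, 1, 3]]
no_diag = cooccurrence_matrix(data, ignore_self=True)
# no_diag == [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```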

arkas.state.DataFrameState ¶

Bases: BaseArgState

Implement the DataFrame state.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Example usage:

>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [0, 0, 0, 0, 1, 1, 1],
...     }
... )
>>> state = DataFrameState(frame)
>>> state
DataFrameState(dataframe=(7, 3), nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.NullValueState ¶

Bases: BaseState

Implement a state that contains the number of null values per column.

Parameters:

Name	Type	Description	Default
`null_count`	`ndarray`	The array with the number of null values for each column.	required
`total_count`	`ndarray`	The total number of values for each column.	required
`columns`	`Sequence[str]`	The column names.	required
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`

Example usage:

>>> import numpy as np
>>> from arkas.state import NullValueState
>>> state = NullValueState(
...     null_count=np.array([0, 1, 2]),
...     total_count=np.array([5, 5, 5]),
...     columns=["col1", "col2", "col3"],
... )
>>> state
NullValueState(num_columns=3, figure_config=MatplotlibFigureConfig())

arkas.state.NullValueState.from_dataframe `classmethod` ¶

from_dataframe(
    dataframe: DataFrame,
    figure_config: BaseFigureConfig | None = None,
) -> NullValueState

Instantiate a NullValueState object from a DataFrame.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`

Returns:

Type	Description
`NullValueState`	The instantiated `NullValueState` object.

Example usage:

>>> import polars as pl
>>> from arkas.state import NullValueState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, None],
...         "col2": [0, 1, None, None, 0, 1, 0],
...         "col3": [None, 0, 0, 0, None, 1, None],
...     }
... )
>>> state = NullValueState.from_dataframe(frame)
>>> state
NullValueState(num_columns=3, figure_config=MatplotlibFigureConfig())
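
The quantities stored in the state (`null_count`, `total_count`, `columns`) can be derived from tabular data as in the following sketch. It uses a plain dictionary of lists instead of a polars DataFrame, and `count_nulls` is a hypothetical helper, not part of arkas.

```python
def count_nulls(columns):
    """Hypothetical helper: per-column null and total counts."""
    names = list(columns)
    null_count = [sum(value is None for value in columns[name]) for name in names]
    total_count = [len(columns[name]) for name in names]
    return names, null_count, total_count


data = {
    "col1": [0, 1, 1, 0, 0, 1, None],
    "col2": [0, 1, None, None, 0, 1, 0],
    "col3": [None, 0, 0, 0, None, 1, None],
}
names, null_count, total_count = count_nulls(data)
# names == ["col1", "col2", "col3"]
# null_count == [1, 2, 3]
# total_count == [7, 7, 7]
```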

arkas.state.NullValueState.to_dataframe ¶

to_dataframe() -> DataFrame

Export the content of the state to a DataFrame.

Returns:

Type	Description
`DataFrame`	The DataFrame.

Example usage:

>>> import numpy as np
>>> from arkas.state import NullValueState
>>> state = NullValueState(
...     null_count=np.array([0, 1, 2]),
...     total_count=np.array([5, 5, 5]),
...     columns=["col1", "col2", "col3"],
... )
>>> state.to_dataframe()
shape: (3, 3)
┌────────┬──────┬───────┐
│ column ┆ null ┆ total │
│ ---    ┆ ---  ┆ ---   │
│ str    ┆ i64  ┆ i64   │
╞════════╪══════╪═══════╡
│ col1   ┆ 0    ┆ 5     │
│ col2   ┆ 1    ┆ 5     │
│ col3   ┆ 2    ┆ 5     │
└────────┴──────┴───────┘

arkas.state.PrecisionRecallState ¶

Bases: BaseState

Implement a state for precision-recall-based metrics.

This state can be used in 3 different settings:

- binary: `y_true` must be an array of shape `(n_samples,)` with 0 and 1 values, and `y_pred` must be an array of shape `(n_samples,)`.
- multiclass: `y_true` must be an array of shape `(n_samples,)` with values in `{0, ..., n_classes-1}`, and `y_pred` must be an array of shape `(n_samples,)`.
- multilabel: `y_true` must be an array of shape `(n_samples, n_classes)` with 0 and 1 values, and `y_pred` must be an array of shape `(n_samples, n_classes)`.

Parameters:

Name	Type	Description	Default
`y_true`	`ndarray`	The ground truth target labels. This input must be an array of shape `(n_samples,)` or `(n_samples, n_classes)`.	required
`y_pred`	`ndarray`	The predicted labels. This input must be an array of shape `(n_samples,)` or `(n_samples, n_classes)`.	required
`y_true_name`	`str`	The name associated with the ground truth target labels.	required
`y_pred_name`	`str`	The name associated with the predicted labels.	required
`label_type`	`str`	The type of labels used to evaluate the metrics. The valid values are: `'binary'`, `'multiclass'`, and `'multilabel'`, in addition to the default `'auto'`, which selects the label type from the inputs. If `'binary'` or `'multilabel'`, `y_true` values must be `0` and `1`.	`'auto'`
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`

Example usage:

>>> import numpy as np
>>> from arkas.state import PrecisionRecallState
>>> state = PrecisionRecallState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state
PrecisionRecallState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', label_type='binary', nan_policy='propagate')
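
The three settings can be told apart from the shape and values of `y_true`. The sketch below shows one plausible rule for doing so; it is an assumption for illustration only, and `infer_label_type` is a hypothetical helper, so arkas may resolve `'auto'` differently.

```python
def infer_label_type(y_true):
    """Hypothetical rule mapping targets to one of the three settings."""
    # Nested sequences stand in for arrays of shape (n_samples, n_classes).
    if y_true and isinstance(y_true[0], (list, tuple)):
        return "multilabel"
    # Flat targets containing only 0 and 1 are treated as binary.
    if set(y_true) <= {0, 1}:
        return "binary"
    return "multiclass"


binary = infer_label_type([1, 0, 0, 1, 1])
multiclass = infer_label_type([0, 2, 1, 1, 0])
multilabel = infer_label_type([[1, 0, 1], [0, 1, 0]])
```

Under this rule the targets from the example above are classified as binary, which matches the `label_type='binary'` shown in its output.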

arkas.state.ScatterDataFrameState ¶

Bases: DataFrameState

Implement the DataFrame state for scatter plots.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`x`	`str`	The x-axis data column.	required
`y`	`str`	The y-axis data column.	required
`color`	`str \| None`	An optional color axis data column.	`None`
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Example usage:

>>> import polars as pl
>>> from arkas.state import ScatterDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [0, 0, 0, 0, 1, 1, 1],
...     }
... )
>>> state = ScatterDataFrameState(frame, x="col1", y="col2")
>>> state
ScatterDataFrameState(dataframe=(7, 3), x='col1', y='col2', color=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.SeriesState ¶

Bases: BaseState

Implement the Series state.

Parameters:

Name	Type	Description	Default
`series`	`Series`	The Series.	required
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`

Example usage:

>>> import polars as pl
>>> from arkas.state import SeriesState
>>> state = SeriesState(pl.Series("col1", [1, 2, 3, 4, 5, 6, 7]))
>>> state
SeriesState(name='col1', values=(7,), figure_config=MatplotlibFigureConfig())

arkas.state.TargetDataFrameState ¶

Bases: DataFrameState

Implement a DataFrame state with a target column.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`target_column`	`str`	The target column in the DataFrame.	required
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Example usage:

>>> import polars as pl
>>> from arkas.state import TargetDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = TargetDataFrameState(frame, target_column="col3")
>>> state
TargetDataFrameState(dataframe=(7, 3), target_column='col3', nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.TemporalColumnState ¶

Bases: TemporalDataFrameState

Implement the temporal DataFrame state with a target column.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`target_column`	`str`	The target column in the DataFrame.	required
`temporal_column`	`str`	The temporal column in the DataFrame.	required
`period`	`str \| None`	An optional temporal period e.g. monthly or daily.	`None`
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Example usage:

>>> from datetime import datetime, timezone
>>> import polars as pl
>>> from arkas.state import TemporalColumnState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0],
...         "col2": [0, 1, 0, 1],
...         "datetime": [
...             datetime(year=2020, month=1, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=2, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=3, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=4, day=3, tzinfo=timezone.utc),
...         ],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Int64,
...         "datetime": pl.Datetime(time_unit="us", time_zone="UTC"),
...     },
... )
>>> state = TemporalColumnState(frame, target_column="col2", temporal_column="datetime")
>>> state
TemporalColumnState(dataframe=(4, 3), target_column='col2', temporal_column='datetime', period=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.TemporalDataFrameState ¶

Bases: DataFrameState

Implement the temporal DataFrame state.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`temporal_column`	`str`	The temporal column in the DataFrame.	required
`period`	`str \| None`	An optional temporal period e.g. monthly or daily.	`None`
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Example usage:

>>> from datetime import datetime, timezone
>>> import polars as pl
>>> from arkas.state import TemporalDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0],
...         "col2": [0, 1, 0, 1],
...         "col3": [1, 0, 0, 0],
...         "datetime": [
...             datetime(year=2020, month=1, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=2, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=3, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=4, day=3, tzinfo=timezone.utc),
...         ],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Int64,
...         "col3": pl.Int64,
...         "datetime": pl.Datetime(time_unit="us", time_zone="UTC"),
...     },
... )
>>> state = TemporalDataFrameState(frame, temporal_column="datetime")
>>> state
TemporalDataFrameState(dataframe=(4, 4), temporal_column='datetime', period=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.TwoColumnDataFrameState ¶

Bases: DataFrameState

Implement a DataFrame state with two target columns.

Parameters:

Name	Type	Description	Default
`dataframe`	`DataFrame`	The DataFrame.	required
`column1`	`str`	The first target column in the DataFrame.	required
`column2`	`str`	The second target column in the DataFrame.	required
`nan_policy`	`str`	The policy on how to handle NaN values in the input arrays. The following options are available: `'omit'`, `'propagate'`, and `'raise'`.	`'propagate'`
`figure_config`	`BaseFigureConfig \| None`	An optional figure configuration.	`None`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Example usage:

>>> import polars as pl
>>> from arkas.state import TwoColumnDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = TwoColumnDataFrameState(frame, column1="col3", column2="col1")
>>> state
TwoColumnDataFrameState(dataframe=(7, 3), column1='col3', column2='col1', nan_policy='propagate', figure_config=MatplotlibFigureConfig())
