arkas.state

Contain states.

arkas.state.AccuracyState

Bases: BaseArgState

Implement the accuracy state.

Parameters:

Name Type Description Default
y_true ndarray

The ground truth target labels. This input must be an array of shape (n_samples,) where the values are in {0, ..., n_classes-1}.

required
y_pred ndarray

The predicted labels. This input must be an array of shape (n_samples,) where the values are in {0, ..., n_classes-1}.

required
y_true_name str

The name associated with the ground truth target labels.

required
y_pred_name str

The name associated with the predicted labels.

required
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state
AccuracyState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', nan_policy='propagate')
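
The three nan_policy options share their names with a convention used by SciPy. The sketch below is not arkas code: apply_nan_policy is a hypothetical helper that assumes the conventional meaning of each name. Under that assumption, 'omit' drops the positions where either value is NaN, 'propagate' leaves the data untouched so that NaN flows into the result, and 'raise' fails as soon as a NaN is found.

```python
import math


def apply_nan_policy(y_true, y_pred, nan_policy="propagate"):
    """Hypothetical helper (not part of arkas) illustrating the three policies."""
    if nan_policy not in {"omit", "propagate", "raise"}:
        raise ValueError(f"Unknown nan_policy: {nan_policy!r}")
    # Flag every position where at least one of the two values is NaN.
    has_nan = [math.isnan(t) or math.isnan(p) for t, p in zip(y_true, y_pred)]
    if nan_policy == "raise" and any(has_nan):
        raise ValueError("Input contains NaN values")
    if nan_policy == "omit":
        # Keep only the positions where both values are valid numbers.
        kept = [(t, p) for t, p, bad in zip(y_true, y_pred, has_nan) if not bad]
        return [t for t, _ in kept], [p for _, p in kept]
    # 'propagate' (or 'raise' without any NaN): return the inputs unchanged.
    return list(y_true), list(y_pred)


y_true = [1.0, 0.0, float("nan"), 1.0]
y_pred = [1.0, 0.0, 0.0, float("nan")]
clean_true, clean_pred = apply_nan_policy(y_true, y_pred, nan_policy="omit")
```

With 'omit', only the first two positions survive here, because the third and fourth each contain a NaN on one side.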

arkas.state.BaseArgState

Bases: BaseState

Define a base class to manage arbitrary keyword arguments.

Parameters:

Name Type Description Default
**kwargs Any

Additional keyword arguments.

{}
arkas.state.BaseArgState.get_arg
get_arg(name: str, default: Any = None) -> Any

Get a given argument from the state.

Parameters:

Name Type Description Default
name str

The argument name to get.

required
default Any

The default value to return if the argument is missing.

None

Returns:

Type Description
Any

The argument value or the default value.

Example usage:

>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = DataFrameState(frame, column="col3")
>>> state.get_arg("column")
'col3'
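
The lookup-with-default behaviour of get_arg can be sketched with a small stand-in class. ToyArgState is hypothetical and is not how arkas implements BaseArgState; it only assumes that the extra keyword arguments are stored by name and that a missing name falls back to the default.

```python
from typing import Any


class ToyArgState:
    """Hypothetical stand-in for a BaseArgState subclass (not arkas code)."""

    def __init__(self, **kwargs: Any) -> None:
        # Extra keyword arguments are stored by name.
        self._kwargs = kwargs

    def get_arg(self, name: str, default: Any = None) -> Any:
        # Return the stored value, or the default if the name is missing.
        return self._kwargs.get(name, default)

    def get_args(self) -> dict:
        # Design choice of this sketch: return a copy so that callers
        # cannot mutate the internal mapping.
        return dict(self._kwargs)


state = ToyArgState(column="col3")
found = state.get_arg("column")  # name is present
missing = state.get_arg("bins", default=10)  # name is absent, default is used
```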
arkas.state.BaseArgState.get_args
get_args() -> dict

Get a dictionary with all the arguments of the state.

Returns:

Type Description
dict

The dictionary with all the arguments.

Example usage:

>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = DataFrameState(frame, column="col3")
>>> args = state.get_args()

arkas.state.BaseState

Bases: ABC

Define the base class to implement a state.

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state
AccuracyState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', nan_policy='propagate')
arkas.state.BaseState.clone abstractmethod
clone(deep: bool = True) -> Self

Return a copy of the state.

Parameters:

Name Type Description Default
deep bool

If True, return a deep copy of the state; otherwise, return a shallow copy.

True

Returns:

Type Description
Self

A copy of the state.

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> cloned_state = state.clone()
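
The difference between a deep and a shallow copy matters when the state holds mutable data. The sketch below uses a hypothetical ToyState class built on the standard copy module; it is an illustration of the general idea, not the arkas implementation.

```python
import copy


class ToyState:
    """Hypothetical state holding one mutable payload (not arkas code)."""

    def __init__(self, values: list) -> None:
        self.values = values

    def clone(self, deep: bool = True) -> "ToyState":
        # A deep copy duplicates the payload; a shallow copy shares it.
        return copy.deepcopy(self) if deep else copy.copy(self)


state = ToyState([1, 0, 0, 1, 1])
deep_clone = state.clone()
shallow_clone = state.clone(deep=False)

# Mutate the original payload in place.
state.values[0] = 99
```

After the mutation, deep_clone still holds the original values, while shallow_clone sees the change because it references the same list.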
arkas.state.BaseState.equal abstractmethod
equal(other: Any, equal_nan: bool = False) -> bool

Indicate if two states are equal or not.

Parameters:

Name Type Description Default
other Any

The other state to compare.

required
equal_nan bool

Whether to compare NaNs as equal. If True, NaNs at the same position in both objects are considered equal.

False

Returns:

Type Description
bool

True if the two states are equal, otherwise False.

Example usage:

>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state1 = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state2 = AccuracyState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state3 = AccuracyState(
...     y_true=np.array([1, 0, 0, 0, 0]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state1.equal(state2)
True
>>> state1.equal(state3)
False
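
The equal_nan flag exists because NaN compares unequal to itself under IEEE 754 rules. The helper below, sequences_equal, is a hypothetical pure-Python sketch of that comparison rule and is not taken from arkas.

```python
import math


def sequences_equal(a, b, equal_nan=False):
    """Hypothetical element-wise comparison (not arkas code)."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if math.isnan(x) and math.isnan(y):
            # Two NaNs at the same position only match when equal_nan is set.
            if not equal_nan:
                return False
        elif x != y:
            return False
    return True


a = [1.0, float("nan"), 0.0]
b = [1.0, float("nan"), 0.0]
strict = sequences_equal(a, b)  # NaN != NaN, so the sequences differ
lenient = sequences_equal(a, b, equal_nan=True)  # NaNs are treated as equal
```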

arkas.state.ColumnCooccurrenceState

Bases: BaseState

Implement the column co-occurrence state.

Parameters:

Name Type Description Default
matrix ndarray

The co-occurrence matrix.

required
columns Sequence[str]

The column names.

required
figure_config BaseFigureConfig | None

An optional figure configuration.

None

Example usage:

>>> import numpy as np
>>> from arkas.state import ColumnCooccurrenceState
>>> state = ColumnCooccurrenceState(matrix=np.ones((3, 3)), columns=["a", "b", "c"])
>>> state
ColumnCooccurrenceState(matrix=(3, 3), figure_config=MatplotlibFigureConfig())
arkas.state.ColumnCooccurrenceState.from_dataframe classmethod
from_dataframe(
    frame: DataFrame,
    ignore_self: bool = False,
    figure_config: BaseFigureConfig | None = None,
) -> ColumnCooccurrenceState

Instantiate a ColumnCooccurrenceState object from a DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
ignore_self bool

If True, the diagonal of the co-occurrence matrix (a.k.a. self-co-occurrence) is set to 0.

False
figure_config BaseFigureConfig | None

An optional figure configuration.

None

Returns:

Type Description
ColumnCooccurrenceState

The instantiated state.

Example usage:

>>> import polars as pl
>>> from arkas.state import ColumnCooccurrenceState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [0, 0, 0, 0, 1, 1, 1],
...     }
... )
>>> state = ColumnCooccurrenceState.from_dataframe(frame)
>>> state
ColumnCooccurrenceState(matrix=(3, 3), figure_config=MatplotlibFigureConfig())
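
To make the effect of ignore_self concrete, the sketch below computes a co-occurrence matrix in pure Python for the same data. It is not arkas code: cooccurrence_matrix is a hypothetical helper, and it assumes that entry (i, j) counts the rows in which columns i and j are both non-zero.

```python
def cooccurrence_matrix(columns: dict, ignore_self: bool = False) -> list:
    """Hypothetical pure-Python sketch (not arkas code).

    Assumes entry (i, j) counts the rows where columns i and j are both non-zero.
    """
    names = list(columns)
    matrix = [
        [sum(1 for a, b in zip(columns[ci], columns[cj]) if a and b) for cj in names]
        for ci in names
    ]
    if ignore_self:
        # Zero the diagonal, i.e. drop each column's co-occurrence with itself.
        for i in range(len(names)):
            matrix[i][i] = 0
    return matrix


data = {
    "col1": [0, 1, 1, 0, 0, 1, 0],
    "col2": [0, 1, 0, 1, 0, 1, 0],
    "col3": [0, 0, 0, 0, 1, 1, 1],
}
full = cooccurrence_matrix(data)
# full == [[3, 2, 1], [2, 3, 1], [1, 1, 3]]
off_diagonal = cooccurrence_matrix(data, ignore_self=True)
# off_diagonal == [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```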

arkas.state.DataFrameState

Bases: BaseArgState

Implement the DataFrame state.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'
figure_config BaseFigureConfig | None

An optional figure configuration.

None
**kwargs Any

Additional keyword arguments.

{}

Example usage:

>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [0, 0, 0, 0, 1, 1, 1],
...     }
... )
>>> state = DataFrameState(frame)
>>> state
DataFrameState(dataframe=(7, 3), nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.NullValueState

Bases: BaseState

Implement a state that contains the number of null values per column.

Parameters:

Name Type Description Default
null_count ndarray

The array with the number of null values for each column.

required
total_count ndarray

The total number of values for each column.

required
columns Sequence[str]

The column names.

required
figure_config BaseFigureConfig | None

An optional figure configuration.

None

Example usage:

>>> import numpy as np
>>> from arkas.state import NullValueState
>>> state = NullValueState(
...     null_count=np.array([0, 1, 2]),
...     total_count=np.array([5, 5, 5]),
...     columns=["col1", "col2", "col3"],
... )
>>> state
NullValueState(num_columns=3, figure_config=MatplotlibFigureConfig())
arkas.state.NullValueState.from_dataframe classmethod
from_dataframe(
    dataframe: DataFrame,
    figure_config: BaseFigureConfig | None = None,
) -> NullValueState

Instantiate a NullValueState object from a DataFrame.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
figure_config BaseFigureConfig | None

An optional figure configuration.

None

Returns:

Type Description
NullValueState

The instantiated NullValueState object.

Example usage:

>>> import polars as pl
>>> from arkas.state import NullValueState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, None],
...         "col2": [0, 1, None, None, 0, 1, 0],
...         "col3": [None, 0, 0, 0, None, 1, None],
...     }
... )
>>> state = NullValueState.from_dataframe(frame)
>>> state
NullValueState(num_columns=3, figure_config=MatplotlibFigureConfig())
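
The quantities stored in the state can be illustrated without arkas or polars. The sketch below uses a hypothetical count_nulls helper on the same data, and assumes that the total count of a column includes its null entries.

```python
def count_nulls(columns: dict) -> tuple:
    """Hypothetical pure-Python sketch (not arkas code)."""
    names = list(columns)
    # Number of None entries in each column.
    null_count = [sum(value is None for value in columns[name]) for name in names]
    # Number of rows in each column, including the None entries (assumption).
    total_count = [len(columns[name]) for name in names]
    return names, null_count, total_count


data = {
    "col1": [0, 1, 1, 0, 0, 1, None],
    "col2": [0, 1, None, None, 0, 1, 0],
    "col3": [None, 0, 0, 0, None, 1, None],
}
names, null_count, total_count = count_nulls(data)
# null_count == [1, 2, 3] and total_count == [7, 7, 7]
```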
arkas.state.NullValueState.to_dataframe
to_dataframe() -> DataFrame

Export the content of the state to a DataFrame.

Returns:

Type Description
DataFrame

The DataFrame.

Example usage:

>>> import numpy as np
>>> from arkas.state import NullValueState
>>> state = NullValueState(
...     null_count=np.array([0, 1, 2]),
...     total_count=np.array([5, 5, 5]),
...     columns=["col1", "col2", "col3"],
... )
>>> state.to_dataframe()
shape: (3, 3)
┌────────┬──────┬───────┐
│ column ┆ null ┆ total │
│ ---    ┆ ---  ┆ ---   │
│ str    ┆ i64  ┆ i64   │
╞════════╪══════╪═══════╡
│ col1   ┆ 0    ┆ 5     │
│ col2   ┆ 1    ┆ 5     │
│ col3   ┆ 2    ┆ 5     │
└────────┴──────┴───────┘

arkas.state.PrecisionRecallState

Bases: BaseState

Implement a state for precision-recall-based metrics.

This state can be used in 3 different settings:

  • binary: y_true must be an array of shape (n_samples,) with 0 and 1 values, and y_pred must be an array of shape (n_samples,).
  • multiclass: y_true must be an array of shape (n_samples,) with values in {0, ..., n_classes-1}, and y_pred must be an array of shape (n_samples,).
  • multilabel: y_true must be an array of shape (n_samples, n_classes) with 0 and 1 values, and y_pred must be an array of shape (n_samples, n_classes).
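
The three settings can be told apart from the shape and values of y_true alone. The sketch below shows one plausible rule using nested lists; infer_label_type is a hypothetical helper and may differ from how arkas resolves the label type.

```python
def infer_label_type(y_true: list) -> str:
    """Hypothetical inference rule (not arkas code)."""
    if y_true and isinstance(y_true[0], (list, tuple)):
        # 2-D input of shape (n_samples, n_classes).
        return "multilabel"
    if set(y_true) <= {0, 1}:
        # 1-D input containing only 0 and 1.
        return "binary"
    # 1-D input with values in {0, ..., n_classes-1}.
    return "multiclass"


binary = infer_label_type([1, 0, 0, 1, 1])
multiclass = infer_label_type([0, 2, 1, 2])
multilabel = infer_label_type([[1, 0], [0, 1], [1, 1]])
```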

Parameters:

Name Type Description Default
y_true ndarray

The ground truth target labels. This input must be an array of shape (n_samples,) or (n_samples, n_classes).

required
y_pred ndarray

The predicted labels. This input must be an array of shape (n_samples,) or (n_samples, n_classes).

required
y_true_name str

The name associated with the ground truth target labels.

required
y_pred_name str

The name associated with the predicted labels.

required
label_type str

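The type of labels used to evaluate the metrics. The valid values are: 'auto', 'binary', 'multiclass', and 'multilabel'. If 'binary' or 'multilabel', y_true values must be 0 and 1. With the default 'auto', the label type is resolved from the inputs, as in the example usage, where the state reports label_type='binary'.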

'auto'
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'

Example usage:

>>> import numpy as np
>>> from arkas.state import PrecisionRecallState
>>> state = PrecisionRecallState(
...     y_true=np.array([1, 0, 0, 1, 1]),
...     y_pred=np.array([1, 0, 0, 1, 1]),
...     y_true_name="target",
...     y_pred_name="pred",
... )
>>> state
PrecisionRecallState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', label_type='binary', nan_policy='propagate')

arkas.state.ScatterDataFrameState

Bases: DataFrameState

Implement the DataFrame state for scatter plots.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
x str

The x-axis data column.

required
y str

The y-axis data column.

required
color str | None

An optional color axis data column.

None
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'
figure_config BaseFigureConfig | None

An optional figure configuration.

None
**kwargs Any

Additional keyword arguments.

{}

Example usage:

>>> import polars as pl
>>> from arkas.state import ScatterDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [0, 0, 0, 0, 1, 1, 1],
...     }
... )
>>> state = ScatterDataFrameState(frame, x="col1", y="col2")
>>> state
ScatterDataFrameState(dataframe=(7, 3), x='col1', y='col2', color=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.SeriesState

Bases: BaseState

Implement the Series state.

Parameters:

Name Type Description Default
series Series

The Series.

required
figure_config BaseFigureConfig | None

An optional figure configuration.

None

Example usage:

>>> import polars as pl
>>> from arkas.state import SeriesState
>>> state = SeriesState(pl.Series("col1", [1, 2, 3, 4, 5, 6, 7]))
>>> state
SeriesState(name='col1', values=(7,), figure_config=MatplotlibFigureConfig())

arkas.state.TargetDataFrameState

Bases: DataFrameState

Implement a DataFrame state with a target column.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
target_column str

The target column in the DataFrame.

required
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'
figure_config BaseFigureConfig | None

An optional figure configuration.

None
**kwargs Any

Additional keyword arguments.

{}

Example usage:

>>> import polars as pl
>>> from arkas.state import TargetDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = TargetDataFrameState(frame, target_column="col3")
>>> state
TargetDataFrameState(dataframe=(7, 3), target_column='col3', nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.TemporalColumnState

Bases: TemporalDataFrameState

Implement the temporal DataFrame state with a target column.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
target_column str

The target column in the DataFrame.

required
temporal_column str

The temporal column in the DataFrame.

required
period str | None

An optional temporal period, e.g. monthly or daily.

None
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'
figure_config BaseFigureConfig | None

An optional figure configuration.

None
**kwargs Any

Additional keyword arguments.

{}

Example usage:

>>> from datetime import datetime, timezone
>>> import polars as pl
>>> from arkas.state import TemporalColumnState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0],
...         "col2": [0, 1, 0, 1],
...         "datetime": [
...             datetime(year=2020, month=1, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=2, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=3, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=4, day=3, tzinfo=timezone.utc),
...         ],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Int64,
...         "datetime": pl.Datetime(time_unit="us", time_zone="UTC"),
...     },
... )
>>> state = TemporalColumnState(frame, target_column="col2", temporal_column="datetime")
>>> state
TemporalColumnState(dataframe=(4, 3), target_column='col2', temporal_column='datetime', period=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.TemporalDataFrameState

Bases: DataFrameState

Implement the temporal DataFrame state.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
temporal_column str

The temporal column in the DataFrame.

required
period str | None

An optional temporal period, e.g. monthly or daily.

None
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'
figure_config BaseFigureConfig | None

An optional figure configuration.

None
**kwargs Any

Additional keyword arguments.

{}

Example usage:

>>> from datetime import datetime, timezone
>>> import polars as pl
>>> from arkas.state import TemporalDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0],
...         "col2": [0, 1, 0, 1],
...         "col3": [1, 0, 0, 0],
...         "datetime": [
...             datetime(year=2020, month=1, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=2, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=3, day=3, tzinfo=timezone.utc),
...             datetime(year=2020, month=4, day=3, tzinfo=timezone.utc),
...         ],
...     },
...     schema={
...         "col1": pl.Int64,
...         "col2": pl.Int64,
...         "col3": pl.Int64,
...         "datetime": pl.Datetime(time_unit="us", time_zone="UTC"),
...     },
... )
>>> state = TemporalDataFrameState(frame, temporal_column="datetime")
>>> state
TemporalDataFrameState(dataframe=(4, 4), temporal_column='datetime', period=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())

arkas.state.TwoColumnDataFrameState

Bases: DataFrameState

Implement a DataFrame state with two target columns.

Parameters:

Name Type Description Default
dataframe DataFrame

The DataFrame.

required
column1 str

The first target column in the DataFrame.

required
column2 str

The second target column in the DataFrame.

required
nan_policy str

The policy on how to handle NaN values in the input arrays. The following options are available: 'omit', 'propagate', and 'raise'.

'propagate'
figure_config BaseFigureConfig | None

An optional figure configuration.

None
**kwargs Any

Additional keyword arguments.

{}

Example usage:

>>> import polars as pl
>>> from arkas.state import TwoColumnDataFrameState
>>> frame = pl.DataFrame(
...     {
...         "col1": [0, 1, 1, 0, 0, 1, 0],
...         "col2": [0, 1, 0, 1, 0, 1, 0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
...     },
...     schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = TwoColumnDataFrameState(frame, column1="col3", column2="col1")
>>> state
TwoColumnDataFrameState(dataframe=(7, 3), column1='col3', column2='col1', nan_policy='propagate', figure_config=MatplotlibFigureConfig())