arkas.state
Contain the state classes.
arkas.state.AccuracyState
Bases: BaseArgState
Implement the accuracy state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_true | ndarray | The ground truth target labels. This input must be an array of shape (n_samples,). | required |
y_pred | ndarray | The predicted labels. This input must be an array of shape (n_samples,). | required |
y_true_name | str | The name associated with the ground truth target labels. | required |
y_pred_name | str | The name associated with the predicted labels. | required |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
Example usage:
>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
... y_true=np.array([1, 0, 0, 1, 1]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> state
AccuracyState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', nan_policy='propagate')
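The state only stores the arrays and their names. As a rough illustration of what a consumer of this state might compute, the sketch below derives the accuracy as the fraction of matching labels. The helper `compute_accuracy` is hypothetical (it is not part of arkas), and its NaN handling only mimics a 'propagate'-style policy, where any NaN in the inputs yields NaN.

```python
import numpy as np


def compute_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Hypothetical helper: fraction of positions where y_pred equals y_true.

    Assumption: mimics a 'propagate'-style NaN policy, i.e. if either
    input contains a NaN, the result is NaN.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    if np.isnan(y_true).any() or np.isnan(y_pred).any():
        return float("nan")
    # The mean of the element-wise equality mask is the accuracy.
    return float(np.mean(y_true == y_pred))


print(compute_accuracy(np.array([1, 0, 0, 1, 1]), np.array([1, 0, 0, 1, 1])))  # 1.0
print(compute_accuracy(np.array([1, 0, 0, 0, 0]), np.array([1, 0, 0, 1, 1])))  # 0.6
```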
arkas.state.BaseArgState
Bases: BaseState
Define a base class to manage arbitrary keyword arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs | Any | Additional keyword arguments. | {} |
arkas.state.BaseArgState.get_arg
get_arg(name: str, default: Any = None) -> Any
Get a given argument from the state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The argument name to get. | required |
default | Any | The default value to return if the argument is missing. | None |
Returns:
Type | Description |
---|---|
Any | The argument value or the default value. |
Example usage:
>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
... },
... schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = DataFrameState(frame, column="col3")
>>> state.get_arg("column")
'col3'
arkas.state.BaseArgState.get_args
get_args() -> dict
Get a dictionary with all the arguments of the state.
Returns:
Type | Description |
---|---|
dict | The dictionary with all the arguments. |
Example usage:
>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
... },
... schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = DataFrameState(frame, column="col3")
>>> args = state.get_args()
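`get_arg` and `get_args` behave like dictionary lookups over the keyword arguments passed at construction time. The following is a minimal sketch of such a keyword-argument store, assuming the arguments are kept in a private dict. The class name `KwargsStore` is hypothetical, and this is not the arkas implementation.

```python
from typing import Any


class KwargsStore:
    """Hypothetical keyword-argument container mimicking get_arg/get_args."""

    def __init__(self, **kwargs: Any) -> None:
        # Keep the arguments in a private dict.
        self._kwargs = kwargs

    def get_arg(self, name: str, default: Any = None) -> Any:
        # Return the stored value, or ``default`` when the name is missing.
        return self._kwargs.get(name, default)

    def get_args(self) -> dict:
        # Return a shallow copy so callers cannot mutate the internal dict.
        return dict(self._kwargs)


store = KwargsStore(column="col3")
print(store.get_arg("column"))       # col3
print(store.get_arg("missing", 42))  # 42
print(store.get_args())              # {'column': 'col3'}
```

Returning a copy from `get_args` is a design choice of this sketch; it keeps the stored arguments stable even if the caller edits the returned dictionary.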
arkas.state.BaseState
Bases: ABC
Define the base class to implement a state.
Example usage:
>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
... y_true=np.array([1, 0, 0, 1, 1]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> state
AccuracyState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', nan_policy='propagate')
arkas.state.BaseState.clone (abstractmethod)
clone(deep: bool = True) -> Self
Return a copy of the state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
deep | bool | If True, return a deep copy of the state; otherwise, return a shallow copy. | True |
Returns:
Type | Description |
---|---|
Self | A copy of the state. |
Example usage:
>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state = AccuracyState(
... y_true=np.array([1, 0, 0, 1, 1]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> cloned_state = state.clone()
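The `deep` flag follows the usual Python distinction between deep and shallow copies. The sketch below uses the standard `copy` module with a hypothetical `ToyState` dataclass (not an arkas class) to show the practical difference: a shallow copy shares the underlying array, while a deep copy owns its own.

```python
import copy
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyState:
    """Hypothetical state holding an array, used only to illustrate clone()."""

    y_true: np.ndarray

    def clone(self, deep: bool = True) -> "ToyState":
        # deep=True duplicates the array; deep=False shares it with the original.
        return copy.deepcopy(self) if deep else copy.copy(self)


state = ToyState(y_true=np.array([1, 0, 0, 1, 1]))
shallow = state.clone(deep=False)
deep = state.clone()
state.y_true[0] = 9  # mutate the original array in place
print(shallow.y_true[0])  # 9: the shallow copy shares the array
print(deep.y_true[0])     # 1: the deep copy owns its own array
```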
arkas.state.BaseState.equal (abstractmethod)
equal(other: Any, equal_nan: bool = False) -> bool
Indicate if two states are equal or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Any | The other state to compare. | required |
equal_nan | bool | Whether to compare NaN's as equal. | False |
Returns:
Type | Description |
---|---|
bool | True if the two states are equal, otherwise False. |
Example usage:
>>> import numpy as np
>>> from arkas.state import AccuracyState
>>> state1 = AccuracyState(
... y_true=np.array([1, 0, 0, 1, 1]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> state2 = AccuracyState(
... y_true=np.array([1, 0, 0, 1, 1]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> state3 = AccuracyState(
... y_true=np.array([1, 0, 0, 0, 0]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> state1.equal(state2)
True
>>> state1.equal(state3)
False
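For array-valued states, `equal_nan` matters because NaN never compares equal to itself under IEEE 754. NumPy's `np.array_equal` exposes the same switch. The snippet below only illustrates these semantics; it does not claim that arkas uses this function internally.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

# NaN != NaN, so a plain comparison reports the arrays as different.
print(np.array_equal(a, b))                  # False
# With equal_nan=True, NaNs in matching positions are treated as equal.
print(np.array_equal(a, b, equal_nan=True))  # True
```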
arkas.state.ColumnCooccurrenceState
Bases: BaseState
Implement the column co-occurrence state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matrix | ndarray | The co-occurrence matrix. | required |
columns | Sequence[str] | The column names. | required |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
Example usage:
>>> import numpy as np
>>> from arkas.state import ColumnCooccurrenceState
>>> state = ColumnCooccurrenceState(matrix=np.ones((3, 3)), columns=["a", "b", "c"])
>>> state
ColumnCooccurrenceState(matrix=(3, 3), figure_config=MatplotlibFigureConfig())
arkas.state.ColumnCooccurrenceState.from_dataframe (classmethod)
from_dataframe(
frame: DataFrame,
ignore_self: bool = False,
figure_config: BaseFigureConfig | None = None,
) -> ColumnCooccurrenceState
Instantiate a ColumnCooccurrenceState object from a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to analyze. | required |
ignore_self | bool | If True, self co-occurrences (the diagonal of the co-occurrence matrix) are ignored. | False |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
Returns:
Type | Description |
---|---|
ColumnCooccurrenceState | The instantiated state. |
Example usage:
>>> import polars as pl
>>> from arkas.state import ColumnCooccurrenceState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [0, 0, 0, 0, 1, 1, 1],
... }
... )
>>> state = ColumnCooccurrenceState.from_dataframe(frame)
>>> state
ColumnCooccurrenceState(matrix=(3, 3), figure_config=MatplotlibFigureConfig())
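A common way to build a column co-occurrence matrix is to count, for every pair of columns, the rows where both values are non-zero. With a 0/1 indicator matrix `X`, this is `X.T @ X`. The sketch below follows that definition, which is an assumption rather than a description of how `from_dataframe` is implemented. The helper `cooccurrence_matrix` is hypothetical, and `ignore_self` is assumed to zero the diagonal.

```python
import numpy as np


def cooccurrence_matrix(data: np.ndarray, ignore_self: bool = False) -> np.ndarray:
    """Hypothetical helper: count the rows where both columns are non-zero."""
    # Convert to a 0/1 indicator matrix of shape (n_rows, n_columns).
    indicator = (data != 0).astype(np.int64)
    # Entry (i, j) counts the rows where column i and column j are both non-zero.
    matrix = indicator.T @ indicator
    if ignore_self:
        # Drop self co-occurrences, i.e. the per-column non-zero counts.
        np.fill_diagonal(matrix, 0)
    return matrix


# Same values as col1, col2 and col3 in the example above, one column each.
data = np.array(
    [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 1, 1, 1],
    ]
).T
print(cooccurrence_matrix(data))
print(cooccurrence_matrix(data, ignore_self=True))
```

For this data, col1 and col2 are both non-zero in two rows, while each of them shares a single row with col3, so the off-diagonal entries are 2, 1, and 1.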
arkas.state.DataFrameState
Bases: BaseArgState
Implement the DataFrame state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
**kwargs | Any | Additional keyword arguments. | {} |
Example usage:
>>> import polars as pl
>>> from arkas.state import DataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [0, 0, 0, 0, 1, 1, 1],
... }
... )
>>> state = DataFrameState(frame)
>>> state
DataFrameState(dataframe=(7, 3), nan_policy='propagate', figure_config=MatplotlibFigureConfig())
arkas.state.NullValueState
Bases: BaseState
Implement a state that contains the number of null values per column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
null_count | ndarray | The array with the number of null values for each column. | required |
total_count | ndarray | The total number of values for each column. | required |
columns | Sequence[str] | The column names. | required |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
Example usage:
>>> import numpy as np
>>> from arkas.state import NullValueState
>>> state = NullValueState(
... null_count=np.array([0, 1, 2]),
... total_count=np.array([5, 5, 5]),
... columns=["col1", "col2", "col3"],
... )
>>> state
NullValueState(num_columns=3, figure_config=MatplotlibFigureConfig())
arkas.state.NullValueState.from_dataframe (classmethod)
from_dataframe(
dataframe: DataFrame,
figure_config: BaseFigureConfig | None = None,
) -> NullValueState
Instantiate a NullValueState object from a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
Returns:
Type | Description |
---|---|
NullValueState | The instantiated NullValueState object. |
Example usage:
>>> import polars as pl
>>> from arkas.state import NullValueState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, None],
... "col2": [0, 1, None, None, 0, 1, 0],
... "col3": [None, 0, 0, 0, None, 1, None],
... }
... )
>>> state = NullValueState.from_dataframe(frame)
>>> state
NullValueState(num_columns=3, figure_config=MatplotlibFigureConfig())
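The per-column counts stored in `NullValueState` are straightforward to derive. The sketch below computes them in plain Python from a mapping of column names to values, assuming `None` is the only null marker and that the total count is the column length. `null_value_counts` is a hypothetical helper, not part of arkas. Applied to the values of the example frame above, it returns one (column, null, total) triple per column.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def null_value_counts(data: Mapping[str, Sequence[Any]]) -> list[tuple[str, int, int]]:
    """Hypothetical helper: (column, null count, total count) for each column."""
    rows = []
    for column, values in data.items():
        # Assumption: None is the only null marker.
        null = sum(value is None for value in values)
        rows.append((column, null, len(values)))
    return rows


data = {
    "col1": [0, 1, 1, 0, 0, 1, None],
    "col2": [0, 1, None, None, 0, 1, 0],
    "col3": [None, 0, 0, 0, None, 1, None],
}
for row in null_value_counts(data):
    print(row)
# ('col1', 1, 7)
# ('col2', 2, 7)
# ('col3', 3, 7)
```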
arkas.state.NullValueState.to_dataframe
to_dataframe() -> DataFrame
Export the content of the state to a DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame with one row per column of the state and the columns column, null, and total. |
Example usage:
>>> import numpy as np
>>> from arkas.state import NullValueState
>>> state = NullValueState(
... null_count=np.array([0, 1, 2]),
... total_count=np.array([5, 5, 5]),
... columns=["col1", "col2", "col3"],
... )
>>> state.to_dataframe()
shape: (3, 3)
┌────────┬──────┬───────┐
│ column ┆ null ┆ total │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪══════╪═══════╡
│ col1 ┆ 0 ┆ 5 │
│ col2 ┆ 1 ┆ 5 │
│ col3 ┆ 2 ┆ 5 │
└────────┴──────┴───────┘
arkas.state.PrecisionRecallState
Bases: BaseState
Implement a state for precision-recall-based metrics.
This state can be used in 3 different settings:
- binary: y_true must be an array of shape (n_samples,) with 0 and 1 values, and y_pred must be an array of shape (n_samples,).
- multiclass: y_true must be an array of shape (n_samples,) with values in {0, ..., n_classes-1}, and y_pred must be an array of shape (n_samples,).
- multilabel: y_true must be an array of shape (n_samples, n_classes) with 0 and 1 values, and y_pred must be an array of shape (n_samples, n_classes).
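When `label_type` is left to `'auto'`, the type has to be inferred from the inputs. A plausible rule that matches the three settings listed above is sketched below. The function `infer_label_type` is hypothetical, and the exact rule arkas applies may differ. For instance, a multiclass problem whose labels happen to be only 0 and 1 is indistinguishable from a binary one under this rule.

```python
import numpy as np


def infer_label_type(y_true: np.ndarray) -> str:
    """Hypothetical rule mapping the shape and values of y_true to a label type."""
    y_true = np.asarray(y_true)
    if y_true.ndim == 2:
        # (n_samples, n_classes) indicator matrix.
        return "multilabel"
    if y_true.ndim == 1:
        # Only 0/1 values -> binary; anything else -> multiclass.
        values = set(np.unique(y_true).tolist())
        return "binary" if values <= {0, 1} else "multiclass"
    raise ValueError(f"unsupported number of dimensions: {y_true.ndim}")


print(infer_label_type(np.array([1, 0, 0, 1, 1])))           # binary
print(infer_label_type(np.array([0, 2, 1, 2])))              # multiclass
print(infer_label_type(np.array([[1, 0], [0, 1], [1, 1]])))  # multilabel
```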
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_true | ndarray | The ground truth target labels. This input must be an array of shape (n_samples,) or (n_samples, n_classes). | required |
y_pred | ndarray | The predicted labels. This input must be an array of shape (n_samples,) or (n_samples, n_classes). | required |
y_true_name | str | The name associated with the ground truth target labels. | required |
y_pred_name | str | The name associated with the predicted labels. | required |
label_type | str | The type of labels used to evaluate the metrics. The valid values are: 'binary', 'multiclass', 'multilabel', and 'auto'. | 'auto' |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
Example usage:
>>> import numpy as np
>>> from arkas.state import PrecisionRecallState
>>> state = PrecisionRecallState(
... y_true=np.array([1, 0, 0, 1, 1]),
... y_pred=np.array([1, 0, 0, 1, 1]),
... y_true_name="target",
... y_pred_name="pred",
... )
>>> state
PrecisionRecallState(y_true=(5,), y_pred=(5,), y_true_name='target', y_pred_name='pred', label_type='binary', nan_policy='propagate')
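In the binary setting, precision and recall reduce to ratios of true positives: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below computes both from 0/1 arrays. `binary_precision_recall` is a hypothetical helper, and returning 0.0 when a denominator is zero is an assumption, since libraries differ on this edge case.

```python
import numpy as np


def binary_precision_recall(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
    """Hypothetical helper: precision and recall for 0/1 labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))   # predicted 1, actually 1
    fp = int(np.sum(~y_true & y_pred))  # predicted 1, actually 0
    fn = int(np.sum(y_true & ~y_pred))  # predicted 0, actually 1
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


print(binary_precision_recall(np.array([1, 0, 0, 1, 1]), np.array([1, 0, 0, 1, 1])))  # (1.0, 1.0)
# One false positive and one false negative: both values equal 2/3.
print(binary_precision_recall(np.array([1, 0, 0, 1, 1]), np.array([1, 1, 0, 0, 1])))
```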
arkas.state.ScatterDataFrameState
Bases: DataFrameState
Implement the DataFrame state for scatter plots.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
x | str | The x-axis data column. | required |
y | str | The y-axis data column. | required |
color | str \| None | An optional color axis data column. | None |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
**kwargs | Any | Additional keyword arguments. | {} |
Example usage:
>>> import polars as pl
>>> from arkas.state import ScatterDataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [0, 0, 0, 0, 1, 1, 1],
... }
... )
>>> state = ScatterDataFrameState(frame, x="col1", y="col2")
>>> state
ScatterDataFrameState(dataframe=(7, 3), x='col1', y='col2', color=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())
arkas.state.SeriesState
Bases: BaseState
Implement the Series state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
series | Series | The Series. | required |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
Example usage:
>>> import polars as pl
>>> from arkas.state import SeriesState
>>> state = SeriesState(pl.Series("col1", [1, 2, 3, 4, 5, 6, 7]))
>>> state
SeriesState(name='col1', values=(7,), figure_config=MatplotlibFigureConfig())
arkas.state.TargetDataFrameState
Bases: DataFrameState
Implement a DataFrame state with a target column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
target_column | str | The target column in the DataFrame. | required |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
**kwargs | Any | Additional keyword arguments. | {} |
Example usage:
>>> import polars as pl
>>> from arkas.state import TargetDataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
... },
... schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = TargetDataFrameState(frame, target_column="col3")
>>> state
TargetDataFrameState(dataframe=(7, 3), target_column='col3', nan_policy='propagate', figure_config=MatplotlibFigureConfig())
arkas.state.TemporalColumnState
Bases: TemporalDataFrameState
Implement a temporal DataFrame state with a target column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
target_column | str | The target column in the DataFrame. | required |
temporal_column | str | The temporal column in the DataFrame. | required |
period | str \| None | An optional temporal period, e.g., monthly or daily. | None |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
**kwargs | Any | Additional keyword arguments. | {} |
Example usage:
>>> from datetime import datetime, timezone
>>> import polars as pl
>>> from arkas.state import TemporalColumnState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0],
... "col2": [0, 1, 0, 1],
... "datetime": [
... datetime(year=2020, month=1, day=3, tzinfo=timezone.utc),
... datetime(year=2020, month=2, day=3, tzinfo=timezone.utc),
... datetime(year=2020, month=3, day=3, tzinfo=timezone.utc),
... datetime(year=2020, month=4, day=3, tzinfo=timezone.utc),
... ],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Int64,
... "datetime": pl.Datetime(time_unit="us", time_zone="UTC"),
... },
... )
>>> state = TemporalColumnState(frame, target_column="col2", temporal_column="datetime")
>>> state
TemporalColumnState(dataframe=(4, 3), target_column='col2', temporal_column='datetime', period=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())
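The `period` argument suggests aggregating the target column over time buckets such as months. The sketch below shows one way to do this with the standard library only, grouping values by calendar month and averaging them. The helper `monthly_mean` and the choice of the mean as the aggregation are assumptions for illustration, not the arkas behavior.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_mean(times: list[datetime], values: list[float]) -> dict[str, float]:
    """Hypothetical helper: mean of ``values`` per calendar month (YYYY-MM)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    buckets: dict[str, list[float]] = defaultdict(list)
    for time, value in zip(times, values):
        # The bucket key plays the role of a "monthly" period.
        buckets[time.strftime("%Y-%m")].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


times = [
    datetime(2020, 1, 3, tzinfo=timezone.utc),
    datetime(2020, 1, 20, tzinfo=timezone.utc),
    datetime(2020, 2, 3, tzinfo=timezone.utc),
    datetime(2020, 3, 3, tzinfo=timezone.utc),
]
print(monthly_mean(times, [0, 1, 0, 1]))
# {'2020-01': 0.5, '2020-02': 0.0, '2020-03': 1.0}
```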
arkas.state.TemporalDataFrameState
Bases: DataFrameState
Implement the temporal DataFrame state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
temporal_column | str | The temporal column in the DataFrame. | required |
period | str \| None | An optional temporal period, e.g., monthly or daily. | None |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
**kwargs | Any | Additional keyword arguments. | {} |
Example usage:
>>> from datetime import datetime, timezone
>>> import polars as pl
>>> from arkas.state import TemporalDataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0],
... "col2": [0, 1, 0, 1],
... "col3": [1, 0, 0, 0],
... "datetime": [
... datetime(year=2020, month=1, day=3, tzinfo=timezone.utc),
... datetime(year=2020, month=2, day=3, tzinfo=timezone.utc),
... datetime(year=2020, month=3, day=3, tzinfo=timezone.utc),
... datetime(year=2020, month=4, day=3, tzinfo=timezone.utc),
... ],
... },
... schema={
... "col1": pl.Int64,
... "col2": pl.Int64,
... "col3": pl.Int64,
... "datetime": pl.Datetime(time_unit="us", time_zone="UTC"),
... },
... )
>>> state = TemporalDataFrameState(frame, temporal_column="datetime")
>>> state
TemporalDataFrameState(dataframe=(4, 4), temporal_column='datetime', period=None, nan_policy='propagate', figure_config=MatplotlibFigureConfig())
arkas.state.TwoColumnDataFrameState
Bases: DataFrameState
Implement a DataFrame state with two columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataframe | DataFrame | The DataFrame. | required |
column1 | str | The first target column in the DataFrame. | required |
column2 | str | The second target column in the DataFrame. | required |
nan_policy | str | The policy on how to handle NaN values in the input arrays. | 'propagate' |
figure_config | BaseFigureConfig \| None | An optional figure configuration. | None |
**kwargs | Any | Additional keyword arguments. | {} |
Example usage:
>>> import polars as pl
>>> from arkas.state import TwoColumnDataFrameState
>>> frame = pl.DataFrame(
... {
... "col1": [0, 1, 1, 0, 0, 1, 0],
... "col2": [0, 1, 0, 1, 0, 1, 0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
... },
... schema={"col1": pl.Int64, "col2": pl.Int32, "col3": pl.Float64},
... )
>>> state = TwoColumnDataFrameState(frame, column1="col3", column2="col1")
>>> state
TwoColumnDataFrameState(dataframe=(7, 3), column1='col3', column2='col1', nan_policy='propagate', figure_config=MatplotlibFigureConfig())