Utils

votingsys.utils ¶

Contain utility functions.

votingsys.utils.counter ¶

Contain counter utility functions.

votingsys.utils.counter.check_non_empty_count ¶

check_non_empty_count(counter: Counter) -> None

Check if the counter is not empty.

Parameters:

Name	Type	Description	Default
`counter`	`Counter`	The counter to check.	required

Raises:

Type	Description
`ValueError`	if the counter is empty.

Example usage:

>>> from collections import Counter
>>> from votingsys.utils.counter import check_non_empty_count
>>> check_non_empty_count(Counter({"a": 10, "b": 2, "c": 5, "d": 3}))
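
The example above only covers the passing case. The sketch below illustrates the failure path with a hypothetical stand-in, `check_non_empty_count_sketch`, which is not the library's code. It assumes that "empty" means the counter has no entries; the library may define emptiness differently (for example, as a zero total), and the error message is invented for illustration.

```python
from collections import Counter


def check_non_empty_count_sketch(counter: Counter) -> None:
    """Hypothetical stand-in: raise ValueError if the counter has no entries."""
    # Assumption: "empty" means there are no keys at all.
    if len(counter) == 0:
        msg = "The counter is empty"
        raise ValueError(msg)


# A populated counter passes silently and returns None.
check_non_empty_count_sketch(Counter({"a": 10, "b": 2}))

# An empty counter triggers the error.
try:
    check_non_empty_count_sketch(Counter())
except ValueError as exc:
    error_message = str(exc)
```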

votingsys.utils.counter.check_non_negative_count ¶

check_non_negative_count(counter: Counter) -> None

Check if all the count values are non-negative (>=0).

Parameters:

Name	Type	Description	Default
`counter`	`Counter`	The counter to check.	required

Raises:

Type	Description
`ValueError`	if at least one count is negative (<0).

Example usage:

>>> from collections import Counter
>>> from votingsys.utils.counter import check_non_negative_count
>>> check_non_negative_count(Counter({"a": 10, "b": 2, "c": 5, "d": 3}))
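
As a rough illustration of the documented behavior, the sketch below uses a hypothetical stand-in, `check_non_negative_count_sketch`, rather than the library function. The error message is an assumption; only the `ValueError` type and the `>=0` rule come from the description above.

```python
from collections import Counter


def check_non_negative_count_sketch(counter: Counter) -> None:
    """Hypothetical stand-in: raise ValueError if any count is below zero."""
    negatives = {key: count for key, count in counter.items() if count < 0}
    if negatives:
        msg = f"Found negative counts: {negatives}"
        raise ValueError(msg)


# Zero counts are allowed because the rule is count >= 0.
check_non_negative_count_sketch(Counter({"a": 10, "b": 0}))

# A single negative count is enough to fail the check.
raised = False
try:
    check_non_negative_count_sketch(Counter({"a": 10, "b": -1}))
except ValueError:
    raised = True
```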

votingsys.utils.dataframe ¶

Contain DataFrame utility functions.

votingsys.utils.dataframe.check_column_exist ¶

check_column_exist(frame: DataFrame, col: str) -> None

Check if a column exists in a DataFrame.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The DataFrame to check.	required
`col`	`str`	The column that should exist in the DataFrame.	required

Raises:

Type	Description
`ValueError`	if the column is missing in the DataFrame.

Example usage:

>>> import polars as pl
>>> from votingsys.utils.dataframe import check_column_exist
>>> check_column_exist(
...     pl.DataFrame({"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]}),
...     col="a",
... )

votingsys.utils.dataframe.check_column_missing ¶

check_column_missing(frame: DataFrame, col: str) -> None

Check if a column is missing in a DataFrame.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The DataFrame to check.	required
`col`	`str`	The column that should be missing in the DataFrame.	required

Raises:

Type	Description
`ValueError`	if the column exists in the DataFrame.

Example usage:

>>> import polars as pl
>>> from votingsys.utils.dataframe import check_column_missing
>>> check_column_missing(
...     pl.DataFrame({"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]}),
...     col="col",
... )

votingsys.utils.dataframe.remove_zero_weight_rows ¶

remove_zero_weight_rows(
    frame: DataFrame, weight_col: str
) -> DataFrame

Remove all rows from a DataFrame where the weight value is zero.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame from which rows should be filtered.	required
`weight_col`	`str`	The name of the column that contains the weight values.	required

Returns:

Type	Description
`DataFrame`	A new DataFrame with all rows removed where the weight is zero.

Raises:

Type	Description
`ValueError`	if `weight_col` does not exist in the DataFrame.

Example usage:

>>> import polars as pl
>>> from votingsys.utils.dataframe import remove_zero_weight_rows
>>> out = remove_zero_weight_rows(
...     pl.DataFrame(
...         {
...             "a": [0, 1, 2, 0, 1, 2],
...             "b": [1, 2, 0, 1, 2, 0],
...             "c": [2, 0, 1, 2, 0, 1],
...             "weight": [3, 0, 2, 1, 2, 0],
...         }
...     ),
...     weight_col="weight",
... )
>>> out
shape: (4, 4)
┌─────┬─────┬─────┬────────┐
│ a   ┆ b   ┆ c   ┆ weight │
│ --- ┆ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ i64 ┆ i64    │
╞═════╪═════╪═════╪════════╡
│ 0   ┆ 1   ┆ 2   ┆ 3      │
│ 2   ┆ 0   ┆ 1   ┆ 2      │
│ 0   ┆ 1   ┆ 2   ┆ 1      │
│ 1   ┆ 2   ┆ 0   ┆ 2      │
└─────┴─────┴─────┴────────┘

votingsys.utils.dataframe.sum_weights_by_group ¶

sum_weights_by_group(
    frame: DataFrame, weight_col: str
) -> DataFrame

Aggregate a DataFrame by summing the weight values for rows with identical values in all columns except the weight column.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame to aggregate.	required
`weight_col`	`str`	The name of the column that contains the weight values to be summed.	required

Returns:

Type	Description
`DataFrame`	A new DataFrame with rows grouped by all non-weight columns, and the weight column summed within each group.

Raises:

Type	Description
`ValueError`	if `weight_col` does not exist in the DataFrame.

Example usage:

>>> import polars as pl
>>> from votingsys.utils.dataframe import sum_weights_by_group
>>> out = sum_weights_by_group(
...     pl.DataFrame(
...         {
...             "a": [0, 1, 2, 0, 1, 2],
...             "b": [1, 2, 0, 1, 2, 0],
...             "c": [2, 0, 1, 2, 0, 1],
...             "weight": [3, 5, 2, 1, 2, -2],
...         }
...     ),
...     weight_col="weight",
... )
>>> out.sort("weight", descending=True)
shape: (3, 4)
┌─────┬─────┬─────┬────────┐
│ a   ┆ b   ┆ c   ┆ weight │
│ --- ┆ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ i64 ┆ i64    │
╞═════╪═════╪═════╪════════╡
│ 1   ┆ 2   ┆ 0   ┆ 7      │
│ 0   ┆ 1   ┆ 2   ┆ 4      │
│ 2   ┆ 0   ┆ 1   ┆ 0      │
└─────┴─────┴─────┴────────┘
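
To make the arithmetic behind this output explicit, the following plain-Python sketch reproduces the same grouping without polars. It is only an illustration of the summation, not how the library implements it; the names `rows`, `totals`, and `summed` are introduced here for the example.

```python
from collections import defaultdict

# Same rows as the example above, written as (a, b, c, weight) tuples.
rows = [
    (0, 1, 2, 3),
    (1, 2, 0, 5),
    (2, 0, 1, 2),
    (0, 1, 2, 1),
    (1, 2, 0, 2),
    (2, 0, 1, -2),
]

# Group by the non-weight columns and add up the weights in each group.
totals = defaultdict(int)
for a, b, c, weight in rows:
    totals[(a, b, c)] += weight

summed = dict(totals)
# (1, 2, 0) -> 5 + 2 = 7, (0, 1, 2) -> 3 + 1 = 4, (2, 0, 1) -> 2 + (-2) = 0
```

Note that the group `(2, 0, 1)` is kept even though its weights cancel to 0, which matches the last row of the output above.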

votingsys.utils.dataframe.value_count ¶

value_count(frame: DataFrame, value: Any) -> dict[str, int]

Count the occurrences of a given value in each column of a DataFrame.

This function computes how many times a specified value appears in each column. Null values are ignored during the counting process.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame.	required
`value`	`Any`	The value to count in each column.	required

Returns:

Type	Description
`dict[str, int]`	A dictionary mapping each column name to the number of times the specified value appears.

Raises:

Type	Description
`ValueError`	if the specified value is `None`.

Example usage:

>>> import polars as pl
>>> from votingsys.utils.dataframe import value_count
>>> counts = value_count(
...     pl.DataFrame({"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]}),
...     value=1,
... )
>>> counts
{'a': 2, 'b': 2, 'c': 1}

votingsys.utils.dataframe.weighted_value_count ¶

weighted_value_count(
    frame: DataFrame, value: int, weight_col: str
) -> dict[str, int | float]

Count the weighted occurrences of a given value in each column of a DataFrame.

This function computes how many times a specified value appears in each column, with each occurrence weighted by the value in the weight column of the same row. Null values are ignored during the counting process.

Parameters:

Name	Type	Description	Default
`frame`	`DataFrame`	The input DataFrame.	required
`value`	`int`	The value to count in each column.	required
`weight_col`	`str`	The name of the column that holds the weight for each row.	required

Returns:

Type	Description
`dict[str, int \| float]`	A dictionary mapping each column name (excluding the weight column) to the weighted number of times the specified value appears.

Raises:

Type	Description
`ValueError`	if the weight column is missing in the DataFrame.

Example usage:

>>> import polars as pl
>>> from votingsys.utils.dataframe import weighted_value_count
>>> counts = weighted_value_count(
...     pl.DataFrame({"a": [0, 1, 2], "b": [1, 2, 0], "c": [2, 0, 1], "count": [3, 5, 2]}),
...     value=1,
...     weight_col="count",
... )
>>> counts
{'a': 5, 'b': 3, 'c': 2}
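
The result above can be checked by hand. The plain-Python sketch below recomputes it from the same data, purely to show the arithmetic; it does not use polars and is not the library's implementation. The names `columns`, `weights`, and `weighted_counts` are introduced for this illustration.

```python
# Same data as the example above; "count" is the weight column.
columns = {"a": [0, 1, 2], "b": [1, 2, 0], "c": [2, 0, 1]}
weights = [3, 5, 2]
value = 1

# For each column, add the weight of every row whose cell equals `value`.
# Cells that are None never equal `value`, so they contribute nothing.
weighted_counts = {
    name: sum(w for v, w in zip(values, weights) if v == value)
    for name, values in columns.items()
}
```

Column `a` holds the value 1 only in the second row (weight 5), column `b` only in the first row (weight 3), and column `c` only in the third row (weight 2).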

votingsys.utils.mapping ¶

Contain mapping utility functions.

votingsys.utils.mapping.find_max_in_mapping ¶

find_max_in_mapping(
    mapping: Mapping[str, float],
) -> tuple[tuple[str, ...], float]

Find the maximum value in a mapping and return the corresponding key(s) and the value.

If multiple keys share the maximum value, all of them are returned in a tuple.

Parameters:

Name	Type	Description	Default
`mapping`	`Mapping[str, float]`	A mapping from keys to numeric values.	required

Returns:

Type	Description
`tuple[tuple[str, ...], float]`	A tuple containing the tuple of keys with the maximum value and the maximum value itself.

Raises:

Type	Description
`ValueError`	if the mapping is empty.

Example usage:

>>> from votingsys.utils.mapping import find_max_in_mapping
>>> out = find_max_in_mapping({"x": 3, "y": 1})
>>> out
(('x',), 3)
>>> out = find_max_in_mapping({"a": 10, "b": 20, "c": 20})
>>> out
(('b', 'c'), 20)
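
One possible way to obtain this behavior is sketched below. `find_max_in_mapping_sketch` is a hypothetical stand-in, not the library's code; it assumes tied keys are returned in the mapping's iteration order, and its error message is invented for illustration.

```python
from collections.abc import Mapping


def find_max_in_mapping_sketch(
    mapping: Mapping[str, float],
) -> tuple[tuple[str, ...], float]:
    """Hypothetical stand-in: return the key(s) with the largest value."""
    if not mapping:
        msg = "The mapping is empty"
        raise ValueError(msg)
    max_value = max(mapping.values())
    # Keep every key that reaches the maximum, in iteration order (assumption).
    keys = tuple(key for key, val in mapping.items() if val == max_value)
    return keys, max_value


single = find_max_in_mapping_sketch({"x": 3, "y": 1})
tied = find_max_in_mapping_sketch({"a": 10, "b": 20, "c": 20})
```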

votingsys.utils.timing ¶

Contain utility functions to measure time.

votingsys.utils.timing.timeblock ¶

timeblock(
    message: str = "Total time: {time}",
) -> Generator[None]

Implement a context manager to measure the execution time of a block of code.

Parameters:

Name	Type	Description	Default
`message`	`str`	The message displayed when the time is logged.	`'Total time: {time}'`

Example usage:

>>> from votingsys.utils.timing import timeblock
>>> with timeblock():
...     x = [1, 2, 3]
...
>>> with timeblock("Training: {time}"):
...     y = [1, 2, 3]
...
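
For readers curious how such a context manager can be built, here is a minimal sketch using only the standard library. `timeblock_sketch` is a hypothetical stand-in, not the library's implementation; the logger name, the `INFO` level, and the seconds-based time format are all assumptions.

```python
import logging
import time
from collections.abc import Iterator
from contextlib import contextmanager

# Assumption: the elapsed time is reported through a standard logger.
logger = logging.getLogger("timeblock_sketch")


@contextmanager
def timeblock_sketch(message: str = "Total time: {time}") -> Iterator[None]:
    """Hypothetical stand-in: log how long the wrapped block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so the time is always logged.
        elapsed = time.perf_counter() - start
        logger.info(message.format(time=f"{elapsed:.6f}s"))


with timeblock_sketch("Training: {time}"):
    y = [1, 2, 3]
```

The `{time}` placeholder in `message` is filled in with `str.format`, which is why custom messages such as `"Training: {time}"` need to include it. Nothing is printed unless logging is configured to show `INFO` records.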