Utils
votingsys.utils
Contain utility functions.
votingsys.utils.counter
Contain counter utility functions.
votingsys.utils.counter.check_non_empty_count
check_non_empty_count(counter: Counter) -> None
Check if the counter is not empty.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counter | Counter | The counter to check. | required |
Raises:
Type | Description |
---|---|
ValueError | if the counter is empty. |
Example usage:
>>> from collections import Counter
>>> from votingsys.utils.counter import check_non_empty_count
>>> check_non_empty_count(Counter({"a": 10, "b": 2, "c": 5, "d": 3}))
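The example above covers only the passing case. The failing case can be sketched with the standard library alone. `check_non_empty_sketch` is a hypothetical stand-in written for illustration; it assumes "empty" means the counter has no keys, and its error message is an assumption rather than the library's wording.

```python
from collections import Counter


def check_non_empty_sketch(counter: Counter) -> None:
    """Hypothetical stand-in: raise ValueError if the counter has no keys."""
    if len(counter) == 0:
        # The message text is an assumption, not the library's wording.
        raise ValueError("counter is empty")


# A counter with at least one key passes silently.
check_non_empty_sketch(Counter({"a": 10, "b": 2, "c": 5, "d": 3}))

# An empty counter is rejected.
try:
    check_non_empty_sketch(Counter())
    raised = False
except ValueError:
    raised = True
```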
votingsys.utils.counter.check_non_negative_count
check_non_negative_count(counter: Counter) -> None
Check if all the count values are non-negative (>=0).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counter | Counter | The counter to check. | required |
Raises:
Type | Description |
---|---|
ValueError | if at least one count is negative (<0). |
Example usage:
>>> from collections import Counter
>>> from votingsys.utils.counter import check_non_negative_count
>>> check_non_negative_count(Counter({"a": 10, "b": 2, "c": 5, "d": 3}))
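The boundary between accepted and rejected counters can be sketched as follows. `check_non_negative_sketch` is a hypothetical stand-in, not the library's implementation, and the error message is an assumption. Zero counts are accepted because the condition is `>= 0`.

```python
from collections import Counter


def check_non_negative_sketch(counter: Counter) -> None:
    """Hypothetical stand-in: raise ValueError if any count is below zero."""
    negatives = {key: count for key, count in counter.items() if count < 0}
    if negatives:
        # The message text is an assumption, not the library's wording.
        raise ValueError(f"negative counts found: {negatives}")


# Zero is allowed, so this passes silently.
check_non_negative_sketch(Counter({"a": 0, "b": 3}))

# A single negative count is enough to fail.
try:
    check_non_negative_sketch(Counter({"a": 10, "b": -1}))
    raised = False
except ValueError:
    raised = True
```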
votingsys.utils.dataframe
Contain DataFrame utility functions.
votingsys.utils.dataframe.check_column_exist
check_column_exist(frame: DataFrame, col: str) -> None
Check if a column exists in a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to check. | required |
col | str | The column that should exist in the DataFrame. | required |
Raises:
Type | Description |
---|---|
ValueError | if the column is missing in the DataFrame. |
Example usage:
>>> import polars as pl
>>> from votingsys.utils.dataframe import check_column_exist
>>> check_column_exist(
... pl.DataFrame({"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]}),
... col="a",
... )
votingsys.utils.dataframe.check_column_missing
check_column_missing(frame: DataFrame, col: str) -> None
Check if a column is missing in a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The DataFrame to check. | required |
col | str | The column that should be missing in the DataFrame. | required |
Raises:
Type | Description |
---|---|
ValueError | if the column exists in the DataFrame. |
Example usage:
>>> import polars as pl
>>> from votingsys.utils.dataframe import check_column_missing
>>> check_column_missing(
... pl.DataFrame({"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]}),
... col="col",
... )
votingsys.utils.dataframe.remove_zero_weight_rows
remove_zero_weight_rows(frame: DataFrame, weight_col: str) -> DataFrame
Remove all rows from a DataFrame where the weight value is zero.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame from which rows should be filtered. | required |
weight_col | str | The name of the column that contains the weight values. | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with all rows removed where the weight is zero. |
Raises:
Type | Description |
---|---|
ValueError | if |
Example usage:
>>> import polars as pl
>>> from votingsys.utils.dataframe import remove_zero_weight_rows
>>> out = remove_zero_weight_rows(
... pl.DataFrame(
... {
... "a": [0, 1, 2, 0, 1, 2],
... "b": [1, 2, 0, 1, 2, 0],
... "c": [2, 0, 1, 2, 0, 1],
... "weight": [3, 0, 2, 1, 2, 0],
... }
... ),
... weight_col="weight",
... )
>>> out
shape: (4, 4)
┌─────┬─────┬─────┬────────┐
│ a ┆ b ┆ c ┆ weight │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪════════╡
│ 0 ┆ 1 ┆ 2 ┆ 3 │
│ 2 ┆ 0 ┆ 1 ┆ 2 │
│ 0 ┆ 1 ┆ 2 ┆ 1 │
│ 1 ┆ 2 ┆ 0 ┆ 2 │
└─────┴─────┴─────┴────────┘
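The row filtering shown above can be reproduced in plain Python. This is a sketch only: `remove_zero_weight_rows_sketch` is a hypothetical helper, and a dict of column lists stands in for the polars DataFrame. It does not model the function's error handling.

```python
def remove_zero_weight_rows_sketch(
    columns: dict[str, list], weight_col: str
) -> dict[str, list]:
    """Hypothetical stand-in: keep only rows whose weight is not zero."""
    # Indices of the rows to keep, in their original order.
    keep = [i for i, weight in enumerate(columns[weight_col]) if weight != 0]
    return {name: [values[i] for i in keep] for name, values in columns.items()}


out = remove_zero_weight_rows_sketch(
    {
        "a": [0, 1, 2, 0, 1, 2],
        "b": [1, 2, 0, 1, 2, 0],
        "c": [2, 0, 1, 2, 0, 1],
        "weight": [3, 0, 2, 1, 2, 0],
    },
    weight_col="weight",
)
# Rows 1 and 5 (zero weight) are dropped; the other four rows remain.
```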
votingsys.utils.dataframe.sum_weights_by_group
sum_weights_by_group(frame: DataFrame, weight_col: str) -> DataFrame
Aggregate a DataFrame by summing the weight values for rows with identical values in all columns except the weight column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame to aggregate. | required |
weight_col | str | The name of the column that contains the weight values to be summed. | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with rows grouped by all non-weight columns, and the weight column summed within each group. |
Raises:
Type | Description |
---|---|
ValueError | if |
Example usage:
>>> import polars as pl
>>> from votingsys.utils.dataframe import sum_weights_by_group
>>> out = sum_weights_by_group(
... pl.DataFrame(
... {
... "a": [0, 1, 2, 0, 1, 2],
... "b": [1, 2, 0, 1, 2, 0],
... "c": [2, 0, 1, 2, 0, 1],
... "weight": [3, 5, 2, 1, 2, -2],
... }
... ),
... weight_col="weight",
... )
>>> out.sort("weight", descending=True)
shape: (3, 4)
┌─────┬─────┬─────┬────────┐
│ a ┆ b ┆ c ┆ weight │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪════════╡
│ 1 ┆ 2 ┆ 0 ┆ 7 │
│ 0 ┆ 1 ┆ 2 ┆ 4 │
│ 2 ┆ 0 ┆ 1 ┆ 0 │
└─────┴─────┴─────┴────────┘
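The grouping arithmetic behind this output can be sketched in plain Python. `sum_weights_by_group_sketch` is a hypothetical helper that uses a dict of column lists in place of a polars DataFrame and returns a dict keyed by the non-weight values; it illustrates the sums, not the library's implementation.

```python
def sum_weights_by_group_sketch(
    columns: dict[str, list], weight_col: str
) -> dict[tuple, int]:
    """Hypothetical stand-in: sum weights over rows with identical non-weight values."""
    keys = [name for name in columns if name != weight_col]
    totals: dict[tuple, int] = {}
    for i, weight in enumerate(columns[weight_col]):
        # The group key is the tuple of all non-weight values in this row.
        group = tuple(columns[name][i] for name in keys)
        totals[group] = totals.get(group, 0) + weight
    return totals


totals = sum_weights_by_group_sketch(
    {
        "a": [0, 1, 2, 0, 1, 2],
        "b": [1, 2, 0, 1, 2, 0],
        "c": [2, 0, 1, 2, 0, 1],
        "weight": [3, 5, 2, 1, 2, -2],
    },
    weight_col="weight",
)
# (0, 1, 2): 3 + 1 = 4, (1, 2, 0): 5 + 2 = 7, (2, 0, 1): 2 + (-2) = 0
```

Note that the group `(2, 0, 1)` stays in the result with a weight of 0, which matches the last row of the output above.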
votingsys.utils.dataframe.value_count
value_count(frame: DataFrame, value: Any) -> dict[str, int]
Count the occurrences of a given value in each column of a DataFrame.
This function computes how many times a specified value appears in each column. Null values are ignored during the counting process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. | required |
value | Any | The value to count in each column. | required |
Returns:
Type | Description |
---|---|
dict[str, int] | A dictionary mapping each column name to the number of times the specified value appears. |
Raises:
Type | Description |
---|---|
ValueError | If the specified value is |
Example usage:
>>> import polars as pl
>>> from votingsys.utils.dataframe import value_count
>>> counts = value_count(
... pl.DataFrame({"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]}),
... value=1,
... )
>>> counts
{'a': 2, 'b': 2, 'c': 1}
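The per-column counting, including how nulls are skipped, can be sketched in plain Python. `value_count_sketch` is a hypothetical helper; a dict of column lists stands in for the polars DataFrame and `None` stands in for a null. It does not model the function's error handling.

```python
from typing import Any


def value_count_sketch(columns: dict[str, list], value: Any) -> dict[str, int]:
    """Hypothetical stand-in: count how often `value` appears in each column."""
    # None never equals a non-None value, so nulls are skipped naturally.
    return {
        name: sum(1 for item in values if item == value)
        for name, values in columns.items()
    }


counts = value_count_sketch(
    {"a": [0, 1, 2, 1, 0], "b": [1, 2, 0, 2, 1], "c": [2, 0, 1, 0, 2]},
    value=1,
)
counts_with_nulls = value_count_sketch({"a": [1, None, 1], "b": [None, None, 0]}, value=1)
```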
votingsys.utils.dataframe.weighted_value_count
weighted_value_count(frame: DataFrame, value: int, weight_col: str) -> dict[str, int | float]
Count the weighted occurrences of a given value in each column of a DataFrame.
This function computes how many times a specified value appears in each column, with each occurrence weighted by the value in the weight column of the same row. Null values are ignored during the counting process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | DataFrame | The input DataFrame. | required |
value | int | The value to count in each column. | required |
weight_col | str | The name of the column that holds the weight for each row. | required |
Returns:
Type | Description |
---|---|
dict[str, int | float] | A dictionary mapping each column name (excluding the weight column) to the weighted number of times the specified value appears. |
Raises:
Type | Description |
---|---|
ValueError | if the weight column is missing in the DataFrame. |
Example usage:
>>> import polars as pl
>>> from votingsys.utils.dataframe import weighted_value_count
>>> counts = weighted_value_count(
... pl.DataFrame({"a": [0, 1, 2], "b": [1, 2, 0], "c": [2, 0, 1], "count": [3, 5, 2]}),
... value=1,
... weight_col="count",
... )
>>> counts
{'a': 5, 'b': 3, 'c': 2}
votingsys.utils.mapping
Contain mapping utility functions.
votingsys.utils.mapping.find_max_in_mapping
find_max_in_mapping(mapping: Mapping[str, float]) -> tuple[tuple[str, ...], float]
Find the maximum value in a mapping and return the corresponding key(s) and the value.
If multiple keys have the same maximum value, all such keys are returned in a tuple.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mapping | Mapping[str, float] | A mapping from keys to numeric values. | required |
Returns:
Type | Description |
---|---|
tuple[tuple[str, ...], float] | A tuple containing the tuple of keys with the maximum value and the maximum value itself. |
Raises:
Type | Description |
---|---|
ValueError | if the mapping is empty. |
Example usage:
>>> from votingsys.utils.mapping import find_max_in_mapping
>>> out = find_max_in_mapping({"x": 3, "y": 1})
>>> out
(('x',), 3)
>>> out = find_max_in_mapping({"a": 10, "b": 20, "c": 20})
>>> out
(('b', 'c'), 20)
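The tie handling and the empty-mapping error can be sketched with the standard library. `find_max_in_mapping_sketch` is a hypothetical stand-in, not the library's implementation; the error message and the ordering of tied keys (here, the mapping's iteration order) are assumptions.

```python
from collections.abc import Mapping


def find_max_in_mapping_sketch(
    mapping: Mapping[str, float],
) -> tuple[tuple[str, ...], float]:
    """Hypothetical stand-in: return all keys that share the maximum value."""
    if not mapping:
        # The message text is an assumption, not the library's wording.
        raise ValueError("mapping is empty")
    max_value = max(mapping.values())
    # Tied keys are collected in the mapping's iteration order (an assumption).
    keys = tuple(key for key, value in mapping.items() if value == max_value)
    return keys, max_value


single = find_max_in_mapping_sketch({"x": 3, "y": 1})
tied = find_max_in_mapping_sketch({"a": 10, "b": 20, "c": 20})

try:
    find_max_in_mapping_sketch({})
    raised = False
except ValueError:
    raised = True
```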
votingsys.utils.timing
Contain utility functions to measure time.
votingsys.utils.timing.timeblock
timeblock(message: str = "Total time: {time}") -> Generator[None]
Implement a context manager to measure the execution time of a block of code.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message | str | The message displayed when the time is logged. | 'Total time: {time}' |
Example usage:
>>> from votingsys.utils.timing import timeblock
>>> with timeblock():
... x = [1, 2, 3]
...
>>> with timeblock("Training: {time}"):
... y = [1, 2, 3]
...
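A timing context manager of this kind can be sketched with `contextlib.contextmanager` and `time.perf_counter`. `timeblock_sketch` is a hypothetical stand-in; the logger, the log level, and the way the elapsed time is formatted are all assumptions, not the library's behavior.

```python
import logging
import time
from collections.abc import Generator
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timeblock_sketch(
    message: str = "Total time: {time}",
) -> Generator[None, None, None]:
    """Hypothetical stand-in: log how long the wrapped block took to run."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so the time is always logged.
        elapsed = time.perf_counter() - start
        # Formatting the time as seconds is an assumption.
        logger.info(message.format(time=f"{elapsed:.6f}s"))


with timeblock_sketch("Training: {time}"):
    total = sum(range(1000))
```

The message is only visible if logging is configured to show `INFO` records, for example with `logging.basicConfig(level=logging.INFO)`.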