section

flamme.section

Contain sections.

flamme.section.BaseSection

Bases: ABC

Define the base class to manage sections.

flamme.section.BaseSection.get_statistics abstractmethod

get_statistics() -> dict

Return the statistics associated with the section.

Returns:

Type Description
dict

The statistics.

flamme.section.BaseSection.render_html_body abstractmethod

render_html_body(
    number: str = "",
    tags: Sequence[str] = (),
    depth: int = 0,
) -> str

Return the HTML body associated with the section.

Parameters:

Name Type Description Default
number str

The section number.

''
tags Sequence[str]

The tags associated with the section.

()
depth int

The depth in the report.

0

Returns:

Type Description
str

The HTML body associated with the section.

flamme.section.BaseSection.render_html_toc abstractmethod

render_html_toc(
    number: str = "",
    tags: Sequence[str] = (),
    depth: int = 0,
    max_depth: int = 1,
) -> str

Return the HTML table of contents (TOC) associated with the section.

Parameters:

Name Type Description Default
number str

The section number.

''
tags Sequence[str]

The tags associated with the section.

()
depth int

The depth in the report.

0
max_depth int

The maximum depth to generate in the TOC.

1

Returns:

Type Description
str

The HTML table of contents associated with the section.
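
The three abstract methods above define the contract that every section follows. The following is a minimal sketch of that contract, not flamme's implementation: `BaseSectionSketch` is a stand-in that only mirrors the documented signatures so the example runs without flamme installed, and `NoteSection`, its HTML layout, and the way it uses `depth` and `max_depth` are hypothetical.

```python
from __future__ import annotations

import html
from abc import ABC, abstractmethod
from collections.abc import Sequence


class BaseSectionSketch(ABC):
    """Stand-in that mirrors the documented abstract methods (not flamme's class)."""

    @abstractmethod
    def get_statistics(self) -> dict: ...

    @abstractmethod
    def render_html_body(
        self, number: str = "", tags: Sequence[str] = (), depth: int = 0
    ) -> str: ...

    @abstractmethod
    def render_html_toc(
        self,
        number: str = "",
        tags: Sequence[str] = (),
        depth: int = 0,
        max_depth: int = 1,
    ) -> str: ...


class NoteSection(BaseSectionSketch):
    """Hypothetical section that displays a short note."""

    def __init__(self, note: str) -> None:
        self._note = note

    def get_statistics(self) -> dict:
        return {"num_chars": len(self._note)}

    def render_html_body(
        self, number: str = "", tags: Sequence[str] = (), depth: int = 0
    ) -> str:
        # Assumption: deeper sections use smaller headings, capped at <h6>.
        level = min(depth + 1, 6)
        title = f"{number} Note".strip()
        return f"<h{level}>{title}</h{level}>\n<p>{html.escape(self._note)}</p>"

    def render_html_toc(
        self,
        number: str = "",
        tags: Sequence[str] = (),
        depth: int = 0,
        max_depth: int = 1,
    ) -> str:
        # Assumption: entries at or below max_depth are left out of the TOC.
        if depth >= max_depth:
            return ""
        title = f"{number} Note".strip()
        return f"<li>{title}</li>"


section = NoteSection("meow")
print(section.get_statistics())
print(section.render_html_body(number="1.2", depth=1))
```

Because all three methods are abstract, a subclass that omits one of them cannot be instantiated, which is the usual reason to use `ABC` for this kind of interface.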

flamme.section.ColumnContinuousAdvancedSection

Bases: BaseSection

Implement a section that analyzes a continuous distribution of values.

Parameters:

Name Type Description Default
series Series

The series/column to analyze.

required
column str

The column name.

required
nbins int | None

The number of bins in the histogram.

None
yscale str

The y-axis scale. If 'auto', the 'linear' or 'log'/'symlog' scale is chosen based on the distribution.

'auto'
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import numpy as np
>>> import pandas as pd
>>> from flamme.section import ColumnContinuousAdvancedSection
>>> section = ColumnContinuousAdvancedSection(
...     series=pd.Series([np.nan, *list(range(101)), np.nan]), column="col"
... )
>>> section
ColumnContinuousAdvancedSection(
  (column): col
  (nbins): None
  (yscale): auto
  (figsize): None
)
>>> section.get_statistics()
{'count': 103, 'num_nulls': 2, 'nunique': 102, 'mean': 50.0, 'std': 29.30...,
 'skewness': 0.0, 'kurtosis': -1.200235294117647, 'min': 0.0,
 'q001': 0.1, 'q01': 1.0, 'q05': 5.0, 'q10': 10.0, 'q25': 25.0, 'median': 50.0,
 'q75': 75.0, 'q90': 90.0, 'q95': 95.0, 'q99': 99.0, 'q999': 99.9, 'max': 100.0,
 '>0': 100, '<0': 0, '=0': 1, 'num_non_nulls': 101}

flamme.section.ColumnContinuousAdvancedSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.
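
The count-related keys returned by `get_statistics()` in the example above are related to each other: `count` is `num_nulls + num_non_nulls`, and `'>0'`, `'<0'` and `'=0'` add up to `num_non_nulls`. The sketch below reproduces those numbers with the standard library only. It illustrates the relationships and is not flamme's implementation; `count_statistics` is a hypothetical helper.

```python
import math
import statistics


def count_statistics(values):
    """Compute a few count-based statistics for a list of numbers with NaNs."""
    non_nulls = [v for v in values if not math.isnan(v)]
    num_nulls = len(values) - len(non_nulls)
    # Assumption: all NaNs count as one distinct value, which is consistent
    # with 'nunique': 102 in the documented output.
    nunique = len(set(non_nulls)) + (1 if num_nulls else 0)
    return {
        "count": len(values),
        "num_nulls": num_nulls,
        "num_non_nulls": len(non_nulls),
        "nunique": nunique,
        # fmean raises StatisticsError if every value is NaN.
        "mean": statistics.fmean(non_nulls),
        ">0": sum(v > 0 for v in non_nulls),
        "<0": sum(v < 0 for v in non_nulls),
        "=0": sum(v == 0 for v in non_nulls),
    }


# Same data as the example: NaN, the integers 0..100, NaN.
values = [math.nan, *range(101), math.nan]
stats = count_statistics(values)
print(stats)
```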

flamme.section.ColumnContinuousSection

Bases: BaseSection

Implement a section that analyzes a continuous distribution of values.

Parameters:

Name Type Description Default
series Series

The series/column to analyze.

required
column str

The column name.

required
nbins int | None

The number of bins in the histogram.

None
yscale str

The y-axis scale. If 'auto', the 'linear' or 'log'/'symlog' scale is chosen based on the distribution.

'auto'
xmin float | str | None

The minimum value of the range, or a string that denotes a quantile. For example, q0.1 means the 10% quantile; quantile 0 is the minimum value and quantile 1 is the maximum value.

None
xmax float | str | None

The maximum value of the range, or a string that denotes a quantile. For example, q0.9 means the 90% quantile; quantile 0 is the minimum value and quantile 1 is the maximum value.

None
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import numpy as np
>>> import pandas as pd
>>> from flamme.section import ColumnContinuousSection
>>> section = ColumnContinuousSection(
...     series=pd.Series([np.nan, *list(range(101)), np.nan]), column="col"
... )
>>> section
ColumnContinuousSection(
  (column): col
  (nbins): None
  (yscale): auto
  (xmin): None
  (xmax): None
  (figsize): None
)
>>> section.get_statistics()
{'count': 103, 'num_nulls': 2, 'nunique': 102, 'mean': 50.0, 'std': 29.30...,
 'skewness': 0.0, 'kurtosis': -1.200235294117647, 'min': 0.0,
 'q001': 0.1, 'q01': 1.0, 'q05': 5.0, 'q10': 10.0, 'q25': 25.0, 'median': 50.0,
 'q75': 75.0, 'q90': 90.0, 'q95': 95.0, 'q99': 99.0, 'q999': 99.9, 'max': 100.0,
 '>0': 100, '<0': 0, '=0': 1, 'num_non_nulls': 101}

flamme.section.ColumnContinuousSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.
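
The `xmin` and `xmax` parameters of `ColumnContinuousSection` accept either a number or a quantile string such as `q0.1`. The sketch below shows one way such a value could be resolved to a concrete bound. It is an illustration under stated assumptions, not flamme's code: `resolve_bound` and `quantile` are hypothetical helpers, and the linear interpolation rule is an assumption.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError(f"q must be in [0, 1], got {q}")
    data = sorted(values)
    position = q * (len(data) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


def resolve_bound(values: Sequence[float], bound: float | str | None) -> float | None:
    """Turn a bound such as 2.5, 'q0.1' or None into a concrete value."""
    if bound is None:
        return None  # no explicit bound
    if isinstance(bound, str):
        if not bound.startswith("q"):
            raise ValueError(f"expected a string like 'q0.1', got {bound!r}")
        return quantile(values, float(bound[1:]))
    return float(bound)


# NaN values are assumed to be removed before calling these helpers.
values = list(range(101))
xmin = resolve_bound(values, "q0.1")  # close to 10.0 for this data
xmax = resolve_bound(values, "q0.9")  # close to 90.0 for this data
print(xmin, xmax)
```

Using quantile bounds rather than fixed numbers keeps a histogram readable when a column has a few extreme outliers, because the plotted range follows the data.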

flamme.section.ColumnDiscreteSection

Bases: BaseSection

Implement a section that analyzes a discrete distribution of values.

Parameters:

Name Type Description Default
counter Counter

The counter that represents the discrete distribution.

required
null_values int

The number of null values.

0
column str

The column name.

'N/A'
max_rows int

The maximum number of rows to show in the table.

20
yscale str

The y-axis scale. If 'auto', the 'linear' or 'log' scale is chosen based on the distribution.

'auto'
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> from collections import Counter
>>> from flamme.section import ColumnDiscreteSection
>>> section = ColumnDiscreteSection(counter=Counter({"a": 4, "b": 2, "c": 6}), column="col")
>>> section
ColumnDiscreteSection(
  (null_values): 0
  (column): col
  (yscale): auto
  (max_rows): 20
  (figsize): None
)
>>> section.get_statistics()
{'most_common': [('c', 6), ('a', 4), ('b', 2)], 'nunique': 3, 'total': 12}
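
The statistics above can be derived directly from the `Counter`. The following sketch reproduces them with the standard library. `discrete_statistics` is a hypothetical helper, and whether flamme applies `max_rows` to the statistics or only to the rendered table is not stated here, so the sketch keeps the two separate as an assumption.

```python
from collections import Counter


def discrete_statistics(counter):
    """Summarize a discrete distribution stored in a Counter."""
    return {
        # Values sorted from the most to the least frequent.
        "most_common": counter.most_common(),
        "nunique": len(counter),
        "total": sum(counter.values()),
    }


counter = Counter({"a": 4, "b": 2, "c": 6})
stats = discrete_statistics(counter)
print(stats)

# Assumption: a table limited to max_rows rows would show only the top entries.
max_rows = 2
table_rows = counter.most_common(max_rows)
print(table_rows)
```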

flamme.section.ColumnDiscreteSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.ColumnTemporalContinuousSection

Bases: BaseSection

Implement a section that analyzes the temporal distribution of a column with continuous values.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
column str

The column to analyze.

required
dt_column str

The datetime column used to analyze the temporal distribution.

required
period str

The temporal period e.g. monthly or daily.

required
yscale str

The y-axis scale. If 'auto', the 'linear' or 'log'/'symlog' scale is chosen based on the distribution.

'auto'
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import numpy as np
>>> import pandas as pd
>>> from flamme.section import ColumnTemporalContinuousSection
>>> section = ColumnTemporalContinuousSection(
...     frame=pd.DataFrame(
...         {
...             "col": np.array([1.2, 4.2, np.nan, 2.2]),
...             "datetime": pd.to_datetime(
...                 ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
...             ),
...         }
...     ),
...     column="col",
...     dt_column="datetime",
...     period="M",
... )
>>> section
ColumnTemporalContinuousSection(
  (column): col
  (dt_column): datetime
  (period): M
  (yscale): auto
  (figsize): None
)
>>> section.get_statistics()
{}

flamme.section.ColumnTemporalContinuousSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.ColumnTemporalDiscreteSection

Bases: BaseSection

Implement a section that analyzes the temporal distribution of a column with discrete values.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
column str

The column of the DataFrame to analyze.

required
dt_column str

The datetime column used to analyze the temporal distribution.

required
period str

The temporal period e.g. monthly or daily.

required
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import ColumnTemporalDiscreteSection
>>> section = ColumnTemporalDiscreteSection(
...     frame=pd.DataFrame(
...         {
...             "col": np.array([1, 42, np.nan, 22]),
...             "col2": ["a", "b", 1, "a"],
...             "datetime": pd.to_datetime(
...                 ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
...             ),
...         }
...     ),
...     column="col",
...     dt_column="datetime",
...     period="M",
... )
>>> section
ColumnTemporalDiscreteSection(
  (column): col
  (dt_column): datetime
  (period): M
  (figsize): None
)
>>> section.get_statistics()
{}

flamme.section.ColumnTemporalDiscreteSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.ColumnTemporalNullValueSection

Bases: BaseSection

Implement a section to analyze the temporal distribution of null values for all columns.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
columns Sequence[str]

The list of columns to analyze. A plot is generated for each column.

required
dt_column str

The datetime column used to analyze the temporal distribution.

required
period str

The temporal period e.g. monthly or daily.

required
ncols int

The number of columns used to lay out the figures.

2
figsize tuple[float, float]

The figure size in inches. The first dimension is the width and the second is the height.

(7, 5)

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import ColumnTemporalNullValueSection
>>> dataframe = pd.DataFrame(
...     {
...         "float": np.array([1.2, 4.2, np.nan, 2.2]),
...         "int": np.array([np.nan, 1, 0, 1]),
...         "str": np.array(["A", "B", None, np.nan]),
...         "datetime": pd.to_datetime(
...             ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
...         ),
...     }
... )
>>> section = ColumnTemporalNullValueSection(
...     frame=dataframe, columns=["float", "int", "str"], dt_column="datetime", period="M"
... )
>>> section
ColumnTemporalNullValueSection(
  (columns): ('float', 'int', 'str')
  (dt_column): datetime
  (period): M
  (ncols): 2
  (figsize): (7, 5)
)
>>> section.get_statistics()
{}
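
`get_statistics()` returns an empty dictionary for this section, so the analysis is only visible in the rendered plots. To make the idea concrete, the sketch below counts null values per monthly window for one column using the standard library. It assumes that `period="M"` denotes monthly windows; `is_null` and `null_counts_by_month` are hypothetical helpers, not flamme functions.

```python
import math
from collections import defaultdict
from datetime import date


def is_null(value):
    """Treat None and NaN as null values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_counts_by_month(values, dates):
    """Return {'YYYY-MM': (num_nulls, num_rows)} for one column."""
    counts = defaultdict(lambda: [0, 0])
    for value, day in zip(values, dates):
        key = day.strftime("%Y-%m")
        counts[key][0] += int(is_null(value))
        counts[key][1] += 1
    return {key: tuple(pair) for key, pair in sorted(counts.items())}


# Same dates and "float" column as the example above.
dates = [date(2020, 1, 3), date(2020, 2, 3), date(2020, 3, 3), date(2020, 4, 3)]
float_column = [1.2, 4.2, math.nan, 2.2]
result = null_counts_by_month(float_column, dates)
print(result)
```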

flamme.section.ColumnTemporalNullValueSection.columns property

columns: tuple[str, ...]

The columns to analyze.

flamme.section.ColumnTemporalNullValueSection.dt_column property

dt_column: str

The datetime column.

flamme.section.ColumnTemporalNullValueSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.ColumnTemporalNullValueSection.frame property

frame: DataFrame

The DataFrame to analyze.

flamme.section.ColumnTemporalNullValueSection.ncols property

ncols: int

The number of columns used to lay out the figures.

flamme.section.ColumnTemporalNullValueSection.period property

period: str

The temporal period used to analyze the data.

flamme.section.ContentSection

Bases: BaseSection

Implement a section that generates the given custom content.

Parameters:

Name Type Description Default
content str

The content to use in the HTML code.

required

Example usage:

>>> from flamme.section import ContentSection
>>> section = ContentSection(content="meow")
>>> section
ContentSection()
>>> section.get_statistics()
{}

flamme.section.DataFrameSummarySection

Bases: BaseSection

Implement a section that returns a summary of a DataFrame.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
top int

The number of most frequent values to show.

5

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import DataFrameSummarySection
>>> section = DataFrameSummarySection(
...     frame=pd.DataFrame(
...         {
...             "col1": np.array([1.2, 4.2, 4.2, 2.2]),
...             "col2": np.array([1, 1, 1, 1]),
...             "col3": np.array([1, 2, 2, 2]),
...         }
...     )
... )
>>> section
DataFrameSummarySection(top=5)
>>> section.get_statistics()
{'columns': ('col1', 'col2', 'col3'), 'null_count': (0, 0, 0), 'nunique': (3, 1, 2), 'column_types': ({}, {}, {})}

flamme.section.DataFrameSummarySection.frame property

frame: DataFrame

The DataFrame to analyze.

flamme.section.DataTypeSection

Bases: BaseSection

Implement a section that analyzes the data type of each column.

Parameters:

Name Type Description Default
dtypes dict[str, DTypeLike]

The data type for each column.

required
types dict[str, set]

The types of the values in each column. A column can contain multiple types. The keys are the column names.

required

Example usage:

>>> import numpy as np
>>> from flamme.section import DataTypeSection
>>> section = DataTypeSection(
...     dtypes={
...         "float": np.dtype("float64"),
...         "int": np.dtype("float64"),
...         "str": np.dtype("O"),
...     },
...     types={"float": {float}, "int": {int}, "str": {str, type(None)}},
... )
>>> section
DataTypeSection(
  (dtypes): {'float': dtype('float64'), 'int': dtype('float64'), 'str': dtype('O')}
  (types): {'float': {<class 'float'>}, 'int': {<class 'int'>}, 'str': {<class 'NoneType'>, <class 'str'>}}
)
>>> section.get_statistics()
{'float': {<class 'float'>}, 'int': {<class 'int'>}, 'str': {<class 'NoneType'>, <class 'str'>}}
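
The `types` argument maps each column name to the set of Python types found among its values. A minimal sketch of how such a mapping could be built from raw column values is shown below. `value_types` is a hypothetical helper and not part of flamme.

```python
def value_types(columns):
    """Map each column name to the set of Python types of its values."""
    return {name: {type(value) for value in values} for name, values in columns.items()}


columns = {
    "float": [1.2, 4.2, 2.2],
    "int": [1, 0, 1],
    "str": ["A", "B", None],
}
types = value_types(columns)
print(types)
```

A set with more than one element, such as the one for the `str` column here, signals a column with mixed value types.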

flamme.section.DuplicatedRowSection

Bases: BaseSection

Implement a section to analyze the number of duplicated rows.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
columns Sequence[str] | None

The columns used to compute the duplicated rows. None means all the columns.

None
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import DuplicatedRowSection
>>> section = DuplicatedRowSection(
...     frame=pd.DataFrame(
...         {
...             "col1": np.array([1.2, 4.2, 4.2, 2.2]),
...             "col2": np.array([1, 1, 1, 1]),
...             "col3": np.array([1, 2, 2, 2]),
...         }
...     )
... )
>>> section
DuplicatedRowSection(
  (columns): None
  (figsize): None
)
>>> section.get_statistics()
{'num_rows': 4, 'num_unique_rows': 3}
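
The two statistics above can be reproduced by counting distinct rows. The sketch below does this with the standard library on the same data, stored as a list of dictionaries. `duplicated_row_statistics` is a hypothetical helper; the `columns` argument follows the documented meaning, where `None` means all the columns.

```python
def duplicated_row_statistics(rows, columns=None):
    """Count the rows and the distinct rows, optionally on a subset of columns."""
    if not rows:
        return {"num_rows": 0, "num_unique_rows": 0}
    keys = list(rows[0]) if columns is None else list(columns)
    # Two rows are duplicates if they agree on every selected column.
    unique_rows = {tuple(row[key] for key in keys) for row in rows}
    return {"num_rows": len(rows), "num_unique_rows": len(unique_rows)}


rows = [
    {"col1": 1.2, "col2": 1, "col3": 1},
    {"col1": 4.2, "col2": 1, "col3": 2},
    {"col1": 4.2, "col2": 1, "col3": 2},
    {"col1": 2.2, "col2": 1, "col3": 2},
]
all_columns = duplicated_row_statistics(rows)
subset = duplicated_row_statistics(rows, columns=["col2", "col3"])
print(all_columns)
print(subset)
```

Restricting the comparison to fewer columns can only keep or reduce the number of unique rows, as the second call shows.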

flamme.section.DuplicatedRowSection.columns property

columns: tuple[str, ...] | None

The columns used to compute the duplicated rows, or None if all the columns are used.

flamme.section.DuplicatedRowSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.DuplicatedRowSection.frame property

frame: DataFrame

The DataFrame to analyze.

flamme.section.EmptySection

Bases: BaseSection

Implement an empty section.

This section is implemented to deal with missing columns or to skip some analyses.

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import EmptySection
>>> section = EmptySection()
>>> section
EmptySection()
>>> section.get_statistics()
{}

flamme.section.MarkdownSection

Bases: BaseSection

Implement a section that converts a markdown string into HTML.

Parameters:

Name Type Description Default
desc str

The markdown string to convert.

required

Example usage:

>>> from flamme.section import MarkdownSection
>>> section = MarkdownSection(desc="meow")
>>> section
MarkdownSection()
>>> section.get_statistics()
{}

flamme.section.MostFrequentValuesSection

Bases: BaseSection

Implement a section that analyzes the most frequent values for a given column.

Parameters:

Name Type Description Default
counter Counter

The counter with the number of occurrences for all values.

required
column str

The column name.

required
top int

The maximum number of values to show.

100

Example usage:

>>> from collections import Counter
>>> from flamme.section import MostFrequentValuesSection
>>> section = MostFrequentValuesSection(
...     counter=Counter({"a": 4, "b": 2, "c": 6}), column="col"
... )
>>> section
MostFrequentValuesSection(
  (counter): Counter({'c': 6, 'a': 4, 'b': 2})
  (column): col
  (top): 100
  (total): 12
)
>>> section.get_statistics()
{'most_common': [('c', 6), ('a', 4), ('b', 2)]}

flamme.section.NullValueSection

Bases: BaseSection

Implement a section that analyzes the number of null values.

Parameters:

Name Type Description Default
columns Sequence[str]

The column names.

required
null_count ndarray

The number of null values for each column.

required
total_count ndarray

The total number of values for each column.

required
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import numpy as np
>>> from flamme.section import NullValueSection
>>> section = NullValueSection(
...     columns=["col1", "col2", "col3"],
...     null_count=np.array([0, 1, 2]),
...     total_count=np.array([5, 5, 5]),
... )
>>> section
NullValueSection(
  (columns): ('col1', 'col2', 'col3')
  (null_count): array([0, 1, 2])
  (total_count): array([5, 5, 5])
  (figsize): None
)
>>> section.get_statistics()
{'columns': ('col1', 'col2', 'col3'), 'null_count': (0, 1, 2), 'total_count': (5, 5, 5)}
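
A common use of these statistics is to turn the raw counts into a fraction of null values per column. The sketch below does that with the standard library. `null_fractions` is a hypothetical helper, and returning NaN for a column without any values is an assumption.

```python
import math


def null_fractions(columns, null_count, total_count):
    """Return the fraction of null values for each column."""
    if not (len(columns) == len(null_count) == len(total_count)):
        raise ValueError("columns, null_count and total_count must have the same length")
    return {
        column: (nulls / total if total > 0 else math.nan)
        for column, nulls, total in zip(columns, null_count, total_count)
    }


# Same values as the statistics returned in the example above.
fractions = null_fractions(("col1", "col2", "col3"), (0, 1, 2), (5, 5, 5))
print(fractions)
```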

flamme.section.NullValueSection.columns property

columns: tuple[str, ...]

The column names.

flamme.section.NullValueSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.NullValueSection.null_count property

null_count: ndarray

The number of null values for each column.

flamme.section.NullValueSection.total_count property

total_count: ndarray

The total number of values for each column.

flamme.section.SectionDict

Bases: BaseSection

Implement a section to manage a dictionary of sections.

Parameters:

Name Type Description Default
sections dict[str, BaseSection]

The dictionary of sections.

required
max_toc_depth int

The maximum level to show in the table of contents. Set this value to 0 to hide the table of contents at the beginning of the section.

0

Example usage:

>>> import pandas as pd
>>> from flamme.section import SectionDict, ContentSection, TemporalRowCountSection
>>> frame = pd.DataFrame(
...     {
...         "datetime": pd.to_datetime(
...             [
...                 "2020-01-03",
...                 "2020-01-04",
...                 "2020-01-05",
...                 "2020-02-03",
...                 "2020-03-03",
...                 "2020-04-03",
...             ]
...         )
...     }
... )
>>> section = SectionDict(
...     {
...         "content": ContentSection("meow"),
...         "rows": TemporalRowCountSection(frame, dt_column="datetime", period="M"),
...     }
... )
>>> section
SectionDict(
  (content): ContentSection()
  (rows): TemporalRowCountSection(dt_column=datetime, period=M, figsize=None)
)
>>> section.get_statistics()
{'content': {}, 'rows': {}}
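
The output above shows that the statistics of a `SectionDict` are keyed by section name, with one entry per child section. The sketch below illustrates that composite pattern with stand-in classes so that it runs without flamme. `StaticSection` and `SectionDictSketch` are hypothetical and only model `get_statistics`.

```python
class StaticSection:
    """Hypothetical leaf section that returns fixed statistics."""

    def __init__(self, statistics=None):
        self._statistics = dict(statistics or {})

    def get_statistics(self):
        return dict(self._statistics)


class SectionDictSketch:
    """Stand-in for a section that manages a dictionary of sections."""

    def __init__(self, sections):
        self._sections = dict(sections)

    def get_statistics(self):
        # Each child reports its own statistics under its name.
        return {name: section.get_statistics() for name, section in self._sections.items()}


flat = SectionDictSketch({"content": StaticSection(), "rows": StaticSection()})
nested = SectionDictSketch({"summary": flat, "dups": StaticSection({"num_rows": 4})})
print(flat.get_statistics())
print(nested.get_statistics())
```

Because the container exposes the same method as its children, containers can be nested to build a report with several levels.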

flamme.section.TableOfContentSection

Bases: BaseSection

Implement a wrapper section that generates a table of contents before the section.

Parameters:

Name Type Description Default
section BaseSection

The section.

required
max_toc_depth int

The maximum level to show in the table of contents.

1

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import TableOfContentSection, DuplicatedRowSection
>>> section = TableOfContentSection(
...     DuplicatedRowSection(
...         frame=pd.DataFrame(
...             {
...                 "col1": np.array([1.2, 4.2, 4.2, 2.2]),
...                 "col2": np.array([1, 1, 1, 1]),
...                 "col3": np.array([1, 2, 2, 2]),
...             }
...         )
...     )
... )
>>> section
TableOfContentSection(
  (section): DuplicatedRowSection(
      (columns): None
      (figsize): None
    )
  (max_toc_depth): 1
)
>>> section.get_statistics()
{'num_rows': 4, 'num_unique_rows': 3}

flamme.section.TemporalNullValueSection

Bases: BaseSection

Implement a section to analyze the temporal distribution of null values for all columns.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
dt_column str

The datetime column used to analyze the temporal distribution.

required
period str

The temporal period e.g. monthly or daily.

required
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import TemporalNullValueSection
>>> section = TemporalNullValueSection(
...     frame=pd.DataFrame(
...         {
...             "col1": np.array([1.2, 4.2, np.nan, 2.2]),
...             "col2": np.array([np.nan, 1, np.nan, 1]),
...             "datetime": pd.to_datetime(
...                 ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
...             ),
...         }
...     ),
...     columns=["col1", "col2"],
...     dt_column="datetime",
...     period="M",
... )
>>> section
TemporalNullValueSection(
  (columns): ('col1', 'col2')
  (dt_column): datetime
  (period): M
  (figsize): None
)
>>> section.get_statistics()
{}

flamme.section.TemporalNullValueSection.columns property

columns: tuple[str, ...]

The columns to analyze.

flamme.section.TemporalNullValueSection.dt_column property

dt_column: str

The datetime column.

flamme.section.TemporalNullValueSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.TemporalNullValueSection.frame property

frame: DataFrame

The DataFrame to analyze.

flamme.section.TemporalNullValueSection.period property

period: str

The temporal period used to analyze the data.

flamme.section.TemporalRowCountSection

Bases: BaseSection

Implement a section to analyze the number of rows per temporal window.

Parameters:

Name Type Description Default
frame DataFrame

The DataFrame to analyze.

required
dt_column str

The datetime column used to analyze the temporal distribution.

required
period str

The temporal period e.g. monthly or daily.

required
figsize tuple[float, float] | None

The figure size in inches. The first dimension is the width and the second is the height.

None

Example usage:

>>> import pandas as pd
>>> from flamme.section import TemporalRowCountSection
>>> section = TemporalRowCountSection(
...     frame=pd.DataFrame(
...         {
...             "datetime": pd.to_datetime(
...                 [
...                     "2020-01-03",
...                     "2020-01-04",
...                     "2020-01-05",
...                     "2020-02-03",
...                     "2020-03-03",
...                     "2020-04-03",
...                 ]
...             )
...         }
...     ),
...     dt_column="datetime",
...     period="M",
... )
>>> section
TemporalRowCountSection(dt_column=datetime, period=M, figsize=None)
>>> section.get_statistics()
{}
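
Since `get_statistics()` returns an empty dictionary here, the row counts only appear in the rendered section. The sketch below shows the underlying idea with the standard library: group the datetime values by window and count the rows in each. It assumes that `period="M"` denotes monthly windows, and `row_count_by_month` is a hypothetical helper.

```python
from collections import Counter
from datetime import date


def row_count_by_month(dates):
    """Return the number of rows per 'YYYY-MM' window, in chronological order."""
    counts = Counter(day.strftime("%Y-%m") for day in dates)
    return dict(sorted(counts.items()))


# Same dates as the example above.
dates = [
    date(2020, 1, 3),
    date(2020, 1, 4),
    date(2020, 1, 5),
    date(2020, 2, 3),
    date(2020, 3, 3),
    date(2020, 4, 3),
]
counts = row_count_by_month(dates)
print(counts)
```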

flamme.section.TemporalRowCountSection.dt_column property

dt_column: str

The datetime column.

flamme.section.TemporalRowCountSection.figsize property

figsize: tuple[float, float] | None

The individual figure size in inches.

The first dimension is the width and the second is the height.

flamme.section.TemporalRowCountSection.frame property

frame: DataFrame

The DataFrame to analyze.

flamme.section.TemporalRowCountSection.period property

period: str

The temporal period used to analyze the data.