section
flamme.section ¶
Contain the sections used to build a report.
flamme.section.BaseSection ¶
Bases: ABC
Define the base class to manage sections.
flamme.section.BaseSection.get_statistics
abstractmethod
¶
get_statistics() -> dict
Return the statistics associated with the section.
Returns:
Type | Description |
---|---|
`dict` | The statistics. |
flamme.section.BaseSection.render_html_body
abstractmethod
¶
render_html_body(
number: str = "",
tags: Sequence[str] = (),
depth: int = 0,
) -> str
Return the HTML body associated with the section.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
number | `str` | The section number. | `''` |
tags | `Sequence[str]` | The tags associated with the section. | `()` |
depth | `int` | The depth in the report. | `0` |
Returns:
Type | Description |
---|---|
`str` | The HTML body associated with the section. |
flamme.section.BaseSection.render_html_toc
abstractmethod
¶
render_html_toc(
number: str = "",
tags: Sequence[str] = (),
depth: int = 0,
max_depth: int = 1,
) -> str
Return the HTML table of contents (TOC) associated with the section.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
number | `str` | The section number associated with the section. | `''` |
tags | `Sequence[str]` | The tags associated with the section. | `()` |
depth | `int` | The depth in the report. | `0` |
max_depth | `int` | The maximum depth to generate in the TOC. | `1` |
Returns:
Type | Description |
---|---|
`str` | The HTML table of contents associated with the section. |
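To illustrate the interface above, here is a minimal, self-contained sketch of a concrete section. `BaseSectionSketch` is a hypothetical stand-in that only mirrors the three documented abstract method signatures, and `HelloSection`, its HTML layout, and its handling of `depth` and `max_depth` are illustrative assumptions rather than flamme's behavior.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Sequence


class BaseSectionSketch(ABC):
    """Hypothetical stand-in mirroring the documented BaseSection interface."""

    @abstractmethod
    def get_statistics(self) -> dict: ...

    @abstractmethod
    def render_html_body(
        self, number: str = "", tags: Sequence[str] = (), depth: int = 0
    ) -> str: ...

    @abstractmethod
    def render_html_toc(
        self,
        number: str = "",
        tags: Sequence[str] = (),
        depth: int = 0,
        max_depth: int = 1,
    ) -> str: ...


class HelloSection(BaseSectionSketch):
    """Illustrative section; the HTML produced here is an assumption."""

    def get_statistics(self) -> dict:
        # This section computes nothing, so it reports no statistics.
        return {}

    def render_html_body(self, number="", tags=(), depth=0) -> str:
        # Assumption: deeper sections get smaller headings (h1 to h6).
        level = min(depth + 1, 6)
        title = tags[-1] if tags else "hello"
        return f"<h{level}>{number} {title}</h{level}><p>Hello!</p>"

    def render_html_toc(self, number="", tags=(), depth=0, max_depth=1) -> str:
        # Assumption: sections at or beyond max_depth are left out of the TOC.
        if depth >= max_depth:
            return ""
        title = tags[-1] if tags else "hello"
        return f"<li>{number} {title}</li>"


section = HelloSection()
body = section.render_html_body(number="1.", tags=("intro",), depth=1)
toc = section.render_html_toc(number="1.", tags=("intro",), depth=0)
```

A real section would subclass `flamme.section.BaseSection` instead of the stand-in and return its own statistics and HTML.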
flamme.section.ColumnContinuousAdvancedSection ¶
Bases: BaseSection
Implement a section that analyzes a continuous distribution of values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
series | `Series` | The series/column to analyze. | required |
column | `str` | The column name. | required |
nbins | `int \| None` | The number of bins in the histogram. | `None` |
yscale | `str` | The y-axis scale. If | `'auto'` |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import numpy as np
>>> import pandas as pd
>>> from flamme.section import ColumnContinuousAdvancedSection
>>> section = ColumnContinuousAdvancedSection(
... series=pd.Series([np.nan, *list(range(101)), np.nan]), column="col"
... )
>>> section
ColumnContinuousAdvancedSection(
(column): col
(nbins): None
(yscale): auto
(figsize): None
)
>>> section.get_statistics()
{'count': 103, 'num_nulls': 2, 'nunique': 102, 'mean': 50.0, 'std': 29.30...,
'skewness': 0.0, 'kurtosis': -1.200235294117647, 'min': 0.0,
'q001': 0.1, 'q01': 1.0, 'q05': 5.0, 'q10': 10.0, 'q25': 25.0, 'median': 50.0,
'q75': 75.0, 'q90': 90.0, 'q95': 95.0, 'q99': 99.0, 'q999': 99.9, 'max': 100.0,
'>0': 100, '<0': 0, '=0': 1, 'num_non_nulls': 101}
flamme.section.ColumnContinuousAdvancedSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.ColumnContinuousSection ¶
Bases: BaseSection
Implement a section that analyzes a continuous distribution of values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
series | `Series` | The series/column to analyze. | required |
column | `str` | The column name. | required |
nbins | `int \| None` | The number of bins in the histogram. | `None` |
yscale | `str` | The y-axis scale. If | `'auto'` |
xmin | `float \| str \| None` | The minimum value of the range or its associated quantile. | `None` |
xmax | `float \| str \| None` | The maximum value of the range or its associated quantile. | `None` |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import numpy as np
>>> import pandas as pd
>>> from flamme.section import ColumnContinuousSection
>>> section = ColumnContinuousSection(
... series=pd.Series([np.nan, *list(range(101)), np.nan]), column="col"
... )
>>> section
ColumnContinuousSection(
(column): col
(nbins): None
(yscale): auto
(xmin): None
(xmax): None
(figsize): None
)
>>> section.get_statistics()
{'count': 103, 'num_nulls': 2, 'nunique': 102, 'mean': 50.0, 'std': 29.30...,
'skewness': 0.0, 'kurtosis': -1.200235294117647, 'min': 0.0,
'q001': 0.1, 'q01': 1.0, 'q05': 5.0, 'q10': 10.0, 'q25': 25.0, 'median': 50.0,
'q75': 75.0, 'q90': 90.0, 'q95': 95.0, 'q99': 99.0, 'q999': 99.9, 'max': 100.0,
'>0': 100, '<0': 0, '=0': 1, 'num_non_nulls': 101}
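Several entries of the statistics dictionary above can be cross-checked with the standard library alone. The sketch below is an independent check, not flamme's implementation: the helper name `basic_statistics` is hypothetical, NaN is assumed to mark a null value, and only a subset of the keys is recomputed.

```python
import math
import statistics


def basic_statistics(values):
    """Recompute a subset of the documented statistics (hypothetical helper)."""
    # Assumption: NaN marks a null value, as in the example series above.
    non_nulls = [float(v) for v in values if not math.isnan(v)]
    return {
        "count": len(values),
        "num_nulls": len(values) - len(non_nulls),
        "num_non_nulls": len(non_nulls),
        "mean": statistics.mean(non_nulls),
        "std": statistics.stdev(non_nulls),  # sample standard deviation
        "median": statistics.median(non_nulls),
        "min": min(non_nulls),
        "max": max(non_nulls),
        ">0": sum(v > 0 for v in non_nulls),
        "<0": sum(v < 0 for v in non_nulls),
        "=0": sum(v == 0 for v in non_nulls),
    }


# Same data as the example: two NaN values around the integers 0..100.
stats = basic_statistics([math.nan, *range(101), math.nan])
```

For this input the recomputed values agree with the example output, and the `std` entry matches the sample (not population) standard deviation.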
flamme.section.ColumnContinuousSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.ColumnDiscreteSection ¶
Bases: BaseSection
Implement a section that analyzes a discrete distribution of values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counter | `Counter` | The counter that represents the discrete distribution. | required |
null_values | `int` | The number of null values. | `0` |
column | `str` | The column name. | `'N/A'` |
max_rows | `int` | The maximum number of rows to show in the table. | `20` |
yscale | `str` | The y-axis scale. If | `'auto'` |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> from collections import Counter
>>> from flamme.section import ColumnDiscreteSection
>>> section = ColumnDiscreteSection(counter=Counter({"a": 4, "b": 2, "c": 6}), column="col")
>>> section
ColumnDiscreteSection(
(null_values): 0
(column): col
(yscale): auto
(max_rows): 20
(figsize): None
)
>>> section.get_statistics()
{'most_common': [('c', 6), ('a', 4), ('b', 2)], 'nunique': 3, 'total': 12}
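The three statistics in the output above can be derived directly from a `collections.Counter`. The following sketch shows one way to do so; the helper name `discrete_statistics` is hypothetical and this is not necessarily how flamme computes them.

```python
from collections import Counter


def discrete_statistics(counter: Counter) -> dict:
    """Derive the documented keys from a Counter (hypothetical helper)."""
    return {
        # Values sorted from most to least frequent.
        "most_common": counter.most_common(),
        # Number of distinct values.
        "nunique": len(counter),
        # Total number of observations.
        "total": sum(counter.values()),
    }


stats = discrete_statistics(Counter({"a": 4, "b": 2, "c": 6}))
print(stats)
# {'most_common': [('c', 6), ('a', 4), ('b', 2)], 'nunique': 3, 'total': 12}
```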
flamme.section.ColumnDiscreteSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.ColumnTemporalContinuousSection ¶
Bases: BaseSection
Implement a section that analyzes the temporal distribution of a column with continuous values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
column | `str` | The column to analyze. | required |
dt_column | `str` | The datetime column used to analyze the temporal distribution. | required |
period | `str` | The temporal period e.g. monthly or daily. | required |
yscale | `str` | The y-axis scale. If | `'auto'` |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import numpy as np
>>> import pandas as pd
>>> from flamme.section import ColumnTemporalContinuousSection
>>> section = ColumnTemporalContinuousSection(
... frame=pd.DataFrame(
... {
... "col": np.array([1.2, 4.2, np.nan, 2.2]),
... "datetime": pd.to_datetime(
... ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
... ),
... }
... ),
... column="col",
... dt_column="datetime",
... period="M",
... )
>>> section
ColumnTemporalContinuousSection(
(column): col
(dt_column): datetime
(period): M
(yscale): auto
(figsize): None
)
>>> section.get_statistics()
{}
flamme.section.ColumnTemporalContinuousSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.ColumnTemporalDiscreteSection ¶
Bases: BaseSection
Implement a section that analyzes the temporal distribution of a column with discrete values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
column | `str` | The column of the DataFrame to analyze. | required |
dt_column | `str` | The datetime column used to analyze the temporal distribution. | required |
period | `str` | The temporal period e.g. monthly or daily. | required |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import ColumnTemporalDiscreteSection
>>> section = ColumnTemporalDiscreteSection(
... frame=pd.DataFrame(
... {
... "col": np.array([1, 42, np.nan, 22]),
... "col2": ["a", "b", 1, "a"],
... "datetime": pd.to_datetime(
... ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
... ),
... }
... ),
... column="col",
... dt_column="datetime",
... period="M",
... )
>>> section
ColumnTemporalDiscreteSection(
(column): col
(dt_column): datetime
(period): M
(figsize): None
)
>>> section.get_statistics()
{}
flamme.section.ColumnTemporalDiscreteSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.ColumnTemporalNullValueSection ¶
Bases: BaseSection
Implement a section to analyze the temporal distribution of null values for all columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
columns | `Sequence[str]` | The list of columns to analyze. A plot is generated for each column. | required |
dt_column | `str` | The datetime column used to analyze the temporal distribution. | required |
period | `str` | The temporal period e.g. monthly or daily. | required |
ncols | `int` | The number of columns. | `2` |
figsize | `tuple[float, float]` | The figure size in inches. The first dimension is the width and the second is the height. | `(7, 5)` |
Example usage:
>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import ColumnTemporalNullValueSection
>>> dataframe = pd.DataFrame(
... {
... "float": np.array([1.2, 4.2, np.nan, 2.2]),
... "int": np.array([np.nan, 1, 0, 1]),
... "str": np.array(["A", "B", None, np.nan]),
... "datetime": pd.to_datetime(
... ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
... ),
... }
... )
>>> section = ColumnTemporalNullValueSection(
... frame=dataframe, columns=["float", "int", "str"], dt_column="datetime", period="M"
... )
>>> section
ColumnTemporalNullValueSection(
(columns): ('float', 'int', 'str')
(dt_column): datetime
(period): M
(ncols): 2
(figsize): (7, 5)
)
>>> section.get_statistics()
{}
flamme.section.ColumnTemporalNullValueSection.columns
property
¶
columns: tuple[str, ...]
The columns to analyze.
flamme.section.ColumnTemporalNullValueSection.dt_column
property
¶
dt_column: str
The datetime column.
flamme.section.ColumnTemporalNullValueSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.ColumnTemporalNullValueSection.frame
property
¶
frame: DataFrame
The DataFrame to analyze.
flamme.section.ColumnTemporalNullValueSection.ncols
property
¶
ncols: int
The number of columns to show the figures.
flamme.section.ColumnTemporalNullValueSection.period
property
¶
period: str
The temporal period used to analyze the data.
flamme.section.ContentSection ¶
Bases: BaseSection
Implement a section that generates the given custom content.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
content | `str` | The content to use in the HTML code. | required |
Example usage:
>>> from flamme.section import ContentSection
>>> section = ContentSection(content="meow")
>>> section
ContentSection()
>>> section.get_statistics()
{}
flamme.section.DataFrameSummarySection ¶
Bases: BaseSection
Implement a section that returns a summary of a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
top | `int` | The number of most frequent values to show. | `5` |
Example usage:
>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import DataFrameSummarySection
>>> section = DataFrameSummarySection(
...     frame=pd.DataFrame(
...         {
...             "col1": np.array([1.2, 4.2, 4.2, 2.2]),
...             "col2": np.array([1, 1, 1, 1]),
...             "col3": np.array([1, 2, 2, 2]),
...         }
...     )
... )
>>> section
DataFrameSummarySection(top=5)
>>> section.get_statistics()
{'columns': ('col1', 'col2', 'col3'), 'null_count': (0, 0, 0), 'nunique': (3, 1, 2),
 'column_types': ({ }, { }, { })}
flamme.section.DataTypeSection ¶
Bases: BaseSection
Implement a section that analyzes the data type of each column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtypes | `dict[str, DTypeLike]` | The data type for each column. | required |
types | `dict[str, set]` | The types of the values in each column. A column can contain multiple types. The keys are the column names. | required |
Example usage:
>>> import numpy as np
>>> from flamme.section import DataTypeSection
>>> section = DataTypeSection(
... dtypes={
... "float": np.dtype("float64"),
... "int": np.dtype("float64"),
... "str": np.dtype("O"),
... },
... types={"float": {float}, "int": {int}, "str": {str, type(None)}},
... )
>>> section
DataTypeSection(
(dtypes): {'float': dtype('float64'), 'int': dtype('float64'), 'str': dtype('O')}
(types): {'float': {<class 'float'>}, 'int': {<class 'int'>}, 'str': {<class 'NoneType'>, <class 'str'>}}
)
>>> section.get_statistics()
{'float': {<class 'float'>}, 'int': {<class 'int'>}, 'str': {<class 'NoneType'>, <class 'str'>}}
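The `types` argument maps each column name to the set of Python types found among its values. As a rough sketch, such a mapping could be built from plain Python lists as shown below; the helper name `value_types` is hypothetical and not part of flamme.

```python
from __future__ import annotations


def value_types(columns: dict[str, list]) -> dict[str, set]:
    """Collect the set of value types per column (hypothetical helper)."""
    return {name: {type(value) for value in values} for name, values in columns.items()}


types = value_types(
    {
        "float": [1.2, 4.2],
        "int": [1, 0],
        "str": ["A", None],  # a column can mix several types
    }
)
```

Here the `str` column yields two types, `str` and `NoneType`, which is the situation the `types` parameter is meant to expose.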
flamme.section.DuplicatedRowSection ¶
Bases: BaseSection
Implement a section to analyze the number of duplicated rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
columns | `Sequence[str] \| None` | The columns used to compute the duplicated rows. | `None` |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import DuplicatedRowSection
>>> section = DuplicatedRowSection(
... frame=pd.DataFrame(
... {
... "col1": np.array([1.2, 4.2, 4.2, 2.2]),
... "col2": np.array([1, 1, 1, 1]),
... "col3": np.array([1, 2, 2, 2]),
... }
... )
... )
>>> section
DuplicatedRowSection(
(columns): None
(figsize): None
)
>>> section.get_statistics()
{'num_rows': 4, 'num_unique_rows': 3}
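The two statistics above can be reproduced in plain Python by treating each row as a tuple. In this sketch the helper `duplicate_statistics` and its `subset` argument are hypothetical; `subset` is assumed to play the role of the `columns` parameter by restricting the comparison to the listed columns.

```python
from __future__ import annotations


def duplicate_statistics(columns: dict[str, list], subset: list[str] | None = None) -> dict:
    """Count rows and distinct rows (hypothetical helper, plain Python)."""
    names = list(columns) if subset is None else list(subset)
    # Build one tuple per row, restricted to the selected columns.
    rows = list(zip(*(columns[name] for name in names)))
    return {"num_rows": len(rows), "num_unique_rows": len(set(rows))}


data = {
    "col1": [1.2, 4.2, 4.2, 2.2],
    "col2": [1, 1, 1, 1],
    "col3": [1, 2, 2, 2],
}
all_columns = duplicate_statistics(data)
two_columns = duplicate_statistics(data, subset=["col2", "col3"])
```

With all columns the result matches the example output (4 rows, 3 unique); comparing only `col2` and `col3` leaves 2 unique rows.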
flamme.section.DuplicatedRowSection.columns
property
¶
columns: tuple[str, ...] | None
The columns used to compute the duplicated rows (a tuple or None).
flamme.section.DuplicatedRowSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.EmptySection ¶
Bases: BaseSection
Implement an empty section.
This section is implemented to deal with missing columns or to skip some analyses.
Example usage:
>>> from flamme.section import EmptySection
>>> section = EmptySection()
>>> section
EmptySection()
>>> section.get_statistics()
{}
flamme.section.MarkdownSection ¶
Bases: BaseSection
Implement a section that converts a markdown string into HTML.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
desc | `str` | The markdown string to convert. | required |
Example usage:
>>> from flamme.section import MarkdownSection
>>> section = MarkdownSection(desc="meow")
>>> section
MarkdownSection()
>>> section.get_statistics()
{}
flamme.section.MostFrequentValuesSection ¶
Bases: BaseSection
Implement a section that analyzes the most frequent values for a given column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
counter | `Counter` | The counter with the number of occurrences for all values. | required |
column | `str` | The column name. | required |
top | `int` | The maximum number of values to show. | `100` |
Example usage:
>>> from collections import Counter
>>> from flamme.section import MostFrequentValuesSection
>>> section = MostFrequentValuesSection(
... counter=Counter({"a": 4, "b": 2, "c": 6}), column="col"
... )
>>> section
MostFrequentValuesSection(
(counter): Counter({'c': 6, 'a': 4, 'b': 2})
(column): col
(top): 100
(total): 12
)
>>> section.get_statistics()
{'most_common': [('c', 6), ('a', 4), ('b', 2)]}
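As an illustration of how a `top` limit and the `total` shown in the representation above could be combined, the sketch below keeps the `top` most frequent values and adds each value's share of the total. The helper `most_frequent` and the share column are assumptions made for illustration, not flamme's documented output.

```python
from __future__ import annotations

from collections import Counter


def most_frequent(counter: Counter, top: int = 100) -> list[tuple]:
    """Return (value, count, share of total) for the `top` most frequent values."""
    total = sum(counter.values())
    # Counter.most_common(top) keeps only the `top` largest counts.
    return [(value, count, count / total) for value, count in counter.most_common(top)]


rows = most_frequent(Counter({"a": 4, "b": 2, "c": 6}), top=2)
```

With `top=2`, only `c` (6 of 12 occurrences) and `a` (4 of 12) are kept.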
flamme.section.NullValueSection ¶
Bases: BaseSection
Implement a section that analyzes the number of null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | `Sequence[str]` | The column names. | required |
null_count | `ndarray` | The number of null values for each column. | required |
total_count | `ndarray` | The total number of values for each column. | required |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import numpy as np
>>> from flamme.section import NullValueSection
>>> section = NullValueSection(
... columns=["col1", "col2", "col3"],
... null_count=np.array([0, 1, 2]),
... total_count=np.array([5, 5, 5]),
... )
>>> section
NullValueSection(
(columns): ('col1', 'col2', 'col3')
(null_count): array([0, 1, 2])
(total_count): array([5, 5, 5])
(figsize): None
)
>>> section.get_statistics()
{'columns': ('col1', 'col2', 'col3'), 'null_count': (0, 1, 2), 'total_count': (5, 5, 5)}
flamme.section.NullValueSection.columns
property
¶
columns: tuple[str, ...]
The column names.
flamme.section.NullValueSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.NullValueSection.null_count
property
¶
null_count: ndarray
The number of null values for each column.
flamme.section.NullValueSection.total_count
property
¶
total_count: ndarray
The total number of values for each column.
flamme.section.SectionDict ¶
Bases: BaseSection
Implement a section to manage a dictionary of sections.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sections | `dict[str, BaseSection]` | The dictionary of sections. | required |
max_toc_depth | `int` | The maximum level to show in the table of contents. Set this value to | `0` |
Example usage:
>>> import pandas as pd
>>> from flamme.section import SectionDict, ContentSection, TemporalRowCountSection
>>> frame = pd.DataFrame(
... {
... "datetime": pd.to_datetime(
... [
... "2020-01-03",
... "2020-01-04",
... "2020-01-05",
... "2020-02-03",
... "2020-03-03",
... "2020-04-03",
... ]
... )
... }
... )
>>> section = SectionDict(
... {
... "content": ContentSection("meow"),
... "rows": TemporalRowCountSection(frame, dt_column="datetime", period="M"),
... }
... )
>>> section
SectionDict(
(content): ContentSection()
(rows): TemporalRowCountSection(dt_column=datetime, period=M, figsize=None)
)
>>> section.get_statistics()
{'content': {}, 'rows': {}}
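The output above suggests that the statistics of a `SectionDict` are keyed by section name, with each value being the statistics of the corresponding child. A minimal sketch of that composite pattern follows; `LeafSketch` and `SectionDictSketch` are hypothetical stand-ins, not flamme classes.

```python
class LeafSketch:
    """Hypothetical child section that returns fixed statistics."""

    def __init__(self, stats: dict) -> None:
        self._stats = stats

    def get_statistics(self) -> dict:
        return self._stats


class SectionDictSketch:
    """Hypothetical container that delegates to its named children."""

    def __init__(self, sections: dict) -> None:
        self._sections = sections

    def get_statistics(self) -> dict:
        # One entry per child, keyed by the child's name.
        return {name: section.get_statistics() for name, section in self._sections.items()}


section = SectionDictSketch(
    {
        "content": LeafSketch({}),
        "rows": LeafSketch({}),
        # Containers can be nested, which yields nested statistics.
        "nested": SectionDictSketch({"dup": LeafSketch({"num_rows": 4})}),
    }
)
stats = section.get_statistics()
```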
flamme.section.TableOfContentSection ¶
Bases: BaseSection
Implement a wrapper section that generates a table of contents before the wrapped section.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
section | `BaseSection` | The section. | required |
max_toc_depth | `int` | The maximum level to show in the table of contents. | `1` |
Example usage:
>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import TableOfContentSection, DuplicatedRowSection
>>> section = TableOfContentSection(
... DuplicatedRowSection(
... frame=pd.DataFrame(
... {
... "col1": np.array([1.2, 4.2, 4.2, 2.2]),
... "col2": np.array([1, 1, 1, 1]),
... "col3": np.array([1, 2, 2, 2]),
... }
... )
... )
... )
>>> section
TableOfContentSection(
(section): DuplicatedRowSection(
(columns): None
(figsize): None
)
(max_toc_depth): 1
)
>>> section.get_statistics()
{'num_rows': 4, 'num_unique_rows': 3}
flamme.section.TemporalNullValueSection ¶
Bases: BaseSection
Implement a section to analyze the temporal distribution of null values for all columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
dt_column | `str` | The datetime column used to analyze the temporal distribution. | required |
period | `str` | The temporal period e.g. monthly or daily. | required |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import pandas as pd
>>> import numpy as np
>>> from flamme.section import TemporalNullValueSection
>>> section = TemporalNullValueSection(
... frame=pd.DataFrame(
... {
... "col1": np.array([1.2, 4.2, np.nan, 2.2]),
... "col2": np.array([np.nan, 1, np.nan, 1]),
... "datetime": pd.to_datetime(
... ["2020-01-03", "2020-02-03", "2020-03-03", "2020-04-03"]
... ),
... }
... ),
... columns=["col1", "col2"],
... dt_column="datetime",
... period="M",
... )
>>> section
TemporalNullValueSection(
(columns): ('col1', 'col2')
(dt_column): datetime
(period): M
(figsize): None
)
>>> section.get_statistics()
{}
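Since `get_statistics()` returns an empty dictionary here, the quantity being analyzed is not visible in the example. The sketch below shows, in plain Python, what a monthly breakdown of null values for one column could look like. The helper names are hypothetical, and treating `None` and NaN as null and grouping by calendar month are assumptions about what `period="M"` means.

```python
import math
from collections import defaultdict
from datetime import date


def is_null(value) -> bool:
    """Assumption: None and NaN both count as null values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def monthly_null_counts(values, dates) -> dict:
    """Return {month: (null count, total count)} (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])
    for value, day in zip(values, dates):
        month = day.strftime("%Y-%m")
        counts[month][0] += int(is_null(value))
        counts[month][1] += 1
    return {month: tuple(pair) for month, pair in sorted(counts.items())}


dates = [date(2020, 1, 3), date(2020, 2, 3), date(2020, 3, 3), date(2020, 4, 3)]
col1 = monthly_null_counts([1.2, 4.2, math.nan, 2.2], dates)
col2 = monthly_null_counts([math.nan, 1, math.nan, 1], dates)
```

For the example data, `col1` has its single null value in March, while `col2` has one in January and one in March.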
flamme.section.TemporalNullValueSection.columns
property
¶
columns: tuple[str, ...]
The columns to analyze.
flamme.section.TemporalNullValueSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.TemporalNullValueSection.period
property
¶
period: str
The temporal period used to analyze the data.
flamme.section.TemporalRowCountSection ¶
Bases: BaseSection
Implement a section to analyze the number of rows per temporal window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
frame | `DataFrame` | The DataFrame to analyze. | required |
dt_column | `str` | The datetime column used to analyze the temporal distribution. | required |
period | `str` | The temporal period e.g. monthly or daily. | required |
figsize | `tuple[float, float] \| None` | The figure size in inches. The first dimension is the width and the second is the height. | `None` |
Example usage:
>>> import pandas as pd
>>> from flamme.section import TemporalRowCountSection
>>> section = TemporalRowCountSection(
... frame=pd.DataFrame(
... {
... "datetime": pd.to_datetime(
... [
... "2020-01-03",
... "2020-01-04",
... "2020-01-05",
... "2020-02-03",
... "2020-03-03",
... "2020-04-03",
... ]
... )
... }
... ),
... dt_column="datetime",
... period="M",
... )
>>> section
TemporalRowCountSection(dt_column=datetime, period=M, figsize=None)
>>> section.get_statistics()
{}
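The number of rows per temporal window for the example data can be sketched with the standard library. This is an illustration only: it assumes that `period="M"` groups rows by calendar month and does not reflect how flamme computes or plots the counts.

```python
from collections import Counter
from datetime import date

dates = [
    date(2020, 1, 3),
    date(2020, 1, 4),
    date(2020, 1, 5),
    date(2020, 2, 3),
    date(2020, 3, 3),
    date(2020, 4, 3),
]
# Group the rows by calendar month and count them.
rows_per_month = Counter(day.strftime("%Y-%m") for day in dates)
print(dict(rows_per_month))
# {'2020-01': 3, '2020-02': 1, '2020-03': 1, '2020-04': 1}
```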
flamme.section.TemporalRowCountSection.figsize
property
¶
figsize: tuple[float, float] | None
The individual figure size in inches.
The first dimension is the width and the second is the height.
flamme.section.TemporalRowCountSection.period
property
¶
period: str
The temporal period used to analyze the data.