Available settings
A set of options is available to customize the behaviour of pandas-profiling and the appearance of the generated report. The options are fine-grained enough to tailor the report closely to the specific dataset being analysed. The available settings are listed below; to learn how to change them, see Changing settings.
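Several of the settings below are nested. The variable-summary example further down, for instance, passes `vars={"num": {...}, "cat": {...}}`. The following is a minimal sketch, assuming that user overrides are merged recursively into the defaults so that keys you do not mention keep their default values. `DEFAULTS` (a tiny hypothetical subset of the real settings) and `deep_merge` are illustrative names, not pandas-profiling internals.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical, heavily reduced defaults; the real settings object has many more keys.
DEFAULTS: Dict[str, Any] = {
    "title": "Pandas Profiling Report",
    "vars": {
        "num": {"low_categorical_threshold": 5},
        "cat": {"n_obs": 5},
    },
}


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` applied recursively.

    Nested dictionaries are merged key by key, so sibling settings that are
    not mentioned in `overrides` keep their defaults; any other value simply
    replaces the default.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


settings = deep_merge(DEFAULTS, {"vars": {"num": {"low_categorical_threshold": 0}}})
print(settings["vars"]["num"]["low_categorical_threshold"])  # 0
print(settings["vars"]["cat"]["n_obs"])  # 5 (untouched default)
print(DEFAULTS["vars"]["num"]["low_categorical_threshold"])  # 5 (defaults not mutated)
```

Copying before merging keeps the defaults reusable across several reports.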
General settings
Global report settings:
| Parameter | Type | Default | Description |
|---|---|---|---|
| | string | Pandas Profiling Report | Title for the report, shown in the header and title bar. |
| | integer | 0 | Number of workers in the thread pool. If set to 0, the number of available CPUs is used. |
| | boolean | | If |
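The thread-pool row (integer, default 0) states that zero falls back to the number of available CPUs. Here is a minimal sketch of that rule using the standard library's `ThreadPoolExecutor`. `resolve_worker_count` is a hypothetical helper. The sketch shows the documented rule only and makes no claim about how pandas-profiling implements it.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_worker_count(configured: int) -> int:
    """Translate the configured pool size into an actual number of workers.

    0 means "one worker per available CPU". os.cpu_count() may return None
    on some platforms, so fall back to a single worker in that case.
    """
    if configured < 0:
        raise ValueError("the pool size must be zero or positive")
    if configured == 0:
        return os.cpu_count() or 1
    return configured


def square(x: int) -> int:
    return x * x


# Zero in the configuration: size the pool to the machine.
with ThreadPoolExecutor(max_workers=resolve_worker_count(0)) as pool:
    results = list(pool.map(square, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```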
Variable summary settings
Settings related to the information displayed for each variable.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | None, asc or desc | None | Sort the variables in ascending (asc) or descending (desc) order; None keeps the original order. |
| | dict | {} | Descriptions to display alongside the descriptive statistics of each variable ({'var_name': 'Description'}). |
| | list[float] | [0.05, 0.25, 0.5, 0.75, 0.95] | The quantiles to calculate. Note that .25, .5 and .75 are required to compute other metrics (median and IQR). |
| | integer | 20 | Warn if the skewness is above this threshold. |
| | integer | 5 | If the number of distinct values is smaller than this number, the series is considered categorical. Set to 0 to disable. |
| | float | 0.999 | Set to 0 to disable the chi-squared calculation. |
| | boolean | | Check the string length and aggregate values (min, max, mean, median). |
| | boolean | | Check the distribution of characters and their Unicode properties. Often informative, but may be computationally expensive. |
| | boolean | | Check the distribution of words. Often informative, but may be computationally expensive. |
| | integer | 50 | Warn if the number of distinct values is above this threshold. |
| | float | 0.5 | Warn if the imbalance score is above this threshold. |
| | integer | 5 | Display this number of observations. |
| | float | 0.999 | Set to 0 to disable the chi-squared calculation for categorical variables. |
| | integer | 3 | Display this number of observations for boolean variables. |
| | float | 0.5 | Warn if the imbalance score of a boolean variable is above this threshold. |
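The quantiles row says that .25, .5 and .75 must stay in the list because the median and the IQR are derived from them. The sketch below computes quantiles by linear interpolation between order statistics. This is the default `linear` method of `pandas.Series.quantile`, used here as an assumption about how the report computes them. It then derives the median and IQR from the three required entries. `compute_quantiles` and `derived_metrics` are hypothetical helpers.

```python
from typing import Dict, List, Sequence


def compute_quantiles(values: Sequence[float], qs: List[float]) -> Dict[float, float]:
    """Quantiles by linear interpolation between the sorted values."""
    data = sorted(values)
    if not data:
        raise ValueError("cannot compute quantiles of an empty sequence")
    result = {}
    for q in qs:
        pos = (len(data) - 1) * q  # fractional index into the sorted data
        lower = int(pos)
        upper = min(lower + 1, len(data) - 1)
        result[q] = data[lower] + (data[upper] - data[lower]) * (pos - lower)
    return result


def derived_metrics(quantiles: Dict[float, float]) -> Dict[str, float]:
    """Median and IQR, which need the .25, .5 and .75 quantiles."""
    missing = [q for q in (0.25, 0.5, 0.75) if q not in quantiles]
    if missing:
        raise KeyError(f"required quantiles missing: {missing}")
    return {"median": quantiles[0.5], "iqr": quantiles[0.75] - quantiles[0.25]}


qs = compute_quantiles([1, 2, 3, 4, 5, 6, 7, 8, 9], [0.05, 0.25, 0.5, 0.75, 0.95])
print(derived_metrics(qs))  # {'median': 5.0, 'iqr': 4.0}
```

Dropping 0.5 from the list would make `derived_metrics` raise a `KeyError`, which mirrors the note in the table.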
```python
import pandas_profiling  # noqa: F401  (registers the DataFrame.profile_report() accessor)

# df is assumed to be an existing pandas DataFrame
profile = df.profile_report(
    sort="ascending",
    vars={
        "num": {"low_categorical_threshold": 0},
        "cat": {
            "length": True,
            "characters": False,
            "words": False,
            "n_obs": 5,
        },
    },
)
profile.config.variables.descriptions = {
    "files": "Files in the filesystem",
    "datec": "Creation date",
    "datem": "Modification date",
}
profile.to_file("report.html")
```
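The variable table includes two imbalance-score thresholds (default 0.5), but it does not define the score. One common definition is used below as an assumption, not taken from the pandas-profiling source: one minus the Shannon entropy of the value frequencies, normalised by its maximum log(k) for k distinct values. Under this definition, a perfectly balanced column scores 0 and a column dominated by one value approaches 1. `imbalance_score` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def imbalance_score(values: Iterable[Hashable]) -> float:
    """Return 1 - normalised Shannon entropy of the value frequencies.

    0.0 means every distinct value is equally frequent; values close to 1.0
    mean a single value dominates the column.
    """
    counts = Counter(values)
    k = len(counts)
    if k <= 1:
        # With a single distinct value the normaliser log(k) is zero;
        # this sketch reports 0.0 rather than dividing by zero.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - entropy / math.log(k))


THRESHOLD = 0.5  # default imbalance threshold from the table above

balanced = ["a", "b"] * 5
skewed = ["a"] * 9 + ["b"]
for name, column in [("balanced", balanced), ("skewed", skewed)]:
    score = imbalance_score(column)
    status = "warn" if score > THRESHOLD else "ok"
    print(f"{name}: {score:.3f} ({status})")
# balanced: 0.000 (ok)
# skewed: 0.531 (warn)
```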
Missing data overview plots
Settings related to the missing data section and the visualizations it can include.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | boolean | | Display a bar chart with the count of missing values in each column. |
| | boolean | | Display a matrix of missing values. Similar to the bar chart, but may also give an overview of how missing values co-occur within rows. |
| | boolean | | Display a heatmap of missing values that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another). |
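The bar chart and the matrix are two views of the same per-cell missingness indicator. The bar chart sums the indicator per column, and the matrix shows it row by row. The sketch below uses only the standard library. It assumes that `None` marks a missing value and that a table is a dict of equally long column lists; both are simplifications for illustration. The helpers are hypothetical, and the values are made up, although they reuse the column names from the variable-summary example.

```python
from typing import Dict, List, Optional

Table = Dict[str, List[Optional[object]]]


def missing_counts(table: Table) -> Dict[str, int]:
    """Number of missing cells per column: the data behind the bar chart."""
    return {col: sum(v is None for v in values) for col, values in table.items()}


def missing_matrix(table: Table) -> List[str]:
    """One text row per record; '#' = present, '.' = missing (matrix view)."""
    columns = list(table)
    n_rows = len(next(iter(table.values()), []))
    return [
        "".join("." if table[col][i] is None else "#" for col in columns)
        for i in range(n_rows)
    ]


data: Table = {
    "files": ["a.txt", "b.txt", None, "d.txt"],
    "datec": [1, None, None, 4],
    "datem": [1, 2, None, 4],
}
print(missing_counts(data))  # {'files': 1, 'datec': 2, 'datem': 1}
for row in missing_matrix(data):
    print(row)
# ###
# #.#
# ...
# ###
```

The third row, missing in every column, is the kind of co-occurrence the matrix makes visible and the bar chart cannot.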
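```python
import pandas_profiling  # noqa: F401  (registers the DataFrame.profile_report() accessor)

# df is assumed to be an existing pandas DataFrame; only the heatmap is switched off
profile = df.profile_report(
    missing_diagrams={
        "heatmap": False,
    }
)
profile.to_file("report.html")
```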
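The heatmap described in the missing-data table measures nullity correlation. The sketch below computes it as the Pearson correlation between the 0/1 missingness indicators of two columns. The missingno library's heatmap works this way, but whether pandas-profiling uses exactly this computation is an assumption. `nullity_correlation` is a hypothetical helper, and `None` is assumed to mark a missing value.

```python
import math
from typing import Optional, Sequence


def nullity_correlation(
    a: Sequence[Optional[object]], b: Sequence[Optional[object]]
) -> Optional[float]:
    """Pearson correlation between the missingness indicators of two columns.

    Returns None if either indicator is constant (a column that is never or
    always missing), because the correlation is undefined in that case.
    """
    if len(a) != len(b) or not a:
        raise ValueError("columns must be non-empty and of equal length")
    x = [1.0 if v is None else 0.0 for v in a]  # 1.0 = missing
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0.0 or sy == 0.0:
        return None
    return cov / (sx * sy)


files = ["a.txt", None, None, "d.txt"]
datec = [1, None, None, 4]  # missing exactly when "files" is missing
datem = [None, 2, 3, 4]     # missing in a different row
print(nullity_correlation(files, datec))            # 1.0
print(round(nullity_correlation(files, datem), 3))  # -0.577
```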
Correlations
Settings regarding correlation metrics and thresholds.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | boolean | | Whether to calculate this coefficient. |
| | boolean | | Show a warning for correlations higher than the threshold. |
| | float | 0.9 | Warning threshold. |
| | boolean | | Whether to calculate this coefficient. |
| | boolean | | Show a warning for correlations higher than the threshold. |
| | float | 0.9 | Warning threshold. |
| | boolean | | Whether to calculate this coefficient. |
| | boolean | | Show a warning for correlations higher than the threshold. |
| | float | 0.9 | Warning threshold. |
| | boolean | | Whether to calculate this coefficient. |
| | boolean | | Show a warning for correlations higher than the threshold. |
| | float | 0.9 | Warning threshold. |
| | boolean | | Whether to calculate this coefficient. |
| | boolean | | Show a warning for correlations higher than the threshold. |
| | float | 0.9 | Warning threshold. |
| | boolean | | Whether to calculate this coefficient. |
| | boolean | | Show a warning for correlations higher than the threshold. |
| | float | 0.9 | Warning threshold. |
For instance, to disable all correlation computations (which can be useful for large datasets):
```python
import pandas_profiling  # noqa: F401  (registers the DataFrame.profile_report() accessor)

# df is assumed to be an existing pandas DataFrame
profile = df.profile_report(
    title="Report without correlations",
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

# or using a shorthand that is available for correlations
profile = df.profile_report(
    title="Report without correlations",
    correlations=None,
)
```
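Each coefficient in the correlation table has a warning threshold, with a default of 0.9. The sketch below implements two of the coefficients named in the example, Pearson and Spearman, where Spearman is Pearson's r computed on average ranks. It then flags column pairs above the threshold. Kendall, phi_k and Cramér's V are not covered. Comparing the absolute value, so that strong negative correlations are also flagged, is an assumption of this sketch. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Method = Callable[[Sequence[float], Sequence[float]], float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; assumes equal-length, non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def high_correlations(
    columns: Dict[str, Sequence[float]], method: Method, threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Pairs whose absolute correlation exceeds the warning threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = method(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


data = {
    "x": [1, 2, 3, 4, 5],
    "x_cubed": [1, 8, 27, 64, 125],
    "noise": [3, 1, 4, 1, 5],
}
print(high_correlations(data, spearman))  # [('x', 'x_cubed', 1.0)]
print(high_correlations(data, pearson))   # [('x', 'x_cubed', 0.943)]
```

Because `x_cubed` is a monotone but non-linear function of `x`, Spearman reports a perfect 1.0 while Pearson reports about 0.943. Both values exceed the default threshold here.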
Interactions
Settings related to the interactions section.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | boolean | | Generate a 2D scatter plot (or hexagonal binned plot) for all pairs of continuous variables. |
| | list | [] | When a list of variable names is given, only interactions between these variables and all other variables are computed. |
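The interactions table implies a simple pair-selection rule. With the default empty list, every pair of continuous variables is plotted. With a list of names, only pairs that involve at least one listed variable are plotted. The sketch below encodes that rule. It emits each unordered pair once and skips self-pairs; both are simplifying assumptions, since the table does not say how pandas-profiling orders or deduplicates pairs. `interaction_pairs` is a hypothetical helper and the column names are made up.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def interaction_pairs(
    continuous: Sequence[str], targets: Sequence[str] = ()
) -> List[Tuple[str, str]]:
    """Variable pairs to plot, following the targets rule described above."""
    if not targets:
        # Empty list: every pair of continuous variables.
        return list(combinations(continuous, 2))
    unknown = set(targets) - set(continuous)
    if unknown:
        raise ValueError(f"unknown target variables: {sorted(unknown)}")
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for target in targets:
        for other in continuous:
            key = frozenset((target, other))
            # Skip self-pairs and pairs already produced for an earlier target.
            if other != target and key not in seen:
                seen.add(key)
                pairs.append((target, other))
    return pairs


cols = ["age", "income", "height", "weight"]
print(len(interaction_pairs(cols)))  # 6
print(interaction_pairs(cols, ["income"]))
# [('income', 'age'), ('income', 'height'), ('income', 'weight')]
```

For n continuous variables, the empty-list case yields n(n-1)/2 plots, which is why restricting the targets can save time on wide datasets.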
Report’s appearance
Settings related to the appearance and style of the report.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | boolean | | If |
| | boolean | | If |
| | boolean | | If |
| | boolean | | Whether to include a navigation bar in the report. |
| | string | | Select a Bootswatch theme. Available options: flatly (dark) and united (orange). |
| | string | | A base64-encoded logo to display in the navigation bar. |
| | string | #337ab7 | The primary color to use in the report. |
| | boolean | | By default, the width of the report is fixed. If set to |
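The logo setting expects a base64-encoded image. Here is a sketch using the standard library's `base64` module. The table does not say whether the value should be the bare base64 string or a full `data:` URI, so the helper returns both. The `encode_logo` name and the `image/png` default are assumptions.

```python
import base64
from pathlib import Path
from typing import Tuple, Union


def encode_logo(image: Union[bytes, Path], mime_type: str = "image/png") -> Tuple[str, str]:
    """Return (bare base64 string, data URI) for an image file or raw bytes."""
    raw = image.read_bytes() if isinstance(image, Path) else image
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded, f"data:{mime_type};base64,{encoded}"


# The first four bytes of any PNG file stand in for a real logo here.
bare, data_uri = encode_logo(b"\x89PNG")
print(bare)      # iVBORw==
print(data_uri)  # data:image/png;base64,iVBORw==
```

For a real logo, pass a path such as `Path("logo.png")` (a hypothetical filename) instead of raw bytes.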