Available settings

pandas-profiling exposes a set of options that customize its behaviour and the appearance of the generated report. These options are granular enough to tailor the analysis to the specific dataset being profiled. The available settings are listed below. To learn how to change them, check Changing settings.

General settings

Global report settings:

| Parameter | Type | Default | Description |
|---|---|---|---|
| title | string | Pandas Profiling Report | Title for the report, shown in the header and title bar. |
| pool_size | integer | 0 | Number of workers in thread pool. When set to zero, it is set to the number of CPUs available. |
| progress_bar | boolean | True | If True, pandas-profiling will display a progress bar. |
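The general settings can be collected in a plain dictionary and passed as keyword arguments, following the pattern of the configuration examples elsewhere in this document. The snippet below is a minimal sketch: the title and worker count are illustrative values, and `df` is assumed to be an existing pandas DataFrame.

```python
# Illustrative values only; the keys are the general settings listed above.
general_settings = {
    "title": "Sales data profile",  # shown in the header and title bar
    "pool_size": 4,                 # 0 would use the number of available CPUs
    "progress_bar": False,          # e.g. for non-interactive batch runs
}

# Assuming `df` is a pandas DataFrame and pandas-profiling has been imported:
# profile = df.profile_report(**general_settings)
# profile.to_file("report.html")
```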

Variable summary settings

Settings related to the information displayed for each variable.

| Parameter | Type | Default | Description |
|---|---|---|---|
| sort | None, asc or desc | None | Sort the variables asc(ending), desc(ending) or None (leaves original sorting). |
| variables.descriptions | dict | {} | Display a description alongside the descriptive statistics of each variable ({"var_name": "Description"}). |
| vars.num.quantiles | list[float] | [0.05, 0.25, 0.5, 0.75, 0.95] | The quantiles to calculate. Note that .25, .5 and .75 are required for the computation of other metrics (median and IQR). |
| vars.num.skewness_threshold | integer | 20 | Warn if the skewness is above this threshold. |
| vars.num.low_categorical_threshold | integer | 5 | If the number of distinct values is smaller than this number, then the series is considered to be categorical. Set to 0 to disable. |
| vars.num.chi_squared_threshold | float | 0.999 | Set to 0 to disable chi-squared calculation. |
| vars.cat.length | boolean | True | Check the string length and aggregate values (min, max, mean, median). |
| vars.cat.characters | boolean | False | Check the distribution of characters and their Unicode properties. Often informative, but may be computationally expensive. |
| vars.cat.words | boolean | False | Check the distribution of words. Often informative, but may be computationally expensive. |
| vars.cat.cardinality_threshold | integer | 50 | Warn if the number of distinct values is above this threshold. |
| vars.cat.n_obs | integer | 5 | Display this number of observations. |
| vars.cat.chi_squared_threshold | float | 0.999 | Same as vars.num.chi_squared_threshold, but for categorical variables. |
| vars.bool.n_obs | integer | 3 | Same as vars.cat.n_obs, but for boolean variables. |

Configuration example
profile = df.profile_report(
    sort="ascending",
    vars={
        "num": {"low_categorical_threshold": 0},
        "cat": {
            "length": True,
            "characters": False,
            "words": False,
            "n_obs": 5,
        },
    },
)

profile.config.variables.descriptions = {
    "files": "Files in the filesystem",
    "datec": "Creation date",
    "datem": "Modification date",
}

profile.to_file("report.html")
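Because .25, .5 and .75 are needed for the median and IQR, a custom quantile list should always contain them. The helper below is a hypothetical convenience, not part of pandas-profiling: it merges the required quantiles into a user-supplied list before the list is passed as `vars.num.quantiles`.

```python
# Quantiles the settings table lists as required for the median and IQR.
REQUIRED_QUANTILES = (0.25, 0.5, 0.75)


def with_required_quantiles(quantiles):
    """Return a sorted quantile list that always includes 0.25, 0.5 and 0.75."""
    for q in quantiles:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"quantile out of range: {q}")
    return sorted(set(quantiles) | set(REQUIRED_QUANTILES))


quantiles = with_required_quantiles([0.01, 0.5, 0.99])
# quantiles == [0.01, 0.25, 0.5, 0.75, 0.99]

variable_settings = {"vars": {"num": {"quantiles": quantiles}}}
# Assuming `df` is a pandas DataFrame:
# profile = df.profile_report(**variable_settings)
```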

Missing data overview plots

Settings related to the missing data section and the visualizations it can include.

| Parameter | Type | Default | Description |
|---|---|---|---|
| missing_diagrams.bar | boolean | True | Display a bar chart with counts of missing values for each column. |
| missing_diagrams.matrix | boolean | True | Display a matrix of missing values. Similar to the bar chart, but it can also give an overview of the co-occurrence of missing values in rows. |
| missing_diagrams.heatmap | boolean | True | Display a heatmap of missing values, which measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another). |

Configuration example: disable heatmap for large datasets
profile = df.profile_report(
    missing_diagrams={
        "heatmap": False,
    }
)
profile.to_file("report.html")
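To make the term nullity correlation concrete, the sketch below computes a Pearson correlation between the missing-value indicators of two columns. It only illustrates the concept and is not the library's implementation. The column names and values are made up, and `None` stands in for a missing value.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is never missing or always missing.
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Made-up columns: phone and email are missing in the same rows,
# while fax is present exactly where phone is missing.
phone = [None, "555-0100", None, "555-0101"]
email = [None, "a@example.com", None, "b@example.com"]
fax = ["555-0199", None, "555-0198", None]

together = nullity_correlation(phone, email)  # 1.0
opposite = nullity_correlation(phone, fax)    # -1.0
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing.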

Correlations

Settings regarding correlation metrics and thresholds.

| Parameter | Type | Default | Description |
|---|---|---|---|
| correlations.pearson.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.pearson.warn_high_correlations | boolean | True | Show warning for correlations higher than the threshold |
| correlations.pearson.threshold | float | 0.9 | Warning threshold |
| correlations.spearman.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.spearman.warn_high_correlations | boolean | False | Show warning for correlations higher than the threshold |
| correlations.spearman.threshold | float | 0.9 | Warning threshold |
| correlations.kendall.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.kendall.warn_high_correlations | boolean | False | Show warning for correlations higher than the threshold |
| correlations.kendall.threshold | float | 0.9 | Warning threshold |
| correlations.phi_k.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.phi_k.warn_high_correlations | boolean | False | Show warning for correlations higher than the threshold |
| correlations.phi_k.threshold | float | 0.9 | Warning threshold |
| correlations.cramers.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.cramers.warn_high_correlations | boolean | True | Show warning for correlations higher than the threshold |
| correlations.cramers.threshold | float | 0.9 | Warning threshold |
| correlations.auto.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.auto.warn_high_correlations | boolean | True | Show warning for correlations higher than the threshold |
| correlations.auto.threshold | float | 0.9 | Warning threshold |

For instance, to disable all correlation computations (may be relevant for large datasets):

profile = df.profile_report(
    title="Report without correlations",
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

# or using a shorthand that is available for correlations
profile = df.profile_report(
    title="Report without correlations",
    correlations=None,
)
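A related case is keeping only one or two coefficients and adjusting their warning threshold. The helper below is a hypothetical convenience, not a pandas-profiling function: it builds the nested `correlations` dictionary from the six coefficient names used above.

```python
# Coefficient names as they appear in the correlations settings.
CORRELATION_NAMES = ["auto", "pearson", "spearman", "kendall", "phi_k", "cramers"]


def only_correlations(enabled, threshold=0.9):
    """Build a `correlations` setting that computes only the `enabled` coefficients."""
    unknown = set(enabled) - set(CORRELATION_NAMES)
    if unknown:
        raise ValueError(f"unknown correlation names: {sorted(unknown)}")
    settings = {}
    for name in CORRELATION_NAMES:
        if name in enabled:
            settings[name] = {
                "calculate": True,
                "warn_high_correlations": True,
                "threshold": threshold,
            }
        else:
            settings[name] = {"calculate": False}
    return settings


correlations = only_correlations({"spearman"}, threshold=0.8)

# Assuming `df` is a pandas DataFrame:
# profile = df.profile_report(title="Spearman only", correlations=correlations)
```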

Interactions

Settings related to the interactions section.

| Parameter | Type | Default | Description |
|---|---|---|---|
| interactions.continuous | boolean | True | Generate a 2D scatter plot (or hexagonal binned plot) for all continuous variable pairs. |
| interactions.targets | list | [] | When a list of variable names is given, only interactions between these and all other variables are computed. |
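Restricting interactions to a few target variables can reduce the number of plots on wide datasets. The sketch below builds the `interactions` setting and checks that each target is an existing column first. The helper and the column names are hypothetical.

```python
def interaction_settings(columns, targets):
    """Build an `interactions` setting limited to `targets`, which must be in `columns`."""
    missing = [t for t in targets if t not in columns]
    if missing:
        raise KeyError(f"targets not found in columns: {missing}")
    return {"continuous": True, "targets": list(targets)}


# Hypothetical column names; with a real DataFrame, pass `df.columns` instead.
columns = ["price", "area", "rooms", "year_built"]
interactions = interaction_settings(columns, ["price"])

# Assuming `df` is a pandas DataFrame:
# profile = df.profile_report(interactions=interactions)
```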

Report’s appearance

Settings related to the appearance and style of the report.

| Parameter | Type | Default | Description |
|---|---|---|---|
| html.minify_html | boolean | True | If True, the output HTML is minified using the htmlmin package. |
| html.use_local_assets | boolean | True | If True, all assets (stylesheets, scripts, images) are stored locally. If False, a CDN is used for some stylesheets and scripts. |
| html.inline | boolean | True | If True, all assets are contained in the report. If False, then a web export is created, where all assets are stored in the ‘[REPORT_NAME]_assets/’ directory. |
| html.navbar_show | boolean | True | Whether to include a navigation bar in the report |
| html.style.theme | string | None | Select a bootswatch theme. Available options: flatly (dark) and united (orange) |
| html.style.logo | string | | A base64 encoded logo, to display in the navigation bar. |
| html.style.primary_color | string | #337ab7 | The primary color to use in the report. |
| html.style.full_width | boolean | False | By default, the width of the report is fixed. If set to True, the full width of the screen is used. |
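The appearance settings nest in the same way as the other groups. The sketch below assembles an `html` dictionary and base64-encodes a logo with the standard library. The logo bytes are a placeholder for the contents of a real image file. Whether `html.style.logo` expects the bare base64 string or a complete data URI is not stated here, so passing the bare string is an assumption.

```python
import base64


def encode_logo(image_bytes):
    """Return the base64 text representation of raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


# Placeholder bytes; in practice, read them from an image file in binary mode.
logo_bytes = b"\x89PNG\r\n\x1a\n"

html_settings = {
    "minify_html": False,   # keep the HTML readable, e.g. for debugging
    "navbar_show": True,
    "style": {
        "theme": "flatly",
        "full_width": True,
        "primary_color": "#337ab7",
        "logo": encode_logo(logo_bytes),  # assumption: bare base64 string
    },
}

# Assuming `df` is a pandas DataFrame:
# profile = df.profile_report(html=html_settings)
```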