Data types

Types that go beyond logical data types such as integers and floats are a powerful abstraction for effective data analysis, because they let the analysis be carried out at a higher level. pandas-profiling is backed by visions, a type system developed specifically for data analysis. Currently, pandas-profiling recognizes the following types:

  • Boolean

  • Numerical

  • Date (and Datetime)

  • Categorical

  • URL

  • Path

  • File

  • Image

Appropriate typesets can both improve the overall expressiveness and reduce the complexity of the analysis and its code. User-customized summarizations and type definitions are fully supported, and PRs adding new data types for specific use cases are more than welcome. For reference, you can check the implementation of pandas-profiling’s default typeset in its source code.
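To illustrate what a semantic type adds on top of a logical type, here is a minimal sketch that assigns one of a few of the listed types to a column of raw strings. It is not how visions or pandas-profiling perform inference: the helper `infer_semantic_type`, the accepted boolean tokens and the order of the checks are hypothetical choices made for this example, using only the Python standard library.

```python
from urllib.parse import urlparse

# Values accepted as booleans in this sketch (hypothetical choice).
BOOLEAN_TOKENS = {"true", "false", "yes", "no"}


def _is_number(text: str) -> bool:
    """True if the string parses as a Python float (this includes 'nan' and 'inf')."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def _is_url(text: str) -> bool:
    """Accept a value as a URL only if it has both a scheme and a network location."""
    parsed = urlparse(text)
    return bool(parsed.scheme and parsed.netloc)


def infer_semantic_type(values: list[str]) -> str:
    """Return a coarse semantic type for a column of raw strings (hypothetical helper)."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        raise ValueError("cannot infer a type for an empty column")
    if all(v.lower() in BOOLEAN_TOKENS for v in cleaned):
        return "Boolean"
    if all(_is_number(v) for v in cleaned):
        return "Numerical"
    if all(_is_url(v) for v in cleaned):
        return "URL"
    # Anything else falls back to Categorical in this sketch.
    return "Categorical"


print(infer_semantic_type(["https://example.com", "http://a.org/x"]))  # URL
```

The same strings would all be plain "object"/string data at the logical level; the semantic type is what tells the analysis that, for example, URL components are worth summarizing.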

Data quality alerts


Alerts section in the NASA Meteorites dataset’s report. Some alerts include numerical indicators.

The Alerts section of the report includes a comprehensive, automatically generated list of potential data quality issues. Although useful, an alert always requires domain validation before it can be confirmed as an actual data quality issue. Some alerts refer to a specific column, others to relationships between columns, and others to the dataset as a whole. The table below lists all possible data quality alerts and their meanings.




Column only contains one value


Column only contains zeros

High Correlation

Correlations (any of Spearman, Cramér’s V, Pearson, Kendall or 𝜙k) are above the warning threshold (configurable).

High Cardinality

Column has more than 50 distinct values. Threshold is configurable.


Column’s univariate distribution is skewed. Threshold value is configurable.

Missing Values

Column has missing values

Infinite Values

Column has infinite values (either np.inf or -np.inf)

Unique Values

All values of the column are unique (count of unique values equals column’s length)


Column (likely/mostly) contains Date or Datetime records


Column follows a uniform distribution (Chi-squared test p-value > 0.999; the threshold is configurable)

Constant length

String, date or datetime column whose entries all have the same length


Variable has mixed types or is constant (thus not suitable for meaningful analysis)


Column can’t be analysed (type is not supported, has mixed types, has lists/dicts/tuples, is empty, wrongly formatted)


Dataset-level warning signaling the presence of more than 10 duplicated records.


Dataset-level warning signaling there’s no data to be analysed.
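Several of the column-level rules in the table reduce to simple counts. The sketch below computes a subset of them for a single column held in a plain Python list of hashable values. It is an illustration under stated assumptions, not pandas-profiling’s implementation: the function `column_alerts` and the alert labels it returns (`single_value`, `all_zeros`, `missing_values`, and so on) are hypothetical, only `None` and float NaN are treated as missing, and Constant length is checked for strings only.

```python
import math
from typing import Any

# Default quoted in the table above; pandas-profiling makes it configurable.
HIGH_CARDINALITY_THRESHOLD = 50


def _is_missing(value: Any) -> bool:
    # None and float NaN both count as missing in this sketch.
    return value is None or (isinstance(value, float) and math.isnan(value))


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def column_alerts(values: list[Any],
                  cardinality_threshold: int = HIGH_CARDINALITY_THRESHOLD) -> set[str]:
    """Return hypothetical alert labels for one column of hashable values."""
    alerts: set[str] = set()
    present = [v for v in values if not _is_missing(v)]
    if len(present) < len(values):
        alerts.add("missing_values")
    if not present:
        return alerts

    distinct = set(present)
    if len(distinct) == 1:
        alerts.add("single_value")
    if all(_is_number(v) for v in present):
        if all(v == 0 for v in present):
            alerts.add("all_zeros")
        if any(math.isinf(v) for v in present):
            alerts.add("infinite_values")
    # Unique: the number of distinct values equals the column's length.
    if len(distinct) == len(values):
        alerts.add("unique_values")
    if len(distinct) > cardinality_threshold:
        alerts.add("high_cardinality")
    # Constant length: every (string) entry has the same length.
    if all(isinstance(v, str) for v in present) and len({len(v) for v in present}) == 1:
        alerts.add("constant_length")
    return alerts


print(sorted(column_alerts(["ab", "cd", "ef"])))  # ['constant_length', 'unique_values']
```

Because missing entries are part of the column’s length, a column with any missing value can never raise the unique-values alert in this sketch, which matches the definition given in the table.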

Information on the default values and the specific parameters/thresholds used in the computation of these alerts, as well as settings to disable specific ones, can be consulted in Available settings.
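As a rough picture of how configurable thresholds and disabled alerts interact, the sketch below models the two dataset-level alerts from the table. The `AlertSettings` class, its field names and the `dataset_alerts` function are hypothetical and are not pandas-profiling’s configuration API; the real parameter names are those listed in Available settings. The sketch also assumes that every repeat of a row beyond its first occurrence counts as one duplicated record.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlertSettings:
    """Hypothetical settings object, not pandas-profiling's configuration API."""
    # Default mirrors the "more than 10 duplicated records" rule in the table.
    duplicates_threshold: int = 10
    # Labels of alerts the user has switched off.
    disabled: frozenset[str] = field(default_factory=frozenset)


def dataset_alerts(rows: list[tuple], settings: AlertSettings = AlertSettings()) -> set[str]:
    """Compute the two dataset-level alerts, then drop any that are disabled."""
    alerts: set[str] = set()
    if not rows:
        alerts.add("empty")
    else:
        counts = Counter(rows)
        # Each repeat of a row beyond its first occurrence counts as one duplicate.
        n_duplicates = sum(count - 1 for count in counts.values())
        if n_duplicates > settings.duplicates_threshold:
            alerts.add("duplicates")
    return alerts - settings.disabled


print(dataset_alerts([(1, "a")] * 12))  # {'duplicates'}
```

Filtering disabled alerts at the end keeps each check independent of the user’s preferences; for example, `AlertSettings(disabled=frozenset({"duplicates"}))` silences the duplicates alert regardless of the count.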