The pandas-profiling project became what it is today thanks to the
work of its creators to make it successful. This page aims to highlight
a bit of the development history. For the full picture, have a look at
In 2016, Jos Polfliet was working for SAS Institute and was getting
bored with doing the same types of exploratory data analysis over and
over again. Automating his own logic, he noticed it was useful and
decided to open-source it under the MIT License. The package was named
pandas-profiling as a contraction of pandas and data profiling.
The idea was to enable the user to perform automated exploratory data
analysis beyond what the df.describe() function was offering, by
abusing Jupyter’s HTML output. Since then, the machine learning
community has been saved human-years of repetitive plotting and
summary statistics.
Since May 2019, principal development has been taken over by Simon Brugman. The startup that he co-founded was an early adopter of the package, and he invested heavily in growing it, drawing on experience from using it in industry. Simon led the package through a huge refactor (99.5% was changed), two major releases, and great collaborations, most notably with Ian Eaves on visions.
Where are we now?
At the time of writing, pandas-profiling is one of the primary tools
for data exploration in Python, with more than 8k GitHub stars, 30 million
downloads and users across industries, including many at FAANG,
banks and insurance companies, startups and universities.
The package has been named one of the Top 20 ML packages by Google.
pandas-profiling will be part of a bigger mission, the Data-Centric AI
movement spearheaded by YData. The tech startup YData will be the
driving force that helps make pandas-profiling the standard
profiling library among data scientists, by introducing new features through a
bigger team and enterprise support.