History

The pandas-profiling project is what it is today thanks to the work of its creators and contributors. This page highlights part of its development history. For the full picture, have a look at the contributor history.

Inception

In 2016, Jos Polfliet was working for SAS Institute and was getting bored with doing the same types of exploratory data analysis over and over again. After automating his own logic, he noticed it was useful and decided to open-source it under the MIT License. The package was named pandas-profiling as a contraction of pandas and data profiling. The idea was to enable the user to perform automated exploratory data analysis, going beyond what the df.describe() function offers, by abusing Jupyter's HTML output. Since then, the package has saved the machine learning community human-years of repetitive plotting and summary statistics.

Second life

Since May 2019, principal development has been led by Simon Brugman. The startup that he co-founded was an early adopter of the package, and he invested heavily in growing it, drawing on experience from using it in industry. Simon led the package through a huge refactor (99.5% of it was changed), two major releases, and several great collaborations, notably with Ian Eaves on visions.

Where are we now?

At the time of writing, pandas-profiling is one of the primary tools for data exploration in Python, with more than 8k GitHub stars, 30 million downloads, and users across many industries, including FAANG companies, banks, insurance companies, startups, and universities. pandas-profiling has been named one of the Top 20 ML packages by Google.

What’s next?

pandas-profiling will be part of a bigger mission: the Data-Centric AI movement spearheaded by YData. YData, a tech startup, will be the driving force in making pandas-profiling the standard profiling library among data scientists, bringing more features, a bigger team, and enterprise support.