Changelog
Changelog v3.6.3
🐛 Bug fixes
Changelog v3.6.2
🐛 Bug fixes
Changelog v3.6.1
🐛 Bug fixes
3.6.0 (2022-12-21)
🐛 Bug fixes
add css to cope with large tables (7f42f87)
adjust categoricals layout (f0bb45a)
categorical data not being obscured in the common values plot (40236bc)
compare report ignoring config parameter (3d60556)
compare report warnings always showing the last alert type (6b3c13d)
comparison fails when duplicates are disabled (#1208) (6d19620)
do not raise exception for percentage formatter (3ea626d)
enforce recomputation of description sets (a9fd1c8)
error comparing only one precomputed profile (00646cd)
html: sensible cloud-platform notebook html rendering (b22ece2)
ignoring config of precomputed reports (6478c40)
only compute auto correlation when no config is specified (d5d4f58)
remove malfunctioning hook (e2593f5)
remove unused test (2170338)
return the proper type for widgets (4c0b358)
set compute default to false (c70e491)
solve mypy error (9c4266e)
solve mypy issue (e3e7788)
uses colors from the specified config (c0c556d)
utils: use ‘urllib.request’ instead of ‘requests’ (#1177) (e4d020b), closes #1168
🎉 Features
Changelog v3.5.1
🐛 Bug fixes
Changelog v3.5.0
🐛 Bug fixes
🎉 Features
Changelog v3.4.0
🐛 Bug fixes
🎉 Features
Changelog v3.3.1
🐛 Bug fixes
Changelog v3.3.0
🎉 Features
Add time-series profiling support: PACF, ACF, Augmented Dickey-Fuller test and Seasonality test
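The autocorrelation function (ACF) listed above measures how strongly a series correlates with lagged copies of itself. As an illustration only, here is a minimal pure-Python sketch of the common sample ACF estimator, r_k = Σ_{t=1}^{n−k} (x_t − x̄)(x_{t+k} − x̄) / Σ_{t=1}^{n} (x_t − x̄)²; the function name `autocorrelation` is hypothetical and this is not how the report computes its ACF plot.

```python
from statistics import fmean
from typing import List, Sequence


def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k for lags k = 0..max_lag (hypothetical helper)."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = fmean(series)
    centered = [x - mean for x in series]
    # Denominator: total sum of squared deviations from the mean.
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Numerator for lag k: products of deviations that are k steps apart.
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# An alternating series is strongly negatively correlated at lag 1.
print(autocorrelation([1, -1, 1, -1], max_lag=1))  # [1.0, -0.75]
```

A value near ±1 at lag k indicates strong (anti-)correlation between observations k steps apart; seasonal patterns typically appear as peaks at multiples of the season length.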
🐛 Bug fixes
👷‍♂️ Internal Improvements
Improve package installation compatibility: Fix package versions
Remove support for Python 3.6
📖 Documentation
Update package footer author
Changelog v3.2.0
🎉 Features
Add stop words to word_summary_vc [#863]
show categorical freq with stacked barh instead of pie
Make pie plot colors customizable
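The stop-word entry above concerns the word frequencies computed for text columns. A minimal sketch of the idea, counting lower-cased words while skipping stop words; the function name `word_counts` and the tiny `STOP_WORDS` set are hypothetical, not the list or code the report uses.

```python
import re
from collections import Counter
from typing import AbstractSet, Iterable

# Hypothetical, deliberately small stop-word set for illustration.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to"})


def word_counts(texts: Iterable[str], stop_words: AbstractSet[str] = STOP_WORDS) -> Counter:
    """Count lower-cased words across all texts, ignoring stop words."""
    counts: Counter = Counter()
    for text in texts:
        for word in re.findall(r"\w+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


print(word_counts(["The cat and the hat", "A cat"]).most_common())
# [('cat', 2), ('hat', 1)]
```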
🐛 Bug fixes
👷‍♂️ Internal Improvements
tryceratops for CI: improved exception handling
⬆️ Dependencies
tangled-up-in-unicode 0.2.0 (unicode 14)
Loosen jupyter-client dependency for Colab (now >=5.3.4, was >=6.0.0)
Changelog v3.1.0
🎉 Features
Fine-grained progress bar
🐛 Bug fixes
Python 3.9 and 3.10 compatibility
Phik correlation order
📖 Documentation
Several link fixes, readme updates
👷‍♂️ Internal Improvements
Matplotlib backend
⬆️ Dependencies
pre-commit
visions to 0.7.4
Changelog v3.0.0
This is the first release to adhere to the SemVer and Conventional Commits specifications.
🎉 Features
The report configuration was completely overhauled, providing a more intuitive API and fixing issues inherent to the previous global config.
🐛 Bug fixes
📖 Documentation
Enforce QA using flake8 for documentation, for instance checking for backticks and enforcing black code style on examples.
Automated configuration documentation API.
👷‍♂️ Internal Improvements
CI: mypy type checking was moved to the pre-commit hooks.
🚨 Breaking changes
The configuration syntax has changed!
The yaml configuration now requires the official syntax (e.g. null instead of None).
The previously used configuration library could not handle comments with indentation; you are now free to use conventional yaml.
For the python configuration, the set_variable method has been replaced by more intuitive direct access to the configuration object.
For example, you can now set the title in the following way: report.config.title = "My title".
The docs provide additional examples.
⬆️ Dependencies
pydantic and PyYaml are dependencies for the new configuration.
confuse and attrs are no longer (explicit) dependencies.
Upgraded tangled-up-in-unicode to 0.0.7.
Changelog v2.13.0
🎉 Features
configurable numeric precision
👷‍♂️ Internal Improvements
string type detection performance optimization
various improvements to software quality (flake8, commitlint)
⬆️ Dependencies
upgrade from visions 0.6.0 to 0.7.1
upgrade from coverage <5 to ~=5.5
Changelog v2.12.0
🎉 Features
🐛 Bug fixes
📖 Documentation
Fix link syntax (contributed by @ChrisCarini)
👷‍♂️ Internal Improvements
Several performance improvements (minimal mode, duplicates, frequency table sorting)
Introduce pytest-benchmark in CI to monitor commit performance impact
Introduce commitlint in CI to start automating the changelog generation
⬆️ Dependencies
The ipywidgets dependency was moved to the [notebook] extra, so most of Jupyter will not be installed alongside this package by default (contributed by @akx)
Replaced the (testing only) fastparquet dependency with pyarrow (default pandas parquet engine, contributed by @kurosch)
Upgrade phik. This drops the hard dependency on numba (contributed by @akx)
Changelog v2.11.0
🎉 Features
Great Expectations integration [430] docs (thanks @spbail, @talagluck and the Great Expectations team).
Introduced the infer_dtypes parameter to control automatic inference of data types [676] (thanks @mohith7548 and @ieaves).
Improved JSON representation for pd.Series, pd.DataFrame, numpy data and Samples.
🚨 Breaking changes
Global config setting removed; config resets on report initialization.
⬆️ Dependencies
Update pyupgrade to 2.10.0.
Changelog v2.10.1
🐛 Bug fixes
📖 Documentation
Update Slack community link on readme [673]
Include recent contributions to the ‘Resources’ page.
Changelog v2.10.0
🎉 Features
Restructured the overview for categorical variables.
Handling of compressed files
Option for random sample
Restructure categorical variable overview
👷‍♂️ Internal Improvements
Full visions integration for type system: read more here.
Migrate from Travis CI to Github Actions…
🚨 Breaking changes
The configuration parameter vars.cat.unicode is replaced by vars.cat.characters.
Changelog v2.9.0
🎉 Features
Description per variable is now possible (see the metadata page or the Census example).
🐛 Bug fixes
Fixed bug for small DataFrames with unused categories.
Fixed bug where parallelization would have side effects.
Removed warning where colormap was modified in place.
Distinguish between unique and distinct correctly.
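The last fix separates two counts that are easy to confuse: distinct values are all the different values in a column, while unique values are those that occur exactly once. A minimal sketch of both definitions, assuming missing values have already been removed; the helper name `distinct_and_unique` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def distinct_and_unique(values: Iterable[Hashable]) -> Tuple[int, int]:
    """Return (distinct, unique) counts for a column of values.

    distinct: number of different values
    unique:   number of values that occur exactly once
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, unique


print(distinct_and_unique(["a", "b", "b", "c", "c", "c"]))  # (3, 1)
```

Only "a" appears once, so the column has three distinct values but a single unique one.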
📖 Documentation
Extend documentation for frequent issues.
Extended documentation for Streamlit and Panel.
Provide visibility to our supporters.
⬆️ Dependencies
Pandas 1.1.0 contains bugs that make it incompatible. Please up- or downgrade.
Upgraded visions to 0.5.0.
Changelog v2.9.0rc1
🎉 Features
Working with sensitive data: Introduced sensitive=True option to mask non-aggregated data (such as samples, duplicates, frequency tables for categorical columns) [#503].
The sample section can be parametrized with a custom sample (for instance mock data).
Introduce shorthands for groups of parameters for styles and explorative mode [#499].
Metadata of a dataset can be added to the report (see documentation).
Numeric columns now report monotonicity information.
A pie chart can be generated for boolean and (low) categorical columns.
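One way to derive the monotonicity information mentioned above is to compare each value with its successor. A minimal sketch; the function name `monotonicity` and its returned labels are hypothetical and need not match the wording used in the report. (pandas exposes comparable non-strict checks as `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`.)

```python
from typing import Sequence


def monotonicity(values: Sequence[float]) -> str:
    """Classify a numeric sequence by comparing each value with its successor."""
    pairs = list(zip(values, values[1:]))
    # Sequences with fewer than two values have no pairs and are
    # reported as strictly increasing here (vacuously true).
    if all(a < b for a, b in pairs):
        return "strictly increasing"
    if all(a <= b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "decreasing"
    return "not monotonic"


for column in ([1, 2, 3], [1, 1, 2], [3, 2, 2], [1, 3, 2]):
    print(column, monotonicity(column))
```

The strict checks run first, so a column with repeated values falls through to the non-strict labels.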
🐛 Bug fixes
👷‍♂️ Internal Improvements
Histograms used to be calculated at view time (single thread) and are now computed in parallel.
Matplotlib’s rcParams are now modified through the contextmanager [#494].
📖 Documentation
🚨 Breaking changes
bayesian_blocks binning has been removed, together with the astropy dependency.
Config files config_dark.yaml, config_united.yaml and config_explorative.yaml have been removed in favour of shorthand for groups of parameters.
⬆️ Dependencies
isort updated to major version 5.
attrs is now required for classes.
Changelog v2.8.0
🎉 Features
Expanded the Unicode analysis capabilities: next to the most occurring unicode scripts, categories and blocks, it’s now possible to inspect the most frequent characters for each of them.
ProfileReport.set_variable now accepts nested parameters such as report.set_variable("variables.descriptions", {"var1": "Identifier"}).
Ability to have descriptions of the variables alongside the descriptive statistics (#232, #402).
Config: Introducing config shorthands.
Config: plot.scatter_threshold sets the value above which scatter plots are replaced with hexbin plots.
Config: html.inline controls whether assets (such as vector images) are embedded in the HTML; disabling it packages the export as a folder and a file (similar to exporting a website) (#452).
It’s now possible to specify which interactions to compute, filtering out unneeded interactions between columns (#451).
When the output_file is omitted in the CLI, it uses the input_file with an HTML extension. This can be useful when profiling a complete directory from the command line, e.g. find . -type f -name "*.csv" -exec pandas_profiling {} \;.
Config: Split vars.cat.check_composition into vars.cat.unicode and vars.cat.length for more control over the summaries.
Config: Included a new configuration sample file config_explorative.yml, including Text (length distribution, unicode information), File (file size, creation time), Image (dimensions, exif information).
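A nested parameter such as "variables.descriptions" can be read as a path into a nested configuration. A minimal sketch of how a dotted key could map onto nested dictionaries; `set_nested` is a hypothetical helper and not the implementation behind ProfileReport.set_variable.

```python
from typing import Any, Dict


def set_nested(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Assign value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Descend one level, creating an empty section if it does not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value


config: Dict[str, Any] = {}
set_nested(config, "variables.descriptions", {"var1": "Identifier"})
print(config)  # {'variables': {'descriptions': {'var1': 'Identifier'}}}
```

Existing sibling keys in a section are left untouched; only the final key on the path is overwritten.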
🐛 Bug fixes
Resolved color ValueError on Mac (#464).
Style: too many interactions overflowed tabs. Now they elegantly turn into a select control.
Unique variables are always uniform and have high cardinality, hence we can remove the redundant labels.
The counts for unicode properties were based on unique characters, instead of following the original frequency distribution.
Slimmed down the HTML by removing classes and using more effective CSS.
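The unicode entries in this release (most frequent characters per property, counts that follow the original frequency distribution) can be illustrated with Python's standard `unicodedata` module, which exposes a character's general category but not its script or block. A minimal sketch that groups characters by category and counts every occurrence rather than each distinct character once; the helper name `chars_by_category` is hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict
from typing import DefaultDict, Dict


def chars_by_category(text: str) -> Dict[str, Counter]:
    """Map each Unicode general category to a Counter of its characters."""
    groups: DefaultDict[str, Counter] = defaultdict(Counter)
    for ch in text:
        # Count every occurrence, so frequent characters weigh more.
        groups[unicodedata.category(ch)][ch] += 1
    return dict(groups)


summary = chars_by_category("aab 12")
print(summary["Ll"].most_common(1))  # [('a', 2)]
```

Counting occurrences instead of distinct characters means "aab" contributes three lowercase letters, not two.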
👷‍♂️ Internal Improvements
CI: Added macOS and Windows to the testing environments (experimental).
CI: Added python3.9-dev to the testing environment (experimental).
CI: Reduced the number of permutations for code formatting and type checking.
📖 Documentation
API documentation is now available.
⚠️ Deprecated
The bayesian_bins parameter will be removed in the next release.
🚨 Breaking changes
Config: vars.cat.check_composition is replaced by vars.cat.unicode and vars.cat.length.
⬆️ Dependencies
Update visions to 0.4.4 for more informative Unicode summaries.
Changelog v2.7.1
⬆️ Dependencies
Fix version of visions due to breaking changes in new summarization functions.
Changelog v2.7.0
🎉 Features
Reports are built in phases, see issue for details (#421)
The most frequently occurring duplicate rows are included in the report.
ProfileReports can now be saved to and loaded from disk (for caching).
Explicit analysis duration is added to the reproduction section of the report.
Doc: this version introduces documentation powered by Sphinx. The previously used pdoc3 was adequate initially, but lacks functionality and extensibility.
Doc: A dedicated page for large datasets was created (#420).
Doc: The installation instructions have been extended, as installation via conda would default to 1.4.1 (#449, #448).
CI: Linting, building the documentation and examples, and uploading the package to PyPI have been automated using git flow and GitHub Actions.
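The most frequently occurring duplicate rows can be found by counting identical rows. A minimal sketch with rows represented as tuples; the helper name `most_frequent_duplicates` is hypothetical and this is not the report's implementation (with pandas, `DataFrame.duplicated` or a group-by over all columns serves a similar purpose).

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def most_frequent_duplicates(rows: Sequence[Row], n: int = 10) -> List[Tuple[Row, int]]:
    """Return up to n rows that occur more than once, most frequent first."""
    counts = Counter(rows)
    duplicates = [(row, count) for row, count in counts.most_common() if count > 1]
    return duplicates[:n]


rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b"), (1, "a")]
print(most_frequent_duplicates(rows))  # [((1, 'a'), 3), ((2, 'b'), 2)]
```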
🐛 Bug fixes
Warnings were not shown in the ‘warnings’ tab, but were at variable level (#389).
The ‘median absolute deviation’ is now reported instead of the ‘mean absolute deviation’ (#453).
Several style-related fixes for Jupyter lab and notebooks (tables, warnings, wide images).
pd.NA, introduced in pandas 1.0, is now supported (#437).
The logic for calculating infinite values is now correct (#397).
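The switch from mean to median absolute deviation matters because the median version is robust to outliers. A minimal sketch of both statistics using the standard library; the function names are hypothetical, and the sketch uses the unscaled MAD (no normal-consistency factor), which may differ from what the report displays.

```python
from statistics import fmean, median
from typing import Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """MAD = median(|x_i - median(x)|)."""
    center = median(values)
    return median([abs(x - center) for x in values])


def mean_absolute_deviation(values: Sequence[float]) -> float:
    """Mean of |x_i - mean(x)|."""
    center = fmean(values)
    return fmean([abs(x - center) for x in values])


data = [1, 2, 3, 4, 100]  # one outlier
print(median_absolute_deviation(data))  # 1
print(mean_absolute_deviation(data))    # 31.2
```

A single extreme value inflates the mean absolute deviation to 31.2, while the median absolute deviation stays at 1.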
👷‍♂️ Internal Improvements
The number of progress bars is reduced. The progress bars are now grouped by build phase (e.g. describing dataset, building report structure, rendering report, exporting to file).
The progress bars provide more information about the current step to the user (#434).
Invalid correlation coefficients no longer cause the complete variable to be dropped; instead, the plot now propagates the NaN (#417).
Performance: type inference tests now short-circuit, as visions does by default.
Performance: the numerical summary is optimized to use numpy directly, instead of slower methods provided by pandas.
Config: dynamic histogram bins are now disabled by default for better computational performance (#441).
Config: the type inference check that warns when date variables are processed as categorical is now disabled by default, as it was a bottleneck for larger datasets.
Warn: the user is warned that to_widgets does not work in Google Colab, which doesn’t support ipywidgets properly (#462).
Cln: Moved ProfileReport out of __init__ to its own class file.
Cln: removed the output_file parameter from examples.
Cln: the HTML representation of the footer and wrapper are moved out of ProfileReport to the report structure.
Cln: the imports are automatically ordered with isort.
⚠️ Deprecated
Doc: the pdoc3 documentation will be removed in the future.
Config: using the config globally is deprecated. In the future, the configuration will be tied to the ProfileReport.
🚨 Breaking changes
Doc: the example HTML reports were removed from the repository (still available in the gh-pages branch and documentation).
The recoded ‘correlation’ was removed for not being informative enough to justify its costs.
⬆️ Dependencies
Requirements now correctly exclude pandas 1.0.0, 1.0.1 and 1.0.2. Either use pandas <1 or >= 1.0.3.
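The version rule above amounts to excluding exactly three releases. A minimal sketch of the check on a (major, minor, patch) tuple; the function name is hypothetical, it covers only this v2.7.0 rule, and as an assumption about notation rather than the project's actual requirements line, the same constraint could be written as pandas!=1.0.0,!=1.0.1,!=1.0.2.

```python
from typing import Tuple


def pandas_version_allowed(version: Tuple[int, int, int]) -> bool:
    """True for pandas < 1 or >= 1.0.3, i.e. excluding 1.0.0, 1.0.1 and 1.0.2."""
    # Tuples compare element by element, so (0, 25, 3) < (1, 0, 0) < (1, 0, 3).
    return version < (1, 0, 0) or version >= (1, 0, 3)


for v in [(0, 25, 3), (1, 0, 1), (1, 0, 3)]:
    print(v, pandas_version_allowed(v))
```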
Prior to v2.7.0
Previously, there was no explicit changelog. However, changes were included in the release description on GitHub, which you can find on this page.