gitmyhub

fg-data-profiling

Python ★ 14k updated 1mo ago

One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.

fg-data-profiling (formerly ydata-profiling and pandas-profiling) generates a thorough HTML or JSON analysis report of any dataset, covering types, missing values, distributions, and correlations, with a single line of Python code.

Python · Pandas · Spark · setup: easy · complexity 2/5

fg-data-profiling is a Python library that produces a detailed analysis report of a dataset with a single line of code. You load a table of data into a DataFrame (the tabular data structure provided by pandas or Spark), create the report with one call, and get back a thorough breakdown of every column, covering data types, missing values, duplicate rows, statistical summaries, and visualizations. The report can be exported as an HTML file you can open in a browser, as JSON for use in automated systems, or as an interactive widget inside a Jupyter Notebook.
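As a rough illustration of that workflow, the sketch below wraps the report call in a small helper. It assumes the ProfileReport class and its to_file method as they exist in the predecessor package, imported as ydata_profiling. The import name used after the rename is not stated here, so the helper takes it as a parameter. The helper name write_profile and its arguments are hypothetical.

```python
import importlib


def write_profile(df, out_path, title="Profiling Report",
                  module_name="ydata_profiling"):
    """Profile `df` and write the report to `out_path`.

    `module_name` is an assumption: it defaults to the import name of the
    predecessor package. Check the README migration guide for the current one.
    """
    # Import lazily so the module name can be swapped without editing this code.
    profiling = importlib.import_module(module_name)

    # The "one line": build the report object from a DataFrame.
    report = profiling.ProfileReport(df, title=title)

    # to_file chooses HTML or JSON from the file extension (.html or .json).
    report.to_file(out_path)
    return report
```

With pandas and the profiling package installed, a call such as write_profile(pd.read_csv("data.csv"), "report.html") would produce the browser report, and a path ending in .json would produce the JSON variant. In a notebook, the predecessor package also offers report.to_widgets() for the interactive view.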

The library handles several types of data automatically. For numeric columns it computes averages, medians, and distributions. For text columns it identifies character patterns and scripts. For date and time columns it detects seasonality and auto-correlation patterns. It also handles file and image columns by reporting file sizes, creation dates, and image dimensions. It automatically flags potential problems in the data, such as columns that are almost entirely empty, values that are heavily skewed to one side, or columns that are nearly identical to each other.
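To make the idea of automatic flags concrete, here is a minimal standard-library sketch of two such checks: a mostly empty column and a heavily skewed one. It is not the library's implementation. The function name column_alerts and both thresholds are illustrative assumptions, not the library's defaults.

```python
from statistics import mean, pstdev


def column_alerts(values, missing_threshold=0.9, skew_threshold=1.0):
    """Return simple warning labels for one column; None marks a missing value."""
    alerts = []
    present = [v for v in values if v is not None]

    # Flag columns where most entries are missing.
    if values and (len(values) - len(present)) / len(values) >= missing_threshold:
        alerts.append("mostly missing")

    # Flag numeric columns whose distribution leans heavily to one side.
    if len(present) >= 3:
        mu, sigma = mean(present), pstdev(present)
        if sigma > 0:
            # Population skewness: the mean of the cubed z-scores.
            skew = mean(((v - mu) / sigma) ** 3 for v in present)
            if abs(skew) >= skew_threshold:
                alerts.append("skewed")
    return alerts
```

For example, a column of four 1s and a single 100 has a skewness of 1.5 and would be flagged as skewed, while the evenly spaced values 1 through 5 would not.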

One common use case is comparing two versions of the same dataset side by side, which the library supports with the same one-line approach. It also scales to large datasets through Spark support, allowing the same profiling workflow on distributed data rather than only on data that fits on a single machine.
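The comparison workflow might look like the sketch below. It assumes the compare method that ProfileReport exposes in the predecessor package (imported as ydata_profiling) is still available after the rename. The helper name write_comparison, its arguments, and the report titles are hypothetical.

```python
import importlib


def write_comparison(df_before, df_after, out_path,
                     module_name="ydata_profiling"):
    """Profile two versions of a dataset and write a side-by-side report.

    `module_name` is an assumption: it defaults to the import name of the
    predecessor package. Check the README migration guide for the current one.
    """
    profiling = importlib.import_module(module_name)

    # Build one report per dataset version.
    before = profiling.ProfileReport(df_before, title="Before")
    after = profiling.ProfileReport(df_after, title="After")

    # compare() returns a new report that shows both versions together.
    comparison = before.compare(after)
    comparison.to_file(out_path)
    return comparison
```

The titles passed to each report are what distinguish the two versions in the combined output, so it helps to pick labels that identify each snapshot.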

The package was previously called ydata-profiling and, before that, pandas-profiling. It was recently renamed to fg-data-profiling under new stewardship by the Data-Centric AI Community. If you have older code that depends on ydata-profiling, the README includes a short migration guide showing how to swap the package name and update import statements. The old package will no longer receive updates.

Where it fits