facets
Visualizations for machine learning datasets
Two browser-based tools for visually exploring machine learning datasets: one shows per-column statistics to catch quality issues, and the other lets you zoom into individual data points across tens of thousands of items.
Facets is a pair of browser-based visualization tools designed to help people explore and understand machine learning datasets without writing custom analysis code. Both tools can run inside Jupyter notebooks or on standalone web pages, and they are built as web components backed by TypeScript.
The first tool, Facets Overview, gives a summary of one or more datasets at the feature level. A feature is any column or attribute in your data, such as age, income, or a category label. Overview computes statistics for each feature and renders them visually so you can quickly spot problems: features with a large number of missing values, unexpected value ranges, or distributions that differ significantly between your training set and your test set. Suspicious features are highlighted in red, and you can sort columns by metrics like the proportion of missing data. A Python package called facets-overview, installable via pip, generates the statistics that the visualization reads.
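To make the idea of per-feature statistics concrete, here is a minimal standard-library sketch of one such metric, the proportion of missing values, with features ranked by it. This is an illustration only and not the facets-overview package's API; the function names `missing_proportions` and `rank_by_missing` and the sample records are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Optional[Any]]


def missing_proportions(records: List[Record]) -> Dict[str, float]:
    """Return, per feature, the fraction of records whose value is absent or None."""
    total = len(records)
    if total == 0:
        return {}
    # Collect every feature name that appears in at least one record.
    features = sorted({name for record in records for name in record})
    return {
        name: sum(1 for record in records if record.get(name) is None) / total
        for name in features
    }


def rank_by_missing(records: List[Record]) -> List[Tuple[str, float]]:
    """Sort features so the ones with the most missing data come first."""
    proportions = missing_proportions(records)
    return sorted(proportions.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: "age" is missing in half of the rows.
train = [
    {"age": 34, "income": 52000, "label": "a"},
    {"age": None, "income": 48000, "label": "b"},
    {"age": 29, "income": None, "label": "a"},
    {"age": None, "income": 61000, "label": "b"},
]

ranking = rank_by_missing(train)
for feature, proportion in ranking:
    print(f"{feature}: {proportion:.0%} missing")
```

Running the same function on a training set and a test set and comparing the results is a simple way to see the kind of mismatch that Overview surfaces visually.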
The second tool, Facets Dive, is for hands-on exploration of individual data points rather than column-level statistics. It can display up to tens of thousands of items at once, each rendered as a small tile. You sort and group items by their feature values, creating a grid that reveals patterns across the dataset. Zooming in shows specific examples; zooming out shows the full distribution. The README describes the experience as switching between a high-level view and low-level details using smooth animation.
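The grouping behind such a grid can be sketched as bucketing records by the values of two features, one for rows and one for columns. The snippet below is a rough standard-library illustration of that idea, not how Facets Dive is implemented; `facet_grid` and the sample items are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Cell = Tuple[Any, Any]


def facet_grid(
    records: List[Record], row_feature: str, column_feature: str
) -> Dict[Cell, List[Record]]:
    """Group records into grid cells keyed by (row value, column value)."""
    grid: Dict[Cell, List[Record]] = defaultdict(list)
    for record in records:
        # Records lacking a feature land in a cell keyed by None.
        key = (record.get(row_feature), record.get(column_feature))
        grid[key].append(record)
    return dict(grid)


# Hypothetical items, each of which would be one tile in the visualization.
items = [
    {"id": 1, "label": "cat", "split": "train"},
    {"id": 2, "label": "cat", "split": "train"},
    {"id": 3, "label": "dog", "split": "train"},
    {"id": 4, "label": "cat", "split": "test"},
]

grid = facet_grid(items, "label", "split")
cell_sizes = {cell: len(members) for cell, members in grid.items()}
print(cell_sizes)
```

Cell sizes correspond to the zoomed-out view of the distribution, while the records inside a cell correspond to the individual examples you see when zooming in.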
Both tools embed into Google Colab or Jupyter notebooks using HTML tags that load the visualization components. The repository includes example notebooks showing how to connect the tools to a dataset. One known limitation noted in the README is that the visualizations currently work only in Chrome. The disclaimer at the bottom notes this is not an official Google product.
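As a rough sketch of what such an embed might look like, the helper below builds an HTML string that loads the components and hands a list of records to a Dive element. Several details are assumptions to check against the repository's example notebooks: the `facets-dive` tag name, its `data` property, and the `height` attribute reflect our reading of the project and are not confirmed by this summary, and `FACETS_IMPORT_URL` is a hypothetical placeholder, not the real location of the components.

```python
import json
from typing import Any, Dict, List

# Hypothetical placeholder: substitute the HTML import location given in the Facets README.
FACETS_IMPORT_URL = "https://example.com/facets-jupyter.html"


def dive_embed_html(
    records: List[Dict[str, Any]],
    import_url: str = FACETS_IMPORT_URL,
    height: int = 600,
) -> str:
    """Build an HTML snippet that loads the components and passes records to Dive."""
    # Escape "</" so a string value cannot close the <script> tag early.
    payload = json.dumps(records).replace("</", "<\\/")
    return (
        f'<link rel="import" href="{import_url}">\n'
        # Assumed element name and attribute; verify against the example notebooks.
        f'<facets-dive id="elem" height="{height}"></facets-dive>\n'
        "<script>\n"
        f"  document.querySelector('#elem').data = {payload};\n"
        "</script>"
    )


html = dive_embed_html([{"a": 1}])
print(html)
```

In a Jupyter or Colab cell, the resulting string would typically be rendered with `IPython.display.HTML`, wrapped in `IPython.display.display`.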
Where it fits
- Spot missing values, unexpected ranges, or distribution mismatches between your training and test datasets before you start model training.
- Explore tens of thousands of image or text samples side-by-side in a grid, grouping them by feature values to find patterns in your data.
- Embed an interactive dataset overview directly inside a Jupyter or Google Colab notebook to share data quality findings with teammates.