embedding-atlas

TypeScript ★ 4.8k updated 8h ago

Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.

Embedding Atlas is an interactive visualization tool from Apple for exploring large collections of data points that have been converted into numerical representations called embeddings. Embeddings are commonly produced by AI models when processing text, images, or other content, and they capture meaning in a form that computers can compare. Embedding Atlas lets you see those points laid out visually on a map-like canvas, with up to a few million points rendered smoothly.

The tool automatically clusters similar points and labels each cluster, so you can see the overall shape of a dataset without inspecting individual items. You can also search for points similar to a query, cross-filter the map through linked charts of metadata columns, and overlay density contours that show where data is concentrated and where it is sparse.
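To make "search for points similar to a query" concrete, here is a minimal sketch of nearest-neighbor search over embeddings using cosine similarity. It is an illustration of the general idea only, not Embedding Atlas's own implementation; the function names and the tiny 2-D vectors are made up for the example, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=3):
    """Return the indices of the k embeddings most similar to the query."""
    scored = [
        (cosine_similarity(query, vec), idx)
        for idx, vec in enumerate(embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Hypothetical toy data: four points in a 2-D embedding space.
points = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]

# The two points that lie closest in direction to the query vector.
top = nearest_neighbors([1.0, 0.05], points, k=2)
print(top)
```

This brute-force scan compares the query against every point, which is fine for a demonstration; tools that handle millions of points generally rely on approximate nearest-neighbor indexes instead.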

Embedding Atlas is available in three forms. The Python command-line tool takes a data file in Parquet format and opens an interactive viewer with a single command. A Python widget lets you embed the same viewer inside a Jupyter notebook, passing a data frame directly. A JavaScript package lets developers integrate the visualization into web applications, with support for React and Svelte.
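The two Python entry points might be used roughly as follows. This is a sketch under assumptions: it presumes the package is installed with `pip install embedding-atlas`, that the command is named `embedding-atlas`, and that the widget class is `EmbeddingAtlasWidget` in `embedding_atlas.widget`. Confirm these against the project's current documentation before relying on them. The helper function names and the file name `data.parquet` are hypothetical.

```python
import subprocess


def atlas_command(parquet_path):
    """Build the command line that opens the viewer on a Parquet file.

    Assumes the installed CLI is called `embedding-atlas` and takes the
    data file as its only required argument.
    """
    return ["embedding-atlas", parquet_path]


def launch_viewer(parquet_path):
    """Run the CLI and wait for it to exit (requires the package installed)."""
    return subprocess.run(atlas_command(parquet_path), check=True)


def show_in_notebook(df):
    """Return a viewer widget for a data frame, for display in Jupyter.

    The import is done lazily so this module loads even when the package
    is missing; the import path is an assumption to verify.
    """
    from embedding_atlas.widget import EmbeddingAtlasWidget

    return EmbeddingAtlasWidget(df)


# Hypothetical file name, shown only to illustrate the command shape.
print(" ".join(atlas_command("data.parquet")))
```

In a notebook, the last expression of a cell would be `show_in_notebook(df)`, so that Jupyter renders the returned widget.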

The tool is backed by a research paper and is aimed at data scientists, machine learning practitioners, and developers who want to inspect or communicate patterns in large AI datasets. Using it requires Python or a modern JavaScript environment. Rendering uses WebGPU where available and falls back to WebGL 2 in browsers that lack it, which keeps the interface fast even with large datasets. The code is open source under the MIT license.