arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
A shared in-memory columnar data format with libraries in 12+ languages, so programs can pass large datasets to each other without serializing or copying.
apache/arrow defines a standard for how data is laid out in memory and moved between programs, with libraries available in over a dozen programming languages. Rather than each data tool inventing its own internal representation, Arrow specifies a single shared in-memory layout, called a columnar format because values are stored column by column rather than row by row. Tools that adopt this layout can hand data to each other without translating it first.
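To make the row-versus-column distinction concrete, here is a minimal sketch using only the Python standard library. It does not use Arrow or reproduce Arrow's actual memory layout; the field names (`id`, `price`) and the sample values are made up for illustration.

```python
from array import array

# Row-oriented: one record per entry, with the fields interleaved.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]

# Column-oriented: one contiguous, typed buffer per field.
# (Hypothetical layout for illustration only, not Arrow's format.)
columns = {
    "id": array("q", (r["id"] for r in rows)),        # signed 64-bit integers
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

# An aggregate over one field reads a single contiguous buffer
# instead of visiting every record.
total_price = sum(columns["price"])
```

Storing each field contiguously is what lets analytic operations scan a single column cheaply, and it gives different tools a simple, predictable buffer to agree on.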
The problem it solves is data exchange overhead. Without a shared standard, passing data between two programs (say, a database and a data analytics library) usually means serializing the data into an intermediate format and deserializing it on the other side, which costs CPU time and memory on every transfer. Because Arrow-compatible programs agree on the layout, they can share buffers directly in memory. This is called zero-copy: the data is neither duplicated nor re-encoded.
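The difference between a serialize-and-deserialize handoff and a zero-copy handoff can be sketched with the standard library alone. This is an in-process analogy built on Python's buffer protocol, not Arrow's own mechanism; the variable names are illustrative.

```python
from array import array

prices = array("d", [9.5, 12.0, 7.25])

# Serialization path: encode to bytes, then decode into a brand-new buffer.
copied = array("d")
copied.frombytes(prices.tobytes())

# Shared-buffer path: a memoryview exposes the same memory without copying it.
shared = memoryview(prices)

# Mutating the original shows which consumer really shares the memory.
prices[0] = 100.0
shares_memory = shared[0] == 100.0  # the view observes the change
copy_is_stale = copied[0] == 9.5    # the decoded copy does not
```

The copied path pays for an extra buffer and an encode/decode pass, while the shared path only hands over a reference to memory that already exists. A common layout is what makes that second kind of handoff possible between independent tools.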
Key components include the Arrow Columnar Format (the in-memory data layout standard), the Arrow IPC format for efficient data transmission between processes, Arrow Flight (a protocol for building high-performance data services over a network), ADBC (Arrow Database Connectivity, an API for connecting to databases in an Arrow-native way), and readers and writers for common file formats including Parquet and CSV.
Libraries are available for C++, Python, R, Java, Go, Rust, JavaScript, Ruby, Julia, Swift, and more. Each language implementation follows the same underlying format, meaning data can move between them without conversion.
You would use Apache Arrow when building data pipelines, analytics tools, or anything where multiple programs need to share large datasets quickly. It is an Apache Software Foundation project.
Where it fits
- Move large dataframes between Python and a database without per-row conversion, avoiding copies where both sides speak Arrow
- Build a data service that streams Arrow records over the network with Flight
- Read and write Parquet files from any supported language
- Connect to databases using ADBC instead of language-specific drivers