gitmyhub

faust

Python ★ 6.8k updated 1y ago

Python Stream Processing

Python library for building real-time stream processors on top of Apache Kafka, inspired by Kafka Streams but written in plain Python. Built by Robinhood to handle billions of daily events, it is now deprecated; the community faust-streaming fork is actively maintained.

Python · Apache Kafka · RocksDB · asyncio · setup: hard · complexity 4/5

Faust is a Python library that lets developers build systems that process continuous streams of data, reading events as they arrive rather than working on batches after the fact. It was built by Robinhood and used internally to handle billions of events per day across distributed systems and real-time data pipelines. The library is now deprecated and no longer maintained by Robinhood; an active community-maintained fork continues at a separate GitHub repository.

The core idea comes from Kafka Streams, a Java-based stream processing tool, but Faust brings that approach to plain Python. You connect it to Apache Kafka, a messaging system that acts as a high-throughput queue, and then write ordinary Python functions that react to each incoming message. Because it uses Python's async features, those functions can also make web requests or run other background work without blocking the stream.
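The per-message pattern described above can be sketched with the standard library alone. This is not Faust code: an asyncio.Queue stands in for a Kafka topic, and the names source, process, and main are hypothetical. In Faust itself, the equivalent handler is an async function decorated with @app.agent that iterates over its stream with async for.

```python
import asyncio


async def source(queue, events):
    """Stand-in for a Kafka topic: publish each event, then a sentinel."""
    for event in events:
        await queue.put(event)
    await queue.put(None)  # sentinel: no more events


async def process(queue, results):
    """React to each message as it arrives."""
    while True:
        event = await queue.get()
        if event is None:
            break
        # Placeholder for a web request or other I/O. Awaiting yields
        # control to other tasks instead of blocking the stream.
        await asyncio.sleep(0)
        results.append(event.upper())


async def main(events):
    queue = asyncio.Queue()
    results = []
    # Producer and consumer run concurrently on one event loop.
    await asyncio.gather(source(queue, events), process(queue, results))
    return results


processed = asyncio.run(main(["buy", "sell", "hold"]))
print(processed)
```

The handler never waits on a batch boundary; each event is handled as soon as it is dequeued, which is the behavior the library provides on top of real Kafka topics.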

Faust includes a built-in distributed key/value store called Tables. These work like Python dictionaries in your code, but the data is stored on disk using RocksDB (a fast embedded database) and replicated to standby nodes in your cluster. If one machine fails, a standby node picks up where it left off automatically. Tables also support time-based windowing, so you can track counts like "clicks in the last hour" and let older windows expire on their own.
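To make the windowing idea concrete, here is a minimal in-memory sketch of a tumbling (fixed-size) window counter with expiry. It is an illustration only and does not use Faust, RocksDB, or replication; the class TumblingWindowCounter and its parameters are hypothetical names chosen for this example.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Per-key counts in fixed-size time windows; stale windows are dropped."""

    def __init__(self, size_seconds, expires_seconds):
        self.size = size_seconds
        self.expires = expires_seconds
        self.windows = defaultdict(int)  # (key, window_start) -> count

    def _window_start(self, timestamp):
        # Align the timestamp to the start of its window.
        return timestamp - (timestamp % self.size)

    def add(self, key, timestamp):
        self.windows[(key, self._window_start(timestamp))] += 1
        self._expire(timestamp)

    def count(self, key, timestamp):
        return self.windows.get((key, self._window_start(timestamp)), 0)

    def _expire(self, now):
        # A window ends at start + size; drop it once `expires` more
        # seconds have passed after that.
        stale = [k for k in self.windows
                 if k[1] + self.size + self.expires <= now]
        for k in stale:
            del self.windows[k]


clicks = TumblingWindowCounter(size_seconds=3600, expires_seconds=3600)
clicks.add("home", 100)
clicks.add("home", 200)
clicks.add("home", 3700)
first_hour = clicks.count("home", 200)
second_hour = clicks.count("home", 3700)
clicks.add("home", 7300)  # late enough that the first window has expired
first_hour_after_expiry = clicks.count("home", 100)
```

Here the first hour holds two clicks and the second holds one; once an event arrives at second 7300, the first window is past its expiry and its count is gone.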

Because it is just Python, Faust works alongside any library you already use: NumPy, Pandas, Django, Flask, or anything else. Models describe how messages are serialized, using Python type annotations to define the shape of expected data. The library is statically typed and works with the mypy type checker, which can catch errors before you run anything.
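The idea of describing a message's shape with type annotations can be sketched using a standard-library dataclass. This is an analogy, not Faust's API: in Faust, models subclass faust.Record, while the Order class and its dumps and loads helpers below are hypothetical and assume JSON as the wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Order:
    # The annotations define the expected fields of each message.
    account_id: str
    amount: float

    def dumps(self) -> bytes:
        """Serialize the message to JSON bytes."""
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def loads(cls, payload: bytes) -> "Order":
        """Rebuild a message from JSON bytes."""
        return cls(**json.loads(payload.decode("utf-8")))


order = Order(account_id="acct-1", amount=25.0)
payload = order.dumps()
restored = Order.loads(payload)
```

Because the fields are annotated, a checker such as mypy can flag a call like Order(account_id=1, amount="x") before the code runs.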

Faust requires Python 3.6 or later. Given the deprecation notice, new projects should consider the community fork rather than this repository.

Where it fits