
creek

CoffeeScript ★ 0 updated 13y ago ⑂ fork

Configurable streaming aggregator

Understanding Creek

Creek is a tool for analyzing streaming data in real time without needing to store everything. Imagine you're watching a fire hose of information (tweets, log entries, sensor readings, anything that arrives continuously) and you want summaries such as "how many unique words have we seen?" or "what are the top 10 trending words right now?" Creek lets you answer those questions on the fly: you feed data through it and query live statistics.

The way it works is straightforward: data flows in through a parser (which understands the input format, whether that's plain text, JSON, or something else), passes through aggregators (small calculation engines that track statistics like counts, trends, or unique values), and is then queried through a simple web interface. You write a config file describing what you want to track, essentially a recipe, and pipe your data stream into Creek on the command line. Within seconds, you have a REST API serving up your analytics.
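The parser-to-aggregator flow described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Creek's code or config format (Creek itself is written in CoffeeScript): the `Count` and `Distinct` classes, the `parse_words` parser, the `run` function, and the statistic names are all hypothetical. A plain dictionary stands in for what the REST interface would serve.

```python
import json


class Count:
    """Hypothetical aggregator: how many records have been seen."""

    def __init__(self):
        self.value = 0

    def push(self, record):
        self.value += 1

    def result(self):
        return self.value


class Distinct:
    """Hypothetical aggregator: how many unique records have been seen."""

    def __init__(self):
        self.seen = set()

    def push(self, record):
        self.seen.add(record)

    def result(self):
        return len(self.seen)


def parse_words(line):
    """Hypothetical plain-text parser: one record per whitespace-separated word."""
    return line.lower().split()


def run(lines, parser, aggregators):
    """Feed every parsed record to every aggregator, then snapshot the results."""
    for line in lines:
        for record in parser(line):
            for aggregator in aggregators.values():
                aggregator.push(record)
    return {name: agg.result() for name, agg in aggregators.items()}


# The dictionary of named aggregators plays the role of the "recipe".
stats = run(
    ["the quick brown fox", "the lazy dog"],
    parse_words,
    {"total-words": Count(), "unique-words": Distinct()},
)
print(json.dumps(stats))  # {"total-words": 7, "unique-words": 6}
```

Each aggregator keeps only its own running state rather than the raw stream, which is what makes this style workable on high-volume input. The exact set used by `Distinct` here still grows with the number of unique values; a production system might bound that differently.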

A real-world example from the README: if you connected Twitter's live stream and configured Creek to track trending words in real time, it would continuously update a list of the top 10 words mentioned over the last 30 minutes, filtering out short words or non-URLs as you specify. You could then visit a local web page to watch those rankings update live. The project was originally built to handle over 35 million messages a day in production, so it is designed for volume.
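To make the trending-words example concrete, here is a minimal sliding-window sketch in Python. It is not how Creek implements its windows; the class name `WindowedTopWords`, its parameters, and the minimum-length filter are assumptions chosen for illustration. It also assumes timestamps (in seconds) arrive in non-decreasing order.

```python
from collections import Counter, deque


class WindowedTopWords:
    """Hypothetical top-N word counter over a sliding time window."""

    def __init__(self, window_seconds=30 * 60, top_n=10, min_length=4):
        self.window_seconds = window_seconds
        self.top_n = top_n
        self.min_length = min_length
        self.events = deque()   # (timestamp, word), oldest first
        self.counts = Counter()  # word -> occurrences inside the window

    def push(self, text, timestamp):
        """Record the words of one message, skipping words that are too short."""
        for word in text.lower().split():
            if len(word) < self.min_length:
                continue
            self.events.append((timestamp, word))
            self.counts[word] += 1
        self._expire(timestamp)

    def _expire(self, now):
        """Drop events that have fallen out of the window ending at `now`."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, word = self.events.popleft()
            self.counts[word] -= 1
            if self.counts[word] == 0:
                del self.counts[word]

    def top(self, now):
        """Return the current top-N (word, count) pairs, highest count first."""
        self._expire(now)
        return self.counts.most_common(self.top_n)


trending = WindowedTopWords(window_seconds=1800, top_n=2, min_length=4)
trending.push("coffee script coffee", timestamp=0)
trending.push("coffee stream", timestamp=600)
trending.push("stream stream data data", timestamp=2000)
print(trending.top(now=2000))  # [('stream', 3), ('data', 2)]
```

By the time the last message arrives, the words from timestamp 0 have aged out of the 30-minute window, so "coffee" drops to a single occurrence and falls out of the top two. Memory use is proportional to the number of words inside the window, not to the whole stream.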

The project is no longer actively maintained (the README points people toward a successor), but it demonstrates a clean approach to the problem: keep the configuration language simple, let non-programmers write rules, and handle the heavy lifting of windowed calculations and aggregation in the background. It's useful for anyone who needs quick operational dashboards, monitoring, or trend detection without building a full data pipeline.