orderly

Java ★ 0 updated 14y ago ⑂ fork

Schema and type system for creating sortable, byte-valued keys

What This Does

Orderly solves a practical problem: how do you store complex data (like a timestamp, username, and score) in a database key in a way that sorts correctly when you read the raw bytes back out? It's a serialization library that converts Java data types into byte arrays that maintain the same sort order as the original data. If you sort the byte arrays lexicographically, comparing them byte by byte, you get the same result as if you'd sorted the original numbers and strings. That property is essential for databases like HBase that organize data by byte-sorted keys.

How It Works

The library takes different data types (integers, floating-point numbers, text, dates, even complex records made of multiple fields) and encodes each one into a sequence of bytes. The encoding is carefully designed so that when those bytes are sorted lexicographically (compared byte by byte, with each byte treated as an unsigned value), the order matches how you'd expect the original values to sort. For instance, the number 5 encodes to bytes that come before the number 10's byte encoding, whereas the decimal strings "5" and "10" would sort the other way around. The library also optimizes for space: small numbers like 17 use only 1 byte, while larger ones use more, so you're not wasting storage.
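The order-preserving idea can be illustrated with a minimal sketch. This is not Orderly's API or its actual wire format: `SortableIntSketch`, `encode`, and `decode` are hypothetical names, and the sketch uses a fixed 4-byte encoding rather than the variable-length one described above. It relies on a well-known trick: flipping the sign bit of a two's-complement integer and writing it big-endian makes unsigned byte order agree with signed numeric order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not part of the Orderly library.
final class SortableIntSketch {

    // Flip the sign bit so negatives sort before positives, then write
    // the result big-endian (ByteBuffer's default byte order).
    static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .putInt(value ^ Integer.MIN_VALUE)
                .array();
    }

    // Reverse the transformation: read big-endian, flip the sign bit back.
    static int decode(byte[] key) {
        return ByteBuffer.wrap(key).getInt() ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] values = {10, -3, 5, Integer.MIN_VALUE, 0, Integer.MAX_VALUE};
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encode(values[i]);
        }

        // Sort the raw keys as unsigned bytes (requires Java 9+).
        Comparator<byte[]> unsignedByteOrder = Arrays::compareUnsigned;
        Arrays.sort(keys, unsignedByteOrder);

        // Decoding the sorted keys yields the values in ascending order.
        for (byte[] key : keys) {
            System.out.println(decode(key));
        }
    }
}
```

A fixed width keeps the sketch short. A space-optimized encoding of the kind described above would additionally need a length-carrying header that itself sorts correctly.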

It handles edge cases that naive serialization approaches miss: negative numbers, floating-point special values like NaN and infinity, empty strings, null values, and more. For composite keys (a record with multiple fields), you can build up a single byte sequence from multiple typed fields, then later split it back apart or skip over parts without deserializing everything.
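To see why floating-point values need special care, here is a hedged sketch of one common technique for making IEEE 754 doubles byte-sortable. It is not taken from Orderly's source, and `SortableDoubleSketch` is a hypothetical name. The assumption is that the desired order is the one used by `Double.compare`: negative infinity first, then negative numbers, -0.0, 0.0, positive numbers, positive infinity, and NaN last.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of the Orderly library.
final class SortableDoubleSketch {

    static byte[] encode(double value) {
        // doubleToLongBits collapses every NaN to one canonical bit pattern.
        long bits = Double.doubleToLongBits(value);
        if (bits < 0) {
            // Negative values: invert all bits so larger magnitudes sort earlier.
            bits = ~bits;
        } else {
            // Non-negative values: set the top bit so they sort after negatives.
            bits ^= Long.MIN_VALUE;
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(bits).array();
    }

    static double decode(byte[] key) {
        long bits = ByteBuffer.wrap(key).getLong();
        if (bits < 0) {
            // Top bit set: the original value was non-negative.
            bits ^= Long.MIN_VALUE;
        } else {
            // Top bit clear: the original value was negative.
            bits = ~bits;
        }
        return Double.longBitsToDouble(bits);
    }
}
```

A naive approach that writes the raw IEEE 754 bits would sort every negative number after every positive one, and would sort negative numbers in reverse.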

Who Uses This and Why

This is for developers building systems on top of sorted key-value databases. The most direct use case is HBase, a distributed database that organizes data by sorted row keys. If you're designing a leaderboard that stores records keyed by (timestamp, username, score), Orderly lets you pack those three values into a single sortable byte key. A time-series database might use it to create keys that sort chronologically. MapReduce jobs processing billions of records benefit from the library's focus on memory efficiency—it reuses objects rather than creating new ones for each operation.
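The leaderboard key mentioned above could be sketched as follows. This is an illustration under stated assumptions, not Orderly's API or encoding: `LeaderboardKeySketch` and its methods are hypothetical, the numeric fields are fixed width, and usernames are assumed never to contain the NUL character so that a single zero byte can terminate the string field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Orderly library.
final class LeaderboardKeySketch {

    // Layout: 8-byte timestamp | UTF-8 username | 0x00 terminator | 4-byte score
    static byte[] encode(long timestamp, String username, int score) {
        if (username.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("username must not contain NUL");
        }
        byte[] name = username.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + name.length + 1 + Integer.BYTES);
        buf.putLong(timestamp ^ Long.MIN_VALUE);   // sign flip keeps numeric order
        buf.put(name);                             // UTF-8 bytes sort by code point
        buf.put((byte) 0);                         // terminator: shorter names sort first
        buf.putInt(score ^ Integer.MIN_VALUE);     // sign flip keeps numeric order
        return buf.array();
    }

    // Scan the string field up to its terminator without touching the score.
    static String readUsername(byte[] key) {
        int start = Long.BYTES;
        int end = start;
        while (key[end] != 0) {
            end++;
        }
        return new String(key, start, end - start, StandardCharsets.UTF_8);
    }

    // Skip straight to the trailing fixed-width field.
    static int readScore(byte[] key) {
        return ByteBuffer.wrap(key, key.length - Integer.BYTES, Integer.BYTES).getInt()
                ^ Integer.MIN_VALUE;
    }
}
```

Because the fields are concatenated in order, keys sort by timestamp first, then username, then score, and a single field can be read back without decoding the others.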

The README emphasizes performance considerations: using the right type (prefer reusable "Writable" types over immutable objects when processing huge datasets) and choosing the right encoding size (32-bit or 64-bit integers, variable or fixed length) based on your actual data range. The library is mature and well-documented with JavaDoc and example code, making it feasible to integrate even for teams unfamiliar with serialization internals.
