Until Zeek 7.2, storing data across a cluster could be tricky and inefficient. The new Storage Framework changes that. This post explains the old model, the new framework, and what’s coming next.
The Old Model: Broker Storage
Storage in Zeek has traditionally run through the Broker subsystem, which uses a master/clone layout. The master has direct access to the underlying storage. The clones send events when new entries are added to the store, and the master attempts to keep everyone in sync. This approach works, but it has limitations:
- Clones have direct access to an in-memory table of the store, but may not see all of the entries at once. This speeds up access, but a read on a clone can return stale or incomplete results.
- Busy stores generate a lot of extra event traffic, as the master has to send an event to every node with a clone.
- There’s no guarantee of eventual consistency for all of the nodes.
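To make these limitations concrete, here is a toy Python model of a master/clone layout. It is only an illustration of the idea, not Broker's actual implementation: the `Master` and `Clone` classes, their methods, and the sample key are all hypothetical.

```python
from collections import deque


class Master:
    """Hypothetical master: owns the authoritative store and notifies clones."""

    def __init__(self):
        self.store = {}
        self.clones = []

    def put(self, key, value):
        self.store[key] = value
        # One event per clone for every write, so traffic grows with cluster size.
        for clone in self.clones:
            clone.pending.append((key, value))


class Clone:
    """Hypothetical clone: a local table that is only as fresh as the events it has processed."""

    def __init__(self, master):
        self.table = {}
        self.pending = deque()
        master.clones.append(self)

    def get(self, key):
        # Fast local lookup, but it may miss entries that are still in flight.
        return self.table.get(key)

    def process_events(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.table[key] = value


master = Master()
clones = [Clone(master) for _ in range(3)]

master.put("10.0.0.1", "scanner")

stale_read = clones[0].get("10.0.0.1")  # update event not processed yet
events_queued = sum(len(c.pending) for c in clones)

clones[0].process_events()
fresh_read = clones[0].get("10.0.0.1")
```

In this sketch a single write queues one event per clone, and a clone that has not yet processed its queue answers the lookup as if the entry did not exist.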
Serialization uses a custom binary format implemented by Broker, which isn’t user-friendly and can hinder debugging. Adding new backends to Broker is also complicated, requiring patches to the code and a deep understanding of its architecture. Broker provides a backend for SQLite and a non-persisting, memory-based one.
The New Approach: Zeek Storage Framework
The new storage framework turns this model on its head. Instead of relying on a master node, all nodes connect directly to the storage backend, whether that’s a local database like SQLite or an external server-based one like Redis or PostgreSQL. This means:
- All nodes have access to all of the data.
- Zeek no longer has to manage data synchronization and can simply use off-the-shelf products.
- Queries may be slightly slower as they require a round-trip to the backend, but the nodes aren’t waiting for syncs.
- Backends are implemented as plugins like any other in the Zeek ecosystem (similar to log writers and protocol analyzers), making it easy to add new storage options.
- Serialization is also implemented as plugins. By default we use Zeek’s native JSON format, but other serializations can be implemented for space, performance, or human-readability.
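The points above can be sketched in a few lines of Python. This is a conceptual illustration only and does not use Zeek's scripting API: the `SqliteBackend` and `JsonSerializer` classes, the `kv` table, and the file name are assumptions made for the example. Two "nodes" each open their own connection to the same SQLite file, and the serializer is a swappable component.

```python
import json
import os
import sqlite3
import tempfile


class JsonSerializer:
    """Hypothetical serializer plugin: values are stored as readable JSON text."""

    def dumps(self, value):
        return json.dumps(value, sort_keys=True)

    def loads(self, data):
        return json.loads(data)


class SqliteBackend:
    """Hypothetical backend plugin: every node opens its own connection."""

    def __init__(self, path, serializer):
        self.conn = sqlite3.connect(path)
        self.serializer = serializer
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.conn.commit()

    def put(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, self.serializer.dumps(value)),
        )
        self.conn.commit()

    def get(self, key):
        # Every read is a round-trip to the backend; there is no local cache to sync.
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else self.serializer.loads(row[0])

    def close(self):
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "storage.sqlite")

# Two independent "nodes" talk to the same backend; neither is a master.
worker_1 = SqliteBackend(db_path, JsonSerializer())
worker_2 = SqliteBackend(db_path, JsonSerializer())

worker_1.put("10.0.0.1", {"label": "scanner", "hits": 3})
seen_by_worker_2 = worker_2.get("10.0.0.1")

worker_1.close()
worker_2.close()
```

Because the database itself is the single source of truth, the second node sees the first node's write as soon as it is committed. Swapping in a different serializer would only require another class with the same `dumps` and `loads` methods.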
By leveraging off-the-shelf databases and a plugin-based architecture, the Storage Framework makes long-term and cluster-wide storage more flexible, maintainable, and user-friendly. We provide built-in backends for SQLite and Redis, plus an external plugin for NATS (soon to be available from packages.zeek.org).
Try It Out
We hope that you’ll try out the Zeek Storage Framework, and we would love to hear how it’s working for you. For a detailed walkthrough of how to set it up and use different backends, see our step-by-step technical guide, and don’t hesitate to reach out on Slack, Discourse, or GitHub with feedback or questions.