I’ve talked recently in a video and a blog post about the new Zeek storage framework. This post expands on those to explain how to actually use the framework, by adapting an existing policy script to replace its use of Broker’s storage methods. We’ll be using the async version of the API throughout, but a sync version exists as well. The primary difference between the two is that the async methods must be called as part of when conditions, whereas the sync methods can be called directly.
The basics for using the storage framework boil down to the following:
1. Load the base storage framework scripts via either/both of these lines:
@load base/frameworks/storage/async
@load base/frameworks/storage/sync
2. Load the policy script for the backend you’re using. The example below loads the policy for SQLite. This policy file defines an additional field on the Storage::BackendOptions record for use later.
@load policy/frameworks/storage/backend/sqlite
3. Decide what type of backend you’re going to use. It’s best to store this as a variable in your module, defined with the &redef attribute, so that it can be overridden by users if needed.
4. Create a variable of type Storage::BackendOptions that will hold the configuration for the connection to the backend. This should typically be defined with the &redef attribute so that it can be easily overridden in local.zeek.
5. Create a variable of type opaque of Storage::BackendHandle to hold the backend connection itself.
6. Use the open_backend method from the Storage module to open a connection and store it in the backend handle.
7. Use the various API methods to perform whatever storage operations your script needs.
8. Optional, but preferred: Before Zeek exits, or when you are completely finished with storage operations, call the close_backend method on the backend handle.
There are small details in each of these steps, but that’s the gist of it.
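To make those steps concrete, here is a minimal sketch that strings them together using the sync API. The module name, table name, and the string/count key and value types are placeholders for illustration only, not part of the real example that follows.

@load base/frameworks/storage/sync
@load policy/frameworks/storage/backend/sqlite

module StorageSketch;

export {
    ## Step 3: the backend type, overridable via redef.
    const backend_type : Storage::Backend = Storage::STORAGE_BACKEND_SQLITE &redef;

    ## Step 4: the backend options, overridable in local.zeek via redef.
    const backend_options : Storage::BackendOptions = [
        $sqlite = [ $database_path=":memory:", $table_name="storage_sketch" ]] &redef;
}

# Step 5: the handle for the open backend connection.
global backend: opaque of Storage::BackendHandle;

event zeek_init()
    {
    # Step 6: open the connection, with string keys and count values.
    local res = Storage::Sync::open_backend(backend_type, backend_options, string, count);
    if ( res$code == Storage::SUCCESS )
        backend = res$value;
    else
        Reporter::error(fmt("Failed to open backend: %s", res$error_str));
    }

event zeek_done()
    {
    # Step 8: close the connection once storage operations are finished.
    Storage::Sync::close_backend(backend);
    }

Step 7, the actual storage operations, is what the rest of this post walks through with a real script.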
A Real-World Example
Let’s get into some actual code. As I said before, we’re going to look at this through the lens of an existing policy script. The script is policy/protocols/conn/known-hosts.zeek, and has come with Zeek since v1.1. It was rewritten in v2.6 to use a Broker store instead of using a table local to each Zeek node. This allowed the nodes to share state across the cluster about what hosts were seen. The version released as part of Zeek 8.0.0 can be seen at https://github.com/zeek/zeek/blob/v8.0.0/scripts/policy/protocols/conn/known-hosts.zeek.
First, we’ll need to add a few new variables to the export block to set up the backend configuration:
## The type of storage backend to open.
const host_store_backend_type : Storage::Backend = Storage::STORAGE_BACKEND_SQLITE &redef;

## The options for the store. This should be redef'd in local.zeek to set
## connection information for the backend. The options default to a memory
## store.
const host_store_backend_options : Storage::BackendOptions = [
    $sqlite = [ $database_path=":memory:", $table_name=Known::host_store_name ]] &redef;
Above, we create a variable to store the backend type and set it to Storage::STORAGE_BACKEND_SQLITE. This will be used later to tell open_backend that we want to open a connection to an SQLite backend. We then create a Storage::BackendOptions variable and set some default values. This is a record that defines the various options needed to open the SQLite database. We use a memory-based database here by default, but this could be a filename instead. The table_name field defines the name of the table used to store this specific type of data in the database. It should be unique across uses of the same database file so as not to cause conflicts. See https://github.com/zeek/zeek/blob/master/scripts/policy/frameworks/storage/backend/sqlite/main.zeek for more details about the available options for the SQLite backend. We’re reusing the existing host_store_name variable here, but we have to modify it slightly since SQLite has rules about what characters are allowed in table names:
## Table name to use for :zeek:see:`Known::host_store`.
const host_store_name = "zeekknownhosts" &redef;
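As mentioned above, a site that wants the data to persist on disk could override Known::host_store_backend_options from local.zeek. The path below is only an illustration:

redef Known::host_store_backend_options = [
    $sqlite = [ $database_path="/var/zeek/known-hosts.sqlite",
                $table_name=Known::host_store_name ]];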
Not mentioned above is how data is serialized before being stored by the backend. The Storage::BackendOptions record has a field called serializer of type Storage::Serializer. This can be set to provide a different serialization scheme. It defaults to Storage::STORAGE_SERIALIZER_JSON, which uses Zeek’s built-in JSON serialization. These serializers are plugins just like the backends themselves. We provide the JSON serializer as part of the Zeek distribution in 8.0.
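If you wanted to select the serializer explicitly, it’s just another field on the same options record. This sketch sets it to the JSON serializer, which is what you’d get by default anyway:

const host_store_backend_options : Storage::BackendOptions = [
    $serializer = Storage::STORAGE_SERIALIZER_JSON,
    $sqlite = [ $database_path=":memory:", $table_name=Known::host_store_name ]] &redef;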
Next, replace the existing Cluster::StoreInfo variable with one for a storage backend handle:
global host_store: opaque of Storage::BackendHandle;
The remainder of the export block can stay exactly as it is. We’ll reuse a few of the other variables later. Next up, replace the call to Cluster::create_store in zeek_init():
event zeek_init()
    {
    if ( ! Known::use_host_store )
        return;

    local res = Storage::Sync::open_backend(Known::host_store_backend_type,
        Known::host_store_backend_options, Known::AddrPortServTriplet, bool);

    if ( res$code == Storage::SUCCESS )
        Known::host_store = res$value;
    else
        Reporter::error(fmt("%s: Failed to open backend connection: %s",
            Known::host_store_name, res$error_str));
    }
This looks a bit complicated. Every operation you can call in the storage framework API returns a common type of Storage::OperationResult. This result carries the status of the operation, an error message if it fails, and optionally a result value if the operation can return one (such as an open backend handle, in this case). We opt to use the sync version of open_backend here because we want initialization to pause until we either succeed or fail at opening the backend. We pass the backend type and options that we defined earlier, as well as two types. The first type is the type of keys that can get inserted into the store, and the second type is the type of values that get inserted. In this case, our keys are of record type Known::AddrPortServTriplet and our values are bool. If you’ve looked at the existing script, you’ll realize that the value is a throwaway, but we have to insert something and bool is a simple type to use.
If open_backend succeeds, it sets the result’s code to Storage::SUCCESS and the result’s value to the handle we need later. Otherwise, it sets the code to one of a number of error codes. The default set of codes is defined in Storage::ReturnCode, but backend plugins can add additional codes if needed.
Now that we have an open backend, we can get to using it to store data. Down in Known::host_found, we try to insert data into the store and use the return code to know whether or not this is a new host:
event Known::host_found(info: HostsInfo)
    {
    if ( ! Known::use_host_store )
        return;

    when [info, s, key] ( local put_res = Storage::Async::put(Known::host_store,
        [ $key=key, $value=T, $overwrite=F, $expire_time=Known::host_store_expiry ]) )
        {
        if ( put_res$code == Storage::SUCCESS )
            Log::write(Known::HOSTS_LOG, info);
        else if ( put_res$code != Storage::KEY_EXISTS )
            Reporter::error(fmt("%s: storage put failure: %s",
                Known::host_store_name, put_res$error_str));
        }
    timeout Known::host_store_timeout
        {
        Log::write(Known::HOSTS_LOG, info);
        }
    }
Where we previously called Broker::put_unique, we can now call Storage::Async::put to insert new data into the store. We pass the backend handle and a record denoting exactly what to insert, along with some other values controlling the operation. There are the two obvious key and value fields. overwrite controls whether the operation should overwrite existing keys. expire_time is the amount of time before the key is automatically removed from the store.
We care about two states for the results. If the insertion was a success, then it’s a new host that the cluster hasn’t seen before and can get logged. A result code of KEY_EXISTS means that the host was already in the store, and we can ignore it. Otherwise, there was some issue with doing the insertion and an error message gets logged. Because the async version of put was called here and was run as part of a when condition as required, we also check for a timeout.
Type Constraints and Other Operations
The two operations not used in the example above are get and erase. They weren’t necessary for the known-hosts policy, but I’ll cover them briefly here. The get operation retrieves the value for a given key from the store. Assuming the operation returns Storage::SUCCESS, the value is stored in the value field of the OperationResult record. The erase operation removes a key/value pair from the store.
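Here’s a rough sketch of both, using the sync API against the same store and assuming a key variable built the same way as the one passed to put earlier:

local get_res = Storage::Sync::get(Known::host_store, key);
if ( get_res$code == Storage::SUCCESS )
    # The value field is of type any, so cast it back to the store's value type.
    print fmt("host already known: %s", get_res$value as bool);

local erase_res = Storage::Sync::erase(Known::host_store, key);
if ( erase_res$code != Storage::SUCCESS )
    Reporter::error(fmt("erase failed: %s", erase_res$error_str));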
Each of these operations has a built-in constraint on the type of data that can be passed to them. As mentioned earlier, we pass the type of the key and the value to open_backend. If calls to the operations attempt to pass keys or values of types that don’t match, the operations will return appropriate return codes and error messages.
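For instance, since the store in this post was opened with Known::AddrPortServTriplet keys, trying to put a plain string key should come back as an error instead of being stored. A rough sketch:

local bad_res = Storage::Sync::put(Known::host_store,
    [ $key="not a triplet", $value=T, $overwrite=F,
      $expire_time=Known::host_store_expiry ]);
if ( bad_res$code != Storage::SUCCESS )
    # Expected: a non-success code and an error message describing the mismatch.
    Reporter::warning(fmt("put rejected: %s", bad_res$error_str));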
Wrapping It Up
We’ve talked a lot recently about the availability of the storage framework and the new features and possibilities it brings to the Zeek ecosystem. We hope this deeper dive into how to use the framework inspires you to try it out and find places where it can improve your own scripts.