In our last post, we left off with a functional but incomplete analyzer for Redis. It parsed RESP (Redis Serialization Protocol) traffic, crudely created “command” objects, then sent those off to Zeek script via events for logging. The result was nice, but it left some big holes.
If you want more on the initial intricacies of making a Redis analyzer in Spicy, check out the first post. This one builds on it, so it may not make sense without that context.
Here we’ll tackle the biggest hole left after the first part: the client/server relationship.
Redis client/server
As we discussed in the first post, there’s a difference between RESP (that is, serializing data) and Redis commands. RESP is just a way to serialize data, but it does have one small notion of a client/server relationship, even though it’s not entirely clear. Here’s a quote from the “inline commands” section in the RESP documentation:
Basically, to issue an inline command, you simply write space-separated arguments in a telnet session. Since no command starts with * (the identifying byte of RESP Arrays), Redis detects this condition and parses your command inline.
This seems to imply that the client can only send data as an array (which starts with *) or inline (that is, without any serialization). If we boot up a Redis server and try just sending one simple string:
$ (printf "+SET hi there\r\n"; sleep 1) | nc localhost 6379
-ERR unknown command '+SET', with args beginning with: 'hi' 'there'
It seems like the server is interpreting +SET as a command, rather than a serialized piece of data. Sure enough, we can find this in the Redis source code:
/* Determine request type when unknown. */
if (!c->reqtype) {
    if (c->querybuf[c->qb_pos] == '*') {
        c->reqtype = PROTO_REQ_MULTIBULK;
    } else {
        c->reqtype = PROTO_REQ_INLINE;
    }
}

if (c->reqtype == PROTO_REQ_INLINE) {
    if (processInlineBuffer(c) != C_OK) break;
} else if (c->reqtype == PROTO_REQ_MULTIBULK) {
    if (processMultibulkBuffer(c) != C_OK) break;
} else {
    serverPanic("Unknown request type");
}
This will mark any request that starts with * as multibulk and anything else as inline. So, our big Data unit doesn’t need to be quite so big for clients — and, as a bonus, we can just say everything that comes from the client is a command. We will no longer use the same structure for both the client and the server! Even if that’s not what the RESP docs seem to suggest (to me, at least).
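The same dispatch can be sketched in a few lines of Python (a toy stand-in for the C above, not Redis's actual code):

```python
def request_type(buf: bytes) -> str:
    """Mirror the Redis dispatch above: '*' marks a RESP array
    (multibulk); anything else is treated as an inline command."""
    return "multibulk" if buf.startswith(b"*") else "inline"

# A properly serialized command arrives as a RESP array:
print(request_type(b"*3\r\n$3\r\nSET\r\n$2\r\nhi\r\n$5\r\nthere\r\n"))  # multibulk
# Our '+SET hi there' experiment takes the inline path instead:
print(request_type(b"+SET hi there\r\n"))  # inline
```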
Spicy client/server
So now we have a new set of requirements: clients and servers don’t send the same thing back and forth. The client may send one of two possibilities, while the server may send back any serialized data.
Before, our main message parsing looked like this:
public type Messages = unit {
    : (Data &synchronize)[];
};
Which was used for both clients and servers. Now, we’ll have two:
public type ClientMessages = unit {
    : (ClientData &synchronize)[];
};

public type ServerMessages = unit {
    : (ServerData &synchronize)[];
};
Then, we need to tell Zeek to use ClientMessages for the client and ServerMessages for the server in the .evt file:
protocol analyzer spicy::Redis over TCP:
    parse originator with RESP::ClientMessages,
    parse responder with RESP::ServerMessages;
Any originator data is parsed with ClientMessages. Any responder data is parsed with ServerMessages.
The new data
The client data is just a simplified version of Data from before. Instead of a big switch that enumerates all possibilities, we have two possibilities, so we can just go back to an if statement:
if (self.ty == DataType::ARRAY) {
    multibulk: Array;
} else {
    inline: RedisBytes &convert=(pack(cast<uint8>(self.ty), spicy::ByteOrder::Network) + $$)
        &max-size=1024;
};
Unfortunately for anyone with eyes, the big unruly &convert stayed for inline data. Again, that’s just to prepend the byte from ty to the remaining bytes until CRLF. While we’re here, let’s fix that using random access in Spicy.
First we store the starting point on initialization:
var start: iterator<stream>;
on %init() { self.start = self.input(); }
Then, the ty field just “unparses” itself if it’s not an array:
ty: uint8 &convert=DataType($$) {
    if (self.ty != DataType::ARRAY) {
        # This is inline, so we need to reparse `ty`
        self.set_input(self.start);
    }
}
Voila, we have our complete ClientData:
public type ClientData = unit {
    %synchronize-after = b"\x0d\x0a";

    var start: iterator<stream>;
    on %init() { self.start = self.input(); }

    # Clients can only be an array or inline
    ty: uint8 &convert=DataType($$) {
        if (self.ty != DataType::ARRAY) {
            # This is inline, so we need to reparse `ty`
            self.set_input(self.start);
        }
    }

    if (self.ty == DataType::ARRAY) {
        multibulk: Array;
    } else {
        inline: RedisBytes &max-size=1024;
    };
};
Server data
Server data is actually much easier: it’s just serialized RESP data:
public type ServerData = unit {
    %synchronize-after = b"\x0d\x0a";
    data: Data;
};
So why does the ServerMessages unit not just use Data directly? Well, imagine an array of two elements:
[$ty=DataType::ARRAY, $array=[
    $num_elements=2,
    $elements=[
        [$ty=DataType::SIMPLE_STRING, $simple_string=[$content=b"Hi"]],
        [$ty=DataType::SIMPLE_STRING, $simple_string=[$content=b"there"]]
    ]
]]
We have the outer Data object (with $ty=DataType::ARRAY) and two inner Data objects (with $ty=DataType::SIMPLE_STRING). If we trigger an event for some Data object, we’ll get three events, one for each. With the ServerData wrapper, we can trigger an event on all ServerData and only get one event per server response.
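To see the three events concretely, here’s a minimal Python sketch (my own toy RESP reader for arrays and simple strings, not the Spicy grammar) that records every Data value it finishes parsing:

```python
def parse_data(buf: bytes, pos: int = 0, seen=None):
    """Minimal RESP reader for arrays and simple strings; appends every
    parsed value to `seen`, mimicking an event fired per Data unit."""
    if seen is None:
        seen = []
    end = buf.index(b"\r\n", pos)
    ty, payload = buf[pos:pos + 1], buf[pos + 1:end]
    pos = end + 2
    if ty == b"*":  # array: recurse into each element
        value = []
        for _ in range(int(payload)):
            elem, pos = parse_data(buf, pos, seen)
            value.append(elem)
    else:  # '+' simple string
        value = payload
    seen.append(value)
    return value, pos

seen = []
parse_data(b"*2\r\n+Hi\r\n+there\r\n", seen=seen)
# seen now holds three values: b"Hi", b"there", and the outer array
```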
Revisiting command parsing
If you recall, there were two steps to parsing the Redis client data:
- Parse serialized data as RESP
- Form the serialized RESP into Redis commands
The second step was done with Spicy function calls from the .evt file. There were separate cases for arrays of bulk strings and inline data (handled by functions like make_command and inline_command). We can clean a lot of that up. It’s now absolutely trivial to know that something is a command when parsing the RESP data: if it’s from the client, it’s a command.
Since anything from the client is a command, we can add an extra step after parsing the client data that “parses” the parsed data. For this, we’ll use a variable and a hook, like so:
public type ClientData = unit {
    # …
    var command: Redis::Command;
    on %done {
        self.command = Redis::make_command(self);
    }
};
Okay, so every ClientData sets its command from the result of make_command. That function is defined in a redis.spicy file:
public function make_command(command: RESP::ClientData): Command {
    if (command?.multibulk)
        return bulk_command(command);
    else
        return inline_command(command);
}
Here we finally have a relatively elegant solution: all client data gets reparsed into a Command object. Bulk commands are already “tokenized,” while inline commands need special handling to tokenize them (read: splitting on whitespace, but not if it’s quoted). The Command object from make_command has everything you may need:
type Command = struct {
    raw: vector<bytes>;
    command: bytes;
    key: optional<bytes>;
    value: optional<bytes>;
    known: optional<KnownCommand>;
};
One of the more powerful pieces here is the known field, where if we recognize the command, we store it as an enum. Remember in the previous post, we used the is_get method which got a KnownCommand from a command_from function. This recalculated the KnownCommand every time it was called. We can now avoid that recalculation by using the known field. For example, this is how we can check if a command is GET now:
public function is_get(data: RESP::ClientData): bool {
    return data.command.known
        && *(data.command.known) == KnownCommand::GET
        && |data.command.raw| >= 2;
}
Just with some extra validation to make sure we don’t do anything silly.
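As an aside, the inline tokenization mentioned above (splitting on whitespace, but respecting quotes) can be roughly approximated in Python with shlex. Redis’s real sdssplitargs has its own quoting rules, so this is only a stand-in for the idea:

```python
import shlex

def tokenize_inline(line: bytes) -> list:
    """Split an inline command on whitespace, keeping quoted arguments
    together (a rough approximation of Redis's sdssplitargs)."""
    return [tok.encode() for tok in shlex.split(line.decode())]

print(tokenize_inline(b'SET greeting "hi there"\r\n'))
# [b'SET', b'greeting', b'hi there']
```

Unbalanced quotes make shlex raise an error, much like the server’s “unbalanced quotes in request” protocol error.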
This also cleans up the .evt file substantially. Here’s the event definition that triggers for each command:
on RESP::ClientData -> event Redis::command($conn, $is_orig, self.command);
There are no conditions nor extra function calls! It just shoves what we parsed into an event. Nice and simple. Before, we needed to (try to) make sure we weren’t sending server data as a command — but now we know!
So, separating the client and server really simplified client data. It also allowed us to avoid repeating work. But, the real reason for this preliminary work is to correlate a request with the server’s response.
Associating requests with responses
Now we want to connect some client data to the server data that responded to it. I implemented this in Zeek script, but depending on your use, it may be a lot easier to use Spicy’s contexts to solve this problem.
NOTE: This isn’t an endorsement for this particular problem — it probably would be better to use contexts in the general case. But, the Redis analyzer has a couple considerations, such as some commands that can disable server responses. I wanted to keep the understanding of what Redis commands do out of the Spicy code entirely — that way Zeek script can decide what to do with it. But, that doesn’t mean this couldn’t have been solved in a better way.
We already have all client data getting sent to Zeek script, but not server data, so we’ll add that to the .evt file:
on RESP::ServerData -> event Redis::server_data($conn, $is_orig, Zeek_Redis::make_server_data(self));
And what’s the result of make_server_data? We can’t currently send over recursive data types, so we’ll trim it down. Here’s the structure I came up with:
public type ZeekServerData = struct {
    err: bool;
    data: optional<bytes>;
};
So it tells you whether this is an error or not. Then, data will get set if we have some really common cases like simple_string or simple_error — that seems like a good enough start for now.
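The flattening can be sketched in Python (my own toy representation of a parsed value as a type byte plus payload; the real make_server_data walks the Spicy unit):

```python
def make_server_data(ty: bytes, payload: bytes) -> dict:
    """Flatten one parsed server value into the {err, data} shape.
    '-' marks a simple error; only simple strings and simple errors
    carry their text over into `data`."""
    err = ty == b"-"
    data = payload if ty in (b"+", b"-") else None
    return {"err": err, "data": data}

print(make_server_data(b"+", b"OK"))        # {'err': False, 'data': b'OK'}
print(make_server_data(b"-", b"ERR oops"))  # {'err': True, 'data': b'ERR oops'}
```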
Now we’ll go over into Zeek script for the rest of this. That same structure is here in Zeek script world with attached &log annotations:
type ServerData: record {
    ## Was this an error?
    err: bool &log;
    ## The string response, if it was a simple string or error
    data: string &log &optional;
};
Then we have an Info object that stores the valuable information that we’ll later log (some generic fields removed):
type Info: record {
    # …
    ## The Redis command
    cmd: Command &log &optional;
    ## The response for the command
    response: ServerData &log &optional;
};
Then, we need a way to correlate the Command and its response. Naively, you could just use one Info object attached to the connection, then the following should happen for most requests/responses:
- The command event is triggered. Set the Info$cmd field in that event.
- The server_data event is triggered. Set the Info$response field in that event.
- Log the Info record in the server_data event.
Except that won’t work.
Why not? Well, you can send multiple commands, then get multiple responses. If we sent three commands and got three responses, the above would overwrite Info$cmd twice, use the third command for the first response, then log two responses without an associated command.
Instead, we need a queue. That’s implemented in some State record:
type State: record {
    ## Pending requests.
    pending: table[count] of Info;
    ## Current request in the pending queue.
    current_request: count &default=0;
    ## Current response in the pending queue.
    current_response: count &default=0;
};
So we have some association of a count to that request’s Info, then track the current “pointer” to a request and response.
Then we add a bit to the connection record so this persists between events:
redef record connection += {
    redis: Info &optional;
    redis_state: State &optional;
};
Okay, setup complete! So the goal here is to correlate command N with response N, then log those when they’re complete. For now, we’re going to make a lot of assumptions (frankly, probably too many assumptions). That’ll be good to get the proof-of-concept.
First, when we get a Command event, we need to put that into the State. I’ll do that in a helper function, set_state, so the handling in the event is pretty minimal:
# In command event
++c$redis_state$current_request;
set_state(c, T);
c$redis$cmd = command;
And set_state’s signature looks like this:
function set_state(c: connection, is_orig: bool)
We use the c$redis field after set_state — so we expect set_state to properly set the c$redis field. The “appropriate” Info value depends on whether this was a request or a response, which is why we passed in the is_orig boolean. Based on that, the set_state function starts by identifying the correct index:
local current: count;
if ( is_orig )
    current = c$redis_state$current_request;
else
    current = c$redis_state$current_response;
Then we just need to grab that pending Info and set the field in the connection object:
# In case this is the first time we see this request/response
if ( current !in c$redis_state$pending )
    c$redis_state$pending[current] = new_redis_session(c);

c$redis = c$redis_state$pending[current];
Given that set_state, we can see that the server_data event has almost identical handling, just with a different is_orig value:
# In server_data event
++c$redis_state$current_response;
set_state(c, F);
c$redis$response = data;
There is one extra piece in the server_data event, though:
Log::write(Redis::LOG, c$redis);
delete c$redis_state$pending[c$redis_state$current_response];
Since we’re making the assumption that a response always comes after a request, we just assume it’s ready for logging and send it off to get logged.
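Putting the pieces together, the request/response bookkeeping can be modeled in a few lines of Python (a toy model of the State record and the two events, with plain dicts standing in for Info):

```python
class RedisState:
    """Toy model of the State record: a pending table keyed by a
    counter, with separate request/response cursors."""
    def __init__(self):
        self.pending = {}          # count -> Info (here, just a dict)
        self.current_request = 0
        self.current_response = 0

    def on_command(self, cmd):
        # Mirrors the command event: bump the cursor, stash the command.
        self.current_request += 1
        info = self.pending.setdefault(self.current_request, {})
        info["cmd"] = cmd

    def on_server_data(self, data):
        # Mirrors the server_data event: bump the cursor, attach the
        # response, and hand the completed Info back for logging.
        self.current_response += 1
        info = self.pending.setdefault(self.current_response, {})
        info["response"] = data
        return self.pending.pop(self.current_response)

state = RedisState()
for cmd in ("SET a 1", "SET b 2", "GET a"):   # three pipelined requests
    state.on_command(cmd)
logged = [state.on_server_data(r) for r in ("OK", "OK", "1")]
print(logged[0])  # {'cmd': 'SET a 1', 'response': 'OK'}
print(logged[2])  # {'cmd': 'GET a', 'response': '1'}
```

Each response pairs with the request at the same index, so pipelined commands no longer clobber each other.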
Requests without responses
Okay, there’s one more point with logging. There are a number of cases where Redis will not send exactly one response for one command. You can even control this manually with the CLIENT REPLY command, which lets you disable responses. Therefore, we can’t just increment current_response and expect it to line up with the requests. Instead, we watch for CLIENT REPLY commands, then if responses are turned back on, we make the response number “catch up” with the request number. This is done with a new response_num function (which I will not show):
local previous_response_num = c$redis_state$current_response;
c$redis_state$current_response = response_num(c);
Then we log everything between previous_response_num and the updated current_response, and finally the current one:
while ( previous_response_num < c$redis_state$current_response ) {
    if ( previous_response_num == 0 ) {
        ++previous_response_num;
        next;
    }
    if ( previous_response_num in c$redis_state$pending ) {
        Log::write(Redis::LOG, c$redis_state$pending[previous_response_num]);
        delete c$redis_state$pending[previous_response_num];
    }
    previous_response_num += 1;
}

# Log this one
Log::write(Redis::LOG, c$redis);
delete c$redis_state$pending[c$redis_state$current_response];
So that extra logic is only necessary because we can manually disable responses. This sort of logic can vary a fair bit depending on what you care about in your protocol. You can imagine that many protocols will have some nuance to them that would require similar logic, though.
The logs
Remember the factorial logs from last time? Well, now we can see the response right after!
GET :1:factorial_50 - F -
SET :1:factorial_1 1 F OK
SET :1:factorial_2 2 F OK
SET :1:factorial_3 6 F OK
SET :1:factorial_4 24 F OK
SET :1:factorial_5 120 F OK
SET :1:factorial_6 720 F OK
SET :1:factorial_7 5040 F OK
SET :1:factorial_8 40320 F OK
SET :1:factorial_9 362880 F OK
SET :1:factorial_10 3628800 F OK
SET :1:factorial_11 39916800 F OK
SET :1:factorial_12 479001600 F OK
SET :1:factorial_13 6227020800 F OK
SET :1:factorial_14 87178291200 F OK
SET :1:factorial_15 1307674368000 F OK
SET :1:factorial_16 20922789888000 F OK
SET :1:factorial_17 355687428096000 F OK
SET :1:factorial_18 6402373705728000 F OK
SET :1:factorial_19 121645100408832000 F OK
SET :1:factorial_20 2432902008176640000 F OK
SET :1:factorial_21 51090942171709440000 F OK
SET :1:factorial_22 1124000727777607680000 F OK
SET :1:factorial_23 25852016738884976640000 F OK
SET :1:factorial_24 620448401733239439360000 F OK
SET :1:factorial_25 15511210043330985984000000 F OK
SET :1:factorial_26 403291461126605635584000000 F OK
SET :1:factorial_27 10888869450418352160768000000 F OK
The server said OK for each of them and never errored.
Here’s what an error would look like:
SET bad1 "unclosed double quotes T ERR Protocol error: unbalanced quotes in request
This was from an inline request with unbalanced quotes. You can see T in the fourth column referring to the err field — so it was an error. It had a protocol error message. Cool.
Conclusions
So, the client and server have an obvious relationship. One sends a command, the other responds. We want to associate those two things together. There are a bunch of considerations though, not all of which are easy to solve.
There is one hole that seems pretty difficult to account for: synchronization. That is, the analyzer can get into a very weird state from some unexpected input, then recover to a known state. But now our counts are going to be off, since we’ll never send an event into Zeek script to account for the data we skipped. These sorts of edge cases may be more or less important to deal with elegantly, depending on your use case and your protocol.
Most protocols need some different handling to associate a request and response. In fact, the basis for the code I wrote for Redis is modeled on HTTP. But, at a certain point, you have to consider your specific protocol and its edge cases. That requires extra work and a keen eye to spot each case. Solving those problems may be hard (or impossible!), but knowing about them can help you create a solution that mitigates any potential issues.