
We are very happy to announce a new Zeek project now available on GitHub. The Spicy parser generator makes it substantially easier for Zeek to support and parse new protocols and file formats. I will tell you a bit more about Spicy’s capabilities and history in the following, and also show an end-to-end example of adding TFTP support to Zeek without writing a single line of C++ code. We are very interested in feedback, and I encourage you to give Spicy a try. 


What is Spicy?

Parsing network traffic is core to everything Zeek does: all Zeek scripts rely critically on the stream of events that Zeek’s built-in protocol analyzers produce by carefully decoding conversations on the network. Unfortunately, implementing these protocol analyzers remains one of the most challenging tasks in Zeek, as their code must robustly sift through large volumes of untrusted input: data that could have been sent by a noncompliant protocol stack, or forged by an adversary. The time and effort it takes to implement new analyzers has arguably become the primary bottleneck for advancing Zeek’s capabilities, with hardly any new analyzers making it into the distribution in recent years.

Spicy aims to overcome this state of affairs by drastically lowering the bar for creating new analyzers. Spicy provides a new, domain-specific scripting language that is tailored to describing protocols & file formats, and it comes with a compiler toolchain that turns that language into robust C++ parsing code. Through its abstractions, Spicy brings creating protocol analyzers down to the level of writing Zeek scripts: while certainly not trivial, it becomes a task accessible to a much broader set of people.

Spicy comes with a Zeek plugin that allows users to add support for new protocol and file analyzers simply by loading the corresponding Spicy code. You provide the grammar, specify which Zeek events to generate—and Spicy takes it from there. 

History

If you have been around in the Zeek community for a while, you will probably have already heard of Spicy. In 2016, some of us published a research paper presenting what we termed a “next-generation protocol parser generator”, meant to eventually replace our earlier “binpac” system currently shipping with Zeek. The Spicy paper came with an extensive prototype implementation, and a number of folks started trying out the new system by implementing custom Zeek analyzers. The feedback was quite positive and confirmed that Spicy’s approach provided a viable path to increasing Zeek’s protocol coverage. However, it also became clear that our initial prototype was not even close to supporting serious production usage; not only had it never actually seen any live traffic, but it also came with a number of technical challenges and limitations that left the code base hard to extend and maintain. That is common for research code of course; much of Zeek’s functionality started out as early prototypes that would often require comprehensive rewrites before going upstream. We were in the same boat with Spicy, except that our research team did not have the cycles to move it forward at the time.

This changed a couple of years later when Corelight committed to making Spicy real. Corelight’s open source team began working on a Spicy reimplementation. We developed the new version from scratch, incorporating the lessons we had learned from the prototype into a new code base. A few weeks ago, we reached our initial milestone: bringing back sufficient functionality so that users could start writing functional analyzers end-to-end, from Spicy source code all the way to transparent Zeek integration. At that point, we moved development over to GitHub in preparation for open sourcing the code under Zeek’s standard BSD license. As people are now getting a chance to try out Spicy, we will continue to improve and extend the implementation.

Users who have already worked with the old prototype will be happy to hear that the language has not changed much beyond some overall cleanup; see the release notes for a summary of differences if you would like to port old Spicy code over. The most fundamental change comes internally: the new Spicy version no longer generates LLVM bitcode, but instead standard C++, which helps with portability and maintenance. In that sense, Spicy is now closer to binpac.

Example: A TFTP Analyzer

To demonstrate Spicy’s capabilities, let us use it to add a small, new analyzer to Zeek that parses TFTP—a protocol sufficiently simple to show it here end-to-end. The following code constitutes a Spicy TFTP grammar that covers the original TFTP specification from RFC 1350:

tftp.spicy

module TFTP;

type Opcode = enum { RRQ = 0x01, WRQ = 0x02, DATA = 0x03, ACK = 0x04, ERROR = 0x05 };

public type Message = unit {  # entry point for parsing
  op: uint16 &convert=Opcode($$);
  switch ( self.op ) {  # branch by message type
    Opcode::RRQ   -> rrq: Request(True);
    Opcode::WRQ   -> wrq: Request(False);
    Opcode::DATA  -> data: Data;
    Opcode::ACK   -> ack: Ack;
    Opcode::ERROR -> error: Error;
  };
};

type Request = unit(is_read: bool) { # type handling both RRQ and WRQ
  fname: bytes &until=b"\x00";
  mode:  bytes &until=b"\x00";
};

type Data = unit {
  num:  uint16;
  data: bytes &eod;
};

type Ack = unit {
  num: uint16;
};

type Error = unit {
  code: uint16;
  msg:  bytes &until=b"\x00";
};

We won’t go into details of the protocol, but the general structure of the Spicy parser is probably not too difficult to follow. It starts parsing the TFTP payload with the Message type and descends into the other PDUs from there.
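To make the wire format behind the grammar concrete, here is a rough hand-written equivalent in Python. It is only a sketch for illustration, not what Spicy generates: the names parse_message and _until_nul are made up for this example, and the code assumes one complete TFTP message per UDP datagram and handles malformed input only by raising an exception.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple, Union


class Opcode(IntEnum):
    RRQ = 1
    WRQ = 2
    DATA = 3
    ACK = 4
    ERROR = 5


@dataclass
class Request:
    is_read: bool
    fname: bytes
    mode: bytes


@dataclass
class Data:
    num: int
    data: bytes


@dataclass
class Ack:
    num: int


@dataclass
class Error:
    code: int
    msg: bytes


def _until_nul(buf: bytes, pos: int) -> Tuple[bytes, int]:
    # Hypothetical helper: return the bytes up to the next NUL byte and the
    # position just past it. Raises ValueError if there is no terminator.
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1


def parse_message(buf: bytes) -> Union[Request, Data, Ack, Error]:
    # Mirrors the Message unit: a 16-bit opcode in network byte order,
    # followed by a body whose layout depends on the opcode.
    (raw_op,) = struct.unpack_from("!H", buf, 0)
    op = Opcode(raw_op)  # raises ValueError for unknown opcodes
    if op in (Opcode.RRQ, Opcode.WRQ):
        fname, pos = _until_nul(buf, 2)
        mode, _ = _until_nul(buf, pos)
        return Request(op == Opcode.RRQ, fname, mode)
    (num,) = struct.unpack_from("!H", buf, 2)
    if op == Opcode.DATA:
        return Data(num, buf[4:])  # everything up to the end of the datagram
    if op == Opcode.ACK:
        return Ack(num)
    msg, _ = _until_nul(buf, 4)
    return Error(num, msg)
```

For instance, parse_message(b"\x00\x01rfc1350.txt\x00octet\x00") returns a Request with is_read set to True. Even this small sketch needs explicit offsets and length handling for every field, which is the kind of bookkeeping the Spicy grammar leaves to the generated parser.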

Let’s integrate this into Zeek. Spicy comes with a Zeek plugin that performs all of the hard work here; we just need to tell it a couple of things about how we want Zeek to deploy our new analyzer. The plugin offers a small configuration language for that:

tftp.evt

# Make the content of our Spicy TFTP parser available to the rest of this file.
import TFTP;

# Tell Zeek when and how to use the Spicy TFTP parser.
protocol analyzer spicy::TFTP over UDP:
    parse with TFTP::Message,  # entry point for parsing TFTP payload
    port 69/udp;               # use analyzer for sessions on UDP port 69

# Define the events we want Zeek to generate as PDUs are parsed.
on TFTP::Request if ( is_read )	
   -> event tftp::read_request($conn, $is_orig, self.fname, self.mode);

on TFTP::Request if ( ! is_read ) 
   -> event tftp::write_request($conn, $is_orig, self.fname, self.mode);

on TFTP::Data  -> event tftp::data($conn, $is_orig, self.num, self.data);
on TFTP::Ack   -> event tftp::ack($conn, $is_orig, self.num);
on TFTP::Error -> event tftp::error($conn, $is_orig, self.code, self.msg);

This configuration adds five new Zeek events, tftp::{read_request,write_request,data,ack,error}, which the plugin will raise as the analyzer encounters the corresponding messages. To do something with these events, we also need a Zeek script to accompany the new TFTP analyzer, just as any standard Zeek analyzer comes with scripts implementing some base functionality, such as creating a Zeek log file. Due to some intricacies of TFTP, the Zeek script in fact ends up being the longest part of our new analyzer. We skip showing it here; you can find tftp.zeek on GitHub.
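To give a feel for what a consumer of these events has to do, here is a toy Python sketch that folds the events of one transfer into a summary record. It is not the actual tftp.zeek logic: the class TftpTracker and its methods are hypothetical, the field names are borrowed from the log columns shown further down, and the sketch assumes a single transfer with no retransmissions and ignores the connection arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferInfo:
    # Hypothetical per-transfer state, named after the tftp.log columns.
    wrq: bool
    fname: str
    mode: str
    size: int = 0
    block_sent: int = 0
    block_acked: int = 0


class TftpTracker:
    # Toy stand-in for a set of event handlers; one method per event.

    def __init__(self) -> None:
        self.info: Optional[TransferInfo] = None

    def read_request(self, fname: bytes, mode: bytes) -> None:
        self.info = TransferInfo(False, fname.decode(), mode.decode())

    def write_request(self, fname: bytes, mode: bytes) -> None:
        self.info = TransferInfo(True, fname.decode(), mode.decode())

    def data(self, num: int, payload: bytes) -> None:
        if self.info is None:
            return  # data without a preceding request: ignore
        self.info.size += len(payload)  # assumes no retransmitted blocks
        self.info.block_sent = num

    def ack(self, num: int) -> None:
        if self.info is None:
            return
        self.info.block_acked = num
```

Feeding it a read request followed by 48 full 512-byte blocks and a final 23-byte block, each acknowledged, yields a size of 24599 with 49 blocks sent and acknowledged.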

We now have all the pieces together for our new Zeek TFTP analyzer. To put them to use, we first compile the new analyzer into object code using the spicyz tool that comes with Spicy:

$ spicyz tftp.spicy tftp.evt -o tftp.hlto

That writes the compiled analyzer into a shared library containing the executable code, tftp.hlto. Assuming we have installed the Spicy plugin for Zeek, we can then simply pass that object file to Zeek on startup. If we add tftp.zeek as well, we’ll get a log file:

$ zeek -r tftp_rrq.pcap tftp.hlto tftp
$ cat tftp.log
ts                 uid                id.orig_h      id.orig_p  id.resp_h     id.resp_p ...
1367411051.972852  CHhAvVGS1DHFjwGM9  192.168.0.253  50618      192.168.0.10  69        ...
  ... wrq  fname        mode   uid_data            size   block_sent  block_acked
  ... F    rfc1350.txt  octet  ClEkJM2Vm5giqnMf4h  24599  49          49

(tftp_rrq.pcap is a TFTP trace from Wireshark’s pcap archive.) 

That’s all. We just added a new protocol analyzer to Zeek.
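If you want to post-process the resulting log, a few lines of Python are enough. The sketch below assumes the file uses Zeek's default tab-separated ASCII format, where a "#fields" header line names the columns (the listing above abbreviates this for display); the function name read_zeek_log is made up for this example, and all values are returned as plain strings.

```python
from typing import Dict, Iterable, Iterator, List


def read_zeek_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    # Yield one dict per log entry, keyed by the names from the #fields line.
    fields: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # drop the "#fields" token itself
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


if __name__ == "__main__":
    import sys

    # Usage: python read_log.py tftp.log
    with open(sys.argv[1]) as log:
        for entry in read_zeek_log(log):
            print(entry.get("uid"), entry.get("fname"), entry.get("size"))
```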

Getting Started

We encourage you to give Spicy a try. It comes with a user manual that guides you through installation and initial steps. To get started easily, we are providing precompiled binaries for several Linux distributions and for macOS, as well as pre-built containers on Docker Hub. 

Please note that there are no stable Spicy releases yet—just a moving git master branch—so come back often. 

We would love to hear your feedback on the Spicy language & toolchain, and have several ways to get in touch:

What’s Next?

We are currently focusing on stabilizing the Spicy implementation in terms of feature set and code quality. We will also bring back some more advanced features that are still missing compared to the research prototype, such as error recovery and support for rewriting the content of network traffic at the protocol level.

We will also start developing a set of Zeek file analyzers. From Spicy’s perspective, there is not much of a difference between parsing protocols and file formats, and the Spicy plugin for Zeek already supports file analyzers as well.

We believe that Spicy has the potential to support applications beyond Zeek as well. Spicy’s parsers operate standalone, with an API that can accommodate a range of use cases. We envision Spicy eventually becoming a platform for sharing parser implementations across traditional application boundaries, to the benefit of the open source networking ecosystem in particular.
