We are happy to announce the release of Spicy 1.0, an open source parser generator that makes it much easier for Zeek (and other applications) to support new protocols and file formats. We made an initial, experimental version of Spicy available a while ago. Since then, we have spent quite a bit of time making it more robust and easier to install, adding more features, and writing a few initial analyzers for Zeek as well. Many thanks to everybody who has given Spicy a try so far; your feedback has been extremely valuable (keep it coming!).

What is Spicy?

Spicy is a parser generator that makes it easy to create robust C++ parsers for network protocols, file formats, and more. Spicy is a bit like a “yacc for protocols”, but it’s much more than that: It’s a self-contained programming environment enabling developers to describe both syntax and semantics of a format through a single, unified language. Think of Spicy as a domain-specific scripting language for all your parsing needs.

Deep packet inspection systems (Zeek, but more generally also firewalls, intrusion detection systems, inline virus scanners, proxies, etc.) must process large volumes of wire-format network data in real time, and from untrusted sources. As they work their way from raw packets upward through the network stack, they collect semantic information from a variety of protocols, regularly going deep into the application layer to extract, e.g., the bodies of HTTP sessions or attachments from emails. Many of these systems also look beyond the network level and mine file content (documents, images, executables, and archives) for high-level context. For developers of these systems, such rich analyses mean writing a large number of individual parsers for the potpourri of protocols and file formats that today’s networks carry. As anybody who has worked in this space can attest, implementing such parsers almost invariably turns into a daunting task, even for relatively simple protocols. Not only is it time-consuming and cumbersome, it also poses fundamental security challenges, because real-world network traffic regularly fails to follow standards and RFCs, whether inadvertently or potentially maliciously. For Zeek, the time and effort it takes to implement new parsers has become one of the primary bottlenecks for advancing its capabilities: support for hardly any new protocols or file formats has made it into the distribution in recent years.

Spicy drastically lowers the bar for creating parsers through a domain-specific language tailored to describing protocols and file formats. It comes with a compiler toolchain that turns code written in that language into robust C++ parsing code ready for integration into host applications, either just-in-time at startup or precompiled ahead of time. With Spicy, creating new parsers becomes akin to writing policy scripts in Zeek: while certainly not trivial, it’s a task now accessible to a much broader set of people than those able to write safe C/C++ code.

Spicy’s parsers provide a simple C++ API to their host applications for feeding them data and retrieving results. Using a streaming model, they process their input fully incrementally as it comes in, without blocking and without needing to buffer the input data as a whole. A Spicy parser can hence work concurrently on many inputs of arbitrary size, with Spicy’s runtime library hiding the low-level details behind the scenes. While we haven’t yet explored integration into applications other than Zeek in much depth, we did present a proof-of-concept Wireshark integration at SharkFest last year that turned Spicy parsers into Wireshark dissector plugins.
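
To make the streaming model concrete, here’s a minimal sketch in plain Python (not Spicy’s generated code, and not its C++ API) of a parser for a sequence of NUL-terminated fields. It accepts input in arbitrary chunks through a hypothetical feed() method, keeps only the bytes of the field it is currently inside, and reports each completed field through a callback. The class and method names are made up for illustration; Spicy’s generated parsers handle this kind of resumption for you.

```python
from typing import Callable, List


class NullTerminatedFieldParser:
    """Hypothetical incremental parser for a stream of NUL-terminated fields.

    feed() accepts input in arbitrary chunks; between calls the parser keeps
    only the bytes of the field it is currently in the middle of.
    """

    def __init__(self, on_field: Callable[[bytes], None]) -> None:
        self._on_field = on_field
        self._partial = bytearray()

    def feed(self, chunk: bytes) -> None:
        """Consume one chunk of input; may be called any number of times."""
        start = 0
        while True:
            end = chunk.find(b"\x00", start)
            if end == -1:
                # No delimiter yet: stash the tail and wait for more data.
                self._partial += chunk[start:]
                return
            # Delimiter found: the field is complete (delimiter excluded).
            self._partial += chunk[start:end]
            self._on_field(bytes(self._partial))
            self._partial.clear()
            start = end + 1

    @property
    def pending(self) -> int:
        """Number of buffered bytes belonging to an unfinished field."""
        return len(self._partial)


# Usage: the same bytes arrive split at arbitrary points, as on a TCP stream.
fields: List[bytes] = []
parser = NullTerminatedFieldParser(fields.append)
for chunk in (b"rfc13", b"50.txt\x00oc", b"tet\x00", b"partial"):
    parser.feed(chunk)

print(fields)          # [b'rfc1350.txt', b'octet']
print(parser.pending)  # 7
```

Because feed() never looks back at earlier chunks, this sketch’s memory use depends on the longest unfinished field, not on the total size of the input.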

Example

The following shows a simple example of Spicy code parsing one of the most basic packet formats, a TFTP read request:

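module TFTP;

public type ReadRequest = unit {
  opcode:   uint16;                # 2-byte opcode (1 = read request), network byte order by default
  filename: bytes &until=b"\x00";  # parse data until null byte found; the delimiter is consumed, not kept
  mode:     bytes &until=b"\x00";  # transfer mode, e.g. "octet"

  on %done { print self; }         # once fully parsed, output the fields
};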

Using Spicy’s just-in-time compiler spicy-driver, we can turn this source code into executable code and feed it input data, all in one step:

# printf '\000\001rfc1350.txt\000octet\000' | spicy-driver tftp.spicy
[$opcode=1, $filename=b"rfc1350.txt", $mode=b"octet"]

The printf crafts the contents of a fake TFTP read request packet and pipes them into the Spicy parser. The output comes from the print statement inside the %done hook, showing the values that have just been parsed. In a more realistic setup, one wouldn’t actually print the parsed information, but instead pass it on to the host application for further processing. With Zeek, we’d generate an event to send into script-land; more on that below. This TFTP example is simplified, of course; take a look at the full TFTP parser in Spicy’s tutorial if you’d like to see more. The spicy-driver tool used here is just a thin wrapper around the same C++ API that other host applications use as well.
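
For comparison, here’s roughly what the ReadRequest unit spells out, written by hand as a non-incremental Python sketch: read a 2-byte opcode in network byte order (Spicy’s default for integers), then two fields that each extend up to a NUL byte, which &until consumes but does not include in the value. parse_read_request() and its error handling are illustrative only. The input is the same 20 bytes that the printf above produces.

```python
import struct
from typing import NamedTuple, Tuple


class ReadRequest(NamedTuple):
    opcode: int
    filename: bytes
    mode: bytes


def _until_nul(data: bytes, offset: int) -> Tuple[bytes, int]:
    """Return the bytes before the next NUL and the offset just past it.

    Mirrors Spicy's &until: the delimiter is consumed but not kept.
    """
    end = data.find(b"\x00", offset)
    if end == -1:
        raise ValueError("missing NUL terminator")
    return data[offset:end], end + 1


def parse_read_request(data: bytes) -> ReadRequest:
    """Hypothetical hand-written equivalent of the TFTP::ReadRequest unit."""
    if len(data) < 2:
        raise ValueError("truncated opcode")
    (opcode,) = struct.unpack_from("!H", data, 0)  # uint16, big-endian
    filename, offset = _until_nul(data, 2)
    mode, _ = _until_nul(data, offset)
    return ReadRequest(opcode, filename, mode)


packet = b"\x00\x01rfc1350.txt\x00octet\x00"  # same bytes as the printf input
print(len(packet))  # 20
print(parse_read_request(packet))
# ReadRequest(opcode=1, filename=b'rfc1350.txt', mode=b'octet')
```

Even this tiny format needs explicit length and terminator checks; in the Spicy version, the generated parser performs those checks itself.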

Zeek Analyzers

While Spicy itself remains independent of Zeek, we have developed a Zeek plugin that makes all of its functionality available for implementing new protocol, file, and packet analyzers for Zeek—without having to write a single line of C++ code. For example, the following configuration instructs the plugin to raise a Zeek event for every TFTP request parsed with the Spicy code above, passing along the current connection record and direction as well:

on TFTP::ReadRequest -> event tftp::read_request($conn, $is_orig, self.filename, self.mode);
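
The configuration line above effectively binds “unit fully parsed” to “call this event handler with these arguments”. As a toy model of that idea only, here’s a Python sketch: EventBindings, on(), unit_done(), and the dictionary standing in for Zeek’s connection record are all hypothetical, and the plugin’s actual glue code is generated for you and is not shown here. The sketch simply passes the parsed byte values through unchanged.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a builder turns (connection, direction, unit) into
# handler arguments; a binding pairs a builder with a handler.
ArgBuilder = Callable[[Dict[str, Any], bool, Any], Tuple[Any, ...]]
Binding = Tuple[ArgBuilder, Callable[..., None]]


class EventBindings:
    """Toy model of 'on <unit> -> event <name>(<args>)' dispatch."""

    def __init__(self) -> None:
        self._by_unit: Dict[str, List[Binding]] = {}

    def on(self, unit_type: str, build_args: ArgBuilder,
           handler: Callable[..., None]) -> None:
        """Register a handler to run whenever unit_type finishes parsing."""
        self._by_unit.setdefault(unit_type, []).append((build_args, handler))

    def unit_done(self, unit_type: str, conn: Dict[str, Any],
                  is_orig: bool, unit: Any) -> None:
        """Invoke all handlers bound to unit_type (cf. Spicy's %done hook)."""
        for build_args, handler in self._by_unit.get(unit_type, []):
            handler(*build_args(conn, is_orig, unit))


# Usage: a stand-in for the Zeek event handler that just records its arguments.
received: List[Tuple[Any, ...]] = []


def tftp_read_request(c: Dict[str, Any], is_orig: bool,
                      filename: bytes, mode: bytes) -> None:
    received.append((c["id"], is_orig, filename, mode))


bindings = EventBindings()
# Mirrors: on TFTP::ReadRequest -> event tftp::read_request(
#              $conn, $is_orig, self.filename, self.mode);
bindings.on(
    "TFTP::ReadRequest",
    lambda conn, is_orig, unit: (conn, is_orig, unit.filename, unit.mode),
    tftp_read_request,
)

request = SimpleNamespace(filename=b"rfc1350.txt", mode=b"octet")
bindings.unit_done("TFTP::ReadRequest", {"id": "hypothetical-conn-id"}, True, request)
print(received)
# [('hypothetical-conn-id', True, b'rfc1350.txt', b'octet')]
```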

We can then write a Zeek event handler to process the event:

event tftp::read_request(c: connection, is_orig: bool, filename: string, mode: string)
	{
	print "TFTP request", c$id, is_orig, filename, mode;
	}

You can find the full Zeek integration for TFTP in the Spicy tutorial.

The experience with Spicy so far confirms that the system substantially lowers the bar for adding new, robust parsing capabilities to Zeek. Compared to Zeek’s traditional approach of writing analyzers with BinPAC, Spicy not only cuts down the time from idea to implementation, but also relieves developers from worrying about the low-level efficiency and safety of their code. Those who have worked with BinPAC before shouldn’t find the Spicy learning curve too steep either: they will recognize a similar declarative approach to defining parsers, with BinPAC’s most cumbersome aspects removed from the picture.

We have started collecting an initial set of Spicy-based analyzers for Zeek, contributed by community members. The collection can be installed through the Zeek package manager and includes: new protocol analyzers for IPsec, OpenVPN, and WireGuard; a PNG file analyzer; an extended Portable Executable analyzer that also extracts linker information; and replacements for some of Zeek’s built-in analyzers (DHCP, DNS, HTTP). Besides adding new functionality to Zeek, these analyzers also provide real-world examples of parsing complex formats with Spicy.

Getting started

The easiest way to install Spicy is to pick one of the pre-built packages for Linux or macOS and then add the Zeek support through the Zeek package manager. To use the collection of Spicy-based Zeek analyzers, install the spicy-analyzers package as well. Alternatively, you can start out by pulling our Docker image, which comes with everything pre-installed, including Zeek and the analyzers; or you can build everything from source on the platform of your choice. Once installed, work through the Getting Started section of the documentation to get a feel for the tools and the language.

Learning more

Spicy comes with an extensive manual that describes installation and usage, including, in particular, a tutorial on writing parsers and integrating them into Zeek.

We’d love to hear how Spicy is working out for you. If you have questions or thoughts, join the #spicy channel on the Zeek Slack. If you see any problems, please file a ticket on GitHub. And if you end up developing a new analyzer for Zeek, consider contributing it to the spicy-analyzers repository; just file a pull request. 

If you are interested in integrating Spicy into applications other than Zeek, check out the manual’s section on custom host applications. Feel free to ask any questions on this as well. 

What’s next?

Our goal is to make Spicy Zeek’s standard approach for developing new analyzers. To that end, we plan to focus on two areas next: improving the code generator further to optimize its output for performance and robustness, and extending the Spicy language with further capabilities, including automatic error recovery when a parser encounters unexpected input. And of course, we’ll also keep writing Zeek analyzers!
