
I recently tried my hand at writing my first protocol analyzer for Zeek. This is something I’ve wanted to accomplish since I first learned about Zeek. I recall trying to concatenate all the strings from tcp_contents() and parse application-layer data using string splits and other hacks from scriptland. That was definitely not a good solution, and, needless to say, I failed to write any parsers this way.

Recently, I have been reading about RDP and noticed that Zeek has an RDP analyzer for TCP but not for UDP, so I decided to try to build one. I had heard (probably from Seth) that the syslog analyzer is a good example because of its simplicity, so I started by reading through it and trying to make sense of it. I had also asked on the mailing list for a good place to start, and was pointed to binpac_quickstart. I ran binpac_quickstart in a temporary directory to see which files it created and what they contained, and then compared them to the syslog analyzer. By changing things in the syslog analyzer, such as the component name or the methods the analyzer defines, and recompiling Zeek, I slowly came to understand how the different files that make up an analyzer tie together.

Looking at other existing analyzers was also very useful. The Kerberos analyzer supports both TCP and UDP connections so it, too, was a good example to study for my purposes. I knew that RDPEUDP (the Microsoft name for RDP over UDP) carried TLS records as its payload, so I looked at the openvpn analyzer prototype as an example of something that passed data to the ssl analyzer. Unfortunately, I wasn’t able to get the RDPEUDP analyzer developed enough to pass payloads to the ssl analyzer. RDPEUDP will require a reassembler, similar to the one the TCP analyzer uses, as the protocol supports fragmentation and unordered delivery.
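To illustrate why reordering support has to come before anything can be handed to the ssl analyzer, here is a minimal C++ sketch of a reorder buffer. It is not Zeek’s reassembler and not part of the RDPEUDP analyzer; the class name `ReorderBuffer` is hypothetical, and the sketch assumes each datagram carries a 32-bit sequence number that increases by one per datagram. It holds out-of-order payloads and releases them only once they are contiguous, so a consumer could concatenate them into an ordered byte stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical reorder buffer (not Zeek's Reassembler class).
// Assumes one sequence number per datagram, increasing by one each time.
class ReorderBuffer {
public:
    explicit ReorderBuffer(uint32_t first_seq) : next_seq_(first_seq) {}

    // Store a datagram's payload under its sequence number and return every
    // payload that is now contiguous with what was already delivered, in order.
    std::vector<std::string> Add(uint32_t seq, const std::string& payload) {
        std::vector<std::string> ready;
        if (seq < next_seq_)  // already delivered: drop the duplicate
            return ready;
        pending_.emplace(seq, payload);  // keeps the first copy if duplicated

        // Drain the buffer for as long as the next expected datagram is present.
        auto it = pending_.find(next_seq_);
        while (it != pending_.end()) {
            ready.push_back(it->second);
            pending_.erase(it);
            ++next_seq_;
            it = pending_.find(next_seq_);
        }
        return ready;
    }

    // Number of datagrams waiting for a gap to be filled.
    std::size_t Pending() const { return pending_.size(); }

private:
    uint32_t next_seq_;                         // next sequence number to deliver
    std::map<uint32_t, std::string> pending_;   // out-of-order payloads
};
```

For example, if datagram 1 arrives before datagram 0, `Add(1, ...)` returns nothing, and the later `Add(0, ...)` returns both payloads in order. A real implementation would also need to handle sequence-number wraparound, bound the amount of buffered data, and cope with gaps that are never filled.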

After reading through existing analyzers and the files generated by binpac_quickstart, I understood their contents well enough to try modifying the existing plugin that provided the TCP RDP analyzer to add UDP support. Most of the C++ code that makes up an analyzer is boilerplate, such as registering a new analyzer component, or can be pieced together from other examples. The logic in *-protocol.pac and *-analyzer.pac, however, is specific to each analyzer. The *-protocol.pac file is where protocol-specific types and structures are defined. The *-analyzer.pac file is where the logic lives for “doing stuff” with those structures as the protocol proceeds.
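The split between the two files can be loosely illustrated in plain C++. The sketch below is an analogy, not binpac syntax and not the code binpac generates: one function turns raw bytes into a structure (the *-protocol.pac role), and another acts on that structure (the *-analyzer.pac role). It assumes the 8-byte RDPUDP_FEC_HEADER layout from Microsoft’s MS-RDPEUDP specification (a 32-bit snSourceAck, a 16-bit uReceiveWindowSize, and a 16-bit uFlags, all big-endian) and the flag values SYN = 0x0001 and ACK = 0x0004; check both against the specification. The names `FecHeader`, `ParseFecHeader`, and `IsConnectionRequest` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed layout of MS-RDPEUDP's RDPUDP_FEC_HEADER (8 bytes, big-endian).
struct FecHeader {
    uint32_t source_ack;        // snSourceAck
    uint16_t recv_window_size;  // uReceiveWindowSize
    uint16_t flags;             // uFlags
};

// Assumed flag values; the constant names are hypothetical.
constexpr uint16_t kFlagSyn = 0x0001;
constexpr uint16_t kFlagAck = 0x0004;

// "Protocol" role: turn raw bytes into a structure.
std::optional<FecHeader> ParseFecHeader(const uint8_t* data, std::size_t len) {
    if (len < 8)
        return std::nullopt;  // too short to hold the header
    FecHeader h;
    h.source_ack = (uint32_t(data[0]) << 24) | (uint32_t(data[1]) << 16) |
                   (uint32_t(data[2]) << 8) | uint32_t(data[3]);
    h.recv_window_size = uint16_t((data[4] << 8) | data[5]);
    h.flags = uint16_t((data[6] << 8) | data[7]);
    return h;
}

// "Analyzer" role: act on the parsed structure, here by spotting a
// connection request (SYN set, ACK not set).
bool IsConnectionRequest(const FecHeader& h) {
    return (h.flags & kFlagSyn) && !(h.flags & kFlagAck);
}
```

Keeping parsing and decision logic in separate functions mirrors the idea that structures are defined once and the logic that reacts to them can change independently.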

Pac files are the source files for binpac. Binpac can be difficult to understand, and I don’t claim to fully get it. I found it effective but tedious to troubleshoot. In its defense, binpac has ambitious goals and delivers on them. My development workflow included constantly commenting out code, adding printf statements in different places, and recompiling Zeek. When compiling Zeek, I found the Ninja build generator indispensable.

RDPEUDP is quite a complex protocol, similar to TCP in many respects. While the analyzer I started is nowhere near complete, I hope it serves as a starting point for collaboration. I’d like to end this post with a list of things I learned during my protocol analyzer struggles:

  • Zeek has a style guide.
  • The Zeek Slack channel is super helpful. Thanks to everyone who put up with my whining and provided me with tips and pointers.
  • btest is not as scary as I imagined.
  • Pac files are compiled to C++ and then included as C++ header files when the C++ source is compiled. This means that someone wrote Python code (binpac_quickstart), which writes pac code, which compiles to C++, which then compiles to the Zeek binary’s machine instructions. All the different layers of abstraction are cool to think about.
  • Binpac’s DSL is nowhere near as feature-rich as Zeek’s scripting language. In fact, everything you need to know can be found in existing analyzers or the binpac README. However, don’t get too attached to binpac: its replacement, Spicy, is already in development.

Ultimately, adding a new protocol analyzer to Zeek was hard. However, I agree with Julia Evans when she encourages people to find and take on hard projects.

About Anthony Kasza

Anthony Kasza is a member of the Research Team at Corelight. At Corelight, Anthony is responsible for developing prototypes that provide insights into network activity. Prior to working at Corelight, Anthony was responsible for discovering new and tracking known threats, creating scalable classification systems, producing and operationalizing threat intelligence, and researching malware communication protocols.