Introduction

This blog post walks you through adding a JPEG file analyzer to Zeek. Keep in mind that the main goal of this blog series is to “teach a person to fish,” handing out only a few small fish as bait to get started, rather than simply explaining how to add JPEG support to Zeek. While it might be nice to extract some field “XYZ” from JPEG files in this post, the real point of the series is to demonstrate a repeatable process for adding any file analyzer. Once you know that process, adding more JPEG fields with the methods we present will be relatively simple.

We will first discuss why you might want to add a custom file analyzer. Then we will give a brief overview of the File Analysis Framework (FAF) so that we can decide where our new file analyzer’s source code should go. Lastly, we will present the source code for a simple JPEG file analyzer stub. The complete source code for this article is available in this branch on GitHub:

https://github.com/keithjjones/zeek/tree/file_jpeg_1

The difference between this source code and master, at the time this post was written, can be found at:

https://github.com/keithjjones/zeek/compare/file_jpeg_master…keithjjones:file_jpeg_1

This article assumes you are familiar with C++, Zeek scripting, and compiling Zeek from source.

Why Would You Want To Do This?

Zeek ships with a number of file analyzers out of the box: analyzers that calculate a file’s entropy, extract files to a local hard drive, parse Portable Executable (PE) files, and calculate file hashes (MD5, SHA1, and SHA256). You may have an idea for a new file analyzer but not know how to add your custom code to Zeek. If so, this article should help, because the simple JPEG analyzer we build here is added the same way as those existing analyzers.

For this task you must delve into the Zeek C++ source code for the file analysis framework, located in the https://github.com/zeek/zeek/tree/master/src/file_analysis directory. We will discuss adding a JPEG file analyzer and explore the components inside Zeek that allow you to do this.

Enable Debugging

To see the debugging logs we need during development, we must enable debugging in Zeek. Do this with the following configure command before you build Zeek from source:

./configure --enable-debug

After the source has been configured, build and install Zeek with the standard compilation commands:

make
sudo make install

Next, to generate the debugging logs for the FAF, run Zeek directly with the “-B file_analysis” command line option. This option is only available because we enabled debugging with the configure command before building Zeek. You can also generate debugging messages for other portions of Zeek with the “-B” option, but that is outside the scope of this article.

At this point, create a file named “jpeg.zeek” with the following content:

event file_new(f: fa_file)
    {
    print "file_new";
    print f;
    }

We will run Zeek as follows, and you should be able to execute the same command without errors:

zeek -B file_analysis -r http.pcap jpeg.zeek

The pcap file is available at http://www.bro.org/bro-workshop-2011/solutions/logs/http.pcap; download it if you want to follow along. After running the command above, you should see a file named debug.log in your current directory. If this file exists and has content, you are ready to move on to the next section.

File Analysis Framework Overview

The file analysis framework is a collection of:

  1. File magic signatures
  2. Built-in functions (.bif)
  3. Zeek scripts
  4. C++ plugins

Each item above is important to the overall file analysis plugin creation process. The first item, file magic signatures, is used to identify files on your network. Here, the signature looks for JPEG files, which have a well-known pattern in their first three bytes (0xFF 0xD8 0xFF). Once new files are identified, they are processed by the built-in functions, Zeek scripts, and C++ plugins listed as the three remaining items above. Zeek’s process of identifying a file entering the file analysis framework can be simplified with the following diagram:

The diagram above shows that files are first identified by magic signatures, the same kind of signature you may know from the Unix “file” command. Luckily, JPEG signatures are already defined in https://github.com/zeek/zeek/blob/master/scripts/base/frameworks/files/magic/image.sig alongside the other image file types. If there is a match, Zeek passes the newly identified file to the FAF manager. The following diagram looks deeper into how the FAF accepts new file data:

Before we discuss the components of the FAF above, remember that file data is not provided to the FAF all at once, since a file traversing your network is spread across multiple network packets. Also keep in mind that portions of the file can arrive out of order. Luckily, Zeek and the FAF already include file reassembly logic to handle this.

Data is passed from Zeek to the manager (https://github.com/zeek/zeek/blob/master/src/file_analysis/Manager.h), through the File C++ class (https://github.com/zeek/zeek/blob/master/src/file_analysis/File.h), and finally to the analyzers (https://github.com/zeek/zeek/tree/master/src/file_analysis/analyzer/pe is one example). The data is transferred between these classes and functions as a stream. This means each function minimally expects a pointer to a data buffer and the length of that buffer, and that buffer will rarely contain the whole file unless the file is extremely small. Optionally, the offset of the data within the file may also be supplied to these functions.

Since the data is delivered as a stream, we never have the whole file available for analysis at once, and buffering everything until the file is complete would be unwise because we could quickly run out of memory. This is markedly different from traditional host-based forensics and presents unique software development challenges when creating a new file analysis plugin for a network traffic analysis framework like Zeek.
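To make the streaming constraint concrete, here is a minimal C++ sketch, not Zeek code: a hypothetical StreamingEntropy class whose DeliverStream() takes a pointer and a length, like the framework’s stream callbacks, but keeps only a fixed-size byte histogram instead of the file itself. It loosely follows the idea behind Zeek’s entropy analyzer; the class, its method names, and its return values are our own assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical streaming analyzer (not Zeek's entropy analyzer). It never
// stores file data, only a 256-entry histogram, so memory use stays constant
// no matter how large the file is or how many chunks it arrives in.
class StreamingEntropy {
public:
    // Same shape as a file analyzer's stream callback: a pointer and a length.
    bool DeliverStream(const unsigned char* data, std::uint64_t len)
    {
        assert(data != nullptr || len == 0);
        for (std::uint64_t i = 0; i < len; ++i)
            ++counts_[data[i]];
        total_ += len;
        return true;  // keep receiving data until the file ends
    }

    // Called once after the last chunk; returns Shannon entropy in bits/byte.
    double Finish() const
    {
        if (total_ == 0)
            return 0.0;
        double h = 0.0;
        for (std::uint64_t c : counts_) {
            if (c == 0)
                continue;
            double p = static_cast<double>(c) / static_cast<double>(total_);
            h -= p * std::log2(p);
        }
        return h;
    }

    std::uint64_t total() const { return total_; }

private:
    std::array<std::uint64_t, 256> counts_{};  // zero-initialized
    std::uint64_t total_ = 0;
};
```

Because only counts are kept, the result is the same however the file happens to be split into chunks.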

Creating The JPEG File Analysis Plugin

Now that the fundamentals are out of the way, creating a new JPEG file analysis plugin is a basic eight-step process. Each step is discussed in its own subsection below. The differences between master at the time this article was written and the changes discussed here can be viewed at:

https://github.com/keithjjones/zeek/compare/file_jpeg_master…keithjjones:file_jpeg_1

Step 1: Copy The PE Plugin

A working file analysis plugin is found in the “pe” directory. Copy this directory within the “file_analysis/analyzer” directory and name the copy “jpeg”. Next, rename the files PE.cc/PE.h to JPEG.cc/JPEG.h so your directory structure looks like the following:

Replace “pe” with “jpeg” in the *.pac file names. These are binpac (https://github.com/zeek/binpac#glossary-and-convention) files, and they define the parser for PE files (and, shortly, for JPEG files). Delete the binpac file with “idata” in its name, as it is PE specific and we will not use it for JPEG.
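If you prefer to script Step 1, the following C++17 sketch uses std::filesystem to perform the copy, renames, and deletion described above. It is a hypothetical helper, not part of Zeek, and it assumes the pe directory holds PE.cc, PE.h, and .pac files whose names start with “pe” (pe.pac, pe-analyzer.pac, and so on). It only renames files; the PE-to-JPEG edits inside them still have to be made by hand, as described in the following steps.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper sketching Step 1: copy analyzer/pe to analyzer/jpeg,
// rename PE.cc/PE.h and the pe*.pac files, and drop the PE-only idata parser.
void clone_pe_analyzer(const fs::path& analyzer_dir)
{
    const fs::path src = analyzer_dir / "pe";
    const fs::path dst = analyzer_dir / "jpeg";
    assert(!fs::exists(dst));  // do not merge into an existing jpeg directory

    fs::copy(src, dst, fs::copy_options::recursive);
    fs::rename(dst / "PE.cc", dst / "JPEG.cc");
    fs::rename(dst / "PE.h", dst / "JPEG.h");

    // Collect the .pac names first so we do not rename while iterating.
    std::vector<fs::path> pac_files;
    for (const auto& entry : fs::directory_iterator(dst))
        if (entry.path().extension() == ".pac")
            pac_files.push_back(entry.path());

    for (const auto& p : pac_files) {
        const std::string name = p.filename().string();
        if (name.find("idata") != std::string::npos)
            fs::remove(p);  // PE specific, not needed for JPEG
        else if (name.rfind("pe", 0) == 0)  // name starts with "pe"
            fs::rename(p, dst / ("jpeg" + name.substr(2)));  // pe-file.pac -> jpeg-file.pac
    }
}
```

Run from the top of a Zeek checkout, clone_pe_analyzer("src/file_analysis/analyzer") should leave the original pe directory untouched and produce the new jpeg directory.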

Step 2: Modify The CMake Files

The “CMakeLists.txt” in the “jpeg” directory should have the following content, which is based on the original PE plugin. Notice that the “PE” names have been replaced with “JPEG” for the new plugin we are creating.

include(ZeekPlugin)

include_directories(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}
                           ${CMAKE_CURRENT_BINARY_DIR})

zeek_plugin_begin(Zeek JPEG)
zeek_plugin_cc(JPEG.cc Plugin.cc)
zeek_plugin_bif(events.bif)
zeek_plugin_pac(
    jpeg.pac
    jpeg-analyzer.pac
    jpeg-file-headers.pac
    jpeg-file.pac
    jpeg-file-types.pac
)
zeek_plugin_end()

The “CMakeLists.txt” in the “analyzer” directory should have the following content so that it includes the “jpeg” directory:

add_subdirectory(data_event)
add_subdirectory(entropy)
add_subdirectory(extract)
add_subdirectory(hash)
add_subdirectory(jpeg)
add_subdirectory(pe)
add_subdirectory(unified2)
add_subdirectory(x509)

Step 3: Rename the PE Class to the JPEG Class

While renaming the class, remove the logic so that it becomes a stub for the logic we would like to add later. In “JPEG.h”, the content of your new file will be:

#pragma once

#include <string>

#include "Val.h"
#include "../File.h"
#include "jpeg_pac.h"

namespace file_analysis {

/**
 * Analyze JPEG files.
 */
class JPEG : public file_analysis::Analyzer {
public:
    ~JPEG();

    // Factory used by the plugin component registered in Plugin.cc.
    static file_analysis::Analyzer* Instantiate(RecordVal* args, File* file)
        { return new JPEG(args, file); }

    // Receives the file's data in order, one chunk at a time.
    virtual bool DeliverStream(const u_char* data, uint64_t len);

    virtual bool EndOfFile();

protected:
    JPEG(RecordVal* args, File* file);

    binpac::JPEG::File* interp;          // binpac flow that parses the data
    binpac::JPEG::MockConnection* conn;  // holds the parser's "done" state
    bool done;
};

} // namespace file_analysis



“JPEG.cc” should look like the following:



#include "JPEG.h"
#include "file_analysis/Manager.h"

using namespace file_analysis;

JPEG::JPEG(RecordVal* args, File* file)
    : file_analysis::Analyzer(file_mgr->GetComponentTag("JPEG"), args, file)
    {
    conn = new binpac::JPEG::MockConnection(this);
    interp = new binpac::JPEG::File(conn);
    done = false;

    // Tell the script layer a JPEG was found. QueueEventFast() does not check
    // for handlers itself, so only queue the event if a script handles it.
    if ( file_jpeg )
        mgr.QueueEventFast(file_jpeg, {
            GetFile()->GetVal()->Ref()
        });
    }

JPEG::~JPEG()
    {
    delete interp;
    delete conn;
    }

bool JPEG::DeliverStream(const u_char* data, uint64_t len)
    {
    // Once the binpac parser has marked itself done, accept no more data.
    if ( conn->is_done() )
        return false;

    try
        {
        interp->NewData(data, data + len);
        }
    catch ( const binpac::Exception& e )
        {
        return false;
        }

    // Returning false tells the framework this analyzer is finished.
    return ! conn->is_done();
    }

bool JPEG::EndOfFile()
    {
    return false;
    }

The binpac files may feel convoluted, so an explanation is in order. The binpac file “jpeg.pac” is the entry point. Its content is the following:

%include binpac.pac
%include bro.pac

analyzer JPEG withcontext {
    connection: MockConnection;
    flow:       File;
};

connection MockConnection(bro_analyzer: BroFileAnalyzer) {
    upflow   = File;
    downflow = File;
};

%include jpeg-file.pac

flow File {
    flowunit = JPEG_File withcontext(connection, this);
}

%include jpeg-analyzer.pac

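This sets up an analyzer called “JPEG” whose connection type is “MockConnection” and whose flow type is “File”. Both directions (upflow and downflow) of MockConnection use the File flow, and each unit the File flow parses is a “JPEG_File” structure, defined in the included file “jpeg-file.pac”. The content of “jpeg-file.pac” is: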



%include jpeg-file-types.pac
%include jpeg-file-headers.pac

# The base record for a JPEG file
type JPEG_File = case $context.connection.is_done() of {
    false -> JPEG    : JPEG_Image;
    true  -> overlay : bytestring &length=1 &transient;
};

type JPEG_Image = record {
    headers : Headers;
    pad     : Padding(padlen);
} &let {
    padlen: uint64 = 100;
} &byteorder=bigendian;

refine connection MockConnection += {

    %member{
        bool done_;
    %}

    %init{
        done_ = false;
    %}

    function mark_done(): bool
        %{
        done_ = true;
        return true;
        %}

    function is_done(): bool
        %{
        return done_;
        %}
};
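The interplay between mark_done(), is_done(), and JPEG::DeliverStream() can be modeled in plain C++. The sketch below is hypothetical and is not what binpac generates: it collects the four header bytes (soi and app), marks a mock connection done once they are complete, and from then on reports that it wants no more data. That is roughly the effect the JPEG_File case on is_done() and the “! conn->is_done()” return value have in the real plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the binpac MockConnection's done flag.
class MockConnection {
public:
    void mark_done() { done_ = true; }
    bool is_done() const { return done_; }

private:
    bool done_ = false;
};

// Hypothetical header-only parser: reads soi (2 bytes) + app (2 bytes), then stops.
class HeaderOnlyParser {
public:
    static constexpr std::size_t kHeaderLen = 4;

    bool DeliverStream(const unsigned char* data, std::uint64_t len)
    {
        if (conn_.is_done())
            return false;  // like the "true -> overlay" branch: data is ignored
        assert(data != nullptr || len == 0);

        // The header may be split across chunks, so copy only what is missing.
        for (std::uint64_t i = 0; i < len && seen_ < kHeaderLen; ++i)
            header_[seen_++] = data[i];

        if (seen_ == kHeaderLen)
            conn_.mark_done();  // like proc: bool = $context.connection.mark_done();

        return !conn_.is_done();
    }

    bool done() const { return conn_.is_done(); }
    const unsigned char* header() const { return header_; }

private:
    MockConnection conn_;
    unsigned char header_[kHeaderLen] = {};
    std::size_t seen_ = 0;
};
```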

The included “jpeg-file-types.pac” has the following content:

# The BinPAC padding type doesn’t work here.
type Padding(length: uint64) = record {
    pad: bytestring &length=length &transient;
};

This type is used by “JPEG_Image” in “jpeg-file.pac”, and it was carried over from the PE plugin. The other included file, “jpeg-file-headers.pac”, defines the header structure and has the following content:

type Headers = record {
    jpeg_header : JPEG_Header;
} &let {
    # Do not care about parsing rest of the file so mark done now …
    proc: bool = $context.connection.mark_done();
};

type JPEG_Header = record {
    soi : bytestring &length=2;
    app : bytestring &length=2;
};

Most of the code is copied from the PE version of the same file, with the headers shortened to the first four bytes of the file: a two-byte “soi” field and a two-byte “app” field, which together begin with the three bytes expected at the start of every JPEG file (https://www.media.mit.edu/pia/Research/deepview/exif.html or https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format). An example JPEG from this pcap trace looks like the following:

Notice that the first two bytes are “FFD8”. These two bytes form the first field, “soi”, the “Start of Image” marker. The next field, “app”, is the application marker: 0xFF followed by a byte that identifies the application segment (for example, 0xE0 for APP0 as used by JFIF, or 0xE1 for APP1 as used by Exif). The “soi” is always 0xFFD8, and the “app” always starts with 0xFF for a JPEG image. Additional JPEG structures will be discussed in the second article.
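The same check can be written as a small stand-alone C++ function, which may help when reasoning about what the binpac JPEG_Header record reads. This is a hypothetical sketch, not code from the plugin: it combines the soi and app bytes into big-endian 16-bit values (the binpac record keeps them as raw two-byte strings) and applies the two rules above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical mirror of the binpac JPEG_Header record.
struct JpegHeader {
    std::uint16_t soi;  // Start of Image marker, always 0xFFD8
    std::uint16_t app;  // next marker, e.g. 0xFFE0 (APP0/JFIF) or 0xFFE1 (APP1/Exif)
};

// Returns the header if the first four bytes look like a JPEG, else nullopt.
std::optional<JpegHeader> parse_jpeg_header(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    if (len < 4)
        return std::nullopt;  // not enough bytes yet

    // Markers are stored high byte first in the file.
    JpegHeader h{
        static_cast<std::uint16_t>((data[0] << 8) | data[1]),
        static_cast<std::uint16_t>((data[2] << 8) | data[3]),
    };

    // Rules from the text: soi must be 0xFFD8 and app must start with 0xFF.
    if (h.soi != 0xFFD8 || (h.app >> 8) != 0xFF)
        return std::nullopt;
    return h;
}
```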

The last binpac file is named “jpeg-analyzer.pac” and has the following content:

%extern{
#include "Event.h"
#include "file_analysis/File.h"
#include "events.bif.h"
%}

%header{
%}

%code{
%}

refine flow File += {

    function proc_jpeg_header(h: JPEG_Header): bool
        %{
        DBG_LOG(DBG_FILE_ANALYSIS, "TRYING TO PROCESS A JPEG!!!");

        // Only do extra work if a script handles the file_jpeg event.
        if ( file_jpeg )
            {
            DBG_LOG(DBG_FILE_ANALYSIS, "PROCESSING A JPEG!!!");
            }

        return true;
        %}
};

refine typeattr JPEG_Header += &let {
    proc : bool = $context.flow.proc_jpeg_header(this);
};

The binpac code above adds a flow function, “proc_jpeg_header”, and the “refine typeattr” block calls it through a &let field each time a “JPEG_Header” record is parsed. For now the function only writes debugging messages with “DBG_LOG”, but additional logic could be added here to parse more JPEG attributes if desired.

Step 4: Define The Plugin

Next, edit Plugin.cc to contain the following, which registers the new “JPEG” file analysis component:

// See the file in the main distribution directory for copyright.

#include "plugin/Plugin.h"

#include "JPEG.h"

namespace plugin {
namespace Zeek_JPEG {

class Plugin : public plugin::Plugin {
public:
    plugin::Configuration Configure()
        {
        // Register the analyzer under the name "JPEG" with its factory.
        AddComponent(new ::file_analysis::Component("JPEG",
                                                    ::file_analysis::JPEG::Instantiate));

        plugin::Configuration config;
        config.name = "Zeek::JPEG";
        config.description = "JPEG analyzer";
        return config;
        }
} plugin;

}
}

The code above registers our C++ analyzer stub as a file analysis component named “JPEG”, which Zeek scripts can refer to as Files::ANALYZER_JPEG. The script that uses it comes in Step 6.
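Conceptually, AddComponent() associates the name “JPEG” with the factory function JPEG::Instantiate, which is why the JPEG constructor can stay protected. The following C++ sketch models that idea with a hypothetical ComponentRegistry class; it is a simplification for illustration, not Zeek’s plugin::Component or file_analysis::Manager API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal analyzer interface.
struct Analyzer {
    virtual ~Analyzer() = default;
    virtual std::string Name() const = 0;
};

class JpegStub : public Analyzer {
public:
    // Like JPEG::Instantiate: the only way to create the analyzer.
    static Analyzer* Instantiate() { return new JpegStub(); }
    std::string Name() const override { return "JPEG"; }

protected:
    JpegStub() = default;
};

// Hypothetical registry mapping component names to factory functions.
class ComponentRegistry {
public:
    using Factory = std::function<Analyzer*()>;

    void AddComponent(const std::string& name, Factory f)
    {
        assert(f);  // a component needs a factory
        factories_[name] = std::move(f);
    }

    // The caller only knows the name, not the concrete class.
    std::unique_ptr<Analyzer> Instantiate(const std::string& name) const
    {
        auto it = factories_.find(name);
        if (it == factories_.end())
            return nullptr;
        return std::unique_ptr<Analyzer>(it->second());
    }

private:
    std::map<std::string, Factory> factories_;
};
```

The design point is decoupling: the framework can create “JPEG” analyzers on demand without including JPEG.h or calling its constructor directly.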

Step 5: Create A New Event

Open “events.bif” and make sure it matches the following:

## This event is generated each time file analysis identifies
## a jpeg file.
##
## f: The file.
##
event file_jpeg%(f: fa_file%);

This creates a single new event called “file_jpeg” that we will use later. During the build, CMake runs Zeek’s bifcl tool on this file to generate the corresponding C++ code (including “events.bif.h”, which “jpeg-analyzer.pac” includes), so declaring the event here is all it takes to create it!

Step 6: Add the Zeek Scripts for JPEG Handling

Next, create a subdirectory called “jpeg” inside the file analysis script directory at https://github.com/zeek/zeek/tree/master/scripts/base/files. This directory needs two files. The first, named “__load__.zeek”, contains a single line:

@load ./main




The other file is named “main.zeek” and contains the following script, and is very similar to the PE script:

module JPEG;

export {
    redef enum Log::ID += { LOG };

    type Info: record {
        ## Current timestamp.
        ts: time &log;
        ## File id of this JPEG file.
        id: string &log;
    };

    ## Event for accessing logged records.
    global log_jpeg: event(rec: Info);

    ## A hook that gets called when we first see a JPEG file.
    global set_file: hook(f: fa_file);
}

redef record fa_file += {
    jpeg: Info &optional;
};

const jpeg_mime_types = { "image/jpeg" };

event zeek_init() &priority=5
    {
    Files::register_for_mime_types(Files::ANALYZER_JPEG, jpeg_mime_types);
    Log::create_stream(LOG, [$columns=Info, $ev=log_jpeg, $path="jpeg"]);
    }

hook set_file(f: fa_file) &priority=5
    {
    if ( ! f?$jpeg )
        f$jpeg = [$ts=network_time(), $id=f$id];
    }

event file_jpeg(f: fa_file) &priority=5
    {
    hook set_file(f);
    }

event file_state_remove(f: fa_file) &priority=-5
    {
    if ( f?$jpeg )
        Log::write(LOG, f$jpeg);
    }

This script sets up logging and attaches our new JPEG analyzer to any file whose inferred MIME type is “image/jpeg”. Like the PE script, it lives under scripts/base/files; note that Zeek loads those directories through explicit @load lines in scripts/base/init-default.zeek, so you may need to add “@load base/files/jpeg” there next to the existing “@load base/files/pe”. Right now, the JPEG plugin simply logs the JPEG file’s ID and a timestamp. Additional logic will be added to this plugin later, but this code lets us confirm that our new plugin works after we compile.
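The MIME-type registration in zeek_init() can be pictured as a lookup from MIME type to analyzers. The C++ sketch below is a hypothetical model of that behavior, not the implementation behind Files::register_for_mime_types(); the analyzer names are plain strings chosen to echo Files::ANALYZER_JPEG and Files::ANALYZER_PE.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model: analyzers register interest in MIME types, and a file
// whose magic-signature match yields one of those types gets them attached.
class MimeRegistry {
public:
    void register_for_mime_types(const std::string& analyzer,
                                 const std::set<std::string>& mime_types)
    {
        assert(!analyzer.empty());
        for (const auto& m : mime_types)
            by_mime_[m].insert(analyzer);
    }

    // Analyzers to attach to a file with the given inferred MIME type.
    std::set<std::string> analyzers_for(const std::string& mime) const
    {
        auto it = by_mime_.find(mime);
        return it == by_mime_.end() ? std::set<std::string>{} : it->second;
    }

private:
    std::map<std::string, std::set<std::string>> by_mime_;
};
```

Registering by MIME type keeps the analyzer independent of signature details: any file the magic signatures classify as “image/jpeg” gets the JPEG analyzer.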

Step 7: Use The New Event

Make your “jpeg.zeek” script content the following.

event file_jpeg(f: fa_file)
    {
    print "file_jpeg";
    print f$jpeg;
    }

The script prints the JPEG information record that “main.zeek” attached to the file.

Step 8: Compile And Test

After every substantial change you will want to compile and test your changes. The following commands will accomplish this:

make
sudo make install
zeek -B file_analysis -r http.pcap jpeg.zeek

As long as you haven’t run “make clean” recently, “make” only rebuilds the portions of Zeek that have changed, which substantially reduces compile time.

After execution, you should see lines in your “debug.log” file similar to the following:

1320279566.886920/1574191165.328912 [file_analysis] [FFTf9Zdgk3YkfCKo3] Add analyzer JPEG

If you open your “files.log” file, you will see “JPEG” listed in the analyzers for each JPEG file, but not for files that are not JPEGs. This shows that the JPEG file analyzer we created is attached to the JPEG files Zeek processes. You should also see the debugging lines that demonstrate the binpac parsing:

1320279566.886920/1574292315.388565 [file_analysis] TRYING TO PROCESS A JPEG!!!
1320279566.886920/1574292315.388569 [file_analysis] PROCESSING A JPEG!!!
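If you rebuild and rerun often, a tiny checker for debug.log can save some scrolling. The following C++ helper is hypothetical and relies only on the “Add analyzer JPEG” line format shown above; it returns the file ID from such a line, which you could then look up in files.log.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not part of Zeek). Based only on the line format
//   <ts>/<ts> [file_analysis] [<file id>] Add analyzer JPEG
// it returns the file id when a line records the JPEG analyzer being added.
std::optional<std::string> jpeg_analyzer_file_id(std::string line)
{
    // Tolerate trailing line-ending characters.
    while (!line.empty() && (line.back() == '\r' || line.back() == '\n'))
        line.pop_back();

    const std::string prefix = "[file_analysis] [";
    const std::string suffix = "] Add analyzer JPEG";

    auto start = line.find(prefix);
    if (start == std::string::npos)
        return std::nullopt;
    start += prefix.size();

    // The suffix must end the line, so e.g. "Add analyzer JPEGX" is rejected.
    auto end = line.find(suffix, start);
    if (end == std::string::npos || end + suffix.size() != line.size())
        return std::nullopt;

    return line.substr(start, end - start);
}
```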

The build-and-test cycle you learned in this article is the iterative process we will follow as we improve our JPEG file analyzer in the next article.

Conclusion

This article walked you through the process of enabling debugging in Zeek, copying a working plugin, and modifying that plugin so it would become a new JPEG file analysis plugin stub. The source code created for this article is available in the GitHub branch at https://github.com/keithjjones/zeek/tree/file_jpeg_1. The next article will address adding additional logic to our JPEG file analyzer along with the types of data our analyzer will output to the rest of Zeek.

————-

About Keith J. Jones, Ph.D


Dr. Jones is an internationally recognized industry expert with over two decades of experience in cyber security, incident response, and computer forensics. His expertise includes software development, innovative prototyping, information security consulting, application security, malware analysis & reverse engineering, software analysis/design and image/video/audio analysis.

Dr. Jones holds undergraduate degrees in Electrical Engineering and Computer Engineering from Michigan State University. He also earned a Master of Science degree in Electrical Engineering from MSU. Dr. Jones completed his Ph.D. in Cyber Operations at Dakota State University in 2019.
 