Zeek: Reduce conn.log from 35GB to 5GB with a Simple Hook

Blocking scanners at your network edge solves one problem but creates another: Zeek will log every failed connection attempt, filling conn.log with noise from hosts you’ve already blocked.

A simple log filtering hook can eliminate this noise. Aaron Scantlin from NERSC (National Energy Research Scientific Computing Center) shared this approach during a February community discussion about Zeek customization. Here’s how it works and when to use it.

The Problem

When you use Zeek’s NetControl framework (or any blocking mechanism) to drop malicious traffic at your edge, blocked hosts don’t stop trying to connect. Your edge stops the traffic, but Zeek dutifully records every blocked attempt as a conn.log entry with connection state S0.

The result: your conn.log fills with entries for traffic you’ve already handled. If you’re running a high-traffic deployment, this can mean gigabytes of noise obscuring the actual threats you need to investigate.

The Solution

This log filtering hook prevents Zeek from logging incomplete connection attempts from known scanners:

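hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
{
    # No history recorded: nothing to evaluate, so log as usual.
    if ( ! rec?$history )
        return;

    # Anything beyond a bare SYN attempt is always logged.
    local hist = rec$history;
    if ( hist != "S" && hist != "Sr" && hist != "SW" )
        return;

    # Bare SYN attempt from a known scanner: veto the log write.
    if ( rec$id$orig_h in Scan::known_scanners )
        break;
}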

Code originally written by Justin Azoff. Shared by Aaron Scantlin (NERSC) during a Zeek Community Slack discussion.

This hook relies on Scan::known_scanners, a set populated by bro-simple-scan, a scanning detection script developed by NCSA and commonly used in Zeek deployments. When bro-simple-scan identifies a host performing scan-like behavior, it adds that IP to the known_scanners set, which this hook then uses to filter logging.

How It Works

The hook intercepts Zeek’s connection logging before it writes to conn.log.

The first check: does the connection have a history field? If not, there’s nothing to evaluate, so it returns and allows normal logging.

Next, it looks at what that history actually is. If it’s anything OTHER than “S” (a SYN from the originator and nothing else), “Sr” (a SYN answered by a reset from the responder), or “SW” (a SYN plus a zero-window advertisement from the originator), the connection has progressed beyond initial handshake attempts. That’s a real connection worth logging, so the hook returns.

Here’s what this means in practice: a connection with history “ShADadFf”—a full TCP handshake with data exchange—will always get logged. But a connection with just “S”? That’s an incomplete connection attempt.
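To make the history check concrete, here is a minimal sketch that pulls the three-way comparison into a small helper and prints the result for a few history strings. The names syn_only_histories and is_syn_only are hypothetical, chosen for illustration; they are not part of Zeek or bro-simple-scan.

```zeek
# Histories the hook treats as "handshake never completed".
const syn_only_histories: set[string] = { "S", "Sr", "SW" };

# Hypothetical helper, not part of Zeek or bro-simple-scan.
function is_syn_only(hist: string): bool
{
    return hist in syn_only_histories;
}

event zeek_init()
{
    print is_syn_only("S");         # T
    print is_syn_only("Sr");        # T
    print is_syn_only("ShADadFf");  # F
}
```

Keeping the strings in a set makes it easy to add another history value later without touching the hook body.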

The final check only runs if we’ve gotten this far, which means we’re looking at an initial handshake attempt. Is the originating IP in Scan::known_scanners? If yes, the hook breaks out of the logging policy. The connection doesn’t get logged.

Simple logic: any connection that progressed past the initial SYN always gets logged. Bare handshake attempts from known scanners get filtered out. Those are the S0 entries (and, for history “Sr”, the REJ entries) from hosts you’ve already blocked.
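If you want to try the logic without installing bro-simple-scan, a standalone variant along these lines may help. It assumes Zeek 4.0 or later (where log policy hooks are available) and swaps Scan::known_scanners for a hypothetical blocked_hosts set. The suppressed counter is also an addition for illustration, so you can see how many entries the hook would keep out of conn.log.

```zeek
@load base/protocols/conn

# Hypothetical stand-in for Scan::known_scanners so the sketch runs
# without bro-simple-scan. Addresses are documentation-range examples.
global blocked_hosts: set[addr] = { 192.0.2.10, 198.51.100.7 };

# Number of log writes the hook vetoed.
global suppressed = 0;

hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
{
    if ( ! rec?$history )
        return;

    local hist = rec$history;
    if ( hist != "S" && hist != "Sr" && hist != "SW" )
        return;

    if ( rec$id$orig_h in blocked_hosts )
    {
        ++suppressed;
        break;
    }
}

event zeek_done()
{
    print fmt("conn.log writes suppressed: %d", suppressed);
}
```

The policy hook runs once per filter attached to the stream, so the count matches the number of dropped entries only with the default single filter on conn.log.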

Prerequisites

This works best when you’re tapping both sides of your network edge. If you only monitor internal traffic, you won’t see the blocked connection attempts in the first place, so there’s nothing to filter.

You also need to be actively identifying and blocking scanners. The obvious choice here is bro-simple-scan, but any method that populates Scan::known_scanners will work.

That said, if your deployment doesn’t match these criteria, the core principle still applies. You can use log policy hooks to filter out redundant or low-value entries from any Zeek log stream.
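As one illustration of the same idea on a different stream, the sketch below vetoes dns.log entries whose query matches a site-specific pattern. It assumes Zeek 4.0 or later; the ignored_queries pattern and the example.internal domain are hypothetical placeholders, not something taken from the NERSC setup.

```zeek
@load base/protocols/dns

# Hypothetical pattern for names not worth keeping; adjust to your site.
const ignored_queries = /.*\.example\.internal/ &redef;

hook DNS::log_policy(rec: DNS::Info, id: Log::ID, filter: Log::Filter)
{
    # No query name recorded, so leave the entry alone.
    if ( ! rec?$query )
        return;

    # An exact pattern match vetoes the write to dns.log.
    if ( ignored_queries == rec$query )
        break;
}
```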

Real-World Impact

NERSC operates as an open science enclave: most of their resources are on the public internet, which makes them particularly sensitive to scanner noise. Their NetControl implementation automatically blocks hosts based on various notice categories. Before implementing this hook: 35GB of conn.log daily. After: 5-7GB. That’s a reduction of roughly 80 to 85%.

The storage savings were secondary. The real win was improved signal-to-noise ratio. When investigating potential threats, their analysts now search through 5-7GB instead of 35GB.

What’s Noisy in Your Logs?

Fewer than fifteen lines of code producing a log reduction of 80% or more isn’t magic. It’s the result of identifying a specific pain point and using Zeek’s scripting capabilities to address it.

Your noisy logs probably don’t look like NERSC’s. Maybe you’re logging internal administrative traffic you don’t need. Maybe certain protocols generate redundant entries. Maybe you’re tracking connections you’ve already categorized as benign.

The pattern is the same: identify what you don’t like about your current logging, then ask whether a log policy hook can filter it. You don’t need complex frameworks or hundreds of lines of code. Sometimes the most valuable customization is knowing what NOT to log.
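For the internal administrative traffic case, a hook might look something like the sketch below. The admin_nets set and its subnets are hypothetical, and it assumes you really do want to drop every conn.log entry where both endpoints sit in those ranges.

```zeek
@load base/protocols/conn

# Hypothetical management subnets; replace with your own.
const admin_nets: set[subnet] = { 10.10.0.0/24, 10.20.30.0/24 } &redef;

hook Conn::log_policy(rec: Conn::Info, id: Log::ID, filter: Log::Filter)
{
    # Veto only when both endpoints sit inside the admin subnets.
    if ( rec$id$orig_h in admin_nets && rec$id$resp_h in admin_nets )
        break;
}
```

Requiring both endpoints to match is a deliberately conservative choice: traffic that leaves the management ranges still gets logged.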

If you’re comfortable getting Zeek running but haven’t started customizing it, this is a good place to start: look at your logs, find what’s noisy, and write the hook that filters it out.

Author

Michelle Pathe

Michelle Pathe is the Zeek Community Liaison at Corelight. She has over 7 years of experience managing technical communities and has worked with thousands of cybersecurity, software engineering, and data science professionals.
