Why your dashboard is green, but the server is dead

A tutorial on Event-Driven Monitoring with eBPF

Nov 19, 2025

When a Linux server crashes, the logs often show nothing. The system freezes, you reboot it, and the problem disappears.

This happens because of how standard monitoring tools work. They use a method called Polling.

I have been managing Linux systems for 15 years. I have seen this pattern too many times. We need to stop guessing.

This post explains why polling fails, what eBPF is, and how engineers are using it to debug “invisible” system failures.

The Problem: Polling vs. Tracing

Most tools we use today (like top, htop, or standard monitoring agents) use Polling.

How Polling Works
Every few seconds, the tool wakes up. It reads files under /proc (a virtual filesystem where the kernel exposes process and system state). It calculates the numbers, prints them, and goes back to sleep.
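The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how top or any particular agent is implemented. The function names and the 5-second interval are my own assumptions. It reads the aggregate "cpu" line of /proc/stat, whose first eight fields are user, nice, system, idle, iowait, irq, softirq, and steal.

```python
import time


def parse_cpu_line(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Use only the first eight counters; guest time is already
            # included in user/nice, so adding it would double count.
            fields = [int(x) for x in line.split()[1:9]]
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
            total = sum(fields)
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_percent(before, after):
    """CPU utilisation between two (busy, total) samples, as a percentage."""
    busy_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    return 100.0 * busy_delta / total_delta if total_delta else 0.0


def poll_forever(interval_s=5.0, path="/proc/stat"):
    """Hypothetical polling loop: sample, sleep, sample again, print the average."""
    with open(path) as f:
        prev = parse_cpu_line(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            cur = parse_cpu_line(f.read())
        print(f"cpu: {cpu_percent(prev, cur):.1f}%")
        prev = cur
```

Note that the printed number is an average over the whole interval. Anything that starts and finishes between two samples only shows up as a small bump in that average.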

The Flaw
Imagine you check your front door every minute.

09:00:00 - You check. The door is closed.
09:00:30 - Someone opens the door, walks in, and closes it.
09:01:00 - You check. The door is closed.

To you, the door never opened. You missed the event because it happened between your checks.

The “Invisible” Crash
This is why polling tools miss bursts of short-lived processes, the pattern behind a “Fork Bomb” (a process that keeps spawning copies of itself).

A bad script starts 100 new processes.
These processes run for only 10 milliseconds.
They consume all CPU power.
They exit.

By the time top refreshes a few seconds later, the processes are gone. The CPU usage, averaged over the whole interval, looks normal. But the server was stalled for those 10 milliseconds.

If this happens repeatedly, the server becomes unresponsive, but your dashboard shows everything is green.
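To make the gap concrete, here is a small simulation of the scenario above. It is only a sketch: the start time of the burst (2 seconds) and the poll times are made-up values, and a real poller would read /proc rather than a Python list.

```python
def visible_at(poll_time, processes):
    """Count processes that are alive at the instant of a poll."""
    return sum(1 for start, end in processes if start <= poll_time < end)


# 100 processes start at t = 2.000 s and each runs for 10 ms.
burst = [(2.000, 2.010) for _ in range(100)]

# A poller that samples every 5 seconds.
poll_times = [0.0, 5.0, 10.0]
seen_by_polling = [visible_at(t, burst) for t in poll_times]

# A tracer receives one event per process creation, whenever it happens.
seen_by_tracing = len(burst)

print("polling saw:", seen_by_polling)
print("tracing saw:", seen_by_tracing, "fork events")
```

The poller reports zero processes at every sample, while an event-driven tracer would have received all 100 creation events.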

The Solution: Tracing with eBPF

To fix this, we stop Polling and start Tracing.

To understand the solution, we must understand the technology: eBPF.

Think of the Linux Operating System as a secure building.

User Space: This is where your applications (Python, Nginx, Database) live. They are tenants.
Kernel Space: This is the maintenance room. It controls power, water, and doors (Memory, CPU, Network).

Usually, you cannot go into the maintenance room. If you want to know what is happening, you have to ask the building manager.

eBPF allows us to install a programmable camera inside the maintenance room.

Technically, eBPF is a “Virtual Machine” inside the Linux Kernel. It lets us run very small, safe programs directly in Kernel Space.

It is Safe: Before a program is loaded, the kernel verifier analyzes it and rejects anything that could crash or hang the system, such as unbounded loops or out-of-bounds memory access.
It is Event-Driven: It does not wait to be asked. When a file is opened, or a packet arrives, the eBPF program runs instantly.

Real Scenarios

We attach small eBPF programs to specific “Tracepoints” in the Linux Kernel. A Tracepoint is a hook in the code where we can listen for events.

Scenario A: Detecting the Fork Bomb
Instead of counting processes every 5 seconds, we attach an eBPF program to the sched_process_fork event.

Standard Tool: “I count 150 processes right now.”
eBPF Program: “Process ID 500 just created a child. Process ID 500 just created another child.”

If Process 500 creates 50 children in one second, the eBPF program sees 50 individual events. It can calculate the rate instantly and trigger an alert before the system runs out of resources.
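The rate calculation itself is simple once the events arrive one by one. Below is a minimal user-space sketch in Python, assuming something (for example an eBPF program attached to sched_process_fork) delivers a timestamp and a parent PID for every fork. The class name, the threshold of 50, and the 1-second window are illustrative assumptions, not part of any library.

```python
from collections import defaultdict, deque


class ForkRateDetector:
    """Sliding-window counter of fork events per parent PID (hypothetical helper)."""

    def __init__(self, max_forks=50, window_s=1.0):
        self.max_forks = max_forks
        self.window_s = window_s
        self.events = defaultdict(deque)  # parent pid -> timestamps of recent forks

    def on_fork(self, timestamp_s, parent_pid):
        """Record one fork event; return True once the parent reaches the threshold."""
        recent = self.events[parent_pid]
        recent.append(timestamp_s)
        # Drop events that have fallen out of the window.
        while recent and timestamp_s - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) >= self.max_forks


detector = ForkRateDetector(max_forks=50, window_s=1.0)
for i in range(50):
    # Process 500 forks every 20 ms.
    if detector.on_fork(i * 0.02, parent_pid=500):
        print(f"ALERT: pid 500 forked 50 times within 1 second (event {i + 1})")
```

In production you would likely do the counting inside the kernel with an eBPF map to keep overhead low. The logic stays the same.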

Scenario B: Catching Memory Leaks Early
Memory leaks are often slow. A process might add 1MB of data every hour.

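Standard Tool: Checks total memory against a threshold. It only alerts you when memory is, say, 99% full. At that point, the Linux “OOM Killer” runs and kills the process with the highest oom_score (typically the one using the most memory, which may not be the one that is leaking) to save the system.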
eBPF Program: We trace memory allocation events (malloc calls in user space via uprobes, or page faults in the kernel). We calculate the slope (the rate of growth). If a process grows consistently for 10 minutes, we know it is a leak pattern. We can alert the engineer when memory is at 60%, giving them hours to fix it.
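The slope check can be sketched as a least-squares fit over (time, memory) samples. This is a simplified illustration in Python. The function names, the 600-second minimum duration, and the "never shrinks" rule are my own assumptions about what "grows consistently" could mean. A real detector would need to tolerate noise.

```python
def growth_slope(samples):
    """Least-squares slope of (time_s, bytes) samples, in bytes per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def looks_like_leak(samples, min_duration_s=600, min_slope=0.0):
    """Flag memory that never shrinks and keeps growing for at least min_duration_s."""
    if len(samples) < 2:
        return False
    duration = samples[-1][0] - samples[0][0]
    never_shrinks = all(b[1] >= a[1] for a, b in zip(samples, samples[1:]))
    return (
        duration >= min_duration_s
        and never_shrinks
        and growth_slope(samples) > min_slope
    )


# One sample per minute for 10 minutes, growing by roughly 1 MB per hour.
samples = [(i * 60, 100_000_000 + i * 17_476) for i in range(11)]
print(f"slope: {growth_slope(samples):.1f} bytes/s, leak: {looks_like_leak(samples)}")
```

With the slope in hand, dividing the remaining free memory by it gives a rough estimate of how long you have before the system runs out.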

Summary

If you want to build reliable systems, you must stop guessing what happened between the samples.

Polling answers: “What is the state right now?”
Tracing (eBPF) answers: “What changed?”

By moving from checking the dashboard to watching the engine directly, we can detect failures that were previously invisible.

How to Start Learning eBPF

If you want to try this yourself, you do not need to write raw Kernel code. Several open-source projects make it easy:

1. BCC (BPF Compiler Collection)
A toolkit for creating efficient kernel tracing and manipulation programs.

2. bpftrace
A high-level tracing language for Linux eBPF (inspired by awk and C).

# Example: Count syscalls by process name
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

3. Aya (Rust)
If you prefer writing in Rust, Aya allows you to build eBPF programs with type safety.

In future posts, I will write more about how to implement these concepts in production environments.

Parth’s Substack
