The eBPF Verifier Errors Project

The eBPF verifier is a vital part of the eBPF stack. The verifier checks all eBPF bytecode that is loaded into the kernel:

Courtesy of Mohammed Aboullaite

This ensures that the bytecode is valid, that it doesn’t access potentially unsafe memory (like dereferencing null pointers), and that all programs terminate. This is vital for safeguarding the kernel against common issues.

The only problem? The verifier output can be quite hard to understand. Consider, for example, the following verifier error log, which only lists the bytecode:

libbpf: prog 'enterOpenat2': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
;  @ <stdin>:15
0: (b7) r1 = 0                        ; R1_w=0
;  @ <stdin>:16
1: (7b) *(u64 *)(r10 -8) = r1         ; R1_w=0 R10=fp0 fp-8_w=0
2: (7b) *(u64 *)(r10 -16) = r1        ; R1_w=0 R10=fp0 fp-16_w=0
3: (bf) r6 = r10                      ; R6_w=fp0 R10=fp0
4: (07) r6 += -16                     ; R6_w=fp-16
;  @ <stdin>:17
5: (bf) r1 = r6                       ; R1_w=fp-16 R6_w=fp-16
6: (b7) r2 = 16                       ; R2_w=16
7: (85) call bpf_get_current_comm#16          ; R0_w=scalar() fp-8_w=mmmmmmmm fp-16_w=mmmmmmmm
;  @ <stdin>:18
8: (18) r1 = 0xffff95b5c6d05800       ; R1_w=map_ptr(map=map,ks=16,vs=4)
10: (bf) r2 = r6                      ; R2_w=fp-16 R6_w=fp-16
11: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,map=map,ks=16,vs=4)
;  @ <stdin>:19
12: (61) r1 = *(u32 *)(r0 +0)
R0 invalid mem access 'map_value_or_null'
processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
-- END PROG LOAD LOG --

It was produced for the following program:

struct {
    __uint (type, BPF_MAP_TYPE_HASH);
    __uint (key_size, sizeof(u8[16]));
    __uint (value_size, sizeof(u32));
    __uint (max_entries, 256);
} map SEC(".maps");

// ...

u32 *counter = bpf_map_lookup_elem(&map, &comm);
*(counter) = (*(counter)) + 1;

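It’s not entirely obvious what the error is and how to resolve it, especially for people new to eBPF. (Here, bpf_map_lookup_elem returns NULL if the key is not in the map, and the program dereferences the result without checking for NULL first; guarding the increment with if (counter) fixes it.)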

This is why I’m starting the eBPF Verifier Errors Project:

A project that aims to collect as many eBPF verifier logs as possible, each with the related source code, a description of the issue, and how to resolve it. This creates a database where people can look up verifier errors, which could also serve as a data source for developer tooling (like AI chatbots).

The errors are collected in the form of issues in the ebpf-verifier-errors repository on GitHub. You can obtain a list on the command line via:

curl -s -H "Accept: application/vnd.github+json" \
     "https://api.github.com/repos/parttimenerd/ebpf-verifier-errors/issues?labels=submission" \
     | jq -r '.[] | "\(.title)\n\(.body)\n-----"'

How to Submit

Please consider submitting your verifier errors today. It’s not hard; just fill out the issue form, and you’re done:

Any submission counts. We use GitHub issues because it makes submitting as easy as possible. No need to clone the repository and submit a pull request. Feel free to submit C, Rust, and Java code. If you find a submission that can be improved, just add a comment.

If you’re unsure what a submission should look like, there are already five submissions by Dylan Reimerink and me, and possibly many more by the time you read this:

Conclusion

Verifier errors can be cryptic and frustrating, but we can change that. Let’s make them less scary and enrich the eBPF ecosystem. I’m happy that Dylan of ebpf-docs fame has agreed to join this worthy cause.

I pledge to contribute all eBPF verifier errors that I encounter, and I look forward to your contributions.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Hello eBPF: Writing a Lottery Scheduler in Pure Java with bpf_for_each Support (18)

Welcome back to my hello-ebpf series. Last week, we learned about lottery schedulers and how to implement one with sched-ext and hello-ebpf. But there was one major issue when writing the scheduler in Java: hello-ebpf lacked support for the bpf_for_each macro. This is what this week’s short blog post is all about.

To loop over all elements in the scheduling queue and pick a random one, we had to write inline C code. A simplified version without scheduling priorities looks like the following:

@Override
public void dispatch(int cpu, Ptr<TaskDefinitions.task_struct> prev) {
    String CODE = """
        // pick a random task struct index
        s32 random = bpf_get_prandom_u32() % 
            scx_bpf_dsq_nr_queued(SHARED_DSQ_ID);
        struct task_struct *p = NULL;
        bpf_for_each(scx_dsq, p, SHARED_DSQ_ID, 0) {
            // iterate till we get to this number
            random = random - 1;
            // and try to dispatch it
            if (random <= 0 && 
              tryDispatching(BPF_FOR_EACH_ITER, p, cpu)) {
                return 0;
            }
        };
        return 0;
        """;
}

This surely works, but inline C code is not the most readable and writing it is error-prone. I’ve written on this topic in my post Hello eBPF: Write your eBPF application in Pure Java (12) before. So what was keeping us from supporting the bpf_for_each macro? Support for lambdas as arguments to built-in bpf functions.

Lottery Scheduler in Pure Java

May I introduce to you the new support for directly passed lambda expressions: It lets us define a bpf_for_each_dsq function and write a simple lottery scheduler in pure Java:

Continue reading

Hello eBPF: Writing a Lottery Scheduler in Java with sched-ext (17)

Welcome back to my hello-ebpf series. Last week, I showed you how to create a custom Linux scheduler with a simple web API to control the task scheduling using sched-ext. Just so you know: A talk based on this blog post got accepted in the Testing and Continuous Delivery dev room at FOSDEM 2025. But this is not what I’m going to talk about in this blog post; instead, this article is on a topic that I covered almost exactly eleven years ago on this very blog: Lottery scheduling. We’ll extend the taskcontrol project covered in last week’s post with a new scheduler.

Foundations and Retrospective

Eleven years ago, I was a student in Professor Frank Bellosa’s operating systems course, learning about different schedulers. I wrote a blog post on a scheduler that I found particularly interesting: the lottery scheduler. I could write a new introduction to lottery scheduling, but I can also delegate this task to my old self:

I currently have a course covering operating systems at university. In this course, we learn several scheduling algorithms. An operating system needs this kind of algorithm to determine which process to run at which time, allowing processes to be executed “simultaneously” (from the user’s view).

Good scheduling algorithms usually have some very nice properties:

  • They avoid starving a process (i.e., a process that can’t run at all and therefore makes no progress).
  • They ensure that every process runs for the appropriate percentage of time (defined by its priority) in each larger time interval.

But they may be useful not only for operating systems but also for people like me. Perhaps they could help me improve the scheduling of my learning.

An algorithm that seems best suited for this purpose is called lottery scheduling (pdf). It gives each process some lottery tickets (their number reflects the priority of the process). Every time the algorithm has to choose a new process to run, it simply picks a lottery ticket randomly and returns the process that owns the ticket. A useful addition (in scenarios with only a few tickets) is to temporarily remove picked tickets from the lottery pot and put them all back once the pot is empty.

Real life practice in lottery scheduling

We can visualize this similarly to the FIFO scheduler from last week’s blog post:

Visualization of lottery scheduling with sched-ext

This is a truly random scheduler, which makes it well suited for testing applications with random scheduling orders and a great addition to the taskcontrol project.
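The ticket mechanism from the quoted description is easy to model outside the kernel. The following is a minimal user-space sketch in Java, not the sched-ext scheduler from this post; the class name LotteryScheduler, the process names, and the ticket counts are purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class LotteryScheduler {
    private final Map<String, Integer> ticketsPerProcess; // tickets ~ priority
    private final List<String> pot = new ArrayList<>();   // one entry per ticket
    private final Random random;

    public LotteryScheduler(Map<String, Integer> ticketsPerProcess, long seed) {
        this.ticketsPerProcess = Map.copyOf(ticketsPerProcess);
        this.random = new Random(seed);
        refill();
    }

    /** Put all tickets of all processes (back) into the pot. */
    private void refill() {
        pot.clear();
        ticketsPerProcess.forEach((process, count) -> {
            for (int i = 0; i < count; i++) {
                pot.add(process);
            }
        });
    }

    /** Draw a random ticket, keep it out of the pot, and return its owner. */
    public String next() {
        if (pot.isEmpty()) {
            refill(); // every ticket was drawn once: start a new round
        }
        if (pot.isEmpty()) {
            throw new IllegalStateException("no process holds any tickets");
        }
        return pot.remove(random.nextInt(pot.size()));
    }

    public static void main(String[] args) {
        LotteryScheduler scheduler =
            new LotteryScheduler(Map.of("editor", 3, "backup", 1), 42);
        // The first four draws empty the pot, so "editor" is picked exactly
        // three times and "backup" exactly once, in a random order.
        for (int i = 0; i < 4; i++) {
            System.out.println(scheduler.next());
        }
    }
}
```

Removing drawn tickets until the pot is empty bounds how long a low-priority process can wait: within each round of four draws, "backup" runs exactly once, whereas independent draws could skip it for many rounds by chance.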

Continue reading

Hello eBPF: Control task scheduling with a custom scheduler written in Java (16)

Welcome back to my hello-ebpf series. A few weeks ago, I showed you how to write a minimal Linux scheduler in both Java and C using the new Linux scheduler extensions. Before we start: Linux 6.12, the first kernel release with sched-ext support, has finally been released, so the next Ubuntu version will probably support writing custom Linux schedulers out of the box.

In this week’s blog post, I’ll show you how to use a custom scheduler to control the execution of tasks and processes. The main idea is that I want to be able to tell a task (a single thread of a process) or a task group (usually a process) to stop for a few seconds. This can be helpful for testing the behaviour of programs in scheduling edge cases, for example, when a consumer thread isn’t scheduled as often as it normally would be.

Of course, we can achieve this without a custom Linux scheduler: We can just send the task a POSIX SIGSTOP signal and a SIGCONT signal when we want to continue it. But where is the fun in that? Also, some applications (I’m looking at you, OpenJDK) don’t like it when you send signals to them, and they will behave strangely, as they can observe the signals.

Idea

I showed before that we can write schedulers ourselves, so why not use them for this case? The main idea is to only schedule processes in the scheduler that are allowed to be scheduled, according to a task (and task group) settings BPF map:

This is essentially our minimal scheduler, with one slight modification that I’ll show you later.

And yes, the tasks are not stopped immediately, but since the scheduling intervals are at most 5ms, the delay is small.

I implemented all this in the taskcontrol project, which you can find on GitHub. There, you’ll also find information on the required dependencies and on how to install a 6.12 kernel if you’re on Ubuntu 24.10.
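To make the idea of only dispatching allowed tasks concrete, here is a hypothetical user-space model in Java. It assumes a single global FIFO queue and uses two plain sets as stand-ins for the task and task-group settings BPF maps; it is not the taskcontrol code and does not include the modification mentioned above:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class TaskControlSketch {

    /** A task (thread) and the task group (process) it belongs to. */
    public record Task(int pid, int tgid) {}

    private final Deque<Task> queue = new ArrayDeque<>();       // global FIFO
    private final Set<Integer> stoppedTasks = new HashSet<>();  // stand-in for task settings map
    private final Set<Integer> stoppedGroups = new HashSet<>(); // stand-in for group settings map

    public void enqueue(Task task) { queue.addLast(task); }
    public void stopTask(int pid) { stoppedTasks.add(pid); }
    public void resumeTask(int pid) { stoppedTasks.remove(pid); }
    public void stopGroup(int tgid) { stoppedGroups.add(tgid); }
    public void resumeGroup(int tgid) { stoppedGroups.remove(tgid); }

    private boolean allowed(Task task) {
        return !stoppedTasks.contains(task.pid())
            && !stoppedGroups.contains(task.tgid());
    }

    /**
     * Returns the first task that may run, or null if none may. Stopped
     * tasks stay queued (moved to the tail) so they run again once resumed.
     */
    public Task dispatch() {
        for (int i = 0, n = queue.size(); i < n; i++) {
            Task task = queue.pollFirst();
            if (allowed(task)) {
                return task;
            }
            queue.addLast(task);
        }
        return null;
    }

    public static void main(String[] args) {
        TaskControlSketch control = new TaskControlSketch();
        control.enqueue(new Task(1, 1));
        control.enqueue(new Task(2, 2));
        control.stopGroup(2);
        System.out.println(control.dispatch()); // Task[pid=1, tgid=1]
        System.out.println(control.dispatch()); // null: task 2's group is stopped
    }
}
```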

Continue reading

A Minimal Scheduler with eBPF, sched_ext and C

The following is a tutorial written for the sched_ext project that introduces you to writing a Linux scheduler directly in C. The tutorial is also available on the sched_ext scx wiki.

This is a short tutorial for creating a minimal scheduler with sched_ext in C. The scheduler uses a global scheduling queue from which every CPU gets its tasks to run for a time slice. The scheduling order is First-In-First-Out, so it essentially implements a round-robin scheduler:

Round Robin Diagram
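The queueing behaviour described above can be simulated in a few lines of Java. This is a hypothetical single-CPU simulation with made-up task durations, not sched_ext code: the CPU takes the task at the head of the global queue, runs it for one slice, and puts it back at the tail if it still has work left:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RoundRobinSketch {

    /** A task with some remaining CPU time (hypothetical, for simulation only). */
    public static final class Task {
        final String name;
        int remainingMs;

        public Task(String name, int remainingMs) {
            this.name = name;
            this.remainingMs = remainingMs;
        }
    }

    /** Simulates one CPU and returns the order in which tasks got a time slice. */
    public static List<String> run(List<Task> tasks, int sliceMs) {
        Deque<Task> globalQueue = new ArrayDeque<>(tasks); // FIFO, head = next task
        List<String> order = new ArrayList<>();
        while (!globalQueue.isEmpty()) {
            Task task = globalQueue.pollFirst();            // CPU takes the head
            order.add(task.name);
            task.remainingMs -= Math.min(sliceMs, task.remainingMs); // run one slice
            if (task.remainingMs > 0) {
                globalQueue.addLast(task);                  // back to the tail
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(new Task("A", 10), new Task("B", 5), new Task("C", 12));
        System.out.println(run(tasks, 5)); // [A, B, C, A, C, C]
    }
}
```

Every task gets a turn before any task gets a second one, which is exactly the round-robin property the FIFO order provides.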

This short tutorial covers the basics; to learn more, visit the resources from the scx wiki.

Continue reading

The slow Death of the onjcmd Debugger Feature

Almost exactly a year ago, I published my blog post called Level-up your Java Debugging Skills with on-demand Debugging. In this blog post, I wrote about multiple rarely known and rarely used features of the Java debugging agent, including the onjcmd feature. To quote my own blog post:

JCmd triggered debugging

There are often cases where the code that you want to debug is executed later in your program’s run or after a specific issue appears. So don’t waste time running the debugging session from the start of your program, but use the onjcmd=y option to tell the JDWP agent to delay the debugging session until it is triggered via jcmd.

A similar feature had long existed in the SAPJVM. In 2019, Christoph Langer from SAP decided to add it to the OpenJDK, where it was implemented in JDK 12 and has been there ever since.

The alternative to using this feature is to start the debugging session at the beginning and only connect to the JDWP agent when you want to start debugging. But this was, for a time, significantly slower than using the onjcmd feature (source):

Continue reading

Mapping Java Thread Ids to OS Thread Ids

This week, a short blog post on a question that bothered me: How can I get the operating system’s thread ID for a given Java thread? This is useful when you want to deal with Java threads using native code (foreshadowing another blog post). The question has been asked countless times on the internet, but I couldn’t find a comprehensive collection, so here’s my take. But first, some background:

Background

In Java, normal threads are mapped 1:1 to operating system threads. This is not the case for virtual threads, because many virtual threads are multiplexed onto fewer carrier threads, but we ignore virtual threads here for simplicity.

But what is an operating system thread? An operating system thread is an operating system task that shares the address space (and more) with other thread tasks of the same process/thread group. The main thread is the thread group leader; its operating system ID is the same as the process ID.

Be aware that the Java thread ID is not related to the operating system ID but rather to the Java thread creation order. Now, what different options do we have to translate between the two?

Different Options

During my research, I found three different mechanisms:

  1. Using the gettid() method
  2. Using JFR
  3. Parsing thread dumps

In the end, I found that option 3 is best; you’ll see why in the following.
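As a rough sketch of option 3, the following Java class extracts the Java thread ID and the OS thread ID (nid) from thread-dump header lines, such as those printed by jcmd <pid> Thread.print. It assumes the HotSpot header format "name" #<id> ... nid=<id>; the exact format differs between JDK versions (older ones print nid in hex), so the pattern handles both and should be treated as an assumption, not a stable API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadDumpParser {

    // Assumed header format of a thread dump entry, e.g.
    //   "main" #1 prio=5 os_prio=0 tid=0x00007f... nid=0x3039 runnable          (hex nid)
    //   "main" #1 [12345] prio=5 ... tid=0x00007f... nid=12345 runnable         (decimal nid)
    // JVM-internal threads such as "GC Thread#0" typically lack "#<id>" and are skipped.
    private static final Pattern HEADER = Pattern.compile(
        "^\".*\" #(?<javaId>\\d+)\\b.*?\\bnid=(?<nid>0x[0-9a-fA-F]+|\\d+)");

    /** Maps each Java thread ID found in the dump to its OS thread ID. */
    public static Map<Long, Long> javaToOsIds(String threadDump) {
        Map<Long, Long> result = new LinkedHashMap<>();
        for (String line : threadDump.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            if (!m.find()) {
                continue; // stack frames, lock info, JVM-internal threads
            }
            String nid = m.group("nid");
            long osId = nid.startsWith("0x")
                ? Long.parseLong(nid.substring(2), 16)
                : Long.parseLong(nid);
            result.put(Long.parseLong(m.group("javaId")), osId);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"main\" #1 prio=5 os_prio=0 tid=0x00007f0c8c02a000 nid=0x3039 runnable";
        System.out.println(javaToOsIds(line)); // {1=12345}
    }
}
```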

Continue reading

Hello eBPF: Writing a Linux scheduler in Java with eBPF (15)

Or: Learn how to write a performant* Linux scheduler in 25 lines of Java code.

Welcome back to my series on ebpf. In the last post, I presented a recording of my JavaZone presentation on eBPF and a list of helpful resources for learning about the topic. Today, I’ll show you how to write a Linux scheduler in Java with eBPF. This blog post accompanies my eBPF Summit keynote of the same title:

With my newest hello-ebpf addition, you can create a Linux scheduler by just implementing the methods of the Scheduler interface, allowing you to write a small scheduler with ease:

Is it really as easy as that? Of course not, at least not yet. Developing and running this scheduler requires a slightly modified version of hello-ebpf, which lives in the branch scx_demo, and a kernel patched with the sched-ext extension or a CachyOS instance with a 6.10 kernel, as well as some luck because it’s still slightly brittle.

Nonetheless, when you get it working, you can enter the wondrous world of people who build their schedulers with eBPF. You can find some of them on the sched-ext Slack and many of their creations in the sched-ext/scx repository on GitHub. The kernel patches will hopefully be merged into the mainline kernel soon and will be available with version 6.12.

Continue reading

Hello eBPF: Collection of Resources for eBPF (14.5)

Welcome back to my series on ebpf. In the last post, I told you how to build a Firewall with Java and eBPF. I have given, and will give, a talk on this very topic at multiple conferences, as mentioned at the end of the post. Last week, I gave it, together with Mohammed Aboullaite, at one of my favorite Java conferences: JavaZone. One of the reasons I recommend this conference to every upcoming speaker is that they create really good recordings of every talk and upload them to Vimeo almost immediately. So here is the recording of the talk:

You can find the slides here.

As a courtesy to the listener, I created a list of resources on eBPF that helped me a lot:

Main

Continue reading

Hello eBPF: Building a Lightning Fast Firewall with Java & eBPF (14)

Welcome back to my series on ebpf. In the last post, I told you how TC and XDP hooks allow us to monitor incoming and outgoing packets. This week, we’re extending this to build a firewall in Java, including a small Spring-Boot-based web frontend, with hello-ebpf:

Before I start, here is a disclaimer: The details of eBPF are hard, so I could only get the filtering of incoming packets to work reliably. Can I still call it a firewall? I would say yes, but please help me filter the outgoing packets if you disagree. Also, it’s my first Spring-Boot-based application, so please don’t judge it too harshly. Lastly, we only focus on IPv4 packets, so adding support for IPv6 rules is left to the reader.

Is it fast? Probably. I didn’t do any measurements myself, but research by Cloudflare suggests that XDP is far faster at dropping packets than the standard firewall.

Continue reading

Hello eBPF: A Packet Logger in Pure Java using TC and XDP Hooks (13)

Welcome back to my series on ebpf. In the last post, I told you about writing eBPF applications in pure Java using my new Java compiler plugin. This week, we’re extending the XDP example from last week (and Hello eBPF: XDP-based Packet Filter (9)) to also capture outgoing packets using a Traffic Control (TC) classifier.

Before we go into the details, first, the demo of the PacketLogger:

The logger captures the incoming and outgoing IP packets with their IP address, their protocol (TCP, UDP, OTHER), the TCP/UDP port, and the packet length. But before I show you how I implemented all this in Java, here is a short introduction to the Linux network stack:

Continue reading

Hello eBPF: Write your eBPF application in Pure Java (12)

Welcome back to my series on ebpf. In the last post, I told you about BTF and generating Java classes for all BPF types. This week, we’re using these classes to write a simple packet blocker in pure Java. This is the culmination of my efforts that started in my post Hello eBPF: Generating C Code (8), to reduce the amount of C code that you have to write to create your eBPF application.

This blog post again took longer than expected, but you’ll soon see why. And I dropped libbcc support along the way.

After my last blog post, you still had to write the eBPF methods in a String embedded in the Java application. So if you wanted to write a simple XDP-based packet blocker that blocks every third incoming packet, you wrote the actual XDP logic into a String-typed field named EBPF_PROGRAM. But we can already define the data types and global variables in Java, generating C code automatically. Can we do the same for the remaining C code? We can now. Introducing the new Java compiler plugin, which allows you to write the above in “pure” Java, using Java as a DSL for C (GitHub):

@BPF(license = "GPL") // define a license
public abstract class XDPDropEveryThirdPacket 
  extends BPFProgram implements XDPHook {
    
    // declare the global variable
    final GlobalVariable<@Unsigned Integer> count = 
        new GlobalVariable<>(0);

    @BPFFunction
    public boolean shouldDrop() {
        return count.get() % 3 == 1;
    }

    @Override // defined in XDPHook, compiled to C
    public xdp_action xdpHandlePacket(Ptr<xdp_md> ctx) {
        // update count
        count.set(count.get() + 1);
        // drop based on count
        return shouldDrop() ? xdp_action.XDP_DROP : xdp_action.XDP_PASS;
    }

    public static void main(String[] args) 
      throws InterruptedException {
        try (XDPDropEveryThirdPacket program = 
             BPFProgram.load(XDPDropEveryThirdPacket.class)) {
            program.xdpAttach(XDPUtil.getNetworkInterfaceIndex());
            while (true) {
                System.out.println("Packet count " + 
                                   program.count.get());
                Thread.sleep(1000);
            }
        }
    }
}
Continue reading

Hello eBPF: BPF Type Format and 13 Thousand Generated Java Classes (11)

Welcome back to my series on ebpf. In the last post, we learned how to use global variables to communicate easily between user and kernel land. In this post, you’ll learn about the BPF Type Format (BTF) and how and why we generate Java code from it.

We start with a simple question: What is BTF?

VMLinux Header

In all BPF programs that we’ve written in this blog series, we included a specific header:

#include "vmlinux.h"

This header contains all of the fundamental types and definitions we need when writing our BPF programs. It contains simple definitions like the integer types used in many of the examples:

Continue reading

Wait you can place Java annotations there?

I’ve been working too much on other stuff to find time to blog, so here is a tiny post.

Java annotations are pretty nice: You can annotate many things to add more information. For example, you can add @Nullable to a type to tell static analyzers or IDEs that a value of this type might actually be null:

public @Nullable String parse(String description) {
  ...
  return error ? null : result;
}

There are many other uses, especially for adding information needed for code generation. While working on hello-ebpf, I used annotations and generated code containing annotations with JavaPoet. When we generate the code from above with JavaPoet, it produces:

public java.lang. @Nullable String parse(
  java.lang.String description) {
  // ...
}

But how could this be valid Java? I expected

public @Nullable java.lang.String parse(
  java.lang.String description) {
  // ...
}

but not the former. Let’s look into the language specification. Section 4.3 defines class types, as used in fields and other type usages, as follows:

ClassType:
  {Annotation} TypeIdentifier [TypeArguments]
  PackageName . {Annotation} TypeIdentifier [TypeArguments]
  ClassOrInterfaceType . {Annotation} TypeIdentifier [TypeArguments] 

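According to the specification, java.lang. @Nullable String is the way to annotate the type String itself. Writing @Nullable java.lang.String only compiles if @Nullable can also be used as a declaration annotation, in which case it annotates the method rather than the type; for an annotation that is only applicable to type uses (@Target(ElementType.TYPE_USE)), it is a compile-time error, because the annotation would be closest to the package name java (§9.7.4).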

It gets even weirder with arrays:

java.lang. @Nullable Integer @Nullable [] arr @Nullable []

This declares a two-dimensional array of Integers in which the Integer elements, the inner arrays, and the outer array may all be null. This is consistent with the language specification:

ArrayType:
  PrimitiveType Dims
  ClassOrInterfaceType Dims
  TypeVariable Dims
Dims:
  {Annotation} [ ] {{Annotation} [ ]}

There is even an example in the specification that is similar to our example:

For example, given the field declaration:

@Foo int f;

@Foo is a declaration annotation on f if Foo is meta-annotated by @Target(ElementType.FIELD), and a type annotation on int if Foo is meta-annotated by @Target(ElementType.TYPE_USE). It is possible for @Foo to be both a declaration annotation and a type annotation simultaneously.

Type annotations can apply to an array type or any component type thereof (§10.1). For example, assuming that A, B, and C are annotation interfaces meta-annotated with @Target(ElementType.TYPE_USE), then given the field declaration:

@C int @A [] @B [] f;

@A applies to the array type int[][], @B applies to its component type int[], and @C applies to the element type int. For more examples, see §10.2.

An important property of this syntax is that, in two declarations that differ only in the number of array levels, the annotations to the left of the type refer to the same type. For example, @C applies to the type int in all of the following declarations:

@C int f;
@C int[] f;
@C int[][] f;
Language Specification Section 9.7.4
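These placements can be checked with reflection. The following self-contained Java sketch defines its own hypothetical TYPE_USE-only annotations (not the @Nullable of any particular library) and verifies where each one ends up, including the §9.7.4 array example:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedArrayType;
import java.lang.reflect.AnnotatedType;

public class TypeAnnotationDemo {

    // Hypothetical TYPE_USE-only annotations; RUNTIME retention lets
    // reflection see them.
    @Target(ElementType.TYPE_USE) @Retention(RetentionPolicy.RUNTIME) @interface Nullable {}
    @Target(ElementType.TYPE_USE) @Retention(RetentionPolicy.RUNTIME) @interface A {}
    @Target(ElementType.TYPE_USE) @Retention(RetentionPolicy.RUNTIME) @interface B {}
    @Target(ElementType.TYPE_USE) @Retention(RetentionPolicy.RUNTIME) @interface C {}

    // JavaPoet-style placement: the annotation is closest to the type name
    // String. "@Nullable java.lang.String" would be rejected for this
    // TYPE_USE-only annotation, as it would be closest to the package "java".
    public java.lang. @Nullable String parse(java.lang.String description) {
        return description.isEmpty() ? null : description;
    }

    // Example from JLS §9.7.4: @A on int[][], @B on int[], @C on int
    @C int @A [] @B [] f;

    /** Checks via reflection that each annotation lands where the JLS says. */
    public static boolean allAnnotationsFound() {
        try {
            AnnotatedType returnType = TypeAnnotationDemo.class
                .getMethod("parse", String.class).getAnnotatedReturnType();
            AnnotatedArrayType outer = (AnnotatedArrayType) TypeAnnotationDemo.class
                .getDeclaredField("f").getAnnotatedType();                      // int[][]
            AnnotatedArrayType inner =
                (AnnotatedArrayType) outer.getAnnotatedGenericComponentType(); // int[]
            AnnotatedType element = inner.getAnnotatedGenericComponentType();   // int
            return returnType.isAnnotationPresent(Nullable.class)
                && outer.isAnnotationPresent(A.class)
                && inner.isAnnotationPresent(B.class)
                && element.isAnnotationPresent(C.class);
        } catch (NoSuchMethodException | NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(allAnnotationsFound()); // true
    }
}
```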

Conclusion

Java never stops surprising me. This syntax looked weird when I first stumbled upon it, but after looking through the language specification, I see how useful and justified this placement of annotations is.

I hope you enjoyed this tiny blog post on annotations; see you in my next one.

P.S.: I’m currently at KCDC

Hello eBPF: Global Variables (10)

Welcome back to my series on ebpf; in the last post, we learned how to write a simple XDP-based packet filter. In this post, we’ll continue the work on eBPF to make it easier to write more complex programs. Yes, I promised to write a load balancer but instead opted to add support for global variables to hello-ebpf, documenting it in this short post.

When we want to configure our eBPF program, say to set a simple logLevel setting, we currently have only one option: We could create an array map with one entry, our configuration value, and then use the usual methods to set its value and retrieve it. In Java, this would look like:

@BPFMapDefinition(maxEntries = 1)
BPFArray<Integer> logLevel;

void setLogLevel(int level) {
    logLevel.set(0, level);
}

In the eBPF program itself (see Hello eBPF: Recording data in basic eBPF maps (2) for more information), the value would be used as:

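struct { // this is auto-generated by hello-ebpf
    __uint (type, BPF_MAP_TYPE_ARRAY);
    __type (key, u32);
    __type (value, s32);
    __uint (max_entries, 1);
} logLevel SEC(".maps");

s32 getLogLevel() {
    u32 zero = 0;
    // look up the map declared above; the verifier requires a NULL check
    s32 *level = bpf_map_lookup_elem(&logLevel, &zero);
    return level ? *level : 0; // 0 as a fallback if the lookup fails
}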

Memory Segmentation

This is quite cumbersome, especially as C already has a concept of global variables. Why couldn’t we just use these:

s32 logLevel;                           

s32 getLogLevel() {                          
    return logLevel;
}                                            

A program’s memory at runtime is split into multiple segments:

Segments as BPF Maps

Starting with Linux 5.2, d8eca5bbb2be (“bpf: implement lookup-free direct value access for maps”), we can directly access segments from user-land as if they were single-valued array maps, and we can use the BPF Type Format information of every segment to see where each global variable is placed.
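The idea of reading a segment as the single value of an array map can be modelled in plain Java. The following sketch is purely illustrative: the byte buffer stands in for the map value, and the hard-coded offsets stand in for what the BTF information would provide; none of these classes are part of hello-ebpf:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;

public class SegmentSketch {

    /** Stand-in for a data segment exposed as the single value of an array map. */
    public static final class Segment {
        public final ByteBuffer value;                                // the map's one value
        public final Map<String, Integer> offsets = new HashMap<>(); // would come from BTF

        public Segment(int sizeInBytes) {
            // eBPF programs use the host's byte order
            value = ByteBuffer.allocate(sizeInBytes).order(ByteOrder.nativeOrder());
        }
    }

    /** User-land handle for one 32-bit global variable inside a segment. */
    public static final class IntGlobal {
        private final Segment segment;
        private final int offset;

        public IntGlobal(Segment segment, String name) {
            Integer offset = segment.offsets.get(name);
            if (offset == null) {
                throw new IllegalArgumentException("no variable named " + name);
            }
            this.segment = segment;
            this.offset = offset;
        }

        public int get() { return segment.value.getInt(offset); }

        public void set(int newValue) { segment.value.putInt(offset, newValue); }
    }

    public static void main(String[] args) {
        Segment data = new Segment(8);
        data.offsets.put("logLevel", 0); // s32 logLevel
        data.offsets.put("count", 4);    // u32 count
        IntGlobal logLevel = new IntGlobal(data, "logLevel");
        logLevel.set(42);
        System.out.println(logLevel.get()); // 42
    }
}
```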

But how can we expose this to the user in user-land in a usable manner? We can extend the preprocessor to do its magic:

final GlobalVariable<Integer> logLevel =
    new GlobalVariable<>(/* initial value */ 42);

// later
program.logLevel.set(...);
// or
program.logLevel.get();

Note that the eBPF program can change the global variables, too, giving us a simple communication channel between user-land and kernel-land.

This mechanism isn’t limited to scalar values; you can also store more complex values:

@Type
record Server(int ip, @Size(10) int[] ports, int portsCount) {}

final GlobalVariable<Server> server =
    new GlobalVariable<>(new Server(..., 
        new int[]{22, 23, 0, 0, 0, 0, 0, 0, 0, 0}, 2));

Conclusion

Using global variables, we can easily configure our eBPF program and communicate between user-land and kernel-land. Add some preprocessor magic, and we have a powerful new feature in hello-ebpf. With this at hand, we can finally start writing a load balancer.

Thanks for joining us on the journey to create an eBPF library for Java. I’ll see you in two weeks for the next installment.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone. Thanks to Dylan Reimerink for answering all my questions and sharing all his knowledge on eBPF; this blog post is based on one of his answers on StackOverflow.

Who instruments the instrumenters?

Fixing bugs in Spring Boot and Mockito by instrumenting them

Have you ever wondered how libraries like Spring and Mockito modify your code at run-time to implement all their advanced features? Wouldn’t it be cool to get a peek behind the curtains? This is the premise of my meta-agent, a Java agent that instruments instrumenters to give you these insights, and it is what this blog post is about. This post is a collaboration with Mikaël Francoeur, who had the idea for the meta-agent and wrote most of this post, making it my first-ever collaborative post. But I’ll start with a short introduction to the agent itself before Mikaël takes over with real-world examples.

Meta-Agent

The meta-agent (GitHub) is a Java agent that instruments the Instrumentation.addTransformer methods agents use to add bytecode transformers and wrap the added transformers to capture bytecode before and after each transformation. This allows the agent to capture what every instrumenting agent does at run-time. I covered the basics of writing your own instrumenting agent before in my blog post Instrumenting Java Code to Find and Handle Unused Classes and my related talk. So, I’ll skip all the implementation details here.

But how can you use it? You first have to download the agent (or build it from scratch via mvn package -DskipTests), then you can just attach it to your JVM at the start:

java -javaagent:target/meta-agent.jar -jar your-program.jar

This will then create a web server at http://localhost:7071 that allows you to inspect the bytecode modifications of each instrumenter dynamically. For the example from the README, http://localhost:7071/full-diff/class?pattern=java.lang.Iterable shows you how Mockito modifies the Iterable class upon mocking:

Continue reading

Hello eBPF: XDP-based Packet Filter (9)

Welcome back to my series on ebpf. In the last blog post, we learned how annotation processors can generate C code, simplifying writing eBPF applications. This week, we’ll use this work together with new support for XDP to create a simple packet blocker for eBPF (GitHub):

./run_bpf.sh XDPPacketFilter twitter.com

This blocks all incoming IPv4 packets from twitter.com. We’ll see how it works in this blog post. First, we start with some background on networking and explain what XDP is.

Network Packet

All networking is packet-based, with multiple layers of protocol from shared medium (e.g., Ethernet) to application level (e.g., HTTP):

Continue reading

Hello eBPF: Generating C Code (8)

Welcome back to my series on ebpf. In the last blog post, we learned how to auto-layout struct members and auto-generate BPFStructTypes for annotated Java records. We’re going to extend this work today.

This is a rather short blog post, but the implementation and fixing all the bugs took far more time than expected.

Generating Struct Definitions

We saw in the last blog post how powerful Java annotation processing is for generating Java code; this week, we’ll tackle the generation of C code: In the previous blog post, we still had to write the C struct and map definitions ourselves, but writing

struct event {
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};

when we already specified the data type properly in Java

record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

seems to be a great place to improve our annotation processor. There are only two problems:

  1. The annotation processor needs to know about BPFTypes, so we have to move them into it. But the BPFTypes use the Panama API, which requires the --enable-preview flag in JDK 21, making it unusable in Java 21. So we have to move the whole library over to JDK 22, as this version includes the finalized Panama (Foreign Function & Memory) API.
  2. There is no C code generation library comparable to JavaPoet, which we use for generating Java code.

Regarding the first problem: Moving to JDK 22 is quite easy; the only changes I had to make are listed in this gist. The only major problem was getting the Lima VM to use a current JDK 22. In the end, I resorted to simply using sdkman; you can take a look at the install.sh script to see how I did it.

Regarding the second problem: We can reduce the problem of generating C code into two steps:

  1. Create an Abstract Syntax Tree (AST) for C
  2. Create a pretty printer for this AST

To create an AST I resorted to an ANSI C grammar for inspiration. Each AST node implements the following interface:

public interface CAST {

    List<? extends CAST> children();

    Statement toStatement();

    /** Generate pretty printed code */
    default String toPrettyString() {
        return toPrettyString("", "  ");
    }

    String toPrettyString(String indent, String increment);
}

We can then create a hierarchy of extending interfaces (PrimaryExpression, …) and implementing records (ConstantExpression, …). You can find the whole C AST on GitHub.
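To illustrate the AST-plus-pretty-printer approach, here is a heavily simplified, hypothetical variant of the interface above (without toStatement()) with two records that print the event struct from earlier; the real C AST on GitHub is far more complete:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CAstSketch {

    /** Simplified, hypothetical version of the CAST interface. */
    public interface Node {
        List<? extends Node> children();

        /** Pretty print with two-space indentation. */
        default String toPrettyString() {
            return toPrettyString("", "  ");
        }

        String toPrettyString(String indent, String increment);
    }

    /** A struct member like "char e_comm[TASK_COMM_LEN];" (arraySize may be null). */
    public record Member(String type, String name, String arraySize) implements Node {
        @Override
        public List<? extends Node> children() {
            return List.of(); // leaf node
        }

        @Override
        public String toPrettyString(String indent, String increment) {
            String suffix = arraySize == null ? "" : "[" + arraySize + "]";
            return indent + type + " " + name + suffix + ";";
        }
    }

    /** A struct definition; its children are its members. */
    public record StructDefinition(String name, List<Member> members) implements Node {
        @Override
        public List<? extends Node> children() {
            return members;
        }

        @Override
        public String toPrettyString(String indent, String increment) {
            String body = members.stream()
                .map(member -> member.toPrettyString(indent + increment, increment))
                .collect(Collectors.joining("\n"));
            return indent + "struct " + name + " {\n" + body + "\n" + indent + "};";
        }
    }

    public static void main(String[] args) {
        StructDefinition event = new StructDefinition("event", List.of(
            new Member("u32", "e_pid", null),
            new Member("char", "e_filename", "FILE_NAME_LEN"),
            new Member("char", "e_comm", "TASK_COMM_LEN")));
        System.out.println(event.toPrettyString());
    }
}
```

Passing the indentation down the tree keeps each node responsible only for its own layout, so nested structures print correctly without a separate formatting pass.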

This leads us to an annotation processor that can automatically insert struct definitions into the C code of our eBPF program, reducing the number of hard-to-debug errors, as it guarantees that the Java specification and the C representation of every type are compatible.

But can we do more with annotation processing?

Generating Map Definitions

There is another definition that we can auto-generate: Map definitions like

struct
{
  __uint (type, BPF_MAP_TYPE_RINGBUF);
  __uint (max_entries, 256 * 4096);
} rb SEC (".maps");

Such definitions declare maps, like hash maps and ring buffers, that allow communication between user space and kernel space.

With a little annotation processing, we can define the same ring buffer from above in Java:

@BPFMapDefinition(maxEntries = 256 * 4096)
BPFRingBuffer<Event> rb;

Our annotation processor then turns this into the C definition from above and inserts code into the constructor of the Java program that properly initializes rb.

But how does the processor know what code it should generate? It parses the BPFMapClass annotation on BPFRingBuffer (and on every other map class). This annotation contains the templates for both the C and the Java code:

@BPFMapClass(
        cTemplate = """
        struct {
            __uint (type, BPF_MAP_TYPE_RINGBUF);
            __uint (max_entries, $maxEntries);
        } $field SEC(".maps");
        """,
        javaTemplate = """
        new $class<>($fd, $b1)
        """)
public class BPFRingBuffer<E> extends BPFMap {
}

Here, $field is the Java field name, $maxEntries the value from the BPFMapDefinition annotation, and $class the name of the Java class. $cX, $bX, and $jX give the C type name, the BPFType, and the Java class name of the Xth type parameter.
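A minimal sketch of how such templates could be filled in, assuming plain textual substitution of $name placeholders. The fill method and the concrete values passed to it (for example, eventType for $b1) are hypothetical; the real processor may resolve placeholders differently. Unknown placeholders fail loudly instead of silently ending up in the generated code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MapTemplateSketch {

    // Matches placeholders like $field, $maxEntries or $b1
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)");

    /** Replaces every $name in the template with values.get(name) */
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unknown placeholder: $" + matcher.group(1));
            }
            // quoteReplacement keeps characters like '$' in values literal
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String cTemplate = """
                struct {
                    __uint (type, BPF_MAP_TYPE_RINGBUF);
                    __uint (max_entries, $maxEntries);
                } $field SEC(".maps");
                """;
        System.out.print(fill(cTemplate, Map.of("field", "rb", "maxEntries", "256 * 4096")));
    }
}
```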

Ring Buffer Sample Program

When we combine all this together we can have a much simpler ring buffer sample program (see TypeProcessingSample2 on GitHub):

@BPF(license = "GPL")
public abstract class TypeProcessingSample2 extends BPFProgram {

    private static final int FILE_NAME_LEN = 256;
    private static final int TASK_COMM_LEN = 16;

    @Type(name = "event")
    record Event(
      @Unsigned int pid, 
      @Size(FILE_NAME_LEN) String filename, 
      @Size(TASK_COMM_LEN) String comm) {}

    @BPFMapDefinition(maxEntries = 256 * 4096)
    BPFRingBuffer<Event> rb;

    static final String EBPF_PROGRAM = """
            #include "vmlinux.h"
            #include <bpf/bpf_helpers.h>
            #include <bpf/bpf_tracing.h>
            #include <string.h>
              
            // This is where the struct and map
            // definitions are inserted automatically          
                  
            SEC ("kprobe/do_sys_openat2")
            int kprobe__do_sys_openat2 (struct pt_regs *ctx)
            {
               // ... // as before
            }
            """;

    public static void main(String[] args) {
        try (TypeProcessingSample2 program = 
           BPFProgram.load(TypeProcessingSample2.class)) {
            program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));
            // we can use the rb ring buffer directly
            // but have to set the call back
            program.rb.setCallback((buffer, event) -> {
                System.out.printf(
                  "do_sys_openat2 called by:%s " + 
                  "file:%s pid:%d\n", 
                  event.comm(), event.filename(), 
                  event.pid());
            });
            while (true) {
                // consumes all registered ring buffers
                program.consumeAndThrow();
            }
        }
    }
}

Two other parts of the C code are also auto-generated: macros that define constants, and the license definition. Macros are generated for all static final fields of the program class whose values are compile-time constants.
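The following sketch shows the idea with runtime reflection instead of annotation processing. It emits a #define for every static final int or long field of a class and appends a license definition in the form libbpf programs conventionally use (char _license[] SEC("license") = "GPL";). The preamble method and the Sample class are hypothetical, and the exact text hello-ebpf generates may differ. An annotation processor would read the values at compile time instead, for example via VariableElement.getConstantValue(), which only returns a value for final fields initialized to compile-time constants:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class PreambleSketch {

    // Hypothetical stand-in for a program class like TypeProcessingSample2
    static class Sample {
        private static final int FILE_NAME_LEN = 256;
        private static final int TASK_COMM_LEN = 16;
        private final int instanceField = 1; // not static, so no macro
    }

    /** Emits "#define NAME value" for static final int/long fields, then the license */
    static String preamble(Class<?> programClass, String license) {
        List<String> lines = new ArrayList<>();
        for (Field field : programClass.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            boolean integral = field.getType() == int.class || field.getType() == long.class;
            if (!Modifier.isStatic(modifiers) || !Modifier.isFinal(modifiers) || !integral) {
                continue;
            }
            try {
                field.setAccessible(true); // the constants are private
                lines.add("#define " + field.getName() + " " + field.get(null));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        // conventional libbpf license definition
        lines.add("char _license[] SEC (\"license\") = \"" + license + "\";");
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(preamble(Sample.class, "GPL"));
    }
}
```

The order of getDeclaredFields() is not guaranteed, which does not matter for independent macros.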

Conclusion

Annotation processing lets us reduce the amount of C code we have to write and reduces errors by generating all definitions from the Java code. This simplifies writing eBPF applications.

See you in two weeks, when we tackle global variables, moving closer and closer to the point where hello-ebpf’s BPF support is capable enough to write a small firewall.

This will also be the topic of a talk that I submitted together with Mohammed Aboullaite to several conferences for autumn.

Addendum

The more I work on writing my own eBPF library, the more I value the effort that the developers of other libraries, like bcc or the Go and Rust eBPF libraries, put in to create usable libraries. They do this despite the lack of proper documentation. A simple example is detaching attached eBPF programs: There are multiple (undocumented) methods in libbpf that might be suitable (bpf_program__unload, bpf_link__detach, bpf_link__destroy, bpf_prog_detach), but only bpf_link__destroy properly detached a program.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

Hello eBPF: Auto Layouting Structs (7)

Welcome back to my series on eBPF. In the last blog post, we learned how to use ring buffers with libbpf for efficient communication. This week, we’re looking into the memory layout and alignment of structs transferred between the kernel and user-land.

Alignment is essential; it determines how the compiler lays out structs and variables and where it puts the data in memory. Take, for example, the struct that we defined in the previous blog post in the RingSample:

#define FILE_NAME_LEN 256
#define TASK_COMM_LEN  16
                
// Structure to store the data that we want to pass to user
struct event {
  u32 e_pid;
  char e_filename[FILE_NAME_LEN];
  char e_comm[TASK_COMM_LEN];
};

Struct Example

Using Pahole in the Compiler Explorer, we can see the memory layout on amd64:

struct event {
	unsigned int               e_pid;                /*     0     4 */
	char                       e_filename[256];      /*     4   256 */
	/* --- cacheline 4 boundary (256 bytes) was 4 bytes ago --- */
	char                       e_comm[16];           /*   260    16 */

	/* size: 276, cachelines: 5, members: 3 */
	/* last cacheline: 20 bytes */
};

This means that the compiler knows the offset of every member, so it can translate member accesses to this struct into memory accesses and adequately place the event in the allocated memory.

You’ve actually seen the layouting information before, as the hello-ebpf project requires you to lay out all structs by hand:

record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

// define the event records layout
private static final BPFStructType<Event> eventType =
        new BPFStructType<>("rb", List.of(
        new BPFStructMember<>("e_pid",
                BPFIntType.UINT32, 0, Event::pid),
        new BPFStructMember<>("e_filename",
                new StringType(FILE_NAME_LEN),
                4, Event::filename),
        new BPFStructMember<>("e_comm",
                new StringType(TASK_COMM_LEN),
                4 + FILE_NAME_LEN, Event::comm)
   ), new AnnotatedClass(Event.class, List.of()),
   fields -> new Event((int)fields.get(0),
       (String)fields.get(1), (String)fields.get(2)));

eBPF itself is agnostic regarding alignment: the compiler on your system compiles both the eBPF code and the C code, so the compiler decides how everything is aligned.

Alignment Rules

But where do these alignment rules come from? They come from how your CPU works. CPUs usually only allow, or are at least optimized for, certain kinds of memory accesses. For example, x86 CPUs are optimized for accessing 32-bit integers that lie at addresses that are a multiple of four. The rules are defined in the Application Binary Interface (ABI). The alignment rules for x86 (64-bit) on Linux are specified in the System V ABI Specification.

The specification lists more types, but in general, scalar types are aligned to their size (for example, char to 1 byte, short to 2, int to 4, and long and pointers to 8 bytes). Structs, unions, and arrays, on the other hand, are aligned based on their members:

Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object’s alignment.

An array uses the same alignment as its elements, except that a local or global array variable of length at least 16 bytes or a C99 variable-length array variable always has alignment of at least 16 bytes.

Structure and union objects can require padding to meet size and alignment constraints. The contents of any padding is undefined.

System V Application Binary Interface
AMD64 Architecture Processor Supplement
Draft Version 0.99.6

ARM 64-bit has the same scalar alignments and struct alignment rules (see Procedure Call Standard for the Arm® 64-bit Architecture (AArch64)); we can therefore use the same layouting algorithm for both CPU architectures.

We can formulate the algorithm for structs as follows:

struct_alignment = 1
current_position = 0
for member in struct:
  # compute the next position that is properly
  # aligned for the member; this introduces padding
  # (empty space between members) if the previous
  # member ended at a misaligned position
  current_position = \
    math.ceil(current_position / member.alignment) * member.alignment
  member.position = current_position
  # the next position has to be after the current member
  current_position += member.size
  # the struct alignment is the maximum of all alignments
  struct_alignment = max(struct_alignment, member.alignment)
# the size of the struct is a multiple of its alignment,
# which can add padding after the last member
struct_size = \
  math.ceil(current_position / struct_alignment) * struct_alignment
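The layouting algorithm translates almost one-to-one into Java. This is a sketch, not the BPFStructType code; the StructLayout, Member, and Layout names are made up for illustration. It rounds each offset up to the member alignment and finally rounds the size up to the struct alignment (tail padding). Sizes and alignments are the x86-64/AArch64 values from above, and char arrays get the alignment of char:

```java
import java.util.ArrayList;
import java.util.List;

public class StructLayout {

    /** A struct member with its size and alignment in bytes */
    record Member(String name, int size, int alignment) {}

    /** Member offsets plus the resulting struct size and alignment */
    record Layout(List<Integer> offsets, int size, int alignment) {}

    /** Rounds value up to the next multiple of alignment */
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    static Layout layout(List<Member> members) {
        int structAlignment = 1;
        int position = 0;
        List<Integer> offsets = new ArrayList<>();
        for (Member member : members) {
            position = alignUp(position, member.alignment()); // padding if misaligned
            offsets.add(position);
            position += member.size();
            structAlignment = Math.max(structAlignment, member.alignment());
        }
        // tail padding: the struct size is a multiple of its alignment
        return new Layout(offsets, alignUp(position, structAlignment), structAlignment);
    }

    public static void main(String[] args) {
        // struct event: u32, char[256], char[16]
        var event = layout(List.of(
                new Member("e_pid", 4, 4),
                new Member("e_filename", 256, 1),
                new Member("e_comm", 16, 1)));
        System.out.println(event.offsets() + " size=" + event.size()); // [0, 4, 260] size=276

        // struct padded_event: char, long, int, void*
        var padded = layout(List.of(
                new Member("c", 1, 1),
                new Member("l", 8, 8),
                new Member("i", 4, 4),
                new Member("x", 8, 8)));
        System.out.println(padded.offsets() + " size=" + padded.size()); // [0, 8, 16, 24] size=32
    }
}
```

Both results match the pahole output for the two example structs.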

With this at hand, we can look at a slightly more complex example:

Struct Example with Padding

At times, the compiler has to leave unused memory (padding) between two members to satisfy their individual alignments. This can be seen in the following example:

struct padded_event {
  char c;  // single byte char, alignment of 1
  long l;  // alignment of 8
  int i;   // alignment of 4
  void* x; // alignment of 8
};

Using Pahole again in the Compiler Explorer, we see the layout that the compiler generates:

struct padded_event {
	char                       c;                    /*     0     1 */

	/* XXX 7 bytes hole, try to pack */

	long                       l;                    /*     8     8 */
	int                        i;                    /*    16     4 */

	/* XXX 4 bytes hole, try to pack */

	void *                     x;                    /*    24     8 */

	/* size: 32, cachelines: 1, members: 4 */
	/* sum members: 21, holes: 2, sum holes: 11 */
	/* last cacheline: 32 bytes */
};

Pahole tells us that the compiler had to introduce 11 bytes of padding: 7 bytes after c, so that l starts at offset 8, and 4 bytes after i, so that x starts at offset 24.

This means that we’re essentially wasting memory. I recommend reading The Lost Art of Structure Packing by Eric S. Raymond to learn more about this. If we really want to save memory, we could swap the int and the long members, so that only the char needs padding after it, leading to an object of 24 bytes with only 3 bytes of padding. This matters most when storing many of these structs in arrays, where the wasted memory accumulates.
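To check the reordering claim, this sketch computes the struct size for different member orders using the same alignment rules. It also tries a common heuristic, sorting members by decreasing alignment, which is just one way to reduce padding. All names here are hypothetical and not part of hello-ebpf:

```java
import java.util.Comparator;
import java.util.List;

public class StructPacking {

    /** A struct member with its size and alignment in bytes */
    record Member(String name, int size, int alignment) {}

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    /** Size of a struct with the given member order, including tail padding */
    static int sizeOf(List<Member> members) {
        int position = 0;
        int structAlignment = 1;
        for (Member member : members) {
            position = alignUp(position, member.alignment()) + member.size();
            structAlignment = Math.max(structAlignment, member.alignment());
        }
        return alignUp(position, structAlignment);
    }

    /** Orders members by decreasing alignment (stable for equal alignments) */
    static List<Member> sortedByAlignment(List<Member> members) {
        return members.stream()
                .sorted(Comparator.comparingInt(Member::alignment).reversed())
                .toList();
    }

    public static void main(String[] args) {
        var c = new Member("c", 1, 1);
        var l = new Member("l", 8, 8);
        var i = new Member("i", 4, 4);
        var x = new Member("x", 8, 8);
        System.out.println(sizeOf(List.of(c, l, i, x)));                     // 32, original order
        System.out.println(sizeOf(List.of(c, i, l, x)));                     // 24, int and long swapped
        System.out.println(sizeOf(sortedByAlignment(List.of(c, l, i, x)))); // 24, sorted
    }
}
```

Sorting puts the 3 bytes of padding at the end of the struct instead of after the char, but the total size is the same 24 bytes.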

But what do we do with this knowledge?

Auto-Layouting in hello-ebpf

The record that we defined in Java before contains all the information to auto-generate the BPFStructType for the class; we just need a little bit of annotation processor magic:

@Type
record Event(@Unsigned int pid,
             @Size(FILE_NAME_LEN) String filename,
             @Size(TASK_COMM_LEN) String comm) {}

The annotation processor processes this record and generates the matching BPFStructType.

We implemented the auto-layouting in the BPFStructType class to reduce the amount of logic in the annotation processor.
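To give an idea of what can be derived from such a record, here is a runtime-reflection sketch. It reads the record components together with stand-in @Unsigned and @Size annotations, picks a C type for each component, and computes the offsets with the algorithm from above. The annotations, the int to u32/s32 mapping, and all method names are assumptions for illustration; the actual processor works at compile time and leaves the layouting to BPFStructType:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

public class RecordLayoutSketch {

    // Hypothetical stand-ins for the hello-ebpf annotations, retained at runtime
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Unsigned {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Size { int value(); }

    record Event(@Unsigned int pid,
                 @Size(256) String filename,
                 @Size(16) String comm) {}

    /** A laid-out member: its C declaration and its offset in bytes */
    record Placed(String cDeclaration, int offset) {}

    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    /** Maps each record component to a C type and places it like the C compiler would */
    static List<Placed> layout(Class<? extends Record> recordClass) {
        List<Placed> placed = new ArrayList<>();
        int position = 0;
        // record components are returned in declaration order
        for (RecordComponent rc : recordClass.getRecordComponents()) {
            boolean unsigned = rc.isAnnotationPresent(Unsigned.class);
            String declaration;
            int size, alignment;
            if (rc.getType() == int.class) {
                declaration = (unsigned ? "u32 " : "s32 ") + rc.getName();
                size = alignment = 4;
            } else if (rc.getType() == long.class) {
                declaration = (unsigned ? "u64 " : "s64 ") + rc.getName();
                size = alignment = 8;
            } else if (rc.getType() == String.class && rc.isAnnotationPresent(Size.class)) {
                size = rc.getAnnotation(Size.class).value();
                alignment = 1; // char arrays are aligned like char
                declaration = "char " + rc.getName() + "[" + size + "]";
            } else {
                throw new IllegalArgumentException("Unsupported component: " + rc);
            }
            position = alignUp(position, alignment);
            placed.add(new Placed(declaration, position));
            position += size;
        }
        return placed;
    }

    public static void main(String[] args) {
        for (Placed p : layout(Event.class)) {
            System.out.println(p.offset() + ": " + p.cDeclaration());
        }
    }
}
```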

This results in a much cleaner RingSample version, named TypeProcessingSample:

@BPF
public abstract class TypeProcessingSample extends BPFProgram {

    static final String EBPF_PROGRAM = """...""";

    private static final int FILE_NAME_LEN = 256;
    private static final int TASK_COMM_LEN = 16;

    @Type
    record Event(@Unsigned int pid, 
                 @Size(FILE_NAME_LEN) String filename, 
                 @Size(TASK_COMM_LEN) String comm) {}


    public static void main(String[] args) {
        try (TypeProcessingSample program = BPFProgram.load(TypeProcessingSample.class)) {
            program.autoAttachProgram(
              program.getProgramByName("kprobe__do_sys_openat2"));

            // get the generated struct type
            var eventType = program.getTypeForClass(Event.class);

            var ringBuffer = program.getRingBufferByName("rb", eventType,
             (buffer, event) -> {
                System.out.printf("do_sys_openat2 called by:%s file:%s pid:%d\n", 
                                  event.comm(), event.filename(), event.pid());
            });
            while (true) {
                ringBuffer.consumeAndThrow();
            }
        }
    }
}

The annotation processor currently supports the following members in records:

  • integer types (int, long, …), optionally annotated with @Unsigned if unsigned
  • String types, annotated with @Size to specify the size
  • Other @Type annotated types in the same scope
  • members annotated with @Type.Member, which specify the BPFType directly

You can find the up-to-date list in the documentation for the Type annotation.

Conclusion

Every C type that we use in both eBPF and Java has to be modeled in Java, too; this includes placing the different members of structs in memory and keeping them properly aligned. We saw that the general algorithm behind the layouting is straightforward. The hello-ebpf library uses this algorithm together with an annotation processor to make writing eBPF applications more concise and less error-prone.

I hope you liked this introduction to struct layouts. See you in two weeks when we start supporting more features of libbpf.

This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.

To Brussels, Canada and back

Last year was my first year blogging, speaking at conferences, meeting incredible people, and seeing places I’ve never been before. It was at times quite arduous but at the same time energizing, as you can read in my post Looking back on one year of speaking and blogging. I didn’t want it to be a one-off year, so I dutifully started a new blog series on eBPF and applied for conferences… And I got accepted at a few of them, which was really great because I started missing traveling after almost three months of being home. In this blog post, I’ll cover my first three conferences this year: FOSDEM in Brussels, ConFoo in Montreal, and Voxxed Days Zurich; they all happened between early February and early March.

It was, by quite some margin, the most traveling I had ever done, both distance- and continent-wise.
