From: Farid Zakaria <farid.m.zakaria@gmail.com>
To: Xdp <xdp-newbies@vger.kernel.org>
Subject: bpf_helpers and you... some more...
Date: Wed, 30 Oct 2019 12:03:53 -0700
Message-ID: <CACCo2j=TJYZ68ur53vNYxaS2qQgPv6ouij3P=tmrno-SJFTw0Q@mail.gmail.com>

This is my attempt at continuing David's earlier e-mail:
https://www.spinics.net/lists/xdp-newbies/msg00179.html

I was curious how eBPF filters are wired up and how they work. The heavy
use of C macros makes the source code difficult for me to follow (maybe
there's a pre-processed version online?).
I'm hoping others find this exploratory dive insightful (and hopefully
it's accurate enough).

Let's write a trivial eBPF filter (hello_world_kern.c) that prints
"hello world":

    #include <linux/bpf.h>

    #define __section(NAME) __attribute__((section(NAME), used))

    static char _license[] __section("license") = "GPL";

    /* helper functions called from eBPF programs written in C */
    static int (*bpf_trace_printk)(const char *fmt, int fmt_size,
                                ...) = (void *)BPF_FUNC_trace_printk;

    __section("hello_world") int hello_world_filter(struct __sk_buff *skb) {
        char msg[] = "hello world\n";
        bpf_trace_printk(msg, sizeof(msg));
        return 0;
    }

Compiling the above with the command below lets us inspect the LLVM IR:

    clang -c -o hello_world_kern.ll -x c -S -emit-llvm hello_world_kern.c

The few lines that stand out are:

    @bpf_trace_printk = internal global i32 (i8*, i32, ...)* inttoptr (i64 6 to i32 (i8*, i32, ...)*), align 8
    ....
    %6 = load i32 (i8*, i32, ...)*, i32 (i8*, i32, ...)** @bpf_trace_printk, align 8
    %7 = getelementptr inbounds [13 x i8], [13 x i8]* %3, i32 0, i32 0
    %8 = call i32 (i8*, i32, ...) %6(i8* %7, i32 13)

The above shows that the value of BPF_FUNC_trace_printk is simply the
integer 6, which is cast to a function pointer.
Sure enough, we can confirm that `bpf_trace_printk` has the value 6 in
the enumeration of known bpf helpers (the enumeration starts at 0 with
BPF_FUNC_unspec, so it is the seventh entry).
(https://elixir.bootlin.com/linux/v5.3.7/source/include/uapi/linux/bpf.h#L2724)

We can go even further and turn this LLVM IR into human-readable eBPF
assembly using `llc`:

    llc hello_world_kern.ll -march=bpf

Depending on the optimization level of the earlier `clang` call you
may see different results; with `-O3`, however, we can see:

    call 6

Great! So we know that the call to `bpf_trace_printk` gets translated
into a call instruction with an immediate value of 6.

How does it end up calling code within the kernel, though?
Once the verifier has verified the bytecode, it calls `fixup_bpf_calls`
(https://elixir.bootlin.com/linux/v5.3.8/source/kernel/bpf/verifier.c#L8869),
which walks all the instructions and adjusts the immediate value of each
helper call:

    fixup_bpf_calls(...) {
        ...
        patch_call_imm:
            fn = env->ops->get_func_proto(insn->imm, env->prog);
            /* all functions that have prototype and verifier allowed
            * programs to call them, must be real in-kernel functions
            */
            if (!fn->func) {
                verbose(env,
                    "kernel subsystem misconfigured func %s#%d\n",
                    func_id_name(insn->imm), insn->imm);
                return -EFAULT;
            }
            insn->imm = fn->func - __bpf_call_base;

N.B. I haven't deciphered how __bpf_call_base is used / works

The `get_func_proto` callback returns the function prototype registered
by the owning subsystem, such as net.
(https://elixir.bootlin.com/linux/v5.3.8/source/net/core/filter.c#L5991)
At that point it is a simple switch statement that maps the numeric
helper ID to the matching function prototype.

I'd love to see more on the code path of how the non-JIT vs JIT
instructions get handled.
For the net subsystem, I can see where the ebpf prog is invoked
(https://elixir.bootlin.com/linux/v5.3.8/source/net/core/filter.c#L119),
but it's difficult to work out how the choice of executing the
function directly (in the case of JIT) vs running it through the
interpreter is handled.
