[PATCH 0/2] bpf: context casting for tail call and gtrace prog type

* [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
@ 2019-02-25 15:54 Kris Van Hees
  2019-02-26  6:18 ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-02-25 15:54 UTC (permalink / raw)
  To: netdev

The patches in this set are part of an effort to provide support for
tracing tools beyond attaching programs to probes and events and working
with the context data they provide.  It is also aimed at avoiding adding
new helpers for every piece of task information that tracers may want to
include in trace data (as was discussed at the Linux Plumbers Conference
BPF mini-conference track last year).

One of the main characteristics of tracers is that a variety of
information can be collected at the time of a probe firing.  When using
BPF program to implement actions to be taken when a probe fires, The most
natural source of a large part of this information (task information,
probe data, tracer state) is the context that is associated with a BPF
program.  It is also possible to obtain (most of) this information by
means of full access helper calls like probe_read() but that isn't
something you want to make available to an unprivileged user.

So we have two areas where BPF programs can be very useful and powerful:

- BPF programs that are attached to a probe or event, operating on the
  context provided by the probe or event
- BPF programs that implement actions to be taken within the context of a
  tracing tool

The Linux kernel provides a wealth of probes and events to which we can
attach BPF programs.  These event sources do not have any knowledge of the
tracing tool that might be using them.  But being able to use them from any
tracing tool is definitely preferable over implementing your own probes and
events.  We definitely also do not want to 'teach' all the existing probes
and events about any possible BPF program type that would like to get called
from those probes and events.

So, to illustrate what we're trying to accomplish, consider a kprobe.  We
can attach a BPF program to it and it will be called with a 'struct pt_regs'
context.  From the side of our tracing tool, we also want information about
the task that triggered the kprobe to fire (beyond what is currently
available through helpers) and we want to be able to access that information
from a BPF program that implements what should happen when the probe fires
(e.g. recording the event and specific data that we are interested in).

The 2nd patch in this set implements a very basic generic tracer program
type BPF_PROG_TYPE_GTRACE) that provides the pt_regs data and select task
data in its context.  We cannot attach a program of this type to a kprobe
because that probe supports BPF_PROG_TYPE_KPROBE instead.

The 1st patch in this set implements a mechanism to solve this issue: it
allows a tail-call from one program type to another if the callee type
supports conversion of a caller context into a context for the callee.

So, in the sample, the BPF_PROG_TYPE_GTRACE provides can_cast() and
cast_context() functions that support converting a BPF_PROG_TYPE_KPROBE
context into a BPF_PROG_TYPE_GTRACE context.

The work flow a tracer can use is:

 1. The tracer creates a program array map, and inserts one or more programs
    of type BPF_PROG_TYPE_GTRACE.  These programs implement whatever actions
    are to be taken when a specific probe fires.  This step must be done first
    so that the program array is initialized with the correct program type.
    This type needs to be known so that when the calling program is verified,
    compatibility checking can be performed.
 2. The tracer loads a program of type BPF_PROG_TYPE_KPROBE and attaches it
    to the kprobe we're interested in.  This program contains a tail-call to
    a BPF_PROG_TYPE_GTRACE program in the program array.
 3. The kprobe fires and executes our program (of type BPF_PROG_TYPE_KPROBE).
   3.1 The program performs whatever operations that we need to have done
       at the level of the probe firing.
   3.2 The program performs a tail-call into a program from our program array.
     3.2.1 The execution of the tail-call instruction causes a call to be
           made to a cast_context() function provided by BPF_PROG_TYPE_GTRACE.
           This function creates a context structure, and populates it with
           task information and copies in the pt_regs data from the context
           that was passed to the BPF_PROG_TYPE_KPROBE program.
     3.2.2 The new context is assigned to R1 (replacing the original context),
           and execution is transferred to the called program.

The implementation is done in such way that existing tail-calls will work
without any change aside from the fact that the verifier is inserting an
instruction right before the tail-call.  That instruction simply loads the
BPF program type into R4.  This ensures that at the time of the tail-call,
the program type of the calling program can be passed to the cast_context()
function.  Knowledge about the program type of an executing program is not
available anywhere and we need to know what context we're trying to convert
from.  The function prototype for the (pseudo-)helper bpf_tail_call declares
only 3 arguments so existing code is not affected by this internal use of R4.

Obviously, if there is no conversion function or the conversion is not
supported, the tail-call will fail because that situation is effectively the
same as trying to call a program of an incompatible type.

The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
support what tracers commonly need, and I am also looking at ways to further
extend this model to allow more tracer-specific features as well without the
need for adding a BPF program types for every tracer.

	Kris

^ permalink raw reply	[flat|nested] 13+ messages in thread