* [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs
@ 2022-04-02  0:29 Andrii Nakryiko
  2022-04-02  0:29 ` [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Add libbpf support for USDT (User Statically-Defined Tracing) probes.
USDTs are an important part of the tracing and BPF ecosystem, widely used in
mission-critical production applications for observability, performance
analysis, and debugging.

And while USDTs themselves are a pretty complicated abstraction built on top
of uprobes, for end-users a USDT is as natural a primitive as a uprobe itself.
It is thus important for libbpf to provide the best possible user experience
when it comes to building tracing applications that rely on USDTs.

USDTs have historically presented a lot of challenges for libbpf's general
no-compilation-on-the-fly approach to BPF tracing. BCC utilizes the power of
on-the-fly source code generation and compilation using its embedded Clang
toolchain, which is impractical for the more lightweight and thus more rigid
libbpf-based approach. But still, with enough diligence and BPF cookies it's
possible to implement USDT support that feels as natural as tracing any
uprobe.

This patch set is the culmination of such effort to add libbpf USDT support
following the spirit and philosophy of BPF CO-RE (even though it's not
inherently relying on BPF CO-RE much, see patch #1 for some notes regarding
this). Each respective patch has enough details and explanations, so I won't
go into details here.

In the end, I think the overall usability of libbpf's USDT support *exceeds*
the status quo set by BCC due to the elimination of awkward runtime
USDT-supporting code generation. It also exceeds BCC's capabilities due to the
use of the BPF cookie. This eliminates the need to determine a USDT call site
(and thus specifics about how exactly to fetch arguments) based on its
*absolute IP address*, which is impossible with shared libraries if no PID is
specified (as we then just *can't* know the absolute IP at which the shared
library is loaded, because it might be different for each process). With the
BPF cookie this is not a problem, as we record the "call site ID" directly in
the BPF cookie value. This makes it possible to do system-wide tracing of
a USDT defined in a shared library. Think about tracing some USDT in libc
across any process in the system, both those running at the time of attachment
and all the new processes started *afterwards*. This is a very powerful
capability that allows more efficient observability and tracing tooling.

Once this functionality lands, the plan is to extend libbpf-bootstrap ([0])
with a USDT example. It will also become possible to start converting BCC
tools that rely on USDTs to their libbpf-based counterparts ([1]).

It's worth noting that a preliminary version of this code has already been
used and tested in production, powering a fleet-wide observability toolkit.

Libbpf functionality is broken down into 5 mostly logically independent parts,
plus 2 selftest patches, for ease of reviewing:
  - patch #1 adds BPF-side implementation;
  - patch #2 adds user-space APIs and wires bpf_link for USDTs;
  - patch #3 adds the most mundane pieces: handling ELF, parsing USDT notes,
    dealing with memory segments, relative vs absolute addresses, etc;
  - patch #4 adds internal ID allocation and setting up/tearing down of
    BPF-side state (spec and IP-to-ID mapping);
  - patch #5 implements x86/x86-64-specific logic of parsing USDT argument
    specifications;
  - patch #6 adds testing of various basic aspects of handling of USDT;
  - patch #7 extends the set of tests with more combinations of semaphore,
    executable vs shared library, and PID filter options.

  [0] https://github.com/libbpf/libbpf-bootstrap
  [1] https://github.com/iovisor/bcc/tree/master/libbpf-tools

v1->v2:
  - huge high-level comment describing how all the moving parts fit together
    (Alan, Alexei);
  - switched from `__hidden __weak` to `static inline __noinline` for now, as
    there is a bug in BPF linker breaking final BPF object file due to invalid
    .BTF.ext data; I want to fix it separately at which point I'll switch back
    to __hidden __weak again. The fix isn't trivial, so I don't want to block
    on that. Same for the __weak variable lookup bug that Hengqi reported.
  - various fixes and improvements, addressing other feedback (Alan, Hengqi);

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Hengqi Chen <hengqi.chen@gmail.com>

Andrii Nakryiko (7):
  libbpf: add BPF-side of USDT support
  libbpf: wire up USDT API and bpf_link integration
  libbpf: add USDT notes parsing and resolution logic
  libbpf: wire up spec management and other arch-independent USDT logic
  libbpf: add x86-specific USDT arg spec parsing logic
  selftests/bpf: add basic USDT selftests
  selftests/bpf: add urandom_read shared lib and USDTs

 Documentation/bpf/bpf_devel_QA.rst            |    3 +
 tools/lib/bpf/Build                           |    3 +-
 tools/lib/bpf/Makefile                        |    2 +-
 tools/lib/bpf/libbpf.c                        |  100 +-
 tools/lib/bpf/libbpf.h                        |   31 +
 tools/lib/bpf/libbpf.map                      |    1 +
 tools/lib/bpf/libbpf_internal.h               |   19 +
 tools/lib/bpf/usdt.bpf.h                      |  256 ++++
 tools/lib/bpf/usdt.c                          | 1277 +++++++++++++++++
 tools/testing/selftests/bpf/Makefile          |   25 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c |  421 ++++++
 .../selftests/bpf/progs/test_urandom_usdt.c   |   70 +
 tools/testing/selftests/bpf/progs/test_usdt.c |   96 ++
 .../selftests/bpf/progs/test_usdt_multispec.c |   32 +
 tools/testing/selftests/bpf/urandom_read.c    |   63 +-
 .../testing/selftests/bpf/urandom_read_aux.c  |    9 +
 .../testing/selftests/bpf/urandom_read_lib1.c |   13 +
 .../testing/selftests/bpf/urandom_read_lib2.c |    8 +
 18 files changed, 2406 insertions(+), 23 deletions(-)
 create mode 100644 tools/lib/bpf/usdt.bpf.h
 create mode 100644 tools/lib/bpf/usdt.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_urandom_usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_usdt_multispec.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_aux.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib1.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib2.c

-- 
2.30.2



* [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-04  0:23   ` Dave Marchevsky
  2022-04-02  0:29 ` [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Add the BPF-side implementation of libbpf-provided USDT support. It
consists of a single-header library, usdt.bpf.h, which is meant to be used
from the user's BPF-side source code. This header is added to the list of
installed libbpf headers, along with bpf_helpers.h and others.

The BPF-side implementation consists of two BPF maps:
  - a spec map, which contains "USDT specs" encoding the information
    necessary to fetch USDT arguments and other data (argument count,
    user-provided cookie value, etc.) at runtime;
  - an IP-to-spec-ID map, which is used only on kernels that don't support
    the BPF cookie feature. It allows looking up a spec ID based on the
    place in the user application that triggered the USDT program.

These maps have default sizes, 256 and 1024, chosen conservatively so as not
to waste a lot of space while still handling most common cases. But there
could be cases when a user application needs to either trace a lot of
different USDTs, or USDTs are heavily inlined and their arguments are located
in a lot of differing locations. For such cases it might be necessary to size
those maps up, which libbpf allows by overriding the BPF_USDT_MAX_SPEC_CNT
and BPF_USDT_MAX_IP_CNT macros.
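
For example, a hypothetical BPF-side source that needs bigger maps could
override the macros before including the header (the values below are
arbitrary illustrations, not recommendations):

```c
/* Must be defined *before* including usdt.bpf.h so the map definitions
 * inside the header pick up the overridden sizes.
 */
#define BPF_USDT_MAX_SPEC_CNT 1024
#define BPF_USDT_MAX_IP_CNT (16 * 1024)
#include "usdt.bpf.h"
```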

One important aspect to keep in mind: a single USDT (the user-space
equivalent of a kernel tracepoint) can have multiple USDT "call sites".
That is, a single logical USDT is triggered from multiple places in the user
application. This can happen due to function inlining. Each such inlined
instance of a USDT invocation can have its own unique USDT argument
specification (instructions about the location of the value of each USDT
argument). So while a USDT looks very similar to a usual uprobe or kernel
tracepoint, under the hood it's actually a collection of uprobes, each
potentially needing a different spec to know how to fetch arguments.

The user-visible API consists of three helper functions:
  - bpf_usdt_arg_cnt(), which returns the number of arguments of the
    currently traced USDT;
  - bpf_usdt_arg(), which reads the value of the specified USDT argument (by
    its zero-indexed position) and returns it as a 64-bit value;
  - bpf_usdt_cookie(), which functions like the BPF cookie for USDT
    programs; this is necessary as libbpf doesn't allow specifying the actual
    BPF cookie and utilizes it internally for the USDT support implementation.
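
As a sketch, a generic handler using these helpers could look like this
(program name is made up for illustration; helper signatures as described
above):

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "usdt.bpf.h"

char LICENSE[] SEC("license") = "GPL";

SEC("usdt")
int handle_any_usdt(struct pt_regs *ctx)
{
	long val;
	int i, n;

	n = bpf_usdt_arg_cnt(ctx);
	for (i = 0; i < BPF_USDT_MAX_ARG_CNT; i++) {
		if (i >= n)
			break;
		/* fetch zero-indexed argument #i as a 64-bit value */
		if (bpf_usdt_arg(ctx, i, &val) == 0)
			bpf_printk("arg #%d = %ld", i, val);
	}
	bpf_printk("usdt cookie = %ld", bpf_usdt_cookie(ctx));
	return 0;
}
```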

Each bpf_usdt_xxx() API expects the struct pt_regs * context passed into the
BPF program. On kernels that don't support BPF cookie it is used to fetch the
absolute IP address of the underlying uprobe.

usdt.bpf.h also provides the BPF_USDT() macro, which functions like
BPF_PROG() and BPF_KPROBE() and allows a much more user-friendly way to
get access to USDT arguments, if the USDT definition is static and known to
the user. It is expected that the majority of use cases won't have to use
bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
all their needs.
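
For a statically known USDT, usage could be sketched as follows (program name
and argument types are hypothetical):

```c
/* hypothetical USDT with two known arguments */
SEC("usdt")
int BPF_USDT(handle_my_usdt, int x, void *ptr)
{
	/* x and ptr are fetched via bpf_usdt_arg() behind the scenes;
	 * the original ctx (struct pt_regs *) is still available by name
	 */
	bpf_printk("x = %d, ptr = %p", x, ptr);
	return 0;
}
```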

Lastly, usdt.bpf.h utilizes BPF CO-RE for one single purpose: to detect
kernel support for BPF cookie. If the BPF CO-RE dependency is undesirable,
the user application can redefine BPF_USDT_HAS_BPF_COOKIE to either a boolean
constant (or, equivalently, zero or non-zero), or even point it to its own
.rodata variable that can be set from the application's user-space code. It
is important that BPF_USDT_HAS_BPF_COOKIE is known to the BPF verifier as a
static value (thus .rodata and not just .data), as otherwise the BPF code
will still contain the bpf_get_attach_cookie() BPF helper call and will fail
validation at runtime, if not dead-code eliminated.
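
A sketch of such an override, assuming the user application sets the flag
before skeleton load (the variable name is made up):

```c
/* A const volatile global lands in .rodata, so after skeleton load the
 * verifier sees it as a known constant and can dead-code-eliminate the
 * bpf_get_attach_cookie() call on kernels lacking support.
 */
const volatile bool my_kernel_has_bpf_cookie;

#define BPF_USDT_HAS_BPF_COOKIE my_kernel_has_bpf_cookie
#include "usdt.bpf.h"
```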

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Makefile   |   2 +-
 tools/lib/bpf/usdt.bpf.h | 256 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 257 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/bpf/usdt.bpf.h

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index b8b37fe76006..b4fbe8bed555 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -239,7 +239,7 @@ install_lib: all_cmd
 
 SRC_HDRS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h xsk.h	     \
 	    bpf_helpers.h bpf_tracing.h bpf_endian.h bpf_core_read.h	     \
-	    skel_internal.h libbpf_version.h
+	    skel_internal.h libbpf_version.h usdt.bpf.h
 GEN_HDRS := $(BPF_GENERATED)
 
 INSTALL_PFX := $(DESTDIR)$(prefix)/include/bpf
diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
new file mode 100644
index 000000000000..0941c915d58d
--- /dev/null
+++ b/tools/lib/bpf/usdt.bpf.h
@@ -0,0 +1,256 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#ifndef __USDT_BPF_H__
+#define __USDT_BPF_H__
+
+#include <linux/errno.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+/* Below types and maps are internal implementation details of libbpf's USDT
+ * support and are subject to change. Also, usdt_xxx() API helpers should be
+ * considered an unstable API as well and might be adjusted based on user
+ * feedback from using libbpf's USDT support in production.
+ */
+
+/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
+ * map that keeps track of USDT argument specifications. This might be
+ * necessary if there are a lot of USDT attachments.
+ */
+#ifndef BPF_USDT_MAX_SPEC_CNT
+#define BPF_USDT_MAX_SPEC_CNT 256
+#endif
+/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
+ * map that keeps track of IP (memory address) mapping to USDT argument
+ * specification.
+ * Note, if kernel supports BPF cookies, this map is not used and could be
+ * resized all the way to 1 to save a bit of memory.
+ */
+#ifndef BPF_USDT_MAX_IP_CNT
+#define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT)
+#endif
+/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
+ * the only dependency on CO-RE, so if it's undesirable, user can override
+ * BPF_USDT_HAS_BPF_COOKIE to specify whether BPF cookie is supported or not.
+ */
+#ifndef BPF_USDT_HAS_BPF_COOKIE
+#define BPF_USDT_HAS_BPF_COOKIE \
+	bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
+#endif
+
+enum __bpf_usdt_arg_type {
+	BPF_USDT_ARG_CONST,
+	BPF_USDT_ARG_REG,
+	BPF_USDT_ARG_REG_DEREF,
+};
+
+struct __bpf_usdt_arg_spec {
+	/* u64 scalar interpreted depending on arg_type, see below */
+	__u64 val_off;
+	/* arg location case, see bpf_usdt_arg() for details */
+	enum __bpf_usdt_arg_type arg_type;
+	/* offset of referenced register within struct pt_regs */
+	short reg_off;
+	/* whether arg should be interpreted as signed value */
+	bool arg_signed;
+	/* number of bits that need to be cleared and, optionally,
+	 * sign-extended to cast arguments that are 1, 2, or 4 bytes
+	 * long into final 8-byte u64/s64 value returned to user
+	 */
+	char arg_bitshift;
+};
+
+/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
+#define BPF_USDT_MAX_ARG_CNT 12
+struct __bpf_usdt_spec {
+	struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
+	__u64 usdt_cookie;
+	short arg_cnt;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
+	__type(key, int);
+	__type(value, struct __bpf_usdt_spec);
+} __bpf_usdt_specs SEC(".maps") __weak;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, BPF_USDT_MAX_IP_CNT);
+	__type(key, long);
+	__type(value, __u32);
+} __bpf_usdt_ip_to_spec_id SEC(".maps") __weak;
+
+/* don't rely on user's BPF code to have latest definition of bpf_func_id */
+enum bpf_func_id___usdt {
+	BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
+};
+
+static __always_inline
+int __bpf_usdt_spec_id(struct pt_regs *ctx)
+{
+	if (!BPF_USDT_HAS_BPF_COOKIE) {
+		long ip = PT_REGS_IP(ctx);
+		int *spec_id_ptr;
+
+		spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_ip_to_spec_id, &ip);
+		return spec_id_ptr ? *spec_id_ptr : -ESRCH;
+	}
+
+	return bpf_get_attach_cookie(ctx);
+}
+
+/* Return number of USDT arguments defined for currently traced USDT. */
+static inline __noinline
+int bpf_usdt_arg_cnt(struct pt_regs *ctx)
+{
+	struct __bpf_usdt_spec *spec;
+	int spec_id;
+
+	spec_id = __bpf_usdt_spec_id(ctx);
+	if (spec_id < 0)
+		return -ESRCH;
+
+	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
+	if (!spec)
+		return -ESRCH;
+
+	return spec->arg_cnt;
+}
+
+/* Fetch USDT argument #*arg_num* (zero-indexed) and put its value into *res.
+ * Returns 0 on success; negative error, otherwise.
+ * On error *res is guaranteed to be set to zero.
+ */
+static inline __noinline
+int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
+{
+	struct __bpf_usdt_spec *spec;
+	struct __bpf_usdt_arg_spec *arg_spec;
+	unsigned long val;
+	int err, spec_id;
+
+	*res = 0;
+
+	spec_id = __bpf_usdt_spec_id(ctx);
+	if (spec_id < 0)
+		return -ESRCH;
+
+	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
+	if (!spec)
+		return -ESRCH;
+
+	if (arg_num >= BPF_USDT_MAX_ARG_CNT || arg_num >= spec->arg_cnt)
+		return -ENOENT;
+
+	arg_spec = &spec->args[arg_num];
+	switch (arg_spec->arg_type) {
+	case BPF_USDT_ARG_CONST:
+		/* Arg is just a constant ("-4@$-9" in USDT arg spec).
+		 * value is recorded in arg_spec->val_off directly.
+		 */
+		val = arg_spec->val_off;
+		break;
+	case BPF_USDT_ARG_REG:
+		/* Arg is in a register (e.g., "8@%rax" in USDT arg spec),
+		 * so we read the contents of that register directly from
+		 * struct pt_regs. To keep things simple user-space parts
+		 * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
+		 */
+		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+		if (err)
+			return err;
+		break;
+	case BPF_USDT_ARG_REG_DEREF:
+		/* Arg is in memory addressed by register, plus some offset
+		 * (e.g., "-4@-1204(%rbp)" in USDT arg spec). Register is
+		 * identified like in the BPF_USDT_ARG_REG case, and the offset
+		 * is in arg_spec->val_off. We first fetch register contents
+		 * from pt_regs, then do another user-space probe read to
+		 * fetch argument value itself.
+		 */
+		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+		if (err)
+			return err;
+		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
+		if (err)
+			return err;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/* cast arg from 1, 2, or 4 bytes to final 8 byte size clearing
+	 * necessary upper arg_bitshift bits, with sign extension if argument
+	 * is signed
+	 */
+	val <<= arg_spec->arg_bitshift;
+	if (arg_spec->arg_signed)
+		val = ((long)val) >> arg_spec->arg_bitshift;
+	else
+		val = val >> arg_spec->arg_bitshift;
+	*res = val;
+	return 0;
+}
+
+/* Retrieve user-specified cookie value provided during attach as
+ * bpf_usdt_opts.usdt_cookie. This serves the same purpose as BPF cookie
+ * returned by bpf_get_attach_cookie(). Libbpf's support for USDT is itself
+ * utilizing BPF cookies internally, so user can't use BPF cookie directly
+ * for USDT programs and has to use bpf_usdt_cookie() API instead.
+ */
+static inline __noinline
+long bpf_usdt_cookie(struct pt_regs *ctx)
+{
+	struct __bpf_usdt_spec *spec;
+	int spec_id;
+
+	spec_id = __bpf_usdt_spec_id(ctx);
+	if (spec_id < 0)
+		return 0;
+
+	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
+	if (!spec)
+		return 0;
+
+	return spec->usdt_cookie;
+}
+
+/* we rely on ___bpf_apply() and ___bpf_narg() macros already defined in bpf_tracing.h */
+#define ___bpf_usdt_args0() ctx
+#define ___bpf_usdt_args1(x) ___bpf_usdt_args0(), ({ long _x; bpf_usdt_arg(ctx, 0, &_x); (void *)_x; })
+#define ___bpf_usdt_args2(x, args...) ___bpf_usdt_args1(args), ({ long _x; bpf_usdt_arg(ctx, 1, &_x); (void *)_x; })
+#define ___bpf_usdt_args3(x, args...) ___bpf_usdt_args2(args), ({ long _x; bpf_usdt_arg(ctx, 2, &_x); (void *)_x; })
+#define ___bpf_usdt_args4(x, args...) ___bpf_usdt_args3(args), ({ long _x; bpf_usdt_arg(ctx, 3, &_x); (void *)_x; })
+#define ___bpf_usdt_args5(x, args...) ___bpf_usdt_args4(args), ({ long _x; bpf_usdt_arg(ctx, 4, &_x); (void *)_x; })
+#define ___bpf_usdt_args6(x, args...) ___bpf_usdt_args5(args), ({ long _x; bpf_usdt_arg(ctx, 5, &_x); (void *)_x; })
+#define ___bpf_usdt_args7(x, args...) ___bpf_usdt_args6(args), ({ long _x; bpf_usdt_arg(ctx, 6, &_x); (void *)_x; })
+#define ___bpf_usdt_args8(x, args...) ___bpf_usdt_args7(args), ({ long _x; bpf_usdt_arg(ctx, 7, &_x); (void *)_x; })
+#define ___bpf_usdt_args9(x, args...) ___bpf_usdt_args8(args), ({ long _x; bpf_usdt_arg(ctx, 8, &_x); (void *)_x; })
+#define ___bpf_usdt_args10(x, args...) ___bpf_usdt_args9(args), ({ long _x; bpf_usdt_arg(ctx, 9, &_x); (void *)_x; })
+#define ___bpf_usdt_args11(x, args...) ___bpf_usdt_args10(args), ({ long _x; bpf_usdt_arg(ctx, 10, &_x); (void *)_x; })
+#define ___bpf_usdt_args12(x, args...) ___bpf_usdt_args11(args), ({ long _x; bpf_usdt_arg(ctx, 11, &_x); (void *)_x; })
+#define ___bpf_usdt_args(args...) ___bpf_apply(___bpf_usdt_args, ___bpf_narg(args))(args)
+
+/*
+ * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
+ * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
+ * Original struct pt_regs * context is preserved as 'ctx' argument.
+ */
+#define BPF_USDT(name, args...)						    \
+name(struct pt_regs *ctx);						    \
+static __attribute__((always_inline)) typeof(name(0))			    \
+____##name(struct pt_regs *ctx, ##args);				    \
+typeof(name(0)) name(struct pt_regs *ctx)				    \
+{									    \
+        _Pragma("GCC diagnostic push")					    \
+        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
+        return ____##name(___bpf_usdt_args(args));			    \
+        _Pragma("GCC diagnostic pop")					    \
+}									    \
+static __attribute__((always_inline)) typeof(name(0))			    \
+____##name(struct pt_regs *ctx, ##args)
+
+#endif /* __USDT_BPF_H__ */
-- 
2.30.2



* [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
  2022-04-02  0:29 ` [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-04  3:12   ` Dave Marchevsky
  2022-04-02  0:29 ` [PATCH v2 bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Wire up libbpf USDT support APIs without yet implementing all the
nitty-gritty details of USDT discovery, spec parsing, and BPF map
initialization.

The user-visible user-space API is simple and conceptually very similar
to the uprobe API.

The bpf_program__attach_usdt() API allows attaching a given BPF program to
a USDT programmatically, specified through the binary path (executable or
shared lib), USDT provider, and name. Also, just like in the uprobe case, a
PID filter is specified (0 - self, -1 - any process, or a specific PID).
Optionally, a USDT cookie value can be specified. Such a single API
invocation will try to discover the given USDT in the specified binary and
will use (potentially many) BPF uprobes to attach this program in the correct
locations.
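
A minimal user-space usage sketch (skeleton name, binary path, and
provider/probe names are hypothetical; opts can be NULL if no cookie is
needed):

```c
LIBBPF_OPTS(bpf_usdt_opts, opts, .usdt_cookie = 0xcafe);
struct bpf_link *link;

link = bpf_program__attach_usdt(skel->progs.handle_my_usdt,
				-1 /* any process */, "/path/to/my_app",
				"my_provider", "my_probe", &opts);
if (!link)
	fprintf(stderr, "USDT attach failed: %d\n", -errno);
```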

Just like with any bpf_program__attach_xxx() API, a bpf_link is returned that
represents this attachment. It is a virtual BPF link that doesn't have a
direct kernel object, as it can consist of multiple underlying BPF uprobe
links. As such, attachment is not an atomic operation and there can be a
brief moment when some USDT call sites are attached while others are still in
the process of attaching. This should be taken into consideration by the
user. But bpf_program__attach_usdt() guarantees that in the case of success
all USDT call sites are successfully attached, or all the successful
attachments will be detached as soon as some of the USDT call sites fail to
be attached. So, in theory, there could be cases of a failed
bpf_program__attach_usdt() call that did trigger a few USDT program
invocations. This is unavoidable due to the multi-uprobe nature of USDT and
has to be handled by the user, if it's important to create an illusion of
atomicity.

USDT BPF programs themselves are marked in BPF source code as either
SEC("usdt"), in which case they won't be auto-attached through the
skeleton's <skel>__attach() method, or they can have a full definition,
which follows the spirit of fully-specified uprobes:
SEC("usdt/<path>:<provider>:<name>"). In the latter case the skeleton's
attach method will attempt auto-attachment. Similarly, the generic
bpf_program__attach() will have enough information to go off of for
parameterless attachment.
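
For example, a fully-specified program (path, provider, and name below are
hypothetical) that the skeleton's attach method would auto-attach:

```c
SEC("usdt/./my_app:my_provider:my_probe")
int BPF_USDT(auto_attached_usdt, long arg1)
{
	bpf_printk("arg1 = %ld", arg1);
	return 0;
}
```

With such a section, <skel>__attach() or the generic bpf_program__attach()
needs no extra parameters.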

USDT BPF programs are actually uprobes and, as such, are marked for the
kernel as BPF_PROG_TYPE_KPROBE.

Another part of this patch is USDT-related feature probing:
  - BPF cookie support detection from user-space;
  - detection of kernel support for auto-refcounting of USDT semaphores.

The latter is optional. If the kernel doesn't support such a feature and the
USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf
detects that the USDT requires setting semaphores and the kernel doesn't
support this, libbpf errors out with an explicit pr_warn() message. Libbpf
doesn't support poking the process's memory directly to increment the
semaphore value, like BCC does on legacy kernels, due to the inherent
raciness and danger of such process memory manipulation. Libbpf lets the
kernel take care of this properly or gives up.

Logistically, all the extra USDT-related infrastructure of libbpf is put
into a separate usdt.c file and abstracted behind struct usdt_manager.
Each bpf_object has a lazily-initialized usdt_manager pointer, which is
only instantiated if USDT programs are attempted to be attached. Closing the
BPF object frees up usdt_manager resources. usdt_manager keeps track of
USDT spec ID assignment and a few other small things.

Subsequent patches will fill out remaining missing pieces of USDT
initialization and setup logic.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Build             |   3 +-
 tools/lib/bpf/libbpf.c          | 100 +++++++-
 tools/lib/bpf/libbpf.h          |  31 +++
 tools/lib/bpf/libbpf.map        |   1 +
 tools/lib/bpf/libbpf_internal.h |  19 ++
 tools/lib/bpf/usdt.c            | 426 ++++++++++++++++++++++++++++++++
 6 files changed, 571 insertions(+), 9 deletions(-)
 create mode 100644 tools/lib/bpf/usdt.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 94f0a146bb7b..31a1a9015902 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,3 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o \
-	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o
+	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
+	    usdt.o
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 809fe209cdcc..ed0f2ff61561 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -483,6 +483,8 @@ struct elf_state {
 	int st_ops_shndx;
 };
 
+struct usdt_manager;
+
 struct bpf_object {
 	char name[BPF_OBJ_NAME_LEN];
 	char license[64];
@@ -545,6 +547,8 @@ struct bpf_object {
 	size_t fd_array_cap;
 	size_t fd_array_cnt;
 
+	struct usdt_manager *usdt_man;
+
 	char path[];
 };
 
@@ -4678,6 +4682,18 @@ static int probe_perf_link(void)
 	return link_fd < 0 && err == -EBADF;
 }
 
+static int probe_kern_bpf_cookie(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
+		BPF_EXIT_INSN(),
+	};
+	int ret, insn_cnt = ARRAY_SIZE(insns);
+
+	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
+	return probe_fd(ret);
+}
+
 enum kern_feature_result {
 	FEAT_UNKNOWN = 0,
 	FEAT_SUPPORTED = 1,
@@ -4740,6 +4756,9 @@ static struct kern_feature_desc {
 	[FEAT_MEMCG_ACCOUNT] = {
 		"memcg-based memory accounting", probe_memcg_account,
 	},
+	[FEAT_BPF_COOKIE] = {
+		"BPF cookie support", probe_kern_bpf_cookie,
+	},
 };
 
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@@ -8200,6 +8219,9 @@ void bpf_object__close(struct bpf_object *obj)
 	if (obj->clear_priv)
 		obj->clear_priv(obj, obj->priv);
 
+	usdt_manager_free(obj->usdt_man);
+	obj->usdt_man = NULL;
+
 	bpf_gen__free(obj->gen_loader);
 	bpf_object__elf_finish(obj);
 	bpf_object_unload(obj);
@@ -8630,6 +8652,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
 }
 
 static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_link **link);
@@ -8647,6 +8670,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("uretprobe/",		KPROBE, 0, SEC_NONE),
 	SEC_DEF("kprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
 	SEC_DEF("kretprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
+	SEC_DEF("usdt+",		KPROBE,	0, SEC_NONE, attach_usdt),
 	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
 	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX | SEC_DEPRECATED),
 	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -9692,14 +9716,6 @@ int bpf_prog_load_deprecated(const char *file, enum bpf_prog_type type,
 	return bpf_prog_load_xattr2(&attr, pobj, prog_fd);
 }
 
-struct bpf_link {
-	int (*detach)(struct bpf_link *link);
-	void (*dealloc)(struct bpf_link *link);
-	char *pin_path;		/* NULL, if not pinned */
-	int fd;			/* hook FD, -1 if not applicable */
-	bool disconnected;
-};
-
 /* Replace link's underlying BPF program with the new one */
 int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
 {
@@ -10599,6 +10615,74 @@ struct bpf_link *bpf_program__attach_uprobe(const struct bpf_program *prog,
 	return bpf_program__attach_uprobe_opts(prog, pid, binary_path, func_offset, &opts);
 }
 
+struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
+					  pid_t pid, const char *binary_path,
+					  const char *usdt_provider, const char *usdt_name,
+					  const struct bpf_usdt_opts *opts)
+{
+	struct bpf_object *obj = prog->obj;
+	struct bpf_link *link;
+	long usdt_cookie;
+	int err;
+
+	if (!OPTS_VALID(opts, bpf_usdt_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	if (bpf_program__fd(prog) < 0) {
+		pr_warn("prog '%s': can't attach BPF program w/o FD (did you load it?)\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	/* USDT manager is instantiated lazily on first USDT attach. It will
+	 * be destroyed together with BPF object in bpf_object__close().
+	 */
+	if (IS_ERR(obj->usdt_man))
+		return libbpf_ptr(obj->usdt_man);
+	if (!obj->usdt_man) {
+		obj->usdt_man = usdt_manager_new(obj);
+		if (IS_ERR(obj->usdt_man))
+			return libbpf_ptr(obj->usdt_man);
+	}
+
+	usdt_cookie = OPTS_GET(opts, usdt_cookie, 0);
+	link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path,
+				        usdt_provider, usdt_name, usdt_cookie);
+	err = libbpf_get_error(link);
+	if (err)
+		return libbpf_err_ptr(err);
+	return link;
+}
+
+static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+	char *path = NULL, *provider = NULL, *name = NULL;
+	const char *sec_name;
+	int n, err;
+
+	sec_name = bpf_program__section_name(prog);
+	if (strcmp(sec_name, "usdt") == 0) {
+		/* no auto-attach for just SEC("usdt") */
+		*link = NULL;
+		return 0;
+	}
+
+	n = sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name);
+	if (n != 3) {
+		pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
+			sec_name);
+		err = -EINVAL;
+	} else {
+		*link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
+						 provider, name, NULL);
+		err = libbpf_get_error(*link);
+	}
+	free(path);
+	free(provider);
+	free(name);
+	return err;
+}
+
 static int determine_tracepoint_id(const char *tp_category,
 				   const char *tp_name)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 05dde85e19a6..66d346cd2069 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -503,6 +503,37 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
 				const char *binary_path, size_t func_offset,
 				const struct bpf_uprobe_opts *opts);
 
+struct bpf_usdt_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	/* custom user-provided value accessible through bpf_usdt_cookie() */
+	__u64 usdt_cookie;
+	size_t :0;
+};
+#define bpf_usdt_opts__last_field usdt_cookie
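The leading `sz` field enables the forward/backward compatibility scheme all libbpf opts structs share: the caller records the struct size it was compiled against, and the library only reads fields that fit within it. A hypothetical miniature of that pattern (all names here are ours; libbpf's real machinery is the `OPTS_VALID()`/`OPTS_GET()` macros):

```c
#include <assert.h>
#include <stddef.h>

/* Miniature of the libbpf opts pattern: sz records how big the struct was
 * when the *caller* was compiled, so a newer library can tell which trailing
 * fields the caller actually provided.
 */
struct demo_usdt_opts {
	size_t sz;                     /* set to sizeof(struct demo_usdt_opts) by caller */
	unsigned long long usdt_cookie;
};

/* Read a field only if the caller's struct is big enough to contain it,
 * otherwise fall back to a default (roughly what OPTS_GET() does).
 */
static unsigned long long demo_opts_get_cookie(const struct demo_usdt_opts *opts)
{
	if (!opts || opts->sz < offsetof(struct demo_usdt_opts, usdt_cookie) +
				sizeof(opts->usdt_cookie))
		return 0; /* default */
	return opts->usdt_cookie;
}
```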
+
+/**
+ * @brief **bpf_program__attach_usdt()** is just like
+ * bpf_program__attach_uprobe_opts() except it covers USDT (User-space
+ * Statically Defined Tracepoint) attachment, instead of attaching to
+ * user-space function entry or exit.
+ *
+ * @param prog BPF program to attach
+ * @param pid Process ID to attach the uprobe to, 0 for self (own process),
+ * -1 for all processes
+ * @param binary_path Path to binary that contains provided USDT probe
+ * @param usdt_provider USDT provider name
+ * @param usdt_name USDT probe name
+ * @param opts Options for altering program attachment
+ * @return Reference to the newly created BPF link; or NULL on error, with the
+ * error code stored in errno
+ */
+LIBBPF_API struct bpf_link *
+bpf_program__attach_usdt(const struct bpf_program *prog,
+			 pid_t pid, const char *binary_path,
+			 const char *usdt_provider, const char *usdt_name,
+			 const struct bpf_usdt_opts *opts);
+
 struct bpf_tracepoint_opts {
 	/* size of this struct, for forward/backward compatibility */
 	size_t sz;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index dd35ee58bfaa..82f6d62176dd 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -444,6 +444,7 @@ LIBBPF_0.8.0 {
 	global:
 		bpf_object__destroy_subskeleton;
 		bpf_object__open_subskeleton;
+		bpf_program__attach_usdt;
 		libbpf_register_prog_handler;
 		libbpf_unregister_prog_handler;
 		bpf_program__attach_kprobe_multi_opts;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index b6247dc7f8eb..dd0d4ccfa649 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -148,6 +148,15 @@ do {				\
 #ifndef __has_builtin
 #define __has_builtin(x) 0
 #endif
+
+struct bpf_link {
+	int (*detach)(struct bpf_link *link);
+	void (*dealloc)(struct bpf_link *link);
+	char *pin_path;		/* NULL, if not pinned */
+	int fd;			/* hook FD, -1 if not applicable */
+	bool disconnected;
+};
+
 /*
  * Re-implement glibc's reallocarray() for libbpf internal-only use.
  * reallocarray(), unfortunately, is not available in all versions of glibc,
@@ -329,6 +338,8 @@ enum kern_feature_id {
 	FEAT_BTF_TYPE_TAG,
 	/* memcg-based accounting for BPF maps and progs */
 	FEAT_MEMCG_ACCOUNT,
+	/* BPF cookie (bpf_get_attach_cookie() BPF helper) support */
+	FEAT_BPF_COOKIE,
 	__FEAT_CNT,
 };
 
@@ -543,4 +554,12 @@ int bpf_core_add_cands(struct bpf_core_cand *local_cand,
 		       struct bpf_core_cand_list *cands);
 void bpf_core_free_cands(struct bpf_core_cand_list *cands);
 
+struct usdt_manager *usdt_manager_new(struct bpf_object *obj);
+void usdt_manager_free(struct usdt_manager *man);
+struct bpf_link * usdt_manager_attach_usdt(struct usdt_manager *man,
+					   const struct bpf_program *prog,
+					   pid_t pid, const char *path,
+					   const char *usdt_provider, const char *usdt_name,
+					   long usdt_cookie);
+
 #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
new file mode 100644
index 000000000000..9476f7a15769
--- /dev/null
+++ b/tools/lib/bpf/usdt.c
@@ -0,0 +1,426 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <ctype.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <libelf.h>
+#include <gelf.h>
+#include <unistd.h>
+#include <linux/ptrace.h>
+#include <linux/kernel.h>
+
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_common.h"
+#include "libbpf_internal.h"
+#include "hashmap.h"
+
+/* libbpf's USDT support consists of BPF-side state/code and user-space
+ * state/code working together in concert. BPF-side parts are defined in
+ * usdt.bpf.h header library. User-space state is encapsulated by struct
+ * usdt_manager and all the supporting code centered around usdt_manager.
+ *
+ * usdt.bpf.h defines two BPF maps that usdt_manager expects: USDT spec map
+ * and IP-to-spec-ID map, which is an auxiliary map necessary for kernels that
+ * don't support BPF cookie (see below). These two maps are implicitly
+ * embedded into the user's final BPF object file when the user's code includes
+ * usdt.bpf.h. This means that libbpf doesn't do anything special to create
+ * these USDT support maps. They are created by normal libbpf logic of
+ * instantiating BPF maps when opening and loading BPF object.
+ *
+ * As such, libbpf is basically unaware of the need to do anything
+ * USDT-related until the very first call to bpf_program__attach_usdt(), which
+ * can be called by user explicitly or happen automatically during skeleton
+ * attach (or, equivalently, through generic bpf_program__attach() call). At
+ * this point, libbpf will instantiate and initialize struct usdt_manager and
+ * store it in bpf_object. USDT manager is per-BPF object construct, as each
+ * independent BPF object might or might not have USDT programs, and thus all
+ * the expected USDT-related state. There is no coordination between two
+ * bpf_objects in terms of USDT attachment; they are oblivious of each
+ * other's existence, and libbpf just deals with bpf_object-specific USDT
+ * state.
+ *
+ * Quick crash course on USDTs.
+ *
+ * From user-space application's point of view, USDT is essentially just
+ * a slightly special function call that normally has zero overhead, unless it
+ * is being traced by some external entity (e.g., a BPF-based tool). Here's how
+ * a typical application can trigger USDT probe:
+ *
+ * #include <sys/sdt.h>  // provided by systemtap-sdt-devel package
+ * // folly also provides similar functionality in folly/tracing/StaticTracepoint.h
+ *
+ * STAP_PROBE3(my_usdt_provider, my_usdt_probe_name, 123, x, &y);
+ *
+ * USDT is identified by its <provider-name>:<probe-name> pair of names. Each
+ * individual USDT has a fixed number of arguments (3 in the above example)
+ * and specifies values of each argument as if it was a function call.
+ *
+ * USDT call is actually not a function call, but is instead replaced by
+ * a single NOP instruction (thus zero overhead, effectively). But in addition
+ * to that, those USDT macros generate special SHT_NOTE ELF records in
+ * .note.stapsdt ELF section. Here's an example USDT definition as emitted by
+ * `readelf -n <binary>`:
+ *
+ *   stapsdt              0x00000089       NT_STAPSDT (SystemTap probe descriptors)
+ *   Provider: test
+ *   Name: usdt12
+ *   Location: 0x0000000000549df3, Base: 0x00000000008effa4, Semaphore: 0x0000000000a4606e
+ *   Arguments: -4@-1204(%rbp) -4@%edi -8@-1216(%rbp) -8@%r8 -4@$5 -8@%r9 8@%rdx 8@%r10 -4@$-9 -2@%cx -2@%ax -1@%sil
+ *
+ * In this case we have USDT test:usdt12 with 12 arguments.
+ *
+ * Location and base are offsets used to calculate the absolute IP address of
+ * the NOP instruction that the kernel can replace with an interrupt
+ * instruction to trigger instrumentation code (a BPF program, for us).
+ *
+ * The semaphore above is an optional feature. It records the address of a 2-byte
+ * refcount variable (normally in '.probes' ELF section) used for signaling if
+ * there is anything that is attached to USDT. This is useful for user
+ * applications if, for example, they need to prepare some arguments that are
+ * passed only to USDTs and preparation is expensive. By checking if USDT is
+ * "activated", an application can avoid paying those costs unnecessarily.
+ * Recent enough kernel has built-in support for automatically managing this
+ * refcount, which libbpf expects and relies on. If USDT is defined without
+ * associated semaphore, this value will be zero. See selftests for semaphore
+ * examples.
+ *
+ * Arguments is the most interesting part. This USDT specification string is
+ * providing information about all the USDT arguments and their locations. The
+ * part before the @ sign defines the byte size of the argument (1, 2, 4, or 8)
+ * and whether it is signed or unsigned (negative size means signed). The part
+ * after the @ sign is an assembly-like definition of the argument location.
+ * Technically, the assembler can provide some pretty advanced definitions,
+ * but libbpf currently supports the three most common cases:
+ *   1) immediate constant, see 5th and 9th args above (-4@$5 and -4@$-9);
+ *   2) register value, e.g., 8@%rdx, which means "unsigned 8-byte integer
+ *      whose value is in register %rdx";
+ *   3) memory dereference addressed by register, e.g., -4@-1204(%rbp), which
+ *      specifies signed 32-bit integer stored at offset -1204 bytes from
+ *      memory address stored in %rbp.
+ *
+ * During attachment, libbpf parses all the relevant USDT specifications and
+ * prepares `struct usdt_spec` (USDT spec), which is then provided to BPF-side
+ * code through the spec map. This allows BPF applications to quickly fetch
+ * the actual argument values at runtime using simple BPF-side code.
+ *
+ * With basics out of the way, let's go over less immediately obvious aspects
+ * of supporting USDTs.
+ *
+ * First, there is no special USDT BPF program type. It is actually just
+ * a uprobe BPF program (which for the kernel, at least currently, is just
+ * a kprobe program, i.e., the BPF_PROG_TYPE_KPROBE program type). The only
+ * difference is that a uprobe is usually attached at the function entry, while
+ * a USDT will normally be somewhere inside the function. But it should always
+ * point to a NOP instruction, which makes such uprobes the fastest uprobe
+ * kind.
+ *
+ * Second, it's important to realize that such STAP_PROBEn(provider, name, ...)
+ * macro invocations can end up being inlined many-many times, depending on
+ * specifics of each individual user application. So single conceptual USDT
+ * (identified by provider:name pair of identifiers) is, generally speaking,
+ * multiple uprobe locations (USDT call sites) in different places in user
+ * application. Further, again due to inlining, each USDT call site might end
+ * up having the same argument #N located in a different place. In one call
+ * site it could be a constant, in another it will end up in a register, and in
+ * yet another it could be some other register or even somewhere on the stack.
+ *
+ * As such, "attaching to USDT" means (in general case) attaching the same
+ * uprobe BPF program to multiple target locations in user application, each
+ * potentially having a completely different USDT spec associated with it.
+ * To wire all this up together libbpf allocates a unique integer spec ID for
+ * each unique USDT spec. Spec IDs are allocated as sequential small integers
+ * so that they can be used as keys in array BPF map (for performance reasons).
+ * Spec ID allocation and accounting is a big part of what usdt_manager is
+ * about. This state has to be maintained per-BPF object and coordinated
+ * across different USDT attachments within the same BPF object.
+ *
+ * The spec ID is the key in the spec BPF map; the value is the actual USDT
+ * spec laid out as struct usdt_spec. Each BPF program invocation needs to
+ * know its associated spec ID. It gets it either through BPF cookie, which
+ * libbpf sets to spec ID during attach time, or, if kernel is too old to
+ * support BPF cookie, through IP-to-spec-ID map that libbpf maintains in such
+ * case. The latter means that some modes of operation can't be supported
+ * without BPF cookie. One such mode is attaching to a shared library
+ * "generically", without specifying a target process. In that case, it's
+ * impossible to calculate absolute IP addresses for the IP-to-spec-ID map,
+ * and thus such a mode is not supported without BPF cookie support.
+ *
+ * Note that libbpf is using BPF cookie functionality for its own internal
+ * needs, so users themselves can't rely on the BPF cookie feature. To that
+ * end, libbpf provides conceptually equivalent USDT cookie support. It's
+ * still a u64 user-provided value that can be associated with a USDT
+ * attachment. Note that this will be the same value for all USDT call sites
+ * within the same single *logical* USDT attachment. This makes sense because
+ * to the user, attaching to a USDT is a single BPF program triggered for a
+ * singular USDT probe. The fact that this is done at multiple actual
+ * locations is a mostly hidden implementation detail. This USDT cookie value
+ * can be fetched with the bpf_usdt_cookie(ctx) API provided by usdt.bpf.h.
+ *
+ * Lastly, while single USDT can have tons of USDT call sites, it doesn't
+ * necessarily have that many different USDT specs. It very well might be
+ * that 1000 USDT call sites only need 5 different USDT specs, because all the
+ * arguments are typically contained in a small set of registers or stack
+ * locations. As such, it's wasteful to allocate as many USDT spec IDs as
+ * there are USDT call sites. So libbpf tries to be frugal and performs
+ * on-the-fly deduplication during a single USDT attachment to only allocate
+ * the minimal required amount of unique USDT specs (and thus spec IDs). This
+ * is trivially achieved by using USDT spec string (Arguments string from USDT
+ * note) as a lookup key in a hashmap. USDT spec string uniquely defines
+ * everything about how to fetch USDT arguments, so two USDT call sites
+ * sharing USDT spec string can safely share the same USDT spec and spec ID.
+ * Note, this spec string deduplication is happening only during the same USDT
+ * attachment, so each USDT spec shares the same USDT cookie value. This is
+ * not generally true for other USDT attachments within the same BPF object,
+ * as even if USDT spec string is the same, USDT cookie value can be
+ * different. It was deemed excessive to try to deduplicate across independent
+ * USDT attachments by taking into account USDT spec string *and* USDT cookie
+ * value, which would complicate spec ID accounting significantly for little
+ * gain.
+ */
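The dedup-by-spec-string scheme described above can be sketched with a toy table: spec IDs are small sequential integers, and call sites sharing the same args string get the same ID. libbpf uses its internal hashmap for this lookup; a linear scan keeps the sketch below self-contained (all names are ours).

```c
#include <assert.h>
#include <string.h>

/* Toy spec-ID allocator: dedup by USDT spec string, sequential small IDs.
 * Stored pointers must outlive the table in this simplified version.
 */
struct spec_table {
	const char *spec_strs[64];
	int cnt;
};

static int spec_id_for(struct spec_table *t, const char *spec_str)
{
	int i;

	for (i = 0; i < t->cnt; i++) {
		if (strcmp(t->spec_strs[i], spec_str) == 0)
			return i;        /* reuse existing spec ID */
	}
	if (t->cnt >= 64)
		return -1;               /* table full */
	t->spec_strs[t->cnt] = spec_str;
	return t->cnt++;                 /* allocate next sequential ID */
}
```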
+
+struct usdt_target {
+	long abs_ip;
+	long rel_ip;
+	long sema_off;
+};
+
+struct usdt_manager {
+	struct bpf_map *specs_map;
+	struct bpf_map *ip_to_spec_id_map;
+
+	bool has_bpf_cookie;
+	bool has_sema_refcnt;
+};
+
+struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
+{
+	static const char *ref_ctr_sysfs_path = "/sys/bus/event_source/devices/uprobe/format/ref_ctr_offset";
+	struct usdt_manager *man;
+	struct bpf_map *specs_map, *ip_to_spec_id_map;
+
+	specs_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs");
+	ip_to_spec_id_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_ip_to_spec_id");
+	if (!specs_map || !ip_to_spec_id_map) {
+		pr_warn("usdt: failed to find USDT support BPF maps, did you forget to include bpf/usdt.bpf.h?\n");
+		return ERR_PTR(-ESRCH);
+	}
+
+	man = calloc(1, sizeof(*man));
+	if (!man)
+		return ERR_PTR(-ENOMEM);
+
+	man->specs_map = specs_map;
+	man->ip_to_spec_id_map = ip_to_spec_id_map;
+
+	/* Detect if BPF cookie is supported for kprobes.
+	 * We don't need IP-to-ID mapping if we can use BPF cookies.
+	 * Added in: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
+	 */
+	man->has_bpf_cookie = kernel_supports(obj, FEAT_BPF_COOKIE);
+
+	/* Detect kernel support for automatic refcounting of USDT semaphore.
+	 * If this is not supported, USDTs with semaphores will not be supported.
+	 * Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
+	 */
+	man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0;
+
+	return man;
+}
+
+void usdt_manager_free(struct usdt_manager *man)
+{
+	if (IS_ERR_OR_NULL(man))
+		return;
+
+	free(man);
+}
+
+static int sanity_check_usdt_elf(Elf *elf, const char *path)
+{
+	GElf_Ehdr ehdr;
+	int endianness;
+
+	if (elf_kind(elf) != ELF_K_ELF) {
+		pr_warn("usdt: unrecognized ELF kind %d for '%s'\n", elf_kind(elf), path);
+		return -EBADF;
+	}
+
+	switch (gelf_getclass(elf)) {
+	case ELFCLASS64:
+		if (sizeof(void *) != 8) {
+			pr_warn("usdt: attaching to 64-bit ELF binary '%s' is not supported\n", path);
+			return -EBADF;
+		}
+		break;
+	case ELFCLASS32:
+		if (sizeof(void *) != 4) {
+			pr_warn("usdt: attaching to 32-bit ELF binary '%s' is not supported\n", path);
+			return -EBADF;
+		}
+		break;
+	default:
+		pr_warn("usdt: unsupported ELF class for '%s'\n", path);
+		return -EBADF;
+	}
+
+	if (!gelf_getehdr(elf, &ehdr))
+		return -EINVAL;
+
+	if (ehdr.e_type != ET_EXEC && ehdr.e_type != ET_DYN) {
+		pr_warn("usdt: unsupported type of ELF binary '%s' (%d), only ET_EXEC and ET_DYN are supported\n",
+			path, ehdr.e_type);
+		return -EBADF;
+	}
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	endianness = ELFDATA2LSB;
+#elif __BYTE_ORDER == __BIG_ENDIAN
+	endianness = ELFDATA2MSB;
+#else
+# error "Unrecognized __BYTE_ORDER"
+#endif
+	if (endianness != ehdr.e_ident[EI_DATA]) {
+		pr_warn("usdt: ELF endianness mismatch for '%s'\n", path);
+		return -EBADF;
+	}
+
+	return 0;
+}
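The endianness check in sanity_check_usdt_elf() is done at compile time via `__BYTE_ORDER`; a runtime equivalent can be sketched as below. ELF's `e_ident[EI_DATA]` stores `ELFDATA2LSB` (1) for little-endian objects and `ELFDATA2MSB` (2) for big-endian ones, and it must match the host we are probing from (the helper name is ours).

```c
#include <assert.h>

/* Runtime detection of the host's ELF data encoding: inspect which byte of
 * a multi-byte integer is stored first in memory.
 */
static int host_elf_data_encoding(void)
{
	unsigned int probe = 1;
	unsigned char low_byte_first = *(unsigned char *)&probe;

	return low_byte_first ? 1 /* ELFDATA2LSB */ : 2 /* ELFDATA2MSB */;
}
```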
+
+static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
+				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
+				struct usdt_target **out_targets, size_t *out_target_cnt)
+{
+	return -ENOTSUP;
+}
+
+struct bpf_link_usdt {
+	struct bpf_link link;
+
+	struct usdt_manager *usdt_man;
+
+	size_t uprobe_cnt;
+	struct {
+		long abs_ip;
+		struct bpf_link *link;
+	} *uprobes;
+};
+
+static int bpf_link_usdt_detach(struct bpf_link *link)
+{
+	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
+	int i;
+
+	for (i = 0; i < usdt_link->uprobe_cnt; i++) {
+		/* detach underlying uprobe link */
+		bpf_link__destroy(usdt_link->uprobes[i].link);
+	}
+
+	return 0;
+}
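The container_of() pattern used by bpf_link_usdt_detach() above — recovering the enclosing struct from a pointer to an embedded member by subtracting the member's offset — can be illustrated standalone (the demo names and macro are ours; the kernel and libbpf provide their own container_of()):

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to one of its
 * members, using the member's compile-time offset.
 */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_inner { int fd; };

struct demo_outer {
	int tag;
	struct demo_inner link;  /* embedded, like struct bpf_link in bpf_link_usdt */
};
```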
+
+static void bpf_link_usdt_dealloc(struct bpf_link *link)
+{
+	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
+
+	free(usdt_link->uprobes);
+	free(usdt_link);
+}
+
+struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
+					  pid_t pid, const char *path,
+					  const char *usdt_provider, const char *usdt_name,
+					  long usdt_cookie)
+{
+	int i, fd, err;
+	LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	struct bpf_link_usdt *link = NULL;
+	struct usdt_target *targets = NULL;
+	size_t target_cnt;
+	Elf *elf;
+
+	/* TODO: perform path resolution similar to uprobe's */
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		err = -errno;
+		pr_warn("usdt: failed to open ELF binary '%s': %d\n", path, err);
+		return libbpf_err_ptr(err);
+	}
+
+	elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
+	if (!elf) {
+		err = -EBADF;
+		pr_warn("usdt: failed to parse ELF binary '%s': %s\n", path, elf_errmsg(-1));
+		goto err_out;
+	}
+
+	err = sanity_check_usdt_elf(elf, path);
+	if (err)
+		goto err_out;
+
+	/* normalize PID filter */
+	if (pid < 0)
+		pid = -1;
+	else if (pid == 0)
+		pid = getpid();
+
+	/* discover USDT in given binary, optionally limiting
+	 * activations to a given PID, if pid > 0
+	 */
+	err = collect_usdt_targets(man, elf, path, pid, usdt_provider, usdt_name,
+				   usdt_cookie, &targets, &target_cnt);
+	if (err <= 0) {
+		err = (err == 0) ? -ENOENT : err;
+		goto err_out;
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	link->usdt_man = man;
+	link->link.detach = &bpf_link_usdt_detach;
+	link->link.dealloc = &bpf_link_usdt_dealloc;
+
+	link->uprobes = calloc(target_cnt, sizeof(*link->uprobes));
+	if (!link->uprobes) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	for (i = 0; i < target_cnt; i++) {
+		struct usdt_target *target = &targets[i];
+		struct bpf_link *uprobe_link;
+
+		opts.ref_ctr_offset = target->sema_off;
+		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
+							      target->rel_ip, &opts);
+		err = libbpf_get_error(uprobe_link);
+		if (err) {
+			pr_warn("usdt: failed to attach uprobe #%d for '%s:%s' in '%s': %d\n",
+				i, usdt_provider, usdt_name, path, err);
+			goto err_out;
+		}
+
+		link->uprobes[i].link = uprobe_link;
+		link->uprobes[i].abs_ip = target->abs_ip;
+		link->uprobe_cnt++;
+	}
+
+	elf_end(elf);
+	close(fd);
+
+	return &link->link;
+
+err_out:
+	bpf_link__destroy(&link->link);
+
+	if (elf)
+		elf_end(elf);
+	close(fd);
+	return libbpf_err_ptr(err);
+}
-- 
2.30.2



* [PATCH v2 bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
  2022-04-02  0:29 ` [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
  2022-04-02  0:29 ` [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-02  0:29 ` [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Implement architecture-agnostic parts of USDT parsing logic. The code is
the documentation in this case; it's futile to try to succinctly
describe how USDT parsing is done in any sort of concreteness. But
still, USDTs are recorded in special ELF notes section (.note.stapsdt),
where each USDT call site is described separately. Along with USDT
provider and USDT name, each such note contains USDT argument
specification, which uses assembly-like syntax to describe how to fetch
value of USDT argument. USDT arg spec could be just a constant, or
a register, or a register dereference (most common cases in x86_64), but
it technically can be much more complicated cases, like offset relative
to global symbol and stuff like that. One of the later patches will
implement most common subset of this for x86 and x86-64 architectures,
which seems to handle a lot of real-world production applications.

USDT arg spec contains a compact encoding allowing usdt.bpf.h from
previous patch to handle the above 3 cases. Instead of recording which
register might be needed, we encode register's offset within struct
pt_regs to simplify BPF-side implementation. USDT argument can be of
different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
this, libbpf pre-calculates necessary bit shifts to do proper casting
and sign-extension in a short sequence of left and right shifts.
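The shift-based cast/sign-extension can be sketched as follows: for an argument of byte size sz (1, 2, 4, or 8), shifting left by 64 - 8*sz and back right either zero-fills (unsigned, logical shift on u64) or sign-extends (signed, arithmetic shift on s64). The helper name below is ours, and it assumes the usual two's-complement, arithmetic-right-shift behavior of gcc/clang, which the BPF-side code also relies on; libbpf precomputes 64 - 8*sz as arg_bitshift.

```c
#include <assert.h>

typedef unsigned long long u64;
typedef long long s64;

/* Cast a raw 8-byte value down to its true size, zero- or sign-extending
 * back to 64 bits via a pair of shifts.
 */
static u64 cast_usdt_arg(u64 raw, int sz, int is_signed)
{
	int shift = 64 - 8 * sz;

	if (is_signed)
		return (u64)(((s64)(raw << shift)) >> shift); /* arithmetic shift */
	return (raw << shift) >> shift;                       /* logical shift */
}
```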

The rest is in the code with sometimes extensive comments and references
to external "documentation" for USDTs.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/usdt.c | 582 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 581 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index 9476f7a15769..c9eff690e291 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -180,10 +180,56 @@
  * gain.
  */
 
+#define USDT_BASE_SEC ".stapsdt.base"
+#define USDT_SEMA_SEC ".probes"
+#define USDT_NOTE_SEC  ".note.stapsdt"
+#define USDT_NOTE_TYPE 3
+#define USDT_NOTE_NAME "stapsdt"
+
+/* should match exactly enum __bpf_usdt_arg_type from usdt.bpf.h */
+enum usdt_arg_type {
+	USDT_ARG_CONST,
+	USDT_ARG_REG,
+	USDT_ARG_REG_DEREF,
+};
+
+/* should match exactly struct __bpf_usdt_arg_spec from usdt.bpf.h */
+struct usdt_arg_spec {
+	__u64 val_off;
+	enum usdt_arg_type arg_type;
+	short reg_off;
+	bool arg_signed;
+	char arg_bitshift;
+};
+
+/* should match BPF_USDT_MAX_ARG_CNT in usdt.bpf.h */
+#define USDT_MAX_ARG_CNT 12
+
+/* should match struct __bpf_usdt_spec from usdt.bpf.h */
+struct usdt_spec {
+	struct usdt_arg_spec args[USDT_MAX_ARG_CNT];
+	__u64 usdt_cookie;
+	short arg_cnt;
+};
+
+struct usdt_note {
+	const char *provider;
+	const char *name;
+	/* USDT args specification string, e.g.:
+	 * "-4@%esi -4@-24(%rbp) -4@%ecx 2@%ax 8@%rdx"
+	 */
+	const char *args;
+	long loc_addr;
+	long base_addr;
+	long sema_addr;
+};
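One token of the args string stored in struct usdt_note above has the shape "N@loc". The real classification (done in later, arch-specific patches) covers constants, registers, and register dereferences; the sketch below handles only the immediate-constant case, e.g. "-4@$5", and the helper name is ours.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one "N@$imm" USDT arg spec token: signed byte size before the @,
 * immediate constant after "$". Returns the number of characters consumed,
 * or -1 if the token is not an immediate-constant argument.
 */
static int parse_usdt_const_arg(const char *arg_str, int *arg_sz, long *val)
{
	int len;

	if (sscanf(arg_str, "%d@$%ld%n", arg_sz, val, &len) != 2)
		return -1;
	return len;
}
```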
+
 struct usdt_target {
 	long abs_ip;
 	long rel_ip;
 	long sema_off;
+	struct usdt_spec spec;
+	const char *spec_str;
 };
 
 struct usdt_manager {
@@ -289,11 +335,450 @@ static int sanity_check_usdt_elf(Elf *elf, const char *path)
 	return 0;
 }
 
+static int find_elf_sec_by_name(Elf *elf, const char *sec_name, GElf_Shdr *shdr, Elf_Scn **scn)
+{
+	Elf_Scn *sec = NULL;
+	size_t shstrndx;
+
+	if (elf_getshdrstrndx(elf, &shstrndx))
+		return -EINVAL;
+
+	/* check if ELF is corrupted and avoid calling elf_strptr if yes */
+	if (!elf_rawdata(elf_getscn(elf, shstrndx), NULL))
+		return -EINVAL;
+
+	while ((sec = elf_nextscn(elf, sec)) != NULL) {
+		char *name;
+
+		if (!gelf_getshdr(sec, shdr))
+			return -EINVAL;
+
+		name = elf_strptr(elf, shstrndx, shdr->sh_name);
+		if (name && strcmp(sec_name, name) == 0) {
+			*scn = sec;
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+struct elf_seg {
+	long start;
+	long end;
+	long offset;
+	bool is_exec;
+};
+
+static int cmp_elf_segs(const void *_a, const void *_b)
+{
+	const struct elf_seg *a = _a;
+	const struct elf_seg *b = _b;
+
+	return a->start < b->start ? -1 : 1;
+}
+
+static int parse_elf_segs(Elf *elf, const char *path, struct elf_seg **segs, size_t *seg_cnt)
+{
+	GElf_Phdr phdr;
+	size_t n;
+	int i, err;
+	struct elf_seg *seg;
+	void *tmp;
+
+	*seg_cnt = 0;
+
+	if (elf_getphdrnum(elf, &n)) {
+		err = -errno;
+		return err;
+	}
+
+	for (i = 0; i < n; i++) {
+		if (!gelf_getphdr(elf, i, &phdr)) {
+			err = -errno;
+			return err;
+		}
+
+		pr_debug("usdt: discovered PHDR #%d in '%s': vaddr 0x%lx memsz 0x%lx offset 0x%lx type 0x%lx flags 0x%lx\n",
+			 i, path, (long)phdr.p_vaddr, (long)phdr.p_memsz, (long)phdr.p_offset,
+			 (long)phdr.p_type, (long)phdr.p_flags);
+		if (phdr.p_type != PT_LOAD)
+			continue;
+
+		tmp = libbpf_reallocarray(*segs, *seg_cnt + 1, sizeof(**segs));
+		if (!tmp)
+			return -ENOMEM;
+
+		*segs = tmp;
+		seg = *segs + *seg_cnt;
+		(*seg_cnt)++;
+
+		seg->start = phdr.p_vaddr;
+		seg->end = phdr.p_vaddr + phdr.p_memsz;
+		seg->offset = phdr.p_offset;
+		seg->is_exec = phdr.p_flags & PF_X;
+	}
+
+	if (*seg_cnt == 0) {
+		pr_warn("usdt: failed to find PT_LOAD program headers in '%s'\n", path);
+		return -ESRCH;
+	}
+
+	qsort(*segs, *seg_cnt, sizeof(**segs), cmp_elf_segs);
+	return 0;
+}
+
+static int parse_lib_segs(int pid, const char *lib_path, struct elf_seg **segs, size_t *seg_cnt)
+{
+	char path[PATH_MAX], line[PATH_MAX], mode[16];
+	size_t seg_start, seg_end, seg_off;
+	struct elf_seg *seg;
+	int tmp_pid, i, err;
+	FILE *f;
+
+	*seg_cnt = 0;
+
+	/* Handle containerized binaries only accessible from
+	 * /proc/<pid>/root/<path>. They will be reported as just /<path> in
+	 * /proc/<pid>/maps.
+	 */
+	if (sscanf(lib_path, "/proc/%d/root%s", &tmp_pid, path) == 2 && pid == tmp_pid)
+		goto proceed;
+
+	if (!realpath(lib_path, path)) {
+		pr_warn("usdt: failed to get absolute path of '%s' (err %d), using path as is...\n",
+			lib_path, -errno);
+		strcpy(path, lib_path);
+	}
+
+proceed:
+	sprintf(line, "/proc/%d/maps", pid);
+	f = fopen(line, "r");
+	if (!f) {
+		err = -errno;
+		pr_warn("usdt: failed to open '%s' to get base addr of '%s': %d\n",
+			line, lib_path, err);
+		return err;
+	}
+
+	/* We need to handle lines with no path at the end:
+	 *
+	 * 7f5c6f5d1000-7f5c6f5d3000 rw-p 001c7000 08:04 21238613      /usr/lib64/libc-2.17.so
+	 * 7f5c6f5d3000-7f5c6f5d8000 rw-p 00000000 00:00 0
+	 * 7f5c6f5d8000-7f5c6f5d9000 r-xp 00000000 103:01 362990598    /data/users/andriin/linux/tools/bpf/usdt/libhello_usdt.so
+	 */
+	while (fscanf(f, "%zx-%zx %s %zx %*s %*d%[^\n]\n",
+		      &seg_start, &seg_end, mode, &seg_off, line) == 5) {
+		void *tmp;
+
+		/* to handle no path case (see above) we need to capture line
+		 * without skipping any whitespaces. So we need to strip
+		 * leading whitespaces manually here
+		 */
+		i = 0;
+		while (isblank(line[i]))
+			i++;
+		if (strcmp(line + i, path) != 0)
+			continue;
+
+		pr_debug("usdt: discovered segment for lib '%s': addrs %zx-%zx mode %s offset %zx\n",
+			 path, seg_start, seg_end, mode, seg_off);
+
+		/* ignore non-executable sections for shared libs */
+		if (mode[2] != 'x')
+			continue;
+
+		tmp = libbpf_reallocarray(*segs, *seg_cnt + 1, sizeof(**segs));
+		if (!tmp) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+
+		*segs = tmp;
+		seg = *segs + *seg_cnt;
+		*seg_cnt += 1;
+
+		seg->start = seg_start;
+		seg->end = seg_end;
+		seg->offset = seg_off;
+		seg->is_exec = true;
+	}
+
+	if (*seg_cnt == 0) {
+		pr_warn("usdt: failed to find '%s' (resolved to '%s') within PID %d memory mappings\n",
+			lib_path, path, pid);
+		err = -ESRCH;
+		goto err_out;
+	}
+
+	qsort(*segs, *seg_cnt, sizeof(**segs), cmp_elf_segs);
+	err = 0;
+err_out:
+	fclose(f);
+	return err;
+}
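The /proc/<pid>/maps scan format used by parse_lib_segs() above can be checked standalone by applying it to a single line with sscanf() instead of fscanf(). The trailing `%[^\n]` captures the (possibly whitespace-prefixed) path, so a line without a path is distinguishable by the return value (4 vs 5 matched conversions). The wrapper name is ours, and buffer sizing is the caller's responsibility in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Apply the maps-line format from parse_lib_segs() to one line; returns the
 * number of matched conversions: 5 when a path is present, 4 when it isn't.
 */
static int parse_maps_line(const char *line, size_t *start, size_t *end,
			   char *mode, size_t *off, char *path)
{
	return sscanf(line, "%zx-%zx %s %zx %*s %*d%[^\n]",
		      start, end, mode, off, path);
}
```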
+
+static struct elf_seg *find_elf_seg(struct elf_seg *segs, size_t seg_cnt, long addr, bool relative)
+{
+	struct elf_seg *seg;
+	int i;
+
+	if (relative) {
+		/* for shared libraries, address is a relative offset and thus
+		 * should fall within the logical offset-based range of
+		 * [offset_start, offset_end)
+		 */
+		for (i = 0, seg = segs; i < seg_cnt; i++, seg++) {
+			if (seg->offset <= addr && addr < seg->offset + (seg->end - seg->start))
+				return seg;
+		}
+	} else {
+		/* for binaries, address is absolute and thus should be within
+		 * absolute address range of [seg_start, seg_end)
+		 */
+		for (i = 0, seg = segs; i < seg_cnt; i++, seg++) {
+			if (seg->start <= addr && addr < seg->end)
+				return seg;
+		}
+	}
+
+	return NULL;
+}
+
+static int parse_usdt_note(Elf *elf, const char *path, long base_addr,
+			   GElf_Nhdr *nhdr, const char *data, size_t name_off, size_t desc_off,
+			   struct usdt_note *usdt_note);
+
+static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, long usdt_cookie);
+
 static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
 				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
 				struct usdt_target **out_targets, size_t *out_target_cnt)
 {
-	return -ENOTSUP;
+	size_t off, name_off, desc_off, seg_cnt = 0, lib_seg_cnt = 0, target_cnt = 0;
+	struct elf_seg *segs = NULL, *lib_segs = NULL;
+	struct usdt_target *targets = NULL, *target;
+	long base_addr = 0;
+	Elf_Scn *notes_scn, *base_scn;
+	GElf_Shdr base_shdr, notes_shdr;
+	GElf_Ehdr ehdr;
+	GElf_Nhdr nhdr;
+	Elf_Data *data;
+	int err;
+
+	*out_targets = NULL;
+	*out_target_cnt = 0;
+
+	err = find_elf_sec_by_name(elf, USDT_NOTE_SEC, &notes_shdr, &notes_scn);
+	if (err) {
+		pr_warn("usdt: no USDT notes section (%s) found in '%s'\n", USDT_NOTE_SEC, path);
+		return err;
+	}
+
+	if (notes_shdr.sh_type != SHT_NOTE || !gelf_getehdr(elf, &ehdr)) {
+		pr_warn("usdt: invalid USDT notes section (%s) in '%s'\n", USDT_NOTE_SEC, path);
+		return -EINVAL;
+	}
+
+	err = parse_elf_segs(elf, path, &segs, &seg_cnt);
+	if (err) {
+		pr_warn("usdt: failed to process ELF program segments for '%s': %d\n", path, err);
+		goto err_out;
+	}
+
+	/* .stapsdt.base ELF section is optional, but is used for prelink
+	 * offset compensation (see a big comment further below)
+	 */
+	if (find_elf_sec_by_name(elf, USDT_BASE_SEC, &base_shdr, &base_scn) == 0)
+		base_addr = base_shdr.sh_addr;
+
+	data = elf_getdata(notes_scn, 0);
+	off = 0;
+	while ((off = gelf_getnote(data, off, &nhdr, &name_off, &desc_off)) > 0) {
+		long usdt_abs_ip, usdt_rel_ip, usdt_sema_off = 0;
+		struct usdt_note note;
+		struct elf_seg *seg = NULL;
+		void *tmp;
+
+		err = parse_usdt_note(elf, path, base_addr, &nhdr,
+				      data->d_buf, name_off, desc_off, &note);
+		if (err)
+			goto err_out;
+
+		if (strcmp(note.provider, usdt_provider) != 0 || strcmp(note.name, usdt_name) != 0)
+			continue;
+
+		/* We need to compensate "prelink effect". See [0] for details,
+		 * relevant parts quoted here:
+		 *
+		 * Each SDT probe also expands into a non-allocated ELF note. You can
+		 * find this by looking at SHT_NOTE sections and decoding the format;
+		 * see below for details. Because the note is non-allocated, it means
+		 * there is no runtime cost, and also preserved in both stripped files
+		 * and .debug files.
+		 *
+		 * However, this means that prelink won't adjust the note's contents
+		 * for address offsets. Instead, this is done via the .stapsdt.base
+		 * section. This is a special section that is added to the text. We
+		 * will only ever have one of these sections in a final link and it
+		 * will only ever be one byte long. Nothing about this section itself
+		 * matters, we just use it as a marker to detect prelink address
+		 * adjustments.
+		 *
+		 * Each probe note records the link-time address of the .stapsdt.base
+		 * section alongside the probe PC address. The decoder compares the
+		 * base address stored in the note with the .stapsdt.base section's
+		 * sh_addr. Initially these are the same, but the section header will
+		 * be adjusted by prelink. So the decoder applies the difference to
+		 * the probe PC address to get the correct prelinked PC address; the
+		 * same adjustment is applied to the semaphore address, if any.
+		 *
+		 *   [0] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
+		 */
+		usdt_rel_ip = usdt_abs_ip = note.loc_addr;
+		if (base_addr) {
+			usdt_abs_ip += base_addr - note.base_addr;
+			usdt_rel_ip += base_addr - note.base_addr;
+		}
+
+		if (ehdr.e_type == ET_EXEC) {
+			/* When attaching uprobes (which is what USDTs basically
+			 * are) kernel expects a relative IP to be specified,
+			 * so if we are attaching to an executable ELF binary
+			 * (i.e., not a shared library), we need to calculate
+			 * proper relative IP based on ELF's load address
+			 */
+			seg = find_elf_seg(segs, seg_cnt, usdt_abs_ip, false /* relative */);
+			if (!seg) {
+				err = -ESRCH;
+				pr_warn("usdt: failed to find ELF program segment for '%s:%s' in '%s' at IP 0x%lx\n",
+					usdt_provider, usdt_name, path, usdt_abs_ip);
+				goto err_out;
+			}
+			if (!seg->is_exec) {
+				err = -ESRCH;
+				pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx) for '%s:%s' at IP 0x%lx is not executable\n",
+				        path, seg->start, seg->end, usdt_provider, usdt_name,
+					usdt_abs_ip);
+				goto err_out;
+			}
+
+			usdt_rel_ip = usdt_abs_ip - (seg->start - seg->offset);
+		} else if (!man->has_bpf_cookie) { /* ehdr.e_type == ET_DYN */
+			/* If we don't have BPF cookie support but need to
+			 * attach to a shared library, we'll need to know and
+			 * record absolute addresses of attach points due to
+			 * the need to lookup USDT spec by absolute IP of
+			 * triggered uprobe. Doing this resolution is only
+			 * possible when we have a specific PID of the process
+			 * that's using specified shared library. BPF cookie
+			 * removes the absolute address limitation as we don't
+			 * need to do this lookup (we just use BPF cookie as
+			 * an index of USDT spec), so for newer kernels with
+			 * BPF cookie support libbpf supports USDT attachment
+			 * to shared libraries with no PID filter.
+			 */
+			if (pid < 0) {
+				pr_warn("usdt: attaching to shared libraries without specific PID is not supported on current kernel\n");
+				err = -ENOTSUP;
+				goto err_out;
+			}
+
+			/* lib_segs are lazily initialized only if necessary */
+			if (lib_seg_cnt == 0) {
+				err = parse_lib_segs(pid, path, &lib_segs, &lib_seg_cnt);
+				if (err) {
+					pr_warn("usdt: failed to get memory segments in PID %d for shared library '%s': %d\n",
+						pid, path, err);
+					goto err_out;
+				}
+			}
+
+			seg = find_elf_seg(lib_segs, lib_seg_cnt, usdt_rel_ip, true /* relative */);
+			if (!seg) {
+				err = -ESRCH;
+				pr_warn("usdt: failed to find shared lib memory segment for '%s:%s' in '%s' at relative IP 0x%lx\n",
+				         usdt_provider, usdt_name, path, usdt_rel_ip);
+				goto err_out;
+			}
+
+			usdt_abs_ip = seg->start + (usdt_rel_ip - seg->offset);
+		}
+
+		pr_debug("usdt: probe for '%s:%s' in %s '%s': addr 0x%lx base 0x%lx (resolved abs_ip 0x%lx rel_ip 0x%lx) args '%s' in segment [0x%lx, 0x%lx) at offset 0x%lx\n",
+			 usdt_provider, usdt_name, ehdr.e_type == ET_EXEC ? "exec" : "lib ", path,
+			 note.loc_addr, note.base_addr, usdt_abs_ip, usdt_rel_ip, note.args,
+			 seg ? seg->start : 0, seg ? seg->end : 0, seg ? seg->offset : 0);
+
+		/* Adjust semaphore address to be a relative offset */
+		if (note.sema_addr) {
+			if (!man->has_sema_refcnt) {
+				pr_warn("usdt: kernel doesn't support USDT semaphore refcounting for '%s:%s' in '%s'\n",
+					usdt_provider, usdt_name, path);
+				err = -ENOTSUP;
+				goto err_out;
+			}
+
+			seg = find_elf_seg(segs, seg_cnt, note.sema_addr, false /* relative */);
+			if (!seg) {
+				err = -ESRCH;
+				pr_warn("usdt: failed to find ELF loadable segment with semaphore of '%s:%s' in '%s' at 0x%lx\n",
+				        usdt_provider, usdt_name, path, note.sema_addr);
+				goto err_out;
+			}
+			if (seg->is_exec) {
+				err = -ESRCH;
+				pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx] for semaphore of '%s:%s' at 0x%lx is executable\n",
+					path, seg->start, seg->end, usdt_provider, usdt_name,
+					note.sema_addr);
+				goto err_out;
+			}
+
+			usdt_sema_off = note.sema_addr - (seg->start - seg->offset);
+
+			pr_debug("usdt: sema  for '%s:%s' in %s '%s': addr 0x%lx base 0x%lx (resolved 0x%lx) in segment [0x%lx, 0x%lx] at offset 0x%lx\n",
+				 usdt_provider, usdt_name, ehdr.e_type == ET_EXEC ? "exec" : "lib ",
+				 path, note.sema_addr, note.base_addr, usdt_sema_off,
+				 seg->start, seg->end, seg->offset);
+		}
+
+		/* Record adjusted addresses and offsets and parse USDT spec */
+		tmp = libbpf_reallocarray(targets, target_cnt + 1, sizeof(*targets));
+		if (!tmp) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+		targets = tmp;
+
+		target = &targets[target_cnt];
+		memset(target, 0, sizeof(*target));
+
+		target->abs_ip = usdt_abs_ip;
+		target->rel_ip = usdt_rel_ip;
+		target->sema_off = usdt_sema_off;
+
+		/* note.args references strings from the ELF image itself, so
+		 * they can be referenced safely until the elf_end() call
+		 */
+		target->spec_str = note.args;
+
+		err = parse_usdt_spec(&target->spec, &note, usdt_cookie);
+		if (err)
+			goto err_out;
+
+		target_cnt++;
+	}
+
+	*out_targets = targets;
+	*out_target_cnt = target_cnt;
+	err = target_cnt;
+
+err_out:
+	free(segs);
+	free(lib_segs);
+	if (err < 0)
+		free(targets);
+	return err;
 }
 
 struct bpf_link_usdt {
@@ -411,6 +896,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 		link->uprobe_cnt++;
 	}
 
+	free(targets);
 	elf_end(elf);
 	close(fd);
 
@@ -419,8 +905,102 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 err_out:
 	bpf_link__destroy(&link->link);
 
+	free(targets);
 	if (elf)
 		elf_end(elf);
 	close(fd);
 	return libbpf_err_ptr(err);
 }
+
+/* Parse out USDT ELF note from '.note.stapsdt' section.
+ * Logic inspired by perf's code.
+ */
+static int parse_usdt_note(Elf *elf, const char *path, long base_addr,
+			   GElf_Nhdr *nhdr, const char *data, size_t name_off, size_t desc_off,
+			   struct usdt_note *note)
+{
+	const char *provider, *name, *args;
+	long addrs[3];
+	size_t len;
+
+	/* sanity check USDT note name and type first */
+	if (strncmp(data + name_off, USDT_NOTE_NAME, nhdr->n_namesz) != 0)
+		return -EINVAL;
+	if (nhdr->n_type != USDT_NOTE_TYPE)
+		return -EINVAL;
+
+	/* sanity check USDT note contents ("description" in ELF terminology) */
+	len = nhdr->n_descsz;
+	data = data + desc_off;
+
+	/* +3 is the very minimum required to store three empty strings */
+	if (len < sizeof(addrs) + 3)
+		return -EINVAL;
+
+	/* get location, base, and semaphore addrs */
+	memcpy(&addrs, data, sizeof(addrs));
+
+	/* parse string fields: provider, name, args */
+	provider = data + sizeof(addrs);
+
+	name = (const char *)memchr(provider, '\0', data + len - provider);
+	if (!name) /* non-zero-terminated provider */
+		return -EINVAL;
+	name++;
+	if (name >= data + len || *name == '\0') /* missing or empty name */
+		return -EINVAL;
+
+	args = memchr(name, '\0', data + len - name);
+	if (!args) /* non-zero-terminated name */
+		return -EINVAL;
+	++args;
+	if (args >= data + len) /* missing arguments spec */
+		return -EINVAL;
+
+	note->provider = provider;
+	note->name = name;
+	if (*args == '\0' || *args == ':')
+		note->args = "";
+	else
+		note->args = args;
+	note->loc_addr = addrs[0];
+	note->base_addr = addrs[1];
+	note->sema_addr = addrs[2];
+
+	return 0;
+}
+
+static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg);
+
+static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, long usdt_cookie)
+{
+	const char *s;
+	int len;
+
+	spec->usdt_cookie = usdt_cookie;
+	spec->arg_cnt = 0;
+
+	s = note->args;
+	while (s[0]) {
+		if (spec->arg_cnt >= USDT_MAX_ARG_CNT) {
+			pr_warn("usdt: too many USDT arguments (> %d) for '%s:%s' with args spec '%s'\n",
+				USDT_MAX_ARG_CNT, note->provider, note->name, note->args);
+			return -E2BIG;
+		}
+
+		len = parse_usdt_arg(s, spec->arg_cnt, &spec->args[spec->arg_cnt]);
+		if (len < 0)
+			return len;
+
+		s += len;
+		spec->arg_cnt++;
+	}
+
+	return 0;
+}
+
+static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
+{
+	pr_warn("usdt: libbpf doesn't support USDTs on current architecture\n");
+	return -ENOTSUP;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2022-04-02  0:29 ` [PATCH v2 bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-04 19:58   ` Dave Marchevsky
  2022-04-02  0:29 ` [PATCH v2 bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

The last part of the architecture-agnostic user-space USDT handling
logic is to set up the BPF spec and, optionally, IP-to-spec-ID maps from
user space. usdt_manager performs compact spec ID allocation to utilize
fixed-size BPF maps as efficiently as possible. We also use a hashmap to
deduplicate USDT arg spec strings and map identical strings to a single
USDT spec, minimizing the necessary BPF map size. usdt_manager supports
arbitrary sequences of attachment and detachment, both of the same USDT
and of multiple different USDTs, and internally maintains a free list of
unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
teardown of this spec ID free list and the supporting BPF maps.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/usdt.c | 168 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 167 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index c9eff690e291..afbae742c081 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -236,6 +236,10 @@ struct usdt_manager {
 	struct bpf_map *specs_map;
 	struct bpf_map *ip_to_spec_id_map;
 
+	int *free_spec_ids;
+	size_t free_spec_cnt;
+	size_t next_free_spec_id;
+
 	bool has_bpf_cookie;
 	bool has_sema_refcnt;
 };
@@ -280,6 +284,7 @@ void usdt_manager_free(struct usdt_manager *man)
 	if (IS_ERR_OR_NULL(man))
 		return;
 
+	free(man->free_spec_ids);
 	free(man);
 }
 
@@ -786,6 +791,9 @@ struct bpf_link_usdt {
 
 	struct usdt_manager *usdt_man;
 
+	size_t spec_cnt;
+	int *spec_ids;
+
 	size_t uprobe_cnt;
 	struct {
 		long abs_ip;
@@ -796,11 +804,52 @@ struct bpf_link_usdt {
 static int bpf_link_usdt_detach(struct bpf_link *link)
 {
 	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
+	struct usdt_manager *man = usdt_link->usdt_man;
 	int i;
 
 	for (i = 0; i < usdt_link->uprobe_cnt; i++) {
 		/* detach underlying uprobe link */
 		bpf_link__destroy(usdt_link->uprobes[i].link);
+		/* there is no need to update the specs map because it will be
+		 * unconditionally overwritten on subsequent USDT attaches,
+		 * but if BPF cookies are not used we need to remove the entry
+		 * from the ip_to_spec_id map, otherwise we'll run into
+		 * spurious IP conflict errors
+		 */
+		if (!man->has_bpf_cookie) {
+			/* not much we can do about errors here */
+			(void)bpf_map_delete_elem(bpf_map__fd(man->ip_to_spec_id_map),
+						  &usdt_link->uprobes[i].abs_ip);
+		}
+	}
+
+	/* try to return the list of previously used spec IDs to usdt_manager
+	 * for future reuse for subsequent USDT attaches
+	 */
+	if (!man->free_spec_ids) {
+		/* if there were no free spec IDs yet, just transfer our IDs */
+		man->free_spec_ids = usdt_link->spec_ids;
+		man->free_spec_cnt = usdt_link->spec_cnt;
+		usdt_link->spec_ids = NULL;
+	} else {
+		/* otherwise concat IDs */
+		size_t new_cnt = man->free_spec_cnt + usdt_link->spec_cnt;
+		int *new_free_ids;
+
+		new_free_ids = libbpf_reallocarray(man->free_spec_ids, new_cnt,
+						   sizeof(*new_free_ids));
+		/* If we couldn't resize free_spec_ids, we'll just leak
+		 * a bunch of free IDs; this is very unlikely to happen and if
+		 * the system is so exhausted of memory, it's the least of the
+		 * user's concerns, probably.
+		 * So just do our best here to return those IDs to usdt_manager.
+		 */
+		if (new_free_ids) {
+			memcpy(new_free_ids + man->free_spec_cnt, usdt_link->spec_ids,
+			       usdt_link->spec_cnt * sizeof(*usdt_link->spec_ids));
+			man->free_spec_ids = new_free_ids;
+			man->free_spec_cnt = new_cnt;
+		}
 	}
 
 	return 0;
@@ -810,22 +859,96 @@ static void bpf_link_usdt_dealloc(struct bpf_link *link)
 {
 	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
 
+	free(usdt_link->spec_ids);
 	free(usdt_link->uprobes);
 	free(usdt_link);
 }
 
+static size_t specs_hash_fn(const void *key, void *ctx)
+{
+	const char *s = key;
+
+	return str_hash(s);
+}
+
+static bool specs_equal_fn(const void *key1, const void *key2, void *ctx)
+{
+	const char *s1 = key1;
+	const char *s2 = key2;
+
+	return strcmp(s1, s2) == 0;
+}
+
+static int allocate_spec_id(struct usdt_manager *man, struct hashmap *specs_hash,
+			    struct bpf_link_usdt *link, struct usdt_target *target,
+			    int *spec_id, bool *is_new)
+{
+	void *tmp;
+	int err;
+
+	/* check if we already allocated spec ID for this spec string */
+	if (hashmap__find(specs_hash, target->spec_str, &tmp)) {
+		*spec_id = (long)tmp;
+		*is_new = false;
+		return 0;
+	}
+
+	/* otherwise it's a new ID that needs to be set up in specs map and
+	 * returned back to usdt_manager when USDT link is detached
+	 */
+	tmp = libbpf_reallocarray(link->spec_ids, link->spec_cnt + 1, sizeof(*link->spec_ids));
+	if (!tmp)
+		return -ENOMEM;
+	link->spec_ids = tmp;
+
+	/* get next free spec ID, giving preference to free list, if not empty */
+	if (man->free_spec_cnt) {
+		*spec_id = man->free_spec_ids[man->free_spec_cnt - 1];
+
+		/* cache spec ID for current spec string for future lookups */
+		err = hashmap__add(specs_hash, target->spec_str, (void *)(long)*spec_id);
+		if (err)
+			return err;
+
+		man->free_spec_cnt--;
+	} else {
+		/* don't allocate spec ID bigger than what fits in specs map */
+		if (man->next_free_spec_id >= bpf_map__max_entries(man->specs_map))
+			return -E2BIG;
+
+		*spec_id = man->next_free_spec_id;
+
+		/* cache spec ID for current spec string for future lookups */
+		err = hashmap__add(specs_hash, target->spec_str, (void *)(long)*spec_id);
+		if (err)
+			return err;
+
+		man->next_free_spec_id++;
+	}
+
+	/* remember new spec ID in the link for later return back to free list on detach */
+	link->spec_ids[link->spec_cnt] = *spec_id;
+	link->spec_cnt++;
+	*is_new = true;
+	return 0;
+}
+
 struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
 					  pid_t pid, const char *path,
 					  const char *usdt_provider, const char *usdt_name,
 					  long usdt_cookie)
 {
-	int i, fd, err;
+	int i, fd, err, spec_map_fd, ip_map_fd;
 	LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	struct hashmap *specs_hash = NULL;
 	struct bpf_link_usdt *link = NULL;
 	struct usdt_target *targets = NULL;
 	size_t target_cnt;
 	Elf *elf;
 
+	spec_map_fd = bpf_map__fd(man->specs_map);
+	ip_map_fd = bpf_map__fd(man->ip_to_spec_id_map);
+
 	/* TODO: perform path resolution similar to uprobe's */
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
@@ -861,6 +984,12 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 		goto err_out;
 	}
 
+	specs_hash = hashmap__new(specs_hash_fn, specs_equal_fn, NULL);
+	if (IS_ERR(specs_hash)) {
+		err = PTR_ERR(specs_hash);
+		goto err_out;
+	}
+
 	link = calloc(1, sizeof(*link));
 	if (!link) {
 		err = -ENOMEM;
@@ -880,8 +1009,43 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	for (i = 0; i < target_cnt; i++) {
 		struct usdt_target *target = &targets[i];
 		struct bpf_link *uprobe_link;
+		bool is_new;
+		int spec_id;
+
+		/* Spec ID can be either reused or newly allocated. If it is
+		 * newly allocated, we'll need to fill out spec map, otherwise
+		 * entire spec should be valid and can be just used by a new
+		 * uprobe. We reuse spec when USDT arg spec is identical. We
+		 * also never share specs between two different USDT
+		 * attachments ("links"), so all the reused specs already
+		 * share USDT cookie value implicitly.
+		 */
+		err = allocate_spec_id(man, specs_hash, link, target, &spec_id, &is_new);
+		if (err)
+			goto err_out;
+
+		if (is_new && bpf_map_update_elem(spec_map_fd, &spec_id, &target->spec, BPF_ANY)) {
+			err = -errno;
+			pr_warn("usdt: failed to set USDT spec #%d for '%s:%s' in '%s': %d\n",
+				spec_id, usdt_provider, usdt_name, path, err);
+			goto err_out;
+		}
+		if (!man->has_bpf_cookie &&
+		    bpf_map_update_elem(ip_map_fd, &target->abs_ip, &spec_id, BPF_NOEXIST)) {
+			err = -errno;
+			if (err == -EEXIST) {
+				pr_warn("usdt: IP collision detected for spec #%d for '%s:%s' in '%s'\n",
+				        spec_id, usdt_provider, usdt_name, path);
+			} else {
+				pr_warn("usdt: failed to map IP 0x%lx to spec #%d for '%s:%s' in '%s': %d\n",
+					target->abs_ip, spec_id, usdt_provider, usdt_name,
+					path, err);
+			}
+			goto err_out;
+		}
 
 		opts.ref_ctr_offset = target->sema_off;
+		opts.bpf_cookie = man->has_bpf_cookie ? spec_id : 0;
 		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
 							      target->rel_ip, &opts);
 		err = libbpf_get_error(uprobe_link);
@@ -897,6 +1061,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	}
 
 	free(targets);
+	hashmap__free(specs_hash);
 	elf_end(elf);
 	close(fd);
 
@@ -906,6 +1071,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	bpf_link__destroy(&link->link);
 
 	free(targets);
+	hashmap__free(specs_hash);
 	if (elf)
 		elf_end(elf);
 	close(fd);
-- 
2.30.2



* [PATCH v2 bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2022-04-02  0:29 ` [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-04  5:07   ` Dave Marchevsky
  2022-04-02  0:29 ` [PATCH v2 bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
  2022-04-02  0:29 ` [PATCH v2 bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
  6 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Add x86/x86_64-specific USDT argument specification parsing. Each
architecture will require its own logic, as the argument spec uses
arch-specific assembly-based notation. On architectures that libbpf
doesn't support for USDTs, parsing will pr_warn() with a specific error
and return -ENOTSUP.

We use sscanf() as a powerful and easy-to-use string parser. The spaces
in sscanf's format string mean "skip any whitespace", which is a pretty
nifty (and somewhat little-known) feature.

All this was tested on a little-endian architecture, so bit shifts are
probably off on big-endian, which our CI will hopefully catch.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/usdt.c | 105 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index afbae742c081..c195da29964b 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -1165,8 +1165,113 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
 	return 0;
 }
 
+/* Architecture-specific logic for parsing USDT argument location specs */
+
+#if defined(__x86_64__) || defined(__i386__)
+
+static int calc_pt_regs_off(const char *reg_name)
+{
+	static struct {
+		const char *names[4];
+		size_t pt_regs_off;
+	} reg_map[] = {
+#if __x86_64__
+#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg64)
+#else
+#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg32)
+#endif
+		{ {"rip", "eip", "", ""}, reg_off(rip, eip) },
+		{ {"rax", "eax", "ax", "al"}, reg_off(rax, eax) },
+		{ {"rbx", "ebx", "bx", "bl"}, reg_off(rbx, ebx) },
+		{ {"rcx", "ecx", "cx", "cl"}, reg_off(rcx, ecx) },
+		{ {"rdx", "edx", "dx", "dl"}, reg_off(rdx, edx) },
+		{ {"rsi", "esi", "si", "sil"}, reg_off(rsi, esi) },
+		{ {"rdi", "edi", "di", "dil"}, reg_off(rdi, edi) },
+		{ {"rbp", "ebp", "bp", "bpl"}, reg_off(rbp, ebp) },
+		{ {"rsp", "esp", "sp", "spl"}, reg_off(rsp, esp) },
+#undef reg_off
+#if __x86_64__
+		{ {"r8", "r8d", "r8w", "r8b"}, offsetof(struct pt_regs, r8) },
+		{ {"r9", "r9d", "r9w", "r9b"}, offsetof(struct pt_regs, r9) },
+		{ {"r10", "r10d", "r10w", "r10b"}, offsetof(struct pt_regs, r10) },
+		{ {"r11", "r11d", "r11w", "r11b"}, offsetof(struct pt_regs, r11) },
+		{ {"r12", "r12d", "r12w", "r12b"}, offsetof(struct pt_regs, r12) },
+		{ {"r13", "r13d", "r13w", "r13b"}, offsetof(struct pt_regs, r13) },
+		{ {"r14", "r14d", "r14w", "r14b"}, offsetof(struct pt_regs, r14) },
+		{ {"r15", "r15d", "r15w", "r15b"}, offsetof(struct pt_regs, r15) },
+#endif
+	};
+	int i, j;
+
+	for (i = 0; i < ARRAY_SIZE(reg_map); i++) {
+		for (j = 0; j < ARRAY_SIZE(reg_map[i].names); j++) {
+			if (strcmp(reg_name, reg_map[i].names[j]) == 0)
+				return reg_map[i].pt_regs_off;
+		}
+	}
+
+	pr_warn("usdt: unrecognized register '%s'\n", reg_name);
+	return -ENOENT;
+}
+
+static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
+{
+	char *reg_name = NULL;
+	int arg_sz, len, reg_off;
+	long off;
+
+	if (sscanf(arg_str, " %d @ %ld ( %%%m[^)] ) %n", &arg_sz, &off, &reg_name, &len) == 3) {
+		/* Memory dereference case, e.g., -4@-20(%rbp) */
+		arg->arg_type = USDT_ARG_REG_DEREF;
+		arg->val_off = off;
+		reg_off = calc_pt_regs_off(reg_name);
+		free(reg_name);
+		if (reg_off < 0)
+			return reg_off;
+		arg->reg_off = reg_off;
+	} else if (sscanf(arg_str, " %d @ %%%ms %n", &arg_sz, &reg_name, &len) == 2) {
+		/* Register read case, e.g., -4@%eax */
+		arg->arg_type = USDT_ARG_REG;
+		arg->val_off = 0;
+
+		reg_off = calc_pt_regs_off(reg_name);
+		free(reg_name);
+		if (reg_off < 0)
+			return reg_off;
+		arg->reg_off = reg_off;
+	} else if (sscanf(arg_str, " %d @ $%ld %n", &arg_sz, &off, &len) == 2) {
+		/* Constant value case, e.g., 4@$71 */
+		arg->arg_type = USDT_ARG_CONST;
+		arg->val_off = off;
+		arg->reg_off = 0;
+	} else {
+		pr_warn("usdt: unrecognized arg #%d spec '%s'\n", arg_num, arg_str);
+		return -EINVAL;
+	}
+
+	arg->arg_signed = arg_sz < 0;
+	if (arg_sz < 0)
+		arg_sz = -arg_sz;
+
+	switch (arg_sz) {
+	case 1: case 2: case 4: case 8:
+		arg->arg_bitshift = 64 - arg_sz * 8;
+		break;
+	default:
+		pr_warn("usdt: unsupported arg #%d (spec '%s') size: %d\n",
+			arg_num, arg_str, arg_sz);
+		return -EINVAL;
+	}
+
+	return len;
+}
+
+#else
+
 static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
 {
 	pr_warn("usdt: libbpf doesn't support USDTs on current architecture\n");
 	return -ENOTSUP;
 }
+
+#endif
-- 
2.30.2



* [PATCH v2 bpf-next 6/7] selftests/bpf: add basic USDT selftests
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2022-04-02  0:29 ` [PATCH v2 bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-04  5:48   ` Dave Marchevsky
  2022-04-02  0:29 ` [PATCH v2 bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
  6 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Add a semaphore-based USDT to test_progs itself and write basic tests to
validate both auto-attachment and manual attachment logic, as well as
BPF-side functionality.

Also add subtests to validate that libbpf properly deduplicates USDT
specs and handles spec overflow situations correctly, as well as proper
"rollback" of partially-attached multi-spec USDT.

The BPF side of the selftest intentionally consists of two files to
validate that the usdt.bpf.h header can be included from multiple source
files that are subsequently linked into the final BPF object file
without causing any symbol duplication or other issues. We are
validating that __weak maps and bpf_usdt_xxx() API functions defined in
usdt.bpf.h work as intended.

USDT selftests use sys/sdt.h header provided by systemtap-sdt-devel, so
document that build-time dependency in Documentation/bpf/bpf_devel_QA.rst.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 Documentation/bpf/bpf_devel_QA.rst            |   3 +
 tools/testing/selftests/bpf/Makefile          |  14 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c | 313 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_usdt.c |  96 ++++++
 .../selftests/bpf/progs/test_usdt_multispec.c |  34 ++
 5 files changed, 454 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_usdt_multispec.c

diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst
index 761474bd7fe6..8eb04076c3a3 100644
--- a/Documentation/bpf/bpf_devel_QA.rst
+++ b/Documentation/bpf/bpf_devel_QA.rst
@@ -465,6 +465,9 @@ https://fedorapeople.org/~acme/dwarves
 Some distros have pahole version 1.16 packaged already, e.g.
 Fedora, Gentoo.
 
+Some selftests rely on sys/sdt.h header defining USDT-related macros. You can
+get them by installing systemtap-sdt-devel package.
+
 Q: Which BPF kernel selftests version should I run my kernel against?
 ---------------------------------------------------------------------
 A: If you run a kernel ``xyz``, then always run the BPF kernel selftests
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3820608faf57..0f8c55dfd844 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -328,12 +328,8 @@ SKEL_BLACKLIST := btf__% test_pinning_invalid.c test_sk_assign.c
 
 LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h		\
 		linked_vars.skel.h linked_maps.skel.h 			\
-		test_subskeleton.skel.h test_subskeleton_lib.skel.h
-
-# In the subskeleton case, we want the test_subskeleton_lib.subskel.h file
-# but that's created as a side-effect of the skel.h generation.
-test_subskeleton.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o test_subskeleton.o
-test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o
+		test_subskeleton.skel.h test_subskeleton_lib.skel.h	\
+		test_usdt.skel.h
 
 LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
 	test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \
@@ -346,6 +342,11 @@ test_static_linked.skel.h-deps := test_static_linked1.o test_static_linked2.o
 linked_funcs.skel.h-deps := linked_funcs1.o linked_funcs2.o
 linked_vars.skel.h-deps := linked_vars1.o linked_vars2.o
 linked_maps.skel.h-deps := linked_maps1.o linked_maps2.o
+# In the subskeleton case, we want the test_subskeleton_lib.subskel.h file
+# but that's created as a side-effect of the skel.h generation.
+test_subskeleton.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o test_subskeleton.o
+test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o
+test_usdt.skel.h-deps := test_usdt.o test_usdt_multispec.o
 
 LINKED_BPF_SRCS := $(patsubst %.o,%.c,$(foreach skel,$(LINKED_SKELS),$($(skel)-deps)))
 
@@ -400,6 +401,7 @@ $(TRUNNER_BPF_OBJS): $(TRUNNER_OUTPUT)/%.o:				\
 		     $(TRUNNER_BPF_PROGS_DIR)/*.h			\
 		     $$(INCLUDE_DIR)/vmlinux.h				\
 		     $(wildcard $(BPFDIR)/bpf_*.h)			\
+		     $(wildcard $(BPFDIR)/*.bpf.h)			\
 		     | $(TRUNNER_OUTPUT) $$(BPFOBJ)
 	$$(call $(TRUNNER_BPF_BUILD_RULE),$$<,$$@,			\
 					  $(TRUNNER_BPF_CFLAGS))
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
new file mode 100644
index 000000000000..d5a3125d4a4d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+
+#define _SDT_HAS_SEMAPHORES 1
+#include <sys/sdt.h>
+
+#include "test_usdt.skel.h"
+
+int lets_test_this(int);
+
+static volatile int idx = 2;
+static volatile __u64 bla = 0xFEDCBA9876543210ULL;
+static volatile short nums[] = {-1, -2, -3, };
+
+static volatile struct {
+	int x;
+	signed char y;
+} t1 = { 1, -127 };
+
+#define SEC(name) __attribute__((section(name), used))
+
+unsigned short test_usdt0_semaphore SEC(".probes");
+unsigned short test_usdt3_semaphore SEC(".probes");
+unsigned short test_usdt12_semaphore SEC(".probes");
+
+static void __always_inline trigger_func(int x) {
+	long y = 42;
+
+	if (test_usdt0_semaphore)
+		STAP_PROBE(test, usdt0);
+	if (test_usdt3_semaphore)
+		STAP_PROBE3(test, usdt3, x, y, &bla);
+	if (test_usdt12_semaphore) {
+		STAP_PROBE12(test, usdt12,
+			     x, x + 1, y, x + y, 5,
+			     y / 7, bla, &bla, -9, nums[x],
+			     nums[idx], t1.y);
+	}
+}
+
+static void subtest_basic_usdt(void)
+{
+	LIBBPF_OPTS(bpf_usdt_opts, opts);
+	struct test_usdt *skel;
+	struct test_usdt__bss *bss;
+	int err;
+
+	skel = test_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bss = skel->bss;
+	bss->my_pid = getpid();
+
+	err = test_usdt__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	/* usdt0 won't be auto-attached */
+	opts.usdt_cookie = 0xcafedeadbeeffeed;
+	skel->links.usdt0 = bpf_program__attach_usdt(skel->progs.usdt0,
+						     0 /*self*/, "/proc/self/exe",
+						     "test", "usdt0", &opts);
+	if (!ASSERT_OK_PTR(skel->links.usdt0, "usdt0_link"))
+		goto cleanup;
+
+	trigger_func(1);
+
+	ASSERT_EQ(bss->usdt0_called, 1, "usdt0_called");
+	ASSERT_EQ(bss->usdt3_called, 1, "usdt3_called");
+	ASSERT_EQ(bss->usdt12_called, 1, "usdt12_called");
+
+	ASSERT_EQ(bss->usdt0_cookie, 0xcafedeadbeeffeed, "usdt0_cookie");
+	ASSERT_EQ(bss->usdt0_arg_cnt, 0, "usdt0_arg_cnt");
+	ASSERT_EQ(bss->usdt0_arg_ret, -ENOENT, "usdt0_arg_ret");
+
+	/* auto-attached usdt3 gets default zero cookie value */
+	ASSERT_EQ(bss->usdt3_cookie, 0, "usdt3_cookie");
+	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
+
+	ASSERT_EQ(bss->usdt3_arg_rets[0], 0, "usdt3_arg1_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[1], 0, "usdt3_arg2_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[2], 0, "usdt3_arg3_ret");
+	ASSERT_EQ(bss->usdt3_args[0], 1, "usdt3_arg1");
+	ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2");
+	ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3");
+
+	/* auto-attached usdt12 gets default zero cookie value */
+	ASSERT_EQ(bss->usdt12_cookie, 0, "usdt12_cookie");
+	ASSERT_EQ(bss->usdt12_arg_cnt, 12, "usdt12_arg_cnt");
+
+	ASSERT_EQ(bss->usdt12_args[0], 1, "usdt12_arg1");
+	ASSERT_EQ(bss->usdt12_args[1], 1 + 1, "usdt12_arg2");
+	ASSERT_EQ(bss->usdt12_args[2], 42, "usdt12_arg3");
+	ASSERT_EQ(bss->usdt12_args[3], 42 + 1, "usdt12_arg4");
+	ASSERT_EQ(bss->usdt12_args[4], 5, "usdt12_arg5");
+	ASSERT_EQ(bss->usdt12_args[5], 42 / 7, "usdt12_arg6");
+	ASSERT_EQ(bss->usdt12_args[6], bla, "usdt12_arg7");
+	ASSERT_EQ(bss->usdt12_args[7], (uintptr_t)&bla, "usdt12_arg8");
+	ASSERT_EQ(bss->usdt12_args[8], -9, "usdt12_arg9");
+	ASSERT_EQ(bss->usdt12_args[9], nums[1], "usdt12_arg10");
+	ASSERT_EQ(bss->usdt12_args[10], nums[idx], "usdt12_arg11");
+	ASSERT_EQ(bss->usdt12_args[11], t1.y, "usdt12_arg12");
+
+	/* trigger_func() is marked __always_inline, so USDT invocations will be
+	 * inlined in two different places, meaning that each USDT will have
+	 * at least 2 different places to be attached to. This verifies that
+	 * bpf_program__attach_usdt() handles this properly and attaches to
+	 * all possible places of USDT invocation.
+	 */
+	trigger_func(2);
+
+	ASSERT_EQ(bss->usdt0_called, 2, "usdt0_called");
+	ASSERT_EQ(bss->usdt3_called, 2, "usdt3_called");
+	ASSERT_EQ(bss->usdt12_called, 2, "usdt12_called");
+
+	/* only check values that depend on trigger_func()'s input value */
+	ASSERT_EQ(bss->usdt3_args[0], 2, "usdt3_arg1");
+
+	ASSERT_EQ(bss->usdt12_args[0], 2, "usdt12_arg1");
+	ASSERT_EQ(bss->usdt12_args[1], 2 + 1, "usdt12_arg2");
+	ASSERT_EQ(bss->usdt12_args[3], 42 + 2, "usdt12_arg4");
+	ASSERT_EQ(bss->usdt12_args[9], nums[2], "usdt12_arg10");
+
+	/* detach and re-attach usdt3 */
+	bpf_link__destroy(skel->links.usdt3);
+
+	opts.usdt_cookie = 0xBADC00C51E;
+	skel->links.usdt3 = bpf_program__attach_usdt(skel->progs.usdt3, -1 /* any pid */,
+						     "/proc/self/exe", "test", "usdt3", &opts);
+	if (!ASSERT_OK_PTR(skel->links.usdt3, "usdt3_reattach"))
+		goto cleanup;
+
+	trigger_func(3);
+
+	ASSERT_EQ(bss->usdt3_called, 3, "usdt3_called");
+	/* this time usdt3 has custom cookie */
+	ASSERT_EQ(bss->usdt3_cookie, 0xBADC00C51E, "usdt3_cookie");
+	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
+
+	ASSERT_EQ(bss->usdt3_arg_rets[0], 0, "usdt3_arg1_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[1], 0, "usdt3_arg2_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[2], 0, "usdt3_arg3_ret");
+	ASSERT_EQ(bss->usdt3_args[0], 3, "usdt3_arg1");
+	ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2");
+	ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3");
+
+cleanup:
+	test_usdt__destroy(skel);
+}
+
+unsigned short test_usdt_100_semaphore SEC(".probes");
+unsigned short test_usdt_300_semaphore SEC(".probes");
+unsigned short test_usdt_400_semaphore SEC(".probes");
+
+#define R10(F, X)  F(X+0); F(X+1);F(X+2); F(X+3); F(X+4); \
+		   F(X+5); F(X+6); F(X+7); F(X+8); F(X+9);
+#define R100(F, X) R10(F,X+ 0);R10(F,X+10);R10(F,X+20);R10(F,X+30);R10(F,X+40); \
+		   R10(F,X+50);R10(F,X+60);R10(F,X+70);R10(F,X+80);R10(F,X+90);
+
+/* carefully control that we get exactly 100 inlined call sites: f100() is force-inlined, while __weak trigger_100_usdts() itself can't be inlined */
+static void __always_inline f100(int x)
+{
+	STAP_PROBE1(test, usdt_100, x);
+}
+
+__weak void trigger_100_usdts(void)
+{
+	R100(f100, 0);
+}
+
+/* we shouldn't be able to attach to test:usdt_300 USDT as we don't have
+ * enough slots for specs. It's important that each STAP_PROBE1() invocation
+ * (after unrolling) gets a different arg spec due to the compiler inlining x
+ * as a constant
+ */
+static void __always_inline f300(int x)
+{
+	STAP_PROBE1(test, usdt_300, x);
+}
+
+__weak void trigger_300_usdts(void)
+{
+	R100(f300, 0);
+	R100(f300, 100);
+	R100(f300, 200);
+}
+
+static void __always_inline f400(int x __attribute__((unused)))
+{
+	static int y;
+
+	STAP_PROBE1(test, usdt_400, y++);
+}
+
+/* this time we have 400 different USDT call sites, but they have uniform
+ * argument location, so libbpf's spec string deduplication logic should keep
+ * the spec count very small and so we should be able to attach to all 400
+ * call sites
+ */
+__weak void trigger_400_usdts(void)
+{
+	R100(f400, 0);
+	R100(f400, 100);
+	R100(f400, 200);
+	R100(f400, 300);
+}
+
+static void subtest_multispec_usdt(void)
+{
+	LIBBPF_OPTS(bpf_usdt_opts, opts);
+	struct test_usdt *skel;
+	struct test_usdt__bss *bss;
+	int err, i;
+
+	skel = test_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bss = skel->bss;
+	bss->my_pid = getpid();
+
+	err = test_usdt__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	/* usdt_100 is auto-attached and there are 100 inlined call sites,
+	 * let's validate that all of them are properly attached to and
+	 * handled on the BPF side
+	 */
+	trigger_100_usdts();
+
+	ASSERT_EQ(bss->usdt_100_called, 100, "usdt_100_called");
+	ASSERT_EQ(bss->usdt_100_sum, 99 * 100 / 2, "usdt_100_sum");
+
+	/* Stress test free spec ID tracking. By default libbpf allows up to
+	 * 256 specs to be used, so if we don't return free spec IDs back
+	 * after a few detachments and re-attachments we should run out of
+	 * available spec IDs.
+	 */
+	for (i = 0; i < 2; i++) {
+		bpf_link__destroy(skel->links.usdt_100);
+
+		skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1,
+							        "/proc/self/exe",
+								"test", "usdt_100", NULL);
+		if (!ASSERT_OK_PTR(skel->links.usdt_100, "usdt_100_reattach"))
+			goto cleanup;
+
+		bss->usdt_100_sum = 0;
+		trigger_100_usdts();
+
+		ASSERT_EQ(bss->usdt_100_called, (i + 1) * 100 + 100, "usdt_100_called");
+		ASSERT_EQ(bss->usdt_100_sum, 99 * 100 / 2, "usdt_100_sum");
+	}
+
+	/* Now let's step it up and try to attach a USDT that requires more than
+	 * 256 attach points with different specs for each.
+	 * Note that we need trigger_300_usdts() only to actually have 300
+	 * USDT call sites, we are not going to actually trace them.
+	 */
+	trigger_300_usdts();
+
+	/* we'll reuse usdt_100 BPF program for usdt_300 test */
+	bpf_link__destroy(skel->links.usdt_100);
+	skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1, "/proc/self/exe",
+							"test", "usdt_300", NULL);
+	err = -errno;
+	if (!ASSERT_ERR_PTR(skel->links.usdt_100, "usdt_300_bad_attach"))
+		goto cleanup;
+	ASSERT_EQ(err, -E2BIG, "usdt_300_attach_err");
+
+	/* let's check that there are no "dangling" BPF programs attached due
+	 * to partial success of the above test:usdt_300 attachment
+	 */
+	bss->usdt_100_called = 0;
+	bss->usdt_100_sum = 0;
+
+	f300(777); /* this is 301st instance of usdt_300 */
+
+	ASSERT_EQ(bss->usdt_100_called, 0, "usdt_301_called");
+	ASSERT_EQ(bss->usdt_100_sum, 0, "usdt_301_sum");
+
+	/* This time we have USDT with 400 inlined invocations, but arg specs
+	 * should be the same across all sites, so libbpf will only need to
+	 * use one spec and thus we'll be able to attach 400 uprobes
+	 * successfully.
+	 *
+	 * Again, we are reusing usdt_100 BPF program.
+	 */
+	skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1,
+							"/proc/self/exe",
+							"test", "usdt_400", NULL);
+	if (!ASSERT_OK_PTR(skel->links.usdt_100, "usdt_400_attach"))
+		goto cleanup;
+
+	trigger_400_usdts();
+
+	ASSERT_EQ(bss->usdt_100_called, 400, "usdt_400_called");
+	ASSERT_EQ(bss->usdt_100_sum, 399 * 400 / 2, "usdt_400_sum");
+
+cleanup:
+	test_usdt__destroy(skel);
+}
+
+void test_usdt(void)
+{
+	if (test__start_subtest("basic"))
+		subtest_basic_usdt();
+	if (test__start_subtest("multispec"))
+		subtest_multispec_usdt();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
new file mode 100644
index 000000000000..505aab9a5234
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_usdt.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/usdt.bpf.h>
+
+int my_pid;
+
+int usdt0_called;
+u64 usdt0_cookie;
+int usdt0_arg_cnt;
+int usdt0_arg_ret;
+
+SEC("usdt")
+int usdt0(struct pt_regs *ctx)
+{
+	long tmp;
+
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt0_called, 1);
+
+	usdt0_cookie = bpf_usdt_cookie(ctx);
+	usdt0_arg_cnt = bpf_usdt_arg_cnt(ctx);
+	/* should return -ENOENT for any arg_num */
+	usdt0_arg_ret = bpf_usdt_arg(ctx, bpf_get_prandom_u32(), &tmp);
+	return 0;
+}
+
+int usdt3_called;
+u64 usdt3_cookie;
+int usdt3_arg_cnt;
+int usdt3_arg_rets[3];
+u64 usdt3_args[3];
+
+SEC("usdt//proc/self/exe:test:usdt3")
+int usdt3(struct pt_regs *ctx)
+{
+	long tmp;
+
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt3_called, 1);
+
+	usdt3_cookie = bpf_usdt_cookie(ctx);
+	usdt3_arg_cnt = bpf_usdt_arg_cnt(ctx);
+
+	usdt3_arg_rets[0] = bpf_usdt_arg(ctx, 0, &tmp);
+	usdt3_args[0] = (int)tmp;
+
+	usdt3_arg_rets[1] = bpf_usdt_arg(ctx, 1, &tmp);
+	usdt3_args[1] = (long)tmp;
+
+	usdt3_arg_rets[2] = bpf_usdt_arg(ctx, 2, &tmp);
+	usdt3_args[2] = (uintptr_t)tmp;
+
+	return 0;
+}
+
+int usdt12_called;
+u64 usdt12_cookie;
+int usdt12_arg_cnt;
+u64 usdt12_args[12];
+
+SEC("usdt//proc/self/exe:test:usdt12")
+int BPF_USDT(usdt12, int a1, int a2, long a3, long a4, unsigned a5,
+		     long a6, __u64 a7, uintptr_t a8, int a9, short a10,
+		     short a11, signed char a12)
+{
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt12_called, 1);
+
+	usdt12_cookie = bpf_usdt_cookie(ctx);
+	usdt12_arg_cnt = bpf_usdt_arg_cnt(ctx);
+
+	usdt12_args[0] = a1;
+	usdt12_args[1] = a2;
+	usdt12_args[2] = a3;
+	usdt12_args[3] = a4;
+	usdt12_args[4] = a5;
+	usdt12_args[5] = a6;
+	usdt12_args[6] = a7;
+	usdt12_args[7] = a8;
+	usdt12_args[8] = a9;
+	usdt12_args[9] = a10;
+	usdt12_args[10] = a11;
+	usdt12_args[11] = a12;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_usdt_multispec.c b/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
new file mode 100644
index 000000000000..96fe128790d1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/usdt.bpf.h>
+
+/* this file is linked together with test_usdt.c to validate that usdt.bpf.h
+ * can be included in multiple .bpf.c files forming a single final BPF object
+ * file
+ */
+ */
+
+extern int my_pid;
+
+int usdt_100_called;
+int usdt_100_sum;
+
+SEC("usdt//proc/self/exe:test:usdt_100")
+int BPF_USDT(usdt_100, int x)
+{
+	long tmp;
+
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt_100_called, 1);
+	__sync_fetch_and_add(&usdt_100_sum, x);
+
+	bpf_printk("X is %d, sum is %d", x, usdt_100_sum);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs
  2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2022-04-02  0:29 ` [PATCH v2 bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
@ 2022-04-02  0:29 ` Andrii Nakryiko
  2022-04-04 18:34   ` Dave Marchevsky
  6 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-02  0:29 UTC (permalink / raw)
  To: bpf, ast, daniel
  Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky, Hengqi Chen

Extend the urandom_read helper binary to include USDTs in four combinations:
with and without a semaphore (refcounted and non-refcounted), defined either
in the executable itself or in a shared library. We also extend urandom_read
with the ability to report its own PID to the parent process and to wait for
the parent process to ready itself up for tracing urandom_read. We utilize
popen() and the underlying pipe's properties for proper signaling.

Once urandom_read is ready, we add a few tests to validate that libbpf's
USDT attachment handles all the above combinations of semaphore (or lack
thereof) and executable vs shared library USDTs. We also validate that
libbpf handles shared libraries both with a PID filter and without one
(i.e., -1 as the PID argument).

Having the shared library case tested with and without a PID is important
because the internal logic differs on kernels that don't support BPF
cookies. On such older kernels, attaching to USDTs in shared libraries
without specifying a concrete PID doesn't work in principle, because it's
impossible to determine the shared library's load address to derive absolute
IPs for uprobe attachments. Without absolute IPs, it's impossible to
perform a correct lookup of the USDT spec based on the uprobe's absolute IP
(the only kind available to BPF at runtime). This is not a problem on
newer kernels with BPF cookie support, as we don't need the IP-to-spec-ID
lookup: the BPF cookie value *is* the spec ID.

Having those two situations as separate subtests is useful because libbpf
CI tests the latest selftests against old kernels (e.g., 4.9 and 5.5), so
we'll be able to disable PID-less shared library attachment for old
kernels while leaving the PID-specific subtest enabled to validate that
this legacy logic works correctly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |  11 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c | 108 ++++++++++++++++++
 .../selftests/bpf/progs/test_urandom_usdt.c   |  70 ++++++++++++
 .../selftests/bpf/progs/test_usdt_multispec.c |   2 -
 tools/testing/selftests/bpf/urandom_read.c    |  63 +++++++++-
 .../testing/selftests/bpf/urandom_read_aux.c  |   9 ++
 .../testing/selftests/bpf/urandom_read_lib1.c |  13 +++
 .../testing/selftests/bpf/urandom_read_lib2.c |   8 ++
 8 files changed, 275 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_urandom_usdt.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_aux.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib1.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib2.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0f8c55dfd844..bafdc5373a13 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -168,9 +168,15 @@ $(OUTPUT)/%:%.c
 	$(call msg,BINARY,,$@)
 	$(Q)$(LINK.c) $^ $(LDLIBS) -o $@
 
-$(OUTPUT)/urandom_read: urandom_read.c
+$(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
+	$(call msg,LIB,,$@)
+	$(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
+
+$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
 	$(call msg,BINARY,,$@)
-	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $< $(LDLIBS) -Wl,--build-id=sha1 -o $@
+	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.c,$^)			       \
+		  liburandom_read.so $(LDLIBS)	       			       \
+		  -Wl,-rpath=. -Wl,--build-id=sha1 -o $@
 
 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
 	$(call msg,MOD,,$@)
@@ -493,6 +499,7 @@ TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c	\
 			 btf_helpers.c flow_dissector_load.h		\
 			 cap_helpers.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
+		       $(OUTPUT)/liburandom_read.so			\
 		       ima_setup.sh					\
 		       $(wildcard progs/btf_dump_test_case_*.c)
 TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index d5a3125d4a4d..50b41423aa9c 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -6,6 +6,7 @@
 #include <sys/sdt.h>
 
 #include "test_usdt.skel.h"
+#include "test_urandom_usdt.skel.h"
 
 int lets_test_this(int);
 
@@ -304,10 +305,117 @@ static void subtest_multispec_usdt(void)
 	test_usdt__destroy(skel);
 }
 
+static FILE *urand_spawn(int *pid)
+{
+	FILE *f;
+
+	/* urandom_read's stdout is wired into f */
+	f = popen("./urandom_read 1 report-pid", "r");
+	if (!f)
+		return NULL;
+
+	if (fscanf(f, "%d", pid) != 1) {
+		pclose(f);
+		return NULL;
+	}
+
+	return f;
+}
+
+static int urand_trigger(FILE **urand_pipe)
+{
+	int exit_code;
+
+	/* pclose() waits for the child process to exit and returns its exit code */
+	exit_code = pclose(*urand_pipe);
+	*urand_pipe = NULL;
+
+	return exit_code;
+}
+
+static void subtest_urandom_usdt(bool auto_attach)
+{
+	struct test_urandom_usdt *skel;
+	struct test_urandom_usdt__bss *bss;
+	struct bpf_link *l;
+	FILE *urand_pipe = NULL;
+	int err, urand_pid = 0;
+
+	skel = test_urandom_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	urand_pipe = urand_spawn(&urand_pid);
+	if (!ASSERT_OK_PTR(urand_pipe, "urand_spawn"))
+		goto cleanup;
+
+	bss = skel->bss;
+	bss->urand_pid = urand_pid;
+
+	if (auto_attach) {
+		err = test_urandom_usdt__attach(skel);
+		if (!ASSERT_OK(err, "skel_auto_attach"))
+			goto cleanup;
+	} else {
+		l = bpf_program__attach_usdt(skel->progs.urand_read_without_sema,
+					     urand_pid, "./urandom_read",
+					     "urand", "read_without_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urand_without_sema_attach"))
+			goto cleanup;
+		skel->links.urand_read_without_sema = l;
+
+		l = bpf_program__attach_usdt(skel->progs.urand_read_with_sema,
+					     urand_pid, "./urandom_read",
+					     "urand", "read_with_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urand_with_sema_attach"))
+			goto cleanup;
+		skel->links.urand_read_with_sema = l;
+
+		l = bpf_program__attach_usdt(skel->progs.urandlib_read_without_sema,
+					     urand_pid, "./liburandom_read.so",
+					     "urandlib", "read_without_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urandlib_without_sema_attach"))
+			goto cleanup;
+		skel->links.urandlib_read_without_sema = l;
+
+		l = bpf_program__attach_usdt(skel->progs.urandlib_read_with_sema,
+					     urand_pid, "./liburandom_read.so",
+					     "urandlib", "read_with_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urandlib_with_sema_attach"))
+			goto cleanup;
+		skel->links.urandlib_read_with_sema = l;
+
+	}
+
+	/* trigger urandom_read USDTs */
+	ASSERT_OK(urand_trigger(&urand_pipe), "urand_exit_code");
+
+	ASSERT_EQ(bss->urand_read_without_sema_call_cnt, 1, "urand_wo_sema_cnt");
+	ASSERT_EQ(bss->urand_read_without_sema_buf_sz_sum, 256, "urand_wo_sema_sum");
+
+	ASSERT_EQ(bss->urand_read_with_sema_call_cnt, 1, "urand_w_sema_cnt");
+	ASSERT_EQ(bss->urand_read_with_sema_buf_sz_sum, 256, "urand_w_sema_sum");
+
+	ASSERT_EQ(bss->urandlib_read_without_sema_call_cnt, 1, "urandlib_wo_sema_cnt");
+	ASSERT_EQ(bss->urandlib_read_without_sema_buf_sz_sum, 256, "urandlib_wo_sema_sum");
+
+	ASSERT_EQ(bss->urandlib_read_with_sema_call_cnt, 1, "urandlib_w_sema_cnt");
+	ASSERT_EQ(bss->urandlib_read_with_sema_buf_sz_sum, 256, "urandlib_w_sema_sum");
+
+cleanup:
+	if (urand_pipe)
+		pclose(urand_pipe);
+	test_urandom_usdt__destroy(skel);
+}
+
 void test_usdt(void)
 {
 	if (test__start_subtest("basic"))
 		subtest_basic_usdt();
 	if (test__start_subtest("multispec"))
 		subtest_multispec_usdt();
+	if (test__start_subtest("urand_auto_attach"))
+		subtest_urandom_usdt(true /* auto_attach */);
+	if (test__start_subtest("urand_pid_attach"))
+		subtest_urandom_usdt(false /* auto_attach */);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
new file mode 100644
index 000000000000..3539b02bd5f7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/usdt.bpf.h>
+
+int urand_pid;
+
+int urand_read_without_sema_call_cnt;
+int urand_read_without_sema_buf_sz_sum;
+
+SEC("usdt/./urandom_read:urand:read_without_sema")
+int BPF_USDT(urand_read_without_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urand_read_without_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urand_read_without_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+int urand_read_with_sema_call_cnt;
+int urand_read_with_sema_buf_sz_sum;
+
+SEC("usdt/./urandom_read:urand:read_with_sema")
+int BPF_USDT(urand_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urand_read_with_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urand_read_with_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+int urandlib_read_without_sema_call_cnt;
+int urandlib_read_without_sema_buf_sz_sum;
+
+SEC("usdt/./liburandom_read.so:urandlib:read_without_sema")
+int BPF_USDT(urandlib_read_without_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urandlib_read_without_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urandlib_read_without_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+int urandlib_read_with_sema_call_cnt;
+int urandlib_read_with_sema_buf_sz_sum;
+
+SEC("usdt/./liburandom_read.so:urandlib:read_with_sema")
+int BPF_USDT(urandlib_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urandlib_read_with_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urandlib_read_with_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_usdt_multispec.c b/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
index 96fe128790d1..3c28a30f145c 100644
--- a/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
+++ b/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
@@ -26,8 +26,6 @@ int BPF_USDT(usdt_100, int x)
 	__sync_fetch_and_add(&usdt_100_called, 1);
 	__sync_fetch_and_add(&usdt_100_sum, x);
 
-	bpf_printk("X is %d, sum is %d", x, usdt_100_sum);
-
 	return 0;
 }
 
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index db781052758d..0366f36e2174 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -1,32 +1,85 @@
+#include <stdbool.h>
 #include <stdio.h>
 #include <unistd.h>
+#include <errno.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <stdlib.h>
+#include <signal.h>
+
+#define _SDT_HAS_SEMAPHORES 1
+#include <sys/sdt.h>
+
+#define SEC(name) __attribute__((section(name), used))
 
 #define BUF_SIZE 256
 
+/* defined in urandom_read_aux.c */
+void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz);
+/* these are coming from urandom_read_lib{1,2}.c */
+void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz);
+void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz);
+
+unsigned short urand_read_with_sema_semaphore SEC(".probes");
+
 static __attribute__((noinline))
 void urandom_read(int fd, int count)
 {
-       char buf[BUF_SIZE];
-       int i;
+	char buf[BUF_SIZE];
+	int i;
+
+	for (i = 0; i < count; ++i) {
+		read(fd, buf, BUF_SIZE);
+
+		/* trigger USDTs defined in executable itself */
+		urand_read_without_sema(i, count, BUF_SIZE);
+		STAP_PROBE3(urand, read_with_sema, i, count, BUF_SIZE);
 
-       for (i = 0; i < count; ++i)
-               read(fd, buf, BUF_SIZE);
+		/* trigger USDTs defined in shared lib */
+		urandlib_read_without_sema(i, count, BUF_SIZE);
+		urandlib_read_with_sema(i, count, BUF_SIZE);
+	}
+}
+
+static volatile bool parent_ready;
+
+static void handle_sigpipe(int sig)
+{
+	parent_ready = true;
 }
 
 int main(int argc, char *argv[])
 {
 	int fd = open("/dev/urandom", O_RDONLY);
 	int count = 4;
+	bool report_pid = false;
 
 	if (fd < 0)
 		return 1;
 
-	if (argc == 2)
+	if (argc >= 2)
 		count = atoi(argv[1]);
+	if (argc >= 3) {
+		report_pid = true;
+		/* install SIGPIPE handler to catch when the parent closes its
+		 * end of the pipe (on the other side of our stdout)
+		 */
+		signal(SIGPIPE, handle_sigpipe);
+	}
+
+	/* report PID and wait for parent process to send us "signal" by
+	 * closing stdout
+	 */
+	if (report_pid) {
+		while (!parent_ready) {
+			fprintf(stdout, "%d\n", getpid());
+			fflush(stdout);
+		}
+		/* at this point stdout is closed, parent process knows our
+		 * PID and is ready to trace us
+		 */
+	}
 
 	urandom_read(fd, count);
 
diff --git a/tools/testing/selftests/bpf/urandom_read_aux.c b/tools/testing/selftests/bpf/urandom_read_aux.c
new file mode 100644
index 000000000000..88026f21ebfb
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_aux.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <sys/sdt.h>
+
+void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz)
+{
+	/* semaphore-less USDT */
+	STAP_PROBE3(urand, read_without_sema, iter_num, iter_cnt, read_sz);
+}
diff --git a/tools/testing/selftests/bpf/urandom_read_lib1.c b/tools/testing/selftests/bpf/urandom_read_lib1.c
new file mode 100644
index 000000000000..3e1b63b00dfb
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_lib1.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#define _SDT_HAS_SEMAPHORES 1
+#include <sys/sdt.h>
+
+#define SEC(name) __attribute__((section(name), used))
+
+unsigned short urandlib_read_with_sema_semaphore SEC(".probes");
+
+void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz)
+{
+	STAP_PROBE3(urandlib, read_with_sema, iter_num, iter_cnt, read_sz);
+}
diff --git a/tools/testing/selftests/bpf/urandom_read_lib2.c b/tools/testing/selftests/bpf/urandom_read_lib2.c
new file mode 100644
index 000000000000..e307a52d07e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_lib2.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <sys/sdt.h>
+
+void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz)
+{
+	STAP_PROBE3(urandlib, read_without_sema, iter_num, iter_cnt, read_sz);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-04-02  0:29 ` [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
@ 2022-04-04  0:23   ` Dave Marchevsky
  2022-04-04  3:55     ` Andrii Nakryiko
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-04  0:23 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Alan Maguire, Hengqi Chen

On 4/1/22 8:29 PM, Andrii Nakryiko wrote:   
> Add the BPF-side implementation of libbpf-provided USDT support. This
> consists of a single header library, usdt.bpf.h, which is meant to be used
> from the user's BPF-side source code. This header is added to the list of
> installed libbpf headers, along with bpf_helpers.h and others.
> 
> BPF-side implementation consists of two BPF maps:
>   - spec map, which contains "a USDT spec" which encodes information
>     necessary to be able to fetch USDT arguments and other information
>     (argument count, user-provided cookie value, etc) at runtime;
>   - IP-to-spec-ID map, which is only used on kernels that don't support
>     the BPF cookie feature. It allows looking up the spec ID based on the
>     place in the user application that triggers the USDT program.
> 
> These maps have default sizes, 256 and 1024, which are chosen
> conservatively to not waste a lot of space while still handling a lot of
> common cases. But there could be cases when a user application needs to
> either trace a lot of different USDTs, or USDTs are heavily inlined and
> their arguments are located in a lot of differing locations. For such
> cases it might be necessary to size those maps up, which libbpf allows by
> overriding the BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
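For illustration, sizing the maps up is just a matter of defining the
macros before including the header (a sketch of user-side .bpf.c code; the
concrete values here are arbitrary):

```c
/* user's .bpf.c file: overrides must come *before* the include,
 * otherwise the header's defaults win */
#define BPF_USDT_MAX_SPEC_CNT 1024
#define BPF_USDT_MAX_IP_CNT   (4 * 1024)
#include <bpf/usdt.bpf.h>
```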
> 
> It is an important aspect to keep in mind: a single USDT (the user-space
> equivalent of a kernel tracepoint) can have multiple USDT "call sites".
> That is, a single logical USDT is triggered from multiple places in the user
> application. This can happen due to function inlining. Each such inlined
> instance of USDT invocation can have its own unique USDT argument
> specification (instructions about the location of the value of each of
> USDT arguments). So while USDT looks very similar to usual uprobe or
> kernel tracepoint, under the hood it's actually a collection of uprobes,
> each potentially needing different spec to know how to fetch arguments.
> 
> User-visible API consists of three helper functions:
>   - bpf_usdt_arg_cnt(), which returns number of arguments of current USDT;
>   - bpf_usdt_arg(), which reads the value of the specified USDT argument
>     (by its zero-indexed position) and returns it as a 64-bit value;
>   - bpf_usdt_cookie(), which functions like BPF cookie for USDT
>     programs; this is necessary as libbpf doesn't allow specifying actual
>     BPF cookie and utilizes it internally for USDT support implementation.
> 
> Each of the bpf_usdt_xxx() APIs expects the struct pt_regs * context passed
> into the BPF program. On kernels that don't support BPF cookie, it is used to
> fetch absolute IP address of the underlying uprobe.
> 
> usdt.bpf.h also provides BPF_USDT() macro, which functions like
> BPF_PROG() and BPF_KPROBE() and allows a much more user-friendly way to
> get access to USDT arguments, if USDT definition is static and known to
> the user. It is expected that majority of use cases won't have to use
> bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
> all their needs.
> 
> Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
> detect kernel support for BPF cookie. If BPF CO-RE dependency is
> undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
> either a boolean constant (or equivalently zero and non-zero), or even
> point it to its own .rodata variable that can be specified from user's
> application user-space code. It is important that
> BPF_USDT_HAS_BPF_COOKIE is known to the BPF verifier as a static value (thus
> .rodata and not just .data), as otherwise BPF code will still contain
> bpf_get_attach_cookie() BPF helper call and will fail validation at
> runtime, if not dead-code eliminated.
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/Makefile   |   2 +-
>  tools/lib/bpf/usdt.bpf.h | 256 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 257 insertions(+), 1 deletion(-)
>  create mode 100644 tools/lib/bpf/usdt.bpf.h

[...]

> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> new file mode 100644
> index 000000000000..0941c915d58d
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.bpf.h
> @@ -0,0 +1,256 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#ifndef __USDT_BPF_H__
> +#define __USDT_BPF_H__
> +
> +#include <linux/errno.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +/* Below types and maps are internal implementation details of libbpf's USDT
> + * support and are subject to change. Also, usdt_xxx() API helpers should be
> + * considered an unstable API as well and might be adjusted based on user
> + * feedback from using libbpf's USDT support in production.
> + */
> +
> +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> + * map that keeps track of USDT argument specifications. This might be
> + * necessary if there are a lot of USDT attachments.
> + */
> +#ifndef BPF_USDT_MAX_SPEC_CNT
> +#define BPF_USDT_MAX_SPEC_CNT 256
> +#endif
> +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> + * map that keeps track of IP (memory address) mapping to USDT argument
> + * specification.
> + * Note, if kernel supports BPF cookies, this map is not used and could be
> + * resized all the way to 1 to save a bit of memory.
> + */
> +#ifndef BPF_USDT_MAX_IP_CNT
> +#define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT)
> +#endif
> +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> + * the only dependency on CO-RE, so if it's undesirable, user can override
> + * BPF_USDT_HAS_BPF_COOKIE to specify whether BPF cookie is supported or not.
> + */
> +#ifndef BPF_USDT_HAS_BPF_COOKIE
> +#define BPF_USDT_HAS_BPF_COOKIE \
> +	bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> +#endif
> +
> +enum __bpf_usdt_arg_type {
> +	BPF_USDT_ARG_CONST,
> +	BPF_USDT_ARG_REG,
> +	BPF_USDT_ARG_REG_DEREF,
> +};
> +
> +struct __bpf_usdt_arg_spec {
> +	/* u64 scalar interpreted depending on arg_type, see below */
> +	__u64 val_off;
> +	/* arg location case, see bpf_usdt_arg() for details */
> +	enum __bpf_usdt_arg_type arg_type;
> +	/* offset of referenced register within struct pt_regs */
> +	short reg_off;

Although USDT args are almost always passed via regs in pt_regs, other regs are
occasionally used. In BCC repo this has come up a few times recently ([0][1]).
Notably, in [1] we found a USDT in libpthread using a 'xmm' register on a recent
Fedora, so it's not just obscure libs' probe targets.

Of course, there's currently no way to fetch 'xmm' registers from BPF programs.
Alexei and I had a braindump session where he concluded that it would be useful
to have some arch-specific helper to pull non-pt_regs regs, plus a concise
repro. Am poking at this, hope to have something in the next week or two.

Anyways, if this lib didn't assume that USDT arg regs were always some reg_off
into pt_regs, it would be easier to extend in the near future. This could be
as simple as reserving negative reg_off values to signal some arch-specific
non-pt_regs register, or a more explicit enum value.

  [0]: https://github.com/iovisor/bcc/issues/3875
  [1]: https://github.com/iovisor/bcc/pull/3880

> +	/* whether arg should be interpreted as signed value */
> +	bool arg_signed;
> +	/* number of bits that need to be cleared and, optionally,
> +	 * sign-extended to cast arguments that are 1, 2, or 4 bytes
> +	 * long into final 8-byte u64/s64 value returned to user
> +	 */
> +	char arg_bitshift;

Could this field instead be 'arg_sz' holding size of arg in bytes? Could derive
bitshift from size where it's needed in this patch, and future potential uses of
size would already have it on hand. Folks writing new arch-specific parsing logic
like your "libbpf: add x86-specific USDT arg spec parsing logic" patch wouldn't
need to derive bitshift from size or care about what size is used for.

I don't feel strongly about this, just noticed that, aside from this field and
reg_off, it's very easy to conceptually map this struct's fields 1:1 to what
USDT arg looks like without any knowledge of this lib's inner workings.

> +};

[...]

> +static inline __noinline
> +int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	struct __bpf_usdt_arg_spec *arg_spec;
> +	unsigned long val;
> +	int err, spec_id;
> +
> +	*res = 0;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -ESRCH;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -ESRCH;
> +
> +	if (arg_num >= BPF_USDT_MAX_ARG_CNT || arg_num >= spec->arg_cnt)
> +		return -ENOENT;
> +
> +	arg_spec = &spec->args[arg_num];
> +	switch (arg_spec->arg_type) {
> +	case BPF_USDT_ARG_CONST:
> +		/* Arg is just a constant ("-4@$-9" in USDT arg spec).
> +		 * value is recorded in arg_spec->val_off directly.
> +		 */
> +		val = arg_spec->val_off;
> +		break;
> +	case BPF_USDT_ARG_REG:
> +		/* Arg is in a register (e.g., "8@%rax" in USDT arg spec),
> +		 * so we read the contents of that register directly from
> +		 * struct pt_regs. To keep things simple, the user-space parts
> +		 * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
> +		 */
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		break;
> +	case BPF_USDT_ARG_REG_DEREF:
> +		/* Arg is in memory addressed by register, plus some offset
> +		 * (e.g., "-4@-1204(%rbp)" in USDT arg spec). Register is
> +		 * identified like in the BPF_USDT_ARG_REG case, and the offset
> +		 * is in arg_spec->val_off. We first fetch register contents
> +		 * from pt_regs, then do another user-space probe read to
> +		 * fetch argument value itself.
> +		 */
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> +		if (err)
> +			return err;
> +		break;

Assuming either of previous reg_off suggestions are taken, very straightforward
to extend this for non-pt_regs regs.

> +	default:
> +		return -EINVAL;
> +	}
> +
> +	/* cast arg from 1, 2, or 4 bytes to final 8 byte size clearing
> +	 * necessary upper arg_bitshift bits, with sign extension if argument
> +	 * is signed
> +	 */
> +	val <<= arg_spec->arg_bitshift;
> +	if (arg_spec->arg_signed)
> +		val = ((long)val) >> arg_spec->arg_bitshift;
> +	else
> +		val = val >> arg_spec->arg_bitshift;
> +	*res = val;
> +	return 0;
> +}

[...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-04-02  0:29 ` [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
@ 2022-04-04  3:12   ` Dave Marchevsky
  2022-04-04  4:18     ` Andrii Nakryiko
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-04  3:12 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Alan Maguire, Hengqi Chen

On 4/1/22 8:29 PM, Andrii Nakryiko wrote:   
> Wire up libbpf USDT support APIs without yet implementing all the
> nitty-gritty details of USDT discovery, spec parsing, and BPF map
> initialization.
> 
> User-visible user-space API is simple and is conceptually very similar
> to uprobe API.
> 
> The bpf_program__attach_usdt() API allows programmatically attaching a given
> BPF program to a USDT, specified through a binary path (executable or
> shared lib), USDT provider, and name. Also, just like in the uprobe case, a PID
> filter is specified (0 - self, -1 - any process, or a specific PID).
> Optionally, USDT cookie value can be specified. Such single API
> invocation will try to discover given USDT in specified binary and will
> use (potentially many) BPF uprobes to attach this program in correct
> locations.
> 
> Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
> represents this attachment. It is a virtual BPF link that doesn't have
> direct kernel object, as it can consist of multiple underlying BPF
> uprobe links. As such, attachment is not an atomic operation and there can
> be a brief moment when some USDT call sites are attached while others are
> still in the process of attaching. This should be taken into
> consideration by the user. But bpf_program__attach_usdt() guarantees that
> in the case of success all USDT call sites are successfully attached, or
> all the successful attachments will be detached as soon as some USDT
> call sites failed to be attached. So, in theory, there could be cases of
> a failed bpf_program__attach_usdt() call which did trigger a few USDT
> program invocations. This is unavoidable due to the multi-uprobe nature of
> USDT and has to be handled by the user, if it's important to create an
> illusion of atomicity.

It would be useful to be able to control the behavior in response to attach
failure in bpf_program__attach_usdt. Specifically I'd like to be able to
choose between existing "all attaches succeed or entire operation fails" and
"_any_ attach succeeds or entire operation fails". A few reasons for this:

 * Tools like BOLT were not playing nicely with USDTs for some time ([0],[1])
 * BCC's logic was changed to support more granular 'attach failure' logic ([2])
 * At FB I still see some multi-probe USDTs with incorrect-looking locations on
   some of the probes

Note that my change for the 2nd bullet was to handle the ".so in shortlived
process" use case, which this lib handles by properly supporting pid = -1. But
it's since come in handy to avoid the 3rd bullet's issue causing trouble.

Production tracing tools would be less brittle if they could control this attach
failure logic.

  [0]: https://github.com/facebookincubator/BOLT/commit/ea49a61463c65775aa796a9ef7a1199f20d2a698
  [1]: https://github.com/facebookincubator/BOLT/commit/93860e02a19227be4963a68aa99ea0e09771052b
  [2]: https://github.com/iovisor/bcc/pull/2476

> USDT BPF programs themselves are marked in BPF source code as either
> SEC("usdt"), in which case they won't be auto-attached through
> skeleton's <skel>__attach() method, or it can have a full definition,
> which follows the spirit of fully-specified uprobes:
> SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
> attach method will attempt auto-attachment. Similarly, generic
> bpf_program__attach() will have enough information to go off of for
> parameterless attachment.
> 
> USDT BPF programs are actually uprobes, and as such for kernel they are
> marked as BPF_PROG_TYPE_KPROBE.
> 
> Another part of this patch is USDT-related feature probing:
>   - BPF cookie support detection from user-space;
>   - detection of kernel support for auto-refcounting of USDT semaphore.
> 
> The latter is optional. If kernel doesn't support such feature and USDT
> doesn't rely on USDT semaphores, no error is returned. But if libbpf
> detects that USDT requires setting semaphores and kernel doesn't support
> this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
> support poking process's memory directly to increment semaphore value,
> like BCC does on legacy kernels, due to inherent raciness and danger of
> such process memory manipulation. Libbpf lets the kernel take care of this
> properly or gives up.
> 
> Logistically, all the extra USDT-related infrastructure of libbpf is put
> into a separate usdt.c file and abstracted behind struct usdt_manager.
> Each bpf_object has lazily-initialized usdt_manager pointer, which is
> only instantiated if USDT programs are attempted to be attached. Closing
> BPF object frees up usdt_manager resources. usdt_manager keeps track of
> USDT spec ID assignment and a few other small things.
> 
> Subsequent patches will fill out remaining missing pieces of USDT
> initialization and setup logic.
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/Build             |   3 +-
>  tools/lib/bpf/libbpf.c          | 100 +++++++-
>  tools/lib/bpf/libbpf.h          |  31 +++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |  19 ++
>  tools/lib/bpf/usdt.c            | 426 ++++++++++++++++++++++++++++++++
>  6 files changed, 571 insertions(+), 9 deletions(-)
>  create mode 100644 tools/lib/bpf/usdt.c

[...]

> +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
> +{
> +	char *path = NULL, *provider = NULL, *name = NULL;
> +	const char *sec_name;
> +	int n, err;
> +
> +	sec_name = bpf_program__section_name(prog);
> +	if (strcmp(sec_name, "usdt") == 0) {
> +		/* no auto-attach for just SEC("usdt") */
> +		*link = NULL;
> +		return 0;
> +	}
> +
> +	n = sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name);

TIL %m

> +	if (n != 3) {
> +		pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
> +			sec_name);
> +		err = -EINVAL;
> +	} else {
> +		*link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
> +						 provider, name, NULL);
> +		err = libbpf_get_error(*link);
> +	}
> +	free(path);
> +	free(provider);
> +	free(name);
> +	return err;
> +}

[...]

> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> new file mode 100644
> index 000000000000..9476f7a15769
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.c
> @@ -0,0 +1,426 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#include <ctype.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <libelf.h>
> +#include <gelf.h>
> +#include <unistd.h>
> +#include <linux/ptrace.h>
> +#include <linux/kernel.h>
> +
> +#include "bpf.h"
> +#include "libbpf.h"
> +#include "libbpf_common.h"
> +#include "libbpf_internal.h"
> +#include "hashmap.h"
> +
> +/* libbpf's USDT support consists of BPF-side state/code and user-space
> + * state/code working together in concert. BPF-side parts are defined in
> + * usdt.bpf.h header library. User-space state is encapsulated by struct
> + * usdt_manager and all the supporting code centered around usdt_manager.
> + *
> + * usdt.bpf.h defines two BPF maps that usdt_manager expects: USDT spec map
> + * and IP-to-spec-ID map, which is auxiliary map necessary for kernels that
> + * don't support BPF cookie (see below). These two maps are implicitly
> + * embedded into user's end BPF object file when user's code included
> + * usdt.bpf.h. This means that libbpf doesn't do anything special to create
> + * these USDT support maps. They are created by normal libbpf logic of
> + * instantiating BPF maps when opening and loading BPF object.
> + *
> + * As such, libbpf is basically unaware of the need to do anything
> + * USDT-related until the very first call to bpf_program__attach_usdt(), which
> + * can be called by user explicitly or happen automatically during skeleton
> + * attach (or, equivalently, through generic bpf_program__attach() call). At
> + * this point, libbpf will instantiate and initialize struct usdt_manager and
> + * store it in bpf_object. USDT manager is per-BPF object construct, as each
> + * independent BPF object might or might not have USDT programs, and thus all
> + * the expected USDT-related state. There is no coordination between two
> + * bpf_object in parts of USDT attachment, they are oblivious of each other's
> + * existence and libbpf is just oblivious, dealing with bpf_object-specific
> + * USDT state.
> + *
> + * Quick crash course on USDTs.
> + *
> + * From a user-space application's point of view, USDT is essentially just
> + * a slightly special function call that normally has zero overhead, unless it

Maybe better to claim 'low overhead' instead of 'zero' here and elsewhere?
Or elaborate about the overhead more explicitly. Agreed that it's so low as to
effectively be zero in most cases, but someone somewhere is going to put the nop
in a 4096-bytes-of-instr tight loop, see changed iTLB behavior, and get
frustrated.

A contrived scenario to be sure, but no other USDT documentation I can find
claims zero overhead, rather 'low', guessing for a good reason.


> + * is being traced by some external entity (e.g., a BPF-based tool). Here's how
> + * a typical application can trigger USDT probe:
> + *
> + * #include <sys/sdt.h>  // provided by systemtap-sdt-devel package
> + * // folly also provides similar functionality in folly/tracing/StaticTracepoint.h
> + *
> + * STAP_PROBE3(my_usdt_provider, my_usdt_probe_name, 123, x, &y);
> + *
> + * USDT is identified by its <provider-name>:<probe-name> pair of names. Each
> + * individual USDT has a fixed number of arguments (3 in the above example)
> + * and specifies values of each argument as if it was a function call.
> + *
> + * USDT call is actually not a function call, but is instead replaced by
> + * a single NOP instruction (thus zero overhead, effectively). But in addition

This zero overhead claim bothers me less since NOP is directly mentioned.

> + * to that, those USDT macros generate special SHT_NOTE ELF records in
> + * .note.stapsdt ELF section. Here's an example USDT definition as emitted by
> + * `readelf -n <binary>`:

[...]

> + * The arguments are the most interesting part. This USDT specification string
> + * provides information about all the USDT arguments and their locations. The
> + * part before the @ sign defines the byte size of the argument (1, 2, 4, or 8)
> + * and whether the argument is signed or unsigned (negative size means signed).
> + * The part after the @ sign is an assembly-like definition of the argument
> + * location. Technically, the assembler can provide some pretty advanced
> + * definitions, but

Perhaps it would be more precise to state that argument specifiers can be 'any
operand accepted by Gnu Asm Syntax' (per [0]).

[0]: https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

> + * libbpf currently supports the three most common cases:
> + *   1) immediate constant, see 5th and 9th args above (-4@$5 and -4@-9);
> + *   2) register value, e.g., 8@%rdx, which means "unsigned 8-byte integer
> + *      whose value is in register %rdx";
> + *   3) memory dereference addressed by register, e.g., -4@-1204(%rbp), which
> + *      specifies signed 32-bit integer stored at offset -1204 bytes from
> + *      memory address stored in %rbp.

[...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-04-04  0:23   ` Dave Marchevsky
@ 2022-04-04  3:55     ` Andrii Nakryiko
  0 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-04  3:55 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Hengqi Chen

On Sun, Apr 3, 2022 at 5:23 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On 4/1/22 8:29 PM, Andrii Nakryiko wrote:
> > Add BPF-side implementation of libbpf-provided USDT support. This
> > consists of a single header library, usdt.bpf.h, which is meant to be used
> > from the user's BPF-side source code. This header is added to the list of
> > installed libbpf headers, alongside bpf_helpers.h and others.
> >
> > BPF-side implementation consists of two BPF maps:
> >   - spec map, which contains "a USDT spec" which encodes information
> >     necessary to be able to fetch USDT arguments and other information
> >     (argument count, user-provided cookie value, etc) at runtime;
> >   - IP-to-spec-ID map, which is only used on kernels that don't support
> >     BPF cookie feature. It allows looking up the spec ID based on the place
> >     in the user application that triggers the USDT program.
> >
> > These maps have default sizes, 256 and 1024, which are chosen
> > conservatively to not waste a lot of space, but handling a lot of common
> > cases. But there could be cases when user application needs to either
> > trace a lot of different USDTs, or USDTs are heavily inlined and their
> > arguments are located in a lot of differing locations. For such cases it
> > might be necessary to size those maps up, which libbpf allows by
> > overriding the BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
> >
> > It is an important aspect to keep in mind. Single USDT (user-space
> > equivalent of kernel tracepoint) can have multiple USDT "call sites".
> > That is, single logical USDT is triggered from multiple places in user
> > application. This can happen due to function inlining. Each such inlined
> > instance of USDT invocation can have its own unique USDT argument
> > specification (instructions about the location of the value of each of
> > USDT arguments). So while USDT looks very similar to usual uprobe or
> > kernel tracepoint, under the hood it's actually a collection of uprobes,
> > each potentially needing different spec to know how to fetch arguments.
> >
> > User-visible API consists of three helper functions:
> >   - bpf_usdt_arg_cnt(), which returns the number of arguments of the current USDT;
> >   - bpf_usdt_arg(), which reads the value of the specified USDT argument
> >     (by its zero-indexed position) and returns it as a 64-bit value;
> >   - bpf_usdt_cookie(), which functions like BPF cookie for USDT
> >     programs; this is necessary as libbpf doesn't allow specifying actual
> >     BPF cookie and utilizes it internally for USDT support implementation.
> >
> > Each of the bpf_usdt_xxx() APIs expects the struct pt_regs * context passed
> > into the BPF program. On kernels that don't support BPF cookie, the context
> > is used to fetch the absolute IP address of the underlying uprobe.
> >
> > usdt.bpf.h also provides BPF_USDT() macro, which functions like
> > BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to
> > get access to USDT arguments, if USDT definition is static and known to
> > the user. It is expected that majority of use cases won't have to use
> > bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
> > all their needs.
> >
> > Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
> > detect kernel support for BPF cookie. If BPF CO-RE dependency is
> > undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
> > either a boolean constant (or equivalently zero and non-zero), or even
> > point it to its own .rodata variable that can be specified from user's
> > application user-space code. It is important that
> > BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus
> > .rodata and not just .data), as otherwise BPF code will still contain
> > bpf_get_attach_cookie() BPF helper call and will fail validation at
> > runtime, if not dead-code eliminated.
> >
> > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  tools/lib/bpf/Makefile   |   2 +-
> >  tools/lib/bpf/usdt.bpf.h | 256 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 257 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/lib/bpf/usdt.bpf.h
>
> [...]
>
> > diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> > new file mode 100644
> > index 000000000000..0941c915d58d
> > --- /dev/null
> > +++ b/tools/lib/bpf/usdt.bpf.h
> > @@ -0,0 +1,256 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +#ifndef __USDT_BPF_H__
> > +#define __USDT_BPF_H__
> > +
> > +#include <linux/errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_core_read.h>
> > +
> > +/* Below types and maps are internal implementation details of libbpf's USDT
> > + * support and are subject to change. Also, usdt_xxx() API helpers should be
> > + * considered an unstable API as well and might be adjusted based on user
> > + * feedback from using libbpf's USDT support in production.
> > + */
> > +
> > +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> > + * map that keeps track of USDT argument specifications. This might be
> > + * necessary if there are a lot of USDT attachments.
> > + */
> > +#ifndef BPF_USDT_MAX_SPEC_CNT
> > +#define BPF_USDT_MAX_SPEC_CNT 256
> > +#endif
> > +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> > + * map that keeps track of IP (memory address) mapping to USDT argument
> > + * specification.
> > + * Note, if kernel supports BPF cookies, this map is not used and could be
> > + * resized all the way to 1 to save a bit of memory.
> > + */
> > +#ifndef BPF_USDT_MAX_IP_CNT
> > +#define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT)
> > +#endif
> > +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> > + * the only dependency on CO-RE, so if it's undesirable, user can override
> > + * BPF_USDT_HAS_BPF_COOKIE to specify whether BPF cookie is supported or not.
> > + */
> > +#ifndef BPF_USDT_HAS_BPF_COOKIE
> > +#define BPF_USDT_HAS_BPF_COOKIE \
> > +     bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> > +#endif
> > +
> > +enum __bpf_usdt_arg_type {
> > +     BPF_USDT_ARG_CONST,
> > +     BPF_USDT_ARG_REG,
> > +     BPF_USDT_ARG_REG_DEREF,
> > +};
> > +
> > +struct __bpf_usdt_arg_spec {
> > +     /* u64 scalar interpreted depending on arg_type, see below */
> > +     __u64 val_off;
> > +     /* arg location case, see bpf_usdt_arg() for details */
> > +     enum __bpf_usdt_arg_type arg_type;
> > +     /* offset of referenced register within struct pt_regs */
> > +     short reg_off;
>
> Although USDT args are almost always passed via regs in pt_regs, other regs are
> occasionally used. In BCC repo this has come up a few times recently ([0][1]).
> Notably, in [1] we found a USDT in libpthread using a 'xmm' register on a recent
> Fedora, so it's not just obscure libs' probe targets.
>
> Of course, there's currently no way to fetch 'xmm' registers from BPF programs.
> Alexei and I had a braindump session where he concluded that it would be useful
> to have some arch-specific helper to pull non-pt_regs regs, plus a concise
> repro. Am poking at this, hope to have something in the next week or two.
>
> Anyways, if this lib didn't assume that USDT arg regs were always some reg_off
> into pt_regs, it would be easier to extend in the near future. This could be
> as simple as reserving negative reg_off values to signal some arch-specific
> non-pt_regs register, or a more explicit enum value.

Yeah, I'm aware of that. It's still easy to extend, it's just going to
be a different bpf_usdt_arg_type.

>
>   [0]: https://github.com/iovisor/bcc/issues/3875
>   [1]: https://github.com/iovisor/bcc/pull/3880
>
> > +     /* whether arg should be interpreted as signed value */
> > +     bool arg_signed;
> > +     /* number of bits that need to be cleared and, optionally,
> > +      * sign-extended to cast arguments that are 1, 2, or 4 bytes
> > +      * long into final 8-byte u64/s64 value returned to user
> > +      */
> > +     char arg_bitshift;
>
> Could this field instead be 'arg_sz' holding size of arg in bytes? Could derive
> bitshift from size where it's needed in this patch, and future potential uses of
> size would already have it on hand. Folks writing new arch-specific parsing logic
> like your "libbpf: add x86-specific USDT arg spec parsing logic" patch wouldn't
> need to derive bitshift from size or care about what size is used for.
>
> I don't feel strongly about this, just noticed that, aside from this field and
> reg_off, it's very easy to conceptually map this struct's fields 1:1 to what
> USDT arg looks like without any knowledge of this lib's inner workings.

It could be arg_sz, yep, but here I'm optimizing for simplest and
fastest runtime (BPF-side) code at the expense of setup-time
(user-space) code. I think that's the right tradeoff, as setting this
up is a one-time (and thus potentially slow and arbitrarily complicated)
step, while fetching values at runtime can be a very frequent operation,
so should be as simple and fast as possible.


>
> > +};
>
> [...]
>
> > +static inline __noinline
> > +int bpf_usdt_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
> > +{
> > +     struct __bpf_usdt_spec *spec;
> > +     struct __bpf_usdt_arg_spec *arg_spec;
> > +     unsigned long val;
> > +     int err, spec_id;
> > +
> > +     *res = 0;
> > +
> > +     spec_id = __bpf_usdt_spec_id(ctx);
> > +     if (spec_id < 0)
> > +             return -ESRCH;
> > +
> > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +     if (!spec)
> > +             return -ESRCH;
> > +
> > +     if (arg_num >= BPF_USDT_MAX_ARG_CNT || arg_num >= spec->arg_cnt)
> > +             return -ENOENT;
> > +
> > +     arg_spec = &spec->args[arg_num];
> > +     switch (arg_spec->arg_type) {
> > +     case BPF_USDT_ARG_CONST:
> > +             /* Arg is just a constant ("-4@$-9" in USDT arg spec).
> > +              * value is recorded in arg_spec->val_off directly.
> > +              */
> > +             val = arg_spec->val_off;
> > +             break;
> > +     case BPF_USDT_ARG_REG:
> > +             /* Arg is in a register (e.g, "8@%rax" in USDT arg spec),
> > +              * so we read the contents of that register directly from
> > +              * struct pt_regs. To keep things simple user-space parts
> > +              * record offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
> > +              */
> > +             err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> > +             if (err)
> > +                     return err;
> > +             break;
> > +     case BPF_USDT_ARG_REG_DEREF:
> > +             /* Arg is in memory addressed by register, plus some offset
> > +              * (e.g., "-4@-1204(%rbp)" in USDT arg spec). Register is
> > +              * identified like in the BPF_USDT_ARG_REG case, and the offset
> > +              * is in arg_spec->val_off. We first fetch register contents
> > +              * from pt_regs, then do another user-space probe read to
> > +              * fetch argument value itself.
> > +              */
> > +             err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> > +             if (err)
> > +                     return err;
> > +             err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> > +             if (err)
> > +                     return err;
> > +             break;
>
> Assuming either of previous reg_off suggestions are taken, very straightforward
> to extend this for non-pt_regs regs.

So non-pt_regs register values are going to be fetched very differently
than in the BPF_USDT_ARG_REG case above with bpf_probe_read_kernel(). It will
be a separate case clause with call to a BPF helper that returns some
arch-specific register. So either way not much to reuse between the
two. As for BPF_USDT_ARG_REG_DEREF, I actually assume that those fancy
xmm registers can't be used for pointer dereference, tbh, but if we
see that it does, then having a helper function for fetching register
value is a trivial refactoring away.
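In other words, such support would likely be just another case clause,
roughly like this (the helper name below is purely hypothetical, for
illustration only):

```c
	case BPF_USDT_ARG_ARCH_REG:
		/* hypothetical future case: val_off identifies an
		 * arch-specific (non-pt_regs) register, fetched through a
		 * dedicated, yet-to-be-added BPF helper */
		err = bpf_usdt_fetch_arch_reg(ctx, arg_spec->val_off, &val);
		if (err)
			return err;
		break;
```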

>
> > +     default:
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* cast arg from 1, 2, or 4 bytes to final 8 byte size clearing
> > +      * necessary upper arg_bitshift bits, with sign extension if argument
> > +      * is signed
> > +      */
> > +     val <<= arg_spec->arg_bitshift;
> > +     if (arg_spec->arg_signed)
> > +             val = ((long)val) >> arg_spec->arg_bitshift;
> > +     else
> > +             val = val >> arg_spec->arg_bitshift;
> > +     *res = val;
> > +     return 0;
> > +}
>
> [...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-04-04  3:12   ` Dave Marchevsky
@ 2022-04-04  4:18     ` Andrii Nakryiko
  2022-04-05  1:00       ` Dave Marchevsky
  0 siblings, 1 reply; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-04  4:18 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Hengqi Chen

On Sun, Apr 3, 2022 at 8:12 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On 4/1/22 8:29 PM, Andrii Nakryiko wrote:
> > Wire up libbpf USDT support APIs without yet implementing all the
> > nitty-gritty details of USDT discovery, spec parsing, and BPF map
> > initialization.
> >
> > User-visible user-space API is simple and is conceptually very similar
> > to uprobe API.
> >
> > bpf_program__attach_usdt() API allows to programmatically attach given
> > BPF program to a USDT, specified through binary path (executable or
> > shared lib), USDT provider and name. Also, just like in uprobe case, PID
> > filter is specified (0 - self, -1 - any process, or specific PID).
> > Optionally, USDT cookie value can be specified. Such single API
> > invocation will try to discover given USDT in specified binary and will
> > use (potentially many) BPF uprobes to attach this program in correct
> > locations.
> >
> > Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
> > represents this attachment. It is a virtual BPF link that doesn't have
> > direct kernel object, as it can consist of multiple underlying BPF
> > uprobe links. As such, attachment is not atomic operation and there can
> > be brief moment when some USDT call sites are attached while others are
> > still in the process of attaching. This should be taken into
> > consideration by user. But bpf_program__attach_usdt() guarantees that
> > in the case of success all USDT call sites are successfully attached, or
> > all the successful attachments will be detached as soon as some USDT
> > call sites failed to be attached. So, in theory, there could be cases of
> > a failed bpf_program__attach_usdt() call which did trigger a few USDT
> > program invocations. This is unavoidable due to multi-uprobe nature of
> > USDT and has to be handled by user, if it's important to create an
> > illusion of atomicity.
>
> It would be useful to be able to control the behavior in response to attach
> failure in bpf_program__attach_usdt. Specifically I'd like to be able to
> choose between existing "all attaches succeed or entire operation fails" and
> "_any_ attach succeeds or entire operation fails". Few reasons for this:
>
>  * Tools like BOLT were not playing nicely with USDTs for some time ([0],[1])
>  * BCC's logic was changed to support more granular 'attach failure' logic ([2])
>  * At FB I still see some multi-probe USDTs with incorrect-looking locations on
>    some of the probes
>
> Note that my change for 2nd bullet was to handle ".so in shortlived process"
> usecase, which this lib handles by properly supporting pid = -1. But it's since
> come in handy to avoid 3rd bullet's issue from causing trouble.
>
> Production tracing tools would be less brittle if they could control this attach
> failure logic.
>

So, we have bpf_usdt_opts for that and can add this in the future. The
reason I didn't do it from the outset is that no other attach API
currently has this partial success behavior. For example, multi-attach
kprobe that we recently added is also an all-or-nothing API. So I
wanted to start out with this stricter approach and only allow changing
that if/when we have a clear case where this is objectively not
enough. The BOLT case you mentioned normally should have been solved
by fixing BOLT tooling itself, not by sloppier attach behavior in
kernel or libbpf.

For the [2], if you re-read the comments, I suggested allowing adding
one USDT at a time instead of the "partial failure is ok" option,
which you ended up doing. So your initial frustration was from
a suboptimal BCC API. After you added the init_usdt() call that allowed
generating code for each individual binary+USDT target, you had all the
control you needed, right? So here, bpf_program__attach_usdt() is a
logical equivalent of that init_usdt() call from BCC, so should be all
good. If bpf_program__attach_usdt() fails for some process/binary that
is now gone, you can just ignore and continue attaching for other
binaries, right?

>   [0]: https://github.com/facebookincubator/BOLT/commit/ea49a61463c65775aa796a9ef7a1199f20d2a698
>   [1]: https://github.com/facebookincubator/BOLT/commit/93860e02a19227be4963a68aa99ea0e09771052b
>   [2]: https://github.com/iovisor/bcc/pull/2476
>
> > USDT BPF programs themselves are marked in BPF source code as either
> > SEC("usdt"), in which case they won't be auto-attached through
> > skeleton's <skel>__attach() method, or it can have a full definition,
> > which follows the spirit of fully-specified uprobes:
> > SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
> > attach method will attempt auto-attachment. Similarly, generic
> > bpf_program__attach() will have enough information to go off of for
> > parameterless attachment.
> >
> > USDT BPF programs are actually uprobes, and as such for kernel they are
> > marked as BPF_PROG_TYPE_KPROBE.
> >
> > Another part of this patch is USDT-related feature probing:
> >   - BPF cookie support detection from user-space;
> >   - detection of kernel support for auto-refcounting of USDT semaphore.
> >
> > The latter is optional. If kernel doesn't support such feature and USDT
> > doesn't rely on USDT semaphores, no error is returned. But if libbpf
> > detects that USDT requires setting semaphores and kernel doesn't support
> > this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
> > support poking process's memory directly to increment semaphore value,
> > like BCC does on legacy kernels, due to inherent raciness and danger of
> > such process memory manipulation. Libbpf lets the kernel take care of this
> > properly or gives up.
> >
> > Logistically, all the extra USDT-related infrastructure of libbpf is put
> > into a separate usdt.c file and abstracted behind struct usdt_manager.
> > Each bpf_object has lazily-initialized usdt_manager pointer, which is
> > only instantiated if USDT programs are attempted to be attached. Closing
> > BPF object frees up usdt_manager resources. usdt_manager keeps track of
> > USDT spec ID assignment and a few other small things.
> >
> > Subsequent patches will fill out remaining missing pieces of USDT
> > initialization and setup logic.
> >
> > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  tools/lib/bpf/Build             |   3 +-
> >  tools/lib/bpf/libbpf.c          | 100 +++++++-
> >  tools/lib/bpf/libbpf.h          |  31 +++
> >  tools/lib/bpf/libbpf.map        |   1 +
> >  tools/lib/bpf/libbpf_internal.h |  19 ++
> >  tools/lib/bpf/usdt.c            | 426 ++++++++++++++++++++++++++++++++
> >  6 files changed, 571 insertions(+), 9 deletions(-)
> >  create mode 100644 tools/lib/bpf/usdt.c
>
> [...]
>
> > +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
> > +{
> > +     char *path = NULL, *provider = NULL, *name = NULL;
> > +     const char *sec_name;
> > +     int n, err;
> > +
> > +     sec_name = bpf_program__section_name(prog);
> > +     if (strcmp(sec_name, "usdt") == 0) {
> > +             /* no auto-attach for just SEC("usdt") */
> > +             *link = NULL;
> > +             return 0;
> > +     }
> > +
> > +     n = sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name);
>
> TIL %m

Yep, it's pretty nifty in some cases. In general, sscanf() is great
for a lot of fairly complicated parsing without the need to go for
a full-blown regex engine.

>
> > +     if (n != 3) {
> > +             pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
> > +                     sec_name);
> > +             err = -EINVAL;
> > +     } else {
> > +             *link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
> > +                                              provider, name, NULL);
> > +             err = libbpf_get_error(*link);
> > +     }
> > +     free(path);
> > +     free(provider);
> > +     free(name);
> > +     return err;
> > +}
>
> [...]
>
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > new file mode 100644
> > index 000000000000..9476f7a15769
> > --- /dev/null
> > +++ b/tools/lib/bpf/usdt.c
> > @@ -0,0 +1,426 @@
> > +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +#include <ctype.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <libelf.h>
> > +#include <gelf.h>
> > +#include <unistd.h>
> > +#include <linux/ptrace.h>
> > +#include <linux/kernel.h>
> > +
> > +#include "bpf.h"
> > +#include "libbpf.h"
> > +#include "libbpf_common.h"
> > +#include "libbpf_internal.h"
> > +#include "hashmap.h"
> > +
> > +/* libbpf's USDT support consists of BPF-side state/code and user-space
> > + * state/code working together in concert. BPF-side parts are defined in
> > + * usdt.bpf.h header library. User-space state is encapsulated by struct
> > + * usdt_manager and all the supporting code centered around usdt_manager.
> > + *
> > + * usdt.bpf.h defines two BPF maps that usdt_manager expects: USDT spec map
> > + * and IP-to-spec-ID map, which is auxiliary map necessary for kernels that
> > + * don't support BPF cookie (see below). These two maps are implicitly
> > + * embedded into user's end BPF object file when user's code included
> > + * usdt.bpf.h. This means that libbpf doesn't do anything special to create
> > + * these USDT support maps. They are created by normal libbpf logic of
> > + * instantiating BPF maps when opening and loading BPF object.
> > + *
> > + * As such, libbpf is basically unaware of the need to do anything
> > + * USDT-related until the very first call to bpf_program__attach_usdt(), which
> > + * can be called by user explicitly or happen automatically during skeleton
> > + * attach (or, equivalently, through generic bpf_program__attach() call). At
> > + * this point, libbpf will instantiate and initialize struct usdt_manager and
> > + * store it in bpf_object. USDT manager is a per-BPF object construct, as
> > + * each independent BPF object might or might not have USDT programs, and
> > + * thus all the expected USDT-related state. There is no coordination
> > + * between two bpf_objects when it comes to USDT attachment; they are
> > + * oblivious of each other's existence, and libbpf deals only with
> > + * bpf_object-specific USDT state.
> > + *
> > + * Quick crash course on USDTs.
> > + *
> > + * From user-space application's point of view, USDT is essentially just
> > + * a slightly special function call that normally has zero overhead, unless it
>
> Maybe better to claim 'low overhead' instead of 'zero' here and elsewhere?
> Or elaborate about the overhead more explicitly. Agreed that it's so low as to
> effectively be zero in most cases, but someone somewhere is going to put the nop
> in a 4096-bytes-of-instr tight loop, see changed iTLB behavior, and get
> frustrated.

Hm.... but it is literally zero overhead because that nop is pretty
much free. If you put a USDT into a loop, you'll be seeing the overhead
of the loop itself, not of that nop. What's more, if you have input
arguments that are already prepared/calculated, there is no movement
of them, no stack frame creation and jump (as opposed to function
calls); the USDT macro just records their locations without changing
the properties of the code. So I stand by zero-overhead.

>
> A contrived scenario to be sure, but no other USDT documentation I can find
> claims zero overhead, rather 'low', guessing for a good reason.

The only real overhead of USDT will be if someone calculates some
expression just to pass its result into USDT. That's when USDT
semaphores come in, but that's a bit different from USDT invocation
overhead itself.

Think about the ftrace preamble in almost every kernel function (we use
that for fentry/fexit, and kprobe utilizes it for a faster jump, if the
kernel supports it). It's a 5-byte nop, but I don't think anyone claims
that there is "low overhead" of ftrace when nothing is attached to the
kernel function. It is considered to be *zero overhead*, unless
someone attaches and replaces those 5 bytes with a call, jump, or
interrupt. USDT is similarly "zero overhead".

I think it's important to emphasize that this is objectively zero
overhead and encourage applications to utilize this technology,
instead of inadvertently scaring them away with vague "low overhead".

>
>
> > + * is being traced by some external entity (e.g, BPF-based tool). Here's how
> > + * a typical application can trigger USDT probe:
> > + *
> > + * #include <sys/sdt.h>  // provided by systemtap-sdt-devel package
> > + * // folly also provides similar functionality in folly/tracing/StaticTracepoint.h
> > + *
> > + * STAP_PROBE3(my_usdt_provider, my_usdt_probe_name, 123, x, &y);
> > + *
> > + * USDT is identified by its <provider-name>:<probe-name> pair of names. Each
> > + * individual USDT has a fixed number of arguments (3 in the above example)
> > + * and specifies values of each argument as if it was a function call.
> > + *
> > + * USDT call is actually not a function call, but is instead replaced by
> > + * a single NOP instruction (thus zero overhead, effectively). But in addition
>
> This zero overhead claim bothers me less since NOP is directly mentioned.

Well, I already wrote up my tirade above and I'm living it anyways ;)

>
> > + * to that, those USDT macros generate special SHT_NOTE ELF records in
> > + * .note.stapsdt ELF section. Here's an example USDT definition as emitted by
> > + * `readelf -n <binary>`:
>
> [...]
>
> > + * Arguments is the most interesting part. This USDT specification string
> > + * provides information about all the USDT arguments and their locations. The
> > + * part before the @ sign defines the byte size of the argument (1, 2, 4, or 8)
> > + * and whether the argument is signed or unsigned (negative size means signed).
> > + * The part after @ sign is assembly-like definition of argument location.
> > + * Technically, assembler can provide some pretty advanced definitions, but
>
> Perhaps it would be more precise to state that argument specifiers can be 'any
> operand accepted by Gnu Asm Syntax' (per [0]).
>
> [0]: https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Yep, will update and leave the reference to that page.

>
> > + * libbpf is currently supporting three most common cases:
> > + *   1) immediate constant, see 5th and 9th args above (-4@$5 and -4@-9);
> > + *   2) register value, e.g., 8@%rdx, which means "unsigned 8-byte integer
> > + *      whose value is in register %rdx";
> > + *   3) memory dereference addressed by register, e.g., -4@-1204(%rbp), which
> > + *      specifies signed 32-bit integer stored at offset -1204 bytes from
> > + *      memory address stored in %rbp.
>
> [...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic
  2022-04-02  0:29 ` [PATCH v2 bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
@ 2022-04-04  5:07   ` Dave Marchevsky
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-04  5:07 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Alan Maguire, Hengqi Chen

On 4/1/22 8:29 PM, Andrii Nakryiko wrote:   
> Add x86/x86_64-specific USDT argument specification parsing. Each
> architecture will require their own logic, as all this is arch-specific
> assembly-based notation. Architectures that libbpf doesn't support for
> USDTs will pr_warn() with a specific error and return -ENOTSUP.
> 
> We use sscanf() as a very powerful and easy to use string parser. Those
> spaces in sscanf's format string mean "skip any whitespaces", which is
> a pretty nifty (and somewhat little-known) feature.
> 
> All this was tested on little-endian architecture, so bit shifts are
> probably off on big-endian, which our CI will hopefully prove.
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/usdt.c | 105 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)

Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 6/7] selftests/bpf: add basic USDT selftests
  2022-04-02  0:29 ` [PATCH v2 bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
@ 2022-04-04  5:48   ` Dave Marchevsky
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-04  5:48 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Alan Maguire, Hengqi Chen

On 4/1/22 8:29 PM, Andrii Nakryiko wrote:   
> Add semaphore-based USDT to test_progs itself and write basic tests to
> validate both auto-attachment and manual attachment logic, as well as
> BPF-side functionality.
> 
> Also add subtests to validate that libbpf properly deduplicates USDT
> specs and handles spec overflow situations correctly, as well as proper
> "rollback" of partially-attached multi-spec USDT.
> 
> BPF-side of selftest intentionally consists of two files to validate
> that usdt.bpf.h header can be included from multiple source code files
> that are subsequently linked into final BPF object file without causing
> any symbol duplication or other issues. We are validating that __weak
> maps and bpf_usdt_xxx() API functions defined in usdt.bpf.h do work as
> intended.
> 
> USDT selftests use sys/sdt.h header provided by systemtap-sdt-devel, so
> document that build-time dependency in Documentation/bpf/bpf_devel_QA.rst.
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  Documentation/bpf/bpf_devel_QA.rst            |   3 +
>  tools/testing/selftests/bpf/Makefile          |  14 +-
>  tools/testing/selftests/bpf/prog_tests/usdt.c | 313 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/test_usdt.c |  96 ++++++
>  .../selftests/bpf/progs/test_usdt_multispec.c |  34 ++
>  5 files changed, 454 insertions(+), 6 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_usdt_multispec.c

[...]

> diff --git a/tools/testing/selftests/bpf/progs/test_usdt_multispec.c b/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
> new file mode 100644
> index 000000000000..96fe128790d1
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_usdt_multispec.c
> @@ -0,0 +1,34 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/usdt.bpf.h>
> +
> +/* this file is linked together with test_usdt.c to validate that ust.bpf.h

nit: usdt.bpf.h in above comment

> + * can be included in multiple .bpf.c files forming single final BPF object
> + * file
> + */

Acked-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs
  2022-04-02  0:29 ` [PATCH v2 bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
@ 2022-04-04 18:34   ` Dave Marchevsky
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-04 18:34 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Alan Maguire, Hengqi Chen

On 4/1/22 8:29 PM, Andrii Nakryiko wrote:   
> Extend urandom_read helper binary to include USDTs of 4 combinations:
> semaphore/semaphoreless (refcounted and non-refcounted) and based in
> executable or shared library. We also extend urandom_read with ability
> to report its own PID to the parent process and wait for the parent process to
> ready itself up for tracing urandom_read. We utilize popen() and
> underlying pipe properties for proper signaling.
> 
> Once urandom_read is ready, we add a few tests to validate that libbpf's
> USDT attachment handles all the above combinations of semaphore (or lack
> of it) and static or shared library USDTs. Also, we validate that libbpf
> handles shared libraries both with PID filter and without one (i.e., -1
> for PID argument).
> 
> Having the shared library case tested with and without PID is important
> because internal logic differs on kernels that don't support BPF
> cookies. On such older kernels, attaching to USDTs in shared libraries
> without specifying concrete PID doesn't work in principle, because it's
> impossible to determine the shared library's load address to derive absolute
> IPs for uprobe attachments. Without absolute IPs, it's impossible to
> perform a correct lookup of the USDT spec based on the uprobe's absolute IP (the
> only kind available from BPF at runtime). This is not a problem on
> newer kernels with BPF cookie, as we don't need the IP-to-ID lookup because
> BPF cookie value *is* spec ID.
> 
> So having those two situations as separate subtests is good because
> libbpf CI is able to test latest selftests against old kernels (e.g.,
> 4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment
> for old kernels, but will still leave PID-specific one enabled to validate
> this legacy logic is working correctly.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---

Acked-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic
  2022-04-02  0:29 ` [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
@ 2022-04-04 19:58   ` Dave Marchevsky
  2022-04-04 22:21     ` Andrii Nakryiko
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-04 19:58 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel; +Cc: kernel-team, Alan Maguire, Hengqi Chen

On 4/1/22 8:29 PM, Andrii Nakryiko wrote:   
> Last part of architecture-agnostic user-space USDT handling logic is to
> set up BPF spec and, optionally, IP-to-ID maps from user-space.
> usdt_manager performs a compact spec ID allocation to utilize
> fixed-sized BPF maps as efficiently as possible. We also use hashmap to
> deduplicate USDT arg spec strings and map identical strings to single
> USDT spec, minimizing the necessary BPF map size. usdt_manager supports
> arbitrary sequences of attachment and detachment, both of the same USDT
> and multiple different USDTs and internally maintains a free list of
> unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
> teardown of this spec ID free list and supporting BPF maps.
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/usdt.c | 168 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 167 insertions(+), 1 deletion(-)

[...]

> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index c9eff690e291..afbae742c081 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c

[...]

> +static size_t specs_hash_fn(const void *key, void *ctx)
> +{
> +	const char *s = key;
> +
> +	return str_hash(s);
> +}
> +
> +static bool specs_equal_fn(const void *key1, const void *key2, void *ctx)
> +{
> +	const char *s1 = key1;
> +	const char *s2 = key2;
> +
> +	return strcmp(s1, s2) == 0;
> +}

IIUC, you're not worried about diabolical strings in strcmp and str_hash here
because of sanity checking in parse_usdt_note?

Anyways,

Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic
  2022-04-04 19:58   ` Dave Marchevsky
@ 2022-04-04 22:21     ` Andrii Nakryiko
  0 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-04 22:21 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Hengqi Chen

On Mon, Apr 4, 2022 at 12:59 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On 4/1/22 8:29 PM, Andrii Nakryiko wrote:
> > Last part of architecture-agnostic user-space USDT handling logic is to
> > set up BPF spec and, optionally, IP-to-ID maps from user-space.
> > usdt_manager performs a compact spec ID allocation to utilize
> > fixed-sized BPF maps as efficiently as possible. We also use hashmap to
> > deduplicate USDT arg spec strings and map identical strings to single
> > USDT spec, minimizing the necessary BPF map size. usdt_manager supports
> > arbitrary sequences of attachment and detachment, both of the same USDT
> > and multiple different USDTs and internally maintains a free list of
> > unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
> > teardown of this spec ID free list and supporting BPF maps.
> >
> > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  tools/lib/bpf/usdt.c | 168 ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 167 insertions(+), 1 deletion(-)
>
> [...]
>
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > index c9eff690e291..afbae742c081 100644
> > --- a/tools/lib/bpf/usdt.c
> > +++ b/tools/lib/bpf/usdt.c
>
> [...]
>
> > +static size_t specs_hash_fn(const void *key, void *ctx)
> > +{
> > +     const char *s = key;
> > +
> > +     return str_hash(s);
> > +}
> > +
> > +static bool specs_equal_fn(const void *key1, const void *key2, void *ctx)
> > +{
> > +     const char *s1 = key1;
> > +     const char *s2 = key2;
> > +
> > +     return strcmp(s1, s2) == 0;
> > +}
>
> IIUC, you're not worried about diabolical strings in strcmp and str_hash here
> because of sanity checking in parse_usdt_note?
>

Yeah, we perform parse_usdt_spec() before we get to the hashmap, so if
there is anything too funky about the spec string we'll error out much
sooner.

> Anyways,
>
> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-04-04  4:18     ` Andrii Nakryiko
@ 2022-04-05  1:00       ` Dave Marchevsky
  2022-04-05 23:41         ` Andrii Nakryiko
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Marchevsky @ 2022-04-05  1:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Hengqi Chen

On 4/4/22 12:18 AM, Andrii Nakryiko wrote:   
> On Sun, Apr 3, 2022 at 8:12 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> On 4/1/22 8:29 PM, Andrii Nakryiko wrote:
>>> Wire up libbpf USDT support APIs without yet implementing all the
>>> nitty-gritty details of USDT discovery, spec parsing, and BPF map
>>> initialization.
>>>
>>> User-visible user-space API is simple and is conceptually very similar
>>> to uprobe API.
>>>
>>> bpf_program__attach_usdt() API allows to programmatically attach given
>>> BPF program to a USDT, specified through binary path (executable or
>>> shared lib), USDT provider and name. Also, just like in uprobe case, PID
>>> filter is specified (0 - self, -1 - any process, or specific PID).
>>> Optionally, USDT cookie value can be specified. Such single API
>>> invocation will try to discover given USDT in specified binary and will
>>> use (potentially many) BPF uprobes to attach this program in correct
>>> locations.
>>>
>>> Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
>>> represents this attachment. It is a virtual BPF link that doesn't have
>>> direct kernel object, as it can consist of multiple underlying BPF
>>> uprobe links. As such, attachment is not atomic operation and there can
>>> be brief moment when some USDT call sites are attached while others are
>>> still in the process of attaching. This should be taken into
>>> consideration by user. But bpf_program__attach_usdt() guarantees that
>>> in the case of success all USDT call sites are successfully attached, or
>>> all the successful attachments will be detached as soon as some USDT
>>> call sites failed to be attached. So, in theory, there could be cases of
>>> a failed bpf_program__attach_usdt() call which did trigger a few USDT
>>> program invocations. This is unavoidable due to multi-uprobe nature of
>>> USDT and has to be handled by user, if it's important to create an
>>> illusion of atomicity.
>>
>> It would be useful to be able to control the behavior in response to attach
>> failure in bpf_program__attach_usdt. Specifically I'd like to be able to
>> choose between existing "all attaches succeed or entire operation fails" and
>> "_any_ attach succeeds or entire operation fails". Few reasons for this:
>>
>>  * Tools like BOLT were not playing nicely with USDTs for some time ([0],[1])
>>  * BCC's logic was changed to support more granular 'attach failure' logic ([2])
>>  * At FB I still see some multi-probe USDTs with incorrect-looking locations on
>>    some of the probes
>>
>> Note that my change for 2nd bullet was to handle ".so in shortlived process"
>> usecase, which this lib handles by properly supporting pid = -1. But it's since
>> come in handy to avoid 3rd bullet's issue from causing trouble.
>>
>> Production tracing tools would be less brittle if they could control this attach
>> failure logic.
>>
> 
> So, we have bpf_usdt_opts for that and can add this in the future. The
> reason I didn't do it from the outset is that no other attach API
> currently has this partial success behavior. For example, multi-attach
> kprobe that we recently added is also an all-or-nothing API. So I
> wanted to start out with this stricter approach and only allow to
> change that if/when we have a clear case where this is objectively not
> enough. The BOLT case you mentioned normally should have been solved
> by fixing BOLT tooling itself, not by sloppier attach behavior in
> kernel or libbpf.

Re: BOLT - agreed that it's better to find the root cause and fix that. But for
some time before the root fix is deployed there will be binaries with some incorrect
USDT notes, and tracing programs will still want to use valid USDT notes to
collect data. I think this happens fairly often.

In fact I found an example last week while investigating something unrelated.
Here's a (snipped) 'readelf -n' from a binary in a prod env:

  stapsdt              0x0000007f	NT_STAPSDT (SystemTap probe descriptors)
    Provider: thrift
    Name: thread_manager_task_stats
    Location: 0x0000000000000077, Base: 0x0000000000000000, Semaphore: 0x0000000000000000
    Arguments: -8@-256(%rbp) -8@-248(%rbp) -8@-240(%rbp) -8@-232(%rbp) -8@-224(%rbp)
  stapsdt              0x0000007f	NT_STAPSDT (SystemTap probe descriptors)
    Provider: thrift
    Name: thread_manager_task_stats
    Location: 0x0000000002bd1eb3, Base: 0x0000000000000000, Semaphore: 0x0000000000000000
    Arguments: -8@-272(%rbp) -8@-264(%rbp) -8@-256(%rbp) -8@-248(%rbp) -8@-240(%rbp)

Coming from thrift [0]. Note that this is an ET_EXEC ELF, so the first probe's
location is certainly invalid. Second looks reasonable. IIUC if I wanted to
> attach to this USDT, find_elf_seg would fail to find a segment for the first probe,
causing collect_usdt_targets to error out as well. BCC complains about this
([1]) but continues.

  [0]: https://github.com/facebook/fbthrift/blob/main/thrift/lib/cpp/concurrency/ThreadManager.cpp#L1003
  [1]: https://github.com/iovisor/bcc/blob/master/src/cc/bcc_elf.c#L127

> 
> For the [2], if you re-read comments, I've suggested to allow adding
> one USDT at a time instead of the "partial failure is ok" option,
> which you ended up doing. So your initial frustration was from
> suboptimal BCC API. After you added init_usdt() call that allowed to
> generate code for each individual binary+USDT target, you had all the
> control you needed, right? So here, bpf_program__attach_usdt() is a
> logical equivalent of that init_usdt() call from BCC, so should be all
> good. If bpf_program__attach_usdt() fails for some process/binary that
> is now gone, you can just ignore and continue attaching for other
> binaries, right?

You're right, the goal of that BCC PR was to relax strictness at the USDT level,
not the probe level. I included it to demonstrate that giving users more control
over attach behavior has been useful in the past. If BCC had similar strictness
at the probe level I would've sent a PR to address that as well, as it would've
broken a tracing daemon by now.

>>   [0]: https://github.com/facebookincubator/BOLT/commit/ea49a61463c65775aa796a9ef7a1199f20d2a698
>>   [1]: https://github.com/facebookincubator/BOLT/commit/93860e02a19227be4963a68aa99ea0e09771052b
>>   [2]: https://github.com/iovisor/bcc/pull/2476
>>
>>> USDT BPF programs themselves are marked in BPF source code as either
>>> SEC("usdt"), in which case they won't be auto-attached through
>>> skeleton's <skel>__attach() method, or they can have a full definition,
>>> which follows the spirit of fully-specified uprobes:
>>> SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
>>> attach method will attempt auto-attachment. Similarly, generic
>>> bpf_program__attach() will have enough information to go off of for
>>> parameterless attachment.
>>>
>>> USDT BPF programs are actually uprobes, and as such, for the kernel, they are
>>> marked as BPF_PROG_TYPE_KPROBE.
>>>
>>> Another part of this patch is USDT-related feature probing:
>>>   - BPF cookie support detection from user-space;
>>>   - detection of kernel support for auto-refcounting of USDT semaphore.
>>>
>>> The latter is optional. If the kernel doesn't support such a feature and the
>>> USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf
>>> detects that the USDT requires setting semaphores and the kernel doesn't
>>> support this, libbpf errors out with an explicit pr_warn() message. Libbpf
>>> doesn't support poking a process's memory directly to increment the semaphore
>>> value, like BCC does on legacy kernels, due to the inherent raciness and
>>> danger of such process memory manipulation. Libbpf lets the kernel take care
>>> of this properly or gives up.
>>>
>>> Logistically, all the extra USDT-related infrastructure of libbpf is put
>>> into a separate usdt.c file and abstracted behind struct usdt_manager.
>>> Each bpf_object has lazily-initialized usdt_manager pointer, which is
>>> only instantiated if USDT programs are attempted to be attached. Closing
>>> BPF object frees up usdt_manager resources. usdt_manager keeps track of
>>> USDT spec ID assignment and a few other small things.
>>>
>>> Subsequent patches will fill out remaining missing pieces of USDT
>>> initialization and setup logic.
>>>
>>> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>>> ---
>>>  tools/lib/bpf/Build             |   3 +-
>>>  tools/lib/bpf/libbpf.c          | 100 +++++++-
>>>  tools/lib/bpf/libbpf.h          |  31 +++
>>>  tools/lib/bpf/libbpf.map        |   1 +
>>>  tools/lib/bpf/libbpf_internal.h |  19 ++
>>>  tools/lib/bpf/usdt.c            | 426 ++++++++++++++++++++++++++++++++
>>>  6 files changed, 571 insertions(+), 9 deletions(-)
>>>  create mode 100644 tools/lib/bpf/usdt.c
>>
>> [...]
>>
>>> +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
>>> +{
>>> +     char *path = NULL, *provider = NULL, *name = NULL;
>>> +     const char *sec_name;
>>> +     int n, err;
>>> +
>>> +     sec_name = bpf_program__section_name(prog);
>>> +     if (strcmp(sec_name, "usdt") == 0) {
>>> +             /* no auto-attach for just SEC("usdt") */
>>> +             *link = NULL;
>>> +             return 0;
>>> +     }
>>> +
>>> +     n = sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name);
>>
>> TIL %m
> 
> yep, it's pretty nifty in some cases. In general, sscanf() is
> great for a lot of fairly complicated cases without the need to go for
> a full-blown regex engine.
> 
>>
>>> +     if (n != 3) {
>>> +             pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
>>> +                     sec_name);
>>> +             err = -EINVAL;
>>> +     } else {
>>> +             *link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
>>> +                                              provider, name, NULL);
>>> +             err = libbpf_get_error(*link);
>>> +     }
>>> +     free(path);
>>> +     free(provider);
>>> +     free(name);
>>> +     return err;
>>> +}
>>
>> [...]
>>
>>> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
>>> new file mode 100644
>>> index 000000000000..9476f7a15769
>>> --- /dev/null
>>> +++ b/tools/lib/bpf/usdt.c
>>> @@ -0,0 +1,426 @@
>>> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
>>> +#include <ctype.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +#include <libelf.h>
>>> +#include <gelf.h>
>>> +#include <unistd.h>
>>> +#include <linux/ptrace.h>
>>> +#include <linux/kernel.h>
>>> +
>>> +#include "bpf.h"
>>> +#include "libbpf.h"
>>> +#include "libbpf_common.h"
>>> +#include "libbpf_internal.h"
>>> +#include "hashmap.h"
>>> +
>>> +/* libbpf's USDT support consists of BPF-side state/code and user-space
>>> + * state/code working together in concert. BPF-side parts are defined in
>>> + * usdt.bpf.h header library. User-space state is encapsulated by struct
>>> + * usdt_manager and all the supporting code centered around usdt_manager.
>>> + *
>>> + * usdt.bpf.h defines two BPF maps that usdt_manager expects: USDT spec map
>>> + * and IP-to-spec-ID map, which is an auxiliary map necessary for kernels that
>>> + * don't support BPF cookie (see below). These two maps are implicitly
>>> + * embedded into user's end BPF object file when user's code included
>>> + * usdt.bpf.h. This means that libbpf doesn't do anything special to create
>>> + * these USDT support maps. They are created by normal libbpf logic of
>>> + * instantiating BPF maps when opening and loading BPF object.
>>> + *
>>> + * As such, libbpf is basically unaware of the need to do anything
>>> + * USDT-related until the very first call to bpf_program__attach_usdt(), which
>>> + * can be called by user explicitly or happen automatically during skeleton
>>> + * attach (or, equivalently, through generic bpf_program__attach() call). At
>>> + * this point, libbpf will instantiate and initialize struct usdt_manager and
>>> + * store it in bpf_object. USDT manager is per-BPF object construct, as each
>>> + * independent BPF object might or might not have USDT programs, and thus all
>>> + * the expected USDT-related state. There is no coordination between two
>>> + * bpf_object in parts of USDT attachment, they are oblivious of each other's
>>> + * existence and libbpf is just oblivious, dealing with bpf_object-specific
>>> + * USDT state.
>>> + *
>>> + * Quick crash course on USDTs.
>>> + *
>>> + * From user-space application's point of view, USDT is essentially just
>>> + * a slightly special function call that normally has zero overhead, unless it
>>
>> Maybe better to claim 'low overhead' instead of 'zero' here and elsewhere?
>> Or elaborate about the overhead more explicitly. Agreed that it's so low as to
>> effectively be zero in most cases, but someone somewhere is going to put the nop
>> in a 4096-bytes-of-instr tight loop, see changed iTLB behavior, and get
>> frustrated.
> 
> Hm.... but it is literally zero overhead because that nop is pretty
> much free. If you put USDT into a loop, you'll be seeing an overhead
> of a loop itself, not of that nop. What's more, if you have input
> arguments that are already prepared/calculated, there is no movement
> of them, no stack frame creation and jump (as opposed to function
> calls), the USDT macro just records their location without changing the
> properties of the code. So I stand by zero-overhead.
> 
>>
>> A contrived scenario to be sure, but no other USDT documentation I can find
>> claims zero overhead, rather 'low', guessing for a good reason.
> 
> The only real overhead of USDT will be if someone calculates some
> expression just to pass its result into USDT. That's when USDT
> semaphores come in, but that's a bit different from USDT invocation
> overhead itself.
> 
> Think about the ftrace preamble in almost every kernel function (we use
> that for fentry/fexit, and kprobe utilizes it for a faster jump, if the
> kernel supports it). It's a 5-byte nop, but I don't think anyone claims
> that there is "low overhead" of ftrace when nothing is attached to the
> kernel function. It is considered to be *zero overhead*, unless
> someone attaches and replaces those 5 bytes with a call, jump, or
> interrupt. USDT is similarly "zero overhead".
> 
> I think it's important to emphasize that this is objectively zero
> overhead and encourage applications to utilize this technology,
> instead of inadvertently scaring them away with vague "low overhead".
> 
>>
>>
>>> + * is being traced by some external entity (e.g., a BPF-based tool). Here's how
>>> + * a typical application can trigger USDT probe:
>>> + *
>>> + * #include <sys/sdt.h>  // provided by systemtap-sdt-devel package
>>> + * // folly also provides similar functionality in folly/tracing/StaticTracepoint.h
>>> + *
>>> + * STAP_PROBE3(my_usdt_provider, my_usdt_probe_name, 123, x, &y);
>>> + *
>>> + * USDT is identified by its <provider-name>:<probe-name> pair of names. Each
>>> + * individual USDT has a fixed number of arguments (3 in the above example)
>>> + * and specifies values of each argument as if it were a function call.
>>> + *
>>> + * A USDT call is actually not a function call, but is instead replaced by
>>> + * a single NOP instruction (thus zero overhead, effectively). But in addition
>>
>> This zero overhead claim bothers me less since NOP is directly mentioned.
> 
> Well, I already wrote up my tirade above and I'm living it anyways ;)
> 
>>
>>> + * to that, those USDT macros generate special SHT_NOTE ELF records in
>>> + * the .note.stapsdt ELF section. Here's an example USDT definition as emitted by
>>> + * `readelf -n <binary>`:
>>
>> [...]
>>
>>> + * Arguments is the most interesting part. This USDT specification string
>>> + * provides information about all the USDT arguments and their locations. The
>>> + * part before the @ sign defines the byte size of the argument (1, 2, 4, or
>>> + * 8) and whether the argument is signed or unsigned (negative size means
>>> + * signed). The part after the @ sign is an assembly-like definition of the
>>> + * argument location. Technically, the assembler can provide some pretty
>>> + * advanced definitions, but
>>
>> Perhaps it would be more precise to state that argument specifiers can be 'any
>> operand accepted by Gnu Asm Syntax' (per [0]).
>>
>> [0]: https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation 
> 
> Yep, will update and leave the reference to that page.
> 
>>
>>> + * libbpf currently supports the three most common cases:
>>> + *   1) immediate constant, see 5th and 9th args above (-4@$5 and -4@-9);
>>> + *   2) register value, e.g., 8@%rdx, which means "unsigned 8-byte integer
>>> + *      whose value is in register %rdx";
>>> + *   3) memory dereference addressed by register, e.g., -4@-1204(%rbp), which
>>> + *      specifies signed 32-bit integer stored at offset -1204 bytes from
>>> + *      memory address stored in %rbp.
>>
>> [...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-04-05  1:00       ` Dave Marchevsky
@ 2022-04-05 23:41         ` Andrii Nakryiko
  0 siblings, 0 replies; 19+ messages in thread
From: Andrii Nakryiko @ 2022-04-05 23:41 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Hengqi Chen

On Mon, Apr 4, 2022 at 6:00 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On 4/4/22 12:18 AM, Andrii Nakryiko wrote:
> > On Sun, Apr 3, 2022 at 8:12 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >>
> >> On 4/1/22 8:29 PM, Andrii Nakryiko wrote:
> >>> Wire up libbpf USDT support APIs without yet implementing all the
> >>> nitty-gritty details of USDT discovery, spec parsing, and BPF map
> >>> initialization.
> >>>
> >>> User-visible user-space API is simple and is conceptually very similar
> >>> to uprobe API.
> >>>
> >>> The bpf_program__attach_usdt() API allows programmatically attaching a
> >>> given BPF program to a USDT, specified through a binary path (executable or
> >>> shared lib), USDT provider, and name. Also, just like in the uprobe case,
> >>> a PID filter is specified (0 - self, -1 - any process, or specific PID).
> >>> Optionally, a USDT cookie value can be specified. Such a single API
> >>> invocation will try to discover the given USDT in the specified binary and
> >>> will use (potentially many) BPF uprobes to attach this program in the
> >>> correct locations.
> >>>
> >>> Just like with any other bpf_program__attach_xxx() API, a bpf_link is
> >>> returned that represents this attachment. It is a virtual BPF link that
> >>> doesn't have a direct kernel object, as it can consist of multiple
> >>> underlying BPF uprobe links. As such, attachment is not an atomic operation
> >>> and there can be a brief moment when some USDT call sites are attached while
> >>> others are still in the process of attaching. This should be taken into
> >>> consideration by the user. But bpf_program__attach_usdt() guarantees that
> >>> in the case of success all USDT call sites are successfully attached, or
> >>> all the successful attachments will be detached as soon as some USDT
> >>> call sites fail to be attached. So, in theory, there could be cases of
> >>> a failed bpf_program__attach_usdt() call which did trigger a few USDT
> >>> program invocations. This is unavoidable due to the multi-uprobe nature of
> >>> USDT and has to be handled by the user, if it's important to create an
> >>> illusion of atomicity.
> >>
> >> It would be useful to be able to control the behavior in response to attach
> >> failure in bpf_program__attach_usdt. Specifically I'd like to be able to
> >> choose between existing "all attaches succeed or entire operation fails" and
> >> "_any_ attach succeeds or entire operation fails". Few reasons for this:
> >>
> >>  * Tools like BOLT were not playing nicely with USDTs for some time ([0],[1])
> >>  * BCC's logic was changed to support more granular 'attach failure' logic ([2])
> >>  * At FB I still see some multi-probe USDTs with incorrect-looking locations on
> >>    some of the probes
> >>
> >> Note that my change for the 2nd bullet was to handle the ".so in short-lived
> >> process" use case, which this lib handles by properly supporting pid = -1. But
> >> it's since come in handy to avoid the 3rd bullet's issue causing trouble.
> >>
> >> Production tracing tools would be less brittle if they could control this attach
> >> failure logic.
> >>
> >
> > So, we have bpf_usdt_opts for that and can add this in the future. The
> > reason I didn't do it from the outset is that no other attach API
> > currently has this partial success behavior. For example, multi-attach
> > kprobe that we recently added is also an all-or-nothing API. So I
> > wanted to start out with this stricter approach and only allow to
> > change that if/when we have a clear case where this is objectively not
> > enough. The BOLT case you mentioned normally should have been solved
> > by fixing BOLT tooling itself, not by sloppier attach behavior in
> > kernel or libbpf.
>
> Re: BOLT - agreed that it's better to find the root cause and fix that. But for
> some time before the root fix is deployed there will be binaries with some incorrect
> USDT notes, and tracing programs will still want to use valid USDT notes to
> collect data. I think this happens fairly often.
>
> In fact I found an example last week while investigating something unrelated.
> Here's a (snipped) 'readelf -n' from a binary in a prod env:
>
>   stapsdt              0x0000007f       NT_STAPSDT (SystemTap probe descriptors)
>     Provider: thrift
>     Name: thread_manager_task_stats
>     Location: 0x0000000000000077, Base: 0x0000000000000000, Semaphore: 0x0000000000000000
>     Arguments: -8@-256(%rbp) -8@-248(%rbp) -8@-240(%rbp) -8@-232(%rbp) -8@-224(%rbp)
>   stapsdt              0x0000007f       NT_STAPSDT (SystemTap probe descriptors)
>     Provider: thrift
>     Name: thread_manager_task_stats
>     Location: 0x0000000002bd1eb3, Base: 0x0000000000000000, Semaphore: 0x0000000000000000
>     Arguments: -8@-272(%rbp) -8@-264(%rbp) -8@-256(%rbp) -8@-248(%rbp) -8@-240(%rbp)
>
> Coming from thrift [0]. Note that this is an ET_EXEC ELF, so the first probe's
> location is certainly invalid. Second looks reasonable. IIUC if I wanted to
> attach to this USDT, find_elf_seg would fail to find a segment for first probe,
> causing collect_usdt_targets to error out as well. BCC complains about this
> ([1]) but continues.
>
>   [0]: https://github.com/facebook/fbthrift/blob/main/thrift/lib/cpp/concurrency/ThreadManager.cpp#L1003
>   [1]: https://github.com/iovisor/bcc/blob/master/src/cc/bcc_elf.c#L127
>

The good thing is that we have an easy way to add these options (opts
struct), but let me first investigate what's going on here and try to
root cause it. Giving a too easy "way out" for cases like this
disincentivizes root causing and fixing the actual underlying issue.
I'll leave this relaxed option as a follow up based on additional
production rollout results.

> >
> > For the [2], if you re-read comments, I've suggested to allow adding
> > one USDT at a time instead of the "partial failure is ok" option,
> > which you ended up doing. So your initial frustration was from
> > suboptimal BCC API. After you added init_usdt() call that allowed to
> > generate code for each individual binary+USDT target, you had all the
> > control you needed, right? So here, bpf_program__attach_usdt() is a
> > logical equivalent of that init_usdt() call from BCC, so should be all
> > good. If bpf_program__attach_usdt() fails for some process/binary that
> > is now gone, you can just ignore and continue attaching for other
> > binaries, right?
>
> You're right, the goal of that BCC PR was to relax strictness at the USDT level,
> not the probe level. I included it to demonstrate that giving users more control
> over attach behavior has been useful in the past. If BCC had similar strictness
> at the probe level I would've sent a PR to address that as well, as it would've
> broken a tracing daemon by now.
>
> >>   [0]: https://github.com/facebookincubator/BOLT/commit/ea49a61463c65775aa796a9ef7a1199f20d2a698
> >>   [1]: https://github.com/facebookincubator/BOLT/commit/93860e02a19227be4963a68aa99ea0e09771052b
> >>   [2]: https://github.com/iovisor/bcc/pull/2476
> >>
> >>> USDT BPF programs themselves are marked in BPF source code as either
> >>> SEC("usdt"), in which case they won't be auto-attached through
> >>> skeleton's <skel>__attach() method, or they can have a full definition,
> >>> which follows the spirit of fully-specified uprobes:
> >>> SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
> >>> attach method will attempt auto-attachment. Similarly, generic
> >>> bpf_program__attach() will have enough information to go off of for
> >>> parameterless attachment.
> >>>
> >>> USDT BPF programs are actually uprobes, and as such, for the kernel, they are
> >>> marked as BPF_PROG_TYPE_KPROBE.
> >>>
> >>> Another part of this patch is USDT-related feature probing:
> >>>   - BPF cookie support detection from user-space;
> >>>   - detection of kernel support for auto-refcounting of USDT semaphore.
> >>>
> >>> The latter is optional. If the kernel doesn't support such a feature and the
> >>> USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf
> >>> detects that the USDT requires setting semaphores and the kernel doesn't
> >>> support this, libbpf errors out with an explicit pr_warn() message. Libbpf
> >>> doesn't support poking a process's memory directly to increment the semaphore
> >>> value, like BCC does on legacy kernels, due to the inherent raciness and
> >>> danger of such process memory manipulation. Libbpf lets the kernel take care
> >>> of this properly or gives up.
> >>>
> >>> Logistically, all the extra USDT-related infrastructure of libbpf is put
> >>> into a separate usdt.c file and abstracted behind struct usdt_manager.
> >>> Each bpf_object has lazily-initialized usdt_manager pointer, which is
> >>> only instantiated if USDT programs are attempted to be attached. Closing
> >>> BPF object frees up usdt_manager resources. usdt_manager keeps track of
> >>> USDT spec ID assignment and a few other small things.
> >>>
> >>> Subsequent patches will fill out remaining missing pieces of USDT
> >>> initialization and setup logic.
> >>>
> >>> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> >>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >>> ---
> >>>  tools/lib/bpf/Build             |   3 +-
> >>>  tools/lib/bpf/libbpf.c          | 100 +++++++-
> >>>  tools/lib/bpf/libbpf.h          |  31 +++
> >>>  tools/lib/bpf/libbpf.map        |   1 +
> >>>  tools/lib/bpf/libbpf_internal.h |  19 ++
> >>>  tools/lib/bpf/usdt.c            | 426 ++++++++++++++++++++++++++++++++
> >>>  6 files changed, 571 insertions(+), 9 deletions(-)
> >>>  create mode 100644 tools/lib/bpf/usdt.c
> >>

[...]


end of thread, other threads:[~2022-04-06  5:30 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-02  0:29 [PATCH v2 bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
2022-04-02  0:29 ` [PATCH v2 bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
2022-04-04  0:23   ` Dave Marchevsky
2022-04-04  3:55     ` Andrii Nakryiko
2022-04-02  0:29 ` [PATCH v2 bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
2022-04-04  3:12   ` Dave Marchevsky
2022-04-04  4:18     ` Andrii Nakryiko
2022-04-05  1:00       ` Dave Marchevsky
2022-04-05 23:41         ` Andrii Nakryiko
2022-04-02  0:29 ` [PATCH v2 bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
2022-04-02  0:29 ` [PATCH v2 bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
2022-04-04 19:58   ` Dave Marchevsky
2022-04-04 22:21     ` Andrii Nakryiko
2022-04-02  0:29 ` [PATCH v2 bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
2022-04-04  5:07   ` Dave Marchevsky
2022-04-02  0:29 ` [PATCH v2 bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
2022-04-04  5:48   ` Dave Marchevsky
2022-04-02  0:29 ` [PATCH v2 bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
2022-04-04 18:34   ` Dave Marchevsky
