bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/7] Add libbpf support for USDTs
@ 2022-03-25  5:29 Andrii Nakryiko
  2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
                   ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Add libbpf support for USDT (User Statically-Defined Tracing) probes.
USDTs are an important part of the tracing and BPF ecosystem, widely used
in mission-critical production applications for observability, performance
analysis, and debugging.

And while USDTs themselves are a pretty complicated abstraction built on top
of uprobes, for end users a USDT is as natural a primitive as a uprobe itself.
It's thus important for libbpf to provide the best possible user experience
for building tracing applications that rely on USDTs.

USDTs historically presented a lot of challenges for libbpf's general
no-compilation-on-the-fly approach to BPF tracing. BCC utilizes the power of
on-the-fly source code generation and compilation using its embedded Clang
toolchain, which is impractical for the more lightweight and thus more rigid
libbpf-based approach. But still, with enough diligence and BPF cookies, it's
possible to implement USDT support that feels as natural as tracing any
uprobe.

This patch set is the culmination of such effort to add libbpf USDT support
following the spirit and philosophy of BPF CO-RE (even though it's not
inherently relying on BPF CO-RE much, see patch #1 for some notes regarding
this). Each respective patch has enough details and explanations, so I won't
go into details here.

In the end, I think the overall usability of libbpf's USDT support *exceeds*
the status quo set by BCC due to the elimination of awkward runtime USDT
supporting code generation. It also exceeds BCC's capabilities due to the use
of BPF cookie. This eliminates the need to determine a USDT call site (and
thus specifics about how exactly to fetch arguments) based on its *absolute IP
address*, which is impossible with shared libraries if no PID is specified (as
we then just *can't* know absolute IP at which shared library is loaded,
because it might be different for each process). With BPF cookie this is not
a problem as we record "call site ID" directly in a BPF cookie value. This
makes it possible to do system-wide tracing of a USDT defined in a shared
library. Think about tracing some USDT in libc across any process in the
system, both running at the time of attachment and all the new processes
started *afterwards*. This is a very powerful capability that allows more
efficient observability and tracing tooling.
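
As a rough illustration (not part of this series' patches; the skeleton,
program, library path, provider, and probe names below are all hypothetical),
system-wide attachment to a shared-library USDT could look like:

```c
/* Hypothetical sketch: trace a USDT in libc across all current and
 * future processes by passing pid == -1. Skeleton, program, provider,
 * and probe names are illustrative only. */
struct my_skel *skel = my_skel__open_and_load();
struct bpf_link *link;

link = bpf_program__attach_usdt(skel->progs.handle_sbrk, -1 /* any PID */,
				"/lib/x86_64-linux-gnu/libc.so.6",
				"libc", "memory_sbrk_more", NULL);
if (!link)
	fprintf(stderr, "failed to attach: %d\n", -errno);
```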

Once this functionality lands, the plan is to extend libbpf-bootstrap ([0])
with a USDT example. It will also become possible to start converting BCC
tools that rely on USDTs to their libbpf-based counterparts ([1]).

It's worth noting that a preliminary version of this code is already used and
tested in production as part of a fleet-wide observability toolkit.

For ease of review, libbpf functionality is broken down into 5 mostly
logically independent parts, followed by two selftest patches:
  - patch #1 adds BPF-side implementation;
  - patch #2 adds user-space APIs and wires bpf_link for USDTs;
  - patch #3 adds the most mundane pieces: handling ELF, parsing USDT notes,
    dealing with memory segments, relative vs absolute addresses, etc;
  - patch #4 adds internal ID allocation and setting up/tearing down of
    BPF-side state (spec and IP-to-ID mapping);
  - patch #5 implements x86/x86-64-specific logic of parsing USDT argument
    specifications;
  - patch #6 adds testing of various basic aspects of handling of USDT;
  - patch #7 extends the set of tests with more combinations of semaphore,
    executable vs shared library, and PID filter options.

  [0] https://github.com/libbpf/libbpf-bootstrap
  [1] https://github.com/iovisor/bcc/tree/master/libbpf-tools

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>

Andrii Nakryiko (7):
  libbpf: add BPF-side of USDT support
  libbpf: wire up USDT API and bpf_link integration
  libbpf: add USDT notes parsing and resolution logic
  libbpf: wire up spec management and other arch-independent USDT logic
  libbpf: add x86-specific USDT arg spec parsing logic
  selftests/bpf: add basic USDT selftests
  selftests/bpf: add urandom_read shared lib and USDTs

 tools/lib/bpf/Build                           |    3 +-
 tools/lib/bpf/Makefile                        |    2 +-
 tools/lib/bpf/libbpf.c                        |   92 +-
 tools/lib/bpf/libbpf.h                        |   15 +
 tools/lib/bpf/libbpf.map                      |    1 +
 tools/lib/bpf/libbpf_internal.h               |   19 +
 tools/lib/bpf/usdt.bpf.h                      |  228 ++++
 tools/lib/bpf/usdt.c                          | 1119 +++++++++++++++++
 tools/testing/selftests/bpf/Makefile          |   12 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c |  421 +++++++
 .../selftests/bpf/progs/test_urandom_usdt.c   |   70 ++
 tools/testing/selftests/bpf/progs/test_usdt.c |  115 ++
 tools/testing/selftests/bpf/urandom_read.c    |   63 +-
 .../testing/selftests/bpf/urandom_read_aux.c  |    9 +
 .../testing/selftests/bpf/urandom_read_lib1.c |   13 +
 .../testing/selftests/bpf/urandom_read_lib2.c |    8 +
 16 files changed, 2173 insertions(+), 17 deletions(-)
 create mode 100644 tools/lib/bpf/usdt.bpf.h
 create mode 100644 tools/lib/bpf/usdt.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_urandom_usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_aux.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib1.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib2.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-30  3:10   ` Hengqi Chen
                     ` (2 more replies)
  2022-03-25  5:29 ` [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Add the BPF-side implementation of libbpf-provided USDT support. This
consists of a single-header library, usdt.bpf.h, which is meant to be used
from the user's BPF-side source code. This header is added to the list of
installed libbpf headers, along with bpf_helpers.h and others.

The BPF-side implementation consists of two BPF maps:
  - a spec map, which contains "USDT specs" encoding the information
    necessary to fetch USDT arguments and other data (argument count,
    user-provided cookie value, etc.) at runtime;
  - an IP-to-spec-ID map, which is only used on kernels that don't support
    the BPF cookie feature. It allows looking up the spec ID based on the
    place in the user application that triggered the USDT program.

These maps have default sizes, 256 and 1024, chosen conservatively so as not
to waste a lot of space while still handling most common cases. But there
could be cases when a user application needs to either trace a lot of
different USDTs, or USDTs are heavily inlined and their arguments are located
in a lot of differing locations. For such cases it might be necessary to size
those maps up, which can be done by overriding the BPF_USDT_MAX_SPEC_CNT and
BPF_USDT_MAX_IP_CNT macros.
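
A hedged sketch of such overriding on the BPF side (the concrete values are
arbitrary examples, not recommendations):

```c
/* Define overrides *before* including usdt.bpf.h so the internal
 * maps are sized accordingly; values below are illustrative. */
#define BPF_USDT_MAX_SPEC_CNT 1024
#define BPF_USDT_MAX_IP_CNT 4096
#include <bpf/usdt.bpf.h>
```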

One important aspect to keep in mind: a single USDT (the user-space
equivalent of a kernel tracepoint) can have multiple USDT "call sites".
That is, a single logical USDT is triggered from multiple places in the user
application. This can happen due to function inlining. Each such inlined
instance of a USDT invocation can have its own unique USDT argument
specification (instructions about the location of the value of each USDT
argument). So while a USDT looks very similar to a usual uprobe or kernel
tracepoint, under the hood it's actually a collection of uprobes, each
potentially needing a different spec to know how to fetch arguments.

The user-visible API consists of three helper functions:
  - bpf_usdt_arg_cnt(), which returns the number of arguments of the
    currently traced USDT;
  - bpf_usdt_arg(), which reads the value of the specified USDT argument
    (by its zero-indexed position) and returns it as a 64-bit value;
  - bpf_usdt_cookie(), which functions like the BPF cookie for USDT
    programs; this is necessary as libbpf doesn't allow specifying an actual
    BPF cookie and utilizes it internally for its USDT support implementation.

Each bpf_usdt_xxx() API expects a struct pt_regs * context, passed into the
BPF program. On kernels that don't support BPF cookie it is used to fetch
the absolute IP address of the underlying uprobe.
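
A hedged sketch of a generic handler iterating over all arguments (handler
name is made up; the SEC("usdt") program type is wired up later in this
series):

```c
/* Illustrative generic USDT handler using the helpers directly */
SEC("usdt")
int handle_any_usdt(struct pt_regs *ctx)
{
	long val;
	int i, n = bpf_usdt_arg_cnt(ctx);

	/* bounded loop keeps the BPF verifier happy */
	for (i = 0; i < n && i < BPF_USDT_MAX_ARG_CNT; i++) {
		if (bpf_usdt_arg(ctx, i, &val))
			break;
		bpf_printk("arg #%d = %ld", i, val);
	}
	return 0;
}
```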

usdt.bpf.h also provides the BPF_USDT() macro, which functions like
BPF_PROG() and BPF_KPROBE() and allows a much more user-friendly way to
access USDT arguments, if the USDT definition is static and known to the
user. It is expected that the majority of use cases won't have to use
bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and that BPF_USDT() will
cover all their needs.
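
For example (the probe and argument names below are hypothetical), a USDT
statically known to take two arguments could be handled as:

```c
/* Illustrative BPF_USDT() usage; original pt_regs context remains
 * available as 'ctx' inside the handler body */
SEC("usdt")
int BPF_USDT(handle_my_probe, int level, const char *msg)
{
	bpf_printk("level=%d msg ptr=%p", level, msg);
	return 0;
}
```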

Lastly, usdt.bpf.h utilizes BPF CO-RE for one single purpose: to detect
kernel support for BPF cookie. If the BPF CO-RE dependency is undesirable,
the user application can redefine BPF_USDT_HAS_BPF_COOKIE to either
a boolean constant (or, equivalently, zero and non-zero), or even point it
to its own .rodata variable that can be set from the user application's
user-space code. It is important that BPF_USDT_HAS_BPF_COOKIE is known to
the BPF verifier as a static value (thus .rodata and not just .data), as
otherwise the BPF code will still contain the bpf_get_attach_cookie() BPF
helper call and will fail validation at load time, if not dead-code
eliminated.
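
A hedged sketch of the .rodata-based override (the variable name is made
up):

```c
/* BPF-side: must come before including usdt.bpf.h. 'const volatile'
 * places the variable in .rodata so the verifier sees it as a static
 * value after load. Variable name is illustrative. */
const volatile bool my_kernel_has_bpf_cookie;
#define BPF_USDT_HAS_BPF_COOKIE my_kernel_has_bpf_cookie
#include <bpf/usdt.bpf.h>
```

User-space code would then set the corresponding skel->rodata field (e.g.,
based on its own runtime feature probe) before loading the skeleton.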

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Makefile   |   2 +-
 tools/lib/bpf/usdt.bpf.h | 228 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/bpf/usdt.bpf.h

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index b8b37fe76006..b4fbe8bed555 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -239,7 +239,7 @@ install_lib: all_cmd
 
 SRC_HDRS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h xsk.h	     \
 	    bpf_helpers.h bpf_tracing.h bpf_endian.h bpf_core_read.h	     \
-	    skel_internal.h libbpf_version.h
+	    skel_internal.h libbpf_version.h usdt.bpf.h
 GEN_HDRS := $(BPF_GENERATED)
 
 INSTALL_PFX := $(DESTDIR)$(prefix)/include/bpf
diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
new file mode 100644
index 000000000000..8ee084b2e6b5
--- /dev/null
+++ b/tools/lib/bpf/usdt.bpf.h
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#ifndef __USDT_BPF_H__
+#define __USDT_BPF_H__
+
+#include <linux/errno.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+/* Below types and maps are internal implementation details of libbpf's USDT
+ * support and are subject to change. Also, usdt_xxx() API helpers should be
+ * considered an unstable API as well and might be adjusted based on user
+ * feedback from using libbpf's USDT support in production.
+ */
+
+/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
+ * map that keeps track of USDT argument specifications. This might be
+ * necessary if there are a lot of USDT attachments.
+ */
+#ifndef BPF_USDT_MAX_SPEC_CNT
+#define BPF_USDT_MAX_SPEC_CNT 256
+#endif
+/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
+ * map that keeps track of IP (memory address) mapping to USDT argument
+ * specification.
+ * Note, if kernel supports BPF cookies, this map is not used and could be
+ * resized all the way to 1 to save a bit of memory.
+ */
+#ifndef BPF_USDT_MAX_IP_CNT
+#define BPF_USDT_MAX_IP_CNT 1024
+#endif
+/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
+ * the only dependency on CO-RE, so if it's undesirable, user can override
+ * BPF_USDT_HAS_BPF_COOKIE to specify whether BPF cookie is supported or not.
+ */
+#ifndef BPF_USDT_HAS_BPF_COOKIE
+#define BPF_USDT_HAS_BPF_COOKIE \
+	bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
+#endif
+
+enum __bpf_usdt_arg_type {
+	BPF_USDT_ARG_CONST,
+	BPF_USDT_ARG_REG,
+	BPF_USDT_ARG_REG_DEREF,
+};
+
+struct __bpf_usdt_arg_spec {
+	__u64 val_off;
+	enum __bpf_usdt_arg_type arg_type;
+	short reg_off;
+	bool arg_signed;
+	char arg_bitshift;
+};
+
+/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
+#define BPF_USDT_MAX_ARG_CNT 12
+struct __bpf_usdt_spec {
+	struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
+	__u64 usdt_cookie;
+	short arg_cnt;
+};
+
+__weak struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
+	__type(key, int);
+	__type(value, struct __bpf_usdt_spec);
+} __bpf_usdt_specs SEC(".maps");
+
+__weak struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, BPF_USDT_MAX_IP_CNT);
+	__type(key, long);
+	__type(value, int);
+} __bpf_usdt_specs_ip_to_id SEC(".maps");
+
+/* don't rely on user's BPF code to have latest definition of bpf_func_id */
+enum bpf_func_id___usdt {
+	BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
+};
+
+static inline int __bpf_usdt_spec_id(struct pt_regs *ctx)
+{
+	if (!BPF_USDT_HAS_BPF_COOKIE) {
+		long ip = PT_REGS_IP(ctx);
+		int *spec_id_ptr;
+
+		spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_specs_ip_to_id, &ip);
+		return spec_id_ptr ? *spec_id_ptr : -ESRCH;
+	}
+
+	return bpf_get_attach_cookie(ctx);
+}
+
+/* Return number of USDT arguments defined for currently traced USDT. */
+__hidden __weak
+int bpf_usdt_arg_cnt(struct pt_regs *ctx)
+{
+	struct __bpf_usdt_spec *spec;
+	int spec_id;
+
+	spec_id = __bpf_usdt_spec_id(ctx);
+	if (spec_id < 0)
+		return -EINVAL;
+
+	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
+	if (!spec)
+		return -EINVAL;
+
+	return spec->arg_cnt;
+}
+
+/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
+ * Returns 0 on success; negative error, otherwise.
+ * On error *res is guaranteed to be set to zero.
+ */
+__hidden __weak
+int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
+{
+	struct __bpf_usdt_spec *spec;
+	struct __bpf_usdt_arg_spec *arg_spec;
+	unsigned long val;
+	int err, spec_id;
+
+	*res = 0;
+
+	spec_id = __bpf_usdt_spec_id(ctx);
+	if (spec_id < 0)
+		return -ESRCH;
+
+	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
+	if (!spec)
+		return -ESRCH;
+
+	if (arg >= spec->arg_cnt)
+		return -ENOENT;
+
+	arg_spec = &spec->args[arg];
+	switch (arg_spec->arg_type) {
+	case BPF_USDT_ARG_CONST:
+		val = arg_spec->val_off;
+		break;
+	case BPF_USDT_ARG_REG:
+		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+		if (err)
+			return err;
+		break;
+	case BPF_USDT_ARG_REG_DEREF:
+		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+		if (err)
+			return err;
+		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
+		if (err)
+			return err;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	val <<= arg_spec->arg_bitshift;
+	if (arg_spec->arg_signed)
+		val = ((long)val) >> arg_spec->arg_bitshift;
+	else
+		val = val >> arg_spec->arg_bitshift;
+	*res = val;
+	return 0;
+}
+
+/* Retrieve user-specified cookie value provided during attach as
+ * bpf_usdt_opts.usdt_cookie. This serves the same purpose as BPF cookie
+ * returned by bpf_get_attach_cookie(). Libbpf's support for USDT is itself
+ * utilizing BPF cookies internally, so user can't use BPF cookie directly
+ * for USDT programs and has to use bpf_usdt_cookie() API instead.
+ */
+__hidden __weak
+long bpf_usdt_cookie(struct pt_regs *ctx)
+{
+	struct __bpf_usdt_spec *spec;
+	int spec_id;
+
+	spec_id = __bpf_usdt_spec_id(ctx);
+	if (spec_id < 0)
+		return 0;
+
+	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
+	if (!spec)
+		return 0;
+
+	return spec->usdt_cookie;
+}
+
+/* we rely on ___bpf_apply() and ___bpf_narg() macros already defined in bpf_tracing.h */
+#define ___bpf_usdt_args0() ctx
+#define ___bpf_usdt_args1(x) ___bpf_usdt_args0(), ({ long _x; bpf_usdt_arg(ctx, 0, &_x); (void *)_x; })
+#define ___bpf_usdt_args2(x, args...) ___bpf_usdt_args1(args), ({ long _x; bpf_usdt_arg(ctx, 1, &_x); (void *)_x; })
+#define ___bpf_usdt_args3(x, args...) ___bpf_usdt_args2(args), ({ long _x; bpf_usdt_arg(ctx, 2, &_x); (void *)_x; })
+#define ___bpf_usdt_args4(x, args...) ___bpf_usdt_args3(args), ({ long _x; bpf_usdt_arg(ctx, 3, &_x); (void *)_x; })
+#define ___bpf_usdt_args5(x, args...) ___bpf_usdt_args4(args), ({ long _x; bpf_usdt_arg(ctx, 4, &_x); (void *)_x; })
+#define ___bpf_usdt_args6(x, args...) ___bpf_usdt_args5(args), ({ long _x; bpf_usdt_arg(ctx, 5, &_x); (void *)_x; })
+#define ___bpf_usdt_args7(x, args...) ___bpf_usdt_args6(args), ({ long _x; bpf_usdt_arg(ctx, 6, &_x); (void *)_x; })
+#define ___bpf_usdt_args8(x, args...) ___bpf_usdt_args7(args), ({ long _x; bpf_usdt_arg(ctx, 7, &_x); (void *)_x; })
+#define ___bpf_usdt_args9(x, args...) ___bpf_usdt_args8(args), ({ long _x; bpf_usdt_arg(ctx, 8, &_x); (void *)_x; })
+#define ___bpf_usdt_args10(x, args...) ___bpf_usdt_args9(args), ({ long _x; bpf_usdt_arg(ctx, 9, &_x); (void *)_x; })
+#define ___bpf_usdt_args11(x, args...) ___bpf_usdt_args10(args), ({ long _x; bpf_usdt_arg(ctx, 10, &_x); (void *)_x; })
+#define ___bpf_usdt_args12(x, args...) ___bpf_usdt_args11(args), ({ long _x; bpf_usdt_arg(ctx, 11, &_x); (void *)_x; })
+#define ___bpf_usdt_args(args...) ___bpf_apply(___bpf_usdt_args, ___bpf_narg(args))(args)
+
+/*
+ * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
+ * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
+ * Original struct pt_regs * context is preserved as 'ctx' argument.
+ */
+#define BPF_USDT(name, args...)						    \
+name(struct pt_regs *ctx);						    \
+static __attribute__((always_inline)) typeof(name(0))			    \
+____##name(struct pt_regs *ctx, ##args);				    \
+typeof(name(0)) name(struct pt_regs *ctx)				    \
+{									    \
+        _Pragma("GCC diagnostic push")					    \
+        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
+        return ____##name(___bpf_usdt_args(args));			    \
+        _Pragma("GCC diagnostic pop")					    \
+}									    \
+static __attribute__((always_inline)) typeof(name(0))			    \
+____##name(struct pt_regs *ctx, ##args)
+
+#endif /* __USDT_BPF_H__ */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
  2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-30  3:24   ` Hengqi Chen
  2022-03-31 12:13   ` Alan Maguire
  2022-03-25  5:29 ` [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Wire up libbpf USDT support APIs without yet implementing all the
nitty-gritty details of USDT discovery, spec parsing, and BPF map
initialization.

The user-visible user-space API is simple and conceptually very similar
to the uprobe API.

The bpf_program__attach_usdt() API allows programmatically attaching a given
BPF program to a USDT, specified through the binary path (executable or
shared lib), USDT provider, and name. Also, just like in the uprobe case,
a PID filter is specified (0 - self, -1 - any process, or a specific PID).
Optionally, a USDT cookie value can be specified. Such a single API
invocation will try to discover the given USDT in the specified binary and
will use (potentially many) BPF uprobes to attach the program in the correct
locations.
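
A hedged user-space sketch of the opts-based variant (the binary path,
provider, probe, and program names are made up):

```c
/* Illustrative attachment with a custom USDT cookie */
LIBBPF_OPTS(bpf_usdt_opts, opts, .usdt_cookie = 0xcafe);
struct bpf_link *link;

link = bpf_program__attach_usdt(skel->progs.my_prog, 0 /* self */,
				"/usr/local/bin/myapp",
				"my_provider", "my_probe", &opts);
if (!link)
	fprintf(stderr, "attach failed: %d\n", -errno);
```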

Just like with other bpf_program__attach_xxx() APIs, a bpf_link is returned
that represents this attachment. It is a virtual BPF link that doesn't have
a direct kernel object, as it can consist of multiple underlying BPF uprobe
links. As such, attachment is not an atomic operation and there can be
a brief moment when some USDT call sites are attached while others are still
in the process of attaching. This should be taken into consideration by the
user. But bpf_program__attach_usdt() guarantees that in the case of success
all USDT call sites are successfully attached, or that all the successful
attachments will be detached as soon as some USDT call sites fail to attach.
So, in theory, there could be cases of a failed bpf_program__attach_usdt()
call that did trigger a few USDT program invocations. This is unavoidable
due to the multi-uprobe nature of USDT and has to be handled by the user, if
it's important to create an illusion of atomicity.

USDT BPF programs themselves are marked in BPF source code as either
SEC("usdt"), in which case they won't be auto-attached through the
skeleton's <skel>__attach() method, or they can have a full definition,
which follows the spirit of fully-specified uprobes:
SEC("usdt/<path>:<provider>:<name>"). In the latter case the skeleton's
attach method will attempt auto-attachment. Similarly, the generic
bpf_program__attach() will have enough information to go off of for
parameterless attachment.
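
For instance (the path, provider, probe, and argument names below are
hypothetical), an auto-attachable program would be declared as:

```c
/* Illustrative fully-specified USDT program: the skeleton's attach
 * method will auto-attach it to the given binary/provider/name */
SEC("usdt/./myapp:my_provider:my_probe")
int BPF_USDT(auto_handler, long arg0)
{
	bpf_printk("arg0 = %ld", arg0);
	return 0;
}
```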

USDT BPF programs are actually uprobes, and as such they are marked for the
kernel as BPF_PROG_TYPE_KPROBE.

Another part of this patch is USDT-related feature probing:
  - BPF cookie support detection from user-space;
  - detection of kernel support for auto-refcounting of USDT semaphore.

The latter is optional. If the kernel doesn't support this feature and the
USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf
detects that the USDT requires setting semaphores and the kernel doesn't
support this, libbpf errors out with an explicit pr_warn() message. Libbpf
doesn't support poking the process's memory directly to increment the
semaphore value, like BCC does on legacy kernels, due to the inherent
raciness and danger of such process memory manipulation. Libbpf lets the
kernel take care of this properly or gives up.

Logistically, all the extra USDT-related infrastructure in libbpf is put
into a separate usdt.c file and abstracted behind struct usdt_manager.
Each bpf_object has a lazily-initialized usdt_manager pointer, which is
only instantiated when USDT programs are attached. Closing the BPF object
frees up usdt_manager resources. usdt_manager keeps track of USDT spec ID
assignment and a few other small things.

Subsequent patches will fill out remaining missing pieces of USDT
initialization and setup logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/Build             |   3 +-
 tools/lib/bpf/libbpf.c          |  92 ++++++++++-
 tools/lib/bpf/libbpf.h          |  15 ++
 tools/lib/bpf/libbpf.map        |   1 +
 tools/lib/bpf/libbpf_internal.h |  19 +++
 tools/lib/bpf/usdt.c            | 270 ++++++++++++++++++++++++++++++++
 6 files changed, 391 insertions(+), 9 deletions(-)
 create mode 100644 tools/lib/bpf/usdt.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 94f0a146bb7b..31a1a9015902 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,3 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o \
-	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o
+	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
+	    usdt.o
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 809fe209cdcc..8841499f5f12 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -483,6 +483,8 @@ struct elf_state {
 	int st_ops_shndx;
 };
 
+struct usdt_manager;
+
 struct bpf_object {
 	char name[BPF_OBJ_NAME_LEN];
 	char license[64];
@@ -545,6 +547,8 @@ struct bpf_object {
 	size_t fd_array_cap;
 	size_t fd_array_cnt;
 
+	struct usdt_manager *usdt_man;
+
 	char path[];
 };
 
@@ -4678,6 +4682,18 @@ static int probe_perf_link(void)
 	return link_fd < 0 && err == -EBADF;
 }
 
+static int probe_kern_bpf_cookie(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
+		BPF_EXIT_INSN(),
+	};
+	int ret, insn_cnt = ARRAY_SIZE(insns);
+
+	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
+	return probe_fd(ret);
+}
+
 enum kern_feature_result {
 	FEAT_UNKNOWN = 0,
 	FEAT_SUPPORTED = 1,
@@ -4740,6 +4756,9 @@ static struct kern_feature_desc {
 	[FEAT_MEMCG_ACCOUNT] = {
 		"memcg-based memory accounting", probe_memcg_account,
 	},
+	[FEAT_BPF_COOKIE] = {
+		"BPF cookie support", probe_kern_bpf_cookie,
+	},
 };
 
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@@ -8200,6 +8219,9 @@ void bpf_object__close(struct bpf_object *obj)
 	if (obj->clear_priv)
 		obj->clear_priv(obj, obj->priv);
 
+	usdt_manager_free(obj->usdt_man);
+	obj->usdt_man = NULL;
+
 	bpf_gen__free(obj->gen_loader);
 	bpf_object__elf_finish(obj);
 	bpf_object_unload(obj);
@@ -8630,6 +8652,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
 }
 
 static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
 static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_link **link);
@@ -8647,6 +8670,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("uretprobe/",		KPROBE, 0, SEC_NONE),
 	SEC_DEF("kprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
 	SEC_DEF("kretprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
+	SEC_DEF("usdt+",		KPROBE,	0, SEC_NONE, attach_usdt),
 	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
 	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX | SEC_DEPRECATED),
 	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -9692,14 +9716,6 @@ int bpf_prog_load_deprecated(const char *file, enum bpf_prog_type type,
 	return bpf_prog_load_xattr2(&attr, pobj, prog_fd);
 }
 
-struct bpf_link {
-	int (*detach)(struct bpf_link *link);
-	void (*dealloc)(struct bpf_link *link);
-	char *pin_path;		/* NULL, if not pinned */
-	int fd;			/* hook FD, -1 if not applicable */
-	bool disconnected;
-};
-
 /* Replace link's underlying BPF program with the new one */
 int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
 {
@@ -10599,6 +10615,66 @@ struct bpf_link *bpf_program__attach_uprobe(const struct bpf_program *prog,
 	return bpf_program__attach_uprobe_opts(prog, pid, binary_path, func_offset, &opts);
 }
 
+struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
+					  pid_t pid, const char *binary_path,
+					  const char *usdt_provider, const char *usdt_name,
+					  const struct bpf_usdt_opts *opts)
+{
+	struct bpf_object *obj = prog->obj;
+	struct bpf_link *link;
+	long usdt_cookie;
+	int err;
+
+	if (!OPTS_VALID(opts, bpf_usdt_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	/* USDT manager is instantiated lazily on first USDT attach. It will
+	 * be destroyed together with BPF object in bpf_object__close().
+	 */
+	if (!obj->usdt_man) {
+		obj->usdt_man = usdt_manager_new(obj);
+		if (!obj->usdt_man)
+			return libbpf_err_ptr(-ENOMEM);
+	}
+
+	usdt_cookie = OPTS_GET(opts, usdt_cookie, 0);
+	link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path,
+				        usdt_provider, usdt_name, usdt_cookie);
+	err = libbpf_get_error(link);
+	if (err)
+		return libbpf_err_ptr(err);
+	return link;
+}
+
+static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+	char *path = NULL, *provider = NULL, *name = NULL;
+	const char *sec_name;
+
+	sec_name = bpf_program__section_name(prog);
+	if (strcmp(sec_name, "usdt") == 0) {
+		/* no auto-attach for just SEC("usdt") */
+		*link = NULL;
+		return 0;
+	}
+
+	if (3 != sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name)) {
+		pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
+			sec_name);
+		free(path);
+		free(provider);
+		free(name);
+		return -EINVAL;
+	}
+
+	*link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
+					 provider, name, NULL);
+	free(path);
+	free(provider);
+	free(name);
+	return libbpf_get_error(*link);
+}
+
 static int determine_tracepoint_id(const char *tp_category,
 				   const char *tp_name)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 05dde85e19a6..318eecaa14e7 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -503,6 +503,21 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
 				const char *binary_path, size_t func_offset,
 				const struct bpf_uprobe_opts *opts);
 
+struct bpf_usdt_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	/* custom user-provided value accessible through bpf_usdt_cookie() */
+	__u64 usdt_cookie;
+	size_t :0;
+};
+#define bpf_usdt_opts__last_field usdt_cookie
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_usdt(const struct bpf_program *prog,
+			 pid_t pid, const char *binary_path,
+			 const char *usdt_provider, const char *usdt_name,
+			 const struct bpf_usdt_opts *opts);
+
 struct bpf_tracepoint_opts {
 	/* size of this struct, for forward/backward compatiblity */
 	size_t sz;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index dd35ee58bfaa..82f6d62176dd 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -444,6 +444,7 @@ LIBBPF_0.8.0 {
 	global:
 		bpf_object__destroy_subskeleton;
 		bpf_object__open_subskeleton;
+		bpf_program__attach_usdt;
 		libbpf_register_prog_handler;
 		libbpf_unregister_prog_handler;
 		bpf_program__attach_kprobe_multi_opts;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index b6247dc7f8eb..dd0d4ccfa649 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -148,6 +148,15 @@ do {				\
 #ifndef __has_builtin
 #define __has_builtin(x) 0
 #endif
+
+struct bpf_link {
+	int (*detach)(struct bpf_link *link);
+	void (*dealloc)(struct bpf_link *link);
+	char *pin_path;		/* NULL, if not pinned */
+	int fd;			/* hook FD, -1 if not applicable */
+	bool disconnected;
+};
+
 /*
  * Re-implement glibc's reallocarray() for libbpf internal-only use.
  * reallocarray(), unfortunately, is not available in all versions of glibc,
@@ -329,6 +338,8 @@ enum kern_feature_id {
 	FEAT_BTF_TYPE_TAG,
 	/* memcg-based accounting for BPF maps and progs */
 	FEAT_MEMCG_ACCOUNT,
+	/* BPF cookie (bpf_get_attach_cookie() BPF helper) support */
+	FEAT_BPF_COOKIE,
 	__FEAT_CNT,
 };
 
@@ -543,4 +554,12 @@ int bpf_core_add_cands(struct bpf_core_cand *local_cand,
 		       struct bpf_core_cand_list *cands);
 void bpf_core_free_cands(struct bpf_core_cand_list *cands);
 
+struct usdt_manager *usdt_manager_new(struct bpf_object *obj);
+void usdt_manager_free(struct usdt_manager *man);
+struct bpf_link * usdt_manager_attach_usdt(struct usdt_manager *man,
+					   const struct bpf_program *prog,
+					   pid_t pid, const char *path,
+					   const char *usdt_provider, const char *usdt_name,
+					   long usdt_cookie);
+
 #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
new file mode 100644
index 000000000000..8481e300598e
--- /dev/null
+++ b/tools/lib/bpf/usdt.c
@@ -0,0 +1,270 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <ctype.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <libelf.h>
+#include <gelf.h>
+#include <unistd.h>
+#include <linux/ptrace.h>
+#include <linux/kernel.h>
+
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_common.h"
+#include "libbpf_internal.h"
+#include "hashmap.h"
+
+#define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
+
+struct usdt_target {
+	long abs_ip;
+	long rel_ip;
+	long sema_off;
+};
+
+struct usdt_manager {
+	struct bpf_map *specs_map;
+	struct bpf_map *ip_to_id_map;
+
+	bool has_bpf_cookie;
+	bool has_sema_refcnt;
+};
+
+struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
+{
+	static const char *ref_ctr_sysfs_path = "/sys/bus/event_source/devices/uprobe/format/ref_ctr_offset";
+	struct usdt_manager *man;
+	struct bpf_map *specs_map, *ip_to_id_map;
+
+	specs_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs");
+	ip_to_id_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs_ip_to_id");
+	if (!specs_map || !ip_to_id_map) {
+		pr_warn("usdt: failed to find USDT support BPF maps, did you forget to include bpf/usdt.bpf.h?\n");
+		return NULL;
+	}
+
+	man = calloc(1, sizeof(*man));
+	if (!man)
+		return NULL;
+
+	man->specs_map = specs_map;
+	man->ip_to_id_map = ip_to_id_map;
+
+	/* Detect if BPF cookie is supported for uprobes.
+	 * We don't need IP-to-ID mapping if we can use BPF cookies.
+	 * Added in: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
+	 */
+	man->has_bpf_cookie = kernel_supports(obj, FEAT_BPF_COOKIE);
+
+	/* Detect kernel support for automatic refcounting of USDT semaphore.
+	 * If this is not supported, USDTs with semaphores will not be supported.
+	 * Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
+	 */
+	man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0;
+
+	return man;
+}
+
+void usdt_manager_free(struct usdt_manager *man)
+{
+	if (!man)
+		return;
+
+	free(man);
+}
+
+static int sanity_check_usdt_elf(Elf *elf, const char *path)
+{
+	GElf_Ehdr ehdr;
+	int endianness;
+
+	if (elf_kind(elf) != ELF_K_ELF) {
+		pr_warn("usdt: unrecognized ELF kind %d for '%s'\n", elf_kind(elf), path);
+		return -EBADF;
+	}
+
+	switch (gelf_getclass(elf)) {
+	case ELFCLASS64:
+		if (sizeof(void *) != 8) {
+			pr_warn("usdt: attaching to 64-bit ELF binary '%s' is not supported\n", path);
+			return -EBADF;
+		}
+		break;
+	case ELFCLASS32:
+		if (sizeof(void *) != 4) {
+			pr_warn("usdt: attaching to 32-bit ELF binary '%s' is not supported\n", path);
+			return -EBADF;
+		}
+		break;
+	default:
+		pr_warn("usdt: unsupported ELF class for '%s'\n", path);
+		return -EBADF;
+	}
+
+	if (!gelf_getehdr(elf, &ehdr))
+		return -EINVAL;
+
+	if (ehdr.e_type != ET_EXEC && ehdr.e_type != ET_DYN) {
+		pr_warn("usdt: unsupported type of ELF binary '%s' (%d), only ET_EXEC and ET_DYN are supported\n",
+			path, ehdr.e_type);
+		return -EBADF;
+	}
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+	endianness = ELFDATA2LSB;
+#elif __BYTE_ORDER == __BIG_ENDIAN
+	endianness = ELFDATA2MSB;
+#else
+# error "Unrecognized __BYTE_ORDER__"
+#endif
+	if (endianness != ehdr.e_ident[EI_DATA]) {
+		pr_warn("usdt: ELF endianness mismatch for '%s'\n", path);
+		return -EBADF;
+	}
+
+	return 0;
+}
+
+static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
+				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
+				struct usdt_target **out_targets, size_t *out_target_cnt)
+{
+	return -ENOTSUP;
+}
+
+struct bpf_link_usdt {
+	struct bpf_link link;
+
+	struct usdt_manager *usdt_man;
+
+	size_t uprobe_cnt;
+	struct {
+		long abs_ip;
+		struct bpf_link *link;
+	} *uprobes;
+};
+
+static int bpf_link_usdt_detach(struct bpf_link *link)
+{
+	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
+	int i;
+
+	for (i = 0; i < usdt_link->uprobe_cnt; i++) {
+		/* detach underlying uprobe link */
+		bpf_link__destroy(usdt_link->uprobes[i].link);
+	}
+
+	return 0;
+}
+
+static void bpf_link_usdt_dealloc(struct bpf_link *link)
+{
+	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
+
+	free(usdt_link->uprobes);
+	free(usdt_link);
+}
+
+struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
+					  pid_t pid, const char *path,
+					  const char *usdt_provider, const char *usdt_name,
+					  long usdt_cookie)
+{
+	int i, fd, err;
+	LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	struct bpf_link_usdt *link = NULL;
+	struct usdt_target *targets = NULL;
+	size_t target_cnt;
+	Elf *elf;
+
+	if (bpf_program__fd(prog) < 0) {
+		pr_warn("prog '%s': can't attach BPF program w/o FD (did you load it?)\n",
+			bpf_program__name(prog));
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	/* TODO: perform path resolution similar to uprobe's */
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		err = -errno;
+		pr_warn("usdt: failed to open ELF binary '%s': %d\n", path, err);
+		return libbpf_err_ptr(err);
+	}
+
+	elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
+	if (!elf) {
+		err = -EBADF;
+		pr_warn("usdt: failed to parse ELF binary '%s': %s\n", path, elf_errmsg(-1));
+		goto err_out;
+	}
+
+	err = sanity_check_usdt_elf(elf, path);
+	if (err)
+		goto err_out;
+
+	/* normalize PID filter */
+	if (pid < 0)
+		pid = -1;
+	else if (pid == 0)
+		pid = getpid();
+
+	/* discover USDT in given binary, optionally limiting
+	 * activations to a given PID, if pid > 0
+	 */
+	err = collect_usdt_targets(man, elf, path, pid, usdt_provider, usdt_name,
+				   usdt_cookie, &targets, &target_cnt);
+	if (err <= 0) {
+		err = (err == 0) ? -ENOENT : err;
+		goto err_out;
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	link->usdt_man = man;
+	link->link.detach = &bpf_link_usdt_detach;
+	link->link.dealloc = &bpf_link_usdt_dealloc;
+
+	link->uprobes = calloc(target_cnt, sizeof(*link->uprobes));
+	if (!link->uprobes) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	for (i = 0; i < target_cnt; i++) {
+		struct usdt_target *target = &targets[i];
+		struct bpf_link *uprobe_link;
+
+		opts.ref_ctr_offset = target->sema_off;
+		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
+							      target->rel_ip, &opts);
+		err = libbpf_get_error(uprobe_link);
+		if (err) {
+			pr_warn("usdt: failed to attach uprobe #%d for '%s:%s' in '%s': %d\n",
+				i, usdt_provider, usdt_name, path, err);
+			goto err_out;
+		}
+
+		link->uprobes[i].link = uprobe_link;
+		link->uprobes[i].abs_ip = target->abs_ip;
+		link->uprobe_cnt++;
+	}
+
+	elf_end(elf);
+	close(fd);
+
+	return &link->link;
+
+err_out:
+	bpf_link__destroy(&link->link);
+
+	if (elf)
+		elf_end(elf);
+	close(fd);
+	return libbpf_err_ptr(err);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
  2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
  2022-03-25  5:29 ` [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-31 13:37   ` Alan Maguire
  2022-03-25  5:29 ` [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Implement architecture-agnostic parts of USDT parsing logic. The code is
the documentation in this case; it's futile to try to succinctly
describe how USDT parsing is done in any concrete detail. But in short,
USDTs are recorded in a special ELF notes section (.note.stapsdt), where
each USDT call site is described separately. Along with the USDT
provider and USDT name, each such note contains a USDT argument
specification, which uses assembly-like syntax to describe how to fetch
the value of each USDT argument. A USDT arg spec can be just a constant,
a register, or a register dereference (the most common cases on x86-64),
but it technically can encode much more complicated cases, like an
offset relative to a global symbol. One of the later patches implements
the most common subset of this for x86 and x86-64 architectures, which
seems to handle a lot of real-world production applications.

The parsed USDT arg spec is stored in a compact encoding that allows
usdt.bpf.h from the previous patch to handle the above 3 cases. Instead
of recording which register might be needed, we encode the register's
offset within struct pt_regs to simplify the BPF-side implementation.
A USDT argument can be of different byte sizes (1, 2, 4, or 8) and
signed or unsigned. To handle this, libbpf pre-calculates the necessary
bit shifts to do proper casting and sign extension in a short sequence
of left and right shifts.

The rest is in the code with sometimes extensive comments and references
to external "documentation" for USDTs.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/usdt.c | 581 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 580 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index 8481e300598e..86d5d8390eb1 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -18,10 +18,56 @@
 
 #define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
 
+#define USDT_BASE_SEC ".stapsdt.base"
+#define USDT_SEMA_SEC ".probes"
+#define USDT_NOTE_SEC  ".note.stapsdt"
+#define USDT_NOTE_TYPE 3
+#define USDT_NOTE_NAME "stapsdt"
+
+/* should match exactly enum __bpf_usdt_arg_type from usdt.bpf.h */
+enum usdt_arg_type {
+	USDT_ARG_CONST,
+	USDT_ARG_REG,
+	USDT_ARG_REG_DEREF,
+};
+
+/* should match exactly struct __bpf_usdt_arg_spec from usdt.bpf.h */
+struct usdt_arg_spec {
+	__u64 val_off;
+	enum usdt_arg_type arg_type;
+	short reg_off;
+	bool arg_signed;
+	char arg_bitshift;
+};
+
+/* should match BPF_USDT_MAX_ARG_CNT in usdt.bpf.h */
+#define USDT_MAX_ARG_CNT 12
+
+/* should match struct __bpf_usdt_spec from usdt.bpf.h */
+struct usdt_spec {
+	struct usdt_arg_spec args[USDT_MAX_ARG_CNT];
+	__u64 usdt_cookie;
+	short arg_cnt;
+};
+
+struct usdt_note {
+	const char *provider;
+	const char *name;
+	/* USDT args specification string, e.g.:
+	 * "-4@%esi -4@-24(%rbp) -4@%ecx 2@%ax 8@%rdx"
+	 */
+	const char *args;
+	long loc_addr;
+	long base_addr;
+	long sema_addr;
+};
+
 struct usdt_target {
 	long abs_ip;
 	long rel_ip;
 	long sema_off;
+	struct usdt_spec spec;
+	const char *spec_str;
 };
 
 struct usdt_manager {
@@ -127,11 +173,449 @@ static int sanity_check_usdt_elf(Elf *elf, const char *path)
 	return 0;
 }
 
+static int find_elf_sec_by_name(Elf *elf, const char *sec_name, GElf_Shdr *shdr, Elf_Scn **scn)
+{
+	Elf_Scn *sec = NULL;
+	size_t shstrndx;
+
+	if (elf_getshdrstrndx(elf, &shstrndx))
+		return -EINVAL;
+
+	/* check if ELF is corrupted and avoid calling elf_strptr if yes */
+	if (!elf_rawdata(elf_getscn(elf, shstrndx), NULL))
+		return -EINVAL;
+
+	while ((sec = elf_nextscn(elf, sec)) != NULL) {
+		char *name;
+
+		if (!gelf_getshdr(sec, shdr))
+			return -EINVAL;
+
+		name = elf_strptr(elf, shstrndx, shdr->sh_name);
+		if (name && strcmp(sec_name, name) == 0) {
+			*scn = sec;
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+struct elf_seg {
+	long start;
+	long end;
+	long offset;
+	bool is_exec;
+};
+
+static int cmp_elf_segs(const void *_a, const void *_b)
+{
+	const struct elf_seg *a = _a;
+	const struct elf_seg *b = _b;
+
+	return a->start < b->start ? -1 : 1;
+}
+
+static int parse_elf_segs(Elf *elf, const char *path, struct elf_seg **segs, size_t *seg_cnt)
+{
+	GElf_Phdr phdr;
+	size_t n;
+	int i, err;
+	struct elf_seg *seg;
+	void *tmp;
+
+	*seg_cnt = 0;
+
+	if (elf_getphdrnum(elf, &n)) {
+		err = -errno;
+		return err;
+	}
+
+	for (i = 0; i < n; i++) {
+		if (!gelf_getphdr(elf, i, &phdr)) {
+			err = -errno;
+			return err;
+		}
+
+		pr_debug("usdt: discovered PHDR #%d in '%s': vaddr 0x%lx memsz 0x%lx offset 0x%lx type 0x%lx flags 0x%lx\n",
+			 i, path, (long)phdr.p_vaddr, (long)phdr.p_memsz, (long)phdr.p_offset,
+			 (long)phdr.p_type, (long)phdr.p_flags);
+		if (phdr.p_type != PT_LOAD)
+			continue;
+
+		tmp = libbpf_reallocarray(*segs, *seg_cnt + 1, sizeof(**segs));
+		if (!tmp)
+			return -ENOMEM;
+
+		*segs = tmp;
+		seg = *segs + *seg_cnt;
+		(*seg_cnt)++;
+
+		seg->start = phdr.p_vaddr;
+		seg->end = phdr.p_vaddr + phdr.p_memsz;
+		seg->offset = phdr.p_offset;
+		seg->is_exec = phdr.p_flags & PF_X;
+	}
+
+	if (*seg_cnt == 0) {
+		pr_warn("usdt: failed to find PT_LOAD program headers in '%s'\n", path);
+		return -ESRCH;
+	}
+
+	qsort(*segs, *seg_cnt, sizeof(**segs), cmp_elf_segs);
+	return 0;
+}
+
+static int parse_lib_segs(int pid, const char *lib_path, struct elf_seg **segs, size_t *seg_cnt)
+{
+	char path[PATH_MAX], line[PATH_MAX], mode[16];
+	size_t seg_start, seg_end, seg_off;
+	struct elf_seg *seg;
+	int tmp_pid, i, err;
+	FILE *f;
+
+	*seg_cnt = 0;
+
+	/* Handle containerized binaries only accessible from
+	 * /proc/<pid>/root/<path>. They will be reported as just /<path> in
+	 * /proc/<pid>/maps.
+	 */
+	if (sscanf(lib_path, "/proc/%d/root%s", &tmp_pid, path) == 2 && pid == tmp_pid)
+		goto proceed;
+
+	if (!realpath(lib_path, path)) {
+		pr_warn("usdt: failed to get absolute path of '%s' (err %d), using path as is...\n",
+			lib_path, -errno);
+		strcpy(path, lib_path);
+	}
+
+proceed:
+	sprintf(line, "/proc/%d/maps", pid);
+	f = fopen(line, "r");
+	if (!f) {
+		err = -errno;
+		pr_warn("usdt: failed to open '%s' to get base addr of '%s': %d\n",
+			line, lib_path, err);
+		return err;
+	}
+
+	/* We need to handle lines with no path at the end:
+	 *
+	 * 7f5c6f5d1000-7f5c6f5d3000 rw-p 001c7000 08:04 21238613      /usr/lib64/libc-2.17.so
+	 * 7f5c6f5d3000-7f5c6f5d8000 rw-p 00000000 00:00 0
+	 * 7f5c6f5d8000-7f5c6f5d9000 r-xp 00000000 103:01 362990598    /data/users/andriin/linux/tools/bpf/usdt/libhello_usdt.so
+	 */
+	while (fscanf(f, "%zx-%zx %s %zx %*s %*d%[^\n]\n",
+		      &seg_start, &seg_end, mode, &seg_off, line) == 5) {
+		void *tmp;
+
+		/* to handle no path case (see above) we need to capture line
+		 * without skipping any whitespaces. So we need to strip
+		 * leading whitespaces manually here
+		 */
+		i = 0;
+		while (isblank(line[i]))
+			i++;
+		if (strcmp(line + i, path) != 0)
+			continue;
+
+		pr_debug("usdt: discovered segment for lib '%s': addrs %zx-%zx mode %s offset %zx\n",
+			 path, seg_start, seg_end, mode, seg_off);
+
+		/* ignore non-executable sections for shared libs */
+		if (mode[2] != 'x')
+			continue;
+
+		tmp = libbpf_reallocarray(*segs, *seg_cnt + 1, sizeof(**segs));
+		if (!tmp) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+
+		*segs = tmp;
+		seg = *segs + *seg_cnt;
+		*seg_cnt += 1;
+
+		seg->start = seg_start;
+		seg->end = seg_end;
+		seg->offset = seg_off;
+		seg->is_exec = true;
+	}
+
+	if (*seg_cnt == 0) {
+		pr_warn("usdt: failed to find '%s' (resolved to '%s') within PID %d memory mappings\n",
+			lib_path, path, pid);
+		err = -ESRCH;
+		goto err_out;
+	}
+
+	qsort(*segs, *seg_cnt, sizeof(**segs), cmp_elf_segs);
+	err = 0;
+err_out:
+	fclose(f);
+	return err;
+}
+
+static struct elf_seg *find_elf_seg(struct elf_seg *segs, size_t seg_cnt, long addr, bool relative)
+{
+	struct elf_seg *seg;
+	int i;
+
+	if (relative) {
+		/* for shared libraries, address is relative offset and thus
+		 * should fall within the logical offset-based range of
+		 * [offset_start, offset_end)
+		 */
+		for (i = 0, seg = segs; i < seg_cnt; i++, seg++) {
+			if (seg->offset <= addr && addr < seg->offset + (seg->end - seg->start))
+				return seg;
+		}
+	} else {
+		/* for binaries, address is absolute and thus should be within
+		 * absolute address range of [seg_start, seg_end)
+		 */
+		for (i = 0, seg = segs; i < seg_cnt; i++, seg++) {
+			if (seg->start <= addr && addr < seg->end)
+				return seg;
+		}
+	}
+
+	return NULL;
+}
+
+static int parse_usdt_note(Elf *elf, const char *path, long base_addr,
+			   GElf_Nhdr *nhdr, const char *data, size_t name_off, size_t desc_off,
+			   struct usdt_note *usdt_note);
+
+static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, long usdt_cookie);
+
 static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
 				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
 				struct usdt_target **out_targets, size_t *out_target_cnt)
 {
-	return -ENOTSUP;
+	size_t off, name_off, desc_off, seg_cnt = 0, lib_seg_cnt = 0, target_cnt = 0;
+	struct elf_seg *segs = NULL, *lib_segs = NULL;
+	struct usdt_target *targets = NULL, *target;
+	long base_addr = 0;
+	Elf_Scn *notes_scn, *base_scn;
+	GElf_Shdr base_shdr, notes_shdr;
+	GElf_Ehdr ehdr;
+	GElf_Nhdr nhdr;
+	Elf_Data *data;
+	int err;
+
+	*out_targets = NULL;
+	*out_target_cnt = 0;
+
+	err = find_elf_sec_by_name(elf, USDT_NOTE_SEC, &notes_shdr, &notes_scn);
+	if (err)
+		return err;
+
+	if (notes_shdr.sh_type != SHT_NOTE)
+		return -EINVAL;
+
+	if (!gelf_getehdr(elf, &ehdr))
+		return -EINVAL;
+
+	err = parse_elf_segs(elf, path, &segs, &seg_cnt);
+	if (err) {
+		pr_warn("usdt: failed to process ELF program segments for '%s': %d\n", path, err);
+		goto err_out;
+	}
+
+	/* .stapsdt.base ELF section is optional, but is used for prelink
+	 * offset compensation (see a big comment further below)
+	 */
+	if (find_elf_sec_by_name(elf, USDT_BASE_SEC, &base_shdr, &base_scn) == 0)
+		base_addr = base_shdr.sh_addr;
+
+	data = elf_getdata(notes_scn, 0);
+	off = 0;
+	while ((off = gelf_getnote(data, off, &nhdr, &name_off, &desc_off)) > 0) {
+		long usdt_abs_ip, usdt_rel_ip, usdt_sema_off = 0;
+		struct usdt_note note;
+		struct elf_seg *seg = NULL;
+		void *tmp;
+
+		err = parse_usdt_note(elf, path, base_addr, &nhdr,
+				      data->d_buf, name_off, desc_off, &note);
+		if (err)
+			goto err_out;
+
+		if (strcmp(note.provider, usdt_provider) != 0 || strcmp(note.name, usdt_name) != 0)
+			continue;
+
+		/* We need to compensate "prelink effect". See [0] for details,
+		 * relevant parts quoted here:
+		 *
+		 * Each SDT probe also expands into a non-allocated ELF note. You can
+		 * find this by looking at SHT_NOTE sections and decoding the format;
+		 * see below for details. Because the note is non-allocated, it means
+		 * there is no runtime cost, and also preserved in both stripped files
+		 * and .debug files.
+		 *
+		 * However, this means that prelink won't adjust the note's contents
+		 * for address offsets. Instead, this is done via the .stapsdt.base
+		 * section. This is a special section that is added to the text. We
+		 * will only ever have one of these sections in a final link and it
+		 * will only ever be one byte long. Nothing about this section itself
+		 * matters, we just use it as a marker to detect prelink address
+		 * adjustments.
+		 *
+		 * Each probe note records the link-time address of the .stapsdt.base
+		 * section alongside the probe PC address. The decoder compares the
+		 * base address stored in the note with the .stapsdt.base section's
+		 * sh_addr. Initially these are the same, but the section header will
+		 * be adjusted by prelink. So the decoder applies the difference to
+		 * the probe PC address to get the correct prelinked PC address; the
+		 * same adjustment is applied to the semaphore address, if any.
+		 *
+		 *   [0] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
+		 */
+		usdt_rel_ip = usdt_abs_ip = note.loc_addr;
+		if (base_addr) {
+			usdt_abs_ip += base_addr - note.base_addr;
+			usdt_rel_ip += base_addr - note.base_addr;
+		}
+
+		if (ehdr.e_type == ET_EXEC) {
+			/* When attaching uprobes (which is what USDTs basically
+			 * are) kernel expects a relative IP to be specified,
+			 * so if we are attaching to an executable ELF binary
+			 * (i.e., not a shared library), we need to calculate
+			 * proper relative IP based on ELF's load address
+			 */
+			seg = find_elf_seg(segs, seg_cnt, usdt_abs_ip, false /* relative */);
+			if (!seg) {
+				err = -ESRCH;
+				pr_warn("usdt: failed to find ELF program segment for '%s:%s' in '%s' at IP 0x%lx\n",
+					usdt_provider, usdt_name, path, usdt_abs_ip);
+				goto err_out;
+			}
+			if (!seg->is_exec) {
+				err = -ESRCH;
+				pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx) for '%s:%s' at IP 0x%lx is not executable\n",
+				        path, seg->start, seg->end, usdt_provider, usdt_name,
+					usdt_abs_ip);
+				goto err_out;
+			}
+
+			usdt_rel_ip = usdt_abs_ip - (seg->start - seg->offset);
+		} else if (!man->has_bpf_cookie) { /* ehdr.e_type == ET_DYN */
+			/* If we don't have BPF cookie support but need to
+			 * attach to a shared library, we'll need to know and
+			 * record absolute addresses of attach points due to
+			 * the need to lookup USDT spec by absolute IP of
+			 * triggered uprobe. Doing this resolution is only
+			 * possible when we have a specific PID of the process
+			 * that's using specified shared library. BPF cookie
+			 * removes the absolute address limitation as we don't
+			 * need to do this lookup (we just use BPF cookie as
+			 * an index of USDT spec), so for newer kernels with
+			 * BPF cookie support libbpf supports USDT attachment
+			 * to shared libraries with no PID filter.
+			 */
+			if (pid < 0) {
+				pr_warn("usdt: attaching to shared libraries without specific PID is not supported on current kernel\n");
+				err = -ENOTSUP;
+				goto err_out;
+			}
+
+			/* lib_segs are lazily initialized only if necessary */
+			if (lib_seg_cnt == 0) {
+				err = parse_lib_segs(pid, path, &lib_segs, &lib_seg_cnt);
+				if (err) {
+					pr_warn("usdt: failed to get memory segments in PID %d for shared library '%s': %d\n",
+						pid, path, err);
+					goto err_out;
+				}
+			}
+
+			seg = find_elf_seg(lib_segs, lib_seg_cnt, usdt_rel_ip, true /* relative */);
+			if (!seg) {
+				err = -ESRCH;
+				pr_warn("usdt: failed to find shared lib memory segment for '%s:%s' in '%s' at relative IP 0x%lx\n",
+				         usdt_provider, usdt_name, path, usdt_rel_ip);
+				goto err_out;
+			}
+
+			usdt_abs_ip = seg->start + (usdt_rel_ip - seg->offset);
+		}
+
+		pr_debug("usdt: probe for '%s:%s' in %s '%s': addr 0x%lx base 0x%lx (resolved abs_ip 0x%lx rel_ip 0x%lx) args '%s' in segment [0x%lx, 0x%lx) at offset 0x%lx\n",
+			 usdt_provider, usdt_name, ehdr.e_type == ET_EXEC ? "exec" : "lib ", path,
+			 note.loc_addr, note.base_addr, usdt_abs_ip, usdt_rel_ip, note.args,
+			 seg ? seg->start : 0, seg ? seg->end : 0, seg ? seg->offset : 0);
+
+		/* Adjust semaphore address to be a relative offset */
+		if (note.sema_addr) {
+			if (!man->has_sema_refcnt) {
+				pr_warn("usdt: kernel doesn't support USDT semaphore refcounting for '%s:%s' in '%s'\n",
+					usdt_provider, usdt_name, path);
+				err = -ENOTSUP;
+				goto err_out;
+			}
+
+			seg = find_elf_seg(segs, seg_cnt, note.sema_addr, false /* relative */);
+			if (!seg) {
+				err = -ESRCH;
+				pr_warn("usdt: failed to find ELF loadable segment with semaphore of '%s:%s' in '%s' at 0x%lx\n",
+				        usdt_provider, usdt_name, path, note.sema_addr);
+				goto err_out;
+			}
+			if (seg->is_exec) {
+				err = -ESRCH;
+				pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx] for semaphore of '%s:%s' at 0x%lx is executable\n",
+					path, seg->start, seg->end, usdt_provider, usdt_name,
+					note.sema_addr);
+				goto err_out;
+			}
+
+			usdt_sema_off = note.sema_addr - (seg->start - seg->offset);
+
+			pr_debug("usdt: sema  for '%s:%s' in %s '%s': addr 0x%lx base 0x%lx (resolved 0x%lx) in segment [0x%lx, 0x%lx] at offset 0x%lx\n",
+				 usdt_provider, usdt_name, ehdr.e_type == ET_EXEC ? "exec" : "lib ",
+				 path, note.sema_addr, note.base_addr, usdt_sema_off,
+				 seg->start, seg->end, seg->offset);
+		}
+
+		/* Record adjusted addresses and offsets and parse USDT spec */
+		tmp = libbpf_reallocarray(targets, target_cnt + 1, sizeof(*targets));
+		if (!tmp) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+		targets = tmp;
+
+		target = &targets[target_cnt];
+		memset(target, 0, sizeof(*target));
+
+		target->abs_ip = usdt_abs_ip;
+		target->rel_ip = usdt_rel_ip;
+		target->sema_off = usdt_sema_off;
+
+		/* notes->args references strings from Elf itself, so they can
+		 * be referenced safely until elf_end() call
+		 */
+		target->spec_str = note.args;
+
+		err = parse_usdt_spec(&target->spec, &note, usdt_cookie);
+		if (err)
+			goto err_out;
+
+		target_cnt++;
+	}
+
+	*out_targets = targets;
+	*out_target_cnt = target_cnt;
+	err = target_cnt;
+
+err_out:
+	free(segs);
+	free(lib_segs);
+	if (err < 0)
+		free(targets);
+	return err;
 }
 
 struct bpf_link_usdt {
@@ -255,6 +739,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 		link->uprobe_cnt++;
 	}
 
+	free(targets);
 	elf_end(elf);
 	close(fd);
 
@@ -263,8 +748,102 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 err_out:
 	bpf_link__destroy(&link->link);
 
+	free(targets);
 	if (elf)
 		elf_end(elf);
 	close(fd);
 	return libbpf_err_ptr(err);
 }
+
+/* Parse out USDT ELF note from '.note.stapsdt' section.
+ * Logic inspired by perf's code.
+ */
+static int parse_usdt_note(Elf *elf, const char *path, long base_addr,
+			   GElf_Nhdr *nhdr, const char *data, size_t name_off, size_t desc_off,
+			   struct usdt_note *note)
+{
+	const char *provider, *name, *args;
+	long addrs[3];
+	size_t len;
+
+	/* sanity check USDT note name and type first */
+	if (strncmp(data + name_off, USDT_NOTE_NAME, nhdr->n_namesz) != 0)
+		return -EINVAL;
+	if (nhdr->n_type != USDT_NOTE_TYPE)
+		return -EINVAL;
+
+	/* sanity check USDT note contents ("description" in ELF terminology) */
+	len = nhdr->n_descsz;
+	data = data + desc_off;
+
+	/* +3 is the very minimum required to store three empty strings */
+	if (len < sizeof(addrs) + 3)
+		return -EINVAL;
+
+	/* get location, base, and semaphore addrs */
+	memcpy(&addrs, data, sizeof(addrs));
+
+	/* parse string fields: provider, name, args */
+	provider = data + sizeof(addrs);
+
+	name = (const char *)memchr(provider, '\0', data + len - provider);
+	if (!name) /* non-zero-terminated provider */
+		return -EINVAL;
+	name++;
+	if (name >= data + len || *name == '\0') /* missing or empty name */
+		return -EINVAL;
+
+	args = memchr(name, '\0', data + len - name);
+	if (!args) /* non-zero-terminated name */
+		return -EINVAL;
+	++args;
+	if (args >= data + len) /* missing arguments spec */
+		return -EINVAL;
+
+	note->provider = provider;
+	note->name = name;
+	if (*args == '\0' || *args == ':')
+		note->args = "";
+	else
+		note->args = args;
+	note->loc_addr = addrs[0];
+	note->base_addr = addrs[1];
+	note->sema_addr = addrs[2];
+
+	return 0;
+}
+
+static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg);
+
+static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, long usdt_cookie)
+{
+	const char *s;
+	int len;
+
+	spec->usdt_cookie = usdt_cookie;
+	spec->arg_cnt = 0;
+
+	s = note->args;
+	while (s[0]) {
+		if (spec->arg_cnt >= USDT_MAX_ARG_CNT) {
+			pr_warn("usdt: too many USDT arguments (> %d) for '%s:%s' with args spec '%s'\n",
+				USDT_MAX_ARG_CNT, note->provider, note->name, note->args);
+			return -E2BIG;
+		}
+
+		len = parse_usdt_arg(s, spec->arg_cnt, &spec->args[spec->arg_cnt]);
+		if (len < 0)
+			return len;
+
+		s += len;
+		spec->arg_cnt++;
+	}
+
+	return 0;
+}
+
+static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
+{
+	pr_warn("usdt: libbpf doesn't support USDTs on current architecture\n");
+	return -ENOTSUP;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2022-03-25  5:29 ` [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-31 14:49   ` Alan Maguire
  2022-03-25  5:29 ` [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

The last part of the architecture-agnostic user-space USDT handling
logic is setting up the BPF spec and, optionally, IP-to-ID maps from
user-space. usdt_manager performs compact spec ID allocation to utilize
fixed-sized BPF maps as efficiently as possible. We also use a hashmap
to deduplicate USDT arg spec strings and map identical strings to a
single USDT spec, minimizing the necessary BPF map size. usdt_manager
supports arbitrary sequences of attachment and detachment, both of the
same USDT and of multiple different USDTs, and internally maintains
a free list of unused spec IDs. bpf_link_usdt's logic is extended with
proper setup and teardown of this spec ID free list and the supporting
BPF maps.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/usdt.c | 167 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 166 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index 86d5d8390eb1..22f5f56992f8 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -74,6 +74,10 @@ struct usdt_manager {
 	struct bpf_map *specs_map;
 	struct bpf_map *ip_to_id_map;
 
+	int *free_spec_ids;
+	size_t free_spec_cnt;
+	size_t next_free_spec_id;
+
 	bool has_bpf_cookie;
 	bool has_sema_refcnt;
 };
@@ -118,6 +122,7 @@ void usdt_manager_free(struct usdt_manager *man)
 	if (!man)
 		return;
 
+	free(man->free_spec_ids);
 	free(man);
 }
 
@@ -623,6 +628,9 @@ struct bpf_link_usdt {
 
 	struct usdt_manager *usdt_man;
 
+	size_t spec_cnt;
+	int *spec_ids;
+
 	size_t uprobe_cnt;
 	struct {
 		long abs_ip;
@@ -633,11 +641,52 @@ struct bpf_link_usdt {
 static int bpf_link_usdt_detach(struct bpf_link *link)
 {
 	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
+	struct usdt_manager *man = usdt_link->usdt_man;
 	int i;
 
 	for (i = 0; i < usdt_link->uprobe_cnt; i++) {
 		/* detach underlying uprobe link */
 		bpf_link__destroy(usdt_link->uprobes[i].link);
+		/* there is no need to update specs map because it will be
+		 * unconditionally overwritten on subsequent USDT attaches,
+		 * but if BPF cookies are not used we need to remove entry
+		 * from ip_to_id map, otherwise we'll run into false
+		 * conflicting IP errors
+		 */
+		if (!man->has_bpf_cookie) {
+			/* not much we can do about errors here */
+			(void)bpf_map_delete_elem(bpf_map__fd(man->ip_to_id_map),
+						  &usdt_link->uprobes[i].abs_ip);
+		}
+	}
+
+	/* try to return the list of previously used spec IDs to usdt_manager
+	 * for future reuse for subsequent USDT attaches
+	 */
+	if (!man->free_spec_ids) {
+		/* if there were no free spec IDs yet, just transfer our IDs */
+		man->free_spec_ids = usdt_link->spec_ids;
+		man->free_spec_cnt = usdt_link->spec_cnt;
+		usdt_link->spec_ids = NULL;
+	} else {
+		/* otherwise concat IDs */
+		size_t new_cnt = man->free_spec_cnt + usdt_link->spec_cnt;
+		int *new_free_ids;
+
+		new_free_ids = libbpf_reallocarray(man->free_spec_ids, new_cnt,
+						   sizeof(*new_free_ids));
+		/* If we couldn't resize free_spec_ids, we'll just leak
+		 * a bunch of free IDs; this is very unlikely to happen and if
+		 * the system is that exhausted of memory, it's the least of
+		 * the user's concerns, probably.
+		 * So just do our best here to return those IDs to usdt_manager.
+		 */
+		if (new_free_ids) {
+			memcpy(new_free_ids + man->free_spec_cnt, usdt_link->spec_ids,
+			       usdt_link->spec_cnt * sizeof(*usdt_link->spec_ids));
+			man->free_spec_ids = new_free_ids;
+			man->free_spec_cnt = new_cnt;
+		}
 	}
 
 	return 0;
@@ -647,17 +696,82 @@ static void bpf_link_usdt_dealloc(struct bpf_link *link)
 {
 	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
 
+	free(usdt_link->spec_ids);
 	free(usdt_link->uprobes);
 	free(usdt_link);
 }
 
+static int prepare_spec_id(struct usdt_manager *man, struct bpf_link_usdt *link,
+			   struct hashmap *specs_hash, struct usdt_target *target,
+			   int *spec_id, bool *id_exists)
+{
+	void *tmp;
+	bool reused;
+	int err;
+
+	/* check if we already assigned spec ID for this spec string */
+	if (hashmap__find(specs_hash, target->spec_str, &tmp)) {
+		*id_exists = true;
+		*spec_id = (long)tmp;
+		return 0;
+	}
+
+	/* otherwise get next free spec ID, giving preference to free list */
+	tmp = libbpf_reallocarray(link->spec_ids, link->spec_cnt + 1, sizeof(*link->spec_ids));
+	if (!tmp)
+		return -ENOMEM;
+	link->spec_ids = tmp;
+
+	if (man->free_spec_cnt) {
+		*spec_id = man->free_spec_ids[man->free_spec_cnt - 1];
+		man->free_spec_cnt--;
+		reused = true;
+	} else {
+		*spec_id = man->next_free_spec_id;
+		man->next_free_spec_id++;
+		reused = false;
+	}
+
+	/* cache spec ID for current spec string for future lookups */
+	err = hashmap__add(specs_hash, target->spec_str, (void *)(long)*spec_id);
+	if (err) {
+		/* undo usdt_manager state changes, if we got an error */
+		if (reused)
+			man->free_spec_cnt++;
+		else
+			man->next_free_spec_id--;
+		return err;
+	}
+
+	*id_exists = false;
+	link->spec_ids[link->spec_cnt] = *spec_id;
+	link->spec_cnt++;
+	return 0;
+}
+
+static size_t specs_hash_fn(const void *key, void *ctx)
+{
+	const char *s = key;
+
+	return str_hash(s);
+}
+
+static bool specs_equal_fn(const void *key1, const void *key2, void *ctx)
+{
+	const char *s1 = key1;
+	const char *s2 = key2;
+
+	return strcmp(s1, s2) == 0;
+}
+
 struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
 					  pid_t pid, const char *path,
 					  const char *usdt_provider, const char *usdt_name,
 					  long usdt_cookie)
 {
-	int i, fd, err;
+	int i, fd, err, spec_map_fd, ip_map_fd;
 	LIBBPF_OPTS(bpf_uprobe_opts, opts);
+	struct hashmap *specs_hash = NULL;
 	struct bpf_link_usdt *link = NULL;
 	struct usdt_target *targets = NULL;
 	size_t target_cnt;
@@ -669,6 +783,9 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 		return libbpf_err_ptr(-EINVAL);
 	}
 
+	spec_map_fd = bpf_map__fd(man->specs_map);
+	ip_map_fd = bpf_map__fd(man->ip_to_id_map);
+
 	/* TODO: perform path resolution similar to uprobe's */
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
@@ -704,6 +821,12 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 		goto err_out;
 	}
 
+	specs_hash = hashmap__new(specs_hash_fn, specs_equal_fn, NULL);
+	if (IS_ERR(specs_hash)) {
+		err = PTR_ERR(specs_hash);
+		goto err_out;
+	}
+
 	link = calloc(1, sizeof(*link));
 	if (!link) {
 		err = -ENOMEM;
@@ -723,8 +846,48 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	for (i = 0; i < target_cnt; i++) {
 		struct usdt_target *target = &targets[i];
 		struct bpf_link *uprobe_link;
+		bool reused;
+		int spec_id;
+
+		/* Spec ID can be either reused or newly allocated. If it is
+		 * newly allocated, we'll need to fill out the spec map,
+		 * otherwise the entire spec is already valid and can just be
+		 * used by a new uprobe. We reuse a spec when the USDT arg
+		 * spec is identical. We also never share specs between two
+		 * different USDT attachments ("links"), so all the reused
+		 * specs already share the USDT cookie value implicitly.
+		 */
+		err = prepare_spec_id(man, link, specs_hash, target, &spec_id, &reused);
+		if (err)
+			goto err_out;
+
+		if (!reused &&
+		    bpf_map_update_elem(spec_map_fd, &spec_id, &target->spec, BPF_ANY)) {
+			err = -errno;
+			pr_warn("usdt: failed to set USDT spec #%d for '%s:%s' in '%s': %d\n",
+				spec_id, usdt_provider, usdt_name, path, err);
+			/* make sure we don't return this bad spec ID into the
+			 * pool of free spec IDs
+			 */
+			link->spec_cnt--;
+			goto err_out;
+		}
+		if (!man->has_bpf_cookie &&
+		    bpf_map_update_elem(ip_map_fd, &target->abs_ip, &spec_id, BPF_NOEXIST)) {
+			err = -errno;
+			if (err == -EEXIST) {
+				pr_warn("usdt: IP collision detected for spec #%d for '%s:%s' in '%s'\n",
+				        spec_id, usdt_provider, usdt_name, path);
+			} else {
+				pr_warn("usdt: failed to map IP 0x%lx to spec #%d for '%s:%s' in '%s': %d\n",
+					target->abs_ip, spec_id, usdt_provider, usdt_name,
+					path, err);
+			}
+			goto err_out;
+		}
 
 		opts.ref_ctr_offset = target->sema_off;
+		opts.bpf_cookie = man->has_bpf_cookie ? spec_id : 0;
 		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
 							      target->rel_ip, &opts);
 		err = libbpf_get_error(uprobe_link);
@@ -740,6 +903,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	}
 
 	free(targets);
+	hashmap__free(specs_hash);
 	elf_end(elf);
 	close(fd);
 
@@ -749,6 +913,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	bpf_link__destroy(&link->link);
 
 	free(targets);
+	hashmap__free(specs_hash);
 	if (elf)
 		elf_end(elf);
 	close(fd);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2022-03-25  5:29 ` [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-31 15:13   ` Alan Maguire
  2022-03-25  5:29 ` [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
  2022-03-25  5:29 ` [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
  6 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Add x86/x86_64-specific USDT argument specification parsing. Each
architecture requires its own logic, as the specs use arch-specific,
assembly-based notation. Architectures that libbpf doesn't support for
USDTs will pr_warn() with a specific error and return -ENOTSUP.

We use sscanf() as a very powerful and easy-to-use string parser. The
spaces in sscanf's format string mean "skip any whitespace", which is
a pretty nifty (and somewhat little-known) feature.

All this was tested on a little-endian architecture, so the bit shifts
are probably off on big-endian, which our CI will hopefully prove.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/usdt.c | 105 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index 22f5f56992f8..5cf809db60aa 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -1007,8 +1007,113 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
 	return 0;
 }
 
+/* Architecture-specific logic for parsing USDT argument location specs */
+
+#if defined(__x86_64__) || defined(__i386__)
+
+static int calc_pt_regs_off(const char *reg_name)
+{
+	static struct {
+		const char *names[4];
+		size_t pt_regs_off;
+	} reg_map[] = {
+#if __x86_64__
+#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg64)
+#else
+#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg32)
+#endif
+		{ {"rip", "eip", "", ""}, reg_off(rip, eip) },
+		{ {"rax", "eax", "ax", "al"}, reg_off(rax, eax) },
+		{ {"rbx", "ebx", "bx", "bl"}, reg_off(rbx, ebx) },
+		{ {"rcx", "ecx", "cx", "cl"}, reg_off(rcx, ecx) },
+		{ {"rdx", "edx", "dx", "dl"}, reg_off(rdx, edx) },
+		{ {"rsi", "esi", "si", "sil"}, reg_off(rsi, esi) },
+		{ {"rdi", "edi", "di", "dil"}, reg_off(rdi, edi) },
+		{ {"rbp", "ebp", "bp", "bpl"}, reg_off(rbp, ebp) },
+		{ {"rsp", "esp", "sp", "spl"}, reg_off(rsp, esp) },
+#undef reg_off
+#if __x86_64__
+		{ {"r8", "r8d", "r8w", "r8b"}, offsetof(struct pt_regs, r8) },
+		{ {"r9", "r9d", "r9w", "r9b"}, offsetof(struct pt_regs, r9) },
+		{ {"r10", "r10d", "r10w", "r10b"}, offsetof(struct pt_regs, r10) },
+		{ {"r11", "r11d", "r11w", "r11b"}, offsetof(struct pt_regs, r11) },
+		{ {"r12", "r12d", "r12w", "r12b"}, offsetof(struct pt_regs, r12) },
+		{ {"r13", "r13d", "r13w", "r13b"}, offsetof(struct pt_regs, r13) },
+		{ {"r14", "r14d", "r14w", "r14b"}, offsetof(struct pt_regs, r14) },
+		{ {"r15", "r15d", "r15w", "r15b"}, offsetof(struct pt_regs, r15) },
+#endif
+	};
+	int i, j;
+
+	for (i = 0; i < ARRAY_SIZE(reg_map); i++) {
+		for (j = 0; j < ARRAY_SIZE(reg_map[i].names); j++) {
+			if (strcmp(reg_name, reg_map[i].names[j]) == 0)
+				return reg_map[i].pt_regs_off;
+		}
+	}
+
+	pr_warn("usdt: unrecognized register '%s'\n", reg_name);
+	return -ENOENT;
+}
+
+static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
+{
+	char *reg_name = NULL;
+	int arg_sz, len, reg_off;
+	long off;
+
+	if (3 == sscanf(arg_str, " %d @ %ld ( %%%m[^)] ) %n", &arg_sz, &off, &reg_name, &len)) {
+		/* -4@-20(%rbp) */
+		arg->arg_type = USDT_ARG_REG_DEREF;
+		arg->val_off = off;
+		reg_off = calc_pt_regs_off(reg_name);
+		free(reg_name);
+		if (reg_off < 0)
+			return reg_off;
+		arg->reg_off = reg_off;
+	} else if (2 == sscanf(arg_str, " %d @ %%%ms %n", &arg_sz, &reg_name, &len)) {
+		/* -4@%eax */
+		arg->arg_type = USDT_ARG_REG;
+		arg->val_off = 0;
+
+		reg_off = calc_pt_regs_off(reg_name);
+		free(reg_name);
+		if (reg_off < 0)
+			return reg_off;
+		arg->reg_off = reg_off;
+	} else if (2 == sscanf(arg_str, " %d @ $%ld %n", &arg_sz, &off, &len)) {
+		/* 4@$71 */
+		arg->arg_type = USDT_ARG_CONST;
+		arg->val_off = off;
+		arg->reg_off = 0;
+	} else {
+		pr_warn("usdt: unrecognized arg #%d spec '%s'\n", arg_num, arg_str);
+		return -EINVAL;
+	}
+
+	arg->arg_signed = arg_sz < 0;
+	if (arg_sz < 0)
+		arg_sz = -arg_sz;
+
+	switch (arg_sz) {
+	case 1: case 2: case 4: case 8:
+		arg->arg_bitshift = 64 - arg_sz * 8;
+		break;
+	default:
+		pr_warn("usdt: unsupported arg #%d (spec '%s') size: %d\n",
+			arg_num, arg_str, arg_sz);
+		return -EINVAL;
+	}
+
+	return len;
+}
+
+#else
+
 static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
 {
 	pr_warn("usdt: libbpf doesn't support USDTs on current architecture\n");
 	return -ENOTSUP;
 }
+
+#endif
-- 
2.30.2



* [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2022-03-25  5:29 ` [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-31 15:54   ` Alan Maguire
  2022-03-25  5:29 ` [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
  6 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Add a semaphore-based USDT to test_progs itself and write basic tests to
validate both auto-attachment and manual attachment logic, as well as
the BPF-side functionality.

Also add subtests to validate that libbpf properly deduplicates USDT
specs, handles spec overflow situations correctly, and properly "rolls
back" a partially-attached multi-spec USDT.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |   1 +
 tools/testing/selftests/bpf/prog_tests/usdt.c | 314 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_usdt.c | 115 +++++++
 3 files changed, 430 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3820608faf57..18e22def3bdb 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -400,6 +400,7 @@ $(TRUNNER_BPF_OBJS): $(TRUNNER_OUTPUT)/%.o:				\
 		     $(TRUNNER_BPF_PROGS_DIR)/*.h			\
 		     $$(INCLUDE_DIR)/vmlinux.h				\
 		     $(wildcard $(BPFDIR)/bpf_*.h)			\
+		     $(wildcard $(BPFDIR)/*.bpf.h)			\
 		     | $(TRUNNER_OUTPUT) $$(BPFOBJ)
 	$$(call $(TRUNNER_BPF_BUILD_RULE),$$<,$$@,			\
 					  $(TRUNNER_BPF_CFLAGS))
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
new file mode 100644
index 000000000000..44a20d8c45d7
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+
+#define _SDT_HAS_SEMAPHORES 1
+#include <sys/sdt.h>
+
+#include "test_usdt.skel.h"
+#include "test_urandom_usdt.skel.h"
+
+int lets_test_this(int);
+
+static volatile int idx = 2;
+static volatile __u64 bla = 0xFEDCBA9876543210ULL;
+static volatile short nums[] = {-1, -2, -3, };
+
+static volatile struct {
+	int x;
+	signed char y;
+} t1 = { 1, -127 };
+
+#define SEC(name) __attribute__((section(name), used))
+
+unsigned short test_usdt0_semaphore SEC(".probes");
+unsigned short test_usdt3_semaphore SEC(".probes");
+unsigned short test_usdt12_semaphore SEC(".probes");
+
+static void __always_inline trigger_func(int x) {
+	long y = 42;
+
+	if (test_usdt0_semaphore)
+		STAP_PROBE(test, usdt0);
+	if (test_usdt3_semaphore)
+		STAP_PROBE3(test, usdt3, x, y, &bla);
+	if (test_usdt12_semaphore) {
+		STAP_PROBE12(test, usdt12,
+			     x, x + 1, y, x + y, 5,
+			     y / 7, bla, &bla, -9, nums[x],
+			     nums[idx], t1.y);
+	}
+}
+
+static void subtest_basic_usdt(void)
+{
+	LIBBPF_OPTS(bpf_usdt_opts, opts);
+	struct test_usdt *skel;
+	struct test_usdt__bss *bss;
+	int err;
+
+	skel = test_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bss = skel->bss;
+	bss->my_pid = getpid();
+
+	err = test_usdt__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	/* usdt0 won't be auto-attached */
+	opts.usdt_cookie = 0xcafedeadbeeffeed;
+	skel->links.usdt0 = bpf_program__attach_usdt(skel->progs.usdt0,
+						     0 /*self*/, "/proc/self/exe",
+						     "test", "usdt0", &opts);
+	if (!ASSERT_OK_PTR(skel->links.usdt0, "usdt0_link"))
+		goto cleanup;
+
+	trigger_func(1);
+
+	ASSERT_EQ(bss->usdt0_called, 1, "usdt0_called");
+	ASSERT_EQ(bss->usdt3_called, 1, "usdt3_called");
+	ASSERT_EQ(bss->usdt12_called, 1, "usdt12_called");
+
+	ASSERT_EQ(bss->usdt0_cookie, 0xcafedeadbeeffeed, "usdt0_cookie");
+	ASSERT_EQ(bss->usdt0_arg_cnt, 0, "usdt0_arg_cnt");
+	ASSERT_EQ(bss->usdt0_arg_ret, -ENOENT, "usdt0_arg_ret");
+
+	/* auto-attached usdt3 gets default zero cookie value */
+	ASSERT_EQ(bss->usdt3_cookie, 0, "usdt3_cookie");
+	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
+
+	ASSERT_EQ(bss->usdt3_arg_rets[0], 0, "usdt3_arg1_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[1], 0, "usdt3_arg2_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[2], 0, "usdt3_arg3_ret");
+	ASSERT_EQ(bss->usdt3_args[0], 1, "usdt3_arg1");
+	ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2");
+	ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3");
+
+	/* auto-attached usdt12 gets default zero cookie value */
+	ASSERT_EQ(bss->usdt12_cookie, 0, "usdt12_cookie");
+	ASSERT_EQ(bss->usdt12_arg_cnt, 12, "usdt12_arg_cnt");
+
+	ASSERT_EQ(bss->usdt12_args[0], 1, "usdt12_arg1");
+	ASSERT_EQ(bss->usdt12_args[1], 1 + 1, "usdt12_arg2");
+	ASSERT_EQ(bss->usdt12_args[2], 42, "usdt12_arg3");
+	ASSERT_EQ(bss->usdt12_args[3], 42 + 1, "usdt12_arg4");
+	ASSERT_EQ(bss->usdt12_args[4], 5, "usdt12_arg5");
+	ASSERT_EQ(bss->usdt12_args[5], 42 / 7, "usdt12_arg6");
+	ASSERT_EQ(bss->usdt12_args[6], bla, "usdt12_arg7");
+	ASSERT_EQ(bss->usdt12_args[7], (uintptr_t)&bla, "usdt12_arg8");
+	ASSERT_EQ(bss->usdt12_args[8], -9, "usdt12_arg9");
+	ASSERT_EQ(bss->usdt12_args[9], nums[1], "usdt12_arg10");
+	ASSERT_EQ(bss->usdt12_args[10], nums[idx], "usdt12_arg11");
+	ASSERT_EQ(bss->usdt12_args[11], t1.y, "usdt12_arg12");
+
+	/* trigger_func() is marked __always_inline, so USDT invocations will be
+	 * inlined in two different places, meaning that each USDT will have
+	 * at least 2 different places to be attached to. This verifies that
+	 * bpf_program__attach_usdt() handles this properly and attaches to
+	 * all possible places of USDT invocation.
+	 */
+	trigger_func(2);
+
+	ASSERT_EQ(bss->usdt0_called, 2, "usdt0_called");
+	ASSERT_EQ(bss->usdt3_called, 2, "usdt3_called");
+	ASSERT_EQ(bss->usdt12_called, 2, "usdt12_called");
+
+	/* only check values that depend on trigger_func()'s input value */
+	ASSERT_EQ(bss->usdt3_args[0], 2, "usdt3_arg1");
+
+	ASSERT_EQ(bss->usdt12_args[0], 2, "usdt12_arg1");
+	ASSERT_EQ(bss->usdt12_args[1], 2 + 1, "usdt12_arg2");
+	ASSERT_EQ(bss->usdt12_args[3], 42 + 2, "usdt12_arg4");
+	ASSERT_EQ(bss->usdt12_args[9], nums[2], "usdt12_arg10");
+
+	/* detach and re-attach usdt3 */
+	bpf_link__destroy(skel->links.usdt3);
+
+	opts.usdt_cookie = 0xBADC00C51E;
+	skel->links.usdt3 = bpf_program__attach_usdt(skel->progs.usdt3, -1 /* any pid */,
+						     "/proc/self/exe", "test", "usdt3", &opts);
+	if (!ASSERT_OK_PTR(skel->links.usdt3, "usdt3_reattach"))
+		goto cleanup;
+
+	trigger_func(3);
+
+	ASSERT_EQ(bss->usdt3_called, 3, "usdt3_called");
+	/* this time usdt3 has custom cookie */
+	ASSERT_EQ(bss->usdt3_cookie, 0xBADC00C51E, "usdt3_cookie");
+	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
+
+	ASSERT_EQ(bss->usdt3_arg_rets[0], 0, "usdt3_arg1_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[1], 0, "usdt3_arg2_ret");
+	ASSERT_EQ(bss->usdt3_arg_rets[2], 0, "usdt3_arg3_ret");
+	ASSERT_EQ(bss->usdt3_args[0], 3, "usdt3_arg1");
+	ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2");
+	ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3");
+
+cleanup:
+	test_usdt__destroy(skel);
+}
+
+unsigned short test_usdt_100_semaphore SEC(".probes");
+unsigned short test_usdt_300_semaphore SEC(".probes");
+unsigned short test_usdt_400_semaphore SEC(".probes");
+
+#define R10(F, X)  F(X+0); F(X+1);F(X+2); F(X+3); F(X+4); \
+		   F(X+5); F(X+6); F(X+7); F(X+8); F(X+9);
+#define R100(F, X) R10(F,X+ 0);R10(F,X+10);R10(F,X+20);R10(F,X+30);R10(F,X+40); \
+		   R10(F,X+50);R10(F,X+60);R10(F,X+70);R10(F,X+80);R10(F,X+90);
+
+/* carefully control that we get exactly 100 inlined call sites;
+ * marking the trigger functions below __weak prevents the compiler
+ * from inlining or optimizing them away
+ */
+static void __always_inline f100(int x)
+{
+	STAP_PROBE1(test, usdt_100, x);
+}
+
+__weak void trigger_100_usdts(void)
+{
+	R100(f100, 0);
+}
+
+/* we shouldn't be able to attach to the test:usdt_300 USDT as we don't
+ * have that many slots for specs. It's important that each
+ * STAP_PROBE1() invocation (after unrolling) gets a different arg spec
+ * due to the compiler inlining x as a constant
+ */
+static void __always_inline f300(int x)
+{
+	STAP_PROBE1(test, usdt_300, x);
+}
+
+__weak void trigger_300_usdts(void)
+{
+	R100(f300, 0);
+	R100(f300, 100);
+	R100(f300, 200);
+}
+
+static void __always_inline f400(int unused)
+{
+	static int x;
+
+	(void)unused;
+	STAP_PROBE1(test, usdt_400, x++);
+}
+
+/* this time we have 400 different USDT call sites, but they have uniform
+ * argument locations, so libbpf's spec string deduplication logic should
+ * keep the spec count very small and we should be able to attach to all
+ * 400 call sites
+ */
+__weak void trigger_400_usdts(void)
+{
+	R100(f400, 0);
+	R100(f400, 100);
+	R100(f400, 200);
+	R100(f400, 300);
+}
+
+static void subtest_multispec_usdt(void)
+{
+	LIBBPF_OPTS(bpf_usdt_opts, opts);
+	struct test_usdt *skel;
+	struct test_usdt__bss *bss;
+	int err, i;
+
+	skel = test_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bss = skel->bss;
+	bss->my_pid = getpid();
+
+	err = test_usdt__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	/* usdt_100 is auto-attached and there are 100 inlined call sites,
+	 * let's validate that all of them are properly attached to and
+	 * handled from BPF side
+	 */
+	trigger_100_usdts();
+
+	ASSERT_EQ(bss->usdt_100_called, 100, "usdt_100_called");
+	ASSERT_EQ(bss->usdt_100_sum, 99 * 100 / 2, "usdt_100_sum");
+
+	/* Stress test free spec ID tracking. By default libbpf allows up to
+	 * 256 specs to be used, so if we don't return free spec IDs back
+	 * after a few detachments and re-attachments we should run out of
+	 * available spec IDs.
+	 */
+	for (i = 0; i < 2; i++) {
+		bpf_link__destroy(skel->links.usdt_100);
+
+		skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1,
+							        "/proc/self/exe",
+								"test", "usdt_100", NULL);
+		if (!ASSERT_OK_PTR(skel->links.usdt_100, "usdt_100_reattach"))
+			goto cleanup;
+
+		bss->usdt_100_sum = 0;
+		trigger_100_usdts();
+
+		ASSERT_EQ(bss->usdt_100_called, (i + 1) * 100 + 100, "usdt_100_called");
+		ASSERT_EQ(bss->usdt_100_sum, 99 * 100 / 2, "usdt_100_sum");
+	}
+
+	/* Now let's step it up and try to attach USDT that requires more than
+	 * 256 attach points with different specs for each.
+	 * Note that we need trigger_300_usdts() only to actually have 300
+	 * USDT call sites, we are not going to actually trace them.
+	 */
+	trigger_300_usdts();
+
+	/* we'll reuse usdt_100 BPF program for usdt_300 test */
+	bpf_link__destroy(skel->links.usdt_100);
+	skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1, "/proc/self/exe",
+							"test", "usdt_300", NULL);
+	err = -errno;
+	if (!ASSERT_ERR_PTR(skel->links.usdt_100, "usdt_300_bad_attach"))
+		goto cleanup;
+	ASSERT_EQ(err, -E2BIG, "usdt_300_attach_err");
+
+	/* let's check that there are no "dangling" BPF programs attached due
+	 * to partial success of the above test:usdt_300 attachment
+	 */
+	bss->usdt_100_called = 0;
+	bss->usdt_100_sum = 0;
+
+	f300(777); /* this is the 301st instance of usdt_300 */
+
+	ASSERT_EQ(bss->usdt_100_called, 0, "usdt_301_called");
+	ASSERT_EQ(bss->usdt_100_sum, 0, "usdt_301_sum");
+
+	/* This time we have USDT with 400 inlined invocations, but arg specs
+	 * should be the same across all sites, so libbpf will only need to
+	 * use one spec and thus we'll be able to attach 400 uprobes
+	 * successfully.
+	 *
+	 * Again, we are reusing usdt_100 BPF program.
+	 */
+	skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1,
+							"/proc/self/exe",
+							"test", "usdt_400", NULL);
+	if (!ASSERT_OK_PTR(skel->links.usdt_100, "usdt_400_attach"))
+		goto cleanup;
+
+	trigger_400_usdts();
+
+	ASSERT_EQ(bss->usdt_100_called, 400, "usdt_400_called");
+	ASSERT_EQ(bss->usdt_100_sum, 399 * 400 / 2, "usdt_400_sum");
+
+cleanup:
+	test_usdt__destroy(skel);
+}
+
+void test_usdt(void)
+{
+	if (test__start_subtest("basic"))
+		subtest_basic_usdt();
+	if (test__start_subtest("multispec"))
+		subtest_multispec_usdt();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
new file mode 100644
index 000000000000..cb800910d794
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_usdt.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/usdt.bpf.h>
+
+int my_pid;
+
+int usdt0_called;
+u64 usdt0_cookie;
+int usdt0_arg_cnt;
+int usdt0_arg_ret;
+
+SEC("usdt")
+int usdt0(struct pt_regs *ctx)
+{
+	long tmp;
+
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt0_called, 1);
+
+	usdt0_cookie = bpf_usdt_cookie(ctx);
+	usdt0_arg_cnt = bpf_usdt_arg_cnt(ctx);
+	/* should return -ENOENT */
+	usdt0_arg_ret = bpf_usdt_arg(ctx, 0, &tmp);
+	return 0;
+}
+
+int usdt3_called;
+u64 usdt3_cookie;
+int usdt3_arg_cnt;
+int usdt3_arg_rets[3];
+u64 usdt3_args[3];
+
+SEC("usdt//proc/self/exe:test:usdt3")
+int usdt3(struct pt_regs *ctx)
+{
+	long tmp;
+
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt3_called, 1);
+
+	usdt3_cookie = bpf_usdt_cookie(ctx);
+	usdt3_arg_cnt = bpf_usdt_arg_cnt(ctx);
+
+	usdt3_arg_rets[0] = bpf_usdt_arg(ctx, 0, &tmp);
+	usdt3_args[0] = (int)tmp;
+
+	usdt3_arg_rets[1] = bpf_usdt_arg(ctx, 1, &tmp);
+	usdt3_args[1] = (long)tmp;
+
+	usdt3_arg_rets[2] = bpf_usdt_arg(ctx, 2, &tmp);
+	usdt3_args[2] = (uintptr_t)tmp;
+
+	return 0;
+}
+
+int usdt12_called;
+u64 usdt12_cookie;
+int usdt12_arg_cnt;
+u64 usdt12_args[12];
+
+SEC("usdt//proc/self/exe:test:usdt12")
+int BPF_USDT(usdt12, int a1, int a2, long a3, long a4, unsigned a5,
+		     long a6, __u64 a7, uintptr_t a8, int a9, short a10,
+		     short a11, signed char a12)
+{
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt12_called, 1);
+
+	usdt12_cookie = bpf_usdt_cookie(ctx);
+	usdt12_arg_cnt = bpf_usdt_arg_cnt(ctx);
+
+	usdt12_args[0] = a1;
+	usdt12_args[1] = a2;
+	usdt12_args[2] = a3;
+	usdt12_args[3] = a4;
+	usdt12_args[4] = a5;
+	usdt12_args[5] = a6;
+	usdt12_args[6] = a7;
+	usdt12_args[7] = a8;
+	usdt12_args[8] = a9;
+	usdt12_args[9] = a10;
+	usdt12_args[10] = a11;
+	usdt12_args[11] = a12;
+	return 0;
+}
+
+int usdt_100_called;
+int usdt_100_sum;
+
+SEC("usdt//proc/self/exe:test:usdt_100")
+int BPF_USDT(usdt_100, int x)
+{
+	long tmp;
+
+	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&usdt_100_called, 1);
+	__sync_fetch_and_add(&usdt_100_sum, x);
+
+	bpf_printk("X is %d, sum is %d", x, usdt_100_sum);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2



* [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs
  2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2022-03-25  5:29 ` [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
@ 2022-03-25  5:29 ` Andrii Nakryiko
  2022-03-31 22:13   ` Alan Maguire
  6 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-25  5:29 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: andrii, kernel-team, Alan Maguire, Dave Marchevsky

Extend the urandom_read helper binary to include USDTs in 4 combinations:
semaphore/semaphoreless (refcounted and non-refcounted) and based in the
executable or a shared library. We also extend urandom_read with the
ability to report its own PID to the parent process and to wait for the
parent process to ready itself for tracing urandom_read. We utilize
popen() and the underlying pipe's properties for proper signaling.

Once urandom_read is ready, we add a few tests to validate that libbpf's
USDT attachment handles all the above combinations of semaphore (or lack
thereof) and executable vs. shared library USDTs. We also validate that
libbpf handles shared libraries both with a PID filter and without one
(i.e., -1 for the PID argument).

Having the shared library case tested with and without a PID is
important because the internal logic differs on kernels that don't
support BPF cookies. On such older kernels, attaching to USDTs in
shared libraries without specifying a concrete PID doesn't work in
principle, because it's impossible to determine the shared library's
load address to derive absolute IPs for uprobe attachments. Without
absolute IPs, it's impossible to perform a correct lookup of the USDT
spec based on the uprobe's absolute IP (the only kind available from
BPF at runtime). This is not a problem on newer kernels with BPF cookie
support, as we don't need the IP-to-ID lookup at all: the BPF cookie
value *is* the spec ID.

So having those two situations as separate subtests is good, because
libbpf CI is able to test the latest selftests against old kernels
(e.g., 4.9 and 5.5). We'll be able to disable PID-less shared lib
attachment for old kernels, but still leave the PID-specific one
enabled to validate that this legacy logic works correctly.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |  11 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c | 107 ++++++++++++++++++
 .../selftests/bpf/progs/test_urandom_usdt.c   |  70 ++++++++++++
 tools/testing/selftests/bpf/urandom_read.c    |  63 ++++++++++-
 .../testing/selftests/bpf/urandom_read_aux.c  |   9 ++
 .../testing/selftests/bpf/urandom_read_lib1.c |  13 +++
 .../testing/selftests/bpf/urandom_read_lib2.c |   8 ++
 7 files changed, 274 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_urandom_usdt.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_aux.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib1.c
 create mode 100644 tools/testing/selftests/bpf/urandom_read_lib2.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 18e22def3bdb..58da22c019a8 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -168,9 +168,15 @@ $(OUTPUT)/%:%.c
 	$(call msg,BINARY,,$@)
 	$(Q)$(LINK.c) $^ $(LDLIBS) -o $@
 
-$(OUTPUT)/urandom_read: urandom_read.c
+$(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
+	$(call msg,LIB,,$@)
+	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
+
+$(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_read.so
 	$(call msg,BINARY,,$@)
-	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $< $(LDLIBS) -Wl,--build-id=sha1 -o $@
+	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.c,$^)			       \
+		  liburandom_read.so $(LDLIBS)	       			       \
+		  -Wl,-rpath=. -Wl,--build-id=sha1 -o $@
 
 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
 	$(call msg,MOD,,$@)
@@ -492,6 +498,7 @@ TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c	\
 			 btf_helpers.c flow_dissector_load.h		\
 			 cap_helpers.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
+		       $(OUTPUT)/liburandom_read.so			\
 		       ima_setup.sh					\
 		       $(wildcard progs/btf_dump_test_case_*.c)
 TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index 44a20d8c45d7..b4c070bcac48 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -305,10 +305,117 @@ static void subtest_multispec_usdt(void)
 	test_usdt__destroy(skel);
 }
 
+static FILE *urand_spawn(int *pid)
+{
+	FILE *f;
+
+	/* urandom_read's stdout is wired into f */
+	f = popen("./urandom_read 1 report-pid", "r");
+	if (!f)
+		return NULL;
+
+	if (fscanf(f, "%d", pid) != 1) {
+		pclose(f);
+		return NULL;
+	}
+
+	return f;
+}
+
+static int urand_trigger(FILE **urand_pipe)
+{
+	int exit_code;
+
+	/* pclose() waits for child process to exit and returns their exit code */
+	exit_code = pclose(*urand_pipe);
+	*urand_pipe = NULL;
+
+	return exit_code;
+}
+
+static void subtest_urandom_usdt(bool auto_attach)
+{
+	struct test_urandom_usdt *skel;
+	struct test_urandom_usdt__bss *bss;
+	struct bpf_link *l;
+	FILE *urand_pipe = NULL;
+	int err, urand_pid = 0;
+
+	skel = test_urandom_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	urand_pipe = urand_spawn(&urand_pid);
+	if (!ASSERT_OK_PTR(urand_pipe, "urand_spawn"))
+		goto cleanup;
+
+	bss = skel->bss;
+	bss->urand_pid = urand_pid;
+
+	if (auto_attach) {
+		err = test_urandom_usdt__attach(skel);
+		if (!ASSERT_OK(err, "skel_auto_attach"))
+			goto cleanup;
+	} else {
+		l = bpf_program__attach_usdt(skel->progs.urand_read_without_sema,
+					     urand_pid, "./urandom_read",
+					     "urand", "read_without_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urand_without_sema_attach"))
+			goto cleanup;
+		skel->links.urand_read_without_sema = l;
+
+		l = bpf_program__attach_usdt(skel->progs.urand_read_with_sema,
+					     urand_pid, "./urandom_read",
+					     "urand", "read_with_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urand_with_sema_attach"))
+			goto cleanup;
+		skel->links.urand_read_with_sema = l;
+
+		l = bpf_program__attach_usdt(skel->progs.urandlib_read_without_sema,
+					     urand_pid, "./liburandom_read.so",
+					     "urandlib", "read_without_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urandlib_without_sema_attach"))
+			goto cleanup;
+		skel->links.urandlib_read_without_sema = l;
+
+		l = bpf_program__attach_usdt(skel->progs.urandlib_read_with_sema,
+					     urand_pid, "./liburandom_read.so",
+					     "urandlib", "read_with_sema", NULL);
+		if (!ASSERT_OK_PTR(l, "urandlib_with_sema_attach"))
+			goto cleanup;
+		skel->links.urandlib_read_with_sema = l;
+
+	}
+
+	/* trigger urandom_read USDTs */
+	ASSERT_OK(urand_trigger(&urand_pipe), "urand_exit_code");
+
+	ASSERT_EQ(bss->urand_read_without_sema_call_cnt, 1, "urand_wo_sema_cnt");
+	ASSERT_EQ(bss->urand_read_without_sema_buf_sz_sum, 256, "urand_wo_sema_sum");
+
+	ASSERT_EQ(bss->urand_read_with_sema_call_cnt, 1, "urand_w_sema_cnt");
+	ASSERT_EQ(bss->urand_read_with_sema_buf_sz_sum, 256, "urand_w_sema_sum");
+
+	ASSERT_EQ(bss->urandlib_read_without_sema_call_cnt, 1, "urandlib_wo_sema_cnt");
+	ASSERT_EQ(bss->urandlib_read_without_sema_buf_sz_sum, 256, "urandlib_wo_sema_sum");
+
+	ASSERT_EQ(bss->urandlib_read_with_sema_call_cnt, 1, "urandlib_w_sema_cnt");
+	ASSERT_EQ(bss->urandlib_read_with_sema_buf_sz_sum, 256, "urandlib_w_sema_sum");
+
+cleanup:
+	if (urand_pipe)
+		pclose(urand_pipe);
+	test_urandom_usdt__destroy(skel);
+}
+
 void test_usdt(void)
 {
 	if (test__start_subtest("basic"))
 		subtest_basic_usdt();
 	if (test__start_subtest("multispec"))
 		subtest_multispec_usdt();
+	if (test__start_subtest("urand_auto_attach"))
+		subtest_urandom_usdt(true /* auto_attach */);
+	if (test__start_subtest("urand_pid_attach"))
+		subtest_urandom_usdt(false /* auto_attach */);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_urandom_usdt.c b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
new file mode 100644
index 000000000000..3539b02bd5f7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_urandom_usdt.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/usdt.bpf.h>
+
+int urand_pid;
+
+int urand_read_without_sema_call_cnt;
+int urand_read_without_sema_buf_sz_sum;
+
+SEC("usdt/./urandom_read:urand:read_without_sema")
+int BPF_USDT(urand_read_without_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urand_read_without_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urand_read_without_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+int urand_read_with_sema_call_cnt;
+int urand_read_with_sema_buf_sz_sum;
+
+SEC("usdt/./urandom_read:urand:read_with_sema")
+int BPF_USDT(urand_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urand_read_with_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urand_read_with_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+int urandlib_read_without_sema_call_cnt;
+int urandlib_read_without_sema_buf_sz_sum;
+
+SEC("usdt/./liburandom_read.so:urandlib:read_without_sema")
+int BPF_USDT(urandlib_read_without_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urandlib_read_without_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urandlib_read_without_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+int urandlib_read_with_sema_call_cnt;
+int urandlib_read_with_sema_buf_sz_sum;
+
+SEC("usdt/./liburandom_read.so:urandlib:read_with_sema")
+int BPF_USDT(urandlib_read_with_sema, int iter_num, int iter_cnt, int buf_sz)
+{
+	if (urand_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&urandlib_read_with_sema_call_cnt, 1);
+	__sync_fetch_and_add(&urandlib_read_with_sema_buf_sz_sum, buf_sz);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index db781052758d..0366f36e2174 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -1,32 +1,85 @@
+#include <stdbool.h>
 #include <stdio.h>
 #include <unistd.h>
+#include <errno.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <stdlib.h>
+#include <signal.h>
+
+#define _SDT_HAS_SEMAPHORES 1
+#include <sys/sdt.h>
+
+#define SEC(name) __attribute__((section(name), used))
 
 #define BUF_SIZE 256
 
+/* defined in urandom_read_aux.c */
+void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz);
+/* these are coming from urandom_read_lib{1,2}.c */
+void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz);
+void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz);
+
+unsigned short urand_read_with_sema_semaphore SEC(".probes");
+
 static __attribute__((noinline))
 void urandom_read(int fd, int count)
 {
-       char buf[BUF_SIZE];
-       int i;
+	char buf[BUF_SIZE];
+	int i;
+
+	for (i = 0; i < count; ++i) {
+		read(fd, buf, BUF_SIZE);
+
+		/* trigger USDTs defined in executable itself */
+		urand_read_without_sema(i, count, BUF_SIZE);
+		STAP_PROBE3(urand, read_with_sema, i, count, BUF_SIZE);
 
-       for (i = 0; i < count; ++i)
-               read(fd, buf, BUF_SIZE);
+		/* trigger USDTs defined in shared lib */
+		urandlib_read_without_sema(i, count, BUF_SIZE);
+		urandlib_read_with_sema(i, count, BUF_SIZE);
+	}
+}
+
+static volatile bool parent_ready;
+
+static void handle_sigpipe(int sig)
+{
+	parent_ready = true;
 }
 
 int main(int argc, char *argv[])
 {
 	int fd = open("/dev/urandom", O_RDONLY);
 	int count = 4;
+	bool report_pid = false;
 
 	if (fd < 0)
 		return 1;
 
-	if (argc == 2)
+	if (argc >= 2)
 		count = atoi(argv[1]);
+	if (argc >= 3) {
+		report_pid = true;
+		/* install SIGPIPE handler to catch when parent closes their
+		 * end of the pipe (on the other side of our stdout)
+		 */
+		signal(SIGPIPE, handle_sigpipe);
+	}
+
+	/* report PID and wait for parent process to send us "signal" by
+	 * closing stdout
+	 */
+	if (report_pid) {
+		while (!parent_ready) {
+			fprintf(stdout, "%d\n", getpid());
+			fflush(stdout);
+		}
+		/* at this point stdout is closed, parent process knows our
+		 * PID and is ready to trace us
+		 */
+	}
 
 	urandom_read(fd, count);
 
diff --git a/tools/testing/selftests/bpf/urandom_read_aux.c b/tools/testing/selftests/bpf/urandom_read_aux.c
new file mode 100644
index 000000000000..88026f21ebfb
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_aux.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <sys/sdt.h>
+
+void urand_read_without_sema(int iter_num, int iter_cnt, int read_sz)
+{
+	/* semaphore-less USDT */
+	STAP_PROBE3(urand, read_without_sema, iter_num, iter_cnt, read_sz);
+}
diff --git a/tools/testing/selftests/bpf/urandom_read_lib1.c b/tools/testing/selftests/bpf/urandom_read_lib1.c
new file mode 100644
index 000000000000..3e1b63b00dfb
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_lib1.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#define _SDT_HAS_SEMAPHORES 1
+#include <sys/sdt.h>
+
+#define SEC(name) __attribute__((section(name), used))
+
+unsigned short urandlib_read_with_sema_semaphore SEC(".probes");
+
+void urandlib_read_with_sema(int iter_num, int iter_cnt, int read_sz)
+{
+	STAP_PROBE3(urandlib, read_with_sema, iter_num, iter_cnt, read_sz);
+}
diff --git a/tools/testing/selftests/bpf/urandom_read_lib2.c b/tools/testing/selftests/bpf/urandom_read_lib2.c
new file mode 100644
index 000000000000..e307a52d07e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/urandom_read_lib2.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+#include <sys/sdt.h>
+
+void urandlib_read_without_sema(int iter_num, int iter_cnt, int read_sz)
+{
+	STAP_PROBE3(urandlib, read_without_sema, iter_num, iter_cnt, read_sz);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
@ 2022-03-30  3:10   ` Hengqi Chen
  2022-03-30 15:22     ` Hengqi Chen
                       ` (2 more replies)
  2022-03-31 11:30   ` Alan Maguire
  2022-03-31 18:34   ` program local storage. Was: " Alexei Starovoitov
  2 siblings, 3 replies; 35+ messages in thread
From: Hengqi Chen @ 2022-03-30  3:10 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel
  Cc: kernel-team, Alan Maguire, Dave Marchevsky

On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
> Add BPF-side implementation of libbpf-provided USDT support. This
> consists of a single-header library, usdt.bpf.h, which is meant to be
> used from the user's BPF-side source code. This header is added to the
> list of installed libbpf headers, along with bpf_helpers.h and others.
> 
> BPF-side implementation consists of two BPF maps:
>   - spec map, which contains "a USDT spec" which encodes information
>     necessary to be able to fetch USDT arguments and other information
>     (argument count, user-provided cookie value, etc) at runtime;
>   - IP-to-spec-ID map, which is only used on kernels that don't support
>     the BPF cookie feature. It allows looking up the spec ID based on
>     the place in the user application that triggers the USDT program.
> 
> These maps have default sizes, 256 and 1024, chosen conservatively so as
> not to waste much space while still handling most common cases. But a
> user application might need to trace a lot of different USDTs, or USDTs
> might be heavily inlined with their arguments located in many differing
> locations. For such cases it might be necessary to size those maps up,
> which libbpf allows by overriding the BPF_USDT_MAX_SPEC_CNT and
> BPF_USDT_MAX_IP_CNT macros.
> 
> One important aspect to keep in mind: a single USDT (the user-space
> equivalent of a kernel tracepoint) can have multiple "call sites". That
> is, a single logical USDT is triggered from multiple places in the user
> application. This can happen due to function inlining. Each such inlined
> instance of a USDT invocation can have its own unique USDT argument
> specification (instructions about the location of the value of each USDT
> argument). So while a USDT looks very similar to a usual uprobe or
> kernel tracepoint, under the hood it's actually a collection of uprobes,
> each potentially needing a different spec to know how to fetch
> arguments.
> 
> The user-visible API consists of three helper functions:
>   - bpf_usdt_arg_cnt(), which returns the number of arguments of the
>     current USDT;
>   - bpf_usdt_arg(), which reads the value of the specified USDT argument
>     (by its zero-indexed position) and returns it as a 64-bit value;
>   - bpf_usdt_cookie(), which functions like the BPF cookie for USDT
>     programs; this is necessary as libbpf doesn't allow specifying an
>     actual BPF cookie and utilizes it internally for its USDT support
>     implementation.
> 
> Each of the bpf_usdt_xxx() APIs expects the struct pt_regs * context
> passed into the BPF program. On kernels that don't support the BPF
> cookie, it is used to fetch the absolute IP address of the underlying
> uprobe.
> 
> usdt.bpf.h also provides the BPF_USDT() macro, which functions like
> BPF_PROG() and BPF_KPROBE() and offers a much more user-friendly way to
> access USDT arguments, if the USDT definition is static and known to the
> user. It is expected that the majority of use cases won't have to use
> bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and that BPF_USDT() will
> cover all their needs.
> 
> Last, usdt.bpf.h utilizes BPF CO-RE for a single purpose: to detect
> kernel support for the BPF cookie. If the BPF CO-RE dependency is
> undesirable, the user application can redefine BPF_USDT_HAS_BPF_COOKIE
> to either a boolean constant (or, equivalently, zero and non-zero), or
> even point it to its own .rodata variable that can be set from the
> application's user-space code. It is important that
> BPF_USDT_HAS_BPF_COOKIE is known to the BPF verifier as a static value
> (thus .rodata and not just .data), as otherwise the BPF code will still
> contain the bpf_get_attach_cookie() BPF helper call and will fail
> validation at runtime unless it is dead-code eliminated.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/Makefile   |   2 +-
>  tools/lib/bpf/usdt.bpf.h | 228 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 229 insertions(+), 1 deletion(-)
>  create mode 100644 tools/lib/bpf/usdt.bpf.h
> 
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index b8b37fe76006..b4fbe8bed555 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -239,7 +239,7 @@ install_lib: all_cmd
>  
>  SRC_HDRS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h xsk.h	     \
>  	    bpf_helpers.h bpf_tracing.h bpf_endian.h bpf_core_read.h	     \
> -	    skel_internal.h libbpf_version.h
> +	    skel_internal.h libbpf_version.h usdt.bpf.h
>  GEN_HDRS := $(BPF_GENERATED)
>  
>  INSTALL_PFX := $(DESTDIR)$(prefix)/include/bpf
> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> new file mode 100644
> index 000000000000..8ee084b2e6b5
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.bpf.h
> @@ -0,0 +1,228 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#ifndef __USDT_BPF_H__
> +#define __USDT_BPF_H__
> +
> +#include <linux/errno.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +/* Below types and maps are internal implementation details of libpf's USDT

typo: libpf -> libbpf

> + * support and are subjects to change. Also, usdt_xxx() API helpers should be
> + * considered an unstable API as well and might be adjusted based on user
> + * feedback from using libbpf's USDT support in production.
> + */
> +
> +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> + * map that keeps track of USDT argument specifications. This might be
> + * necessary if there are a lot of USDT attachments.
> + */
> +#ifndef BPF_USDT_MAX_SPEC_CNT
> +#define BPF_USDT_MAX_SPEC_CNT 256
> +#endif
> +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> + * map that keeps track of IP (memory address) mapping to USDT argument
> + * specification.
> + * Note, if kernel supports BPF cookies, this map is not used and could be
> + * resized all the way to 1 to save a bit of memory.
> + */
> +#ifndef BPF_USDT_MAX_IP_CNT
> +#define BPF_USDT_MAX_IP_CNT 1024
> +#endif
> +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> + * the only dependency on CO-RE, so if it's undesirable, user can override
> + * BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not.
> + */
> +#ifndef BPF_USDT_HAS_BPF_COOKIE
> +#define BPF_USDT_HAS_BPF_COOKIE \
> +	bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> +#endif
> +
> +enum __bpf_usdt_arg_type {
> +	BPF_USDT_ARG_CONST,
> +	BPF_USDT_ARG_REG,
> +	BPF_USDT_ARG_REG_DEREF,
> +};
> +
> +struct __bpf_usdt_arg_spec {
> +	__u64 val_off;
> +	enum __bpf_usdt_arg_type arg_type;
> +	short reg_off;
> +	bool arg_signed;
> +	char arg_bitshift;
> +};
> +
> +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> +#define BPF_USDT_MAX_ARG_CNT 12
> +struct __bpf_usdt_spec {
> +	struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> +	__u64 usdt_cookie;
> +	short arg_cnt;
> +};
> +
> +__weak struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> +	__type(key, int);
> +	__type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs SEC(".maps");
> +
> +__weak struct {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__uint(max_entries, BPF_USDT_MAX_IP_CNT);
> +	__type(key, long);
> +	__type(value, struct __bpf_usdt_spec);

The value type should be int — __bpf_usdt_spec_id() reads an int spec ID
from this map, not a full struct __bpf_usdt_spec.

> +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> +
> +/* don't rely on user's BPF code to have latest definition of bpf_func_id */
> +enum bpf_func_id___usdt {
> +	BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
> +};
> +
> +static inline int __bpf_usdt_spec_id(struct pt_regs *ctx)
> +{
> +	if (!BPF_USDT_HAS_BPF_COOKIE) {
> +		long ip = PT_REGS_IP(ctx);
> +		int *spec_id_ptr;
> +
> +		spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_specs_ip_to_id, &ip);
> +		return spec_id_ptr ? *spec_id_ptr : -ESRCH;
> +	}
> +
> +	return bpf_get_attach_cookie(ctx);
> +}
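
The two-tier resolution above can be sketched in plain user-space C
(all names, addresses, and IDs below are made up for illustration, not
libbpf's): on kernels with bpf_get_attach_cookie() the spec ID simply
travels as the attach cookie; otherwise it is recovered from the
uprobe's instruction pointer via the hash map, which is why multiple
inlined call sites of the same USDT can resolve to one spec ID.

```c
#include <stddef.h>

/* Toy stand-in for the __bpf_usdt_specs_ip_to_id BPF hash map. */
struct ip_entry { long ip; int spec_id; };

static const struct ip_entry ip_to_id[] = {
	{ 0x401000, 0 },	/* two inlined call sites of one USDT... */
	{ 0x401800, 0 },
	{ 0x402200, 1 },	/* ...and a call site of another USDT */
};

static int toy_spec_id(int has_bpf_cookie, long cookie, long ip)
{
	size_t i;

	if (has_bpf_cookie)
		return (int)cookie;	/* modern kernels: cookie is the spec ID */

	/* legacy fallback: map the call site's IP back to its spec ID */
	for (i = 0; i < sizeof(ip_to_id) / sizeof(ip_to_id[0]); i++)
		if (ip_to_id[i].ip == ip)
			return ip_to_id[i].spec_id;
	return -3;			/* -ESRCH: unknown call site */
}
```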
> +
> +/* Return number of USDT arguments defined for currently traced USDT. */
> +__hidden __weak
> +int bpf_usdt_arg_cnt(struct pt_regs *ctx)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	int spec_id;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -EINVAL;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -EINVAL;
> +
> +	return spec->arg_cnt;
> +}
> +
> +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> + * Returns 0 on success; negative error, otherwise.
> + * On error *res is guaranteed to be set to zero.
> + */
> +__hidden __weak
> +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	struct __bpf_usdt_arg_spec *arg_spec;
> +	unsigned long val;
> +	int err, spec_id;
> +
> +	*res = 0;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -ESRCH;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -ESRCH;
> +
> +	if (arg >= spec->arg_cnt)
> +		return -ENOENT;
> +
> +	arg_spec = &spec->args[arg];
> +	switch (arg_spec->arg_type) {
> +	case BPF_USDT_ARG_CONST:
> +		val = arg_spec->val_off;
> +		break;
> +	case BPF_USDT_ARG_REG:
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		break;
> +	case BPF_USDT_ARG_REG_DEREF:
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> +		if (err)
> +			return err;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	val <<= arg_spec->arg_bitshift;
> +	if (arg_spec->arg_signed)
> +		val = ((long)val) >> arg_spec->arg_bitshift;
> +	else
> +		val = val >> arg_spec->arg_bitshift;
> +	*res = val;
> +	return 0;
> +}
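
The shift pair at the end of bpf_usdt_arg() is how sub-register-sized
arguments are sign- or zero-extended to 64 bits: arg_bitshift is 64
minus the argument's bit size, so shifting left and then arithmetically
(or logically) right fills the upper bits. A minimal plain-C sketch of
the same trick:

```c
/* Mirror of the extension logic in bpf_usdt_arg(): bitshift is
 * 64 - <argument bit size>, e.g. 32 for an int argument. Relies on
 * arithmetic right shift of signed values, as the BPF-side code does
 * (true for gcc/clang). */
static long extend_arg(unsigned long val, int bitshift, int is_signed)
{
	val <<= bitshift;
	if (is_signed)
		return ((long)val) >> bitshift;	/* sign-extend */
	return (long)(val >> bitshift);		/* zero-extend */
}
```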
> +
> +/* Retrieve user-specified cookie value provided during attach as
> + * bpf_usdt_opts.usdt_cookie. This serves the same purpose as BPF cookie
> + * returned by bpf_get_attach_cookie(). Libbpf's support for USDT is itself
> + * utilizaing BPF cookies internally, so user can't use BPF cookie directly
> + * for USDT programs and has to use bpf_usdt_cookie() API instead.
> + */
> +__hidden __weak
> +long bpf_usdt_cookie(struct pt_regs *ctx)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	int spec_id;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return 0;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return 0;
> +
> +	return spec->usdt_cookie;
> +}
> +
> +/* we rely on ___bpf_apply() and ___bpf_narg() macros already defined in bpf_tracing.h */
> +#define ___bpf_usdt_args0() ctx
> +#define ___bpf_usdt_args1(x) ___bpf_usdt_args0(), ({ long _x; bpf_usdt_arg(ctx, 0, &_x); (void *)_x; })
> +#define ___bpf_usdt_args2(x, args...) ___bpf_usdt_args1(args), ({ long _x; bpf_usdt_arg(ctx, 1, &_x); (void *)_x; })
> +#define ___bpf_usdt_args3(x, args...) ___bpf_usdt_args2(args), ({ long _x; bpf_usdt_arg(ctx, 2, &_x); (void *)_x; })
> +#define ___bpf_usdt_args4(x, args...) ___bpf_usdt_args3(args), ({ long _x; bpf_usdt_arg(ctx, 3, &_x); (void *)_x; })
> +#define ___bpf_usdt_args5(x, args...) ___bpf_usdt_args4(args), ({ long _x; bpf_usdt_arg(ctx, 4, &_x); (void *)_x; })
> +#define ___bpf_usdt_args6(x, args...) ___bpf_usdt_args5(args), ({ long _x; bpf_usdt_arg(ctx, 5, &_x); (void *)_x; })
> +#define ___bpf_usdt_args7(x, args...) ___bpf_usdt_args6(args), ({ long _x; bpf_usdt_arg(ctx, 6, &_x); (void *)_x; })
> +#define ___bpf_usdt_args8(x, args...) ___bpf_usdt_args7(args), ({ long _x; bpf_usdt_arg(ctx, 7, &_x); (void *)_x; })
> +#define ___bpf_usdt_args9(x, args...) ___bpf_usdt_args8(args), ({ long _x; bpf_usdt_arg(ctx, 8, &_x); (void *)_x; })
> +#define ___bpf_usdt_args10(x, args...) ___bpf_usdt_args9(args), ({ long _x; bpf_usdt_arg(ctx, 9, &_x); (void *)_x; })
> +#define ___bpf_usdt_args11(x, args...) ___bpf_usdt_args10(args), ({ long _x; bpf_usdt_arg(ctx, 10, &_x); (void *)_x; })
> +#define ___bpf_usdt_args12(x, args...) ___bpf_usdt_args11(args), ({ long _x; bpf_usdt_arg(ctx, 11, &_x); (void *)_x; })
> +#define ___bpf_usdt_args(args...) ___bpf_apply(___bpf_usdt_args, ___bpf_narg(args))(args)
> +
> +/*
> + * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
> + * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
> + * Original struct pt_regs * context is preserved as 'ctx' argument.
> + */
> +#define BPF_USDT(name, args...)						    \
> +name(struct pt_regs *ctx);						    \
> +static __attribute__((always_inline)) typeof(name(0))			    \
> +____##name(struct pt_regs *ctx, ##args);				    \
> +typeof(name(0)) name(struct pt_regs *ctx)				    \
> +{									    \
> +        _Pragma("GCC diagnostic push")					    \
> +        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
> +        return ____##name(___bpf_usdt_args(args));			    \
> +        _Pragma("GCC diagnostic pop")					    \
> +}									    \
> +static __attribute__((always_inline)) typeof(name(0))			    \
> +____##name(struct pt_regs *ctx, ##args)
> +
> +#endif /* __USDT_BPF_H__ */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-03-25  5:29 ` [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
@ 2022-03-30  3:24   ` Hengqi Chen
  2022-03-31  5:56     ` Andrii Nakryiko
  2022-03-31 12:13   ` Alan Maguire
  1 sibling, 1 reply; 35+ messages in thread
From: Hengqi Chen @ 2022-03-30  3:24 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel
  Cc: kernel-team, Alan Maguire, Dave Marchevsky



On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
> Wire up libbpf USDT support APIs without yet implementing all the
> nitty-gritty details of USDT discovery, spec parsing, and BPF map
> initialization.
> 
> The user-visible user-space API is simple and conceptually very similar
> to the uprobe API.
> 
> The bpf_program__attach_usdt() API allows programmatically attaching a
> given BPF program to a USDT, specified through a binary path (executable
> or shared lib), USDT provider, and name. Also, just like in the uprobe
> case, a PID filter is specified (0 - self, -1 - any process, or a
> specific PID). Optionally, a USDT cookie value can be specified. A
> single such API invocation will try to discover the given USDT in the
> specified binary and will use (potentially many) BPF uprobes to attach
> this program in the correct locations.
> 
> Just like with other bpf_program__attach_xxx() APIs, a bpf_link is
> returned that represents this attachment. It is a virtual BPF link that
> doesn't have a direct kernel object, as it can consist of multiple
> underlying BPF uprobe links. As such, attachment is not an atomic
> operation and there can be a brief moment when some USDT call sites are
> attached while others are still in the process of attaching. This should
> be taken into consideration by the user. But bpf_program__attach_usdt()
> guarantees that in the case of success all USDT call sites are
> successfully attached, or that all the successful attachments will be
> detached as soon as some USDT call sites fail to be attached. So, in
> theory, a failed bpf_program__attach_usdt() call could still have
> triggered a few USDT program invocations. This is unavoidable due to the
> multi-uprobe nature of USDT and has to be handled by the user, if it's
> important to create an illusion of atomicity.
> 
> USDT BPF programs themselves are marked in BPF source code either as
> SEC("usdt"), in which case they won't be auto-attached through the
> skeleton's <skel>__attach() method, or with a full definition, which
> follows the spirit of fully-specified uprobes:
> SEC("usdt/<path>:<provider>:<name>"). In the latter case the skeleton's
> attach method will attempt auto-attachment. Similarly, the generic
> bpf_program__attach() will have enough information to go off of for
> parameterless attachment.
> 
> USDT BPF programs are actually uprobes, and as such, for the kernel they
> are marked as BPF_PROG_TYPE_KPROBE.
> 
> Another part of this patch is USDT-related feature probing:
>   - BPF cookie support detection from user-space;
>   - detection of kernel support for auto-refcounting of USDT semaphore.
> 
> The latter is optional. If the kernel doesn't support such a feature and
> the USDT doesn't rely on USDT semaphores, no error is returned. But if
> libbpf detects that the USDT requires setting semaphores and the kernel
> doesn't support this, libbpf errors out with an explicit pr_warn()
> message. Libbpf doesn't support poking a process's memory directly to
> increment the semaphore value, like BCC does on legacy kernels, due to
> the inherent raciness and danger of such process memory manipulation.
> Libbpf lets the kernel take care of this properly or gives up.
> 
> Logistically, all the extra USDT-related infrastructure of libbpf is put
> into a separate usdt.c file and abstracted behind struct usdt_manager.
> Each bpf_object has a lazily-initialized usdt_manager pointer, which is
> only instantiated when an attempt is made to attach USDT programs.
> Closing the BPF object frees up usdt_manager resources. usdt_manager
> keeps track of USDT spec ID assignment and a few other small things.
> 
> Subsequent patches will fill out remaining missing pieces of USDT
> initialization and setup logic.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  tools/lib/bpf/Build             |   3 +-
>  tools/lib/bpf/libbpf.c          |  92 ++++++++++-
>  tools/lib/bpf/libbpf.h          |  15 ++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |  19 +++
>  tools/lib/bpf/usdt.c            | 270 ++++++++++++++++++++++++++++++++
>  6 files changed, 391 insertions(+), 9 deletions(-)
>  create mode 100644 tools/lib/bpf/usdt.c
> 
> diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> index 94f0a146bb7b..31a1a9015902 100644
> --- a/tools/lib/bpf/Build
> +++ b/tools/lib/bpf/Build
> @@ -1,3 +1,4 @@
>  libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
>  	    netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o \
> -	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o
> +	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
> +	    usdt.o
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 809fe209cdcc..8841499f5f12 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -483,6 +483,8 @@ struct elf_state {
>  	int st_ops_shndx;
>  };
>  
> +struct usdt_manager;
> +
>  struct bpf_object {
>  	char name[BPF_OBJ_NAME_LEN];
>  	char license[64];
> @@ -545,6 +547,8 @@ struct bpf_object {
>  	size_t fd_array_cap;
>  	size_t fd_array_cnt;
>  
> +	struct usdt_manager *usdt_man;
> +
>  	char path[];
>  };
>  
> @@ -4678,6 +4682,18 @@ static int probe_perf_link(void)
>  	return link_fd < 0 && err == -EBADF;
>  }
>  
> +static int probe_kern_bpf_cookie(void)
> +{
> +	struct bpf_insn insns[] = {
> +		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
> +		BPF_EXIT_INSN(),
> +	};
> +	int ret, insn_cnt = ARRAY_SIZE(insns);
> +
> +	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
> +	return probe_fd(ret);
> +}
> +
>  enum kern_feature_result {
>  	FEAT_UNKNOWN = 0,
>  	FEAT_SUPPORTED = 1,
> @@ -4740,6 +4756,9 @@ static struct kern_feature_desc {
>  	[FEAT_MEMCG_ACCOUNT] = {
>  		"memcg-based memory accounting", probe_memcg_account,
>  	},
> +	[FEAT_BPF_COOKIE] = {
> +		"BPF cookie support", probe_kern_bpf_cookie,
> +	},
>  };
>  
>  bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
> @@ -8200,6 +8219,9 @@ void bpf_object__close(struct bpf_object *obj)
>  	if (obj->clear_priv)
>  		obj->clear_priv(obj, obj->priv);
>  
> +	usdt_manager_free(obj->usdt_man);
> +	obj->usdt_man = NULL;
> +
>  	bpf_gen__free(obj->gen_loader);
>  	bpf_object__elf_finish(obj);
>  	bpf_object_unload(obj);
> @@ -8630,6 +8652,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
>  }
>  
>  static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
> +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
>  static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
>  static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
>  static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_link **link);
> @@ -8647,6 +8670,7 @@ static const struct bpf_sec_def section_defs[] = {
>  	SEC_DEF("uretprobe/",		KPROBE, 0, SEC_NONE),
>  	SEC_DEF("kprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
>  	SEC_DEF("kretprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
> +	SEC_DEF("usdt+",		KPROBE,	0, SEC_NONE, attach_usdt),
>  	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
>  	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX | SEC_DEPRECATED),
>  	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX),
> @@ -9692,14 +9716,6 @@ int bpf_prog_load_deprecated(const char *file, enum bpf_prog_type type,
>  	return bpf_prog_load_xattr2(&attr, pobj, prog_fd);
>  }
>  
> -struct bpf_link {
> -	int (*detach)(struct bpf_link *link);
> -	void (*dealloc)(struct bpf_link *link);
> -	char *pin_path;		/* NULL, if not pinned */
> -	int fd;			/* hook FD, -1 if not applicable */
> -	bool disconnected;
> -};
> -
>  /* Replace link's underlying BPF program with the new one */
>  int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
>  {
> @@ -10599,6 +10615,66 @@ struct bpf_link *bpf_program__attach_uprobe(const struct bpf_program *prog,
>  	return bpf_program__attach_uprobe_opts(prog, pid, binary_path, func_offset, &opts);
>  }
>  
> +struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
> +					  pid_t pid, const char *binary_path,
> +					  const char *usdt_provider, const char *usdt_name,
> +					  const struct bpf_usdt_opts *opts)
> +{
> +	struct bpf_object *obj = prog->obj;
> +	struct bpf_link *link;
> +	long usdt_cookie;
> +	int err;
> +
> +	if (!OPTS_VALID(opts, bpf_usdt_opts))
> +		return libbpf_err_ptr(-EINVAL);
> +
> +	/* USDT manager is instantiated lazily on first USDT attach. It will
> +	 * be destroyed together with BPF object in bpf_object__close().
> +	 */
> +	if (!obj->usdt_man) {
> +		obj->usdt_man = usdt_manager_new(obj);
> +		if (!obj->usdt_man)
> +			return libbpf_err_ptr(-ENOMEM);

usdt_manager_new() returns NULL in two different cases, so -ENOMEM is not
accurate when the support maps are not found.
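One way to make the two failure modes distinguishable is the kernel's ERR_PTR convention: encode the errno in the returned pointer. A minimal sketch of that idea (all names here are illustrative, not libbpf API):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Sketch: encode the errno into the returned pointer (ERR_PTR style) so
 * the caller can tell -ENOENT (support maps missing) from -ENOMEM
 * (allocation failure). Illustrative names, not libbpf API.
 */
static inline void *err_ptr(long err) { return (void *)err; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }
static inline int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-4095L;
}

struct usdt_manager_sketch { int has_bpf_cookie; };

static struct usdt_manager_sketch *usdt_manager_new_sketch(int maps_found)
{
	struct usdt_manager_sketch *man;

	if (!maps_found)
		return err_ptr(-ENOENT);	/* support maps not found */
	man = calloc(1, sizeof(*man));
	if (!man)
		return err_ptr(-ENOMEM);	/* genuine OOM */
	return man;
}
```

The caller would then propagate `ptr_err()` instead of hardcoding -ENOMEM.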


> +	}
> +
> +	usdt_cookie = OPTS_GET(opts, usdt_cookie, 0);
> +	link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path,
> +				        usdt_provider, usdt_name, usdt_cookie);
> +	err = libbpf_get_error(link);
> +	if (err)
> +		return libbpf_err_ptr(err);
> +	return link;
> +}
> +
> +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
> +{
> +	char *path = NULL, *provider = NULL, *name = NULL;
> +	const char *sec_name;
> +
> +	sec_name = bpf_program__section_name(prog);
> +	if (strcmp(sec_name, "usdt") == 0) {
> +		/* no auto-attach for just SEC("usdt") */
> +		*link = NULL;
> +		return 0;
> +	}
> +
> +	if (3 != sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name)) {

Is a Yoda condition good practice here?
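For comparison, here is the same parse written operand-first as a standalone sketch (not the libbpf function itself); behavior is identical to the Yoda form, it is purely a style choice. Note that glibc's `%m` modifier malloc()s each matched string, so every path must free them:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Operand-first version of the sscanf check; standalone sketch. */
static int parse_usdt_sec(const char *sec_name, char **path,
			  char **provider, char **name)
{
	*path = *provider = *name = NULL;

	/* %m asks glibc's sscanf to allocate each matched string */
	if (sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]",
		   path, provider, name) != 3) {
		free(*path);
		free(*provider);
		free(*name);
		*path = *provider = *name = NULL;
		return -1;
	}
	return 0;
}
```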

> +		pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
> +			sec_name);
> +		free(path);
> +		free(provider);
> +		free(name);
> +		return -EINVAL;
> +	}
> +
> +	*link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
> +					 provider, name, NULL);
> +	free(path);
> +	free(provider);
> +	free(name);
> +	return libbpf_get_error(*link);
> +}
> +
>  static int determine_tracepoint_id(const char *tp_category,
>  				   const char *tp_name)
>  {
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 05dde85e19a6..318eecaa14e7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -503,6 +503,21 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
>  				const char *binary_path, size_t func_offset,
>  				const struct bpf_uprobe_opts *opts);
>  
> +struct bpf_usdt_opts {
> +	/* size of this struct, for forward/backward compatibility */
> +	size_t sz;
> +	/* custom user-provided value accessible through usdt_cookie() */
> +	__u64 usdt_cookie;
> +	size_t :0;
> +};
> +#define bpf_usdt_opts__last_field usdt_cookie
> +
> +LIBBPF_API struct bpf_link *
> +bpf_program__attach_usdt(const struct bpf_program *prog,
> +			 pid_t pid, const char *binary_path,
> +			 const char *usdt_provider, const char *usdt_name,
> +			 const struct bpf_usdt_opts *opts);
> +
>  struct bpf_tracepoint_opts {
>  	/* size of this struct, for forward/backward compatiblity */
>  	size_t sz;
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index dd35ee58bfaa..82f6d62176dd 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -444,6 +444,7 @@ LIBBPF_0.8.0 {
>  	global:
>  		bpf_object__destroy_subskeleton;
>  		bpf_object__open_subskeleton;
> +		bpf_program__attach_usdt;
>  		libbpf_register_prog_handler;
>  		libbpf_unregister_prog_handler;
>  		bpf_program__attach_kprobe_multi_opts;
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> index b6247dc7f8eb..dd0d4ccfa649 100644
> --- a/tools/lib/bpf/libbpf_internal.h
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -148,6 +148,15 @@ do {				\
>  #ifndef __has_builtin
>  #define __has_builtin(x) 0
>  #endif
> +
> +struct bpf_link {
> +	int (*detach)(struct bpf_link *link);
> +	void (*dealloc)(struct bpf_link *link);
> +	char *pin_path;		/* NULL, if not pinned */
> +	int fd;			/* hook FD, -1 if not applicable */
> +	bool disconnected;
> +};
> +
>  /*
>   * Re-implement glibc's reallocarray() for libbpf internal-only use.
>   * reallocarray(), unfortunately, is not available in all versions of glibc,
> @@ -329,6 +338,8 @@ enum kern_feature_id {
>  	FEAT_BTF_TYPE_TAG,
>  	/* memcg-based accounting for BPF maps and progs */
>  	FEAT_MEMCG_ACCOUNT,
> +	/* BPF cookie (bpf_get_attach_cookie() BPF helper) support */
> +	FEAT_BPF_COOKIE,
>  	__FEAT_CNT,
>  };
>  
> @@ -543,4 +554,12 @@ int bpf_core_add_cands(struct bpf_core_cand *local_cand,
>  		       struct bpf_core_cand_list *cands);
>  void bpf_core_free_cands(struct bpf_core_cand_list *cands);
>  
> +struct usdt_manager *usdt_manager_new(struct bpf_object *obj);
> +void usdt_manager_free(struct usdt_manager *man);
> +struct bpf_link * usdt_manager_attach_usdt(struct usdt_manager *man,
> +					   const struct bpf_program *prog,
> +					   pid_t pid, const char *path,
> +					   const char *usdt_provider, const char *usdt_name,
> +					   long usdt_cookie);
> +
>  #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> new file mode 100644
> index 000000000000..8481e300598e
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.c
> @@ -0,0 +1,270 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#include <ctype.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <libelf.h>
> +#include <gelf.h>
> +#include <unistd.h>
> +#include <linux/ptrace.h>
> +#include <linux/kernel.h>
> +
> +#include "bpf.h"
> +#include "libbpf.h"
> +#include "libbpf_common.h"
> +#include "libbpf_internal.h"
> +#include "hashmap.h"
> +
> +#define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
> +
> +struct usdt_target {
> +	long abs_ip;
> +	long rel_ip;
> +	long sema_off;
> +};
> +
> +struct usdt_manager {
> +	struct bpf_map *specs_map;
> +	struct bpf_map *ip_to_id_map;
> +
> +	bool has_bpf_cookie;
> +	bool has_sema_refcnt;
> +};
> +
> +struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> +{
> +	static const char *ref_ctr_sysfs_path = "/sys/bus/event_source/devices/uprobe/format/ref_ctr_offset";
> +	struct usdt_manager *man;
> +	struct bpf_map *specs_map, *ip_to_id_map;
> +
> +	specs_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs");
> +	ip_to_id_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs_ip_to_id");
> +	if (!specs_map || !ip_to_id_map) {
> +		pr_warn("usdt: failed to find USDT support BPF maps, did you forget to include bpf/usdt.bpf.h?\n");
> +		return NULL;
> +	}
> +
> +	man = calloc(1, sizeof(*man));
> +	if (!man)
> +		return NULL;
> +
> +	man->specs_map = specs_map;
> +	man->ip_to_id_map = ip_to_id_map;
> +
> +        /* Detect if BPF cookie is supported for kprobes.
> +	 * We don't need IP-to-ID mapping if we can use BPF cookies.
> +         * Added in: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
> +         */

     ^  mixed indentation here (spaces vs. tabs).

> +	man->has_bpf_cookie = kernel_supports(obj, FEAT_BPF_COOKIE);
> +
> +	/* Detect kernel support for automatic refcounting of USDT semaphore.
> +	 * If this is not supported, USDTs with semaphores will not be supported.
> +	 * Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
> +	 */
> +	man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0;
> +
> +	return man;
> +}
> +
> +void usdt_manager_free(struct usdt_manager *man)
> +{
> +	if (!man)
> +		return;
> +
> +	free(man);
> +}
> +
> +static int sanity_check_usdt_elf(Elf *elf, const char *path)
> +{
> +	GElf_Ehdr ehdr;
> +	int endianness;
> +
> +	if (elf_kind(elf) != ELF_K_ELF) {
> +		pr_warn("usdt: unrecognized ELF kind %d for '%s'\n", elf_kind(elf), path);
> +		return -EBADF;
> +	}
> +
> +	switch (gelf_getclass(elf)) {
> +	case ELFCLASS64:
> +		if (sizeof(void *) != 8) {
> +			pr_warn("usdt: attaching to 64-bit ELF binary '%s' is not supported\n", path);
> +			return -EBADF;
> +		}
> +		break;
> +	case ELFCLASS32:
> +		if (sizeof(void *) != 4) {
> +			pr_warn("usdt: attaching to 32-bit ELF binary '%s' is not supported\n", path);
> +			return -EBADF;
> +		}
> +		break;
> +	default:
> +		pr_warn("usdt: unsupported ELF class for '%s'\n", path);
> +		return -EBADF;
> +	}
> +
> +	if (!gelf_getehdr(elf, &ehdr))
> +		return -EINVAL;
> +
> +	if (ehdr.e_type != ET_EXEC && ehdr.e_type != ET_DYN) {
> +		pr_warn("usdt: unsupported type of ELF binary '%s' (%d), only ET_EXEC and ET_DYN are supported\n",
> +			path, ehdr.e_type);
> +		return -EBADF;
> +	}
> +
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +	endianness = ELFDATA2LSB;
> +#elif __BYTE_ORDER == __BIG_ENDIAN
> +	endianness = ELFDATA2MSB;
> +#else
> +# error "Unrecognized __BYTE_ORDER__"
> +#endif
> +	if (endianness != ehdr.e_ident[EI_DATA]) {
> +		pr_warn("usdt: ELF endianness mismatch for '%s'\n", path);
> +		return -EBADF;
> +	}
> +
> +	return 0;
> +}
> +
> +static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
> +				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
> +				struct usdt_target **out_targets, size_t *out_target_cnt)
> +{
> +	return -ENOTSUP;
> +}
> +
> +struct bpf_link_usdt {
> +	struct bpf_link link;
> +
> +	struct usdt_manager *usdt_man;
> +
> +	size_t uprobe_cnt;
> +	struct {
> +		long abs_ip;
> +		struct bpf_link *link;
> +	} *uprobes;
> +};
> +
> +static int bpf_link_usdt_detach(struct bpf_link *link)
> +{
> +	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
> +	int i;
> +
> +	for (i = 0; i < usdt_link->uprobe_cnt; i++) {
> +		/* detach underlying uprobe link */
> +		bpf_link__destroy(usdt_link->uprobes[i].link);
> +	}
> +
> +	return 0;
> +}
> +
> +static void bpf_link_usdt_dealloc(struct bpf_link *link)
> +{
> +	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
> +
> +	free(usdt_link->uprobes);
> +	free(usdt_link);
> +}
> +
> +struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
> +					  pid_t pid, const char *path,
> +					  const char *usdt_provider, const char *usdt_name,
> +					  long usdt_cookie)
> +{
> +	int i, fd, err;
> +	LIBBPF_OPTS(bpf_uprobe_opts, opts);
> +	struct bpf_link_usdt *link = NULL;
> +	struct usdt_target *targets = NULL;
> +	size_t target_cnt;
> +	Elf *elf;
> +
> +	if (bpf_program__fd(prog) < 0) {
> +		pr_warn("prog '%s': can't attach BPF program w/o FD (did you load it?)\n",
> +			bpf_program__name(prog));
> +		return libbpf_err_ptr(-EINVAL);
> +	}
> +
> +	/* TODO: perform path resolution similar to uprobe's */
> +	fd = open(path, O_RDONLY);
> +	if (fd < 0) {
> +		err = -errno;
> +		pr_warn("usdt: failed to open ELF binary '%s': %d\n", path, err);
> +		return libbpf_err_ptr(err);
> +	}
> +
> +	elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
> +	if (!elf) {
> +		err = -EBADF;
> +		pr_warn("usdt: failed to parse ELF binary '%s': %s\n", path, elf_errmsg(-1));
> +		goto err_out;
> +	}
> +
> +	err = sanity_check_usdt_elf(elf, path);
> +	if (err)
> +		goto err_out;
> +
> +	/* normalize PID filter */
> +	if (pid < 0)
> +		pid = -1;
> +	else if (pid == 0)
> +		pid = getpid();
> +
> +	/* discover USDT in given binary, optionally limiting
> +	 * activations to a given PID, if pid > 0
> +	 */
> +	err = collect_usdt_targets(man, elf, path, pid, usdt_provider, usdt_name,
> +				   usdt_cookie, &targets, &target_cnt);
> +	if (err <= 0) {
> +		err = (err == 0) ? -ENOENT : err;
> +		goto err_out;
> +	}
> +
> +	link = calloc(1, sizeof(*link));
> +	if (!link) {
> +		err = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	link->usdt_man = man;
> +	link->link.detach = &bpf_link_usdt_detach;
> +	link->link.dealloc = &bpf_link_usdt_dealloc;
> +
> +	link->uprobes = calloc(target_cnt, sizeof(*link->uprobes));
> +	if (!link->uprobes) {
> +		err = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	for (i = 0; i < target_cnt; i++) {
> +		struct usdt_target *target = &targets[i];
> +		struct bpf_link *uprobe_link;
> +
> +		opts.ref_ctr_offset = target->sema_off;
> +		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
> +							      target->rel_ip, &opts);
> +		err = libbpf_get_error(uprobe_link);
> +		if (err) {
> +			pr_warn("usdt: failed to attach uprobe #%d for '%s:%s' in '%s': %d\n",
> +				i, usdt_provider, usdt_name, path, err);
> +			goto err_out;
> +		}
> +
> +		link->uprobes[i].link = uprobe_link;
> +		link->uprobes[i].abs_ip = target->abs_ip;
> +		link->uprobe_cnt++;
> +	}
> +
> +	elf_end(elf);
> +	close(fd);
> +
> +	return &link->link;
> +
> +err_out:
> +	bpf_link__destroy(&link->link);
> +
> +	if (elf)
> +		elf_end(elf);
> +	close(fd);
> +	return libbpf_err_ptr(err);
> +}

Will test this series and send you feedback :)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-30  3:10   ` Hengqi Chen
@ 2022-03-30 15:22     ` Hengqi Chen
  2022-03-31  5:44       ` Andrii Nakryiko
  2022-03-30 15:36     ` Hengqi Chen
  2022-03-31  5:44     ` Andrii Nakryiko
  2 siblings, 1 reply; 35+ messages in thread
From: Hengqi Chen @ 2022-03-30 15:22 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel
  Cc: kernel-team, Alan Maguire, Dave Marchevsky



On 2022/3/30 11:10 AM, Hengqi Chen wrote:
> On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
>> Add BPF-side implementation of libbpf-provided USDT support. This
>> consists of single header library, usdt.bpf.h, which is meant to be used
>> from user's BPF-side source code. This header is added to the list of
>> installed libbpf header, along bpf_helpers.h and others.
>>
>> BPF-side implementation consists of two BPF maps:
>>   - spec map, which contains "a USDT spec" which encodes information

...

>> +}
>> +
>> +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
>> + * Returns 0 on success; negative error, otherwise.
>> + * On error *res is guaranteed to be set to zero.
>> + */
>> +__hidden __weak
>> +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
>> +{
>> +	struct __bpf_usdt_spec *spec;
>> +	struct __bpf_usdt_arg_spec *arg_spec;
>> +	unsigned long val;
>> +	int err, spec_id;
>> +
>> +	*res = 0;
>> +
>> +	spec_id = __bpf_usdt_spec_id(ctx);
>> +	if (spec_id < 0)
>> +		return -ESRCH;
>> +
>> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
>> +	if (!spec)
>> +		return -ESRCH;
>> +
>> +	if (arg >= spec->arg_cnt)
>> +		return -ENOENT;
>> +
>> +	arg_spec = &spec->args[arg];
>> +	switch (arg_spec->arg_type) {
>> +	case BPF_USDT_ARG_CONST:
>> +		val = arg_spec->val_off;
>> +		break;
>> +	case BPF_USDT_ARG_REG:
>> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
>> +		if (err)
>> +			return err;
>> +		break;
>> +	case BPF_USDT_ARG_REG_DEREF:
>> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
>> +		if (err)
>> +			return err;
>> +		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
>> +		if (err)
>> +			return err;

Can you elaborate on these two probe-read calls?

I replaced bpf_probe_read_kernel() with bpf_probe_read_user(), and it also works.

Thanks.

>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	val <<= arg_spec->arg_bitshift;
>> +	if (arg_spec->arg_signed)
>> +		val = ((long)val) >> arg_spec->arg_bitshift;
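The shift pair quoted above sign- or zero-extends arguments narrower than a register. A user-space sketch of the same trick in isolation (here `bitshift` is 64 minus the argument's bit width, e.g. 32 for a 4-byte argument):

```c
#include <assert.h>

/* Shift the raw value up so its top bit lands in the sign position, then
 * shift back down: arithmetically for signed arguments (sign-extends),
 * logically for unsigned ones (zero-extends). Standalone sketch of the
 * quoted snippet's technique.
 */
static long long extend_arg(unsigned long long raw, int bitshift, int is_signed)
{
	unsigned long long val = raw << bitshift;

	if (is_signed)
		return ((long long)val) >> bitshift;	/* arithmetic shift */
	return (long long)(val >> bitshift);		/* logical shift */
}
```

Right-shifting a negative signed value is technically implementation-defined in C, but is arithmetic on all mainstream compilers, which the quoted code relies on as well.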

>> + * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
>> + * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
>> + * Original struct pt_regs * context is preserved as 'ctx' argument.
>> + */
>> +#define BPF_USDT(name, args...)						    \
>> +name(struct pt_regs *ctx);						    \
>> +static __attribute__((always_inline)) typeof(name(0))			    \
>> +____##name(struct pt_regs *ctx, ##args);				    \
>> +typeof(name(0)) name(struct pt_regs *ctx)				    \
>> +{									    \
>> +        _Pragma("GCC diagnostic push")					    \
>> +        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
>> +        return ____##name(___bpf_usdt_args(args));			    \
>> +        _Pragma("GCC diagnostic pop")					    \
>> +}									    \
>> +static __attribute__((always_inline)) typeof(name(0))			    \
>> +____##name(struct pt_regs *ctx, ##args)
>> +
>> +#endif /* __USDT_BPF_H__ */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-30  3:10   ` Hengqi Chen
  2022-03-30 15:22     ` Hengqi Chen
@ 2022-03-30 15:36     ` Hengqi Chen
  2022-03-31  5:48       ` Andrii Nakryiko
  2022-03-31  5:44     ` Andrii Nakryiko
  2 siblings, 1 reply; 35+ messages in thread
From: Hengqi Chen @ 2022-03-30 15:36 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel
  Cc: kernel-team, Alan Maguire, Dave Marchevsky



On 2022/3/30 11:10 AM, Hengqi Chen wrote:
> On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
>> Add BPF-side implementation of libbpf-provided USDT support. This
>> consists of single header library, usdt.bpf.h, which is meant to be used
>> from user's BPF-side source code. This header is added to the list of
>> installed libbpf header, along bpf_helpers.h and others.
>>
>> BPF-side implementation consists of two BPF maps:
>>   - spec map, which contains "a USDT spec" which encodes information
>>     necessary to be able to fetch USDT arguments and other information
>>     (argument count, user-provided cookie value, etc) at runtime;
>>   - IP-to-spec-ID map, which is only used on kernels that don't support
>>     BPF cookie feature. It allows to lookup spec ID based on the place
>>     in user application that triggers USDT program.
>>
>> These maps have default sizes, 256 and 1024, which are chosen
>> conservatively to not waste a lot of space, but handling a lot of common
>> cases. But there could be cases when user application needs to either
>> trace a lot of different USDTs, or USDTs are heavily inlined and their
>> arguments are located in a lot of differing locations. For such cases it
>> might be necessary to size those maps up, which libbpf allows to do by
>> overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
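The override mechanism described above is plain #ifndef guarding: the user defines the macro before including the header. A minimal reproduction of the pattern, with the defaults copied from the quoted header (in a real .bpf.c file the #define would precede `#include <bpf/usdt.bpf.h>`; here the guarded defaults are inlined to keep the sketch self-contained):

```c
#include <assert.h>

#define BPF_USDT_MAX_SPEC_CNT 1024	/* user override, set first */

/* guarded defaults, as in usdt.bpf.h */
#ifndef BPF_USDT_MAX_SPEC_CNT		/* already defined: skipped */
#define BPF_USDT_MAX_SPEC_CNT 256
#endif
#ifndef BPF_USDT_MAX_IP_CNT		/* not defined: default used */
#define BPF_USDT_MAX_IP_CNT 1024
#endif
```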

>> +
>> +__weak struct {
>> +	__uint(type, BPF_MAP_TYPE_ARRAY);
>> +	__uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
>> +	__type(key, int);
>> +	__type(value, struct __bpf_usdt_spec);
>> +} __bpf_usdt_specs SEC(".maps");
>> +
>> +__weak struct {
>> +	__uint(type, BPF_MAP_TYPE_HASH);
>> +	__uint(max_entries, BPF_USDT_MAX_IP_CNT);
>> +	__type(key, long);
>> +	__type(value, struct __bpf_usdt_spec);
> 
> type should be int.
> 
>> +} __bpf_usdt_specs_ip_to_id SEC(".maps");

These weak symbols make opening the BPF object fail:

libbpf: No offset found in symbol table for VAR __bpf_usdt_specs
libbpf: Error finalizing .BTF: -2.

    bpf_object_open
        bpf_object__finalize_btf
            btf_finalize_data
                btf_fixup_datasec
                    find_elf_var_offset

This is because during BTF fixup, only GLOBAL VARs are allowed.

Applying the following diff works around the issue:

+               unsigned char bind = ELF64_ST_BIND(sym->st_info);
 
-               if (ELF64_ST_BIND(sym->st_info) != STB_GLOBAL ||
+               if ((bind != STB_GLOBAL && bind != STB_WEAK) ||
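The suggested condition can be sketched as a standalone predicate using the standard <elf.h> macros (a sketch of the suggested change, not necessarily the final upstream fix):

```c
#include <assert.h>
#include <elf.h>

/* Accept both GLOBAL and WEAK symbol bindings when resolving a variable's
 * offset in the ELF symbol table, instead of GLOBAL only.
 */
static int var_bind_ok(unsigned char st_info)
{
	unsigned char bind = ELF64_ST_BIND(st_info);

	return bind == STB_GLOBAL || bind == STB_WEAK;
}
```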


>> +#endif /* __USDT_BPF_H__ */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-30 15:22     ` Hengqi Chen
@ 2022-03-31  5:44       ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31  5:44 UTC (permalink / raw)
  To: Hengqi Chen
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Wed, Mar 30, 2022 at 8:22 AM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
>
>
> On 2022/3/30 11:10 AM, Hengqi Chen wrote:
> > On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
> >> Add BPF-side implementation of libbpf-provided USDT support. This
> >> consists of single header library, usdt.bpf.h, which is meant to be used
> >> from user's BPF-side source code. This header is added to the list of
> >> installed libbpf header, along bpf_helpers.h and others.
> >>
> >> BPF-side implementation consists of two BPF maps:
> >>   - spec map, which contains "a USDT spec" which encodes information
>
> ...
>
> >> +}
> >> +
> >> +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> >> + * Returns 0 on success; negative error, otherwise.
> >> + * On error *res is guaranteed to be set to zero.
> >> + */
> >> +__hidden __weak
> >> +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> >> +{
> >> +    struct __bpf_usdt_spec *spec;
> >> +    struct __bpf_usdt_arg_spec *arg_spec;
> >> +    unsigned long val;
> >> +    int err, spec_id;
> >> +
> >> +    *res = 0;
> >> +
> >> +    spec_id = __bpf_usdt_spec_id(ctx);
> >> +    if (spec_id < 0)
> >> +            return -ESRCH;
> >> +
> >> +    spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> >> +    if (!spec)
> >> +            return -ESRCH;
> >> +
> >> +    if (arg >= spec->arg_cnt)
> >> +            return -ENOENT;
> >> +
> >> +    arg_spec = &spec->args[arg];
> >> +    switch (arg_spec->arg_type) {
> >> +    case BPF_USDT_ARG_CONST:
> >> +            val = arg_spec->val_off;
> >> +            break;
> >> +    case BPF_USDT_ARG_REG:
> >> +            err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> >> +            if (err)
> >> +                    return err;
> >> +            break;
> >> +    case BPF_USDT_ARG_REG_DEREF:
> >> +            err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> >> +            if (err)
> >> +                    return err;
> >> +            err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> >> +            if (err)
> >> +                    return err;
>
> Can you elaborate more on these two probe read call ?
>

I can add some comments here for each BPF_USDT_xxx case.

> I replace bpf_probe_read_kernel with bpf_probe_read_user, it also works.
>

You must be running some pretty old kernel on which there is no
bpf_probe_read_{user,kernel} and libbpf "downgrades" them to
bpf_probe_read(), which works for both. It needs to be a kernel read
because we are reading a field from struct pt_regs, which is in kernel
address space.
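A user-space analogue of the two-step BPF_USDT_ARG_REG_DEREF read may make this concrete (plain memcpy() stands in for both probe-read helpers, and struct fake_regs is illustrative, not the kernel's pt_regs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Step 1 mirrors bpf_probe_read_kernel(): pull a register slot out of the
 * regs struct (kernel memory in the real code). Step 2 mirrors
 * bpf_probe_read_user(): treat that value as a user-space address and read
 * at val_off from it.
 */
struct fake_regs { unsigned long di, si, dx; };

static unsigned long reg_deref_arg(const struct fake_regs *regs,
				   size_t reg_off, long val_off)
{
	unsigned long val;

	/* step 1: read the register contents at reg_off within the regs */
	memcpy(&val, (const char *)regs + reg_off, sizeof(val));
	/* step 2: dereference the value as a pointer, plus val_off */
	memcpy(&val, (const char *)val + val_off, sizeof(val));
	return val;
}
```

The reason swapping the first read for bpf_probe_read_user() still "works" on older kernels is explained above: libbpf silently downgrades both to bpf_probe_read().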

> Thanks.
>
> >> +            break;
> >> +    default:
> >> +            return -EINVAL;
> >> +    }
> >> +
> >> +    val <<= arg_spec->arg_bitshift;
> >> +    if (arg_spec->arg_signed)
> >> +            val = ((long)val) >> arg_spec->arg_bitshift;
>
> >> + * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
> >> + * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
> >> + * Original struct pt_regs * context is preserved as 'ctx' argument.
> >> + */
> >> +#define BPF_USDT(name, args...)                                                 \
> >> +name(struct pt_regs *ctx);                                              \
> >> +static __attribute__((always_inline)) typeof(name(0))                           \
> >> +____##name(struct pt_regs *ctx, ##args);                                \
> >> +typeof(name(0)) name(struct pt_regs *ctx)                               \
> >> +{                                                                       \
> >> +        _Pragma("GCC diagnostic push")                                          \
> >> +        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")                  \
> >> +        return ____##name(___bpf_usdt_args(args));                      \
> >> +        _Pragma("GCC diagnostic pop")                                           \
> >> +}                                                                       \
> >> +static __attribute__((always_inline)) typeof(name(0))                           \
> >> +____##name(struct pt_regs *ctx, ##args)
> >> +
> >> +#endif /* __USDT_BPF_H__ */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-30  3:10   ` Hengqi Chen
  2022-03-30 15:22     ` Hengqi Chen
  2022-03-30 15:36     ` Hengqi Chen
@ 2022-03-31  5:44     ` Andrii Nakryiko
  2 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31  5:44 UTC (permalink / raw)
  To: Hengqi Chen
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Tue, Mar 29, 2022 at 8:11 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
> On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
> > Add BPF-side implementation of libbpf-provided USDT support. This
> > consists of single header library, usdt.bpf.h, which is meant to be used
> > from user's BPF-side source code. This header is added to the list of
> > installed libbpf header, along bpf_helpers.h and others.
> >
> > BPF-side implementation consists of two BPF maps:
> >   - spec map, which contains "a USDT spec" which encodes information
> >     necessary to be able to fetch USDT arguments and other information
> >     (argument count, user-provided cookie value, etc) at runtime;
> >   - IP-to-spec-ID map, which is only used on kernels that don't support
> >     BPF cookie feature. It allows to lookup spec ID based on the place
> >     in user application that triggers USDT program.
> >
> > These maps have default sizes, 256 and 1024, which are chosen
> > conservatively to not waste a lot of space, but handling a lot of common
> > cases. But there could be cases when user application needs to either
> > trace a lot of different USDTs, or USDTs are heavily inlined and their
> > arguments are located in a lot of differing locations. For such cases it
> > might be necessary to size those maps up, which libbpf allows to do by
> > overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
> >
> > It is an important aspect to keep in mind. Single USDT (user-space
> > equivalent of kernel tracepoint) can have multiple USDT "call sites".
> > That is, single logical USDT is triggered from multiple places in user
> > application. This can happen due to function inlining. Each such inlined
> > instance of USDT invocation can have its own unique USDT argument
> > specification (instructions about the location of the value of each of
> > USDT arguments). So while USDT looks very similar to usual uprobe or
> > kernel tracepoint, under the hood it's actually a collection of uprobes,
> > each potentially needing different spec to know how to fetch arguments.
> >
> > User-visible API consists of three helper functions:
> >   - bpf_usdt_arg_cnt(), which returns number of arguments of current USDT;
> >   - bpf_usdt_arg(), which reads value of specified USDT argument (by
> >     it's zero-indexed position) and returns it as 64-bit value;
> >   - bpf_usdt_cookie(), which functions like BPF cookie for USDT
> >     programs; this is necessary as libbpf doesn't allow specifying actual
> >     BPF cookie and utilizes it internally for USDT support implementation.
> >
> > Each bpf_usdt_xxx() APIs expect struct pt_regs * context, passed into
> > BPF program. On kernels that don't support BPF cookie it is used to
> > fetch absolute IP address of the underlying uprobe.
> >
> > usdt.bpf.h also provides BPF_USDT() macro, which functions like
> > BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to
> > get access to USDT arguments, if USDT definition is static and known to
> > the user. It is expected that majority of use cases won't have to use
> > bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
> > all their needs.
> >
> > Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
> > detect kernel support for BPF cookie. If BPF CO-RE dependency is
> > undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
> > either a boolean constant (or equivalently zero and non-zero), or even
> > point it to its own .rodata variable that can be specified from user's
> > application user-space code. It is important that
> > BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus
> > .rodata and not just .data), as otherwise BPF code will still contain
> > bpf_get_attach_cookie() BPF helper call and will fail validation at
> > runtime, if not dead-code eliminated.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  tools/lib/bpf/Makefile   |   2 +-
> >  tools/lib/bpf/usdt.bpf.h | 228 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 229 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/lib/bpf/usdt.bpf.h
> >
> > diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> > index b8b37fe76006..b4fbe8bed555 100644
> > --- a/tools/lib/bpf/Makefile
> > +++ b/tools/lib/bpf/Makefile
> > @@ -239,7 +239,7 @@ install_lib: all_cmd
> >
> >  SRC_HDRS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h xsk.h            \
> >           bpf_helpers.h bpf_tracing.h bpf_endian.h bpf_core_read.h         \
> > -         skel_internal.h libbpf_version.h
> > +         skel_internal.h libbpf_version.h usdt.bpf.h
> >  GEN_HDRS := $(BPF_GENERATED)
> >
> >  INSTALL_PFX := $(DESTDIR)$(prefix)/include/bpf
> > diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> > new file mode 100644
> > index 000000000000..8ee084b2e6b5
> > --- /dev/null
> > +++ b/tools/lib/bpf/usdt.bpf.h
> > @@ -0,0 +1,228 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +#ifndef __USDT_BPF_H__
> > +#define __USDT_BPF_H__
> > +
> > +#include <linux/errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_core_read.h>
> > +
> > +/* Below types and maps are internal implementation details of libpf's USDT
>
> typo: libpf -> libbpf
>
> > + * support and are subjects to change. Also, usdt_xxx() API helpers should be
> > + * considered an unstable API as well and might be adjusted based on user
> > + * feedback from using libbpf's USDT support in production.
> > + */
> > +
> > +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> > + * map that keeps track of USDT argument specifications. This might be
> > + * necessary if there are a lot of USDT attachments.
> > + */
> > +#ifndef BPF_USDT_MAX_SPEC_CNT
> > +#define BPF_USDT_MAX_SPEC_CNT 256
> > +#endif
> > +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> > + * map that keeps track of IP (memory address) mapping to USDT argument
> > + * specification.
> > + * Note, if kernel supports BPF cookies, this map is not used and could be
> > + * resized all the way to 1 to save a bit of memory.
> > + */
> > +#ifndef BPF_USDT_MAX_IP_CNT
> > +#define BPF_USDT_MAX_IP_CNT 1024
> > +#endif
> > +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> > + * the only dependency on CO-RE, so if it's undesirable, user can override
> > + * BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not.
> > + */
> > +#ifndef BPF_USDT_HAS_BPF_COOKIE
> > +#define BPF_USDT_HAS_BPF_COOKIE \
> > +     bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> > +#endif
> > +
> > +enum __bpf_usdt_arg_type {
> > +     BPF_USDT_ARG_CONST,
> > +     BPF_USDT_ARG_REG,
> > +     BPF_USDT_ARG_REG_DEREF,
> > +};
> > +
> > +struct __bpf_usdt_arg_spec {
> > +     __u64 val_off;
> > +     enum __bpf_usdt_arg_type arg_type;
> > +     short reg_off;
> > +     bool arg_signed;
> > +     char arg_bitshift;
> > +};
> > +
> > +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> > +#define BPF_USDT_MAX_ARG_CNT 12
> > +struct __bpf_usdt_spec {
> > +     struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> > +     __u64 usdt_cookie;
> > +     short arg_cnt;
> > +};
> > +
> > +__weak struct {
> > +     __uint(type, BPF_MAP_TYPE_ARRAY);
> > +     __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> > +     __type(key, int);
> > +     __type(value, struct __bpf_usdt_spec);
> > +} __bpf_usdt_specs SEC(".maps");
> > +
> > +__weak struct {
> > +     __uint(type, BPF_MAP_TYPE_HASH);
> > +     __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> > +     __type(key, long);
> > +     __type(value, struct __bpf_usdt_spec);
>
> type should be int.

yep, fixed that locally already, thanks

>
> > +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> > +
> > +/* don't rely on user's BPF code to have latest definition of bpf_func_id */
> > +enum bpf_func_id___usdt {
> > +     BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
> > +};
> > +
> > +static inline int __bpf_usdt_spec_id(struct pt_regs *ctx)
> > +{
> > +     if (!BPF_USDT_HAS_BPF_COOKIE) {
> > +             long ip = PT_REGS_IP(ctx);
> > +             int *spec_id_ptr;
> > +
> > +             spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_specs_ip_to_id, &ip);
> > +             return spec_id_ptr ? *spec_id_ptr : -ESRCH;
> > +     }
> > +
> > +     return bpf_get_attach_cookie(ctx);
> > +}
> > +
> > +/* Return number of USDT arguments defined for currently traced USDT. */
> > +__hidden __weak
> > +int bpf_usdt_arg_cnt(struct pt_regs *ctx)
> > +{
> > +     struct __bpf_usdt_spec *spec;
> > +     int spec_id;
> > +
> > +     spec_id = __bpf_usdt_spec_id(ctx);
> > +     if (spec_id < 0)
> > +             return -EINVAL;
> > +
> > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +     if (!spec)
> > +             return -EINVAL;
> > +
> > +     return spec->arg_cnt;
> > +}
> > +
> > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res*.
> > + * Returns 0 on success; negative error, otherwise.
> > + * On error *res is guaranteed to be set to zero.
> > + */
> > +__hidden __weak
> > +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> > +{
> > +     struct __bpf_usdt_spec *spec;
> > +     struct __bpf_usdt_arg_spec *arg_spec;
> > +     unsigned long val;
> > +     int err, spec_id;
> > +
> > +     *res = 0;
> > +
> > +     spec_id = __bpf_usdt_spec_id(ctx);
> > +     if (spec_id < 0)
> > +             return -ESRCH;
> > +
> > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +     if (!spec)
> > +             return -ESRCH;
> > +
> > +     if (arg >= spec->arg_cnt)
> > +             return -ENOENT;
> > +
> > +     arg_spec = &spec->args[arg];
> > +     switch (arg_spec->arg_type) {
> > +     case BPF_USDT_ARG_CONST:
> > +             val = arg_spec->val_off;
> > +             break;
> > +     case BPF_USDT_ARG_REG:
> > +             err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> > +             if (err)
> > +                     return err;
> > +             break;
> > +     case BPF_USDT_ARG_REG_DEREF:
> > +             err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> > +             if (err)
> > +                     return err;
> > +             err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> > +             if (err)
> > +                     return err;
> > +             break;
> > +     default:
> > +             return -EINVAL;
> > +     }
> > +
> > +     val <<= arg_spec->arg_bitshift;
> > +     if (arg_spec->arg_signed)
> > +             val = ((long)val) >> arg_spec->arg_bitshift;
> > +     else
> > +             val = val >> arg_spec->arg_bitshift;
> > +     *res = val;
> > +     return 0;
> > +}
> > +
> > +/* Retrieve user-specified cookie value provided during attach as
> > + * bpf_usdt_opts.usdt_cookie. This serves the same purpose as BPF cookie
> > + * returned by bpf_get_attach_cookie(). Libbpf's support for USDT is itself
> > + * utilizing BPF cookies internally, so user can't use BPF cookie directly
> > + * for USDT programs and has to use bpf_usdt_cookie() API instead.
> > + */
> > +__hidden __weak
> > +long bpf_usdt_cookie(struct pt_regs *ctx)
> > +{
> > +     struct __bpf_usdt_spec *spec;
> > +     int spec_id;
> > +
> > +     spec_id = __bpf_usdt_spec_id(ctx);
> > +     if (spec_id < 0)
> > +             return 0;
> > +
> > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +     if (!spec)
> > +             return 0;
> > +
> > +     return spec->usdt_cookie;
> > +}
> > +
> > +/* we rely on ___bpf_apply() and ___bpf_narg() macros already defined in bpf_tracing.h */
> > +#define ___bpf_usdt_args0() ctx
> > +#define ___bpf_usdt_args1(x) ___bpf_usdt_args0(), ({ long _x; bpf_usdt_arg(ctx, 0, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args2(x, args...) ___bpf_usdt_args1(args), ({ long _x; bpf_usdt_arg(ctx, 1, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args3(x, args...) ___bpf_usdt_args2(args), ({ long _x; bpf_usdt_arg(ctx, 2, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args4(x, args...) ___bpf_usdt_args3(args), ({ long _x; bpf_usdt_arg(ctx, 3, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args5(x, args...) ___bpf_usdt_args4(args), ({ long _x; bpf_usdt_arg(ctx, 4, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args6(x, args...) ___bpf_usdt_args5(args), ({ long _x; bpf_usdt_arg(ctx, 5, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args7(x, args...) ___bpf_usdt_args6(args), ({ long _x; bpf_usdt_arg(ctx, 6, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args8(x, args...) ___bpf_usdt_args7(args), ({ long _x; bpf_usdt_arg(ctx, 7, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args9(x, args...) ___bpf_usdt_args8(args), ({ long _x; bpf_usdt_arg(ctx, 8, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args10(x, args...) ___bpf_usdt_args9(args), ({ long _x; bpf_usdt_arg(ctx, 9, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args11(x, args...) ___bpf_usdt_args10(args), ({ long _x; bpf_usdt_arg(ctx, 10, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args12(x, args...) ___bpf_usdt_args11(args), ({ long _x; bpf_usdt_arg(ctx, 11, &_x); (void *)_x; })
> > +#define ___bpf_usdt_args(args...) ___bpf_apply(___bpf_usdt_args, ___bpf_narg(args))(args)
> > +
> > +/*
> > + * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
> > + * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
> > + * Original struct pt_regs * context is preserved as 'ctx' argument.
> > + */
> > +#define BPF_USDT(name, args...)                                                  \
> > +name(struct pt_regs *ctx);                                               \
> > +static __attribute__((always_inline)) typeof(name(0))                            \
> > +____##name(struct pt_regs *ctx, ##args);                                 \
> > +typeof(name(0)) name(struct pt_regs *ctx)                                \
> > +{                                                                        \
> > +        _Pragma("GCC diagnostic push")                                           \
> > +        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")                   \
> > +        return ____##name(___bpf_usdt_args(args));                       \
> > +        _Pragma("GCC diagnostic pop")                                            \
> > +}                                                                        \
> > +static __attribute__((always_inline)) typeof(name(0))                            \
> > +____##name(struct pt_regs *ctx, ##args)
> > +
> > +#endif /* __USDT_BPF_H__ */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-30 15:36     ` Hengqi Chen
@ 2022-03-31  5:48       ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31  5:48 UTC (permalink / raw)
  To: Hengqi Chen
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Wed, Mar 30, 2022 at 8:36 AM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
>
>
> On 2022/3/30 11:10 AM, Hengqi Chen wrote:
> > On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
> >> Add BPF-side implementation of libbpf-provided USDT support. This
> >> consists of single header library, usdt.bpf.h, which is meant to be used
> >> from user's BPF-side source code. This header is added to the list of
> >> installed libbpf headers, along with bpf_helpers.h and others.
> >>
> >> BPF-side implementation consists of two BPF maps:
> >>   - spec map, which contains "a USDT spec" which encodes information
> >>     necessary to be able to fetch USDT arguments and other information
> >>     (argument count, user-provided cookie value, etc) at runtime;
> >>   - IP-to-spec-ID map, which is only used on kernels that don't support
> >>     BPF cookie feature. It allows to lookup spec ID based on the place
> >>     in user application that triggers USDT program.
> >>
> >> These maps have default sizes, 256 and 1024, which are chosen
> >> conservatively to not waste a lot of space while handling a lot of common
> >> cases. But there could be cases when user application needs to either
> >> trace a lot of different USDTs, or USDTs are heavily inlined and their
> >> arguments are located in a lot of differing locations. For such cases it
> >> might be necessary to size those maps up, which libbpf allows users to do by
> >> overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
>
> >> +
> >> +__weak struct {
> >> +    __uint(type, BPF_MAP_TYPE_ARRAY);
> >> +    __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> >> +    __type(key, int);
> >> +    __type(value, struct __bpf_usdt_spec);
> >> +} __bpf_usdt_specs SEC(".maps");
> >> +
> >> +__weak struct {
> >> +    __uint(type, BPF_MAP_TYPE_HASH);
> >> +    __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> >> +    __type(key, long);
> >> +    __type(value, struct __bpf_usdt_spec);
> >
> > type should be int.
> >
> >> +} __bpf_usdt_specs_ip_to_id SEC(".maps");
>
> These weak symbols make BPF object open failed:
>
> libbpf: No offset found in symbol table for VAR __bpf_usdt_specs
> libbpf: Error finalizing .BTF: -2.
>
>     bpf_object_open
>         bpf_object__finalize_btf
>             btf_finalize_data
>                 btf_fixup_datasec
>                     find_elf_var_offset
>
> This is because during BTF fixup, we only allow GLOBAL VAR.
>
> Applying the following diff can workaround the issue:
>
> +               unsigned char bind = ELF64_ST_BIND(sym->st_info);
>
> -               if (ELF64_ST_BIND(sym->st_info) != STB_GLOBAL ||
> +               if ((bind != STB_GLOBAL && bind != STB_WEAK) ||
>
>

Interesting that selftests don't run into this bug, probably because
BPF linker converts STB_WEAK into STB_GLOBAL? I'll check that, thanks
for catching!

> >> +#endif /* __USDT_BPF_H__ */


* Re: [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-03-30  3:24   ` Hengqi Chen
@ 2022-03-31  5:56     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31  5:56 UTC (permalink / raw)
  To: Hengqi Chen
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Tue, Mar 29, 2022 at 8:25 PM Hengqi Chen <hengqi.chen@gmail.com> wrote:
>
>
>
> On 2022/3/25 1:29 PM, Andrii Nakryiko wrote:
> > Wire up libbpf USDT support APIs without yet implementing all the
> > nitty-gritty details of USDT discovery, spec parsing, and BPF map
> > initialization.
> >
> > User-visible user-space API is simple and is conceptually very similar
> > to uprobe API.
> >
> > bpf_program__attach_usdt() API allows programmatically attaching a given
> > BPF program to a USDT, specified through binary path (executable or
> > shared lib), USDT provider and name. Also, just like in uprobe case, PID
> > filter is specified (0 - self, -1 - any process, or specific PID).
> > Optionally, USDT cookie value can be specified. Such single API
> > invocation will try to discover given USDT in specified binary and will
> > use (potentially many) BPF uprobes to attach this program in correct
> > locations.
> >
> > Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
> > represents this attachment. It is a virtual BPF link that doesn't have
> > direct kernel object, as it can consist of multiple underlying BPF
> > uprobe links. As such, attachment is not atomic operation and there can
> > be brief moment when some USDT call sites are attached while others are
> > still in the process of attaching. This should be taken into
> > consideration by user. But bpf_program__attach_usdt() guarantees that
> > in the case of success all USDT call sites are successfully attached, or
> > all the successful attachments will be detached as soon as some USDT
> > call sites failed to be attached. So, in theory, there could be cases of
> > failed bpf_program__attach_usdt() call which did trigger few USDT
> > program invocations. This is unavoidable due to multi-uprobe nature of
> > USDT and has to be handled by user, if it's important to create an
> > illusion of atomicity.
> >
> > USDT BPF programs themselves are marked in BPF source code as either
> > SEC("usdt"), in which case they won't be auto-attached through
> > skeleton's <skel>__attach() method, or it can have a full definition,
> > which follows the spirit of fully-specified uprobes:
> > SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
> > attach method will attempt auto-attachment. Similarly, generic
> > bpf_program__attach() will have enough information to go off of for
> > parameterless attachment.
> >
> > USDT BPF programs are actually uprobes, and as such for kernel they are
> > marked as BPF_PROG_TYPE_KPROBE.
> >
> > Another part of this patch is USDT-related feature probing:
> >   - BPF cookie support detection from user-space;
> >   - detection of kernel support for auto-refcounting of USDT semaphore.
> >
> > The latter is optional. If kernel doesn't support such feature and USDT
> > doesn't rely on USDT semaphores, no error is returned. But if libbpf
> > detects that USDT requires setting semaphores and kernel doesn't support
> > this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
> > support poking process's memory directly to increment semaphore value,
> > like BCC does on legacy kernels, due to inherent raciness and danger of
> > such process memory manipulation. Libbpf lets the kernel take care of this
> > properly or gives up.
> >
> > Logistically, all the extra USDT-related infrastructure of libbpf is put
> > into a separate usdt.c file and abstracted behind struct usdt_manager.
> > Each bpf_object has lazily-initialized usdt_manager pointer, which is
> > only instantiated if USDT programs are attempted to be attached. Closing
> > BPF object frees up usdt_manager resources. usdt_manager keeps track of
> > USDT spec ID assignment and few other small things.
> >
> > Subsequent patches will fill out remaining missing pieces of USDT
> > initialization and setup logic.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  tools/lib/bpf/Build             |   3 +-
> >  tools/lib/bpf/libbpf.c          |  92 ++++++++++-
> >  tools/lib/bpf/libbpf.h          |  15 ++
> >  tools/lib/bpf/libbpf.map        |   1 +
> >  tools/lib/bpf/libbpf_internal.h |  19 +++
> >  tools/lib/bpf/usdt.c            | 270 ++++++++++++++++++++++++++++++++
> >  6 files changed, 391 insertions(+), 9 deletions(-)
> >  create mode 100644 tools/lib/bpf/usdt.c
> >

[...]

> > +struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
> > +                                       pid_t pid, const char *binary_path,
> > +                                       const char *usdt_provider, const char *usdt_name,
> > +                                       const struct bpf_usdt_opts *opts)
> > +{
> > +     struct bpf_object *obj = prog->obj;
> > +     struct bpf_link *link;
> > +     long usdt_cookie;
> > +     int err;
> > +
> > +     if (!OPTS_VALID(opts, bpf_uprobe_opts))
> > +             return libbpf_err_ptr(-EINVAL);
> > +
> > +     /* USDT manager is instantiated lazily on first USDT attach. It will
> > +      * be destroyed together with BPF object in bpf_object__close().
> > +      */
> > +     if (!obj->usdt_man) {
> > +             obj->usdt_man = usdt_manager_new(obj);
> > +             if (!obj->usdt_man)
> > +                     return libbpf_err_ptr(-ENOMEM);
>
> usdt_manager_new returns NULL in two cases; -ENOMEM is not accurate when the map is not found.
>
>

True, we can use ERR_PTR() for usdt_manager_new() as it is an internal
API. I'll update the code accordingly.

> > +     }
> > +
> > +     usdt_cookie = OPTS_GET(opts, usdt_cookie, 0);
> > +     link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path,
> > +                                     usdt_provider, usdt_name, usdt_cookie);
> > +     err = libbpf_get_error(link);
> > +     if (err)
> > +             return libbpf_err_ptr(err);
> > +     return link;
> > +}
> > +
> > +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
> > +{
> > +     char *path = NULL, *provider = NULL, *name = NULL;
> > +     const char *sec_name;
> > +
> > +     sec_name = bpf_program__section_name(prog);
> > +     if (strcmp(sec_name, "usdt") == 0) {
> > +             /* no auto-attach for just SEC("usdt") */
> > +             *link = NULL;
> > +             return 0;
> > +     }
> > +
> > +     if (3 != sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name)) {
>
> Is yoda condition a good practice ?

I used it to emphasize and make it clear how many parts we expect, but
I have no strong feeling about doing sscanf() == 3 in this case
either.

>
> > +             pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
> > +                     sec_name);
> > +             free(path);
> > +             free(provider);
> > +             free(name);
> > +             return -EINVAL;
> > +     }
> > +

[...]

> > +     man = calloc(1, sizeof(*man));
> > +     if (!man)
> > +             return NULL;
> > +
> > +     man->specs_map = specs_map;
> > +     man->ip_to_id_map = ip_to_id_map;
> > +
> > +        /* Detect if BPF cookie is supported for kprobes.
> > +      * We don't need IP-to-ID mapping if we can use BPF cookies.
> > +         * Added in: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
> > +         */
>
>      ^  mixed-indention here.

will fix

>
> > +     man->has_bpf_cookie = kernel_supports(obj, FEAT_BPF_COOKIE);
> > +
> > +     /* Detect kernel support for automatic refcounting of USDT semaphore.
> > +      * If this is not supported, USDTs with semaphores will not be supported.
> > +      * Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
> > +      */
> > +     man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0;
> > +
> > +     return man;
> > +}
> > +

[...]

> > +err_out:
> > +     bpf_link__destroy(&link->link);
> > +
> > +     if (elf)
> > +             elf_end(elf);
> > +     close(fd);
> > +     return libbpf_err_ptr(err);
> > +}
>
> Will test this series and feedback to you :)
>

Great, thank you!

I'll add a bunch more comments to explain the overall "setup" and make a few
more small changes here and there and will post v2 soon-ish. But all
the APIs and behavior won't change.


* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
  2022-03-30  3:10   ` Hengqi Chen
@ 2022-03-31 11:30   ` Alan Maguire
  2022-03-31 18:49     ` Andrii Nakryiko
  2022-03-31 18:34   ` program local storage. Was: " Alexei Starovoitov
  2 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 11:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Add BPF-side implementation of libbpf-provided USDT support. This
> consists of single header library, usdt.bpf.h, which is meant to be used
> from user's BPF-side source code. This header is added to the list of
> installed libbpf headers, along with bpf_helpers.h and others.
>

<snip>

Some suggestions below, but nothing major.

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
 
> diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> new file mode 100644
> index 000000000000..8ee084b2e6b5
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.bpf.h
> @@ -0,0 +1,228 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#ifndef __USDT_BPF_H__
> +#define __USDT_BPF_H__
> +
> +#include <linux/errno.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> > +/* Below types and maps are internal implementation details of libbpf's USDT
> > + * support and are subject to change. Also, usdt_xxx() API helpers should be
> + * considered an unstable API as well and might be adjusted based on user
> + * feedback from using libbpf's USDT support in production.
> + */
> +
> +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> + * map that keeps track of USDT argument specifications. This might be
> + * necessary if there are a lot of USDT attachments.
> + */
> +#ifndef BPF_USDT_MAX_SPEC_CNT
> +#define BPF_USDT_MAX_SPEC_CNT 256
> +#endif
> +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> + * map that keeps track of IP (memory address) mapping to USDT argument
> + * specification.
> + * Note, if kernel supports BPF cookies, this map is not used and could be
> + * resized all the way to 1 to save a bit of memory.
> + */
> +#ifndef BPF_USDT_MAX_IP_CNT
> +#define BPF_USDT_MAX_IP_CNT 1024
> +#endif

might be no harm to just make this default to a reasonable multiple of 
BPF_USDT_MAX_SPEC_CNT; i.e. n specs X m possible sites. Would allow users
to simply override the MAX_SPEC_CNT in most cases too.

> +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> + * the only dependency on CO-RE, so if it's undesirable, user can override
> > + * BPF_USDT_HAS_BPF_COOKIE to specify whether BPF cookie is supported or not.
> + */
> +#ifndef BPF_USDT_HAS_BPF_COOKIE
> +#define BPF_USDT_HAS_BPF_COOKIE \
> +	bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> +#endif
> +
> +enum __bpf_usdt_arg_type {
> +	BPF_USDT_ARG_CONST,
> +	BPF_USDT_ARG_REG,
> +	BPF_USDT_ARG_REG_DEREF,
> +};
> +
> +struct __bpf_usdt_arg_spec {
> +	__u64 val_off;
> +	enum __bpf_usdt_arg_type arg_type;
> +	short reg_off;
> +	bool arg_signed;
> +	char arg_bitshift;

would be no harm having a small comment here or below where the 
bitshifting is done like "for arg sizes less than 8 bytes, this tells
us how many bits to shift left and then right to
remove the unused bits, giving correct arg value".

> +};
> +
> +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> +#define BPF_USDT_MAX_ARG_CNT 12
> +struct __bpf_usdt_spec {
> +	struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> +	__u64 usdt_cookie;
> +	short arg_cnt;
> +};
> +
> +__weak struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> +	__type(key, int);
> +	__type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs SEC(".maps");
> +
> +__weak struct {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__uint(max_entries, BPF_USDT_MAX_IP_CNT);
> +	__type(key, long);
> +	__type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> +
> +/* don't rely on user's BPF code to have latest definition of bpf_func_id */
> +enum bpf_func_id___usdt {
> +	BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
> +};
> +
> +static inline int __bpf_usdt_spec_id(struct pt_regs *ctx)
> +{
> +	if (!BPF_USDT_HAS_BPF_COOKIE) {
> +		long ip = PT_REGS_IP(ctx);

Trying to sort out the permutations of features, I _think_ it is possible
the user has CO-RE support, but the clang version doesn't support the
push of the preserve_access_index attribute? Would it be feasible to
do an explicit "PT_REGS_IP_CORE(ctx);" here?

> +		int *spec_id_ptr;
> +
> +		spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_specs_ip_to_id, &ip);
> +		return spec_id_ptr ? *spec_id_ptr : -ESRCH;
> +	}
> +
> +	return bpf_get_attach_cookie(ctx);

should we grab the result in a u64 and handle the 0 case here - 
meaning "not specified" - and return -ESRCH?

> +}
> +
> +/* Return number of USDT arguments defined for currently traced USDT. */
> +__hidden __weak
> +int bpf_usdt_arg_cnt(struct pt_regs *ctx)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	int spec_id;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -EINVAL;

spec_id can be 0 for the "cookie not set" case (see above).

should we pass through the error value from __bpf_usdt_spec_id()? Looking
above it's either -ESRCH or 0, but if we catch the 0 case as above we 
could just pass through the error value.
 
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -EINVAL;
> +

should this be -ESRCH? we know from the above we had a valid
spec_id.

> +	return spec->arg_cnt;
> +}

also, since in every case (I think) that we call __bpf_usdt_spec_id()
we go on to look up the spec in the map, would it be easier to
combine both operations and have

struct __bpf_usdt_spec * __bpf_usdt_spec(struct pt_regs *ctx);

?

> +
> > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res*.
> + * Returns 0 on success; negative error, otherwise.
> + * On error *res is guaranteed to be set to zero.
> + */
> +__hidden __weak
> +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	struct __bpf_usdt_arg_spec *arg_spec;
> +	unsigned long val;
> +	int err, spec_id;
> +
> +	*res = 0;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return -ESRCH;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return -ESRCH;
> +
> +	if (arg >= spec->arg_cnt)
> +		return -ENOENT;
> +

I'm surprised you didn't need to check for negative values or a hard 
upper bound for the arg index here (to keep the verifier happy for
the later array indexing using arg). Any dangers that an older
LLVM+clang would generate code that might get tripped up on
verification with this?

> +	arg_spec = &spec->args[arg];
> +	switch (arg_spec->arg_type) {
> +	case BPF_USDT_ARG_CONST:
> +		val = arg_spec->val_off;
> +		break;
> +	case BPF_USDT_ARG_REG:
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		break;
> +	case BPF_USDT_ARG_REG_DEREF:
> +		err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
> +		if (err)
> +			return err;
> +		err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off);
> +		if (err)
> +			return err;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	val <<= arg_spec->arg_bitshift;
> +	if (arg_spec->arg_signed)
> +		val = ((long)val) >> arg_spec->arg_bitshift;
> +	else
> +		val = val >> arg_spec->arg_bitshift;
> +	*res = val;
> +	return 0;
> +}
> +
> +/* Retrieve user-specified cookie value provided during attach as
> + * bpf_usdt_opts.usdt_cookie. This serves the same purpose as BPF cookie
> + * returned by bpf_get_attach_cookie(). Libbpf's support for USDT is itself
> > + * utilizing BPF cookies internally, so user can't use BPF cookie directly
> + * for USDT programs and has to use bpf_usdt_cookie() API instead.
> + */
> +__hidden __weak
> +long bpf_usdt_cookie(struct pt_regs *ctx)
> +{
> +	struct __bpf_usdt_spec *spec;
> +	int spec_id;
> +
> +	spec_id = __bpf_usdt_spec_id(ctx);
> +	if (spec_id < 0)
> +		return 0;
> +
> +	spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +	if (!spec)
> +		return 0;
> +
> +	return spec->usdt_cookie;
> +}
> +
> +/* we rely on ___bpf_apply() and ___bpf_narg() macros already defined in bpf_tracing.h */
> +#define ___bpf_usdt_args0() ctx
> +#define ___bpf_usdt_args1(x) ___bpf_usdt_args0(), ({ long _x; bpf_usdt_arg(ctx, 0, &_x); (void *)_x; })
> +#define ___bpf_usdt_args2(x, args...) ___bpf_usdt_args1(args), ({ long _x; bpf_usdt_arg(ctx, 1, &_x); (void *)_x; })
> +#define ___bpf_usdt_args3(x, args...) ___bpf_usdt_args2(args), ({ long _x; bpf_usdt_arg(ctx, 2, &_x); (void *)_x; })
> +#define ___bpf_usdt_args4(x, args...) ___bpf_usdt_args3(args), ({ long _x; bpf_usdt_arg(ctx, 3, &_x); (void *)_x; })
> +#define ___bpf_usdt_args5(x, args...) ___bpf_usdt_args4(args), ({ long _x; bpf_usdt_arg(ctx, 4, &_x); (void *)_x; })
> +#define ___bpf_usdt_args6(x, args...) ___bpf_usdt_args5(args), ({ long _x; bpf_usdt_arg(ctx, 5, &_x); (void *)_x; })
> +#define ___bpf_usdt_args7(x, args...) ___bpf_usdt_args6(args), ({ long _x; bpf_usdt_arg(ctx, 6, &_x); (void *)_x; })
> +#define ___bpf_usdt_args8(x, args...) ___bpf_usdt_args7(args), ({ long _x; bpf_usdt_arg(ctx, 7, &_x); (void *)_x; })
> +#define ___bpf_usdt_args9(x, args...) ___bpf_usdt_args8(args), ({ long _x; bpf_usdt_arg(ctx, 8, &_x); (void *)_x; })
> +#define ___bpf_usdt_args10(x, args...) ___bpf_usdt_args9(args), ({ long _x; bpf_usdt_arg(ctx, 9, &_x); (void *)_x; })
> +#define ___bpf_usdt_args11(x, args...) ___bpf_usdt_args10(args), ({ long _x; bpf_usdt_arg(ctx, 10, &_x); (void *)_x; })
> +#define ___bpf_usdt_args12(x, args...) ___bpf_usdt_args11(args), ({ long _x; bpf_usdt_arg(ctx, 11, &_x); (void *)_x; })
> +#define ___bpf_usdt_args(args...) ___bpf_apply(___bpf_usdt_args, ___bpf_narg(args))(args)
> +
> +/*
> + * BPF_USDT serves the same purpose for USDT handlers as BPF_PROG for
> + * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
> + * Original struct pt_regs * context is preserved as 'ctx' argument.
> + */
> +#define BPF_USDT(name, args...)						    \
> +name(struct pt_regs *ctx);						    \
> +static __attribute__((always_inline)) typeof(name(0))			    \
> +____##name(struct pt_regs *ctx, ##args);				    \
> +typeof(name(0)) name(struct pt_regs *ctx)				    \
> +{									    \
> +        _Pragma("GCC diagnostic push")					    \
> +        _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")		    \
> +        return ____##name(___bpf_usdt_args(args));			    \
> +        _Pragma("GCC diagnostic pop")					    \
> +}									    \
> +static __attribute__((always_inline)) typeof(name(0))			    \
> +____##name(struct pt_regs *ctx, ##args)
> +
> +#endif /* __USDT_BPF_H__ */
> -- 
> 2.30.2
> 
> 


* Re: [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-03-25  5:29 ` [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
  2022-03-30  3:24   ` Hengqi Chen
@ 2022-03-31 12:13   ` Alan Maguire
  2022-03-31 19:02     ` Andrii Nakryiko
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 12:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Wire up libbpf USDT support APIs without yet implementing all the
> nitty-gritty details of USDT discovery, spec parsing, and BPF map
> initialization.
> 
> User-visible user-space API is simple and is conceptually very similar
> to uprobe API.
> 
> bpf_program__attach_usdt() API allows programmatically attaching a given
> BPF program to a USDT, specified through binary path (executable or
> shared lib), USDT provider and name. Also, just like in uprobe case, PID
> filter is specified (0 - self, -1 - any process, or specific PID).
> Optionally, USDT cookie value can be specified. Such single API
> invocation will try to discover given USDT in specified binary and will
> use (potentially many) BPF uprobes to attach this program in correct
> locations.
> 
> Just like with any bpf_program__attach_xxx() API, a bpf_link is returned
> that represents this attachment. It is a virtual BPF link that doesn't have
> a direct kernel object, as it can consist of multiple underlying BPF
> uprobe links. As such, attachment is not an atomic operation and there can
> be a brief moment when some USDT call sites are attached while others are
> still in the process of attaching. This should be taken into
> consideration by the user. But bpf_program__attach_usdt() guarantees that
> in the case of success all USDT call sites are successfully attached, or
> all the successful attachments will be detached as soon as some USDT
> call sites fail to be attached. So, in theory, there could be cases of
> a failed bpf_program__attach_usdt() call which did trigger a few USDT
> program invocations. This is unavoidable due to the multi-uprobe nature of
> USDT and has to be handled by the user, if it's important to create an
> illusion of atomicity.
> 
> USDT BPF programs themselves are marked in BPF source code as either
> SEC("usdt"), in which case they won't be auto-attached through the
> skeleton's <skel>__attach() method, or they can have a full definition,
> which follows the spirit of fully-specified uprobes:
> SEC("usdt/<path>:<provider>:<name>"). In the latter case the skeleton's
> attach method will attempt auto-attachment. Similarly, the generic
> bpf_program__attach() will have enough information to go off of for
> parameterless attachment.
> 

Might be worth describing briefly the under-the-hood mechanisms; the
usdt_manager is per-BPF-object (so it can conceptually represent
multiple USDT providers/probes). It is initialized on first use and
freed with bpf_object__close(); it is tasked with managing the mapping
from usdt provider:name to actual sites+arguments via the spec/ip-to-id
maps.

> USDT BPF programs are actually uprobes, and as such, for the kernel, they
> are marked as BPF_PROG_TYPE_KPROBE.
> 
> Another part of this patch is USDT-related feature probing:
>   - BPF cookie support detection from user-space;
>   - detection of kernel support for auto-refcounting of USDT semaphore.
> 
> The latter is optional. If the kernel doesn't support such a feature and
> the USDT doesn't rely on USDT semaphores, no error is returned. But if
> libbpf detects that the USDT requires setting semaphores and the kernel
> doesn't support this, libbpf errors out with an explicit pr_warn() message.
> Libbpf doesn't support poking the process's memory directly to increment
> the semaphore value, like BCC does on legacy kernels, due to the inherent
> raciness and danger of such process memory manipulation. Libbpf lets the
> kernel take care of this properly or gives up.
> 
> Logistically, all the extra USDT-related infrastructure of libbpf is put
> into a separate usdt.c file and abstracted behind struct usdt_manager.
> Each bpf_object has a lazily-initialized usdt_manager pointer, which is
> only instantiated if an attempt is made to attach USDT programs. Closing
> the BPF object frees up usdt_manager resources. usdt_manager keeps track
> of USDT spec ID assignment and a few other small things.
> 
> Subsequent patches will fill out remaining missing pieces of USDT
> initialization and setup logic.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

again mostly nits and small suggestions below; this is fantastic Andrii!

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

> ---
>  tools/lib/bpf/Build             |   3 +-
>  tools/lib/bpf/libbpf.c          |  92 ++++++++++-
>  tools/lib/bpf/libbpf.h          |  15 ++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |  19 +++
>  tools/lib/bpf/usdt.c            | 270 ++++++++++++++++++++++++++++++++
>  6 files changed, 391 insertions(+), 9 deletions(-)
>  create mode 100644 tools/lib/bpf/usdt.c
> 
> diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> index 94f0a146bb7b..31a1a9015902 100644
> --- a/tools/lib/bpf/Build
> +++ b/tools/lib/bpf/Build
> @@ -1,3 +1,4 @@
>  libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
>  	    netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o \
> -	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o
> +	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
> +	    usdt.o
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 809fe209cdcc..8841499f5f12 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -483,6 +483,8 @@ struct elf_state {
>  	int st_ops_shndx;
>  };
>  
> +struct usdt_manager;
> +
>  struct bpf_object {
>  	char name[BPF_OBJ_NAME_LEN];
>  	char license[64];
> @@ -545,6 +547,8 @@ struct bpf_object {
>  	size_t fd_array_cap;
>  	size_t fd_array_cnt;
>  
> +	struct usdt_manager *usdt_man;
> +
>  	char path[];
>  };
>  
> @@ -4678,6 +4682,18 @@ static int probe_perf_link(void)
>  	return link_fd < 0 && err == -EBADF;
>  }
>  
> +static int probe_kern_bpf_cookie(void)
> +{
> +	struct bpf_insn insns[] = {
> +		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
> +		BPF_EXIT_INSN(),
> +	};
> +	int ret, insn_cnt = ARRAY_SIZE(insns);
> +
> +	ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
> +	return probe_fd(ret);
> +}
> +
>  enum kern_feature_result {
>  	FEAT_UNKNOWN = 0,
>  	FEAT_SUPPORTED = 1,
> @@ -4740,6 +4756,9 @@ static struct kern_feature_desc {
>  	[FEAT_MEMCG_ACCOUNT] = {
>  		"memcg-based memory accounting", probe_memcg_account,
>  	},
> +	[FEAT_BPF_COOKIE] = {
> +		"BPF cookie support", probe_kern_bpf_cookie,
> +	},
>  };
>  
>  bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
> @@ -8200,6 +8219,9 @@ void bpf_object__close(struct bpf_object *obj)
>  	if (obj->clear_priv)
>  		obj->clear_priv(obj, obj->priv);
>  
> +	usdt_manager_free(obj->usdt_man);
> +	obj->usdt_man = NULL;
> +
>  	bpf_gen__free(obj->gen_loader);
>  	bpf_object__elf_finish(obj);
>  	bpf_object_unload(obj);
> @@ -8630,6 +8652,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
>  }
>  
>  static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
> +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
>  static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
>  static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
>  static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_link **link);
> @@ -8647,6 +8670,7 @@ static const struct bpf_sec_def section_defs[] = {
>  	SEC_DEF("uretprobe/",		KPROBE, 0, SEC_NONE),
>  	SEC_DEF("kprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
>  	SEC_DEF("kretprobe.multi/",	KPROBE,	BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
> +	SEC_DEF("usdt+",		KPROBE,	0, SEC_NONE, attach_usdt),
>  	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
>  	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX | SEC_DEPRECATED),
>  	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX),
> @@ -9692,14 +9716,6 @@ int bpf_prog_load_deprecated(const char *file, enum bpf_prog_type type,
>  	return bpf_prog_load_xattr2(&attr, pobj, prog_fd);
>  }
>  
> -struct bpf_link {
> -	int (*detach)(struct bpf_link *link);
> -	void (*dealloc)(struct bpf_link *link);
> -	char *pin_path;		/* NULL, if not pinned */
> -	int fd;			/* hook FD, -1 if not applicable */
> -	bool disconnected;
> -};
> -
>  /* Replace link's underlying BPF program with the new one */
>  int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
>  {
> @@ -10599,6 +10615,66 @@ struct bpf_link *bpf_program__attach_uprobe(const struct bpf_program *prog,
>  	return bpf_program__attach_uprobe_opts(prog, pid, binary_path, func_offset, &opts);
>  }
>  
> +struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
> +					  pid_t pid, const char *binary_path,
> +					  const char *usdt_provider, const char *usdt_name,
> +					  const struct bpf_usdt_opts *opts)
> +{
> +	struct bpf_object *obj = prog->obj;
> +	struct bpf_link *link;
> +	long usdt_cookie;
> +	int err;
> +
> +	if (!OPTS_VALID(opts, bpf_usdt_opts))
> +		return libbpf_err_ptr(-EINVAL);
> +
> +	/* USDT manager is instantiated lazily on first USDT attach. It will
> +	 * be destroyed together with BPF object in bpf_object__close().
> +	 */
> +	if (!obj->usdt_man) {
> +		obj->usdt_man = usdt_manager_new(obj);
> +		if (!obj->usdt_man)
> +			return libbpf_err_ptr(-ENOMEM);
> +	}
> +
> +	usdt_cookie = OPTS_GET(opts, usdt_cookie, 0);
> +	link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path,
> +				        usdt_provider, usdt_name, usdt_cookie);
> +	err = libbpf_get_error(link);
> +	if (err)
> +		return libbpf_err_ptr(err);
> +	return link;
> +}
> +
> +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
> +{
> +	char *path = NULL, *provider = NULL, *name = NULL;
> +	const char *sec_name;
> +
> +	sec_name = bpf_program__section_name(prog);
> +	if (strcmp(sec_name, "usdt") == 0) {
> +		/* no auto-attach for just SEC("usdt") */
> +		*link = NULL;
> +		return 0;
> +	}
> +
> +	if (3 != sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name)) {
> +		pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
> +			sec_name);

could have an else clause here for the parse success case I suppose to 
save having two sets of free()s.

> +		free(path);
> +		free(provider);
> +		free(name);
> +		return -EINVAL;
> +	}
> +
> +	*link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
> +					 provider, name, NULL);
> +	free(path);
> +	free(provider);
> +	free(name);
> +	return libbpf_get_error(*link);
> +}
> +
>  static int determine_tracepoint_id(const char *tp_category,
>  				   const char *tp_name)
>  {
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 05dde85e19a6..318eecaa14e7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -503,6 +503,21 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
>  				const char *binary_path, size_t func_offset,
>  				const struct bpf_uprobe_opts *opts);
>  
> +struct bpf_usdt_opts {
> +	/* size of this struct, for forward/backward compatibility */
> +	size_t sz;
> +	/* custom user-provided value accessible through usdt_cookie() */
> +	__u64 usdt_cookie;
> +	size_t :0;
> +};
> +#define bpf_usdt_opts__last_field usdt_cookie
> +

need doc comment here such as

/**
 * @brief **bpf_program__attach_usdt()** is just like
 * bpf_program__attach_uprobe_opts() except it covers
 * USDT (Userspace Static Defined Tracing) attachment.
 *
 * @param prog BPF program to attach
 * @param pid Process ID to attach the uprobe to, 0 for self (own 
process),
 * -1 for all processes
 * @param binary_path Path to binary that contains the USDT probe
 * @param usdt_provider USDT Provider name
 * @param usdt_name USDT Probe name
 * @param opts Options for altering USDT attachment
 * @return Reference to the newly created BPF link; or NULL is returned on 
error,
 * error code is stored in errno
 */


> +LIBBPF_API struct bpf_link *
> +bpf_program__attach_usdt(const struct bpf_program *prog,
> +			 pid_t pid, const char *binary_path,
> +			 const char *usdt_provider, const char *usdt_name,
> +			 const struct bpf_usdt_opts *opts);
> +
>  struct bpf_tracepoint_opts {
>  	/* size of this struct, for forward/backward compatiblity */
>  	size_t sz;
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index dd35ee58bfaa..82f6d62176dd 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -444,6 +444,7 @@ LIBBPF_0.8.0 {
>  	global:
>  		bpf_object__destroy_subskeleton;
>  		bpf_object__open_subskeleton;
> +		bpf_program__attach_usdt;
>  		libbpf_register_prog_handler;
>  		libbpf_unregister_prog_handler;
>  		bpf_program__attach_kprobe_multi_opts;
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> index b6247dc7f8eb..dd0d4ccfa649 100644
> --- a/tools/lib/bpf/libbpf_internal.h
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -148,6 +148,15 @@ do {				\
>  #ifndef __has_builtin
>  #define __has_builtin(x) 0
>  #endif
> +
> +struct bpf_link {
> +	int (*detach)(struct bpf_link *link);
> +	void (*dealloc)(struct bpf_link *link);
> +	char *pin_path;		/* NULL, if not pinned */
> +	int fd;			/* hook FD, -1 if not applicable */
> +	bool disconnected;
> +};
> +
>  /*
>   * Re-implement glibc's reallocarray() for libbpf internal-only use.
>   * reallocarray(), unfortunately, is not available in all versions of glibc,
> @@ -329,6 +338,8 @@ enum kern_feature_id {
>  	FEAT_BTF_TYPE_TAG,
>  	/* memcg-based accounting for BPF maps and progs */
>  	FEAT_MEMCG_ACCOUNT,
> +	/* BPF cookie (bpf_get_attach_cookie() BPF helper) support */
> +	FEAT_BPF_COOKIE,
>  	__FEAT_CNT,
>  };
>  
> @@ -543,4 +554,12 @@ int bpf_core_add_cands(struct bpf_core_cand *local_cand,
>  		       struct bpf_core_cand_list *cands);
>  void bpf_core_free_cands(struct bpf_core_cand_list *cands);
>  
> +struct usdt_manager *usdt_manager_new(struct bpf_object *obj);
> +void usdt_manager_free(struct usdt_manager *man);
> +struct bpf_link * usdt_manager_attach_usdt(struct usdt_manager *man,
> +					   const struct bpf_program *prog,
> +					   pid_t pid, const char *path,
> +					   const char *usdt_provider, const char *usdt_name,
> +					   long usdt_cookie);
> +
>  #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> new file mode 100644
> index 000000000000..8481e300598e
> --- /dev/null
> +++ b/tools/lib/bpf/usdt.c
> @@ -0,0 +1,270 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#include <ctype.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <libelf.h>
> +#include <gelf.h>
> +#include <unistd.h>
> +#include <linux/ptrace.h>
> +#include <linux/kernel.h>
> +
> +#include "bpf.h"
> +#include "libbpf.h"
> +#include "libbpf_common.h"
> +#include "libbpf_internal.h"
> +#include "hashmap.h"
> +
> +#define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
> +
> +struct usdt_target {
> +	long abs_ip;
> +	long rel_ip;
> +	long sema_off;
> +};
> +
> +struct usdt_manager {
> +	struct bpf_map *specs_map;
> +	struct bpf_map *ip_to_id_map;
> +
> +	bool has_bpf_cookie;
> +	bool has_sema_refcnt;
> +};
> +
> +struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> +{
> +	static const char *ref_ctr_sysfs_path = "/sys/bus/event_source/devices/uprobe/format/ref_ctr_offset";

probably deserves a #define, and that would get us under the 100 char 
limit too..

> +	struct usdt_manager *man;
> +	struct bpf_map *specs_map, *ip_to_id_map;
> +
> +	specs_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs");
> +	ip_to_id_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs_ip_to_id");
> +	if (!specs_map || !ip_to_id_map) {
> +		pr_warn("usdt: failed to find USDT support BPF maps, did you forget to include bpf/usdt.bpf.h?\n");

nice, I like the fact the error message also tells you how to fix it!

> +		return NULL;
> +	}
> +
> +	man = calloc(1, sizeof(*man));
> +	if (!man)
> +		return NULL;
> +
> +	man->specs_map = specs_map;
> +	man->ip_to_id_map = ip_to_id_map;
> +
> +        /* Detect if BPF cookie is supported for kprobes.
> +	 * We don't need IP-to-ID mapping if we can use BPF cookies.
> +         * Added in: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
> +         */
> +	man->has_bpf_cookie = kernel_supports(obj, FEAT_BPF_COOKIE);
> +
> +	/* Detect kernel support for automatic refcounting of USDT semaphore.
> +	 * If this is not supported, USDTs with semaphores will not be supported.
> +	 * Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
> +	 */
> +	man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0;
> +
> +	return man;
> +}
> +
> +void usdt_manager_free(struct usdt_manager *man)
> +{
> +	if (!man)
> +		return;
> +
> +	free(man);
> +}
> +
> +static int sanity_check_usdt_elf(Elf *elf, const char *path)
> +{
> +	GElf_Ehdr ehdr;
> +	int endianness;
> +
> +	if (elf_kind(elf) != ELF_K_ELF) {
> +		pr_warn("usdt: unrecognized ELF kind %d for '%s'\n", elf_kind(elf), path);
> +		return -EBADF;
> +	}
> +
> +	switch (gelf_getclass(elf)) {
> +	case ELFCLASS64:
> +		if (sizeof(void *) != 8) {
> +			pr_warn("usdt: attaching to 64-bit ELF binary '%s' is not supported\n", path);
> +			return -EBADF;
> +		}
> +		break;
> +	case ELFCLASS32:
> +		if (sizeof(void *) != 4) {
> +			pr_warn("usdt: attaching to 32-bit ELF binary '%s' is not supported\n", path);
> +			return -EBADF;
> +		}
> +		break;
> +	default:
> +		pr_warn("usdt: unsupported ELF class for '%s'\n", path);
> +		return -EBADF;
> +	}
> +
> +	if (!gelf_getehdr(elf, &ehdr))
> +		return -EINVAL;
> +
> +	if (ehdr.e_type != ET_EXEC && ehdr.e_type != ET_DYN) {
> +		pr_warn("usdt: unsupported type of ELF binary '%s' (%d), only ET_EXEC and ET_DYN are supported\n",
> +			path, ehdr.e_type);
> +		return -EBADF;
> +	}
> +
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +	endianness = ELFDATA2LSB;
> +#elif __BYTE_ORDER == __BIG_ENDIAN
> +	endianness = ELFDATA2MSB;
> +#else
> +# error "Unrecognized __BYTE_ORDER__"
> +#endif
> +	if (endianness != ehdr.e_ident[EI_DATA]) {
> +		pr_warn("usdt: ELF endianness mismatch for '%s'\n", path);
> +		return -EBADF;
> +	}
> +
> +	return 0;
> +}
> +

these sanity checks are great.

> +static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
> +				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
> +				struct usdt_target **out_targets, size_t *out_target_cnt)
> +{
> +	return -ENOTSUP;
> +}
> +
> +struct bpf_link_usdt {
> +	struct bpf_link link;
> +
> +	struct usdt_manager *usdt_man;
> +
> +	size_t uprobe_cnt;
> +	struct {
> +		long abs_ip;
> +		struct bpf_link *link;
> +	} *uprobes;
> +};
> +
> +static int bpf_link_usdt_detach(struct bpf_link *link)
> +{
> +	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
> +	int i;
> +
> +	for (i = 0; i < usdt_link->uprobe_cnt; i++) {
> +		/* detach underlying uprobe link */
> +		bpf_link__destroy(usdt_link->uprobes[i].link);
> +	}
> +
> +	return 0;
> +}
> +
> +static void bpf_link_usdt_dealloc(struct bpf_link *link)
> +{
> +	struct bpf_link_usdt *usdt_link = container_of(link, struct bpf_link_usdt, link);
> +
> +	free(usdt_link->uprobes);
> +	free(usdt_link);
> +}
> +
> +struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
> +					  pid_t pid, const char *path,
> +					  const char *usdt_provider, const char *usdt_name,
> +					  long usdt_cookie)
> +{
> +	int i, fd, err;
> +	LIBBPF_OPTS(bpf_uprobe_opts, opts);
> +	struct bpf_link_usdt *link = NULL;
> +	struct usdt_target *targets = NULL;
> +	size_t target_cnt;
> +	Elf *elf;

I thought we should probably init elf to NULL, though I see we don't goto 
err_out except in cases where it's been explicitly set.

> +
> +	if (bpf_program__fd(prog) < 0) {
> +		pr_warn("prog '%s': can't attach BPF program w/o FD (did you load it?)\n",

nit: might be no harm to expand "w/o" to "without", and to prefix with usdt: 
as below..

> +			bpf_program__name(prog));
> +		return libbpf_err_ptr(-EINVAL);
> +	}
> +
> +	/* TODO: perform path resolution similar to uprobe's */
> +	fd = open(path, O_RDONLY);
> +	if (fd < 0) {
> +		err = -errno;
> +		pr_warn("usdt: failed to open ELF binary '%s': %d\n", path, err);
> +		return libbpf_err_ptr(err);
> +	}
> +
> +	elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
> +	if (!elf) {
> +		err = -EBADF;
> +		pr_warn("usdt: failed to parse ELF binary '%s': %s\n", path, elf_errmsg(-1));
> +		goto err_out;
> +	}
> +
> +	err = sanity_check_usdt_elf(elf, path);
> +	if (err)
> +		goto err_out;
> +
> +	/* normalize PID filter */
> +	if (pid < 0)
> +		pid = -1;
> +	else if (pid == 0)
> +		pid = getpid();
> +
> +	/* discover USDT in given binary, optionally limiting
> +	 * activations to a given PID, if pid > 0
> +	 */
> +	err = collect_usdt_targets(man, elf, path, pid, usdt_provider, usdt_name,
> +				   usdt_cookie, &targets, &target_cnt);
> +	if (err <= 0) {

we haven't filled out collect_usdt_targets() yet, but might be no harm to 
have a pr_debug() here "usdt: cannot collect USDT targets for ..." since 
there are a few cases without warnings in the later patch.

> +		err = (err == 0) ? -ENOENT : err;
> +		goto err_out;
> +	}
> +
> +	link = calloc(1, sizeof(*link));
> +	if (!link) {
> +		err = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	link->usdt_man = man;
> +	link->link.detach = &bpf_link_usdt_detach;
> +	link->link.dealloc = &bpf_link_usdt_dealloc;
> +
> +	link->uprobes = calloc(target_cnt, sizeof(*link->uprobes));
> +	if (!link->uprobes) {
> +		err = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	for (i = 0; i < target_cnt; i++) {
> +		struct usdt_target *target = &targets[i];
> +		struct bpf_link *uprobe_link;
> +
> +		opts.ref_ctr_offset = target->sema_off;
> +		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
> +							      target->rel_ip, &opts);
> +		err = libbpf_get_error(uprobe_link);
> +		if (err) {
> +			pr_warn("usdt: failed to attach uprobe #%d for '%s:%s' in '%s': %d\n",
> +				i, usdt_provider, usdt_name, path, err);
> +			goto err_out;
> +		}
> +
> +		link->uprobes[i].link = uprobe_link;
> +		link->uprobes[i].abs_ip = target->abs_ip;
> +		link->uprobe_cnt++;
> +	}
> +
> +	elf_end(elf);
> +	close(fd);
> +
> +	return &link->link;
> +
> +err_out:
> +	bpf_link__destroy(&link->link);
> +
> +	if (elf)
> +		elf_end(elf);
> +	close(fd);
> +	return libbpf_err_ptr(err);
> +}
> -- 
> 2.30.2
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic
  2022-03-25  5:29 ` [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
@ 2022-03-31 13:37   ` Alan Maguire
  2022-03-31 19:13     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 13:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Implement the architecture-agnostic parts of the USDT parsing logic. The
> code is the documentation in this case; it's futile to try to succinctly
> describe how USDT parsing is done in any sort of concreteness. But
> still, USDTs are recorded in a special ELF notes section (.note.stapsdt),
> where each USDT call site is described separately. Along with the USDT
> provider and USDT name, each such note contains a USDT argument
> specification, which uses assembly-like syntax to describe how to fetch
> the value of a USDT argument. A USDT arg spec can be just a constant, or
> a register, or a register dereference (the most common cases on x86_64),
> but it can technically be much more complicated, like an offset relative
> to a global symbol and the like. One of the later patches will
> implement the most common subset of this for x86 and x86-64 architectures,
> which seems to handle a lot of real-world production applications.
> 
> The USDT arg spec contains a compact encoding allowing usdt.bpf.h from the
> previous patch to handle the above 3 cases. Instead of recording which
> register might be needed, we encode the register's offset within struct
> pt_regs to simplify the BPF-side implementation. USDT arguments can be of
> different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
> this, libbpf pre-calculates the necessary bit shifts to do proper casting
> and sign-extension in a short sequence of left and right shifts.
> 
> The rest is in the code with sometimes extensive comments and references
> to external "documentation" for USDTs.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

nothing major below; might be no harm in using a common header for
some definitions shared by usdt.bpf.h and usdt.c..

> ---
>  tools/lib/bpf/usdt.c | 581 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 580 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index 8481e300598e..86d5d8390eb1 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c
> @@ -18,10 +18,56 @@
>  
>  #define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
>  

unused?

> +#define USDT_BASE_SEC ".stapsdt.base"
> +#define USDT_SEMA_SEC ".probes"

unused?

> +#define USDT_NOTE_SEC  ".note.stapsdt"
> +#define USDT_NOTE_TYPE 3
> +#define USDT_NOTE_NAME "stapsdt"
> +
> +/* should match exactly enum __bpf_usdt_arg_type from bpf_usdt.bpf.h */
> +enum usdt_arg_type {
> +	USDT_ARG_CONST,
> +	USDT_ARG_REG,
> +	USDT_ARG_REG_DEREF,
> +};
> +
> +/* should match exactly struct __bpf_usdt_arg_spec from bpf_usdt.bpf.h */
> +struct usdt_arg_spec {
> +	__u64 val_off;
> +	enum usdt_arg_type arg_type;
> +	short reg_off;
> +	bool arg_signed;
> +	char arg_bitshift;
> +};
> +
> +/* should match BPF_USDT_MAX_ARG_CNT in usdt.bpf.h */
> +#define USDT_MAX_ARG_CNT 12
> +
> +/* should match struct __bpf_usdt_spec from usdt.bpf.h */
> +struct usdt_spec {
> +	struct usdt_arg_spec args[USDT_MAX_ARG_CNT];
> +	__u64 usdt_cookie;
> +	short arg_cnt;
> +};
> +

Would it be worth having a usdt.h that both usdt.bpf.h and usdt.c could 
#include, containing the above definitions, avoiding need to sync?

> +struct usdt_note {
> +	const char *provider;
> +	const char *name;
> +	/* USDT args specification string, e.g.:
> +	 * "-4@%esi -4@-24(%rbp) -4@%ecx 2@%ax 8@%rdx"
> +	 */
> +	const char *args;
> +	long loc_addr;
> +	long base_addr;
> +	long sema_addr;
> +};
> +
>  struct usdt_target {
>  	long abs_ip;
>  	long rel_ip;
>  	long sema_off;
> +	struct usdt_spec spec;
> +	const char *spec_str;
>  };
>  
>  struct usdt_manager {
> @@ -127,11 +173,449 @@ static int sanity_check_usdt_elf(Elf *elf, const char *path)
>  	return 0;
>  }
>  
> +static int find_elf_sec_by_name(Elf *elf, const char *sec_name, GElf_Shdr *shdr, Elf_Scn **scn)
> +{
> +	Elf_Scn *sec = NULL;
> +	size_t shstrndx;
> +
> +	if (elf_getshdrstrndx(elf, &shstrndx))
> +		return -EINVAL;
> +
> +	/* check if ELF is corrupted and avoid calling elf_strptr if yes */
> +	if (!elf_rawdata(elf_getscn(elf, shstrndx), NULL))
> +		return -EINVAL;
> +
> +	while ((sec = elf_nextscn(elf, sec)) != NULL) {
> +		char *name;
> +
> +		if (!gelf_getshdr(sec, shdr))
> +			return -EINVAL;
> +
> +		name = elf_strptr(elf, shstrndx, shdr->sh_name);
> +		if (name && strcmp(sec_name, name) == 0) {
> +			*scn = sec;
> +			return 0;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> +
> +struct elf_seg {
> +	long start;
> +	long end;
> +	long offset;
> +	bool is_exec;
> +};
> +
> +static int cmp_elf_segs(const void *_a, const void *_b)
> +{
> +	const struct elf_seg *a = _a;
> +	const struct elf_seg *b = _b;
> +
> +	return a->start < b->start ? -1 : 1;
> +}
> +
> +static int parse_elf_segs(Elf *elf, const char *path, struct elf_seg **segs, size_t *seg_cnt)
> +{
> +	GElf_Phdr phdr;
> +	size_t n;
> +	int i, err;
> +	struct elf_seg *seg;
> +	void *tmp;
> +
> +	*seg_cnt = 0;
> +
> +	if (elf_getphdrnum(elf, &n)) {
> +		err = -errno;
> +		return err;
> +	}
> +
> +	for (i = 0; i < n; i++) {
> +		if (!gelf_getphdr(elf, i, &phdr)) {
> +			err = -errno;
> +			return err;
> +		}
> +
> +		pr_debug("usdt: discovered PHDR #%d in '%s': vaddr 0x%lx memsz 0x%lx offset 0x%lx type 0x%lx flags 0x%lx\n",
> +			 i, path, (long)phdr.p_vaddr, (long)phdr.p_memsz, (long)phdr.p_offset,
> +			 (long)phdr.p_type, (long)phdr.p_flags);
> +		if (phdr.p_type != PT_LOAD)
> +			continue;
> +
> +		tmp = libbpf_reallocarray(*segs, *seg_cnt + 1, sizeof(**segs));
> +		if (!tmp)
> +			return -ENOMEM;
> +
> +		*segs = tmp;
> +		seg = *segs + *seg_cnt;
> +		(*seg_cnt)++;
> +
> +		seg->start = phdr.p_vaddr;
> +		seg->end = phdr.p_vaddr + phdr.p_memsz;
> +		seg->offset = phdr.p_offset;
> +		seg->is_exec = phdr.p_flags & PF_X;
> +	}
> +
> +	if (*seg_cnt == 0) {
> +		pr_warn("usdt: failed to find PT_LOAD program headers in '%s'\n", path);
> +		return -ESRCH;
> +	}
> +
> +	qsort(*segs, *seg_cnt, sizeof(**segs), cmp_elf_segs);
> +	return 0;
> +}
> +
> +static int parse_lib_segs(int pid, const char *lib_path, struct elf_seg **segs, size_t *seg_cnt)
> +{
> +	char path[PATH_MAX], line[PATH_MAX], mode[16];
> +	size_t seg_start, seg_end, seg_off;
> +	struct elf_seg *seg;
> +	int tmp_pid, i, err;
> +	FILE *f;
> +
> +	*seg_cnt = 0;
> +
> +	/* Handle containerized binaries only accessible from
> +	 * /proc/<pid>/root/<path>. They will be reported as just /<path> in
> +	 * /proc/<pid>/maps.
> +	 */
> +	if (sscanf(lib_path, "/proc/%d/root%s", &tmp_pid, path) == 2 && pid == tmp_pid)
> +		goto proceed;
> +
> +	if (!realpath(lib_path, path)) {
> +		pr_warn("usdt: failed to get absolute path of '%s' (err %d), using path as is...\n",
> +			lib_path, -errno);
> +		strcpy(path, lib_path);
> +	}
> +
> +proceed:
> +	sprintf(line, "/proc/%d/maps", pid);
> +	f = fopen(line, "r");
> +	if (!f) {
> +		err = -errno;
> +		pr_warn("usdt: failed to open '%s' to get base addr of '%s': %d\n",
> +			line, lib_path, err);
> +		return err;
> +	}
> +
> +	/* We need to handle lines with no path at the end:
> +	 *
> +	 * 7f5c6f5d1000-7f5c6f5d3000 rw-p 001c7000 08:04 21238613      /usr/lib64/libc-2.17.so
> +	 * 7f5c6f5d3000-7f5c6f5d8000 rw-p 00000000 00:00 0
> +	 * 7f5c6f5d8000-7f5c6f5d9000 r-xp 00000000 103:01 362990598    /data/users/andriin/linux/tools/bpf/usdt/libhello_usdt.so
> +	 */
> +	while (fscanf(f, "%zx-%zx %s %zx %*s %*d%[^\n]\n",
> +		      &seg_start, &seg_end, mode, &seg_off, line) == 5) {
> +		void *tmp;
> +
> +		/* to handle no path case (see above) we need to capture line
> +		 * without skipping any whitespaces. So we need to strip
> +		 * leading whitespaces manually here
> +		 */
> +		i = 0;
> +		while (isblank(line[i]))
> +			i++;
> +		if (strcmp(line + i, path) != 0)
> +			continue;
> +
> +		pr_debug("usdt: discovered segment for lib '%s': addrs %zx-%zx mode %s offset %zx\n",
> +			 path, seg_start, seg_end, mode, seg_off);
> +
> +		/* ignore non-executable sections for shared libs */
> +		if (mode[2] != 'x')
> +			continue;
> +
> +		tmp = libbpf_reallocarray(*segs, *seg_cnt + 1, sizeof(**segs));
> +		if (!tmp) {
> +			err = -ENOMEM;
> +			goto err_out;
> +		}
> +
> +		*segs = tmp;
> +		seg = *segs + *seg_cnt;
> +		*seg_cnt += 1;
> +
> +		seg->start = seg_start;
> +		seg->end = seg_end;
> +		seg->offset = seg_off;
> +		seg->is_exec = true;
> +	}
> +
> +	if (*seg_cnt == 0) {
> +		pr_warn("usdt: failed to find '%s' (resolved to '%s') within PID %d memory mappings\n",
> +			lib_path, path, pid);
> +		err = -ESRCH;
> +		goto err_out;
> +	}
> +
> +	qsort(*segs, *seg_cnt, sizeof(**segs), cmp_elf_segs);
> +	err = 0;
> +err_out:
> +	fclose(f);
> +	return err;
> +}
> +
> +static struct elf_seg *find_elf_seg(struct elf_seg *segs, size_t seg_cnt, long addr, bool relative)
> +{
> +	struct elf_seg *seg;
> +	int i;
> +
> +	if (relative) {
> +		/* for shared libraries, address is relative offset and thus
> +		 * should fall within the logical offset-based range of
> +		 * [offset_start, offset_end)
> +		 */
> +		for (i = 0, seg = segs; i < seg_cnt; i++, seg++) {
> +			if (seg->offset <= addr && addr < seg->offset + (seg->end - seg->start))
> +				return seg;
> +		}
> +	} else {
> +		/* for binaries, address is absolute and thus should be within
> +		 * absolute address range of [seg_start, seg_end)
> +		 */
> +		for (i = 0, seg = segs; i < seg_cnt; i++, seg++) {
> +			if (seg->start <= addr && addr < seg->end)
> +				return seg;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int parse_usdt_note(Elf *elf, const char *path, long base_addr,
> +			   GElf_Nhdr *nhdr, const char *data, size_t name_off, size_t desc_off,
> +			   struct usdt_note *usdt_note);
> +
> +static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, long usdt_cookie);
> +
>  static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
>  				const char *usdt_provider, const char *usdt_name, long usdt_cookie,
>  				struct usdt_target **out_targets, size_t *out_target_cnt)
>  {
> -	return -ENOTSUP;
> +	size_t off, name_off, desc_off, seg_cnt = 0, lib_seg_cnt = 0, target_cnt = 0;
> +	struct elf_seg *segs = NULL, *lib_segs = NULL;
> +	struct usdt_target *targets = NULL, *target;
> +	long base_addr = 0;
> +	Elf_Scn *notes_scn, *base_scn;
> +	GElf_Shdr base_shdr, notes_shdr;
> +	GElf_Ehdr ehdr;
> +	GElf_Nhdr nhdr;
> +	Elf_Data *data;
> +	int err;
> +
> +	*out_targets = NULL;
> +	*out_target_cnt = 0;
> +
> +	err = find_elf_sec_by_name(elf, USDT_NOTE_SEC, &notes_shdr, &notes_scn);
> +	if (err)

since find_elf_sec_by_name() doesn't log anything, it would be good to add a
pr_warn("usdt: no " USDT_NOTE_SEC " section in '%s'\n", path); here
> +		return err;
> +
> +	if (notes_shdr.sh_type != SHT_NOTE)
> +		return -EINVAL;
> +
> +	if (!gelf_getehdr(elf, &ehdr))
> +		return -EINVAL;
> +

the above two are unlikely, but could perhaps benefit from an error 
message like below..

> +	err = parse_elf_segs(elf, path, &segs, &seg_cnt);
> +	if (err) {
> +		pr_warn("usdt: failed to process ELF program segments for '%s': %d\n", path, err);
> +		goto err_out;
> +	}
> +
> +	/* .stapsdt.base ELF section is optional, but is used for prelink
> +	 * offset compensation (see a big comment further below)
> +	 */
> +	if (find_elf_sec_by_name(elf, USDT_BASE_SEC, &base_shdr, &base_scn) == 0)
> +		base_addr = base_shdr.sh_addr;
> +
> +	data = elf_getdata(notes_scn, 0);
> +	off = 0;
> +	while ((off = gelf_getnote(data, off, &nhdr, &name_off, &desc_off)) > 0) {
> +		long usdt_abs_ip, usdt_rel_ip, usdt_sema_off = 0;
> +		struct usdt_note note;
> +		struct elf_seg *seg = NULL;
> +		void *tmp;
> +
> +		err = parse_usdt_note(elf, path, base_addr, &nhdr,
> +				      data->d_buf, name_off, desc_off, &note);
> +		if (err)
> +			goto err_out;
> +
> +		if (strcmp(note.provider, usdt_provider) != 0 || strcmp(note.name, usdt_name) != 0)
> +			continue;
> +
> +		/* We need to compensate "prelink effect". See [0] for details,
> +		 * relevant parts quoted here:
> +		 *
> +		 * Each SDT probe also expands into a non-allocated ELF note. You can
> +		 * find this by looking at SHT_NOTE sections and decoding the format;
> +		 * see below for details. Because the note is non-allocated, it means
> +		 * there is no runtime cost, and also preserved in both stripped files
> +		 * and .debug files.
> +		 *
> +		 * However, this means that prelink won't adjust the note's contents
> +		 * for address offsets. Instead, this is done via the .stapsdt.base
> +		 * section. This is a special section that is added to the text. We
> +		 * will only ever have one of these sections in a final link and it
> +		 * will only ever be one byte long. Nothing about this section itself
> +		 * matters, we just use it as a marker to detect prelink address
> +		 * adjustments.
> +		 *
> +		 * Each probe note records the link-time address of the .stapsdt.base
> +		 * section alongside the probe PC address. The decoder compares the
> +		 * base address stored in the note with the .stapsdt.base section's
> +		 * sh_addr. Initially these are the same, but the section header will
> +		 * be adjusted by prelink. So the decoder applies the difference to
> +		 * the probe PC address to get the correct prelinked PC address; the
> +		 * same adjustment is applied to the semaphore address, if any. 
> +		 *
> +		 *   [0] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> +		 */

ouch. nice explanation!

> +		usdt_rel_ip = usdt_abs_ip = note.loc_addr;
> +		if (base_addr) {
> +			usdt_abs_ip += base_addr - note.base_addr;
> +			usdt_rel_ip += base_addr - note.base_addr;
> +		}
> +
> +		if (ehdr.e_type == ET_EXEC) {

should we use a bool is_shared_library here? it might simplify the debug 
messaging below...

> +			/* When attaching uprobes (which what USDTs basically
> +			 * are) kernel expects a relative IP to be specified,
> +			 * so if we are attaching to an executable ELF binary
> +			 * (i.e., not a shared library), we need to calculate
> +			 * proper relative IP based on ELF's load address
> +			 */
> +			seg = find_elf_seg(segs, seg_cnt, usdt_abs_ip, false /* relative */);
> +			if (!seg) {
> +				err = -ESRCH;
> +				pr_warn("usdt: failed to find ELF program segment for '%s:%s' in '%s' at IP 0x%lx\n",
> +					usdt_provider, usdt_name, path, usdt_abs_ip);
> +				goto err_out;
> +			}
> +			if (!seg->is_exec) {
> +				err = -ESRCH;
> +				pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx) for '%s:%s' at IP 0x%lx is not executable\n",
> +				        path, seg->start, seg->end, usdt_provider, usdt_name,
> +					usdt_abs_ip);
> +				goto err_out;
> +			}
> +
> +			usdt_rel_ip = usdt_abs_ip - (seg->start - seg->offset);
> +		} else if (!man->has_bpf_cookie) { /* ehdr.e_type == ET_DYN */
> +			/* If we don't have BPF cookie support but need to
> +			 * attach to a shared library, we'll need to know and
> +			 * record absolute addresses of attach points due to
> +			 * the need to lookup USDT spec by absolute IP of
> +			 * triggered uprobe. Doing this resolution is only
> +			 * possible when we have a specific PID of the process
> +			 * that's using specified shared library. BPF cookie
> +			 * removes the absolute address limitation as we don't
> +			 * need to do this lookup (we just use BPF cookie as
> +			 * an index of USDT spec), so for newer kernels with
> +			 * BPF cookie support libbpf supports USDT attachment
> +			 * to shared libraries with no PID filter.
> +			 */
> +			if (pid < 0) {
> +				pr_warn("usdt: attaching to shared libraries without specific PID is not supported on current kernel\n");
> +				err = -ENOTSUP;
> +				goto err_out;
> +			}
> +
> +			/* lib_segs are lazily initialized only if necessary */
> +			if (lib_seg_cnt == 0) {
> +				err = parse_lib_segs(pid, path, &lib_segs, &lib_seg_cnt);
> +				if (err) {
> +					pr_warn("usdt: failed to get memory segments in PID %d for shared library '%s': %d\n",
> +						pid, path, err);
> +					goto err_out;
> +				}
> +			}
> +
> +			seg = find_elf_seg(lib_segs, lib_seg_cnt, usdt_rel_ip, true /* relative */);
> +			if (!seg) {
> +				err = -ESRCH;
> +				pr_warn("usdt: failed to find shared lib memory segment for '%s:%s' in '%s' at relative IP 0x%lx\n",
> +				         usdt_provider, usdt_name, path, usdt_rel_ip);
> +				goto err_out;
> +			}
> +
> +			usdt_abs_ip = seg->start + (usdt_rel_ip - seg->offset);
> +		}
> +
> +		pr_debug("usdt: probe for '%s:%s' in %s '%s': addr 0x%lx base 0x%lx (resolved abs_ip 0x%lx rel_ip 0x%lx) args '%s' in segment [0x%lx, 0x%lx) at offset 0x%lx\n",
> +			 usdt_provider, usdt_name, ehdr.e_type == ET_EXEC ? "exec" : "lib ", path,
> +			 note.loc_addr, note.base_addr, usdt_abs_ip, usdt_rel_ip, note.args,
> +			 seg ? seg->start : 0, seg ? seg->end : 0, seg ? seg->offset : 0);
> +
> +		/* Adjust semaphore address to be a relative offset */
> +		if (note.sema_addr) {
> +			if (!man->has_sema_refcnt) {
> +				pr_warn("usdt: kernel doesn't support USDT semaphore refcounting for '%s:%s' in '%s'\n",
> +					usdt_provider, usdt_name, path);
> +				err = -ENOTSUP;
> +				goto err_out;
> +			}
> +
> +			seg = find_elf_seg(segs, seg_cnt, note.sema_addr, false /* relative */);
> +			if (!seg) {
> +				err = -ESRCH;
> +				pr_warn("usdt: failed to find ELF loadable segment with semaphore of '%s:%s' in '%s' at 0x%lx\n",
> +				        usdt_provider, usdt_name, path, note.sema_addr);
> +				goto err_out;
> +			}
> +			if (seg->is_exec) {
> +				err = -ESRCH;
> +				pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx] for semaphore of '%s:%s' at 0x%lx is executable\n",
> +					path, seg->start, seg->end, usdt_provider, usdt_name,
> +					note.sema_addr);
> +				goto err_out;
> +			}
> +

could have a bool "exec" arg to find_elf_seg() which allows/disallows the 
segment to be executable I guess.

Alan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic
  2022-03-25  5:29 ` [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
@ 2022-03-31 14:49   ` Alan Maguire
  2022-03-31 19:16     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 14:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Last part of architecture-agnostic user-space USDT handling logic is to
> set up BPF spec and, optionally, IP-to-ID maps from user-space.
> usdt_manager performs a compact spec ID allocation to utilize
> fixed-sized BPF maps as efficiently as possible. We also use hashmap to
> deduplicate USDT arg spec strings and map identical strings to single
> USDT spec, minimizing the necessary BPF map size. usdt_manager supports
> arbitrary sequences of attachment and detachment, both of the same USDT
> and multiple different USDTs and internally maintains a free list of
> unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
> teardown of this spec ID free list and supporting BPF maps.
> 

It might be good to describe the relationship between a USDT specification
(spec) and the site-specific targets that can be associated with it.  So 
the spec is the description of the provider + name + args, and the 
target represents the potentially multiple sites associated with that 
spec.

Specs are stored in the spec array map, indexed by spec_id; targets are
stored in the ip_map, and these reference a spec id.  So from the BPF side 
we can use the bpf_cookie to look up the spec directly, or if cookies are
not supported on the BPF side, we can look up ip -> spec_id mapping in 
ip_map, and from there can look up the spec_id -> spec in the spec map.

Dumb question here: the spec id recycling is a lot of work; 
instead of maintaining this for the array map, couldn't we use a hashmap 
for spec ids with a monotonically-increasing next_spec_id value or
something similar?

> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

one suggestion below, but

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

> ---
>  tools/lib/bpf/usdt.c | 167 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 166 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index 86d5d8390eb1..22f5f56992f8 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c

<snip>

>  		opts.ref_ctr_offset = target->sema_off;
> +		opts.bpf_cookie = man->has_bpf_cookie ? spec_id : 0;
>  		uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
>  							      target->rel_ip, &opts);
>  		err = libbpf_get_error(link);

should be uprobe_link I think.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic
  2022-03-25  5:29 ` [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
@ 2022-03-31 15:13   ` Alan Maguire
  2022-03-31 19:20     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 15:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Add x86/x86_64-specific USDT argument specification parsing. Each
> architecture will require its own logic, as all this is arch-specific
> assembly-based notation. Architectures that libbpf doesn't support for
> USDTs will pr_warn() with specific error and return -ENOTSUP.
> 
> We use sscanf() as a very powerful and easy-to-use string parser. The
> spaces in sscanf's format string mean "skip any whitespace", which is a
> pretty nifty (and somewhat little-known) feature.
> 
> All this was tested on little-endian architecture, so bit shifts are
> probably off on big-endian, which our CI will hopefully prove.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

minor stuff below...

> ---
>  tools/lib/bpf/usdt.c | 105 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 105 insertions(+)
> 
> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index 22f5f56992f8..5cf809db60aa 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c
> @@ -1007,8 +1007,113 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
>  	return 0;
>  }
>  
> +/* Architecture-specific logic for parsing USDT argument location specs */
> +
> +#if defined(__x86_64__) || defined(__i386__)
> +
> +static int calc_pt_regs_off(const char *reg_name)
> +{
> +	static struct {
> +		const char *names[4];
> +		size_t pt_regs_off;
> +	} reg_map[] = {
> +#if __x86_64__
> +#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg64)
> +#else
> +#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg32)
> +#endif
> +		{ {"rip", "eip", "", ""}, reg_off(rip, eip) },
> +		{ {"rax", "eax", "ax", "al"}, reg_off(rax, eax) },
> +		{ {"rbx", "ebx", "bx", "bl"}, reg_off(rbx, ebx) },
> +		{ {"rcx", "ecx", "cx", "cl"}, reg_off(rcx, ecx) },
> +		{ {"rdx", "edx", "dx", "dl"}, reg_off(rdx, edx) },
> +		{ {"rsi", "esi", "si", "sil"}, reg_off(rsi, esi) },
> +		{ {"rdi", "edi", "di", "dil"}, reg_off(rdi, edi) },
> +		{ {"rbp", "ebp", "bp", "bpl"}, reg_off(rbp, ebp) },
> +		{ {"rsp", "esp", "sp", "spl"}, reg_off(rsp, esp) },
> +#undef reg_off
> +#if __x86_64__
> +		{ {"r8", "r8d", "r8w", "r8b"}, offsetof(struct pt_regs, r8) },
> +		{ {"r9", "r9d", "r9w", "r9b"}, offsetof(struct pt_regs, r9) },
> +		{ {"r10", "r10d", "r10w", "r10b"}, offsetof(struct pt_regs, r10) },
> +		{ {"r11", "r11d", "r11w", "r11b"}, offsetof(struct pt_regs, r11) },
> +		{ {"r12", "r12d", "r12w", "r12b"}, offsetof(struct pt_regs, r12) },
> +		{ {"r13", "r13d", "r13w", "r13b"}, offsetof(struct pt_regs, r13) },
> +		{ {"r14", "r14d", "r14w", "r14b"}, offsetof(struct pt_regs, r14) },
> +		{ {"r15", "r15d", "r15w", "r15b"}, offsetof(struct pt_regs, r15) },
> +#endif
> +	};
> +	int i, j;
> +
> +	for (i = 0; i < ARRAY_SIZE(reg_map); i++) {
> +		for (j = 0; j < ARRAY_SIZE(reg_map[i].names); j++) {
> +			if (strcmp(reg_name, reg_map[i].names[j]) == 0)
> +				return reg_map[i].pt_regs_off;
> +		}
> +	}
> +
> +	pr_warn("usdt: unrecognized register '%s'\n", reg_name);
> +	return -ENOENT;
> +}

this is a really neat approach! could we shrink the arch-dependent
part even further, to just the reg_map? i.e. instead of having
parse_usdt_arg() inside the #ifdef __x86_64__/__i386__, only the
reg_map would be, and an unsupported arch would get an empty reg_map
so that calc_pt_regs_off() does

	if (ARRAY_SIZE(reg_map) == 0) {
		pr_warn("usdt: libbpf doesn't support USDTs on current architecture\n");
		return -ENOTSUP;
	}

> +
> +static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
> +{
> +	char *reg_name = NULL;
> +	int arg_sz, len, reg_off;
> +	long off;
> +

nit but it took me a moment to notice that you had examples in each 
clause; might be good to have a higher-level comment stating

we support 3 forms of argument description:

- register dereference "-4@-20(%rbp)"
- register "-4@%eax"
- constant "4@$71"

I _think_ you mentioned there were other valid arg formats that we're not 
supporting; it would be good to be explicit about that here, e.g. "other
formats are possible but we don't support them currently".

> +	if (3 == sscanf(arg_str, " %d @ %ld ( %%%m[^)] ) %n", &arg_sz, &off, &reg_name, &len)) {
> +		/* -4@-20(%rbp) */
> +		arg->arg_type = USDT_ARG_REG_DEREF;
> +		arg->val_off = off;
> +		reg_off = calc_pt_regs_off(reg_name);
> +		free(reg_name);
> +		if (reg_off < 0)
> +			return reg_off;
> +		arg->reg_off = reg_off;
> +	} else if (2 == sscanf(arg_str, " %d @ %%%ms %n", &arg_sz, &reg_name, &len)) {
> +		/* -4@%eax */
> +		arg->arg_type = USDT_ARG_REG;
> +		arg->val_off = 0;
> +
> +		reg_off = calc_pt_regs_off(reg_name);
> +		free(reg_name);
> +		if (reg_off < 0)
> +			return reg_off;
> +		arg->reg_off = reg_off;
> +	} else if (2 == sscanf(arg_str, " %d @ $%ld %n", &arg_sz, &off, &len)) {
> +		/* 4@$71 */
> +		arg->arg_type = USDT_ARG_CONST;
> +		arg->val_off = off;
> +		arg->reg_off = 0;
> +	} else {
> +		pr_warn("usdt: unrecognized arg #%d spec '%s'\n", arg_num, arg_str);
> +		return -EINVAL;
> +	}
> +
> +	arg->arg_signed = arg_sz < 0;
> +	if (arg_sz < 0)
> +		arg_sz = -arg_sz;
> +
> +	switch (arg_sz) {
> +	case 1: case 2: case 4: case 8:
> +		arg->arg_bitshift = 64 - arg_sz * 8;
> +		break;
> +	default:
> +		pr_warn("usdt: unsupported arg #%d (spec '%s') size: %d\n",
> +			arg_num, arg_str, arg_sz);
> +		return -EINVAL;
> +	}
> +
> +	return len;
> +}
> +
> +#else
> +
>  static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
>  {
>  	pr_warn("usdt: libbpf doesn't support USDTs on current architecture\n");
>  	return -ENOTSUP;
>  }
> +
> +#endif
> -- 
> 2.30.2
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests
  2022-03-25  5:29 ` [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
@ 2022-03-31 15:54   ` Alan Maguire
  2022-03-31 19:28     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 15:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Add semaphore-based USDT to test_progs itself and write basic tests to
> valicate both auto-attachment and manual attachment logic, as well as
> BPF-side functionality.
> 
> Also add subtests to validate that libbpf properly deduplicates USDT
> specs and handles spec overflow situations correctly, as well as proper
> "rollback" of partially-attached multi-spec USDT.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

One compilation issue and minor nit below

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

> ---
>  tools/testing/selftests/bpf/Makefile          |   1 +
>  tools/testing/selftests/bpf/prog_tests/usdt.c | 314 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/test_usdt.c | 115 +++++++
>  3 files changed, 430 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 3820608faf57..18e22def3bdb 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -400,6 +400,7 @@ $(TRUNNER_BPF_OBJS): $(TRUNNER_OUTPUT)/%.o:				\
>  		     $(TRUNNER_BPF_PROGS_DIR)/*.h			\
>  		     $$(INCLUDE_DIR)/vmlinux.h				\
>  		     $(wildcard $(BPFDIR)/bpf_*.h)			\
> +		     $(wildcard $(BPFDIR)/*.bpf.h)			\
>  		     | $(TRUNNER_OUTPUT) $$(BPFOBJ)
>  	$$(call $(TRUNNER_BPF_BUILD_RULE),$$<,$$@,			\
>  					  $(TRUNNER_BPF_CFLAGS))
> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> new file mode 100644
> index 000000000000..44a20d8c45d7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -0,0 +1,314 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +#include <test_progs.h>
> +
> +#define _SDT_HAS_SEMAPHORES 1
> +#include <sys/sdt.h>
> +

Do we need to bracket this with a feature test for sdt.h? I think I had 
something rough for that in

https://lore.kernel.org/bpf/1642004329-23514-5-git-send-email-alan.maguire@oracle.com/
 
That might prevent selftest compilation failures if sdt.h isn't present,
and IIRC that feature test is used in perf code.

I just realized I got confused on the cookie logic. There's really two 
levels of cookies:

- at the API level, the USDT cookie is associated with the USDT 
  attachment, and can span multiple sites; but under the hood
- the uprobe cookie is used to associate the uprobe point of attachment 
  with the associated spec id.  If BPF cookie retrieval isn't supported,
  we fall back to using the instruction pointer -> spec id mapping.

To get the usdt cookie in BPF prog context, we first look up the uprobe 
cookie to get the spec id, and then get the spec entry.

I guess libbpf CI on older kernels will cover testing for the case where
bpf cookies aren't supported and we need to do that ip -> spec id 
mapping? Perhaps we could have a test that #defines 
BPF_USDT_HAS_BPF_COOKIE to 0 to cover testing this on newer kernels?

> +#include "test_usdt.skel.h"
> +#include "test_urandom_usdt.skel.h"
> +
> +int lets_test_this(int);
> +
> +static volatile int idx = 2;
> +static volatile __u64 bla = 0xFEDCBA9876543210ULL;
> +static volatile short nums[] = {-1, -2, -3, };
> +
> +static volatile struct {
> +	int x;
> +	signed char y;
> +} t1 = { 1, -127 };
> +
> +#define SEC(name) __attribute__((section(name), used))
> +
> +unsigned short test_usdt0_semaphore SEC(".probes");
> +unsigned short test_usdt3_semaphore SEC(".probes");
> +unsigned short test_usdt12_semaphore SEC(".probes");
> +
> +static void __always_inline trigger_func(int x) {
> +	long y = 42;
> +
> +	if (test_usdt0_semaphore)
> +		STAP_PROBE(test, usdt0);
> +	if (test_usdt3_semaphore)
> +		STAP_PROBE3(test, usdt3, x, y, &bla);
> +	if (test_usdt12_semaphore) {
> +		STAP_PROBE12(test, usdt12,
> +			     x, x + 1, y, x + y, 5,
> +			     y / 7, bla, &bla, -9, nums[x],
> +			     nums[idx], t1.y);
> +	}
> +}
> +
> +static void subtest_basic_usdt(void)
> +{
> +	LIBBPF_OPTS(bpf_usdt_opts, opts);
> +	struct test_usdt *skel;
> +	struct test_usdt__bss *bss;
> +	int err;
> +
> +	skel = test_usdt__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> +		return;
> +
> +	bss = skel->bss;
> +	bss->my_pid = getpid();
> +
> +	err = test_usdt__attach(skel);
> +	if (!ASSERT_OK(err, "skel_attach"))
> +		goto cleanup;
> +
> +	/* usdt0 won't be auto-attached */
> +	opts.usdt_cookie = 0xcafedeadbeeffeed;
> +	skel->links.usdt0 = bpf_program__attach_usdt(skel->progs.usdt0,
> +						     0 /*self*/, "/proc/self/exe",
> +						     "test", "usdt0", &opts);
> +	if (!ASSERT_OK_PTR(skel->links.usdt0, "usdt0_link"))
> +		goto cleanup;
> +
> +	trigger_func(1);
> +
> +	ASSERT_EQ(bss->usdt0_called, 1, "usdt0_called");
> +	ASSERT_EQ(bss->usdt3_called, 1, "usdt3_called");
> +	ASSERT_EQ(bss->usdt12_called, 1, "usdt12_called");
> +
> +	ASSERT_EQ(bss->usdt0_cookie, 0xcafedeadbeeffeed, "usdt0_cookie");
> +	ASSERT_EQ(bss->usdt0_arg_cnt, 0, "usdt0_arg_cnt");
> +	ASSERT_EQ(bss->usdt0_arg_ret, -ENOENT, "usdt0_arg_ret");
> +
> +	/* auto-attached usdt3 gets default zero cookie value */
> +	ASSERT_EQ(bss->usdt3_cookie, 0, "usdt3_cookie");
> +	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
> +
> +	ASSERT_EQ(bss->usdt3_arg_rets[0], 0, "usdt3_arg1_ret");
> +	ASSERT_EQ(bss->usdt3_arg_rets[1], 0, "usdt3_arg2_ret");
> +	ASSERT_EQ(bss->usdt3_arg_rets[2], 0, "usdt3_arg3_ret");
> +	ASSERT_EQ(bss->usdt3_args[0], 1, "usdt3_arg1");
> +	ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2");
> +	ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3");
> +
> +	/* auto-attached usdt12 gets default zero cookie value */
> +	ASSERT_EQ(bss->usdt12_cookie, 0, "usdt12_cookie");
> +	ASSERT_EQ(bss->usdt12_arg_cnt, 12, "usdt12_arg_cnt");
> +
> +	ASSERT_EQ(bss->usdt12_args[0], 1, "usdt12_arg1");
> +	ASSERT_EQ(bss->usdt12_args[1], 1 + 1, "usdt12_arg2");
> +	ASSERT_EQ(bss->usdt12_args[2], 42, "usdt12_arg3");
> +	ASSERT_EQ(bss->usdt12_args[3], 42 + 1, "usdt12_arg4");
> +	ASSERT_EQ(bss->usdt12_args[4], 5, "usdt12_arg5");
> +	ASSERT_EQ(bss->usdt12_args[5], 42 / 7, "usdt12_arg6");
> +	ASSERT_EQ(bss->usdt12_args[6], bla, "usdt12_arg7");
> +	ASSERT_EQ(bss->usdt12_args[7], (uintptr_t)&bla, "usdt12_arg8");
> +	ASSERT_EQ(bss->usdt12_args[8], -9, "usdt12_arg9");
> +	ASSERT_EQ(bss->usdt12_args[9], nums[1], "usdt12_arg10");
> +	ASSERT_EQ(bss->usdt12_args[10], nums[idx], "usdt12_arg11");
> +	ASSERT_EQ(bss->usdt12_args[11], t1.y, "usdt12_arg12");
> +
> +	/* trigger_func() is marked __always_inline, so USDT invocations will be
> +	 * inlined in two different places, meaning that each USDT will have
> +	 * at least 2 different places to be attached to. This verifies that
> +	 * bpf_program__attach_usdt() handles this properly and attaches to
> +	 * all possible places of USDT invocation.
> +	 */
> +	trigger_func(2);
> +
> +	ASSERT_EQ(bss->usdt0_called, 2, "usdt0_called");
> +	ASSERT_EQ(bss->usdt3_called, 2, "usdt3_called");
> +	ASSERT_EQ(bss->usdt12_called, 2, "usdt12_called");
> +
> +	/* only check values that depend on trigger_func()'s input value */
> +	ASSERT_EQ(bss->usdt3_args[0], 2, "usdt3_arg1");
> +
> +	ASSERT_EQ(bss->usdt12_args[0], 2, "usdt12_arg1");
> +	ASSERT_EQ(bss->usdt12_args[1], 2 + 1, "usdt12_arg2");
> +	ASSERT_EQ(bss->usdt12_args[3], 42 + 2, "usdt12_arg4");
> +	ASSERT_EQ(bss->usdt12_args[9], nums[2], "usdt12_arg10");
> +
> +	/* detach and re-attach usdt3 */
> +	bpf_link__destroy(skel->links.usdt3);
> +
> +	opts.usdt_cookie = 0xBADC00C51E;
> +	skel->links.usdt3 = bpf_program__attach_usdt(skel->progs.usdt3, -1 /* any pid */,
> +						     "/proc/self/exe", "test", "usdt3", &opts);
> +	if (!ASSERT_OK_PTR(skel->links.usdt3, "usdt3_reattach"))
> +		goto cleanup;
> +
> +	trigger_func(3);
> +
> +	ASSERT_EQ(bss->usdt3_called, 3, "usdt3_called");
> +	/* this time usdt3 has custom cookie */
> +	ASSERT_EQ(bss->usdt3_cookie, 0xBADC00C51E, "usdt3_cookie");
> +	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
> +
> +	ASSERT_EQ(bss->usdt3_arg_rets[0], 0, "usdt3_arg1_ret");
> +	ASSERT_EQ(bss->usdt3_arg_rets[1], 0, "usdt3_arg2_ret");
> +	ASSERT_EQ(bss->usdt3_arg_rets[2], 0, "usdt3_arg3_ret");
> +	ASSERT_EQ(bss->usdt3_args[0], 3, "usdt3_arg1");
> +	ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2");
> +	ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3");
> +
> +cleanup:
> +	test_usdt__destroy(skel);
> +}
> +
> +unsigned short test_usdt_100_semaphore SEC(".probes");
> +unsigned short test_usdt_300_semaphore SEC(".probes");
> +unsigned short test_usdt_400_semaphore SEC(".probes");
> +
> +#define R10(F, X)  F(X+0); F(X+1);F(X+2); F(X+3); F(X+4); \
> +		   F(X+5); F(X+6); F(X+7); F(X+8); F(X+9);
> +#define R100(F, X) R10(F,X+ 0);R10(F,X+10);R10(F,X+20);R10(F,X+30);R10(F,X+40); \
> +		   R10(F,X+50);R10(F,X+60);R10(F,X+70);R10(F,X+80);R10(F,X+90);
> +
> +/* carefully control that we get exactly 100 inlines by preventing inlining */
> +static void __always_inline f100(int x)
> +{
> +	STAP_PROBE1(test, usdt_100, x);
> +}
> +
> +__weak void trigger_100_usdts(void)
> +{
> +	R100(f100, 0);
> +}
> +
> +/* we shouldn't be able to attach to test:usdt_300 USDT as we don't have
> + * enough slots for specs. It's important that each STAP_PROBE1() invocation
> + * (after unrolling) gets a different arg spec due to compiler inlining x as
> + * a constant
> + */
> +static void __always_inline f300(int x)
> +{
> +	STAP_PROBE1(test, usdt_300, x);
> +}
> +
> +__weak void trigger_300_usdts(void)
> +{
> +	R100(f300, 0);
> +	R100(f300, 100);
> +	R100(f300, 200);
> +}
> +
> +static void __always_inline f400(int /*unused*/ )

...caused a compilation error on gcc-9 for me:

  TEST-OBJ [test_progs] usdt.test.o
/home/alan/kbuild/bpf-next/tools/testing/selftests/bpf/prog_tests/usdt.c: 
In function ‘f400’:
/home/alan/kbuild/bpf-next/tools/testing/selftests/bpf/prog_tests/usdt.c:191:34: 
error: parameter name omitted
  191 | static void __always_inline f400(int /*unused*/ )
      |                                  ^~~
make: *** 
[/home/alan/kbuild/bpf-next/tools/testing/selftests/bpf/usdt.test.o] Error 
1
 ...but with 

diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c 
b/tools/testing/selftests/bpf/prog_tests/
index b4c070b..5d382c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -188,7 +188,7 @@ __weak void trigger_300_usdts(void)
        R100(f300, 200);
 }
 
-static void __always_inline f400(int /*unused*/ )
+static void __always_inline f400(int u /*unused*/ )
 {
        static int x;
 


...tests passed cleanly.

> +{
> +	static int x;
> +
> +	STAP_PROBE1(test, usdt_400, x++);
> +}
> +
> +/* this time we have 400 different USDT call sites, but they have uniform
> + * argument location, so libbpf's spec string deduplication logic should keep
> + * spec count use very small and so we should be able to attach to all 400
> + * call sites
> + */
> +__weak void trigger_400_usdts(void)
> +{
> +	R100(f400, 0);
> +	R100(f400, 100);
> +	R100(f400, 200);
> +	R100(f400, 300);
> +}
> +
> +static void subtest_multispec_usdt(void)
> +{
> +	LIBBPF_OPTS(bpf_usdt_opts, opts);
> +	struct test_usdt *skel;
> +	struct test_usdt__bss *bss;
> +	int err, i;
> +
> +	skel = test_usdt__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_open"))
> +		return;
> +
> +	bss = skel->bss;
> +	bss->my_pid = getpid();
> +
> +	err = test_usdt__attach(skel);
> +	if (!ASSERT_OK(err, "skel_attach"))
> +		goto cleanup;
> +
> +	/* usdt_100 is auto-attached and there are 100 inlined call sites,
> +	 * let's validate that all of them are properly attached to and
> +	 * handled from BPF side
> +	 */
> +	trigger_100_usdts();
> +
> +	ASSERT_EQ(bss->usdt_100_called, 100, "usdt_100_called");
> +	ASSERT_EQ(bss->usdt_100_sum, 99 * 100 / 2, "usdt_100_sum");
> +
> +	/* Stress test free spec ID tracking. By default libbpf allows up to
> +	 * 256 specs to be used, so if we don't return free spec IDs back
> +	 * after few detachments and re-attachments we should run out of
> +	 * available spec IDs.
> +	 */
> +	for (i = 0; i < 2; i++) {
> +		bpf_link__destroy(skel->links.usdt_100);
> +
> +		skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1,
> +							        "/proc/self/exe",
> +								"test", "usdt_100", NULL);
> +		if (!ASSERT_OK_PTR(skel->links.usdt_100, "usdt_100_reattach"))
> +			goto cleanup;
> +
> +		bss->usdt_100_sum = 0;
> +		trigger_100_usdts();
> +
> +		ASSERT_EQ(bss->usdt_100_called, (i + 1) * 100 + 100, "usdt_100_called");
> +		ASSERT_EQ(bss->usdt_100_sum, 99 * 100 / 2, "usdt_100_sum");
> +	}
> +
> +	/* Now let's step it up and try to attach USDT that requires more than
> +	 * 256 attach points with different specs for each.
> +	 * Note that we need trigger_300_usdts() only to actually have 300
> +	 * USDT call sites, we are not going to actually trace them.
> +	 */
> +	trigger_300_usdts();
> +
> +	/* we'll reuse usdt_100 BPF program for usdt_300 test */
> +	bpf_link__destroy(skel->links.usdt_100);
> +	skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1, "/proc/self/exe",
> +							"test", "usdt_300", NULL);
> +	err = -errno;
> +	if (!ASSERT_ERR_PTR(skel->links.usdt_100, "usdt_300_bad_attach"))
> +		goto cleanup;
> +	ASSERT_EQ(err, -E2BIG, "usdt_300_attach_err");
> +
> +	/* let's check that there are no "dangling" BPF programs attached due
> +	 * to partial success of the above test:usdt_300 attachment
> +	 */
> +	bss->usdt_100_called = 0;
> +	bss->usdt_100_sum = 0;
> +
> +	f300(777); /* this is 301st instance of usdt_300 */
> +
> +	ASSERT_EQ(bss->usdt_100_called, 0, "usdt_301_called");
> +	ASSERT_EQ(bss->usdt_100_sum, 0, "usdt_301_sum");
> +
> +	/* This time we have USDT with 400 inlined invocations, but arg specs
> +	 * should be the same across all sites, so libbpf will only need to
> +	 * use one spec and thus we'll be able to attach 400 uprobes
> +	 * successfully.
> +	 *
> +	 * Again, we are reusing usdt_100 BPF program.
> +	 */
> +	skel->links.usdt_100 = bpf_program__attach_usdt(skel->progs.usdt_100, -1,
> +							"/proc/self/exe",
> +							"test", "usdt_400", NULL);
> +	if (!ASSERT_OK_PTR(skel->links.usdt_100, "usdt_400_attach"))
> +		goto cleanup;
> +
> +	trigger_400_usdts();
> +
> +	ASSERT_EQ(bss->usdt_100_called, 400, "usdt_400_called");
> +	ASSERT_EQ(bss->usdt_100_sum, 399 * 400 / 2, "usdt_400_sum");
> +
> +cleanup:
> +	test_usdt__destroy(skel);
> +}
> +
> +void test_usdt(void)
> +{
> +	if (test__start_subtest("basic"))
> +		subtest_basic_usdt();
> +	if (test__start_subtest("multispec"))
> +		subtest_multispec_usdt();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
> new file mode 100644
> index 000000000000..cb800910d794
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_usdt.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/usdt.bpf.h>
> +
> +int my_pid;
> +
> +int usdt0_called;
> +u64 usdt0_cookie;
> +int usdt0_arg_cnt;
> +int usdt0_arg_ret;
> +
> +SEC("usdt")
> +int usdt0(struct pt_regs *ctx)
> +{
> +	long tmp;
> +
> +	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
> +		return 0;
> +
> +	__sync_fetch_and_add(&usdt0_called, 1);
> +
> +	usdt0_cookie = bpf_usdt_cookie(ctx);
> +	usdt0_arg_cnt = bpf_usdt_arg_cnt(ctx);
> +	/* should return -ENOENT */
> +	usdt0_arg_ret = bpf_usdt_arg(ctx, 0, &tmp);
> +	return 0;
> +}
> +
> +int usdt3_called;
> +u64 usdt3_cookie;
> +int usdt3_arg_cnt;
> +int usdt3_arg_rets[3];
> +u64 usdt3_args[3];
> +
> +SEC("usdt//proc/self/exe:test:usdt3")
> +int usdt3(struct pt_regs *ctx)
> +{
> +	long tmp;
> +
> +	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
> +		return 0;
> +
> +	__sync_fetch_and_add(&usdt3_called, 1);
> +
> +	usdt3_cookie = bpf_usdt_cookie(ctx);
> +	usdt3_arg_cnt = bpf_usdt_arg_cnt(ctx);
> +
> +	usdt3_arg_rets[0] = bpf_usdt_arg(ctx, 0, &tmp);
> +	usdt3_args[0] = (int)tmp;
> +
> +	usdt3_arg_rets[1] = bpf_usdt_arg(ctx, 1, &tmp);
> +	usdt3_args[1] = (long)tmp;
> +
> +	usdt3_arg_rets[2] = bpf_usdt_arg(ctx, 2, &tmp);
> +	usdt3_args[2] = (uintptr_t)tmp;
> +
> +	return 0;
> +}
> +
> +int usdt12_called;
> +u64 usdt12_cookie;
> +int usdt12_arg_cnt;
> +u64 usdt12_args[12];
> +
> +SEC("usdt//proc/self/exe:test:usdt12")
> +int BPF_USDT(usdt12, int a1, int a2, long a3, long a4, unsigned a5,
> +		     long a6, __u64 a7, uintptr_t a8, int a9, short a10,
> +		     short a11, signed char a12)
> +{
> +	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
> +		return 0;
> +
> +	__sync_fetch_and_add(&usdt12_called, 1);
> +
> +	usdt12_cookie = bpf_usdt_cookie(ctx);
> +	usdt12_arg_cnt = bpf_usdt_arg_cnt(ctx);
> +
> +	usdt12_args[0] = a1;
> +	usdt12_args[1] = a2;
> +	usdt12_args[2] = a3;
> +	usdt12_args[3] = a4;
> +	usdt12_args[4] = a5;
> +	usdt12_args[5] = a6;
> +	usdt12_args[6] = a7;
> +	usdt12_args[7] = a8;
> +	usdt12_args[8] = a9;
> +	usdt12_args[9] = a10;
> +	usdt12_args[10] = a11;
> +	usdt12_args[11] = a12;
> +	return 0;
> +}
> +
> +int usdt_100_called;
> +int usdt_100_sum;
> +
> +SEC("usdt//proc/self/exe:test:usdt_100")
> +int BPF_USDT(usdt_100, int x)
> +{
> +	long tmp;
> +
> +	if (my_pid != (bpf_get_current_pid_tgid() >> 32))
> +		return 0;
> +
> +	__sync_fetch_and_add(&usdt_100_called, 1);
> +	__sync_fetch_and_add(&usdt_100_sum, x);
> +
> +	bpf_printk("X is %d, sum is %d", x, usdt_100_sum);
> +

debugging, needed?

> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> -- 
> 2.30.2
> 
> 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* program local storage. Was: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
  2022-03-30  3:10   ` Hengqi Chen
  2022-03-31 11:30   ` Alan Maguire
@ 2022-03-31 18:34   ` Alexei Starovoitov
  2022-03-31 20:13     ` Andrii Nakryiko
  2 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2022-03-31 18:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Alan Maguire, Dave Marchevsky

On Thu, Mar 24, 2022 at 10:30 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> +
> +struct __bpf_usdt_arg_spec {
> +       __u64 val_off;
> +       enum __bpf_usdt_arg_type arg_type;
> +       short reg_off;
> +       bool arg_signed;
> +       char arg_bitshift;
> +};
> +
> +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> +#define BPF_USDT_MAX_ARG_CNT 12
> +struct __bpf_usdt_spec {
> +       struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> +       __u64 usdt_cookie;
> +       short arg_cnt;
> +};
> +
> +__weak struct {
> +       __uint(type, BPF_MAP_TYPE_ARRAY);
> +       __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> +       __type(key, int);
> +       __type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs SEC(".maps");
> +
> +__weak struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> +       __type(key, long);
> +       __type(value, struct __bpf_usdt_spec);
> +} __bpf_usdt_specs_ip_to_id SEC(".maps");
...

> +
> +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> + * Returns 0 on success; negative error, otherwise.
> + * On error *res is guaranteed to be set to zero.
> + */
> +__hidden __weak
> +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> +{
> +       struct __bpf_usdt_spec *spec;
> +       struct __bpf_usdt_arg_spec *arg_spec;
> +       unsigned long val;
> +       int err, spec_id;
> +
> +       *res = 0;
> +
> +       spec_id = __bpf_usdt_spec_id(ctx);
> +       if (spec_id < 0)
> +               return -ESRCH;
> +
> +       spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> +       if (!spec)
> +               return -ESRCH;
> +
> +       if (arg >= spec->arg_cnt)
> +               return -ENOENT;
> +
> +       arg_spec = &spec->args[arg];
> +       switch (arg_spec->arg_type) {

Without bpf_cookie in the kernel each arg access is two lookups.
With bpf_cookie it's a single lookup in an array that is fast.
Multiply that cost by number of args.
Not a huge cost, but we can do better long term.

How about annotating bpf_cookie with PTR_TO_BTF_ID at prog load time.
So that bpf_get_attach_cookie() returns PTR_TO_BTF_ID instead of long.
This way bpf_get_attach_cookie() can return
"struct __bpf_usdt_spec *".

At attach time libbpf will provide populated 'struct __bpf_usdt_spec'
to the kernel and the kernel will copy the struct's data
in the bpf_link.
At detach time that memory is freed.

Advantages:
- saves an array lookup at runtime
- no need to provide size for __bpf_usdt_specs map.
  That map is no longer needed.
  users don't need to worry about maxing out BPF_USDT_MAX_SPEC_CNT.
- libbpf doesn't need to populate __bpf_usdt_specs map
  libbpf doesn't need to allocate spec_id-s.
  libbpf will keep struct __bpf_usdt_spec per uprobe and
  pass it to the kernel at attach time to store in bpf_link.

"cookie as ptr_to_btf_id" is a generic mechanism to provide a
blob of data to the bpf prog instead of a single "long".
That blob can be read/write too.
It can be used as per-program + per-attach point scratch area.
Similar to task/inode local storage...
That would be (prog, attach_point) local storage.

Thoughts?
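To make the proposal concrete, BPF-side USDT code under this scheme might look roughly like the sketch below. This is purely hypothetical: neither the PTR_TO_BTF_ID-returning bpf_get_attach_cookie() nor the per-link spec blob exists at this point, so none of this compiles against current kernels; it only illustrates the shape being proposed.

```c
/* Hypothetical sketch of the proposed API, not existing kernel code.
 * bpf_get_attach_cookie() would return a verifier-checked pointer to
 * the per-attach-point spec blob that libbpf handed to the kernel at
 * attach time, instead of a plain u64.
 */
SEC("usdt")
int usdt_prog(struct pt_regs *ctx)
{
	struct __bpf_usdt_spec *spec;

	spec = bpf_get_attach_cookie(ctx); /* no map lookups at all */
	if (!spec)
		return 0;
	/* spec->arg_cnt, spec->args[i], spec->usdt_cookie are directly
	 * accessible; the __bpf_usdt_specs and ip_to_id maps go away
	 */
	return 0;
}
```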


* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-31 11:30   ` Alan Maguire
@ 2022-03-31 18:49     ` Andrii Nakryiko
  2022-03-31 20:52       ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 18:49 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 4:31 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > Add BPF-side implementation of libbpf-provided USDT support. This
> > consists of a single-header library, usdt.bpf.h, which is meant to be used
> > from the user's BPF-side source code. This header is added to the list of
> > installed libbpf headers, along with bpf_helpers.h and others.
> >
>
> <snip>
>
> Some suggestions below, but nothing major.
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> > diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> > new file mode 100644
> > index 000000000000..8ee084b2e6b5
> > --- /dev/null
> > +++ b/tools/lib/bpf/usdt.bpf.h
> > @@ -0,0 +1,228 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +#ifndef __USDT_BPF_H__
> > +#define __USDT_BPF_H__
> > +
> > +#include <linux/errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_core_read.h>
> > +
> > +/* Below types and maps are internal implementation details of libbpf's USDT
> > + * support and are subject to change. Also, usdt_xxx() API helpers should be
> > + * considered an unstable API as well and might be adjusted based on user
> > + * feedback from using libbpf's USDT support in production.
> > + */
> > +
> > +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> > + * map that keeps track of USDT argument specifications. This might be
> > + * necessary if there are a lot of USDT attachments.
> > + */
> > +#ifndef BPF_USDT_MAX_SPEC_CNT
> > +#define BPF_USDT_MAX_SPEC_CNT 256
> > +#endif
> > +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> > + * map that keeps track of IP (memory address) mapping to USDT argument
> > + * specification.
> > + * Note, if kernel supports BPF cookies, this map is not used and could be
> > + * resized all the way to 1 to save a bit of memory.
> > + */
> > +#ifndef BPF_USDT_MAX_IP_CNT
> > +#define BPF_USDT_MAX_IP_CNT 1024
> > +#endif
>
> might be no harm to just make this default to a reasonable multiple of
> BPF_USDT_MAX_SPEC_CNT; i.e. n specs X m possible sites. Would allow users
> to simply override the MAX_SPEC_CNT in most cases too.

It's not clear what a reasonable multiple is; it will differ for
different binaries. I can do (4 * BPF_USDT_MAX_SPEC_CNT) to arrive at
the same default 1024? Do you think that's reasonable?

>
> > +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> > + * the only dependency on CO-RE, so if it's undesirable, user can override
> > + * BPF_USDT_HAS_BPF_COOKIE to specify whether BPF cookie is supported or not.
> > + */
> > +#ifndef BPF_USDT_HAS_BPF_COOKIE
> > +#define BPF_USDT_HAS_BPF_COOKIE \
> > +     bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> > +#endif
> > +
> > +enum __bpf_usdt_arg_type {
> > +     BPF_USDT_ARG_CONST,
> > +     BPF_USDT_ARG_REG,
> > +     BPF_USDT_ARG_REG_DEREF,
> > +};
> > +
> > +struct __bpf_usdt_arg_spec {
> > +     __u64 val_off;
> > +     enum __bpf_usdt_arg_type arg_type;
> > +     short reg_off;
> > +     bool arg_signed;
> > +     char arg_bitshift;
>
> would be no harm having a small comment here or below where the
> bitshifting is done like "for arg sizes less than 8 bytes, this tells
> us how many bits to shift to left then right to
> remove the unused bits, giving correct arg value".

sure, I'll add that comment that this is used for casting and
potentially sign-extending arguments up to u64

>
> > +};
> > +
> > +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> > +#define BPF_USDT_MAX_ARG_CNT 12
> > +struct __bpf_usdt_spec {
> > +     struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> > +     __u64 usdt_cookie;
> > +     short arg_cnt;
> > +};
> > +
> > +__weak struct {
> > +     __uint(type, BPF_MAP_TYPE_ARRAY);
> > +     __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> > +     __type(key, int);
> > +     __type(value, struct __bpf_usdt_spec);
> > +} __bpf_usdt_specs SEC(".maps");
> > +
> > +__weak struct {
> > +     __uint(type, BPF_MAP_TYPE_HASH);
> > +     __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> > +     __type(key, long);
> > +     __type(value, struct __bpf_usdt_spec);
> > +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> > +
> > +/* don't rely on user's BPF code to have latest definition of bpf_func_id */
> > +enum bpf_func_id___usdt {
> > +     BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
> > +};
> > +
> > +static inline int __bpf_usdt_spec_id(struct pt_regs *ctx)
> > +{
> > +     if (!BPF_USDT_HAS_BPF_COOKIE) {
> > +             long ip = PT_REGS_IP(ctx);
>
> Trying to sort out the permutations of features, I _think_ it is possible
> the user has CO-RE support, but the clang version doesn't support the
> push of the preserve_access_index attribute? Would it be feasible to
> do an explicit "PT_REGS_IP_CORE(ctx);" here?


We don't normally rely on _CORE variants when fetching values from
pt_regs context, so I didn't want to add more dependency on CO-RE
here. User can opt out of CO-RE entirely by redefining
BPF_USDT_HAS_BPF_COOKIE, using PT_REGS_IP_CORE() here would make it
harder. As for struct pt_regs, in some architectures it's part of
UAPI, so it's very unlikely that existing fields are going to be moved
around, so not using _CORE() should be fine, IMO.


>
> > +             int *spec_id_ptr;
> > +
> > +             spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_specs_ip_to_id, &ip);
> > +             return spec_id_ptr ? *spec_id_ptr : -ESRCH;
> > +     }
> > +
> > +     return bpf_get_attach_cookie(ctx);
>
> should we grab the result in a u64 and handle the 0 case here -
> meaning "not specified" - and return -ESRCH?

But 0 is a perfectly fine spec ID, so why?

>
> > +}
> > +
> > +/* Return number of USDT arguments defined for currently traced USDT. */
> > +__hidden __weak
> > +int bpf_usdt_arg_cnt(struct pt_regs *ctx)
> > +{
> > +     struct __bpf_usdt_spec *spec;
> > +     int spec_id;
> > +
> > +     spec_id = __bpf_usdt_spec_id(ctx);
> > +     if (spec_id < 0)
> > +             return -EINVAL;
>
> spec_id can be 0 for the "cookie not set" case (see above).
>
> should we pass through the error value from __bpf_usdt_spec_id()? Looking
> above it's either -ESRCH or 0, but if we catch the 0 case as above we
> could just pass through the error value.
>

See above, zero is correct spec ID. So if the kernel supports cookies
and bpf_get_attach_cookie() returns zero, that zero is a real value.

> > +
> > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +     if (!spec)
> > +             return -EINVAL;
> > +
>
> should this be -ESRCH? we know from the above we had a valid
> spec_id.

sure, I can change to -ESRCH, though it's more like a -EBUG :)

>
> > +     return spec->arg_cnt;
> > +}
>
> also, since in every case (I think) that we call __bpf_usdt_spec_id()
> we go on to look up the spec in the map, would it be easier to
> combine both operations and have
>
> struct __bpf_usdt_spec * __bpf_usdt_spec(struct pt_regs *ctx);
>
> ?

You are right, I think now we always get a spec itself. My earlier
versions had an extra map for stuff like USDT name, so having spec ID
separately made sense. I'll update the code to return spec directly.
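A combined helper along the lines Alan suggests might look like the sketch below. This is an assumed shape, not the final patch, and it is not compilable standalone since it depends on the usdt.bpf.h maps and helpers quoted above.

```c
/* sketch: fold spec-ID resolution and the specs-map lookup into one
 * helper returning the spec directly (NULL if not found)
 */
static inline struct __bpf_usdt_spec *__bpf_usdt_spec(struct pt_regs *ctx)
{
	int spec_id;

	spec_id = __bpf_usdt_spec_id(ctx);
	if (spec_id < 0)
		return NULL;
	return bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
}
```

bpf_usdt_arg_cnt(), bpf_usdt_arg(), and bpf_usdt_cookie() would then each start with a single `spec = __bpf_usdt_spec(ctx)` call instead of the two-step ID-then-lookup dance.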

>
> > +
> > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> > + * Returns 0 on success; negative error, otherwise.
> > + * On error *res is guaranteed to be set to zero.
> > + */
> > +__hidden __weak
> > +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> > +{
> > +     struct __bpf_usdt_spec *spec;
> > +     struct __bpf_usdt_arg_spec *arg_spec;
> > +     unsigned long val;
> > +     int err, spec_id;
> > +
> > +     *res = 0;
> > +
> > +     spec_id = __bpf_usdt_spec_id(ctx);
> > +     if (spec_id < 0)
> > +             return -ESRCH;
> > +
> > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +     if (!spec)
> > +             return -ESRCH;
> > +
> > +     if (arg >= spec->arg_cnt)
> > +             return -ENOENT;
> > +
>
> I'm surprised you didn't need to check for negative values or a hard
> upper bound for the arg index here (to keep the verifier happy for
> the later array indexing using arg). Any dangers that an older
> LLVM+clang would generate code that might get tripped up on
> verification with this?

Great point. I think it's because in all the current code arg is a
known constant, so the verifier just knows that everything is within
bounds. I'll harden the code a bit and will add a test that provides
arg as dynamic value.

>
> > +     arg_spec = &spec->args[arg];
> > +     switch (arg_spec->arg_type) {
> > +     case BPF_USDT_ARG_CONST:
> > +             val = arg_spec->val_off;
> > +             break;

[...]


* Re: [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration
  2022-03-31 12:13   ` Alan Maguire
@ 2022-03-31 19:02     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 19:02 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 5:14 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > Wire up libbpf USDT support APIs without yet implementing all the
> > nitty-gritty details of USDT discovery, spec parsing, and BPF map
> > initialization.
> >
> > User-visible user-space API is simple and is conceptually very similar
> > to uprobe API.
> >
> > bpf_program__attach_usdt() API allows programmatically attaching a given
> > BPF program to a USDT, specified through binary path (executable or
> > shared lib), USDT provider and name. Also, just like in uprobe case, PID
> > filter is specified (0 - self, -1 - any process, or specific PID).
> > Optionally, USDT cookie value can be specified. Such single API
> > invocation will try to discover given USDT in specified binary and will
> > use (potentially many) BPF uprobes to attach this program in correct
> > locations.
> >
> > Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
> > represents this attachment. It is a virtual BPF link that doesn't have
> > direct kernel object, as it can consist of multiple underlying BPF
> > uprobe links. As such, attachment is not atomic operation and there can
> > be brief moment when some USDT call sites are attached while others are
> > still in the process of attaching. This should be taken into
> > consideration by user. But bpf_program__attach_usdt() guarantees that
> > in the case of success all USDT call sites are successfully attached, or
> > all the successfuly attachments will be detached as soon as some USDT
> > call sites failed to be attached. So, in theory, there could be cases of
> > failed bpf_program__attach_usdt() call which did trigger few USDT
> > program invocations. This is unavoidable due to multi-uprobe nature of
> > USDT and has to be handled by user, if it's important to create an
> > illusion of atomicity.
> >
> > USDT BPF programs themselves are marked in BPF source code as either
> > SEC("usdt"), in which case they won't be auto-attached through
> > skeleton's <skel>__attach() method, or it can have a full definition,
> > which follows the spirit of fully-specified uprobes:
> > SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
> > attach method will attempt auto-attachment. Similarly, generic
> > bpf_program__attach() will have enough information to go off of for
> > parameterless attachment.
> >
>
> Might be worth describing briefly the under-the-hood mechanisms; the
> usdt_manager that is per-BPF-object (so can conceptually represent
> multiple USDT providers/probes). It is initialized on first use and
> freed with bpf_object__close(); it is tasked with managing the mapping
> from usdt provider:name to actual sites+arguments via the spec/ip-to-id
> maps.

Yeah, I got feedback off-list that some good comment on how all the
pieces are coming together would be nice. I think I'll add a big
thorough explanation as a comment for struct usdt_manager, explaining
all the relations.

>
> > USDT BPF programs are actually uprobes, and as such, for the kernel they
> > are marked as BPF_PROG_TYPE_KPROBE.
> >
> > Another part of this patch is USDT-related feature probing:
> >   - BPF cookie support detection from user-space;
> >   - detection of kernel support for auto-refcounting of USDT semaphore.
> >
> > The latter is optional. If kernel doesn't support such feature and USDT
> > doesn't rely on USDT semaphores, no error is returned. But if libbpf
> > detects that USDT requires setting semaphores and kernel doesn't support
> > this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
> > support poking process's memory directly to increment semaphore value,
> > like BCC does on legacy kernels, due to inherent raciness and danger of
> > such process memory manipulation. Libbpf lets the kernel take care of this
> > properly or gives up.
> >
> > Logistically, all the extra USDT-related infrastructure of libbpf is put
> > into a separate usdt.c file and abstracted behind struct usdt_manager.
> > Each bpf_object has lazily-initialized usdt_manager pointer, which is
> > only instantiated if USDT programs are attempted to be attached. Closing
> > BPF object frees up usdt_manager resources. usdt_manager keeps track of
> > USDT spec ID assignment and few other small things.
> >
> > Subsequent patches will fill out remaining missing pieces of USDT
> > initialization and setup logic.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> again mostly nits and small suggestions below; this is fantastic Andrii!
>

Thanks for the thorough review!

> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> > ---
> >  tools/lib/bpf/Build             |   3 +-
> >  tools/lib/bpf/libbpf.c          |  92 ++++++++++-
> >  tools/lib/bpf/libbpf.h          |  15 ++
> >  tools/lib/bpf/libbpf.map        |   1 +
> >  tools/lib/bpf/libbpf_internal.h |  19 +++
> >  tools/lib/bpf/usdt.c            | 270 ++++++++++++++++++++++++++++++++
> >  6 files changed, 391 insertions(+), 9 deletions(-)
> >  create mode 100644 tools/lib/bpf/usdt.c

[...]

> > +
> > +static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link)
> > +{
> > +     char *path = NULL, *provider = NULL, *name = NULL;
> > +     const char *sec_name;
> > +
> > +     sec_name = bpf_program__section_name(prog);
> > +     if (strcmp(sec_name, "usdt") == 0) {
> > +             /* no auto-attach for just SEC("usdt") */
> > +             *link = NULL;
> > +             return 0;
> > +     }
> > +
> > +     if (3 != sscanf(sec_name, "usdt/%m[^:]:%m[^:]:%m[^:]", &path, &provider, &name)) {
> > +             pr_warn("invalid section '%s', expected SEC(\"usdt/<path>:<provider>:<name>\")\n",
> > +                     sec_name);
>
> could have an else clause here for the parse success case I suppose to
> save having two sets of free()s.

you mean like

if (3 == sscanf("")) {
    *link = bpf_program__attach_usdt(...);
    err = libbpf_get_error(*link);
} else {
    err = -EINVAL;
}

free(path);
free(provider);
free(name);

return err;

?

Can do that, sure.

>
> > +             free(path);
> > +             free(provider);
> > +             free(name);
> > +             return -EINVAL;
> > +     }
> > +
> > +     *link = bpf_program__attach_usdt(prog, -1 /* any process */, path,
> > +                                      provider, name, NULL);
> > +     free(path);
> > +     free(provider);
> > +     free(name);
> > +     return libbpf_get_error(*link);
> > +}
> > +
> >  static int determine_tracepoint_id(const char *tp_category,
> >                                  const char *tp_name)
> >  {
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index 05dde85e19a6..318eecaa14e7 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -503,6 +503,21 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
> >                               const char *binary_path, size_t func_offset,
> >                               const struct bpf_uprobe_opts *opts);
> >
> > +struct bpf_usdt_opts {
> > +     /* size of this struct, for forward/backward compatibility */
> > +     size_t sz;
> > +     /* custom user-provided value accessible through usdt_cookie() */
> > +     __u64 usdt_cookie;
> > +     size_t :0;
> > +};
> > +#define bpf_usdt_opts__last_field usdt_cookie
> > +
>
> need doc comment here such as
>
> /**
>  * @brief **bpf_program__attach_usdt()** is just like
>  * bpf_program__attach_uprobe_opts() except it covers
>  * USDT (Userspace Static Defined Tracing) attachment.
>  *
>  * @param prog BPF program to attach
>  * @param pid Process ID to attach the uprobe to, 0 for self (own
> process),
>  * -1 for all processes
>  * @param binary_path Path to binary that contains the USDT probe
>  * @param usdt_provider USDT Provider name
>  * @param usdt_name USDT Probe name
>  * @param opts Options for altering USDT attachment
>  * @return Reference to the newly created BPF link; or NULL is returned on
> error,
>  * error code is stored in errno
>  */
>

Will add, thanks!

>
> > +LIBBPF_API struct bpf_link *
> > +bpf_program__attach_usdt(const struct bpf_program *prog,
> > +                      pid_t pid, const char *binary_path,
> > +                      const char *usdt_provider, const char *usdt_name,
> > +                      const struct bpf_usdt_opts *opts);
> > +

[...]

> > +struct usdt_manager {
> > +     struct bpf_map *specs_map;
> > +     struct bpf_map *ip_to_id_map;
> > +
> > +     bool has_bpf_cookie;
> > +     bool has_sema_refcnt;
> > +};
> > +
> > +struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> > +{
> > +     static const char *ref_ctr_sysfs_path = "/sys/bus/event_source/devices/uprobe/format/ref_ctr_offset";
>
> probably deserves a #define, and that would get us under the 100 char
> limit too..

If you look at a few other places around kprobe and uprobe, I
consciously don't do that. A #define for a string constant that is used
only once just makes reading code harder, as you have to jump around
more to figure out the exact file path (especially when you are trying
to follow the steps in the shell). So I'd rather keep it as is.

>
> > +     struct usdt_manager *man;
> > +     struct bpf_map *specs_map, *ip_to_id_map;
> > +
> > +     specs_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs");
> > +     ip_to_id_map = bpf_object__find_map_by_name(obj, "__bpf_usdt_specs_ip_to_id");
> > +     if (!specs_map || !ip_to_id_map) {
> > +             pr_warn("usdt: failed to find USDT support BPF maps, did you forget to include bpf/usdt.bpf.h?\n");
>
> nice, I like the fact the error message also tells you how to fix it!
>
> > +             return NULL;
> > +     }
> > +

[...]

> > +struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct bpf_program *prog,
> > +                                       pid_t pid, const char *path,
> > +                                       const char *usdt_provider, const char *usdt_name,
> > +                                       long usdt_cookie)
> > +{
> > +     int i, fd, err;
> > +     LIBBPF_OPTS(bpf_uprobe_opts, opts);
> > +     struct bpf_link_usdt *link = NULL;
> > +     struct usdt_target *targets = NULL;
> > +     size_t target_cnt;
> > +     Elf *elf;
>
> Thought we should probably init elf to NULL, though I see we don't goto
> err_out except in cases where it's been explicitly set.

yep. Though some versions of GCC or Clang sometimes report a false
positive in similar cases, so I might as well init it.

>
> > +
> > +     if (bpf_program__fd(prog) < 0) {
> > +             pr_warn("prog '%s': can't attach BPF program w/o FD (did you load it?)\n",
>
> nit: might be no harm "w/o" to expand to "without", and prefix with usdt:
> as below..

it's the same check as in all other bpf_program__attach_xxx() APIs, so
I wanted to keep it consistent. But I just realized that I should
probably move it into bpf_program__attach_usdt() itself.

>
> > +                     bpf_program__name(prog));
> > +             return libbpf_err_ptr(-EINVAL);
> > +     }
> > +
> > +     /* TODO: perform path resolution similar to uprobe's */
> > +     fd = open(path, O_RDONLY);
> > +     if (fd < 0) {
> > +             err = -errno;
> > +             pr_warn("usdt: failed to open ELF binary '%s': %d\n", path, err);
> > +             return libbpf_err_ptr(err);
> > +     }
> > +
> > +     elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
> > +     if (!elf) {
> > +             err = -EBADF;
> > +             pr_warn("usdt: failed to parse ELF binary '%s': %s\n", path, elf_errmsg(-1));
> > +             goto err_out;
> > +     }
> > +
> > +     err = sanity_check_usdt_elf(elf, path);
> > +     if (err)
> > +             goto err_out;
> > +
> > +     /* normalize PID filter */
> > +     if (pid < 0)
> > +             pid = -1;
> > +     else if (pid == 0)
> > +             pid = getpid();
> > +
> > +     /* discover USDT in given binary, optionally limiting
> > +      * activations to a given PID, if pid > 0
> > +      */
> > +     err = collect_usdt_targets(man, elf, path, pid, usdt_provider, usdt_name,
> > +                                usdt_cookie, &targets, &target_cnt);
> > +     if (err <= 0) {
>
> we haven't filled out collect_usdt_targets() yet, but might be no harm to
> have a pr_debug() here "usdt: cannot collect USDT targets for ..." since
> there are a few cases without warnings in the later patch.

I'd have to special case -ENOENT, which would be messy. The reason
some returns don't have pr_warn() in collect_usdt_targets() is that I
deemed them extremely unlikely (usually it's due to corrupted ELF or
something along those lines). But I'll double check and add pr_warn
where appropriate.


>
> > +             err = (err == 0) ? -ENOENT : err;
> > +             goto err_out;
> > +     }
> > +

[...]


* Re: [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic
  2022-03-31 13:37   ` Alan Maguire
@ 2022-03-31 19:13     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 19:13 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 6:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > Implement architecture-agnostic parts of USDT parsing logic. The code is
> > the documentation in this case, it's futile to try to succinctly
> > describe how USDT parsing is done in any sort of concreteness. But
> > still, USDTs are recorded in special ELF notes section (.note.stapsdt),
> > where each USDT call site is described separately. Along with USDT
> > provider and USDT name, each such note contains USDT argument
> > specification, which uses assembly-like syntax to describe how to fetch
> > value of USDT argument. USDT arg spec could be just a constant, or
> > a register, or a register dereference (most common cases in x86_64), but
> > it technically can be much more complicated cases, like offset relative
> > to global symbol and stuff like that. One of the later patches will
> > implement most common subset of this for x86 and x86-64 architectures,
> > which seems to handle a lot of real-world production applications.
> >
> > USDT arg spec contains a compact encoding allowing usdt.bpf.h from
> > previous patch to handle the above 3 cases. Instead of recording which
> > register might be needed, we encode register's offset within struct
> > pt_regs to simplify BPF-side implementation. USDT argument can be of
> > different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
> > this, libbpf pre-calculates necessary bit shifts to do proper casting
> > and sign-extension in a short sequences of left and right shifts.
> >
> > The rest is in the code with sometimes extensive comments and references
> > to external "documentation" for USDTs.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> nothing major below, might be no harm to use a common header for
> some definitions for usdt.bpf.h and usdt.c..
>
> > ---
> >  tools/lib/bpf/usdt.c | 581 ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 580 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > index 8481e300598e..86d5d8390eb1 100644
> > --- a/tools/lib/bpf/usdt.c
> > +++ b/tools/lib/bpf/usdt.c
> > @@ -18,10 +18,56 @@
> >
> >  #define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
> >
>
> unused?

yep, seems like I just rely on opts.ref_ctr_offset for this

>
> > +#define USDT_BASE_SEC ".stapsdt.base"
> > +#define USDT_SEMA_SEC ".probes"
>
> unused?

indeed, libbpf will support semaphores defined in any section, not
just ".probes". But it feels good to have .stapsdt.base, .note.stapsdt
and .probes (all the ELF sections involved) defined as constants
here. So maybe I'll keep them as documentation?

>
> > +#define USDT_NOTE_SEC  ".note.stapsdt"
> > +#define USDT_NOTE_TYPE 3
> > +#define USDT_NOTE_NAME "stapsdt"
> > +
> > +/* should match exactly enum __bpf_usdt_arg_type from bpf_usdt.bpf.h */
> > +enum usdt_arg_type {
> > +     USDT_ARG_CONST,
> > +     USDT_ARG_REG,
> > +     USDT_ARG_REG_DEREF,
> > +};
> > +
> > +/* should match exactly struct __bpf_usdt_arg_spec from bpf_usdt.bpf.h */
> > +struct usdt_arg_spec {
> > +     __u64 val_off;
> > +     enum usdt_arg_type arg_type;
> > +     short reg_off;
> > +     bool arg_signed;
> > +     char arg_bitshift;
> > +};
> > +
> > +/* should match BPF_USDT_MAX_ARG_CNT in usdt.bpf.h */
> > +#define USDT_MAX_ARG_CNT 12
> > +
> > +/* should match struct __bpf_usdt_spec from usdt.bpf.h */
> > +struct usdt_spec {
> > +     struct usdt_arg_spec args[USDT_MAX_ARG_CNT];
> > +     __u64 usdt_cookie;
> > +     short arg_cnt;
> > +};
> > +
>
> Would it be worth having a usdt.h that both usdt.bpf.h and usdt.c could
> #include, containing the above definitions, avoiding need to sync?

that's how I started initially, but then decided that one extra header
just for two structs and an enum is a bit too much. It's quite unlikely
that they will change often or get out of sync, I think...

>
> > +struct usdt_note {
> > +     const char *provider;
> > +     const char *name;
> > +     /* USDT args specification string, e.g.:
> > +      * "-4@%esi -4@-24(%rbp) -4@%ecx 2@%ax 8@%rdx"
> > +      */

[...]

> > -     return -ENOTSUP;
> > +     size_t off, name_off, desc_off, seg_cnt = 0, lib_seg_cnt = 0, target_cnt = 0;
> > +     struct elf_seg *segs = NULL, *lib_segs = NULL;
> > +     struct usdt_target *targets = NULL, *target;
> > +     long base_addr = 0;
> > +     Elf_Scn *notes_scn, *base_scn;
> > +     GElf_Shdr base_shdr, notes_shdr;
> > +     GElf_Ehdr ehdr;
> > +     GElf_Nhdr nhdr;
> > +     Elf_Data *data;
> > +     int err;
> > +
> > +     *out_targets = NULL;
> > +     *out_target_cnt = 0;
> > +
> > +     err = find_elf_sec_by_name(elf, USDT_NOTE_SEC, &notes_shdr, &notes_scn);
> > +     if (err)
>
> since find_elf_sec_by_name() doesn't log anything, would be good to have a
> pr_warn("usdt: no " USDT_NOTE_SEC " section in '%s'", path);

ok

> > +             return err;
> > +
> > +     if (notes_shdr.sh_type != SHT_NOTE)
> > +             return -EINVAL;
> > +
> > +     if (!gelf_getehdr(elf, &ehdr))
> > +             return -EINVAL;
> > +
>
> the above two are unlikely, but could perhaps benefit from an error
> message like below..

yeah, super unlikely, but I can combine the two and log a warning

>
> > +     err = parse_elf_segs(elf, path, &segs, &seg_cnt);
> > +     if (err) {
> > +             pr_warn("usdt: failed to process ELF program segments for '%s': %d\n", path, err);
> > +             goto err_out;
> > +     }
> > +
> > +     /* .stapsdt.base ELF section is optional, but is used for prelink
> > +      * offset compensation (see a big comment further below)
> > +      */
> > +     if (find_elf_sec_by_name(elf, USDT_BASE_SEC, &base_shdr, &base_scn) == 0)
> > +             base_addr = base_shdr.sh_addr;
> > +
> > +     data = elf_getdata(notes_scn, 0);
> > +     off = 0;
> > +     while ((off = gelf_getnote(data, off, &nhdr, &name_off, &desc_off)) > 0) {
> > +             long usdt_abs_ip, usdt_rel_ip, usdt_sema_off = 0;
> > +             struct usdt_note note;
> > +             struct elf_seg *seg = NULL;
> > +             void *tmp;
> > +
> > +             err = parse_usdt_note(elf, path, base_addr, &nhdr,
> > +                                   data->d_buf, name_off, desc_off, &note);
> > +             if (err)
> > +                     goto err_out;
> > +
> > +             if (strcmp(note.provider, usdt_provider) != 0 || strcmp(note.name, usdt_name) != 0)
> > +                     continue;
> > +
> > +             /* We need to compensate for the "prelink effect". See [0] for details,
> > +              * relevant parts quoted here:
> > +              *
> > +              * Each SDT probe also expands into a non-allocated ELF note. You can
> > +              * find this by looking at SHT_NOTE sections and decoding the format;
> > +              * see below for details. Because the note is non-allocated, it means
> > +              * there is no runtime cost, and also preserved in both stripped files
> > +              * and .debug files.
> > +              *
> > +              * However, this means that prelink won't adjust the note's contents
> > +              * for address offsets. Instead, this is done via the .stapsdt.base
> > +              * section. This is a special section that is added to the text. We
> > +              * will only ever have one of these sections in a final link and it
> > +              * will only ever be one byte long. Nothing about this section itself
> > +              * matters, we just use it as a marker to detect prelink address
> > +              * adjustments.
> > +              *
> > +              * Each probe note records the link-time address of the .stapsdt.base
> > +              * section alongside the probe PC address. The decoder compares the
> > +              * base address stored in the note with the .stapsdt.base section's
> > +              * sh_addr. Initially these are the same, but the section header will
> > +              * be adjusted by prelink. So the decoder applies the difference to
> > +              * the probe PC address to get the correct prelinked PC address; the
> > +              * same adjustment is applied to the semaphore address, if any.
> > +              *
> > +              *   [0] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> > +              */
>
> ouch. nice explanation!
>
> > +             usdt_rel_ip = usdt_abs_ip = note.loc_addr;
> > +             if (base_addr) {
> > +                     usdt_abs_ip += base_addr - note.base_addr;
> > +                     usdt_rel_ip += base_addr - note.base_addr;
> > +             }
> > +
> > +             if (ehdr.e_type == ET_EXEC) {
>
> should we use a bool is_shared_library here; might simplify debug
> messaging below...

Heh, I actually started out with is_shared_library variable, but ended
up preferring more explicit ET_EXEC and ET_DYN constants instead.

>
> > +                     /* When attaching uprobes (which is what USDTs basically
> > +                      * are) the kernel expects a relative IP to be specified,
> > +                      * so if we are attaching to an executable ELF binary
> > +                      * (i.e., not a shared library), we need to calculate
> > +                      * proper relative IP based on ELF's load address
> > +                      */
> > +                     seg = find_elf_seg(segs, seg_cnt, usdt_abs_ip, false /* relative */);
> > +                     if (!seg) {
> > +                             err = -ESRCH;
> > +                             pr_warn("usdt: failed to find ELF program segment for '%s:%s' in '%s' at IP 0x%lx\n",
> > +                                     usdt_provider, usdt_name, path, usdt_abs_ip);
> > +                             goto err_out;
> > +                     }
> > +                     if (!seg->is_exec) {
> > +                             err = -ESRCH;
> > +                             pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx) for '%s:%s' at IP 0x%lx is not executable\n",
> > +                                     path, seg->start, seg->end, usdt_provider, usdt_name,
> > +                                     usdt_abs_ip);
> > +                             goto err_out;
> > +                     }
> > +
> > +                     usdt_rel_ip = usdt_abs_ip - (seg->start - seg->offset);
> > +             } else if (!man->has_bpf_cookie) { /* ehdr.e_type == ET_DYN */
> > +                     /* If we don't have BPF cookie support but need to
> > +                      * attach to a shared library, we'll need to know and
> > +                      * record absolute addresses of attach points due to
> > +                      * the need to lookup USDT spec by absolute IP of
> > +                      * triggered uprobe. Doing this resolution is only
> > +                      * possible when we have a specific PID of the process
> > +                      * that's using specified shared library. BPF cookie
> > +                      * removes the absolute address limitation as we don't
> > +                      * need to do this lookup (we just use BPF cookie as
> > +                      * an index of USDT spec), so for newer kernels with
> > +                      * BPF cookie support libbpf supports USDT attachment
> > +                      * to shared libraries with no PID filter.
> > +                      */
> > +                     if (pid < 0) {
> > +                             pr_warn("usdt: attaching to shared libraries without specific PID is not supported on current kernel\n");
> > +                             err = -ENOTSUP;
> > +                             goto err_out;
> > +                     }
> > +
> > +                     /* lib_segs are lazily initialized only if necessary */
> > +                     if (lib_seg_cnt == 0) {
> > +                             err = parse_lib_segs(pid, path, &lib_segs, &lib_seg_cnt);
> > +                             if (err) {
> > +                                     pr_warn("usdt: failed to get memory segments in PID %d for shared library '%s': %d\n",
> > +                                             pid, path, err);
> > +                                     goto err_out;
> > +                             }
> > +                     }
> > +
> > +                     seg = find_elf_seg(lib_segs, lib_seg_cnt, usdt_rel_ip, true /* relative */);
> > +                     if (!seg) {
> > +                             err = -ESRCH;
> > +                             pr_warn("usdt: failed to find shared lib memory segment for '%s:%s' in '%s' at relative IP 0x%lx\n",
> > +                                      usdt_provider, usdt_name, path, usdt_rel_ip);
> > +                             goto err_out;
> > +                     }
> > +
> > +                     usdt_abs_ip = seg->start + (usdt_rel_ip - seg->offset);
> > +             }
> > +
> > +             pr_debug("usdt: probe for '%s:%s' in %s '%s': addr 0x%lx base 0x%lx (resolved abs_ip 0x%lx rel_ip 0x%lx) args '%s' in segment [0x%lx, 0x%lx) at offset 0x%lx\n",
> > +                      usdt_provider, usdt_name, ehdr.e_type == ET_EXEC ? "exec" : "lib ", path,
> > +                      note.loc_addr, note.base_addr, usdt_abs_ip, usdt_rel_ip, note.args,
> > +                      seg ? seg->start : 0, seg ? seg->end : 0, seg ? seg->offset : 0);
> > +
> > +             /* Adjust semaphore address to be a relative offset */
> > +             if (note.sema_addr) {
> > +                     if (!man->has_sema_refcnt) {
> > +                             pr_warn("usdt: kernel doesn't support USDT semaphore refcounting for '%s:%s' in '%s'\n",
> > +                                     usdt_provider, usdt_name, path);
> > +                             err = -ENOTSUP;
> > +                             goto err_out;
> > +                     }
> > +
> > +                     seg = find_elf_seg(segs, seg_cnt, note.sema_addr, false /* relative */);
> > +                     if (!seg) {
> > +                             err = -ESRCH;
> > +                             pr_warn("usdt: failed to find ELF loadable segment with semaphore of '%s:%s' in '%s' at 0x%lx\n",
> > +                                     usdt_provider, usdt_name, path, note.sema_addr);
> > +                             goto err_out;
> > +                     }
> > +                     if (seg->is_exec) {
> > +                             err = -ESRCH;
> > +                             pr_warn("usdt: matched ELF binary '%s' segment [0x%lx, 0x%lx] for semaphore of '%s:%s' at 0x%lx is executable\n",
> > +                                     path, seg->start, seg->end, usdt_provider, usdt_name,
> > +                                     note.sema_addr);
> > +                             goto err_out;
> > +                     }
> > +
>
> could have a bool "exec" arg to find_elf_seg() which allows/disallows the
> segment to be executable I guess.

find_elf_seg() already has a boolean argument (relative); adding
another one just for this one place seems wrong. And I can provide a
better error message if I do the seg->is_exec check here. So I'm leaning
towards leaving it as is.
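For reference, the ET_EXEC/ET_DYN address translations in the code quoted above boil down to simple segment arithmetic; here's a standalone sketch (function names are illustrative, not libbpf's):

```c
#include <assert.h>

/* ET_EXEC: the kernel wants a file-relative IP for uprobe attachment,
 * so subtract the segment's load bias (virtual start minus file offset). */
static unsigned long abs_to_rel_ip(unsigned long seg_start, unsigned long seg_off,
				   unsigned long abs_ip)
{
	return abs_ip - (seg_start - seg_off);
}

/* ET_DYN without BPF cookie: recover the absolute runtime IP from the
 * library-relative IP using the process's memory mapping of that segment. */
static unsigned long rel_to_abs_ip(unsigned long seg_start, unsigned long seg_off,
				   unsigned long rel_ip)
{
	return seg_start + (rel_ip - seg_off);
}
```

The two are inverses of each other for the same segment mapping.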

>
> Alan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic
  2022-03-31 14:49   ` Alan Maguire
@ 2022-03-31 19:16     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 19:16 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 7:50 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > The last part of the architecture-agnostic user-space USDT handling logic
> > is to set up the BPF spec and, optionally, IP-to-ID maps from user-space.
> > usdt_manager performs a compact spec ID allocation to utilize
> > fixed-sized BPF maps as efficiently as possible. We also use a hashmap to
> > deduplicate USDT arg spec strings and map identical strings to a single
> > USDT spec, minimizing the necessary BPF map size. usdt_manager supports
> > arbitrary sequences of attachment and detachment, both of the same USDT
> > and multiple different USDTs and internally maintains a free list of
> > unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
> > teardown of this spec ID free list and supporting BPF maps.
> >
>
> It might be good to describe the relationship between a USDT specification
> (spec) and the site specific targets that can be associated with it.  So
> the spec is the description of the provider + name + args, and the
> target represents the potentially multiple sites associated with that
> spec.
>
> Specs are stored in the spec array map, indexed by spec_id; targets are
> stored in the ip_map, and these reference a spec id.  So from the BPF side
> we can use the bpf_cookie to look up the spec directly, or if cookies are
> not supported on the BPF side, we can look up ip -> spec_id mapping in
> ip_map, and from there can look up the spec_id -> spec in the spec map.
>

Correct, I'll incorporate that into comments I'm going to add in v2, thanks.

> Dumb question here: the spec id recycling is a lot of work;
> instead of maintaining this for the array map, couldn't we use a hashmap
> for spec ids with a monotonically-increasing next_spec_id value or
> something similar?

We could, but hashmap lookup is significantly slower than ARRAY
lookup, so I chose performance in this case. Maintaining the list of
IDs isn't that big of a deal.
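A toy model of that trade-off: the ARRAY map keeps lookups O(1), at the cost of maintaining a free list so detached spec IDs can be reused. Roughly (all names invented for illustration, with a tiny map size):

```c
#include <assert.h>

#define MAX_SPECS 4

/* Model of usdt_manager's spec ID recycling: IDs index a fixed-size
 * array map, and IDs released on detachment go onto a free list. */
static int free_ids[MAX_SPECS];
static int free_cnt;
static int next_id;

static int alloc_spec_id(void)
{
	if (free_cnt > 0)
		return free_ids[--free_cnt];	/* reuse a recycled ID first */
	if (next_id >= MAX_SPECS)
		return -1;			/* fixed-size map is full */
	return next_id++;
}

static void free_spec_id(int id)
{
	free_ids[free_cnt++] = id;
}
```

Attach/detach in arbitrary order keeps IDs dense, so the ARRAY map stays small and lookups stay cheap.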

>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> one suggestion below, but
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> > ---
> >  tools/lib/bpf/usdt.c | 167 ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 166 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > index 86d5d8390eb1..22f5f56992f8 100644
> > --- a/tools/lib/bpf/usdt.c
> > +++ b/tools/lib/bpf/usdt.c
>
> <snip>
>
> >               opts.ref_ctr_offset = target->sema_off;
> > +             opts.bpf_cookie = man->has_bpf_cookie ? spec_id : 0;
> >               uprobe_link = bpf_program__attach_uprobe_opts(prog, pid, path,
> >                                                             target->rel_ip, &opts);
> >               err = libbpf_get_error(link);
>
> should be uprobe_link I think.
>


Nice catch, will fix.


* Re: [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic
  2022-03-31 15:13   ` Alan Maguire
@ 2022-03-31 19:20     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 19:20 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 8:14 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > Add x86/x86_64-specific USDT argument specification parsing. Each
> > architecture will require its own logic, as all this is arch-specific
> > assembly-based notation. Architectures that libbpf doesn't support for
> > USDTs will pr_warn() with a specific error and return -ENOTSUP.
> >
> > We use sscanf() as a very powerful and easy-to-use string parser. The
> > spaces in sscanf's format string mean "skip any whitespace", which is
> > a pretty nifty (and somewhat little-known) feature.
> >
> > All this was tested on a little-endian architecture, so bit shifts are
> > probably off on big-endian, which our CI will hopefully catch.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> minor stuff below...
>
> > ---
> >  tools/lib/bpf/usdt.c | 105 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 105 insertions(+)
> >
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > index 22f5f56992f8..5cf809db60aa 100644
> > --- a/tools/lib/bpf/usdt.c
> > +++ b/tools/lib/bpf/usdt.c
> > @@ -1007,8 +1007,113 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
> >       return 0;
> >  }
> >
> > +/* Architecture-specific logic for parsing USDT argument location specs */
> > +
> > +#if defined(__x86_64__) || defined(__i386__)
> > +
> > +static int calc_pt_regs_off(const char *reg_name)
> > +{
> > +     static struct {
> > +             const char *names[4];
> > +             size_t pt_regs_off;
> > +     } reg_map[] = {
> > +#if __x86_64__
> > +#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg64)
> > +#else
> > +#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg32)
> > +#endif
> > +             { {"rip", "eip", "", ""}, reg_off(rip, eip) },
> > +             { {"rax", "eax", "ax", "al"}, reg_off(rax, eax) },
> > +             { {"rbx", "ebx", "bx", "bl"}, reg_off(rbx, ebx) },
> > +             { {"rcx", "ecx", "cx", "cl"}, reg_off(rcx, ecx) },
> > +             { {"rdx", "edx", "dx", "dl"}, reg_off(rdx, edx) },
> > +             { {"rsi", "esi", "si", "sil"}, reg_off(rsi, esi) },
> > +             { {"rdi", "edi", "di", "dil"}, reg_off(rdi, edi) },
> > +             { {"rbp", "ebp", "bp", "bpl"}, reg_off(rbp, ebp) },
> > +             { {"rsp", "esp", "sp", "spl"}, reg_off(rsp, esp) },
> > +#undef reg_off
> > +#if __x86_64__
> > +             { {"r8", "r8d", "r8w", "r8b"}, offsetof(struct pt_regs, r8) },
> > +             { {"r9", "r9d", "r9w", "r9b"}, offsetof(struct pt_regs, r9) },
> > +             { {"r10", "r10d", "r10w", "r10b"}, offsetof(struct pt_regs, r10) },
> > +             { {"r11", "r11d", "r11w", "r11b"}, offsetof(struct pt_regs, r11) },
> > +             { {"r12", "r12d", "r12w", "r12b"}, offsetof(struct pt_regs, r12) },
> > +             { {"r13", "r13d", "r13w", "r13b"}, offsetof(struct pt_regs, r13) },
> > +             { {"r14", "r14d", "r14w", "r14b"}, offsetof(struct pt_regs, r14) },
> > +             { {"r15", "r15d", "r15w", "r15b"}, offsetof(struct pt_regs, r15) },
> > +#endif
> > +     };
> > +     int i, j;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(reg_map); i++) {
> > +             for (j = 0; j < ARRAY_SIZE(reg_map[i].names); j++) {
> > +                     if (strcmp(reg_name, reg_map[i].names[j]) == 0)
> > +                             return reg_map[i].pt_regs_off;
> > +             }
> > +     }
> > +
> > +     pr_warn("usdt: unrecognized register '%s'\n", reg_name);
> > +     return -ENOENT;
> > +}
>
> this is a really neat approach! could we shrink the arch-dependent
> part even further to just the reg_map? i.e., instead of having
> parse_usdt_arg() inside the #ifdef __x86_64__/__i386__, only the
> reg_map would be, with an empty reg_map for an unsupported arch
> such that calc_pt_regs_off() does
>

That would reduce the flexibility and save only a few lines of
code. Different architectures might have their own quirks, and reg_map
might not fit all their needs. So I went for a more independent and
flexible approach, even if some loop has to be duplicated.

>         if (ARRAY_SIZE(reg_map) == 0) {
>                 pr_warn("usdt: libbpf doesn't support USDTs on current
> architecture\n");
>                 return -ENOTSUP;
>         }
>
> > +
> > +static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg)
> > +{
> > +     char *reg_name = NULL;
> > +     int arg_sz, len, reg_off;
> > +     long off;
> > +
>
> nit but it took me a moment to notice that you had examples in each
> clause; might be good to have a higher-level comment stating
>
> we support 3 forms of argument description:
>
> - register dereference "-4@-20(%rbp)"
> - register "-4@%eax"
> - constant "4@$71"
>
> I _think_ you mentioned there were other valid arg formats that we're not
> supporting; it would be good to be explicit about that here: "other
> formats are possible but we don't support them currently".

Yep, sure. Those examples in the comments below are indeed easy to miss.
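To make the three supported forms concrete, here's a rough standalone version of that sscanf dispatch (a fixed-size buffer stands in for the POSIX %m allocation modifier; purely illustrative, not libbpf's exact formats):

```c
#include <stdio.h>

enum arg_type { ARG_CONST, ARG_REG, ARG_REG_DEREF, ARG_UNKNOWN };

/* Spaces in the format strings match any run of whitespace, which is the
 * "nifty" sscanf feature mentioned in the commit message, so the specs
 * parse regardless of spacing around '@', '(' etc. */
static enum arg_type classify_usdt_arg(const char *s)
{
	char reg[16];
	long off;
	int sz;

	if (sscanf(s, " %d @ %ld ( %%%15[^)] )", &sz, &off, reg) == 3)
		return ARG_REG_DEREF;	/* e.g. "-4@-20(%rbp)" */
	if (sscanf(s, " %d @ %%%15s", &sz, reg) == 2)
		return ARG_REG;		/* e.g. "-4@%eax" */
	if (sscanf(s, " %d @ $ %ld", &sz, &off) == 2)
		return ARG_CONST;	/* e.g. "4@$71" */
	return ARG_UNKNOWN;
}
```

Order matters: the most specific pattern is tried first, and a partial match (fewer conversions than expected) simply falls through to the next form.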

>
> > +     if (3 == sscanf(arg_str, " %d @ %ld ( %%%m[^)] ) %n", &arg_sz, &off, &reg_name, &len)) {
> > +             /* -4@-20(%rbp) */
> > +             arg->arg_type = USDT_ARG_REG_DEREF;
> > +             arg->val_off = off;
> > +             reg_off = calc_pt_regs_off(reg_name);
> > +             free(reg_name);
> > +             if (reg_off < 0)
> > +                     return reg_off;
> > +             arg->reg_off = reg_off;

[...]


* Re: [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests
  2022-03-31 15:54   ` Alan Maguire
@ 2022-03-31 19:28     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 19:28 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 8:55 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > Add semaphore-based USDT to test_progs itself and write basic tests to
> > validate both auto-attachment and manual attachment logic, as well as
> > BPF-side functionality.
> >
> > Also add subtests to validate that libbpf properly deduplicates USDT
> > specs and handles spec overflow situations correctly, as well as proper
> > "rollback" of partially-attached multi-spec USDT.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> One compilation issue and minor nit below
>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
>
> > ---
> >  tools/testing/selftests/bpf/Makefile          |   1 +
> >  tools/testing/selftests/bpf/prog_tests/usdt.c | 314 ++++++++++++++++++
> >  tools/testing/selftests/bpf/progs/test_usdt.c | 115 +++++++
> >  3 files changed, 430 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/usdt.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_usdt.c
> >
> > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > index 3820608faf57..18e22def3bdb 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -400,6 +400,7 @@ $(TRUNNER_BPF_OBJS): $(TRUNNER_OUTPUT)/%.o:                               \
> >                    $(TRUNNER_BPF_PROGS_DIR)/*.h                       \
> >                    $$(INCLUDE_DIR)/vmlinux.h                          \
> >                    $(wildcard $(BPFDIR)/bpf_*.h)                      \
> > +                  $(wildcard $(BPFDIR)/*.bpf.h)                      \
> >                    | $(TRUNNER_OUTPUT) $$(BPFOBJ)
> >       $$(call $(TRUNNER_BPF_BUILD_RULE),$$<,$$@,                      \
> >                                         $(TRUNNER_BPF_CFLAGS))
> > diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> > new file mode 100644
> > index 000000000000..44a20d8c45d7
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> > @@ -0,0 +1,314 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > +#include <test_progs.h>
> > +
> > +#define _SDT_HAS_SEMAPHORES 1
> > +#include <sys/sdt.h>
> > +
>
> Do we need to bracket this with a feature test for sdt.h? I think I had
> something rough for this in
>
> https://lore.kernel.org/bpf/1642004329-23514-5-git-send-email-alan.maguire@oracle.com/
>
> might prevent selftest compilation failures if sdt.h isn't present,
> and IIRC that feature test is used in perf code.

Well, I was thinking of just specifying in the README that one needs to
have sys/sdt.h installed from the systemtap-sdt-devel package.
Alternatively, copy/pasting sdt.h locally and using it is also an
option; that header is quite well contained and has a permissive
license. The latter is less hassle for everyone, but someone might
have concerns about checking in an external header. So in v2 I'll go
with documenting the dependency on the systemtap-sdt-devel package,
unless people prefer sdt.h being checked in.

>
> I just realized I got confused on the cookie logic. There's really two
> levels of cookies:
>
> - at the API level, the USDT cookie is associated with the USDT
>   attachment, and can span multiple sites; but under the hood
> - the uprobe cookie is used to associate the uprobe point of attachment
>   with the associated spec id.  If BPF cookie retrieval isn't supported,
>   we fall back to using the instruction pointer -> spec id mapping.
>
> To get the usdt cookie in BPF prog context, we first look up the uprobe
> cookie to get the spec id, and then get the spec entry.

Yep, it's all cookies around :) Not sure how to make the distinction
cleaner, tbh.
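The two levels can be modeled in plain C roughly like this (a linear scan stands in for the BPF hashmap, and all names are illustrative; per the discussion above, the resolved spec_id then indexes the spec ARRAY map, whose entry carries the user-visible USDT cookie):

```c
#include <assert.h>
#include <stdbool.h>

/* Models the ip -> spec_id fallback table populated at attach time */
static const struct { long ip; int spec_id; } ip_map[] = {
	{ 0x1000, 3 },
	{ 0x2040, 7 },
};

static int lookup_spec_id(bool has_bpf_cookie, long bpf_cookie, long ip)
{
	unsigned int i;

	if (has_bpf_cookie)
		return (int)bpf_cookie;	/* fast path: uprobe cookie is the spec_id */

	/* fallback: resolve by the triggered uprobe's instruction pointer */
	for (i = 0; i < sizeof(ip_map) / sizeof(ip_map[0]); i++)
		if (ip_map[i].ip == ip)
			return ip_map[i].spec_id;
	return -1;	/* no spec found (-ESRCH in the real code) */
}
```

This is why the IP fallback needs a concrete PID for shared libraries: absolute runtime IPs are per-process, while the BPF cookie path has no such dependency.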

>
> I guess libbpf CI on older kernels will cover testing for the case where
> bpf cookies aren't supported and we need to do that ip -> spec id
> mapping? Perhaps we could have a test that #defines
> BPF_USDT_HAS_BPF_COOKIE to 0 to cover testing this on newer kernels?

Yes, you are right about CI, I plan to enable this test on 4.9 and 5.5
kernels we have in CI.

Just setting BPF_USDT_HAS_BPF_COOKIE to 0 won't work because the
user-space part does its own detection of BPF cookie support, and
doing it some other way is way too complicated for something that is
only needed by a selftest. But we'll get coverage for old kernels in CI,
so that's good news.

>
> > +#include "test_usdt.skel.h"
> > +#include "test_urandom_usdt.skel.h"
> > +
> > +int lets_test_this(int);
> > +
> > +static volatile int idx = 2;
> > +static volatile __u64 bla = 0xFEDCBA9876543210ULL;
> > +static volatile short nums[] = {-1, -2, -3, };
> > +

[...]

> > +/* we shouldn't be able to attach to test:usdt_300 USDT as we don't have as
> > + * many slots for specs. It's important that each STAP_PROBE1() invocation
> > + * (after unrolling) gets a different arg spec due to the compiler inlining
> > + * i as a constant
> > + */
> > +static void __always_inline f300(int x)
> > +{
> > +     STAP_PROBE1(test, usdt_300, x);
> > +}
> > +
> > +__weak void trigger_300_usdts(void)
> > +{
> > +     R100(f300, 0);
> > +     R100(f300, 100);
> > +     R100(f300, 200);
> > +}
> > +
> > +static void __always_inline f400(int /*unused*/ )
>
> ...caused a compilation error on gcc-9 for me:
>
>   TEST-OBJ [test_progs] usdt.test.o
> /home/alan/kbuild/bpf-next/tools/testing/selftests/bpf/prog_tests/usdt.c:
> In function ‘f400’:
> /home/alan/kbuild/bpf-next/tools/testing/selftests/bpf/prog_tests/usdt.c:191:34:
> error: parameter name omitted
>   191 | static void __always_inline f400(int /*unused*/ )
>       |                                  ^~~
> make: ***
> [/home/alan/kbuild/bpf-next/tools/testing/selftests/bpf/usdt.test.o] Error
> 1
>  ...but with
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c
> b/tools/testing/selftests/bpf/prog_tests/
> index b4c070b..5d382c8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -188,7 +188,7 @@ __weak void trigger_300_usdts(void)
>         R100(f300, 200);
>  }
>
> -static void __always_inline f400(int /*unused*/ )
> +static void __always_inline f400(int u /*unused*/ )
>  {
>         static int x;
>
>
>
> ...tests passed cleanly.

oh, cool, thanks for the report. I'll name the argument and add
__attribute__((unused)) to prevent other compilers from complaining

>
> > +{
> > +     static int x;
> > +
> > +     STAP_PROBE1(test, usdt_400, x++);
> > +}
> > +

[...]

> > +SEC("usdt//proc/self/exe:test:usdt_100")
> > +int BPF_USDT(usdt_100, int x)
> > +{
> > +     long tmp;
> > +
> > +     if (my_pid != (bpf_get_current_pid_tgid() >> 32))
> > +             return 0;
> > +
> > +     __sync_fetch_and_add(&usdt_100_called, 1);
> > +     __sync_fetch_and_add(&usdt_100_sum, x);
> > +
> > +     bpf_printk("X is %d, sum is %d", x, usdt_100_sum);
> > +
>
> debugging, needed?

oops, yep, leftovers, will clean up.

>
> > +     return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > --
> > 2.30.2
> >
> >


* Re: program local storage. Was: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-31 18:34   ` program local storage. Was: " Alexei Starovoitov
@ 2022-03-31 20:13     ` Andrii Nakryiko
  2022-04-01  0:38       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 20:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Thu, Mar 31, 2022 at 11:34 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 10:30 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > +
> > +struct __bpf_usdt_arg_spec {
> > +       __u64 val_off;
> > +       enum __bpf_usdt_arg_type arg_type;
> > +       short reg_off;
> > +       bool arg_signed;
> > +       char arg_bitshift;
> > +};
> > +
> > +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> > +#define BPF_USDT_MAX_ARG_CNT 12
> > +struct __bpf_usdt_spec {
> > +       struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> > +       __u64 usdt_cookie;
> > +       short arg_cnt;
> > +};
> > +
> > +__weak struct {
> > +       __uint(type, BPF_MAP_TYPE_ARRAY);
> > +       __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> > +       __type(key, int);
> > +       __type(value, struct __bpf_usdt_spec);
> > +} __bpf_usdt_specs SEC(".maps");
> > +
> > +__weak struct {
> > +       __uint(type, BPF_MAP_TYPE_HASH);
> > +       __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> > +       __type(key, long);
> > +       __type(value, struct __bpf_usdt_spec);
> > +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> ...
>
> > +
> > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> > + * Returns 0 on success; negative error, otherwise.
> > + * On error *res is guaranteed to be set to zero.
> > + */
> > +__hidden __weak
> > +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> > +{
> > +       struct __bpf_usdt_spec *spec;
> > +       struct __bpf_usdt_arg_spec *arg_spec;
> > +       unsigned long val;
> > +       int err, spec_id;
> > +
> > +       *res = 0;
> > +
> > +       spec_id = __bpf_usdt_spec_id(ctx);
> > +       if (spec_id < 0)
> > +               return -ESRCH;
> > +
> > +       spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > +       if (!spec)
> > +               return -ESRCH;
> > +
> > +       if (arg >= spec->arg_cnt)
> > +               return -ENOENT;
> > +
> > +       arg_spec = &spec->args[arg];
> > +       switch (arg_spec->arg_type) {
>
> Without bpf_cookie in the kernel each arg access is two lookups.
> With bpf_cookie it's a single lookup in an array that is fast.
> Multiply that cost by number of args.
> Not a huge cost, but we can do better long term.
>
> How about annotating bpf_cookie with PTR_TO_BTF_ID at prog load time.
> So that bpf_get_attach_cookie() returns PTR_TO_BTF_ID instead of long.
> This way bpf_get_attach_cookie() can return
> "struct __bpf_usdt_spec *".
>
> At attach time libbpf will provide populated 'struct __bpf_usdt_spec'
> to the kernel and the kernel will copy the struct's data
> in the bpf_link.
> At detach time that memory is freed.
>
> Advantages:
> - saves an array lookup at runtime
> - no need to provide size for __bpf_usdt_specs map.
>   That map is no longer needed.
>   users don't need to worry about maxing out BPF_USDT_MAX_SPEC_CNT.
> - libbpf doesn't need to populate __bpf_usdt_specs map
>   libbpf doesn't need to allocate spec_id-s.
>   libbpf will keep struct __bpf_usdt_spec per uprobe and
>   pass it to the kernel at attach time to store in bpf_link.
>
> "cookie as ptr_to_btf_id" is a generic mechanism to provide a
> blob of data to the bpf prog instead of a single "long".
> That blob can be read/write too.
> It can be used as per-program + per-attach point scratch area.
> Similar to task/inode local storage...
> That would be (prog, attach_point) local storage.
>
> Thoughts?

Well, I'm not concerned about ARRAY lookup, as it is inlined and very
fast. Sizing maps is hard and annoying, true, but I think we should
eventually just have resizable or dynamically-sized BPF maps, which
will be useful in a lot of other contexts.

We've had a discussion about a cookie that's bigger than 8 bytes with
Daniel. I argued for simplicity and I still like it. If you think we
should add blobs per attachment, it's fine, but let's keep it separate
from the BPF cookie.

As for the PTR_TO_BTF_ID, I'm a bit confused, as kernel doesn't know
__bpf_usdt_spec type, it's not part of vmlinux BTF, so you are
proposing to have PTR_TO_BTF_ID that points to user-provided type? I'm
not sure I see how exactly that will work from the verifier's
standpoint, tbh. At least I don't see how verifier can allow more than
just giving direct memory access to a memory buffer. But then each
uprobe attachment can have differently-sized blob, so statically
verifying that during program load time is impossible.

In any case, I don't think we should wait for any extra kernel
functionality to add USDT support. If we have some of those and they
bring noticeable benefits, we can opportunistically use them, if the
kernel is recent enough.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-31 18:49     ` Andrii Nakryiko
@ 2022-03-31 20:52       ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-03-31 20:52 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 11:49 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Mar 31, 2022 at 4:31 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
> >
> > > Add the BPF-side implementation of libbpf-provided USDT support. This
> > > consists of a single-header library, usdt.bpf.h, which is meant to be used
> > > from the user's BPF-side source code. This header is added to the list of
> > > installed libbpf headers, alongside bpf_helpers.h and others.
> > >
> >
> > <snip>
> >
> > Some suggestions below, but nothing major.
> >
> > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> >
> > > diff --git a/tools/lib/bpf/usdt.bpf.h b/tools/lib/bpf/usdt.bpf.h
> > > new file mode 100644
> > > index 000000000000..8ee084b2e6b5
> > > --- /dev/null
> > > +++ b/tools/lib/bpf/usdt.bpf.h
> > > @@ -0,0 +1,228 @@
> > > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > > +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
> > > +#ifndef __USDT_BPF_H__
> > > +#define __USDT_BPF_H__
> > > +
> > > +#include <linux/errno.h>
> > > +#include <bpf/bpf_helpers.h>
> > > +#include <bpf/bpf_tracing.h>
> > > +#include <bpf/bpf_core_read.h>
> > > +
> > > +/* Below types and maps are internal implementation details of libpf's USDT
> > > + * support and are subjects to change. Also, usdt_xxx() API helpers should be
> > > + * considered an unstable API as well and might be adjusted based on user
> > > + * feedback from using libbpf's USDT support in production.
> > > + */
> > > +
> > > +/* User can override BPF_USDT_MAX_SPEC_CNT to change default size of internal
> > > + * map that keeps track of USDT argument specifications. This might be
> > > + * necessary if there are a lot of USDT attachments.
> > > + */
> > > +#ifndef BPF_USDT_MAX_SPEC_CNT
> > > +#define BPF_USDT_MAX_SPEC_CNT 256
> > > +#endif
> > > +/* User can override BPF_USDT_MAX_IP_CNT to change default size of internal
> > > + * map that keeps track of IP (memory address) mapping to USDT argument
> > > + * specification.
> > > + * Note, if kernel supports BPF cookies, this map is not used and could be
> > > + * resized all the way to 1 to save a bit of memory.
> > > + */
> > > +#ifndef BPF_USDT_MAX_IP_CNT
> > > +#define BPF_USDT_MAX_IP_CNT 1024
> > > +#endif
> >
> > might be no harm to just make this default to a reasonable multiple of
> > BPF_USDT_MAX_SPEC_CNT; i.e. n specs X m possible sites. Would allow users
> > to simply override the MAX_SPEC_CNT in most cases too.
>
> It's not clear what the reasonable multiple is, it will differ for
> different binaries. I can do (4 * BPF_USDT_MAX_SPEC_CNT) to arrive at
> the same default 1024? Do you think that's reasonable?
>
> >
> > > +/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
> > > + * the only dependency on CO-RE, so if it's undesirable, user can override
> > > + * BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not.
> > > + */
> > > +#ifndef BPF_USDT_HAS_BPF_COOKIE
> > > +#define BPF_USDT_HAS_BPF_COOKIE \
> > > +     bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
> > > +#endif
> > > +
> > > +enum __bpf_usdt_arg_type {
> > > +     BPF_USDT_ARG_CONST,
> > > +     BPF_USDT_ARG_REG,
> > > +     BPF_USDT_ARG_REG_DEREF,
> > > +};
> > > +
> > > +struct __bpf_usdt_arg_spec {
> > > +     __u64 val_off;
> > > +     enum __bpf_usdt_arg_type arg_type;
> > > +     short reg_off;
> > > +     bool arg_signed;
> > > +     char arg_bitshift;
> >
> > would be no harm having a small comment here or below where the
> > bitshifting is done like "for arg sizes less than 8 bytes, this tells
> > us how many bits to shift to left then right to
> > remove the unused bits, giving correct arg value".
>
> sure, I'll add that comment that this is used for casting and
> potentially sign-extending arguments up to u64
>
> >
> > > +};
> > > +
> > > +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> > > +#define BPF_USDT_MAX_ARG_CNT 12
> > > +struct __bpf_usdt_spec {
> > > +     struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> > > +     __u64 usdt_cookie;
> > > +     short arg_cnt;
> > > +};
> > > +
> > > +__weak struct {
> > > +     __uint(type, BPF_MAP_TYPE_ARRAY);
> > > +     __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> > > +     __type(key, int);
> > > +     __type(value, struct __bpf_usdt_spec);
> > > +} __bpf_usdt_specs SEC(".maps");
> > > +
> > > +__weak struct {
> > > +     __uint(type, BPF_MAP_TYPE_HASH);
> > > +     __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> > > +     __type(key, long);
> > > +     __type(value, struct __bpf_usdt_spec);
> > > +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> > > +
> > > +/* don't rely on user's BPF code to have latest definition of bpf_func_id */
> > > +enum bpf_func_id___usdt {
> > > +     BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
> > > +};
> > > +
> > > +static inline int __bpf_usdt_spec_id(struct pt_regs *ctx)
> > > +{
> > > +     if (!BPF_USDT_HAS_BPF_COOKIE) {
> > > +             long ip = PT_REGS_IP(ctx);
> >
> > Trying to sort out the permutations of features, I _think_ it is possible
> > the user has CO-RE support, but the clang version doesn't support the
> > push of the preserve_access_index attribute? Would it be feasible to
> > do an explicit "PT_REGS_IP_CORE(ctx);" here?
>
>
> We don't normally rely on the _CORE variants when fetching values from
> the pt_regs context, so I didn't want to add more dependency on CO-RE
> here. The user can opt out of CO-RE entirely by redefining
> BPF_USDT_HAS_BPF_COOKIE; using PT_REGS_IP_CORE() here would make that
> harder. As for struct pt_regs, on some architectures it's part of
> UAPI, so it's very unlikely that existing fields are going to be moved
> around, so not using _CORE() should be fine, IMO.
>
>
> >
> > > +             int *spec_id_ptr;
> > > +
> > > +             spec_id_ptr = bpf_map_lookup_elem(&__bpf_usdt_specs_ip_to_id, &ip);
> > > +             return spec_id_ptr ? *spec_id_ptr : -ESRCH;
> > > +     }
> > > +
> > > +     return bpf_get_attach_cookie(ctx);
> >
> > should we grab the result in a u64 and handle the 0 case here -
> > meaning "not specified" - and return -ESRCH?
>
> But 0 is a perfectly fine spec ID, so why?
>
> >
> > > +}
> > > +
> > > +/* Return number of USDT arguments defined for currently traced USDT. */
> > > +__hidden __weak
> > > +int bpf_usdt_arg_cnt(struct pt_regs *ctx)
> > > +{
> > > +     struct __bpf_usdt_spec *spec;
> > > +     int spec_id;
> > > +
> > > +     spec_id = __bpf_usdt_spec_id(ctx);
> > > +     if (spec_id < 0)
> > > +             return -EINVAL;
> >
> > spec_id can be 0 for the "cookie not set" case (see above).
> >
> > should we pass through the error value from __bpf_usdt_spec_id()? Looking
> > above it's either -ESRCH or 0, but if we catch the 0 case as above we
> > could just pass through the error value.
> >
>
> See above, zero is a valid spec ID. So if the kernel supports cookies
> and bpf_get_attach_cookie() returns zero, that zero is a real value.
>
> > > +
> > > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > > +     if (!spec)
> > > +             return -EINVAL;
> > > +
> >
> > should this be -ESRCH? we know from the above we had a valid
> > spec_id.
>
> sure, I can change to -ESRCH, though it's more like a -EBUG :)
>
> >
> > > +     return spec->arg_cnt;
> > > +}
> >
> > also, since in every case (I think) that we call __bpf_usdt_spec_id()
> > we go on to look up the spec in the map, would it be easier to
> > combine both operations and have
> >
> > struct __bpf_usdt_spec * __bpf_usdt_spec(struct pt_regs *ctx);
> >
> > ?
>
> You are right, I think now we always get the spec itself. My earlier
> versions had an extra map for stuff like the USDT name, so having the
> spec ID separately made sense. I'll update the code to return the spec
> directly.
>

So I tried this locally, and it doesn't save any code and frankly
makes the code a bit more confusing and uglier. So I'll probably leave
it as is and just make sure all code paths return -ESRCH properly and
so on.

> >
> > > +
> > > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> > > + * Returns 0 on success; negative error, otherwise.
> > > + * On error *res is guaranteed to be set to zero.
> > > + */
> > > +__hidden __weak
> > > +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> > > +{
> > > +     struct __bpf_usdt_spec *spec;
> > > +     struct __bpf_usdt_arg_spec *arg_spec;
> > > +     unsigned long val;
> > > +     int err, spec_id;
> > > +
> > > +     *res = 0;
> > > +
> > > +     spec_id = __bpf_usdt_spec_id(ctx);
> > > +     if (spec_id < 0)
> > > +             return -ESRCH;
> > > +
> > > +     spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > > +     if (!spec)
> > > +             return -ESRCH;
> > > +
> > > +     if (arg >= spec->arg_cnt)
> > > +             return -ENOENT;
> > > +
> >
> > I'm surprised you didn't need to check for negative values or a hard
> > upper bound for the arg index here (to keep the verifier happy for
> > the later array indexing using arg). Any dangers that an older
> > LLVM+clang would generate code that might get tripped up on
> > verification with this?
>
> Great point. I think it's because in all the current code arg is a
> known constant, so the verifier just knows that everything is within
> bounds. I'll harden the code a bit and add a test that provides
> arg as a dynamic value.
>
> >
> > > +     arg_spec = &spec->args[arg];
> > > +     switch (arg_spec->arg_type) {
> > > +     case BPF_USDT_ARG_CONST:
> > > +             val = arg_spec->val_off;
> > > +             break;
>
> [...]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs
  2022-03-25  5:29 ` [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
@ 2022-03-31 22:13   ` Alan Maguire
  2022-04-01 16:59     ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alan Maguire @ 2022-03-31 22:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, kernel-team, Alan Maguire, Dave Marchevsky

On Fri, 25 Mar 2022, Andrii Nakryiko wrote:

> Extend urandom_read helper binary to include USDTs of 4 combinations:
> semaphore/semaphoreless (refcounted and non-refcounted) and based in
> executable or shared library. We also extend urandom_read with the ability
> to report its own PID to the parent process and wait for the parent to
> ready itself for tracing urandom_read. We utilize popen() and the
> underlying pipe properties for proper signaling.
> 
> Once urandom_read is ready, we add a few tests to validate that libbpf's
> USDT attachment handles all the above combinations of semaphore (or lack
> of it) and static or shared library USDTs. Also, we validate that libbpf
> handles shared libraries both with PID filter and without one (i.e., -1
> for PID argument).
> 
> Having the shared library case tested with and without PID is important
> because internal logic differs on kernels that don't support BPF
> cookies. On such older kernels, attaching to USDTs in shared libraries
> without specifying concrete PID doesn't work in principle, because it's
> impossible to determine shared library's load address to derive absolute
> IPs for uprobe attachments. Without absolute IPs, it's impossible to
> perform correct look up of USDT spec based on uprobe's absolute IP (the
> only kind available from BPF at runtime). This is not a problem on
> newer kernels with BPF cookies, as we don't need the IP-to-ID lookup:
> the BPF cookie value *is* the spec ID.
> 
> So having those two situations as separate subtests is good because
> libbpf CI is able to test latest selftests against old kernels (e.g.,
> 4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment
> for old kernels, but will still leave PID-specific one enabled to validate
> this legacy logic is working correctly.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>


haven't looked at this in depth yet, but hit a compilation error on 
aarch64:

  LIB      liburandom_read.so
/usr/bin/ld: /tmp/ccNy8cuv.o: relocation R_AARCH64_ADR_PREL_PG_HI21 
against symbol `urandlib_read_with_sema_semaphore' which may bind 
externally can not be used when making a shared object; recompile with 
-fPIC
/tmp/ccNy8cuv.o: In function `urandlib_read_with_sema':
/home/opc/src/bpf-next/tools/testing/selftests/bpf/urandom_read_lib1.c:12:(.text+0x10): 
dangerous relocation: unsupported relocation
collect2: error: ld returned 1 exit status
make: *** [Makefile:173: 
/home/opc/src/bpf-next/tools/testing/selftests/bpf/liburandom_read.so] 
Error 1

The following fixed it:

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 58da22c019a8..c89e2948276b 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -170,7 +170,7 @@ $(OUTPUT)/%:%.c
 
 $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
        $(call msg,LIB,,$@)
-       $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
+       $(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
 
 $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c 
$(OUTPUT)/liburandom_read.so
        $(call msg,BINARY,,$@)


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: program local storage. Was: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-03-31 20:13     ` Andrii Nakryiko
@ 2022-04-01  0:38       ` Alexei Starovoitov
  2022-04-01 16:56         ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2022-04-01  0:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Thu, Mar 31, 2022 at 1:13 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Mar 31, 2022 at 11:34 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 10:30 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > +
> > > +struct __bpf_usdt_arg_spec {
> > > +       __u64 val_off;
> > > +       enum __bpf_usdt_arg_type arg_type;
> > > +       short reg_off;
> > > +       bool arg_signed;
> > > +       char arg_bitshift;
> > > +};
> > > +
> > > +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> > > +#define BPF_USDT_MAX_ARG_CNT 12
> > > +struct __bpf_usdt_spec {
> > > +       struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> > > +       __u64 usdt_cookie;
> > > +       short arg_cnt;
> > > +};
> > > +
> > > +__weak struct {
> > > +       __uint(type, BPF_MAP_TYPE_ARRAY);
> > > +       __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> > > +       __type(key, int);
> > > +       __type(value, struct __bpf_usdt_spec);
> > > +} __bpf_usdt_specs SEC(".maps");
> > > +
> > > +__weak struct {
> > > +       __uint(type, BPF_MAP_TYPE_HASH);
> > > +       __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> > > +       __type(key, long);
> > > +       __type(value, struct __bpf_usdt_spec);
> > > +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> > ...
> >
> > > +
> > > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> > > + * Returns 0 on success; negative error, otherwise.
> > > + * On error *res is guaranteed to be set to zero.
> > > + */
> > > +__hidden __weak
> > > +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> > > +{
> > > +       struct __bpf_usdt_spec *spec;
> > > +       struct __bpf_usdt_arg_spec *arg_spec;
> > > +       unsigned long val;
> > > +       int err, spec_id;
> > > +
> > > +       *res = 0;
> > > +
> > > +       spec_id = __bpf_usdt_spec_id(ctx);
> > > +       if (spec_id < 0)
> > > +               return -ESRCH;
> > > +
> > > +       spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > > +       if (!spec)
> > > +               return -ESRCH;
> > > +
> > > +       if (arg >= spec->arg_cnt)
> > > +               return -ENOENT;
> > > +
> > > +       arg_spec = &spec->args[arg];
> > > +       switch (arg_spec->arg_type) {
> >
> > Without bpf_cookie in the kernel each arg access is two lookups.
> > With bpf_cookie it's a single lookup in an array that is fast.
> > Multiply that cost by number of args.
> > Not a huge cost, but we can do better long term.
> >
> > How about annotating bpf_cookie with PTR_TO_BTF_ID at prog load time.
> > So that bpf_get_attach_cookie() returns PTR_TO_BTF_ID instead of long.
> > This way bpf_get_attach_cookie() can return
> > "struct __bpf_usdt_spec *".
> >
> > At attach time libbpf will provide populated 'struct __bpf_usdt_spec'
> > to the kernel and the kernel will copy the struct's data
> > in the bpf_link.
> > At detach time that memory is freed.
> >
> > Advantages:
> > - saves an array lookup at runtime
> > - no need to provide size for __bpf_usdt_specs map.
> >   That map is no longer needed.
> >   users don't need to worry about maxing out BPF_USDT_MAX_SPEC_CNT.
> > - libbpf doesn't need to populate __bpf_usdt_specs map
> >   libbpf doesn't need to allocate spec_id-s.
> >   libbpf will keep struct __bpf_usdt_spec per uprobe and
> >   pass it to the kernel at attach time to store in bpf_link.
> >
> > "cookie as ptr_to_btf_id" is a generic mechanism to provide a
> > blob of data to the bpf prog instead of a single "long".
> > That blob can be read/write too.
> > It can be used as per-program + per-attach point scratch area.
> > Similar to task/inode local storage...
> > That would be (prog, attach_point) local storage.
> >
> > Thoughts?
>
> Well, I'm not concerned about ARRAY lookup, as it is inlined and very
> fast. Sizing maps is hard and annoying, true, but I think we should
> eventually just have resizable or dynamically-sized BPF maps, which
> will be useful in a lot of other contexts.

Yes. dynamically sized bpf maps would be great.
That's orthogonal.

> We've had a discussion about a cookie that's bigger than 8 bytes with
> Daniel. I argued for simplicity and I still like it. If you think we
> should add blobs per attachment, it's fine, but let's keep it separate
> from the BPF cookie.

Well, Daniel was right.
This USDT work is the first real use of bpf_cookie, and
it clearly demonstrates that a bpf_cookie alone, as an 8-byte long,
is not enough. The bpf progs still have to do a map lookup.
I bet the majority of bpf_cookie use cases will include a map lookup.
In the case of USDT we were able to get away with array lookup
which is cheap, but we won't be that lucky next time.
Hash lookup will be more costly and dynamically sized map
won't help the performance consideration.

It would be ok to keep ptr_to_btf_id separate from cookie only if
it won't sacrifice performance. The way cookie is shaping up
as part of bpf_run_ctx gives hope that they can stay separate.

> As for the PTR_TO_BTF_ID, I'm a bit confused, as kernel doesn't know
> __bpf_usdt_spec type, it's not part of vmlinux BTF, so you are
> proposing to have PTR_TO_BTF_ID that points to user-provided type?

Yes. It will be pointing to prog's BTF.

> I'm
> not sure I see how exactly that will work from the verifier's
> standpoint, tbh. At least I don't see how verifier can allow more than
> just giving direct memory access to a memory buffer.

It's a longer discussion, but user-provided BTF doesn't mean
that it should be limited to scalars only.
Such a struct can contain pointers too. Probably not on day one.
kptr and dynptr can and should eventually be allowed in users' BTFs.

> But then each
> uprobe attachment can have differently-sized blob, so statically
> verifying that during program load time is impossible.

In this USDT case the __bpf_usdt_spec is fixed size for all attach points.
One ptr_to_btf_id as a cookie per program is a minor limitation.
I don't see a need to support different ptr_to_btf_id-s
in different attach points.
USDT use case doesn't need it at least.

> In any case, I don't think we should wait for any extra kernel
> functionality to add USDT support. If we have some of those and they
> bring noticeable benefits, we can opportunistically use them, if the
> kernel is recent enough.

Of course! It's not a blocker for libbpf usdt feature.
That's why this discussion is a separate thread.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: program local storage. Was: [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support
  2022-04-01  0:38       ` Alexei Starovoitov
@ 2022-04-01 16:56         ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-04-01 16:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Alan Maguire, Dave Marchevsky

On Thu, Mar 31, 2022 at 5:38 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 31, 2022 at 1:13 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Mar 31, 2022 at 11:34 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Mar 24, 2022 at 10:30 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > +
> > > > +struct __bpf_usdt_arg_spec {
> > > > +       __u64 val_off;
> > > > +       enum __bpf_usdt_arg_type arg_type;
> > > > +       short reg_off;
> > > > +       bool arg_signed;
> > > > +       char arg_bitshift;
> > > > +};
> > > > +
> > > > +/* should match USDT_MAX_ARG_CNT in usdt.c exactly */
> > > > +#define BPF_USDT_MAX_ARG_CNT 12
> > > > +struct __bpf_usdt_spec {
> > > > +       struct __bpf_usdt_arg_spec args[BPF_USDT_MAX_ARG_CNT];
> > > > +       __u64 usdt_cookie;
> > > > +       short arg_cnt;
> > > > +};
> > > > +
> > > > +__weak struct {
> > > > +       __uint(type, BPF_MAP_TYPE_ARRAY);
> > > > +       __uint(max_entries, BPF_USDT_MAX_SPEC_CNT);
> > > > +       __type(key, int);
> > > > +       __type(value, struct __bpf_usdt_spec);
> > > > +} __bpf_usdt_specs SEC(".maps");
> > > > +
> > > > +__weak struct {
> > > > +       __uint(type, BPF_MAP_TYPE_HASH);
> > > > +       __uint(max_entries, BPF_USDT_MAX_IP_CNT);
> > > > +       __type(key, long);
> > > > +       __type(value, struct __bpf_usdt_spec);
> > > > +} __bpf_usdt_specs_ip_to_id SEC(".maps");
> > > ...
> > >
> > > > +
> > > > +/* Fetch USDT argument *arg* (zero-indexed) and put its value into *res.
> > > > + * Returns 0 on success; negative error, otherwise.
> > > > + * On error *res is guaranteed to be set to zero.
> > > > + */
> > > > +__hidden __weak
> > > > +int bpf_usdt_arg(struct pt_regs *ctx, int arg, long *res)
> > > > +{
> > > > +       struct __bpf_usdt_spec *spec;
> > > > +       struct __bpf_usdt_arg_spec *arg_spec;
> > > > +       unsigned long val;
> > > > +       int err, spec_id;
> > > > +
> > > > +       *res = 0;
> > > > +
> > > > +       spec_id = __bpf_usdt_spec_id(ctx);
> > > > +       if (spec_id < 0)
> > > > +               return -ESRCH;
> > > > +
> > > > +       spec = bpf_map_lookup_elem(&__bpf_usdt_specs, &spec_id);
> > > > +       if (!spec)
> > > > +               return -ESRCH;
> > > > +
> > > > +       if (arg >= spec->arg_cnt)
> > > > +               return -ENOENT;
> > > > +
> > > > +       arg_spec = &spec->args[arg];
> > > > +       switch (arg_spec->arg_type) {
> > >
> > > Without bpf_cookie in the kernel each arg access is two lookups.
> > > With bpf_cookie it's a single lookup in an array that is fast.
> > > Multiply that cost by number of args.
> > > Not a huge cost, but we can do better long term.
> > >
> > > How about annotating bpf_cookie with PTR_TO_BTF_ID at prog load time.
> > > So that bpf_get_attach_cookie() returns PTR_TO_BTF_ID instead of long.
> > > This way bpf_get_attach_cookie() can return
> > > "struct __bpf_usdt_spec *".
> > >
> > > At attach time libbpf will provide populated 'struct __bpf_usdt_spec'
> > > to the kernel and the kernel will copy the struct's data
> > > in the bpf_link.
> > > At detach time that memory is freed.
> > >
> > > Advantages:
> > > - saves an array lookup at runtime
> > > - no need to provide size for __bpf_usdt_specs map.
> > >   That map is no longer needed.
> > >   users don't need to worry about maxing out BPF_USDT_MAX_SPEC_CNT.
> > > - libbpf doesn't need to populate __bpf_usdt_specs map
> > >   libbpf doesn't need to allocate spec_id-s.
> > >   libbpf will keep struct __bpf_usdt_spec per uprobe and
> > >   pass it to the kernel at attach time to store in bpf_link.
> > >
> > > "cookie as ptr_to_btf_id" is a generic mechanism to provide a
> > > blob of data to the bpf prog instead of a single "long".
> > > That blob can be read/write too.
> > > It can be used as per-program + per-attach point scratch area.
> > > Similar to task/inode local storage...
> > > That would be (prog, attach_point) local storage.
> > >
> > > Thoughts?
> >
> > Well, I'm not concerned about ARRAY lookup, as it is inlined and very
> > fast. Sizing maps is hard and annoying, true, but I think we should
> > eventually just have resizable or dynamically-sized BPF maps, which
> > will be useful in a lot of other contexts.
>
> Yes. dynamically sized bpf maps would be great.
> That's orthogonal.
>
> > We've had a discussion about a cookie that's bigger than 8 bytes with
> > Daniel. I argued for simplicity and I still like it. If you think we
> > should add blobs per attachment, it's fine, but let's keep it separate
> > from the BPF cookie.
>
> Well, Daniel was right.
> This USDT work is the first real use of bpf_cookie, and
> it clearly demonstrates that a bpf_cookie alone, as an 8-byte long,
> is not enough. The bpf progs still have to do a map lookup.
> I bet the majority of bpf_cookie use cases will include a map lookup.

Well, yeah, absolutely; that's exactly how I argued when adding the
BPF cookie. A map lookup was the idea from the very beginning. It was
clear that a u64 by itself would be enough only for the most trivial
use cases; the goal was always to use the u64 as a lookup key into
whatever additional map (including a hashmap) or global var array is
necessary.

> In the case of USDT we were able to get away with array lookup
> which is cheap, but we won't be that lucky next time.

Retsnoop is the second real use case for the BPF cookie, and there I
just have a global var array. That works well too. I think you are
micro-optimizing around the map lookup in this case. Resizing/sizing
the map is the bigger concern (not even necessarily a real problem) in
practice, not the map lookup overhead.

> Hash lookup will be more costly and dynamically sized map
> won't help the performance consideration.

See above, I personally haven't been concerned with optimizing away
the hash map lookup. In my local benchmarking a hash lookup costs 14ns
vs 4ns for an ARRAY map lookup and 3ns for a global var array lookup.
Yes, a bit slower, but not a huge deal. Given that uprobe activation
takes on the order of 500ns, adding 14ns for a single hashmap lookup
seems OK to me.

But to be clear, I think it would be great to have this ability to
pre-allocate more storage per attachment. With the BPF cookie, though,
I wanted something simple to use from the BPF program side and simple
to set up from the user-space side. And I think this feature provides
that.

Whatever we do with variable-sized per-attach storage won't be as
straightforward. So it's worthwhile to consider it, but I still stand
by BPF cookie's approach, which overall is more flexible. E.g., I can
use the same cookie as a key into multiple independent maps, as
necessary. I can utilize all the fancy spinlocks, timers, dynptrs,
kptrs, etc inside map values without any extra verifier machinery and
extra user-space setup. All the building blocks are at my disposal.

But maybe what you want should be a special kind of local storage map
where key (bpf_link, presumably) will be implicitly taken from
bpf_run_ctx. Or something along those lines, so that one can utilize
all the generic map_value features.

>
> It would be ok to keep ptr_to_btf_id separate from cookie only if
> it won't sacrifice performance. The way cookie is shaping up
> as part of bpf_run_ctx gives hope that they can stay separate.

Yep, I think so.

>
> > As for the PTR_TO_BTF_ID, I'm a bit confused, as kernel doesn't know
> > __bpf_usdt_spec type, it's not part of vmlinux BTF, so you are
> > proposing to have PTR_TO_BTF_ID that points to user-provided type?
>
> Yes. It will be pointing to prog's BTF.
>
> > I'm
> > not sure I see how exactly that will work from the verifier's
> > standpoint, tbh. At least I don't see how verifier can allow more than
> > just giving direct memory access to a memory buffer.
>
> It's a longer discussion, but user provided BTF doesn't mean
> that it should be limited to scalars only.
> Such struct can contain pointers too. Not on day one probably.
> kptr and dynptr can be and should be allowed in user's BTFs eventually.

I see. It seems like this means the BTF ID will have to be provided
during program load, not attach, then? Assuming we have direct memory
access to the cookie contents.

Alternatively, I can see how we could use dynptr to expose this memory
to user-space, though probably without BTF info. In any case, I agree
it's a bigger and separate discussion.

>
> > But then each
> > uprobe attachment can have a differently-sized blob, so statically
> > verifying that during program load time is impossible.
>
> In this USDT case the __bpf_usdt_spec is fixed size for all attach points.
> One ptr_to_btf_id as a cookie per program is a minor limitation.
> I don't see a need to support different ptr_to_btf_id-s
> in different attach points.
> USDT use case doesn't need it at least.
>
> > In any case, I don't think we should wait for any extra kernel
> > functionality to add USDT support. If we have some of those and they
> > bring noticeable benefits, we can opportunistically use them, if the
> > kernel is recent enough.
>
> Of course! It's not a blocker for libbpf usdt feature.
> That's why this discussion is a separate thread.

Ok, wasn't sure, so I wanted to double check. I guess I missed that
this is a new thread. As I mentioned above, I think this feature
should still be separate from and complementary to BPF cookie
(especially since BPF cookie is UAPI anyway). And there are a bunch of
ways we can go about it, each with its own pros and cons. Sounds like
something that can be discussed at LSF/MM/BPF?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs
  2022-03-31 22:13   ` Alan Maguire
@ 2022-04-01 16:59     ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2022-04-01 16:59 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Dave Marchevsky

On Thu, Mar 31, 2022 at 3:14 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Fri, 25 Mar 2022, Andrii Nakryiko wrote:
>
> > Extend urandom_read helper binary to include USDTs of 4 combinations:
> > semaphore/semaphoreless (refcounted and non-refcounted) and based in
> > executable or shared library. We also extend urandom_read with the ability
> > to report its own PID to the parent process and wait for the parent
> > process to ready itself for tracing urandom_read. We utilize popen() and
> > underlying pipe properties for proper signaling.
> >
> > Once urandom_read is ready, we add a few tests to validate that libbpf's
> > USDT attachment handles all the above combinations of semaphore (or lack
> > of it) and static or shared library USDTs. Also, we validate that libbpf
> > handles shared libraries both with PID filter and without one (i.e., -1
> > for PID argument).
> >
> > Having the shared library case tested with and without PID is important
> > because internal logic differs on kernels that don't support BPF
> > cookies. On such older kernels, attaching to USDTs in shared libraries
> > without specifying concrete PID doesn't work in principle, because it's
> > impossible to determine shared library's load address to derive absolute
> > IPs for uprobe attachments. Without absolute IPs, it's impossible to
> > perform a correct lookup of the USDT spec based on the uprobe's absolute
> > IP (the only kind available from BPF at runtime). This is not a problem
> > on newer kernels with BPF cookie, as we don't need the IP-to-spec-ID
> > lookup because the BPF cookie value *is* the spec ID.
> >
> > So having those two situations as separate subtests is good because
> > libbpf CI is able to test latest selftests against old kernels (e.g.,
> > 4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment
> > for old kernels, but will still leave PID-specific one enabled to validate
> > this legacy logic is working correctly.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
>
> haven't looked at this in depth yet, but hit a compilation error on
> aarch64:
>
>   LIB      liburandom_read.so
> /usr/bin/ld: /tmp/ccNy8cuv.o: relocation R_AARCH64_ADR_PREL_PG_HI21
> against symbol `urandlib_read_with_sema_semaphore' which may bind
> externally can not be used when making a shared object; recompile with
> -fPIC
> /tmp/ccNy8cuv.o: In function `urandlib_read_with_sema':
> /home/opc/src/bpf-next/tools/testing/selftests/bpf/urandom_read_lib1.c:12:(.text+0x10):
> dangerous relocation: unsupported relocation
> collect2: error: ld returned 1 exit status
> make: *** [Makefile:173:
> /home/opc/src/bpf-next/tools/testing/selftests/bpf/liburandom_read.so]
> Error 1
>
> following did fix it:
>
> diff --git a/tools/testing/selftests/bpf/Makefile
> b/tools/testing/selftests/bpf/Makefile
> index 58da22c019a8..c89e2948276b 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -170,7 +170,7 @@ $(OUTPUT)/%:%.c
>
>  $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c
>         $(call msg,LIB,,$@)
> -       $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
> +       $(Q)$(CC) $(CFLAGS) -fPIC $(LDFLAGS) $^ $(LDLIBS) --shared -o $@
>

Yep, forgot about -fPIC, thanks. Curious that it still works on x86 just fine :)

BTW, if you are interested in the arm architecture, it would be great
to get some help setting up CI for ARM. We have s390x and x86_64, and
it's a great way to prevent lots of bugs. We just don't have anyone
actively supporting this for ARM.

>  $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c
> $(OUTPUT)/liburandom_read.so
>         $(call msg,BINARY,,$@)
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-04-01 16:59 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-25  5:29 [PATCH bpf-next 0/7] Add libbpf support for USDTs Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 1/7] libbpf: add BPF-side of USDT support Andrii Nakryiko
2022-03-30  3:10   ` Hengqi Chen
2022-03-30 15:22     ` Hengqi Chen
2022-03-31  5:44       ` Andrii Nakryiko
2022-03-30 15:36     ` Hengqi Chen
2022-03-31  5:48       ` Andrii Nakryiko
2022-03-31  5:44     ` Andrii Nakryiko
2022-03-31 11:30   ` Alan Maguire
2022-03-31 18:49     ` Andrii Nakryiko
2022-03-31 20:52       ` Andrii Nakryiko
2022-03-31 18:34   ` program local storage. Was: " Alexei Starovoitov
2022-03-31 20:13     ` Andrii Nakryiko
2022-04-01  0:38       ` Alexei Starovoitov
2022-04-01 16:56         ` Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 2/7] libbpf: wire up USDT API and bpf_link integration Andrii Nakryiko
2022-03-30  3:24   ` Hengqi Chen
2022-03-31  5:56     ` Andrii Nakryiko
2022-03-31 12:13   ` Alan Maguire
2022-03-31 19:02     ` Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 3/7] libbpf: add USDT notes parsing and resolution logic Andrii Nakryiko
2022-03-31 13:37   ` Alan Maguire
2022-03-31 19:13     ` Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 4/7] libbpf: wire up spec management and other arch-independent USDT logic Andrii Nakryiko
2022-03-31 14:49   ` Alan Maguire
2022-03-31 19:16     ` Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 5/7] libbpf: add x86-specific USDT arg spec parsing logic Andrii Nakryiko
2022-03-31 15:13   ` Alan Maguire
2022-03-31 19:20     ` Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 6/7] selftests/bpf: add basic USDT selftests Andrii Nakryiko
2022-03-31 15:54   ` Alan Maguire
2022-03-31 19:28     ` Andrii Nakryiko
2022-03-25  5:29 ` [PATCH bpf-next 7/7] selftests/bpf: add urandom_read shared lib and USDTs Andrii Nakryiko
2022-03-31 22:13   ` Alan Maguire
2022-04-01 16:59     ` Andrii Nakryiko
