bpf.vger.kernel.org archive mirror
* [RFC PATCH v2 00/13] Introduce BPF map tracing capability
@ 2021-09-29 23:58 Joe Burton
  2021-09-29 23:58 ` [RFC PATCH v2 01/13] bpf: Add machinery to register map tracing hooks Joe Burton
                   ` (13 more replies)
  0 siblings, 14 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

This patch introduces 'map tracing': the capability to execute a
tracing program after updating a map.

Map tracing enables upgrades of stateful programs with fewer race
conditions than otherwise possible. We use a tracing program to
imbue a map with copy-on-write semantics, then use an iterator to
perform a bulk copy of data in the map. After bulk copying concludes,
updates to that map automatically propagate via the tracing
program, avoiding a class of race conditions. This use case is
demonstrated in the new 'real_world_example' selftest.

Extend BPF_PROG_TYPE_TRACING with a new attach type, BPF_TRACE_MAP,
and allow linking these programs to arbitrary maps.

Extend the verifier to insert helper calls directly after
bpf_map_update_elem() and bpf_map_delete_elem() calls. The helpers have
exactly the same signature as the functions they trace, and simply pass
those arguments on to the list of tracing programs attached to the map.

struct bpf_map gets a single additional pointer, which points to a
structure containing lists of tracing programs. This pointer is
populated when the first tracing program is attached to a map, to
minimize memory overhead for the majority of maps which are not
traced. I use an atomic cmpxchg() to avoid the need for a mutex
when accessing the pointer. Once populated, the pointer is never
freed until the map itself is freed.

One open question is how to handle pointer-based map updates. For
example:
  int *x = bpf_map_lookup_elem(...);
  if (...) (*x)++;
  if (...) (*x)--;
We can't just call a helper function right after the
bpf_map_lookup_elem(), since the updates occur later on. We also can't
determine where the last modification to the pointer occurs, due to
branch instructions. I would therefore consider a pattern where we
'flush' pointers at the end of a BPF program:
  int *x = bpf_map_lookup_elem(...);
  ...
  /* Execute tracing programs for this cell in this map. */
  bpf_map_trace_pointer_update(x);
  return 0;
We can't necessarily do this in the verifier, since 'x' may no
longer be in a register or on the stack. Thus we might introduce a
helper to save pointers that should be flushed, then flush all
registered pointers at every exit point:
  int *x = bpf_map_lookup_elem(...);
  /*
   * Saves 'x' somewhere in kernel memory. Does nothing if no
   * corresponding tracing progs are attached to the map.
   */
  bpf_map_trace_register_pointer(x);
  ...
  /* flush all registered pointers */
  bpf_map_trace_pointer_update();
  return 0;
This should be easy to implement in the verifier.

In addition, we use the verifier to instrument certain map update
calls. This requires saving their arguments onto the stack, which means
that a program already using MAX_BPF_STACK bytes of stack could exceed
the limit. I don't know whether this causes problems in practice.

Linux Plumbers Conference slides on this topic:
https://linuxplumbersconf.org/event/11/contributions/942/attachments/814/1533/joe_burton_lpc_2021_slides.pdf

Joe Burton (13):
  bpf: Add machinery to register map tracing hooks
  bpf: Allow loading BPF_TRACE_MAP programs
  bpf: Add list of tracing programs to struct bpf_map
  bpf: Define a few bpf_link_ops for BPF_TRACE_MAP
  bpf: Enable creation of BPF_LINK_TYPE_MAP_TRACE
  bpf: Add APIs to invoke tracing programs
  bpf: Register BPF_MAP_TRACE_{UPDATE,DELETE}_ELEM hooks
  libbpf: Support BPF_TRACE_MAP
  bpf: Add infinite loop check on map tracers
  Add bpf_map_trace_{update,delete}_elem() helper functions
  bpf: verifier inserts map tracing helper call
  bpf: Add selftests for map tracing
  bpf: Add real world example for map tracing

 include/linux/bpf.h                           |  48 ++-
 include/linux/bpf_types.h                     |   1 +
 include/uapi/linux/bpf.h                      |  22 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/helpers.c                          |  29 ++
 kernel/bpf/map_trace.c                        | 406 ++++++++++++++++++
 kernel/bpf/syscall.c                          |   7 +
 kernel/bpf/verifier.c                         |  81 +++-
 tools/include/uapi/linux/bpf.h                |  22 +
 tools/lib/bpf/bpf.c                           |  13 +-
 tools/lib/bpf/bpf.h                           |   4 +-
 tools/lib/bpf/libbpf.c                        | 118 +++++
 tools/lib/bpf/libbpf.h                        |  11 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/bpf_map_trace.c  | 401 +++++++++++++++++
 .../bpf/progs/bpf_map_trace_delete_elem.c     |  49 +++
 .../selftests/bpf/progs/bpf_map_trace_loop0.c |  26 ++
 .../selftests/bpf/progs/bpf_map_trace_loop1.c |  43 ++
 .../progs/bpf_map_trace_real_world_common.h   | 125 ++++++
 .../bpf_map_trace_real_world_migration.c      |  96 +++++
 .../bpf/progs/bpf_map_trace_real_world_new.c  |   4 +
 .../bpf/progs/bpf_map_trace_real_world_old.c  |   5 +
 .../bpf/progs/bpf_map_trace_update_elem.c     |  51 +++
 .../selftests/bpf/verifier/map_trace.c        |  40 ++
 24 files changed, 1592 insertions(+), 12 deletions(-)
 create mode 100644 kernel/bpf/map_trace.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_delete_elem.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_loop0.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_loop1.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_migration.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_new.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_old.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_update_elem.c
 create mode 100644 tools/testing/selftests/bpf/verifier/map_trace.c

-- 
2.33.0.685.g46640cef36-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 01/13] bpf: Add machinery to register map tracing hooks
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
@ 2021-09-29 23:58 ` Joe Burton
  2021-09-29 23:58 ` [RFC PATCH v2 02/13] bpf: Allow loading BPF_TRACE_MAP programs Joe Burton
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Maps may be traced in two ways (so far): after a key-only operation
such as a deletion, and after a key-value operation such as an update.
Each is identified by a distinct tracing type.

In order to reject invalid map tracing programs at load time, we
export a function for each tracing type. Programs reference that
function by name or BTF ID at load time. The verifier then checks that
the traced function is registered for map tracing and that the
program's arguments match.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 include/linux/bpf.h      | 12 ++++++++++++
 include/uapi/linux/bpf.h |  9 +++++++++
 kernel/bpf/Makefile      |  1 +
 kernel/bpf/map_trace.c   | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+)
 create mode 100644 kernel/bpf/map_trace.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 19735d59230a..dad62d5571c9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1510,6 +1510,17 @@ struct bpf_iter_reg {
 	const struct bpf_iter_seq_info *seq_info;
 };
 
+#define BPF_MAP_TRACE_FUNC_SYM(trace_type) bpf_map_trace__ ## trace_type
+#define DEFINE_BPF_MAP_TRACE_FUNC(trace_type, args...)	\
+	extern int BPF_MAP_TRACE_FUNC_SYM(trace_type)(args);	\
+	int __init BPF_MAP_TRACE_FUNC_SYM(trace_type)(args)	\
+	{ return 0; }
+
+struct bpf_map_trace_reg {
+	const char *target;
+	enum bpf_map_trace_type trace_type;
+};
+
 struct bpf_iter_meta {
 	__bpf_md_ptr(struct seq_file *, seq);
 	u64 session_id;
@@ -1528,6 +1539,7 @@ void bpf_iter_unreg_target(const struct bpf_iter_reg *reg_info);
 bool bpf_iter_prog_supported(struct bpf_prog *prog);
 const struct bpf_func_proto *
 bpf_iter_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
+int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info);
 int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_prog *prog);
 int bpf_iter_new_fd(struct bpf_link *link);
 bool bpf_link_is_iter(struct bpf_link *link);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6fc59d61937a..17e8f4113369 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -93,6 +93,15 @@ union bpf_iter_link_info {
 	} map;
 };
 
+enum bpf_map_trace_type {
+	BPF_MAP_TRACE_UPDATE_ELEM = 0,
+	BPF_MAP_TRACE_DELETE_ELEM = 1,
+
+	MAX_BPF_MAP_TRACE_TYPE,
+};
+
+#define BPF_MAP_TRACE_FUNC(trace_type) "bpf_map_trace__" #trace_type
+
 /* BPF syscall commands, see bpf(2) man-page for more details. */
 /**
  * DOC: eBPF Syscall Preamble
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7f33098ca63f..34eab32e0d9d 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
 obj-${CONFIG_BPF_LSM} += bpf_lsm.o
 endif
 obj-$(CONFIG_BPF_PRELOAD) += preload/
+obj-$(CONFIG_BPF_SYSCALL) += map_trace.o
diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
new file mode 100644
index 000000000000..d8f829535f7e
--- /dev/null
+++ b/kernel/bpf/map_trace.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2021 Google */
+
+#include <linux/filter.h>
+#include <linux/bpf.h>
+
+struct bpf_map_trace_target_info {
+	struct list_head list;
+	const struct bpf_map_trace_reg *reg_info;
+	u32 btf_id;
+};
+
+static struct list_head targets = LIST_HEAD_INIT(targets);
+static DEFINE_MUTEX(targets_mutex);
+
+int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info)
+{
+	struct bpf_map_trace_target_info *tinfo;
+
+	tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL);
+	if (!tinfo)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&tinfo->list);
+	tinfo->reg_info = reg_info;
+	tinfo->btf_id = 0;
+
+	mutex_lock(&targets_mutex);
+	list_add(&tinfo->list, &targets);
+	mutex_unlock(&targets_mutex);
+
+	return 0;
+}
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 02/13] bpf: Allow loading BPF_TRACE_MAP programs
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
  2021-09-29 23:58 ` [RFC PATCH v2 01/13] bpf: Add machinery to register map tracing hooks Joe Burton
@ 2021-09-29 23:58 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 03/13] bpf: Add list of tracing programs to struct bpf_map Joe Burton
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Add a new attach type, BPF_TRACE_MAP. Tracing programs may specify
this type at load time. The verifier then checks that the traced
function has been registered for map tracing.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/map_trace.c   | 25 +++++++++++++++++++++++++
 kernel/bpf/verifier.c    |  6 ++++++
 4 files changed, 33 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index dad62d5571c9..272f0ac49285 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1539,6 +1539,7 @@ void bpf_iter_unreg_target(const struct bpf_iter_reg *reg_info);
 bool bpf_iter_prog_supported(struct bpf_prog *prog);
 const struct bpf_func_proto *
 bpf_iter_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
+bool bpf_map_trace_prog_supported(struct bpf_prog *prog);
 int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info);
 int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_prog *prog);
 int bpf_iter_new_fd(struct bpf_link *link);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 17e8f4113369..0883c5dfb5d8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1003,6 +1003,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
+	BPF_TRACE_MAP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index d8f829535f7e..d2c6df20f55c 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -31,3 +31,28 @@ int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info)
 
 	return 0;
 }
+
+bool bpf_map_trace_prog_supported(struct bpf_prog *prog)
+{
+	const char *attach_fname = prog->aux->attach_func_name;
+	u32 prog_btf_id = prog->aux->attach_btf_id;
+	struct bpf_map_trace_target_info *tinfo;
+	bool supported = false;
+
+	mutex_lock(&targets_mutex);
+	list_for_each_entry(tinfo, &targets, list) {
+		if (tinfo->btf_id && tinfo->btf_id == prog_btf_id) {
+			supported = true;
+			break;
+		}
+		if (!strcmp(attach_fname, tinfo->reg_info->target)) {
+			tinfo->btf_id = prog->aux->attach_btf_id;
+			supported = true;
+			break;
+		}
+	}
+	mutex_unlock(&targets_mutex);
+
+	return supported;
+}
+
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1433752db740..babcb135dc0d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9464,6 +9464,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 			break;
 		case BPF_TRACE_RAW_TP:
 		case BPF_MODIFY_RETURN:
+		case BPF_TRACE_MAP:
 			return 0;
 		case BPF_TRACE_ITER:
 			break;
@@ -13496,6 +13497,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 
 		break;
 	case BPF_TRACE_ITER:
+	case BPF_TRACE_MAP:
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
@@ -13672,6 +13674,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		if (!bpf_iter_prog_supported(prog))
 			return -EINVAL;
 		return 0;
+	} else if (prog->expected_attach_type == BPF_TRACE_MAP) {
+		if (!bpf_map_trace_prog_supported(prog))
+			return -EINVAL;
+		return 0;
 	}
 
 	if (prog->type == BPF_PROG_TYPE_LSM) {
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 03/13] bpf: Add list of tracing programs to struct bpf_map
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
  2021-09-29 23:58 ` [RFC PATCH v2 01/13] bpf: Add machinery to register map tracing hooks Joe Burton
  2021-09-29 23:58 ` [RFC PATCH v2 02/13] bpf: Allow loading BPF_TRACE_MAP programs Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP Joe Burton
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Add a set of tracing program lists, one per trace type, to struct
bpf_map. To minimize cost, this structure is allocated only when the
first program is attached to a map. It is installed with a cmpxchg()
to avoid adding another mutex to struct bpf_map.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 include/linux/bpf.h    | 18 +++++++++++++++++-
 kernel/bpf/map_trace.c | 21 +++++++++++++++++++++
 kernel/bpf/syscall.c   |  4 ++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 272f0ac49285..3ae12ab97720 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -43,6 +43,7 @@ struct kobject;
 struct mem_cgroup;
 struct module;
 struct bpf_func_state;
+struct bpf_map_trace_progs;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -180,10 +181,11 @@ struct bpf_map {
 	struct mem_cgroup *memcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
+	struct bpf_map_trace_progs *trace_progs;
 	u32 btf_vmlinux_value_type_id;
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 22 bytes hole */
+	/* 14 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
@@ -1510,12 +1512,26 @@ struct bpf_iter_reg {
 	const struct bpf_iter_seq_info *seq_info;
 };
 
+/* Maximum number of tracing programs that may be attached to one map */
+#define BPF_MAP_TRACE_MAX_PROGS 16
 #define BPF_MAP_TRACE_FUNC_SYM(trace_type) bpf_map_trace__ ## trace_type
 #define DEFINE_BPF_MAP_TRACE_FUNC(trace_type, args...)	\
 	extern int BPF_MAP_TRACE_FUNC_SYM(trace_type)(args);	\
 	int __init BPF_MAP_TRACE_FUNC_SYM(trace_type)(args)	\
 	{ return 0; }
 
+struct bpf_map_trace_prog {
+	struct list_head list;
+	struct bpf_prog *prog;
+	struct rcu_head rcu;
+};
+
+struct bpf_map_trace_progs {
+	struct bpf_map_trace_prog __rcu progs[MAX_BPF_MAP_TRACE_TYPE];
+	u32 length[MAX_BPF_MAP_TRACE_TYPE];
+	struct mutex mutex; /* protects writes to progs, length */
+};
+
 struct bpf_map_trace_reg {
 	const char *target;
 	enum bpf_map_trace_type trace_type;
diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index d2c6df20f55c..7776b8ccfe88 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -3,6 +3,7 @@
 
 #include <linux/filter.h>
 #include <linux/bpf.h>
+#include <linux/rcupdate.h>
 
 struct bpf_map_trace_target_info {
 	struct list_head list;
@@ -56,3 +57,23 @@ bool bpf_map_trace_prog_supported(struct bpf_prog *prog)
 	return supported;
 }
 
+int bpf_map_initialize_trace_progs(struct bpf_map *map)
+{
+	struct bpf_map_trace_progs *new_trace_progs;
+	int i;
+
+	if (!READ_ONCE(map->trace_progs)) {
+		new_trace_progs = kzalloc(sizeof(struct bpf_map_trace_progs),
+					  GFP_KERNEL);
+		if (!new_trace_progs)
+			return -ENOMEM;
+		mutex_init(&new_trace_progs->mutex);
+		for (i = 0; i < MAX_BPF_MAP_TRACE_TYPE; i++)
+			INIT_LIST_HEAD(&new_trace_progs->progs[i].list);
+		if (cmpxchg(&map->trace_progs, NULL, new_trace_progs))
+			kfree(new_trace_progs);
+	}
+
+	return 0;
+}
+
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e50c0bfdb7d..e6179755fd3b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -460,6 +460,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 
 	security_bpf_map_free(map);
 	bpf_map_release_memcg(map);
+	kfree(map->trace_progs);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 }
@@ -913,6 +914,9 @@ static int map_create(union bpf_attr *attr)
 		return err;
 	}
 
+	/* tracing programs lists are allocated when attached */
+	map->trace_progs = NULL;
+
 	return err;
 
 free_map_sec:
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (2 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 03/13] bpf: Add list of tracing programs to struct bpf_map Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-30  0:26   ` Eric Dumazet
  2021-09-29 23:59 ` [RFC PATCH v2 05/13] bpf: Enable creation of BPF_LINK_TYPE_MAP_TRACE Joe Burton
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Define the release, dealloc, and update_prog operations for the new
tracing links. Updates are protected by a single global mutex.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 kernel/bpf/map_trace.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index 7776b8ccfe88..35906d59ba3c 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -14,6 +14,14 @@ struct bpf_map_trace_target_info {
 static struct list_head targets = LIST_HEAD_INIT(targets);
 static DEFINE_MUTEX(targets_mutex);
 
+struct bpf_map_trace_link {
+	struct bpf_link link;
+	struct bpf_map *map;
+	struct bpf_map_trace_target_info *tinfo;
+};
+
+static DEFINE_MUTEX(link_mutex);
+
 int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info)
 {
 	struct bpf_map_trace_target_info *tinfo;
@@ -77,3 +85,66 @@ int bpf_map_initialize_trace_progs(struct bpf_map *map)
 	return 0;
 }
 
+static void bpf_map_trace_link_release(struct bpf_link *link)
+{
+	struct bpf_map_trace_link *map_trace_link =
+			container_of(link, struct bpf_map_trace_link, link);
+	enum bpf_map_trace_type trace_type =
+			map_trace_link->tinfo->reg_info->trace_type;
+	struct bpf_map_trace_prog *cur_prog;
+	struct bpf_map_trace_progs *progs;
+
+	progs = map_trace_link->map->trace_progs;
+	mutex_lock(&progs->mutex);
+	list_for_each_entry(cur_prog, &progs->progs[trace_type].list, list) {
+		if (cur_prog->prog == link->prog) {
+			progs->length[trace_type] -= 1;
+			list_del_rcu(&cur_prog->list);
+			kfree_rcu(cur_prog, rcu);
+		}
+	}
+	mutex_unlock(&progs->mutex);
+	bpf_map_put_with_uref(map_trace_link->map);
+}
+
+static void bpf_map_trace_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_map_trace_link *map_trace_link =
+			container_of(link, struct bpf_map_trace_link, link);
+
+	kfree(map_trace_link);
+}
+
+static int bpf_map_trace_link_replace(struct bpf_link *link,
+				      struct bpf_prog *new_prog,
+				      struct bpf_prog *old_prog)
+{
+	int ret = 0;
+
+	mutex_lock(&link_mutex);
+	if (old_prog && link->prog != old_prog) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	if (link->prog->type != new_prog->type ||
+	    link->prog->expected_attach_type != new_prog->expected_attach_type ||
+	    link->prog->aux->attach_btf_id != new_prog->aux->attach_btf_id) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	old_prog = xchg(&link->prog, new_prog);
+	bpf_prog_put(old_prog);
+
+out_unlock:
+	mutex_unlock(&link_mutex);
+	return ret;
+}
+
+static const struct bpf_link_ops bpf_map_trace_link_ops = {
+	.release = bpf_map_trace_link_release,
+	.dealloc = bpf_map_trace_link_dealloc,
+	.update_prog = bpf_map_trace_link_replace,
+};
+
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 05/13] bpf: Enable creation of BPF_LINK_TYPE_MAP_TRACE
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (3 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 06/13] bpf: Add APIs to invoke tracing programs Joe Burton
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Add new link type, BPF_LINK_TYPE_MAP_TRACE. This link attaches map
tracing programs to maps. At attachment time, the program's read-write
accesses are verified against the map's key/value size.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 include/linux/bpf.h       |   2 +
 include/linux/bpf_types.h |   1 +
 include/uapi/linux/bpf.h  |  12 ++++
 kernel/bpf/map_trace.c    | 147 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c      |   3 +
 5 files changed, 165 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3ae12ab97720..6f7aeeedca07 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1557,6 +1557,8 @@ const struct bpf_func_proto *
 bpf_iter_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 bool bpf_map_trace_prog_supported(struct bpf_prog *prog);
 int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info);
+int bpf_map_trace_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
+			      struct bpf_prog *prog);
 int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_prog *prog);
 int bpf_iter_new_fd(struct bpf_link *link);
 bool bpf_link_is_iter(struct bpf_link *link);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..074153968b00 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -139,3 +139,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
 #ifdef CONFIG_PERF_EVENTS
 BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
 #endif
+BPF_LINK_TYPE(BPF_LINK_TYPE_MAP_TRACE, map_trace)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0883c5dfb5d8..3d5d3dafc066 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -100,6 +100,11 @@ enum bpf_map_trace_type {
 	MAX_BPF_MAP_TRACE_TYPE,
 };
 
+struct bpf_map_trace_link_info {
+	__u32   map_fd;
+	enum bpf_map_trace_type trace_type;
+};
+
 #define BPF_MAP_TRACE_FUNC(trace_type) "bpf_map_trace__" #trace_type
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -1018,6 +1023,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
+	BPF_LINK_TYPE_MAP_TRACE = 8,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1465,6 +1471,12 @@ union bpf_attr {
 				 */
 				__u64		bpf_cookie;
 			} perf_event;
+			struct {
+				/* extra bpf_map_trace_link_info */
+				__aligned_u64	map_trace_info;
+				/* map_trace_info length */
+				__u32		map_trace_info_len;
+			};
 		};
 	} link_create;
 
diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index 35906d59ba3c..ed0cbc941522 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -148,3 +148,150 @@ static const struct bpf_link_ops bpf_map_trace_link_ops = {
 	.update_prog = bpf_map_trace_link_replace,
 };
 
+int bpf_map_attach_trace(struct bpf_prog *prog,
+			 struct bpf_map *map,
+			 struct bpf_map_trace_link_info *linfo)
+{
+	u32 key_acc_size, value_acc_size, key_size, value_size;
+	struct bpf_map_trace_progs *trace_progs;
+	struct bpf_map_trace_prog *trace_prog;
+	bool is_percpu = false;
+	int err;
+
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
+		is_percpu = true;
+	else if (map->map_type != BPF_MAP_TYPE_HASH &&
+		 map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+		 map->map_type != BPF_MAP_TYPE_ARRAY)
+		return -EINVAL;
+
+	key_acc_size = prog->aux->max_rdonly_access;
+	value_acc_size = prog->aux->max_rdwr_access;
+	key_size = map->key_size;
+	if (!is_percpu)
+		value_size = map->value_size;
+	else
+		value_size = round_up(map->value_size, 8) * num_possible_cpus();
+
+	if (key_acc_size > key_size || value_acc_size > value_size)
+		return -EACCES;
+
+	err = bpf_map_initialize_trace_progs(map);
+	if (err)
+		return err;
+
+	trace_prog = kmalloc(sizeof(*trace_prog), GFP_KERNEL);
+	if (!trace_prog)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&trace_prog->list);
+	trace_prog->prog = prog;
+
+	trace_progs = map->trace_progs;
+	mutex_lock(&trace_progs->mutex);
+	if (trace_progs->length[linfo->trace_type] >= BPF_MAP_TRACE_MAX_PROGS) {
+		err = -E2BIG;
+		kfree(trace_prog);
+	} else {
+		err = 0;
+		trace_progs->length[linfo->trace_type] += 1;
+		list_add_tail_rcu(&trace_prog->list,
+				  &trace_progs->progs[linfo->trace_type].list);
+	}
+	mutex_unlock(&trace_progs->mutex);
+
+	return err;
+}
+
+int bpf_map_trace_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
+			      struct bpf_prog *prog)
+{
+	struct bpf_map_trace_target_info *tinfo;
+	struct bpf_map_trace_link_info linfo;
+	struct bpf_link_primer link_primer;
+	struct bpf_map_trace_link *link;
+	u32 prog_btf_id, linfo_len;
+	bool existed = false;
+	struct bpf_map *map;
+	bpfptr_t ulinfo;
+	int err;
+
+	if (attr->link_create.target_fd || attr->link_create.flags)
+		return -EINVAL;
+
+	memset(&linfo, 0, sizeof(struct bpf_map_trace_link_info));
+
+	ulinfo = make_bpfptr(attr->link_create.map_trace_info,
+			     uattr.is_kernel);
+	linfo_len = attr->link_create.map_trace_info_len;
+	if (bpfptr_is_null(ulinfo) || !linfo_len)
+		return -EINVAL;
+
+	err = bpf_check_uarg_tail_zero(ulinfo, sizeof(linfo),
+				       linfo_len);
+	if (err)
+		return err;
+	linfo_len = min_t(u32, linfo_len, sizeof(linfo));
+	if (copy_from_bpfptr(&linfo, ulinfo, linfo_len))
+		return -EFAULT;
+
+	if (!linfo.map_fd)
+		return -EBADF;
+
+	prog_btf_id = prog->aux->attach_btf_id;
+	mutex_lock(&targets_mutex);
+	list_for_each_entry(tinfo, &targets, list) {
+		if (tinfo->btf_id == prog_btf_id) {
+			existed = true;
+			break;
+		}
+	}
+	mutex_unlock(&targets_mutex);
+	if (!existed)
+		return -ENOENT;
+
+	map = bpf_map_get_with_uref(linfo.map_fd);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	if (tinfo->reg_info->trace_type != linfo.trace_type) {
+		err = -EINVAL;
+		goto map_put;
+	}
+
+	link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
+	if (!link) {
+		err = -ENOMEM;
+		goto map_put;
+	}
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_MAP_TRACE,
+		      &bpf_map_trace_link_ops, prog);
+	link->tinfo = tinfo;
+	link->map = map;
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
+		kfree(link);
+		goto map_put;
+	}
+
+	err = bpf_map_attach_trace(prog, map, &linfo);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		goto map_put;
+	}
+
+	return bpf_link_settle(&link_primer);
+map_put:
+	bpf_map_put_with_uref(map);
+	return err;
+}
+
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e6179755fd3b..dd71853a858f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3136,6 +3136,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 	case BPF_CGROUP_SETSOCKOPT:
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
 	case BPF_TRACE_ITER:
+	case BPF_TRACE_MAP:
 		return BPF_PROG_TYPE_TRACING;
 	case BPF_SK_LOOKUP:
 		return BPF_PROG_TYPE_SK_LOOKUP;
@@ -4192,6 +4193,8 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
 
 	if (prog->expected_attach_type == BPF_TRACE_ITER)
 		return bpf_iter_link_attach(attr, uattr, prog);
+	else if (prog->expected_attach_type == BPF_TRACE_MAP)
+		return bpf_map_trace_link_attach(attr, uattr, prog);
 	else if (prog->type == BPF_PROG_TYPE_EXT)
 		return bpf_tracing_prog_attach(prog,
 					       attr->link_create.target_fd,
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 06/13] bpf: Add APIs to invoke tracing programs
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (4 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 05/13] bpf: Enable creation of BPF_LINK_TYPE_MAP_TRACE Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 07/13] bpf: Register BPF_MAP_TRACE_{UPDATE,DELETE}_ELEM hooks Joe Burton
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

The functions bpf_trace_map_{update,delete}_elem() provide a convenient
way to invoke tracing programs. Maps will register themselves with a
late_initcall() and then use these functions to invoke any tracing
programs attached to them.
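
Not part of the patch: the dispatch shape here (a per-map, per-trace-type
list of programs run after the traced operation, return codes ignored)
can be sketched in plain userspace C. All names below are invented for
illustration; only the structure mirrors bpf_map_trace_run_progs():

```c
#include <assert.h>
#include <stddef.h>

enum toy_trace_type { TOY_TRACE_UPDATE, TOY_TRACE_DELETE, TOY_TRACE_MAX };

typedef void (*toy_tracer_fn)(void *ctx);

#define TOY_MAX_TRACERS 4

struct toy_map {
	toy_tracer_fn tracers[TOY_TRACE_MAX][TOY_MAX_TRACERS];
	int cnt[TOY_TRACE_MAX];
};

/* Mirrors struct bpf_map_trace_ctx__update_elem from the patch. */
struct toy_update_ctx {
	int key;
	int value;
};

/* Analogous to bpf_map_trace_run_progs(): run every tracer registered
 * for this trace type; return codes are ignored, as in the patch. */
static void toy_run_tracers(struct toy_map *map, enum toy_trace_type t,
			    void *ctx)
{
	for (int i = 0; i < map->cnt[t]; i++)
		map->tracers[t][i](ctx);
}

/* Analogous to bpf_trace_map_update_elem(): build a ctx on the stack
 * and hand it to the tracers after the (elided) real map update. */
static void toy_update_elem(struct toy_map *map, int key, int value)
{
	struct toy_update_ctx ctx = { key, value };

	toy_run_tracers(map, TOY_TRACE_UPDATE, &ctx);
}

/* A tracer that records the last update it observed. */
static int last_key, last_value;
static void record_tracer(void *ctx)
{
	struct toy_update_ctx *u = ctx;

	last_key = u->key;
	last_value = u->value;
}
```

In the kernel the tracer list is RCU-protected and traversed with
migrate_disable() held; the toy model elides all of that.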

Signed-off-by: Joe Burton <jevburton@google.com>
---
 include/linux/bpf.h    | 13 +++++++++++++
 kernel/bpf/map_trace.c | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6f7aeeedca07..73f4524c1c29 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1532,6 +1532,16 @@ struct bpf_map_trace_progs {
 	struct mutex mutex; /* protects writes to progs, length */
 };
 
+struct bpf_map_trace_ctx__update_elem {
+	__bpf_md_ptr(void *, key);
+	__bpf_md_ptr(void *, value);
+	u64 flags;
+};
+
+struct bpf_map_trace_ctx__delete_elem {
+	__bpf_md_ptr(void *, key);
+};
+
 struct bpf_map_trace_reg {
 	const char *target;
 	enum bpf_map_trace_type trace_type;
@@ -1560,6 +1570,9 @@ int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info);
 int bpf_map_trace_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
 			      struct bpf_prog *prog);
 int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_prog *prog);
+void bpf_trace_map_update_elem(struct bpf_map *map, void *key, void *value,
+			       u64 flags);
+void bpf_trace_map_delete_elem(struct bpf_map *map, void *key);
 int bpf_iter_new_fd(struct bpf_link *link);
 bool bpf_link_is_iter(struct bpf_link *link);
 struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop);
diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index ed0cbc941522..77a55f8cd119 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -295,3 +295,43 @@ int bpf_map_trace_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
 	return err;
 }
 
+static void bpf_map_trace_run_progs(struct bpf_map *map,
+				    enum bpf_map_trace_type trace_type,
+				    void *ctx)
+{
+	struct bpf_map_trace_prog *prog;
+
+	if (!map->trace_progs)
+		return;
+
+	rcu_read_lock();
+	migrate_disable();
+	list_for_each_entry_rcu(prog,
+				&map->trace_progs->progs[trace_type].list,
+				list) {
+		bpf_prog_run(prog->prog, ctx);  /* return code is ignored */
+	}
+	migrate_enable();
+	rcu_read_unlock();
+}
+
+void bpf_trace_map_update_elem(struct bpf_map *map,
+			       void *key, void *value, u64 flags)
+{
+	struct bpf_map_trace_ctx__update_elem ctx;
+
+	ctx.key = key;
+	ctx.value = value;
+	ctx.flags = flags;
+	bpf_map_trace_run_progs(map, BPF_MAP_TRACE_UPDATE_ELEM, &ctx);
+}
+
+void bpf_trace_map_delete_elem(struct bpf_map *map,
+			       void *key)
+{
+	struct bpf_map_trace_ctx__delete_elem ctx;
+
+	ctx.key = key;
+	bpf_map_trace_run_progs(map, BPF_MAP_TRACE_DELETE_ELEM, &ctx);
+}
+
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 07/13] bpf: Register BPF_MAP_TRACE_{UPDATE,DELETE}_ELEM hooks
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (5 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 06/13] bpf: Add APIs to invoke tracing programs Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 08/13] libbpf: Support BPF_TRACE_MAP Joe Burton
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Tracing programs may now be loaded against these hooks. There is not
yet a way to invoke those programs.
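
To illustrate (not from the patch): the BTF attach targets registered
here are produced by the BPF_MAP_TRACE_FUNC() stringizing macro added
earlier in the series, so the target names can be checked at compile
time. The helper functions below are invented for the sketch:

```c
#include <assert.h>
#include <string.h>

/* Copy of the UAPI macro from the series: the attach target is the
 * literal prefix "bpf_map_trace__" glued to the stringized enumerator. */
#define BPF_MAP_TRACE_FUNC(trace_type) "bpf_map_trace__" #trace_type

/* The two targets registered by bpf_map_trace_init() resolve to the
 * following names. */
static const char *update_target(void)
{
	return BPF_MAP_TRACE_FUNC(BPF_MAP_TRACE_UPDATE_ELEM);
}

static const char *delete_target(void)
{
	return BPF_MAP_TRACE_FUNC(BPF_MAP_TRACE_DELETE_ELEM);
}
```

These are the same strings that libbpf later matches with its
"bpf_map_trace__BPF_MAP_TRACE_" prefix when resolving the BTF ID.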

Signed-off-by: Joe Burton <jevburton@google.com>
---
 kernel/bpf/map_trace.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index 77a55f8cd119..d7c52e197482 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -335,3 +335,26 @@ void bpf_trace_map_delete_elem(struct bpf_map *map,
 	bpf_map_trace_run_progs(map, BPF_MAP_TRACE_DELETE_ELEM, &ctx);
 }
 
+DEFINE_BPF_MAP_TRACE_FUNC(BPF_MAP_TRACE_UPDATE_ELEM, void *key, void *value, u64 flags);
+DEFINE_BPF_MAP_TRACE_FUNC(BPF_MAP_TRACE_DELETE_ELEM, void *key);
+
+static struct bpf_map_trace_reg map_trace_update_elem_reg_info = {
+	.target = BPF_MAP_TRACE_FUNC(BPF_MAP_TRACE_UPDATE_ELEM),
+	.trace_type = BPF_MAP_TRACE_UPDATE_ELEM,
+};
+static struct bpf_map_trace_reg map_trace_delete_elem_reg_info = {
+	.target = BPF_MAP_TRACE_FUNC(BPF_MAP_TRACE_DELETE_ELEM),
+	.trace_type = BPF_MAP_TRACE_DELETE_ELEM,
+};
+
+static int __init bpf_map_trace_init(void)
+{
+	int err = bpf_map_trace_reg_target(&map_trace_update_elem_reg_info);
+
+	if (err)
+		return err;
+	return bpf_map_trace_reg_target(&map_trace_delete_elem_reg_info);
+}
+
+late_initcall(bpf_map_trace_init);
+
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 08/13] libbpf: Support BPF_TRACE_MAP
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (6 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 07/13] bpf: Register BPF_MAP_TRACE_{UPDATE,DELETE}_ELEM hooks Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 09/13] bpf: Add infinite loop check on map tracers Joe Burton
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Programs may now be loaded with libbpf. Sections with the prefix
"map_trace/" will use the new attach type.
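
The attach_map_trace() function in this patch splits section names of
the form "map_trace/<map_name>/<trace_type>". A standalone sketch of
that split (helper name and signature invented, not libbpf API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "prefix/<map_name>/<trace_type>" into its map-name span and
 * trace-type string. Returns 0 on success, -1 on malformed input. */
static int split_map_trace_sec(const char *sec, const char **map_name,
			       size_t *map_len, const char **trace_name)
{
	const char *first = strchr(sec, '/');
	const char *second;

	if (!first || !first[1])
		return -1;
	second = strchr(first + 1, '/');
	if (!second || !second[1])
		return -1;
	*map_name = first + 1;
	*map_len = (size_t)(second - (first + 1));
	*trace_name = second + 1;
	return *map_len ? 0 : -1;
}
```

The patch itself does the same job with a hand-rolled character loop;
the strchr() form above is just a compact way to show the contract.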

Signed-off-by: Joe Burton <jevburton@google.com>
---
 tools/include/uapi/linux/bpf.h |  22 ++++++
 tools/lib/bpf/bpf.c            |  13 ++--
 tools/lib/bpf/bpf.h            |   4 +-
 tools/lib/bpf/libbpf.c         | 118 +++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h         |  11 +++
 tools/lib/bpf/libbpf.map       |   1 +
 6 files changed, 163 insertions(+), 6 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6fc59d61937a..3d5d3dafc066 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -93,6 +93,20 @@ union bpf_iter_link_info {
 	} map;
 };
 
+enum bpf_map_trace_type {
+	BPF_MAP_TRACE_UPDATE_ELEM = 0,
+	BPF_MAP_TRACE_DELETE_ELEM = 1,
+
+	MAX_BPF_MAP_TRACE_TYPE,
+};
+
+struct bpf_map_trace_link_info {
+	__u32   map_fd;
+	enum bpf_map_trace_type trace_type;
+};
+
+#define BPF_MAP_TRACE_FUNC(trace_type) "bpf_map_trace__" #trace_type
+
 /* BPF syscall commands, see bpf(2) man-page for more details. */
 /**
  * DOC: eBPF Syscall Preamble
@@ -994,6 +1008,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
+	BPF_TRACE_MAP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1008,6 +1023,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
+	BPF_LINK_TYPE_MAP_TRACE = 8,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1455,6 +1471,12 @@ union bpf_attr {
 				 */
 				__u64		bpf_cookie;
 			} perf_event;
+			struct {
+				/* extra bpf_map_trace_link_info */
+				__aligned_u64	map_trace_info;
+				/* map_trace_info length */
+				__u32		map_trace_info_len;
+			};
 		};
 	} link_create;
 
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 2401fad090c5..e18ddd362052 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -674,7 +674,7 @@ int bpf_link_create(int prog_fd, int target_fd,
 		    enum bpf_attach_type attach_type,
 		    const struct bpf_link_create_opts *opts)
 {
-	__u32 target_btf_id, iter_info_len;
+	__u32 target_btf_id, iter_info_len, map_trace_info_len;
 	union bpf_attr attr;
 	int fd;
 
@@ -682,13 +682,12 @@ int bpf_link_create(int prog_fd, int target_fd,
 		return libbpf_err(-EINVAL);
 
 	iter_info_len = OPTS_GET(opts, iter_info_len, 0);
+	map_trace_info_len = OPTS_GET(opts, map_trace_info_len, 0);
 	target_btf_id = OPTS_GET(opts, target_btf_id, 0);
 
 	/* validate we don't have unexpected combinations of non-zero fields */
-	if (iter_info_len || target_btf_id) {
-		if (iter_info_len && target_btf_id)
-			return libbpf_err(-EINVAL);
-		if (!OPTS_ZEROED(opts, target_btf_id))
+	if (iter_info_len || map_trace_info_len || target_btf_id) {
+		if ((iter_info_len || map_trace_info_len) && target_btf_id)
 			return libbpf_err(-EINVAL);
 	}
 
@@ -713,6 +712,10 @@ int bpf_link_create(int prog_fd, int target_fd,
 		if (!OPTS_ZEROED(opts, perf_event))
 			return libbpf_err(-EINVAL);
 		break;
+	case BPF_TRACE_MAP:
+		attr.link_create.map_trace_info = ptr_to_u64(OPTS_GET(opts, map_trace_info, (void *)0));
+		attr.link_create.map_trace_info_len = map_trace_info_len;
+		break;
 	default:
 		if (!OPTS_ZEROED(opts, flags))
 			return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6fffb3cdf39b..deb276ad489f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -183,8 +183,10 @@ struct bpf_link_create_opts {
 		} perf_event;
 	};
 	size_t :0;
+	struct bpf_map_trace_link_info *map_trace_info;
+	__u32 map_trace_info_len;
 };
-#define bpf_link_create_opts__last_field perf_event
+#define bpf_link_create_opts__last_field map_trace_info_len
 
 LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
 			       enum bpf_attach_type attach_type,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1c859b32968d..a82126c0b969 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7987,6 +7987,7 @@ static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cooki
 static struct bpf_link *attach_trace(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_lsm(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_iter(const struct bpf_program *prog, long cookie);
+static struct bpf_link *attach_map_trace(const struct bpf_program *prog, long cookie);
 
 static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("socket",		SOCKET_FILTER, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -8014,6 +8015,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("lsm/",			LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
 	SEC_DEF("lsm.s/",		LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
 	SEC_DEF("iter/",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF, attach_iter),
+	SEC_DEF("map_trace/",		TRACING, BPF_TRACE_MAP, SEC_ATTACH_BTF, attach_map_trace),
 	SEC_DEF("syscall",		SYSCALL, 0, SEC_SLEEPABLE),
 	SEC_DEF("xdp_devmap/",		XDP, BPF_XDP_DEVMAP, SEC_ATTACHABLE),
 	SEC_DEF("xdp_cpumap/",		XDP, BPF_XDP_CPUMAP, SEC_ATTACHABLE),
@@ -8311,6 +8313,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj,
 #define BTF_TRACE_PREFIX "btf_trace_"
 #define BTF_LSM_PREFIX "bpf_lsm_"
 #define BTF_ITER_PREFIX "bpf_iter_"
+#define BTF_MAP_TRACE_PREFIX "bpf_map_trace__BPF_MAP_TRACE_"
 #define BTF_MAX_NAME_SIZE 128
 
 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
@@ -8329,6 +8332,10 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
 		*prefix = BTF_ITER_PREFIX;
 		*kind = BTF_KIND_FUNC;
 		break;
+	case BPF_TRACE_MAP:
+		*prefix = BTF_MAP_TRACE_PREFIX;
+		*kind = BTF_KIND_FUNC;
+		break;
 	default:
 		*prefix = "";
 		*kind = BTF_KIND_FUNC;
@@ -8464,6 +8471,15 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac
 	__u32 attach_prog_fd = prog->attach_prog_fd;
 	int err = 0;
 
+	if (attach_type == BPF_TRACE_MAP) {
+		while (*attach_name && *attach_name != '/')
+			++attach_name;
+		if (!*attach_name || !*(++attach_name)) {
+			pr_warn("failed to parse trace type from ELF section name '%s'\n", prog->sec_name);
+			return -EINVAL;
+		}
+	}
+
 	/* BPF program's BTF ID */
 	if (attach_prog_fd) {
 		err = libbpf_find_prog_btf_id(attach_name, attach_prog_fd);
@@ -9951,6 +9967,108 @@ static struct bpf_link *attach_iter(const struct bpf_program *prog, long cookie)
 	return bpf_program__attach_iter(prog, NULL);
 }
 
+struct bpf_link *
+bpf_program__attach_map_trace(const struct bpf_program *prog,
+			      const struct bpf_map_trace_attach_opts *opts)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
+	char errmsg[STRERR_BUFSIZE];
+	struct bpf_link *link;
+	int prog_fd, link_fd;
+	__u32 target_fd = 0;
+
+	if (!OPTS_VALID(opts, bpf_map_trace_attach_opts))
+		return ERR_PTR(-EINVAL);
+
+	link_create_opts.map_trace_info = OPTS_GET(opts, link_info, (void *)0);
+	link_create_opts.map_trace_info_len = OPTS_GET(opts, link_info_len, 0);
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	link_fd = bpf_link_create(prog_fd, target_fd, BPF_TRACE_MAP,
+				  &link_create_opts);
+	if (link_fd < 0) {
+		link_fd = -errno;
+		free(link);
+		pr_warn("prog '%s': failed to attach to map: %s\n",
+			prog->name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+		return ERR_PTR(link_fd);
+	}
+	link->fd = link_fd;
+	return link;
+}
+
+static struct bpf_link *attach_map_trace(const struct bpf_program *prog,
+					 long cookie)
+{
+	struct bpf_map_trace_attach_opts map_trace_opts;
+	struct bpf_map_trace_link_info link_info;
+	enum bpf_map_trace_type trace_type;
+	const char *trace_name = prog->sec_name;
+	const char *map_name = prog->sec_name;
+	char *map_name_cstr;
+	size_t map_name_len;
+	struct bpf_map *map;
+	int slash_seen = 0;
+
+	/*
+	 * Map tracing sections are named like this:
+	 *   map_trace/map_name/trace_type
+	 * Assign trace_name and map_name to the beginning of their names.
+	 */
+	while (*map_name && *trace_name && slash_seen < 2) {
+		if (slash_seen < 1)
+			map_name++;
+		if (slash_seen < 2)
+			trace_name++;
+		if (*trace_name == '/')
+			slash_seen++;
+	}
+	if (*map_name)
+		++map_name;
+	if (*trace_name)
+		++trace_name;
+	if (!*map_name || !*trace_name || slash_seen < 2)
+		return ERR_PTR(-EINVAL);
+
+	if (!strcmp(trace_name, "UPDATE_ELEM"))
+		trace_type = BPF_MAP_TRACE_UPDATE_ELEM;
+	else if (!strcmp(trace_name, "DELETE_ELEM"))
+		trace_type = BPF_MAP_TRACE_DELETE_ELEM;
+	else
+		return ERR_PTR(-EINVAL);
+
+	map_name_len = (trace_name - map_name) - 1;
+	map_name_cstr = malloc(map_name_len + 1);
+	if (!map_name_cstr)
+		return ERR_PTR(-ENOMEM);
+	map_name_cstr[map_name_len] = 0;
+	strncpy(map_name_cstr, map_name, map_name_len);
+	map = bpf_object__find_map_by_name(prog->obj, map_name_cstr);
+	free(map_name_cstr);
+	if (!map)
+		return ERR_PTR(-EINVAL);
+
+	memset(&link_info, 0, sizeof(link_info));
+	link_info.map_fd = bpf_map__fd(map);
+	link_info.trace_type = trace_type;
+	memset(&map_trace_opts, 0, sizeof(map_trace_opts));
+	map_trace_opts.sz = sizeof(map_trace_opts);
+	map_trace_opts.link_info = &link_info;
+	map_trace_opts.link_info_len = sizeof(link_info);
+	return bpf_program__attach_map_trace(prog, &map_trace_opts);
+}
+
 struct bpf_link *bpf_program__attach(const struct bpf_program *prog)
 {
 	if (!prog->sec_def || !prog->sec_def->attach_fn)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e35490c54eb3..418b65918639 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -358,6 +358,17 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_iter(const struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts);
 
+struct bpf_map_trace_attach_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	struct bpf_map_trace_link_info *link_info;
+	__u32 link_info_len;
+};
+#define bpf_map_trace_attach_opts__last_field link_info_len
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_map_trace(const struct bpf_program *prog,
+			      const struct bpf_map_trace_attach_opts *opts);
+
 struct bpf_insn;
 
 /*
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 9e649cf9e771..84f728b6bd4b 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -384,6 +384,7 @@ LIBBPF_0.5.0 {
 		btf__load_module_btf;
 		btf__load_vmlinux_btf;
 		btf_dump__dump_type_data;
+		bpf_program__attach_map_trace;
 		libbpf_set_strict_mode;
 } LIBBPF_0.4.0;
 
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 09/13] bpf: Add infinite loop check on map tracers
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (7 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 08/13] libbpf: Support BPF_TRACE_MAP Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 10/13] Add bpf_map_trace_{update,delete}_elem() helper functions Joe Burton
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Prevent programs from being attached to a map if that attachment could
cause an infinite loop. A simple example: a program updates the same
map that it's tracing. A map update would cause the program to run,
which would cause another update. A more complex example: an update to
map M0 triggers tracer T0, T0 updates map M1, M1 is traced by tracer
T1, and T1 updates M0, closing the loop.

We prevent this situation by enforcing that all programs "reachable"
from a given map do not include the proposed tracer.
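
This reachability check can be modeled in userspace as a recursive walk
over a tiny attachment graph. Everything below is an illustrative toy
(names invented), not the kernel data structures:

```c
#include <assert.h>

#define TOY_MAX_REFS 4

/* Progs update maps; maps run tracer progs. */
struct toy_map;
struct toy_prog {
	struct toy_map *used_maps[TOY_MAX_REFS]; /* maps this prog updates */
	int used_map_cnt;
};
struct toy_map {
	struct toy_prog *tracers[TOY_MAX_REFS]; /* progs tracing this map */
	int tracer_cnt;
};

/* Nonzero if attaching "prog" as a tracer of "map" would loop, i.e.
 * "map" is reachable from "prog" via update -> trace edges. */
static int toy_would_loop(const struct toy_prog *prog,
			  const struct toy_map *map)
{
	int i, j;

	for (i = 0; i < prog->used_map_cnt; i++) {
		const struct toy_map *m = prog->used_maps[i];

		if (m == map)
			return 1;
		for (j = 0; j < m->tracer_cnt; j++)
			if (toy_would_loop(m->tracers[j], map))
				return 1;
	}
	return 0;
}
```

The kernel version additionally takes used_maps_mutex and RCU read
locks while walking, which the toy model omits.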

Signed-off-by: Joe Burton <jevburton@google.com>
---
 kernel/bpf/map_trace.c | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
index d7c52e197482..80ceda8b1e62 100644
--- a/kernel/bpf/map_trace.c
+++ b/kernel/bpf/map_trace.c
@@ -148,6 +148,48 @@ static const struct bpf_link_ops bpf_map_trace_link_ops = {
 	.update_prog = bpf_map_trace_link_replace,
 };
 
+/* Determine whether attaching "prog" to "map" would create an infinite loop.
+ * If "prog" updates "map", then running "prog" again on a map update would
+ * loop.
+ */
+static int bpf_map_trace_would_loop(struct bpf_prog *prog,
+				    struct bpf_map *map)
+{
+	struct bpf_map_trace_prog *item;
+	struct bpf_prog_aux *aux;
+	struct bpf_map *aux_map;
+	int i, j, err = 0;
+
+	aux = prog->aux;
+	if (!aux)
+		return 0;
+	mutex_lock(&aux->used_maps_mutex);
+	for (i = 0; i < aux->used_map_cnt && !err; i++) {
+		aux_map = aux->used_maps[i];
+		if (aux_map == map) {
+			err = -EINVAL;
+			break;
+		}
+		for (j = 0; j < MAX_BPF_MAP_TRACE_TYPE && !err; j++) {
+			if (!aux_map->trace_progs)
+				continue;
+			rcu_read_lock();
+			list_for_each_entry_rcu(item,
+						&aux_map->trace_progs->progs[j].list,
+						list) {
+				err = bpf_map_trace_would_loop(
+						item->prog, map);
+				if (err)
+					break;
+			}
+			rcu_read_unlock();
+		}
+	}
+	mutex_unlock(&aux->used_maps_mutex);
+	return err;
+}
+
 int bpf_map_attach_trace(struct bpf_prog *prog,
 			 struct bpf_map *map,
 			 struct bpf_map_trace_link_info *linfo)
@@ -180,6 +222,10 @@ int bpf_map_attach_trace(struct bpf_prog *prog,
 		goto put_map;
 	}
 
+	err = bpf_map_trace_would_loop(prog, map);
+	if (err)
+		goto put_map;
+
 	trace_prog = kmalloc(sizeof(*trace_prog), GFP_KERNEL);
 	if (!trace_prog) {
 		err = -ENOMEM;
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 10/13] Add bpf_map_trace_{update,delete}_elem() helper functions
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (8 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 09/13] bpf: Add infinite loop check on map tracers Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 11/13] bpf: verifier inserts map tracing helper call Joe Burton
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

These helpers invoke tracing programs attached to a given map.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/helpers.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 73f4524c1c29..847501a69012 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2148,6 +2148,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
+extern const struct bpf_func_proto bpf_map_trace_update_elem_proto;
+extern const struct bpf_func_proto bpf_map_trace_delete_elem_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1ffd469c217f..f6ebc519334a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1328,6 +1328,35 @@ void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+BPF_CALL_4(bpf_map_trace_update_elem, struct bpf_map *, map,
+	   void *, key, void *, value, u64, flags)
+{
+	bpf_trace_map_update_elem(map, key, value, flags);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_map_trace_update_elem_proto = {
+	.func		= bpf_map_trace_update_elem,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_KEY,
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_map_trace_delete_elem, struct bpf_map *, map, void *, key)
+{
+	bpf_trace_map_delete_elem(map, key);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_map_trace_delete_elem_proto = {
+	.func		= bpf_map_trace_delete_elem,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_KEY,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 11/13] bpf: verifier inserts map tracing helper call
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (9 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 10/13] Add bpf_map_trace_{update,delete}_elem() helper functions Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 12/13] bpf: Add selftests for map tracing Joe Burton
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

The verifier will automatically insert a map tracing helper call after
each bpf_map_{update,delete}_elem(). Registers are preserved
appropriately to make this transparent to applications.
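
The size of the inserted sequence follows directly from the patchlet
layout in the diff: nregs argument spills, the map-op call, nregs fills,
an R0 spill, the tracer call, and an R0 fill. As a sanity sketch (the
two helper functions below are invented for illustration):

```c
#include <assert.h>

/* Instructions emitted by get_map_tracing_patchlet() for a helper with
 * nregs argument registers:
 *   nregs spills + 1 call + nregs fills + 1 R0 spill + 1 call + 1 fill */
static int patchlet_len(int nregs)
{
	return 2 * nregs + 4;
}

/* Scratch space reserved below the old stack depth: one 8-byte slot per
 * saved argument register (the R0 spill reuses the first slot, which has
 * already been consumed by the fills). */
static int patchlet_extra_stack(int nregs)
{
	return nregs * 8;
}
```

With nregs = 4 for bpf_map_update_elem() the patchlet is 12 insns,
which fits the verifier's 16-entry insn_buf, and the stack grows by at
most 32 bytes (the max over update and delete).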

Signed-off-by: Joe Burton <jevburton@google.com>
---
 kernel/bpf/verifier.c | 75 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 70 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index babcb135dc0d..01a99571ecea 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12678,6 +12678,48 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int get_map_tracing_patchlet(
+		void *map_func,
+		void *map_trace_func,
+		const int nregs,
+		struct bpf_prog *prog,
+		struct bpf_insn *insn_buf,
+		int *extra_stack)
+{
+	const int stack_offset = -1 * (int16_t) prog->aux->stack_depth;
+	const int reg_size_bytes = 8;
+	int cnt = 0, i;
+
+	/* push args onto the stack so that we can invoke the tracer after */
+	for (i = 0; i < nregs; i++)
+		insn_buf[cnt++] = BPF_STX_MEM(
+				BPF_DW, BPF_REG_FP,
+				BPF_REG_1 + i,
+				stack_offset - (i + 1) * reg_size_bytes);
+
+	insn_buf[cnt++] = BPF_EMIT_CALL(map_func);
+
+	for (i = 0; i < nregs; i++)
+		insn_buf[cnt++] = BPF_LDX_MEM(
+				BPF_DW, BPF_REG_1 + i,
+				BPF_REG_FP,
+				stack_offset - (i + 1) * reg_size_bytes);
+
+	/* save return code from map update */
+	insn_buf[cnt++] = BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_0,
+				      stack_offset - reg_size_bytes);
+
+	/* invoke tracing helper */
+	insn_buf[cnt++] = BPF_EMIT_CALL(map_trace_func);
+
+	/* restore return code from map update */
+	insn_buf[cnt++] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_FP,
+				      stack_offset - reg_size_bytes);
+
+	*extra_stack = max_t(int, *extra_stack, nregs * reg_size_bytes);
+	return cnt;
+}
+
 /* Do various post-verification rewrites in a single program pass.
  * These rewrites simplify JIT and interpreter implementations.
  */
@@ -12694,7 +12736,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	struct bpf_insn insn_buf[16];
 	struct bpf_prog *new_prog;
 	struct bpf_map *map_ptr;
-	int i, ret, cnt, delta = 0;
+	int i, ret, cnt, delta = 0, extra_stack = 0;
 
 	for (i = 0; i < insn_cnt; i++, insn++) {
 		/* Make divide-by-zero exceptions impossible. */
@@ -12998,11 +13040,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 				insn->imm = BPF_CALL_IMM(ops->map_lookup_elem);
 				continue;
 			case BPF_FUNC_map_update_elem:
-				insn->imm = BPF_CALL_IMM(ops->map_update_elem);
-				continue;
+				cnt = get_map_tracing_patchlet(
+						ops->map_update_elem,
+						bpf_trace_map_update_elem,
+						/*nregs=*/4, prog, insn_buf,
+						&extra_stack);
+				if (cnt < 0)
+					return cnt;
+				goto patch_map_ops_tracing;
 			case BPF_FUNC_map_delete_elem:
-				insn->imm = BPF_CALL_IMM(ops->map_delete_elem);
-				continue;
+				cnt = get_map_tracing_patchlet(
+						ops->map_delete_elem,
+						bpf_trace_map_delete_elem,
+						/*nregs=*/2, prog, insn_buf,
+						&extra_stack);
+				if (cnt < 0)
+					return cnt;
+				goto patch_map_ops_tracing;
 			case BPF_FUNC_map_push_elem:
 				insn->imm = BPF_CALL_IMM(ops->map_push_elem);
 				continue;
@@ -13018,6 +13072,16 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			}
 
 			goto patch_call_imm;
+patch_map_ops_tracing:
+			new_prog = bpf_patch_insn_data(env, i + delta,
+						       insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta    += cnt - 1;
+			env->prog = prog = new_prog;
+			insn      = new_prog->insnsi + i + delta;
+			continue;
 		}
 
 		/* Implement bpf_jiffies64 inline. */
@@ -13092,6 +13156,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 	}
 
 	sort_kfunc_descs_by_imm(env->prog);
+	prog->aux->stack_depth += extra_stack;
 
 	return 0;
 }
-- 
2.33.0.685.g46640cef36-goog



* [RFC PATCH v2 12/13] bpf: Add selftests for map tracing
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (10 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 11/13] bpf: verifier inserts map tracing helper call Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-09-29 23:59 ` [RFC PATCH v2 13/13] bpf: Add real world example " Joe Burton
  2021-10-05  5:13 ` [RFC PATCH v2 00/13] Introduce BPF map tracing capability Alexei Starovoitov
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Add selftests for intended usage and infinite loop detection.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 .../selftests/bpf/prog_tests/bpf_map_trace.c  | 144 ++++++++++++++++++
 .../bpf/progs/bpf_map_trace_delete_elem.c     |  49 ++++++
 .../selftests/bpf/progs/bpf_map_trace_loop0.c |  26 ++++
 .../selftests/bpf/progs/bpf_map_trace_loop1.c |  43 ++++++
 .../bpf/progs/bpf_map_trace_update_elem.c     |  51 +++++++
 .../selftests/bpf/verifier/map_trace.c        |  40 +++++
 6 files changed, 353 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_delete_elem.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_loop0.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_loop1.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_update_elem.c
 create mode 100644 tools/testing/selftests/bpf/verifier/map_trace.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c b/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
new file mode 100644
index 000000000000..89bae9a83339
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include <test_progs.h>
+
+#include <assert.h>
+#include <asm/unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <linux/bpf.h>
+#include <sys/stat.h>
+#include <sys/mount.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "bpf_map_trace_delete_elem.skel.h"
+#include "bpf_map_trace_loop0.skel.h"
+#include "bpf_map_trace_loop1.skel.h"
+#include "bpf_map_trace_update_elem.skel.h"
+
+uint32_t collatz(uint32_t x)
+{
+	return x % 2 ? x * 3 + 1 : x / 2;
+}
+
+void update_elem__basic(void)
+{
+	const uint32_t tracer_value = collatz(0xdeadbeef);
+	struct bpf_map_trace_update_elem *skel;
+	const uint32_t tracer_key = 0x5;
+	uint32_t value;
+	int rc;
+
+	skel = bpf_map_trace_update_elem__open_and_load();
+	if (!ASSERT_NEQ(skel, NULL, "open/load skeleton failure"))
+		return;
+	rc = bpf_map_trace_update_elem__attach(skel);
+	if (!ASSERT_EQ(rc, 0, "attach skeleton failure")) {
+		fprintf(stderr, "Failed to attach skeleton: %d\n", errno);
+		goto out;
+	}
+
+	/* The kprobe will place (0x5, 0xdeadbeef) in its map. The tracer will
+	 * place (0x5, collatz(0xdeadbeef)) in its map. This map lookup will
+	 * trigger the kprobe.
+	 */
+	rc = bpf_map_lookup_elem(bpf_map__fd(skel->maps.tracer_map),
+				 &tracer_key, &value);
+	if (!ASSERT_EQ(rc, 0, "map lookup failure")) {
+		fprintf(stderr, "Failed to lookup tracer map: %s\n",
+			strerror(errno));
+		goto out;
+	}
+	if (!ASSERT_EQ(value, tracer_value, "map lookup mismatch"))
+		goto out;
+
+out:
+	bpf_map_trace_update_elem__destroy(skel);
+}
+
+void delete_elem__basic(void)
+{
+	const uint32_t tracer_key = collatz(0x5);
+	struct bpf_map_trace_delete_elem *skel;
+	uint32_t value = 0;
+	int rc;
+
+	skel = bpf_map_trace_delete_elem__open_and_load();
+	if (!ASSERT_NEQ(skel, NULL, "open/load skeleton failure"))
+		return;
+	rc = bpf_map_trace_delete_elem__attach(skel);
+	if (!ASSERT_EQ(rc, 0, "attach skeleton failure")) {
+		fprintf(stderr, "Failed to attach skeleton: %d\n", errno);
+		goto out;
+	}
+
+	/* The kprobe will delete (0x5) from its map. The tracer will
+	 * place (collatz(0x5), pid) in its map. This map lookup will trigger
+	 * the kprobe.
+	 */
+	rc = bpf_map_lookup_elem(bpf_map__fd(skel->maps.tracer_map),
+				 &tracer_key, &value);
+	if (!ASSERT_EQ(rc, 0, "map lookup failure")) {
+		fprintf(stderr, "Failed to lookup tracer map: %s\n",
+			strerror(errno));
+		goto out;
+	}
+	if (!ASSERT_EQ(value, getpid(), "map lookup mismatch"))
+		goto out;
+
+out:
+	bpf_map_trace_delete_elem__destroy(skel);
+}
+
+void infinite_loop__direct(void)
+{
+	struct bpf_map_trace_loop0 *skel;
+	struct bpf_link *tracer_link;
+
+	skel = bpf_map_trace_loop0__open_and_load();
+	if (!ASSERT_NEQ(skel, NULL, "open/load skeleton failure"))
+		goto out;
+	tracer_link = bpf_program__attach(skel->progs.tracer);
+	if (!ASSERT_ERR_PTR(tracer_link, "link creation success"))
+		goto out;
+
+out:
+	bpf_map_trace_loop0__destroy(skel);
+}
+
+void infinite_loop__indirect(void)
+{
+	struct bpf_map_trace_loop1 *skel;
+	struct bpf_link *tracer_link;
+
+	skel = bpf_map_trace_loop1__open_and_load();
+	if (!ASSERT_NEQ(skel, NULL, "open/load skeleton failure"))
+		return;
+	tracer_link = bpf_program__attach(skel->progs.tracer0);
+	if (!ASSERT_OK_PTR(tracer_link, "link creation failure"))
+		goto out;
+	tracer_link = bpf_program__attach(skel->progs.tracer1);
+	if (!ASSERT_ERR_PTR(tracer_link, "link creation success"))
+		goto out;
+
+out:
+	bpf_map_trace_loop1__destroy(skel);
+}
+
+void test_bpf_map_trace(void)
+{
+	if (test__start_subtest("update_elem__basic"))
+		update_elem__basic();
+	if (test__start_subtest("delete_elem__basic"))
+		delete_elem__basic();
+	if (test__start_subtest("infinite_loop__direct"))
+		infinite_loop__direct();
+	if (test__start_subtest("infinite_loop__indirect"))
+		infinite_loop__indirect();
+}
+
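[Editor's note: the update_elem__basic flow above can be modeled in plain userspace C. This is a hypothetical sketch, not the selftest: the arrays stand in for the BPF hash maps and trace_update() stands in for the kernel-invoked tracer program. It only illustrates the data flow this series adds: an update to the traced map fires a tracer that derives and stores a value in its own map.]

```c
#include <assert.h>
#include <stdint.h>

#define MAP_SIZE 64

/* Stand-ins for the two BPF hash maps in the selftest. */
static uint32_t traced_map[MAP_SIZE];
static uint32_t tracer_map[MAP_SIZE];

static uint32_t collatz(uint32_t x)
{
	return x % 2 ? x * 3 + 1 : x / 2;
}

/* Models the program attached via SEC("map_trace/traced_map/UPDATE_ELEM"). */
static void trace_update(uint32_t key, uint32_t value)
{
	tracer_map[key % MAP_SIZE] = collatz(value);
}

/* Models bpf_map_update_elem() on a traced map: the tracer runs right
 * after the update, as the verifier-inserted hook in this series would
 * arrange.
 */
static void traced_update(uint32_t key, uint32_t value)
{
	traced_map[key % MAP_SIZE] = value;
	trace_update(key, value);
}
```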
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_delete_elem.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_delete_elem.c
new file mode 100644
index 000000000000..4e47c13489ea
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_delete_elem.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+struct bpf_map_def SEC("maps") traced_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+struct bpf_map_def SEC("maps") tracer_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+SEC("kprobe/__sys_bpf")
+int traced(struct pt_regs *regs)
+{
+	uint32_t key = 0x5;
+
+	bpf_map_delete_elem(&traced_map, &key);
+	return 0;
+}
+
+uint32_t collatz(uint32_t x)
+{
+	return x % 2 ? x * 3 + 1 : x / 2;
+}
+
+SEC("map_trace/traced_map/DELETE_ELEM")
+int tracer(struct bpf_map_trace_ctx__delete_elem *ctx)
+{
+	uint32_t key = 0, val = 0;
+
+	if (bpf_probe_read(&key, sizeof(key), ctx->key))
+		return 1;
+	key = collatz(key);
+	val = (bpf_get_current_pid_tgid() >> 32);
+	bpf_map_update_elem(&tracer_map, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_loop0.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_loop0.c
new file mode 100644
index 000000000000..7205e8914380
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_loop0.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+struct bpf_map_def SEC("maps") traced_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+/* This traces traced_map and updates it, creating an (invalid) infinite loop.
+ */
+SEC("map_trace/traced_map/UPDATE_ELEM")
+int tracer(struct bpf_map_trace_ctx__update_elem *ctx)
+{
+	uint32_t key = 0, val = 0;
+
+	bpf_map_update_elem(&traced_map, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
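[Editor's note: loop0 must be rejected at attach time because its tracer re-enters the map it traces. The excerpt does not show how the series detects this; as a purely hypothetical illustration, direct loops like loop0 and indirect ones like loop1 can be caught with a depth-first search over "tracer attached to map i updates map j" edges:]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MAPS 8

/* updates[i][j] is true if a tracer attached to map i updates map j. */
static bool updates[MAX_MAPS][MAX_MAPS];

static bool reaches(int from, int to)
{
	if (from == to)
		return true;
	for (int next = 0; next < MAX_MAPS; next++)
		if (updates[from][next] && reaches(next, to))
			return true;
	return false;
}

/* Returns false (attach rejected) if tracing 'map' with a program that
 * updates 'target' would create a loop: loop0 is map == target, loop1 is
 * target already tracing back to map. Since rejected edges are never
 * inserted, the graph stays acyclic and reaches() terminates.
 */
static bool try_attach(int map, int target)
{
	if (reaches(target, map))
		return false;
	updates[map][target] = true;
	return true;
}
```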
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_loop1.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_loop1.c
new file mode 100644
index 000000000000..10e39f05c7c8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_loop1.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+struct bpf_map_def SEC("maps") map0 = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+struct bpf_map_def SEC("maps") map1 = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+SEC("map_trace/map0/UPDATE_ELEM")
+int tracer0(struct bpf_map_trace_ctx__update_elem *ctx)
+{
+	uint32_t key = 0, val = 0;
+
+	bpf_map_update_elem(&map1, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+/* Since this traces map1 and updates map0, it forms an infinite loop with
+ * tracer0.
+ */
+SEC("map_trace/map1/UPDATE_ELEM")
+int tracer1(struct bpf_map_trace_ctx__update_elem *ctx)
+{
+	uint32_t key = 0, val = 0;
+
+	bpf_map_update_elem(&map0, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_update_elem.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_update_elem.c
new file mode 100644
index 000000000000..35a6026a90f9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_update_elem.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+struct bpf_map_def SEC("maps") traced_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+struct bpf_map_def SEC("maps") tracer_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(uint32_t),
+	.value_size = sizeof(uint32_t),
+	.max_entries = 64,
+};
+
+SEC("kprobe/__sys_bpf")
+int traced(struct pt_regs *regs)
+{
+	uint32_t key = 0x5;
+	uint32_t val = 0xdeadbeef;
+
+	bpf_map_update_elem(&traced_map, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+uint32_t collatz(uint32_t x)
+{
+	return x % 2 ? x * 3 + 1 : x / 2;
+}
+
+SEC("map_trace/traced_map/UPDATE_ELEM")
+int tracer(struct bpf_map_trace_ctx__update_elem *ctx)
+{
+	uint32_t key = 0, val = 0;
+
+	if (bpf_probe_read(&key, sizeof(key), ctx->key))
+		return 1;
+	if (bpf_probe_read(&val, sizeof(val), ctx->value))
+		return 1;
+	val = collatz(val);
+	bpf_map_update_elem(&tracer_map, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
diff --git a/tools/testing/selftests/bpf/verifier/map_trace.c b/tools/testing/selftests/bpf/verifier/map_trace.c
new file mode 100644
index 000000000000..a48b6448454e
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/map_trace.c
@@ -0,0 +1,40 @@
+{
+	"map tracing: full stack is accepted",
+	.insns = {
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -512, 0),
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -512),
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		BPF_MOV64_REG(BPF_REG_3, BPF_REG_2),
+		BPF_MOV64_IMM(BPF_REG_4, 0),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			     BPF_FUNC_map_update_elem),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result_unpriv = ACCEPT,
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+},
+{
+	"map tracing: overfull stack is not accepted",
+	.insns = {
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -520, 0),
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -520),
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		BPF_MOV64_REG(BPF_REG_3, BPF_REG_2),
+		BPF_MOV64_IMM(BPF_REG_4, 0),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			     BPF_FUNC_map_update_elem),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "invalid write to stack R10 off=-520 size=8",
+	.errstr_unpriv = "invalid write to stack R10 off=-520 size=8",
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+},
-- 
2.33.0.685.g46640cef36-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 13/13] bpf: Add real world example for map tracing
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (11 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 12/13] bpf: Add selftests for map tracing Joe Burton
@ 2021-09-29 23:59 ` Joe Burton
  2021-10-05  5:13 ` [RFC PATCH v2 00/13] Introduce BPF map tracing capability Alexei Starovoitov
  13 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-29 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton

From: Joe Burton <jevburton@google.com>

Add an example that demonstrates how map tracing helps us avoid race
conditions while upgrading a stateful program.

Signed-off-by: Joe Burton <jevburton@google.com>
---
 .../selftests/bpf/prog_tests/bpf_map_trace.c  | 257 ++++++++++++++++++
 .../progs/bpf_map_trace_real_world_common.h   | 125 +++++++++
 .../bpf_map_trace_real_world_migration.c      |  96 +++++++
 .../bpf/progs/bpf_map_trace_real_world_new.c  |   4 +
 .../bpf/progs/bpf_map_trace_real_world_old.c  |   5 +
 5 files changed, 487 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_migration.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_new.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_old.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c b/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
index 89bae9a83339..cd60a0c78202 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_map_trace.c
@@ -19,6 +19,9 @@
 #include "bpf_map_trace_delete_elem.skel.h"
 #include "bpf_map_trace_loop0.skel.h"
 #include "bpf_map_trace_loop1.skel.h"
+#include "bpf_map_trace_real_world_migration.skel.h"
+#include "bpf_map_trace_real_world_new.skel.h"
+#include "bpf_map_trace_real_world_old.skel.h"
 #include "bpf_map_trace_update_elem.skel.h"
 
 uint32_t collatz(uint32_t x)
@@ -130,8 +133,262 @@ void infinite_loop__indirect(void)
 	bpf_map_trace_loop1__destroy(skel);
 }
 
+int real_world_example__attach_migration(
+		struct bpf_map_trace_real_world_migration *migration_skel,
+		struct bpf_link **iter_link,
+		struct bpf_link **map_trace_link_update,
+		struct bpf_link **map_trace_link_delete)
+{
+	union bpf_iter_link_info iter_link_info;
+	struct bpf_iter_attach_opts iter_opts;
+	int64_t error;
+
+	*map_trace_link_update = bpf_program__attach(
+			migration_skel->progs.copy_on_write__update);
+	error = libbpf_get_error(*map_trace_link_update);
+	if (!ASSERT_EQ(error, 0,
+		       "copy_on_write update bpf_program__attach failure"))
+		return 1;
+
+	*map_trace_link_delete = bpf_program__attach(
+			migration_skel->progs.copy_on_write__delete);
+	error = libbpf_get_error(*map_trace_link_delete);
+	if (!ASSERT_EQ(error, 0,
+		       "copy_on_write delete bpf_program__attach failure"))
+		return 1;
+
+	memset(&iter_link_info, 0, sizeof(iter_link_info));
+	iter_link_info.map.map_fd = bpf_map__fd(migration_skel->maps.old_map);
+
+	memset(&iter_opts, 0, sizeof(iter_opts));
+	iter_opts.sz = sizeof(iter_opts);
+	iter_opts.link_info = &iter_link_info;
+	iter_opts.link_info_len = sizeof(iter_link_info);
+	*iter_link = bpf_program__attach_iter(
+			migration_skel->progs.bulk_migration, &iter_opts);
+	error = libbpf_get_error(*iter_link);
+	if (!ASSERT_EQ(error, 0, "bpf_program__attach_iter failure"))
+		return 1;
+
+	return 0;
+}
+
+int open_and_write_files(const char *path, size_t num_files)
+{
+	int *fds = malloc(sizeof(int) * num_files);
+	ssize_t bytes_written;
+	const char buf = 'a';
+	size_t i, j;
+	int ret = 0;
+
+	if (fds == NULL)
+		return 1;
+
+	for (i = 0; i < num_files; i++) {
+		fds[i] = open(path, O_WRONLY | O_CREAT, 0644);
+
+		if (fds[i] < 0) {
+			ret = 2;
+			break;
+		}
+		bytes_written = write(fds[i], &buf, sizeof(buf));
+		if (bytes_written != sizeof(buf)) {
+			ret = 3;
+			break;
+		}
+	}
+	for (j = 0; j < i; j++)
+		close(fds[j]);
+	return ret;
+}
+
+void real_world_example(void)
+{
+	struct bpf_map_trace_real_world_migration *migration_skel = NULL;
+	int file_fd_should_write = -1, file_fd_should_not_write = -1;
+	struct bpf_map_trace_real_world_new *new_skel = NULL;
+	struct bpf_map_trace_real_world_old *old_skel = NULL;
+	struct bpf_link *map_trace_link_update = NULL;
+	struct bpf_link *map_trace_link_delete = NULL;
+	struct bpf_link *iter_link = NULL;
+	const bool enable_filtering = 1;
+	const uint32_t pid = getpid();
+	uint32_t max_open_files;
+	char file_buf = 'a';
+	int iter_fd = -1;
+	char iter_buf[1];
+	int rc;
+
+	/*
+	 * Begin by loading and attaching the old version of our program.
+	 */
+	old_skel = bpf_map_trace_real_world_old__open_and_load();
+	if (!ASSERT_NEQ(old_skel, NULL, "open/load old skeleton"))
+		return;
+	rc = bpf_map_trace_real_world_old__attach(old_skel);
+	if (!ASSERT_EQ(rc, 0, "attach old skeleton")) {
+		fprintf(stderr, "Failed to attach skeleton: %d\n", errno);
+		goto out;
+	}
+	rc = bpf_map_update_elem(bpf_map__fd(old_skel->maps.filtered_pids),
+				 &pid, &enable_filtering, /*flags=*/0);
+	if (!ASSERT_EQ(rc, 0, "configure process to be filtered"))
+		goto out;
+	if (!ASSERT_EQ(open_and_write_files("/tmp/tst_file", 1), 0,
+		       "program allows writing a single new file"))
+		goto out;
+	max_open_files = bpf_map__max_entries(old_skel->maps.allow_reads);
+	if (!ASSERT_NEQ(open_and_write_files("/tmp/tst_file",
+					     max_open_files + 1), 0,
+		       "program blocks writing too many new files"))
+		goto out;
+
+	/*
+	 * Then load the new version of the program.
+	 */
+	new_skel = bpf_map_trace_real_world_new__open_and_load();
+	if (!ASSERT_NEQ(new_skel, NULL, "open/load new skeleton"))
+		goto out;
+
+	/*
+	 * Hook up the migration programs. This gives the old map
+	 * copy-on-write semantics.
+	 */
+	migration_skel = bpf_map_trace_real_world_migration__open();
+	if (!ASSERT_NEQ(migration_skel, NULL, "open migration skeleton"))
+		goto out;
+	rc = bpf_map__reuse_fd(migration_skel->maps.old_map,
+			       bpf_map__fd(old_skel->maps.allow_reads));
+	if (!ASSERT_EQ(rc, 0, "reuse old map fd"))
+		goto out;
+	rc = bpf_map__reuse_fd(migration_skel->maps.new_map,
+			       bpf_map__fd(new_skel->maps.allow_reads));
+	if (!ASSERT_EQ(rc, 0, "reuse new map fd"))
+		goto out;
+	rc = bpf_map_trace_real_world_migration__load(migration_skel);
+	if (!ASSERT_EQ(rc, 0, "load migration skeleton"))
+		goto out;
+	rc = real_world_example__attach_migration(migration_skel,
+						  &iter_link,
+						  &map_trace_link_update,
+						  &map_trace_link_delete);
+	if (!ASSERT_EQ(rc, 0, "attach migration programs"))
+		goto out;
+
+	/*
+	 * Simulated race condition type 1: An application opens an fd before
+	 * bulk transfer and closes it after.
+	 */
+	file_fd_should_not_write = open("/tmp/tst_file", O_WRONLY | O_CREAT,
+					0644);
+	if (!ASSERT_GE(file_fd_should_not_write, 0,
+		       "open file before bulk migration"))
+		goto out;
+
+	/*
+	 * Perform bulk transfer.
+	 */
+	iter_fd = bpf_iter_create(bpf_link__fd(iter_link));
+	if (!ASSERT_GE(iter_fd, 0, "create iterator"))
+		goto out;
+	rc = read(iter_fd, &iter_buf, sizeof(iter_buf));
+	if (!ASSERT_EQ(rc, 0, "execute map iterator"))
+		goto out;
+	rc = bpf_map_update_elem(bpf_map__fd(new_skel->maps.filtered_pids),
+				 &pid, &enable_filtering, /*flags=*/0);
+	if (!ASSERT_EQ(rc, 0, "configure process to be filtered"))
+		goto out;
+
+	/*
+	 * Simulated race condition type 1 (continued). This close() does not
+	 * propagate to the new map without copy-on-write semantics, so it
+	 * would occupy a spot in the map until our app happens to close an fd
+	 * with the same number. This would subtly degrade the contract with
+	 * the application.
+	 */
+	close(file_fd_should_not_write);
+	file_fd_should_not_write = -1;
+
+	/*
+	 * Simulated race condition type 2: An application opens a file
+	 * descriptor after bulk transfer. This openat() does not propagate to
+	 * the new map without copy-on-write, so our app would not be able to
+	 * write to it.
+	 */
+	file_fd_should_write = open("/tmp/tst_file", O_WRONLY | O_CREAT, 0644);
+	if (!ASSERT_GE(file_fd_should_write, 0,
+		       "open file after bulk migration"))
+		goto out;
+
+	/*
+	 * State is migrated. Load new programs.
+	 */
+	rc = bpf_map_trace_real_world_new__attach(new_skel);
+	if (!ASSERT_EQ(rc, 0, "failed to attach new programs"))
+		goto out;
+
+	/*
+	 * Unload migration progs.
+	 */
+	close(iter_fd);
+	iter_fd = -1;
+	bpf_link__destroy(map_trace_link_update);
+	map_trace_link_update = NULL;
+	bpf_link__destroy(map_trace_link_delete);
+	map_trace_link_delete = NULL;
+	bpf_link__destroy(iter_link);
+	iter_link = NULL;
+	bpf_map_trace_real_world_migration__destroy(migration_skel);
+	migration_skel = NULL;
+
+	/*
+	 * Unload old programs.
+	 */
+	bpf_map_trace_real_world_old__destroy(old_skel);
+	old_skel = NULL;
+
+	if (!ASSERT_EQ(open_and_write_files("/tmp/tst_file", 1), 0,
+		       "program allows writing a single new file"))
+		goto out;
+	max_open_files = bpf_map__max_entries(new_skel->maps.allow_reads);
+	if (!ASSERT_NEQ(open_and_write_files("/tmp/tst_file",
+					     max_open_files + 1), 0,
+		       "program blocks writing too many new files"))
+		goto out;
+	/*
+	 * Simulated race condition type 2 (continued): If we didn't do
+	 * copy-on-write, this would be expected to fail, since the FD would
+	 * not be in the new map.
+	 */
+	rc = write(file_fd_should_write, &file_buf, sizeof(file_buf));
+	if (!ASSERT_EQ(rc, sizeof(file_buf),
+		       "migrated program allows writing to file opened before migration"))
+		goto out;
+
+out:
+	if (old_skel)
+		bpf_map_trace_real_world_old__destroy(old_skel);
+	if (new_skel)
+		bpf_map_trace_real_world_new__destroy(new_skel);
+	if (migration_skel)
+		bpf_map_trace_real_world_migration__destroy(migration_skel);
+	if (map_trace_link_update)
+		bpf_link__destroy(map_trace_link_update);
+	if (map_trace_link_delete)
+		bpf_link__destroy(map_trace_link_delete);
+	if (iter_link)
+		bpf_link__destroy(iter_link);
+	if (iter_fd > -1)
+		close(iter_fd);
+	if (file_fd_should_write > -1)
+		close(file_fd_should_write);
+	if (file_fd_should_not_write > -1)
+		close(file_fd_should_not_write);
+}
+
 void test_bpf_map_trace(void)
 {
+	if (test__start_subtest("real_world_example"))
+		real_world_example();
 	if (test__start_subtest("update_elem__basic"))
 		update_elem__basic();
 	if (test__start_subtest("delete_elem__basic"))
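[Editor's note: the sequencing exercised by real_world_example() condenses to a small userspace model. This is a hypothetical sketch, not the selftest: the arrays stand in for the old and new maps. The point is the ordering — enable copy-on-write *before* the bulk copy, so any update or delete that lands during or after the copy also reaches the new map, closing both simulated race types.]

```c
#include <stdbool.h>
#include <string.h>

#define NFDS 16

static bool old_map[NFDS];  /* fd -> allowed, old program's state */
static bool new_map[NFDS];  /* fd -> allowed, new program's state */
static bool cow_enabled;    /* migration tracers attached?        */

static void map_update(int fd, bool allow)
{
	old_map[fd] = allow;
	if (cow_enabled)            /* models the UPDATE_ELEM tracer */
		new_map[fd] = allow;
}

static void map_delete(int fd)
{
	old_map[fd] = false;
	if (cow_enabled)            /* models the DELETE_ELEM tracer */
		new_map[fd] = false;
}

/* Models the bpf_iter pass: copy everything currently in the old map. */
static void bulk_copy(void)
{
	memcpy(new_map, old_map, sizeof(new_map));
}
```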
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_common.h b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_common.h
new file mode 100644
index 000000000000..230610e1b5d5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_common.h
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2021 Google */
+#pragma once
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+#include <string.h>
+
+/*
+ * Mock "real world" application.
+ *
+ * Blocks all writes from a set of applications. A limited number of newly
+ * openat()ed file descriptors may be written to. Writes to already-open
+ * file descriptors are blocked.
+ *
+ * The affected processes are selected by populating filtered_pids.
+ *
+ * It is intended as an example of a stateful policy-enforcement application
+ * which benefits from map tracing. It is not intended to be useful.
+ */
+
+/*
+ * This is the only difference between the old and new application. Since we're
+ * enforcing a policy based on this data, we want to migrate it. Since the
+ * application can modify the data in parallel, we need to give this map
+ * copy-on-write semantics so that those changes propagate.
+ */
+#if defined(OLD_VERSION)
+struct allow_reads_key {
+	uint32_t pid;
+	int fd;
+};
+#else
+struct allow_reads_key {
+	int fd;
+	uint32_t pid;
+};
+#endif
+struct allow_reads_value {
+	bool do_allow;
+};
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 16);
+	__type(key, struct allow_reads_key);
+	__type(value, struct allow_reads_value);
+} allow_reads SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 16);
+	__type(key, uint32_t);
+	__type(value, bool);
+} filtered_pids SEC(".maps");
+
+
+SEC("kretprobe/__x64_sys_openat")
+int BPF_KRETPROBE(kretprobe__x64_sys_openat, int ret)
+{
+	struct allow_reads_key key;
+	struct allow_reads_value val;
+	uint32_t pid;
+	char *pid_is_filtered;
+
+	pid = (bpf_get_current_pid_tgid() >> 32) & 0xFFFFFFFF;
+	memset(&key, 0, sizeof(key));
+	key.pid = pid;
+	key.fd = ret;
+	val.do_allow = true;
+
+	if (ret < 0)
+		return 0;
+
+	pid_is_filtered = bpf_map_lookup_elem(&filtered_pids, &pid);
+	if (!pid_is_filtered)
+		return 0;
+
+	if (!*pid_is_filtered)
+		return 0;
+
+	/*
+	 * Ignore errors. Failing to insert has the effect of blocking writes
+	 * on that file descriptor.
+	 */
+	bpf_map_update_elem(&allow_reads, &key, &val, /*flags=*/0);
+	return 0;
+}
+
+SEC("fmod_ret/__x64_sys_write")
+int BPF_PROG(fmod_ret__x64_sys_write, struct pt_regs *regs, int ret)
+{
+	int fd = PT_REGS_PARM1(regs);
+	struct allow_reads_value *val;
+	struct allow_reads_key key;
+
+	memset(&key, 0, sizeof(key));
+	key.pid = (bpf_get_current_pid_tgid() >> 32) & 0xFFFFFFFF;
+	key.fd = fd;
+	val = bpf_map_lookup_elem(&allow_reads, &key);
+	if (!val)
+		return -EPERM;
+	return val->do_allow ? 0 : -EPERM;
+}
+
+SEC("fmod_ret/__x64_sys_close")
+int BPF_PROG(fmod_ret__x64_sys_close, struct pt_regs *regs, int ret)
+{
+	int fd = PT_REGS_PARM1(regs);
+	struct allow_reads_key key;
+	struct allow_reads_value val;
+
+	memset(&key, 0, sizeof(key));
+	key.pid = (bpf_get_current_pid_tgid() >> 32) & 0xFFFFFFFF;
+	key.fd = fd;
+	val.do_allow = true;
+
+	bpf_map_delete_elem(&allow_reads, &key);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
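[Editor's note: the policy in this header can be modeled outside BPF. In this hypothetical sketch, on_write() plays the role of the fmod_ret program: a write is allowed only if an earlier openat() recorded the (pid, fd) pair, and close() removes the entry. The linear-scan table is a stand-in for the BPF hash map.]

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE 64

struct entry {
	bool used;
	uint32_t pid;
	int fd;
};

static struct entry allow_reads[TABLE]; /* stand-in for the BPF hash map */

static struct entry *find(uint32_t pid, int fd)
{
	for (int i = 0; i < TABLE; i++)
		if (allow_reads[i].used &&
		    allow_reads[i].pid == pid && allow_reads[i].fd == fd)
			return &allow_reads[i];
	return NULL;
}

static void on_openat(uint32_t pid, int fd)    /* kretprobe model */
{
	for (int i = 0; i < TABLE; i++)
		if (!allow_reads[i].used) {
			allow_reads[i] = (struct entry){ true, pid, fd };
			return;
		}
	/* Table full: the fd is never allowed, matching the comment
	 * above ("failing to insert ... blocks writes").
	 */
}

static int on_write(uint32_t pid, int fd)      /* fmod_ret write model */
{
	return find(pid, fd) ? 0 : -EPERM;
}

static void on_close(uint32_t pid, int fd)     /* fmod_ret close model */
{
	struct entry *e = find(pid, fd);

	if (e)
		e->used = false;
}
```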
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_migration.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_migration.c
new file mode 100644
index 000000000000..d0a37ae1be26
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_migration.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+/* In the "real" real world, we would use BTF to generate a program which knows
+ * about the old and new map ABI. To keep things simple we'll just use a
+ * statically defined program which knows about them.
+ */
+struct allow_reads_key__old {
+	uint32_t pid;
+	int fd;
+};
+struct allow_reads_key__new {
+	int fd;
+	uint32_t pid;
+};
+struct allow_reads_value__old {
+	bool do_drop;
+};
+struct allow_reads_value__new {
+	bool do_drop;
+};
+
+/* Likewise, in the "real" real world we would simply generate a program
+ * containing the fd of this map. For libbpf to generate a skeleton for us we
+ * need to duplicate this definition.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 100);
+	__type(key, struct allow_reads_key__old);
+	__type(value, struct allow_reads_value__old);
+} old_map SEC(".maps");
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 100);
+	__type(key, struct allow_reads_key__new);
+	__type(value, struct allow_reads_value__new);
+} new_map SEC(".maps");
+
+static inline void read_migrate_write(void *key, void *value)
+{
+	struct allow_reads_key__old old_key = {};
+	struct allow_reads_key__new new_key = {};
+	char old_value = 0;
+
+	if (bpf_probe_read(&old_key, sizeof(old_key), key))
+		return; /* Could write to a map here */
+	if (bpf_probe_read(&old_value, sizeof(old_value), value))
+		return; /* Could write to a map here */
+
+	new_key.pid = old_key.pid;
+	new_key.fd = old_key.fd;
+
+	bpf_map_update_elem(&new_map, &new_key, &old_value, /*flags=*/0);
+}
+
+SEC("map_trace/old_map/UPDATE_ELEM")
+int copy_on_write__update(struct bpf_map_trace_ctx__update_elem *ctx)
+{
+	read_migrate_write(ctx->key, ctx->value);
+	return 0;
+}
+
+static inline void read_migrate_delete(void *key)
+{
+	struct allow_reads_key__old old_key = {};
+	struct allow_reads_key__new new_key = {};
+
+	if (bpf_probe_read(&old_key, sizeof(old_key), key))
+		return; /* Could write to a map here */
+
+	new_key.pid = old_key.pid;
+	new_key.fd = old_key.fd;
+
+	bpf_map_delete_elem(&new_map, &new_key);
+}
+
+SEC("map_trace/old_map/DELETE_ELEM")
+int copy_on_write__delete(struct bpf_map_trace_ctx__delete_elem *ctx)
+{
+	read_migrate_delete(ctx->key);
+	return 0;
+}
+
+SEC("iter/bpf_map_elem")
+int bulk_migration(struct bpf_iter__bpf_map_elem *ctx)
+{
+	read_migrate_write(ctx->key, ctx->value);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
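[Editor's note: the field-by-field copy in read_migrate_write() matters because the two key layouts are not byte-compatible. A hypothetical userspace check makes this concrete: pid and fd swap offsets between the versions, so a raw memcpy of the old key would scramble the new one.]

```c
#include <stddef.h>
#include <stdint.h>

struct allow_reads_key__old {
	uint32_t pid;
	int fd;
};

struct allow_reads_key__new {
	int fd;
	uint32_t pid;
};

/* Mirrors the migration program: copy per field, never as raw bytes. */
static struct allow_reads_key__new
migrate_key(struct allow_reads_key__old old)
{
	struct allow_reads_key__new key = {};

	key.pid = old.pid;
	key.fd = old.fd;
	return key;
}
```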
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_new.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_new.c
new file mode 100644
index 000000000000..9b7c4ca1deed
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_new.c
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#include "bpf_map_trace_real_world_common.h"
+
diff --git a/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_old.c b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_old.c
new file mode 100644
index 000000000000..9f0bdd7baf71
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_map_trace_real_world_old.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Google */
+#define OLD_VERSION
+#include "bpf_map_trace_real_world_common.h"
+
-- 
2.33.0.685.g46640cef36-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP
  2021-09-29 23:59 ` [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP Joe Burton
@ 2021-09-30  0:26   ` Eric Dumazet
  2021-09-30  1:09     ` Joe Burton
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2021-09-30  0:26 UTC (permalink / raw)
  To: Joe Burton, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau
  Cc: Song Liu, Yonghong Song, John Fastabend, KP Singh, Petar Penkov,
	Stanislav Fomichev, Hao Luo, netdev, bpf, Joe Burton



On 9/29/21 4:59 PM, Joe Burton wrote:
> From: Joe Burton <jevburton@google.com>
> 
> Define release, dealloc, and update_prog for the new tracing programs.
> Updates are protected by a single global mutex.
> 
> Signed-off-by: Joe Burton <jevburton@google.com>
> ---
>  kernel/bpf/map_trace.c | 71 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
> 
> diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
> index 7776b8ccfe88..35906d59ba3c 100644
> --- a/kernel/bpf/map_trace.c
> +++ b/kernel/bpf/map_trace.c
> @@ -14,6 +14,14 @@ struct bpf_map_trace_target_info {
>  static struct list_head targets = LIST_HEAD_INIT(targets);
>  static DEFINE_MUTEX(targets_mutex);
>  
> +struct bpf_map_trace_link {
> +	struct bpf_link link;
> +	struct bpf_map *map;
> +	struct bpf_map_trace_target_info *tinfo;
> +};
> +
> +static DEFINE_MUTEX(link_mutex);
> +
>  int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info)
>  {
>  	struct bpf_map_trace_target_info *tinfo;
> @@ -77,3 +85,66 @@ int bpf_map_initialize_trace_progs(struct bpf_map *map)
>  	return 0;
>  }
>  
> +static void bpf_map_trace_link_release(struct bpf_link *link)
> +{
> +	struct bpf_map_trace_link *map_trace_link =
> +			container_of(link, struct bpf_map_trace_link, link);
> +	enum bpf_map_trace_type trace_type =
> +			map_trace_link->tinfo->reg_info->trace_type;
> +	struct bpf_map_trace_prog *cur_prog;
> +	struct bpf_map_trace_progs *progs;
> +
> +	progs = map_trace_link->map->trace_progs;
> +	mutex_lock(&progs->mutex);
> +	list_for_each_entry(cur_prog, &progs->progs[trace_type].list, list) {

You might consider using list_for_each_entry_safe(), or ...

> +		if (cur_prog->prog == link->prog) {
> +			progs->length[trace_type] -= 1;
> +			list_del_rcu(&cur_prog->list);
> +			kfree_rcu(cur_prog, rcu);

or add a break; if you do not expect to find multiple entries.

> +		}
> +	}
> +	mutex_unlock(&progs->mutex);
> +	bpf_map_put_with_uref(map_trace_link->map);
> +}
> +
> +static void bpf_map_trace_link_dealloc(struct bpf_link *link)
> +{
> +	struct bpf_map_trace_link *map_trace_link =
> +			container_of(link, struct bpf_map_trace_link, link);
> +
> +	kfree(map_trace_link);
> +}
> +
> +static int bpf_map_trace_link_replace(struct bpf_link *link,
> +				      struct bpf_prog *new_prog,
> +				      struct bpf_prog *old_prog)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&link_mutex);
> +	if (old_prog && link->prog != old_prog) {
> +		ret = -EPERM;
> +		goto out_unlock;
> +	}
> +
> +	if (link->prog->type != new_prog->type ||
> +	    link->prog->expected_attach_type != new_prog->expected_attach_type ||
> +	    link->prog->aux->attach_btf_id != new_prog->aux->attach_btf_id) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	old_prog = xchg(&link->prog, new_prog);
> +	bpf_prog_put(old_prog);
> +
> +out_unlock:
> +	mutex_unlock(&link_mutex);
> +	return ret;
> +}
> +
> +static const struct bpf_link_ops bpf_map_trace_link_ops = {
> +	.release = bpf_map_trace_link_release,
> +	.dealloc = bpf_map_trace_link_dealloc,
> +	.update_prog = bpf_map_trace_link_replace,
> +};
> +
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP
  2021-09-30  0:26   ` Eric Dumazet
@ 2021-09-30  1:09     ` Joe Burton
  0 siblings, 0 replies; 21+ messages in thread
From: Joe Burton @ 2021-09-30  1:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Petar Penkov, Stanislav Fomichev, Hao Luo, netdev, bpf,
	Joe Burton

Good catch, applied both changes. The expectation is to remove only one
program. Theoretically an app could link the same program to the same map
twice, in which case close()ing one link should not detach both programs.

I opted to also apply the _safe() suffix mostly as a matter of convention.

-       struct bpf_map_trace_prog *cur_prog;
+       struct bpf_map_trace_prog *cur_prog, *tmp;
        struct bpf_map_trace_progs *progs;

        progs = map_trace_link->map->trace_progs;
        mutex_lock(&progs->mutex);
-       list_for_each_entry(cur_prog, &progs->progs[trace_type].list, list) {
+       list_for_each_entry_safe(cur_prog, tmp, &progs->progs[trace_type].list,
+                                list) {
                if (cur_prog->prog == link->prog) {
                        progs->length[trace_type] -= 1;
                        list_del_rcu(&cur_prog->list);
                        kfree_rcu(cur_prog, rcu);
+                       break;
                }
        }


On Wed, Sep 29, 2021 at 5:26 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 9/29/21 4:59 PM, Joe Burton wrote:
> > From: Joe Burton <jevburton@google.com>
> >
> > Define release, dealloc, and update_prog for the new tracing programs.
> > Updates are protected by a single global mutex.
> >
> > Signed-off-by: Joe Burton <jevburton@google.com>
> > ---
> >  kernel/bpf/map_trace.c | 71 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 71 insertions(+)
> >
> > diff --git a/kernel/bpf/map_trace.c b/kernel/bpf/map_trace.c
> > index 7776b8ccfe88..35906d59ba3c 100644
> > --- a/kernel/bpf/map_trace.c
> > +++ b/kernel/bpf/map_trace.c
> > @@ -14,6 +14,14 @@ struct bpf_map_trace_target_info {
> >  static struct list_head targets = LIST_HEAD_INIT(targets);
> >  static DEFINE_MUTEX(targets_mutex);
> >
> > +struct bpf_map_trace_link {
> > +     struct bpf_link link;
> > +     struct bpf_map *map;
> > +     struct bpf_map_trace_target_info *tinfo;
> > +};
> > +
> > +static DEFINE_MUTEX(link_mutex);
> > +
> >  int bpf_map_trace_reg_target(const struct bpf_map_trace_reg *reg_info)
> >  {
> >       struct bpf_map_trace_target_info *tinfo;
> > @@ -77,3 +85,66 @@ int bpf_map_initialize_trace_progs(struct bpf_map *map)
> >       return 0;
> >  }
> >
> > +static void bpf_map_trace_link_release(struct bpf_link *link)
> > +{
> > +     struct bpf_map_trace_link *map_trace_link =
> > +                     container_of(link, struct bpf_map_trace_link, link);
> > +     enum bpf_map_trace_type trace_type =
> > +                     map_trace_link->tinfo->reg_info->trace_type;
> > +     struct bpf_map_trace_prog *cur_prog;
> > +     struct bpf_map_trace_progs *progs;
> > +
> > +     progs = map_trace_link->map->trace_progs;
> > +     mutex_lock(&progs->mutex);
> > +     list_for_each_entry(cur_prog, &progs->progs[trace_type].list, list) {
>
> You might consider using list_for_each_entry_safe(), or ...
>
> > +             if (cur_prog->prog == link->prog) {
> > +                     progs->length[trace_type] -= 1;
> > +                     list_del_rcu(&cur_prog->list);
> > +                     kfree_rcu(cur_prog, rcu);
>
> or add a break; if you do not expect to find multiple entries.
>
> > +             }
> > +     }
> > +     mutex_unlock(&progs->mutex);
> > +     bpf_map_put_with_uref(map_trace_link->map);
> > +}
> > +
> > +static void bpf_map_trace_link_dealloc(struct bpf_link *link)
> > +{
> > +     struct bpf_map_trace_link *map_trace_link =
> > +                     container_of(link, struct bpf_map_trace_link, link);
> > +
> > +     kfree(map_trace_link);
> > +}
> > +
> > +static int bpf_map_trace_link_replace(struct bpf_link *link,
> > +                                   struct bpf_prog *new_prog,
> > +                                   struct bpf_prog *old_prog)
> > +{
> > +     int ret = 0;
> > +
> > +     mutex_lock(&link_mutex);
> > +     if (old_prog && link->prog != old_prog) {
> > +             ret = -EPERM;
> > +             goto out_unlock;
> > +     }
> > +
> > +     if (link->prog->type != new_prog->type ||
> > +         link->prog->expected_attach_type != new_prog->expected_attach_type ||
> > +         link->prog->aux->attach_btf_id != new_prog->aux->attach_btf_id) {
> > +             ret = -EINVAL;
> > +             goto out_unlock;
> > +     }
> > +
> > +     old_prog = xchg(&link->prog, new_prog);
> > +     bpf_prog_put(old_prog);
> > +
> > +out_unlock:
> > +     mutex_unlock(&link_mutex);
> > +     return ret;
> > +}
> > +
> > +static const struct bpf_link_ops bpf_map_trace_link_ops = {
> > +     .release = bpf_map_trace_link_release,
> > +     .dealloc = bpf_map_trace_link_dealloc,
> > +     .update_prog = bpf_map_trace_link_replace,
> > +};
> > +
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/13] Introduce BPF map tracing capability
  2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
                   ` (12 preceding siblings ...)
  2021-09-29 23:59 ` [RFC PATCH v2 13/13] bpf: Add real world example " Joe Burton
@ 2021-10-05  5:13 ` Alexei Starovoitov
  2021-10-05 21:47   ` Joe Burton
  13 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2021-10-05  5:13 UTC (permalink / raw)
  To: Joe Burton
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Petar Penkov, Stanislav Fomichev, Hao Luo, netdev, bpf,
	Joe Burton

On Wed, Sep 29, 2021 at 11:58:57PM +0000, Joe Burton wrote:
> From: Joe Burton <jevburton@google.com>
> 
> This patch introduces 'map tracing': the capability to execute a
> tracing program after updating a map.
> 
> Map tracing enables upgrades of stateful programs with fewer race
> conditions than otherwise possible. We use a tracing program to
> imbue a map with copy-on-write semantics, then use an iterator to
> perform a bulk copy of data in the map. After bulk copying concludes,
> updates to that map automatically propagate via the tracing
> program, avoiding a class of race conditions. This use case is
> demonstrated in the new 'real_world_example' selftest.
> 
> Extend BPF_PROG_TYPE_TRACING with a new attach type, BPF_TRACE_MAP,
> and allow linking these programs to arbitrary maps.
> 
> Extend the verifier to invoke helper calls directly after
> bpf_map_update_elem() and bpf_map_delete_elem(). The helpers have the
> exact same signature as the functions they trace, and simply pass those
> arguments to the list of tracing programs attached to the map.

It's a neat idea to use verifier powers for this job,
but I wonder why simple tracepoint in map ops was not used instead?
With BTF the bpf progs see the actual types of raw tracepoints.
If tracepoint has map, key, value pointers the prog will be able
to access them in read-only mode.
Such map pointer will be PTR_TO_BTF_ID, so the prog won't be able
to recursively do lookup/update on this map pointer,
but that's what you need anyway, right?
If not we can extend this part of the tracepoint/verifier.

Instead of tracepoint it could have been an empty noinline function
and fentry/fexit would see all arguments as well.

> One open question is how to handle pointer-based map updates. For
> example:
>   int *x = bpf_map_lookup_elem(...);
>   if (...) (*x)++;
>   if (...) (*x)--;
> We can't just call a helper function right after the
> bpf_map_lookup_elem(), since the updates occur later on. We also can't
> determine where the last modification to the pointer occurs, due to
> branch instructions. I would therefore consider a pattern where we
> 'flush' pointers at the end of a BPF program:
>   int *x = bpf_map_lookup_elem(...);
>   ...
>   /* Execute tracing programs for this cell in this map. */
>   bpf_map_trace_pointer_update(x);
>   return 0;
> We can't necessarily do this in the verifier, since 'x' may no
> longer be in a register or on the stack. Thus we might introduce a
> helper to save pointers that should be flushed, then flush all
> registered pointers at every exit point:
>   int *x = bpf_map_lookup_elem(...);
>   /*
>    * Saves 'x' somewhere in kernel memory. Does nothing if no
>    * corresponding tracing progs are attached to the map.
>    */
>   bpf_map_trace_register_pointer(x);
>   ...
>   /* flush all registered pointers */
>   bpf_map_trace_pointer_update();
>   return 0;
> This should be easy to implement in the verifier.

I don't think the "solution" for lookup operation is worth pursuing.
The bpf prog that needs this map tracing is completely in your control.
So just don't do writes after lookup.

> In addition, we use the verifier to instrument certain map update
> calls. This requires saving arguments onto the stack, which means that
> a program using MAX_BPF_STACK bytes of stack could exceed the limit.
> I don't know whether this actually causes any problems.

Extra 8*4 bytes of stack is not a deal breaker.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/13] Introduce BPF map tracing capability
  2021-10-05  5:13 ` [RFC PATCH v2 00/13] Introduce BPF map tracing capability Alexei Starovoitov
@ 2021-10-05 21:47   ` Joe Burton
  2021-10-06 16:41     ` Alexei Starovoitov
  0 siblings, 1 reply; 21+ messages in thread
From: Joe Burton @ 2021-10-05 21:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joe Burton, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Petar Penkov, Stanislav Fomichev, Hao Luo, netdev, bpf

> It's a neat idea to use verifier powers for this job,
> but I wonder why simple tracepoint in map ops was not used instead?

My concern with tracepoints is that they execute for all map updates,
not for a particular map. Ideally performing an upgrade of program X
should not affect the performance characteristics of program Y.

If n programs are opted into this model, then upgrading any of them
affects the performance characteristics of every other. There's also
the (very remote) possibility of multiple simultaneous upgrades tracing
map updates at the same time, causing a greater performance hit.

> I don't think the "solution" for lookup operation is worth pursuing.
> The bpf prog that needs this map tracing is completely in your control.
> So just don't do writes after lookup.

I eventually want to support apps that use local storage. Those APIs
generally only allow updates via a pointer. E.g. bpf_sk_storage_get()
only allows updates via the returned pointer and via
bpf_sk_storage_delete().

Since I eventually have to solve this problem to handle local storage,
then it seems worth solving it for normal maps as well. They seem
like isomorphic problems.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/13] Introduce BPF map tracing capability
  2021-10-05 21:47   ` Joe Burton
@ 2021-10-06 16:41     ` Alexei Starovoitov
  2021-10-06 21:05       ` Joe Burton
  0 siblings, 1 reply; 21+ messages in thread
From: Alexei Starovoitov @ 2021-10-06 16:41 UTC (permalink / raw)
  To: Joe Burton
  Cc: Joe Burton, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Petar Penkov, Stanislav Fomichev, Hao Luo, netdev, bpf

On Tue, Oct 05, 2021 at 02:47:34PM -0700, Joe Burton wrote:
> > It's a neat idea to use verifier powers for this job,
> > but I wonder why simple tracepoint in map ops was not used instead?
> 
> My concern with tracepoints is that they execute for all map updates,
> not for a particular map. Ideally performing an upgrade of program X
> should not affect the performance characteristics of program Y.

Right, but single 'if (map == map_ptr_being_traced)'
won't really affect update() speed of maps.
For hash maps the update/delete are heavy operations with a bunch of
checks and spinlocks.
Just to make sure we're on the same patch I'm proposing something like
the patch below...

> If n programs are opted into this model, then upgrading any of them
> affects the performance characteristics of every other. There's also
> the (very remote) possibility of multiple simultaneous upgrades tracing
> map updates at the same time, causing a greater performance hit.

Also consider that the verifier fixup of update/delete in the code
is permanent whereas attaching fentry or fmod_ret to a nop function is temporary.
Once tracing of the map is no longer necessary that fentry program
will be detached and overhead will go back to zero.
Which is not the case for 'fixup' approach.

With fmod_ret the tracing program might be the enforcing program.
It could be used to disallow certain map access in a generic way.

> > I don't think the "solution" for lookup operation is worth pursuing.
> > The bpf prog that needs this map tracing is completely in your control.
> > So just don't do writes after lookup.
> 
> I eventually want to support apps that use local storage. Those APIs
> generally only allow updates via a pointer. E.g. bpf_sk_storage_get()
> only allows updates via the returned pointer and via
> bpf_sk_storage_delete().
> 
> Since I eventually have to solve this problem to handle local storage,
> then it seems worth solving it for normal maps as well. They seem
> like isomorphic problems.

Especially for local storage... doing tracing from bpf program itself
seems to make the most sense.

From c7b6ec4488ee50ebbca61c22c6837fd6fe7007bf Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Wed, 6 Oct 2021 09:30:21 -0700
Subject: [PATCH] bpf: trace array map update

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/arraymap.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5e1ccfae916b..89f853b1a217 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -293,6 +293,13 @@ static void check_and_free_timer_in_array(struct bpf_array *arr, void *val)
 		bpf_timer_cancel_and_free(val + arr->map.timer_off);
 }
 
+noinline int bpf_array_map_trace_update(struct bpf_map *map, void *key,
+					void *value, u64 map_flags)
+{
+	return 0;
+}
+ALLOW_ERROR_INJECTION(bpf_array_map_trace_update, ERRNO);
+
 /* Called from syscall or from eBPF program */
 static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 				 u64 map_flags)
@@ -300,6 +307,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	u32 index = *(u32 *)key;
 	char *val;
+	int err;
 
 	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
 		/* unknown flags */
@@ -317,6 +325,9 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 		     !map_value_has_spin_lock(map)))
 		return -EINVAL;
 
+	if (unlikely(err = bpf_array_map_trace_update(map, key, value, map_flags)))
+		return err;
+
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		memcpy(this_cpu_ptr(array->pptrs[index & array->index_mask]),
 		       value, map->value_size);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/13] Introduce BPF map tracing capability
  2021-10-06 16:41     ` Alexei Starovoitov
@ 2021-10-06 21:05       ` Joe Burton
  2021-10-18 23:15         ` Alexei Starovoitov
  0 siblings, 1 reply; 21+ messages in thread
From: Joe Burton @ 2021-10-06 21:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joe Burton, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Petar Penkov, Stanislav Fomichev, Hao Luo, netdev, bpf

> Just to make sure we're on the same patch I'm proposing something like
> the patch below...

The proposed patch seems reasonable overall:
+ eliminates a lot of boilerplate
+ enables map update filtering
+ minimal perf cost when not tracing maps
+ avoids adding complexity to verifier
- requires touching every map type's implementation
- tracing one map implies tracing all maps

I can rev this RFC with hooks inside the common map types' update() and
delete() methods.

> Especially for local storage... doing tracing from bpf program itself
> seems to make the most sense.

I'm a little unclear on how this should work. There's no off-the-shelf
solution that can do this for us, right?

In particular I think we're looking for an interface like this:

        /* This is a BPF program */
        int my_prog(struct bpf_sock *sk) {
                struct MyValue *v = bpf_sk_storage_get(&my_map, sk, ...);
                ...
                bpf_sk_storage_trace(&my_map, sk, v);
                return 0;
        }

I.e. we need some way of triggering a tracing hook from a BPF program.
For non-local storage maps we can achieve this with a
bpf_map_update_elem(). For local storage I suspect we need something
new.

Assuming there's no off-the-shelf hook that I'm missing, we can do some
brainstorming internally and come back with a proposal or two.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/13] Introduce BPF map tracing capability
  2021-10-06 21:05       ` Joe Burton
@ 2021-10-18 23:15         ` Alexei Starovoitov
  0 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2021-10-18 23:15 UTC (permalink / raw)
  To: Joe Burton
  Cc: Joe Burton, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Petar Penkov, Stanislav Fomichev, Hao Luo, netdev, bpf

On Wed, Oct 06, 2021 at 02:05:55PM -0700, Joe Burton wrote:
> > Just to make sure we're on the same patch I'm proposing something like
> > the patch below...
> 
> The proposed patch seems reasonable overall:
> + eliminates a lot of boilerplate
> + enables map update filtering
> + minimal perf cost when not tracing maps
> + avoids adding complexity to verifier
> - requires touching every map type's implementation
> - tracing one map implies tracing all maps

right. The single 'if' filter inside attached bpf prog should be fast enough.

> I can rev this RFC with hooks inside the common map types' update() and
> delete() methods.
> 
> > Especially for local storage... doing tracing from bpf program itself
> > seems to make the most sense.
> 
> I'm a little unclear on how this should work. There's no off-the-shelf
> solution that can do this for us, right?
> 
> In particular I think we're looking for an interface like this:
> 
>         /* This is a BPF program */
>         int my_prog(struct bpf_sock *sk) {
>                 struct MyValue *v = bpf_sk_storage_get(&my_map, sk, ...);
>                 ...
>                 bpf_sk_storage_trace(&my_map, sk, v);
>                 return 0;
>         }
>
>
> I.e. we need some way of triggering a tracing hook from a BPF program.

I mean that above can be done as bpf prog.
bpf_sk_storage_trace() can be an empty global function inside a bpf program.
The attach to it is either fentry or even freplace.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-10-18 23:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29 23:58 [RFC PATCH v2 00/13] Introduce BPF map tracing capability Joe Burton
2021-09-29 23:58 ` [RFC PATCH v2 01/13] bpf: Add machinery to register map tracing hooks Joe Burton
2021-09-29 23:58 ` [RFC PATCH v2 02/13] bpf: Allow loading BPF_TRACE_MAP programs Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 03/13] bpf: Add list of tracing programs to struct bpf_map Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 04/13] bpf: Define a few bpf_link_ops for BPF_TRACE_MAP Joe Burton
2021-09-30  0:26   ` Eric Dumazet
2021-09-30  1:09     ` Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 05/13] bpf: Enable creation of BPF_LINK_TYPE_MAP_TRACE Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 06/13] bpf: Add APIs to invoke tracing programs Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 07/13] bpf: Register BPF_MAP_TRACE_{UPDATE,DELETE}_ELEM hooks Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 08/13] libbpf: Support BPF_TRACE_MAP Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 09/13] bpf: Add infinite loop check on map tracers Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 10/13] Add bpf_map_trace_{update,delete}_elem() helper functions Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 11/13] bpf: verifier inserts map tracing helper call Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 12/13] bpf: Add selftests for map tracing Joe Burton
2021-09-29 23:59 ` [RFC PATCH v2 13/13] bpf: Add real world example " Joe Burton
2021-10-05  5:13 ` [RFC PATCH v2 00/13] Introduce BPF map tracing capability Alexei Starovoitov
2021-10-05 21:47   ` Joe Burton
2021-10-06 16:41     ` Alexei Starovoitov
2021-10-06 21:05       ` Joe Burton
2021-10-18 23:15         ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).