* [PATCH bpf-next 0/7] BPF Static Keys
@ 2023-12-06 14:10 Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 1/7] bpf: extract bpf_prog_bind_map logic into an inline helper Anton Protopopov
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

This patch series adds a new BPF Static Keys API and implements it for x86.

The first three patches are preparatory and make the main changes simpler.

The fourth patch adds a kernel API which allows passing a list of "static
branches" to the kernel. A static branch is a JA instruction which can be
patched at runtime by setting the value of the controlling map, aka the
static key, to zero or non-zero.

The fifth patch adds arch support for static keys on x86.

The sixth patch adds libbpf support for static keys, which is more
user-friendly than just passing a list of branches.

Finally, the seventh patch adds some self-tests.
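
From user space a static key is toggled simply by updating the controlling
map. A minimal sketch (not part of the series; it only uses the existing
libbpf bpf_map_update_elem() API, with key_map_fd being the fd of a static
key map created as described in patch 4):

    #include <bpf/bpf.h>

    /* Write a new value into the single-element static key map: non-zero
     * turns the controlled branches into jumps, zero turns them back into
     * NOPs (the other way around for inverse branches).
     */
    static int set_static_key(int key_map_fd, __u32 on)
    {
            __u32 zero = 0;

            return bpf_map_update_elem(key_map_fd, &zero, &on, BPF_ANY);
    }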

See the Plumbers talk [1] for more details on the design.

  [1] https://lpc.events/event/17/contributions/1608/

Anton Protopopov (7):
  bpf: extract bpf_prog_bind_map logic into an inline helper
  bpf: rename and export a struct definition
  bpf: adjust functions offsets when patching progs
  bpf: implement BPF Static Keys support
  bpf: x86: implement static keys support
  libbpf: BPF Static Keys support
  selftests/bpf: Add tests for BPF Static Keys

 MAINTAINERS                                   |   7 +
 arch/x86/net/bpf_jit_comp.c                   |  72 +++
 include/linux/bpf.h                           |  34 ++
 include/uapi/linux/bpf.h                      |  18 +
 kernel/bpf/Makefile                           |   2 +
 kernel/bpf/arraymap.c                         |  15 +-
 kernel/bpf/core.c                             |   9 +
 kernel/bpf/skey.c                             | 306 ++++++++++++
 kernel/bpf/syscall.c                          | 100 ++--
 kernel/bpf/verifier.c                         | 103 ++++-
 tools/include/uapi/linux/bpf.h                |  18 +
 tools/lib/bpf/bpf.c                           |   5 +-
 tools/lib/bpf/bpf.h                           |   4 +-
 tools/lib/bpf/bpf_helpers.h                   |  64 +++
 tools/lib/bpf/libbpf.c                        | 273 ++++++++++-
 tools/lib/bpf/libbpf_internal.h               |   3 +
 tools/lib/bpf/linker.c                        |   8 +-
 .../bpf/prog_tests/bpf_static_keys.c          | 436 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_static_keys.c     | 120 +++++
 19 files changed, 1542 insertions(+), 55 deletions(-)
 create mode 100644 kernel/bpf/skey.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_static_keys.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_static_keys.c

-- 
2.34.1



* [PATCH bpf-next 1/7] bpf: extract bpf_prog_bind_map logic into an inline helper
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 2/7] bpf: rename and export a struct definition Anton Protopopov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

Add a new helper function, __bpf_prog_bind_map(), which adds a map to
prog->aux->used_maps. This new helper will be used in a subsequent patch.
(This change also simplifies the code of the bpf_prog_bind_map() function.)

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 kernel/bpf/syscall.c | 58 ++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0ed286b8a0f0..81625ef98a7d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2366,6 +2366,28 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
 }
 EXPORT_SYMBOL_GPL(bpf_prog_get_type_dev);
 
+static int __bpf_prog_bind_map(struct bpf_prog *prog, struct bpf_map *map)
+{
+	struct bpf_map **used_maps_new;
+	int i;
+
+	for (i = 0; i < prog->aux->used_map_cnt; i++)
+		if (prog->aux->used_maps[i] == map)
+			return -EEXIST;
+
+	used_maps_new = krealloc_array(prog->aux->used_maps,
+				       prog->aux->used_map_cnt + 1,
+				       sizeof(used_maps_new[0]),
+				       GFP_KERNEL);
+	if (!used_maps_new)
+		return -ENOMEM;
+
+	prog->aux->used_maps = used_maps_new;
+	prog->aux->used_maps[prog->aux->used_map_cnt++] = map;
+
+	return 0;
+}
+
 /* Initially all BPF programs could be loaded w/o specifying
  * expected_attach_type. Later for some of them specifying expected_attach_type
  * at load time became required so that program could be validated properly.
@@ -5285,8 +5307,7 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 {
 	struct bpf_prog *prog;
 	struct bpf_map *map;
-	struct bpf_map **used_maps_old, **used_maps_new;
-	int i, ret = 0;
+	int ret = 0;
 
 	if (CHECK_ATTR(BPF_PROG_BIND_MAP))
 		return -EINVAL;
@@ -5305,37 +5326,16 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	}
 
 	mutex_lock(&prog->aux->used_maps_mutex);
-
-	used_maps_old = prog->aux->used_maps;
-
-	for (i = 0; i < prog->aux->used_map_cnt; i++)
-		if (used_maps_old[i] == map) {
-			bpf_map_put(map);
-			goto out_unlock;
-		}
-
-	used_maps_new = kmalloc_array(prog->aux->used_map_cnt + 1,
-				      sizeof(used_maps_new[0]),
-				      GFP_KERNEL);
-	if (!used_maps_new) {
-		ret = -ENOMEM;
-		goto out_unlock;
-	}
-
-	memcpy(used_maps_new, used_maps_old,
-	       sizeof(used_maps_old[0]) * prog->aux->used_map_cnt);
-	used_maps_new[prog->aux->used_map_cnt] = map;
-
-	prog->aux->used_map_cnt++;
-	prog->aux->used_maps = used_maps_new;
-
-	kfree(used_maps_old);
-
-out_unlock:
+	ret = __bpf_prog_bind_map(prog, map);
 	mutex_unlock(&prog->aux->used_maps_mutex);
 
 	if (ret)
 		bpf_map_put(map);
+
+	/* This map was already bound to the program */
+	if (ret == -EEXIST)
+		ret = 0;
+
 out_prog_put:
 	bpf_prog_put(prog);
 	return ret;
-- 
2.34.1



* [PATCH bpf-next 2/7] bpf: rename and export a struct definition
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 1/7] bpf: extract bpf_prog_bind_map logic into an inline helper Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 3/7] bpf: adjust functions offsets when patching progs Anton Protopopov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

There is a structure, struct prog_poke_elem, defined in kernel/bpf/arraymap.c.
It contains a list_head and a pointer to a bpf_prog_aux instance, and its
purpose is to serve as a list element in a list of bpf_prog_aux instances.
Rename it to struct bpf_prog_aux_list_elem and define it in include/linux/bpf.h
so that it can be reused for similar purposes by other pieces of code.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 include/linux/bpf.h   |  5 +++++
 kernel/bpf/arraymap.c | 13 ++++---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eb84caf133df..8085780b7fcd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3227,4 +3227,9 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+struct bpf_prog_aux_list_elem {
+	struct list_head list;
+	struct bpf_prog_aux *aux;
+};
+
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..7e6df6bd7e7a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -956,15 +956,10 @@ static void prog_array_map_seq_show_elem(struct bpf_map *map, void *key,
 	rcu_read_unlock();
 }
 
-struct prog_poke_elem {
-	struct list_head list;
-	struct bpf_prog_aux *aux;
-};
-
 static int prog_array_map_poke_track(struct bpf_map *map,
 				     struct bpf_prog_aux *prog_aux)
 {
-	struct prog_poke_elem *elem;
+	struct bpf_prog_aux_list_elem *elem;
 	struct bpf_array_aux *aux;
 	int ret = 0;
 
@@ -997,7 +992,7 @@ static int prog_array_map_poke_track(struct bpf_map *map,
 static void prog_array_map_poke_untrack(struct bpf_map *map,
 					struct bpf_prog_aux *prog_aux)
 {
-	struct prog_poke_elem *elem, *tmp;
+	struct bpf_prog_aux_list_elem *elem, *tmp;
 	struct bpf_array_aux *aux;
 
 	aux = container_of(map, struct bpf_array, map)->aux;
@@ -1017,7 +1012,7 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
 				    struct bpf_prog *new)
 {
 	u8 *old_addr, *new_addr, *old_bypass_addr;
-	struct prog_poke_elem *elem;
+	struct bpf_prog_aux_list_elem *elem;
 	struct bpf_array_aux *aux;
 
 	aux = container_of(map, struct bpf_array, map)->aux;
@@ -1148,7 +1143,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
 
 static void prog_array_map_free(struct bpf_map *map)
 {
-	struct prog_poke_elem *elem, *tmp;
+	struct bpf_prog_aux_list_elem *elem, *tmp;
 	struct bpf_array_aux *aux;
 
 	aux = container_of(map, struct bpf_array, map)->aux;
-- 
2.34.1



* [PATCH bpf-next 3/7] bpf: adjust functions offsets when patching progs
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 1/7] bpf: extract bpf_prog_bind_map logic into an inline helper Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 2/7] bpf: rename and export a struct definition Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 4/7] bpf: implement BPF Static Keys support Anton Protopopov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

When patching instructions with the bpf_patch_insn_data() function, patch
env->prog->aux->func_info[i].insn_off as well: when one instruction at offset
off is replaced by a patch of len instructions, every insn_off greater than
off has to grow by len - 1. Currently this doesn't seem to break anything,
but this field will be used in a subsequent patch.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 kernel/bpf/verifier.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bf94ba50c6ee..5d38ee2e74a1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18181,6 +18181,20 @@ static void adjust_insn_aux_data(struct bpf_verifier_env *env,
 	vfree(old_data);
 }
 
+static void adjust_func_info(struct bpf_verifier_env *env, u32 off, u32 len)
+{
+	int i;
+
+	if (len == 1)
+		return;
+
+	for (i = 0; i < env->prog->aux->func_info_cnt; i++) {
+		if (env->prog->aux->func_info[i].insn_off <= off)
+			continue;
+		env->prog->aux->func_info[i].insn_off += len - 1;
+	}
+}
+
 static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len)
 {
 	int i;
@@ -18232,6 +18246,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
 		return NULL;
 	}
 	adjust_insn_aux_data(env, new_data, new_prog, off, len);
+	adjust_func_info(env, off, len);
 	adjust_subprog_starts(env, off, len);
 	adjust_poke_descs(new_prog, off, len);
 	return new_prog;
-- 
2.34.1



* [PATCH bpf-next 4/7] bpf: implement BPF Static Keys support
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
                   ` (2 preceding siblings ...)
  2023-12-06 14:10 ` [PATCH bpf-next 3/7] bpf: adjust functions offsets when patching progs Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 5/7] bpf: x86: implement static keys support Anton Protopopov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

BPF static keys are created as array maps with BPF_F_STATIC_KEY in the
map_flags and with the following parameters (any other combination is
considered invalid):

    map_type: BPF_MAP_TYPE_ARRAY
    key_size: 4
    value_size: 4
    max_entries: 1
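
For illustration, such a map could be created from user space roughly as
follows (a sketch using the existing bpf_map_create() libbpf API together
with the new flag; the map name is arbitrary):

    LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_STATIC_KEY);
    int key_fd;

    key_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "my_key",
                            sizeof(__u32), sizeof(__u32), 1, &opts);

On kernels or architectures without JIT support for static keys such a map
creation is rejected with -EOPNOTSUPP, which also gives user space a way to
probe for the feature.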

Given such a map, a BPF program can use it to control a "static branch": a
JA +OFF instruction which can be toggled to a JA +0 (and back) by writing 0
or 1 to the map. One branch is described by the

    struct bpf_static_branch_info {
            __u32 map_fd;
            __u32 insn_offset;
            __u32 jump_target;
            __u32 flags;
    };

structure. Here map_fd should point to the controlling static key,
insn_offset is the absolute offset of the JA instruction, and jump_target is
the absolute offset of the instruction to which the JA jumps. The flags field
can be either 0 or BPF_F_INVERSE_BRANCH, which lets users specify one of two
types of static branches: a normal branch is patched to NOP/JUMP when the key
is zero/non-zero, an inverse branch the other way around. It may seem
non-obvious why both normal and inverse branches are needed; the answer is
that both are required to implement "unlikely" and "likely" branches
controlled by a static key, see the subsequent patch which adds libbpf
support for BPF static keys.

On program load, a list of branches, each described by a struct
bpf_static_branch_info, is passed via two new bpf_attr fields:

    __aligned_u64 static_branches_info;
    __u32         static_branches_info_size;
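
For example (an illustrative sketch only, with key_map_fd, jmp_insn_idx and
target_insn_idx standing for values known to the loader), one branch can be
described to BPF_PROG_LOAD like this:

    struct bpf_static_branch_info info = {
            .map_fd      = key_map_fd,           /* the static key map */
            .insn_offset = jmp_insn_idx * 8,     /* byte offset of the JA */
            .jump_target = target_insn_idx * 8,  /* byte offset of its target */
            .flags       = 0,                    /* or BPF_F_INVERSE_BRANCH */
    };

    attr.static_branches_info      = (__u64)(unsigned long)&info;
    attr.static_branches_info_size = sizeof(info);
    /* ... followed by the usual bpf(BPF_PROG_LOAD, &attr, sizeof(attr)) */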

This patch doesn't actually implement the functionality for any
architecture. To do so, an architecture should provide a
bpf_arch_poke_static_branch() helper which implements text poking for that
architecture. The arch-specific code should also configure the internal
representation of the static branch appropriately (fill in the arch-specific
fields).

Verification of the new feature is straightforward: instead of following a
single edge for a JA instruction, two edges are inserted, one for the
original JA and one for the NOP (i.e., JA +0) variant.

In order not to pollute the kernel/bpf/{syscall.c,verifier.c} files with new
code, a new kernel/bpf/skey.c file is added.

For more details on design of the feature see the following talk at Linux
Plumbers 2023: https://lpc.events/event/17/contributions/1608/

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 MAINTAINERS                    |   6 +
 include/linux/bpf.h            |  29 ++++
 include/uapi/linux/bpf.h       |  18 ++
 kernel/bpf/Makefile            |   2 +
 kernel/bpf/arraymap.c          |   2 +-
 kernel/bpf/core.c              |   9 +
 kernel/bpf/skey.c              | 306 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  46 ++++-
 kernel/bpf/verifier.c          |  88 +++++++++-
 tools/include/uapi/linux/bpf.h |  18 ++
 10 files changed, 511 insertions(+), 13 deletions(-)
 create mode 100644 kernel/bpf/skey.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 14e1194faa4b..e2f655980c6c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3887,6 +3887,12 @@ S:	Maintained
 F:	kernel/bpf/stackmap.c
 F:	kernel/trace/bpf_trace.c
 
+BPF [STATIC KEYS]
+M:	Anton Protopopov <aspsk@isovalent.com>
+L:	bpf@vger.kernel.org
+S:	Maintained
+F:	kernel/bpf/skey.c
+
 BROADCOM ASP 2.0 ETHERNET DRIVER
 M:	Justin Chen <justin.chen@broadcom.com>
 M:	Florian Fainelli <florian.fainelli@broadcom.com>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8085780b7fcd..6985b4893191 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -289,8 +289,18 @@ struct bpf_map {
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
 	s64 __percpu *elem_count;
+	struct list_head static_key_list_head;
+	struct mutex static_key_mutex;
 };
 
+bool bpf_jit_supports_static_keys(void);
+struct bpf_static_branch *bpf_static_branch_by_offset(struct bpf_prog *bpf_prog,
+						      u32 offset);
+int bpf_prog_register_static_branches(struct bpf_prog *prog);
+int bpf_prog_init_static_branches(struct bpf_prog *prog, union bpf_attr *attr);
+int bpf_static_key_update(struct bpf_map *map, void *key, void *value, u64 flags);
+void bpf_static_key_remove_prog(struct bpf_map *map, struct bpf_prog_aux *aux);
+
 static inline const char *btf_field_type_name(enum btf_field_type type)
 {
 	switch (type) {
@@ -1381,6 +1391,17 @@ struct btf_mod_pair {
 
 struct bpf_kfunc_desc_tab;
 
+struct bpf_static_branch {
+	struct bpf_map *map;
+	u32 flags;
+	u32 bpf_offset;
+	void *arch_addr;
+	u32 arch_len;
+	u8 bpf_jmp[8];
+	u8 arch_nop[8];
+	u8 arch_jmp[8];
+};
+
 struct bpf_prog_aux {
 	atomic64_t refcnt;
 	u32 used_map_cnt;
@@ -1473,6 +1494,8 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	struct bpf_static_branch *static_branches;
+	u32 static_branches_len;
 };
 
 struct bpf_prog {
@@ -3176,6 +3199,9 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 void *bpf_arch_text_copy(void *dst, void *src, size_t len);
 int bpf_arch_text_invalidate(void *dst, size_t len);
 
+int bpf_arch_poke_static_branch(struct bpf_prog *prog,
+				struct bpf_static_branch *branch, bool on);
+
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
 
@@ -3232,4 +3258,7 @@ struct bpf_prog_aux_list_elem {
 	struct bpf_prog_aux *aux;
 };
 
+int __bpf_prog_bind_map(struct bpf_prog *prog, struct bpf_map *map,
+			bool check_boundaries);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0f6cdf52b1da..2d3cf9175cf9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1325,6 +1325,9 @@ enum {
 
 /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
 	BPF_F_PATH_FD		= (1U << 14),
+
+/* Treat this map as a BPF Static Key */
+	BPF_F_STATIC_KEY	= (1U << 15),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1369,6 +1372,18 @@ struct bpf_stack_build_id {
 
 #define BPF_OBJ_NAME_LEN 16U
 
+/* flags for bpf_static_branch_info */
+enum {
+	BPF_F_INVERSE_BRANCH = 1,
+};
+
+struct bpf_static_branch_info {
+	__u32 map_fd;			/* map in control */
+	__u32 insn_offset;		/* absolute offset of the branch instruction */
+	__u32 jump_target;		/* absolute offset of the jump target */
+	__u32 flags;
+};
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -1467,6 +1482,9 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		/* An array of struct bpf_static_branch_info */
+		__aligned_u64	static_branches_info;
+		__u32		static_branches_info_size;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..f0f0eb9acf18 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -46,3 +46,5 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
 	$(call if_changed_rule,cc_o_c)
+
+obj-$(CONFIG_BPF_SYSCALL) += skey.o
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7e6df6bd7e7a..f968489e1df8 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -17,7 +17,7 @@
 
 #define ARRAY_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_MMAPABLE | BPF_F_ACCESS_MASK | \
-	 BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP)
+	 BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP | BPF_F_STATIC_KEY)
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 08626b519ce2..b10ffcb0a6e6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2670,6 +2670,8 @@ void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 		map = used_maps[i];
 		if (map->ops->map_poke_untrack)
 			map->ops->map_poke_untrack(map, aux);
+		if (map->map_flags & BPF_F_STATIC_KEY)
+			bpf_static_key_remove_prog(map, aux);
 		bpf_map_put(map);
 	}
 }
@@ -2927,6 +2929,13 @@ void __weak arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp,
 {
 }
 
+int __weak bpf_arch_poke_static_branch(struct bpf_prog *prog,
+				       struct bpf_static_branch *branch,
+				       bool on)
+{
+	return -EOPNOTSUPP;
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 static int __init bpf_global_ma_init(void)
 {
diff --git a/kernel/bpf/skey.c b/kernel/bpf/skey.c
new file mode 100644
index 000000000000..8f1915ba6d44
--- /dev/null
+++ b/kernel/bpf/skey.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Isovalent
+ */
+
+#include <linux/bpf.h>
+
+bool bpf_jit_supports_static_keys(void)
+{
+	int err;
+
+	/* Should return -EINVAL if supported */
+	err = bpf_arch_poke_static_branch(NULL, NULL, false);
+	return err != -EOPNOTSUPP;
+}
+
+struct bpf_static_branch *bpf_static_branch_by_offset(struct bpf_prog *bpf_prog, u32 offset)
+{
+	u32 i, n = bpf_prog->aux->static_branches_len;
+	struct bpf_static_branch *branch;
+
+	for (i = 0; i < n; i++) {
+		branch = &bpf_prog->aux->static_branches[i];
+		if (branch->bpf_offset == offset)
+			return branch;
+	}
+	return NULL;
+}
+
+static int bpf_prog_update_static_branches(struct bpf_prog *prog,
+					   const struct bpf_map *map, bool on)
+{
+	struct bpf_static_branch *branch;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < prog->aux->static_branches_len; i++) {
+		branch = &prog->aux->static_branches[i];
+		if (branch->map != map)
+			continue;
+
+		err = bpf_arch_poke_static_branch(prog, branch, on);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static int static_key_add_prog(struct bpf_map *map, struct bpf_prog *prog)
+{
+	struct bpf_prog_aux_list_elem *elem;
+	u32 key = 0;
+	int err = 0;
+	u32 *val;
+
+	mutex_lock(&map->static_key_mutex);
+
+	val = map->ops->map_lookup_elem(map, &key);
+	if (!val) {
+		err = -ENOENT;
+		goto unlock_ret;
+	}
+
+	list_for_each_entry(elem, &map->static_key_list_head, list)
+		if (elem->aux == prog->aux)
+			goto unlock_ret;
+
+	elem = kmalloc(sizeof(*elem), GFP_KERNEL);
+	if (!elem) {
+		err = -ENOMEM;
+		goto unlock_ret;
+	}
+
+	INIT_LIST_HEAD(&elem->list);
+	elem->aux = prog->aux;
+
+	list_add_tail(&elem->list, &map->static_key_list_head);
+
+	err = bpf_prog_update_static_branches(prog, map, *val);
+
+unlock_ret:
+	mutex_unlock(&map->static_key_mutex);
+	return err;
+}
+
+void bpf_static_key_remove_prog(struct bpf_map *map, struct bpf_prog_aux *aux)
+{
+	struct bpf_prog_aux_list_elem *elem, *tmp;
+
+	mutex_lock(&map->static_key_mutex);
+	list_for_each_entry_safe(elem, tmp, &map->static_key_list_head, list) {
+		if (elem->aux == aux) {
+			list_del_init(&elem->list);
+			kfree(elem);
+			break;
+		}
+	}
+	mutex_unlock(&map->static_key_mutex);
+}
+
+int bpf_static_key_update(struct bpf_map *map, void *key, void *value, u64 flags)
+{
+	struct bpf_prog_aux_list_elem *elem;
+	bool on = *(u32 *)value;
+	int err;
+
+	mutex_lock(&map->static_key_mutex);
+
+	err = map->ops->map_update_elem(map, key, value, flags);
+	if (err)
+		goto unlock_ret;
+
+	list_for_each_entry(elem, &map->static_key_list_head, list) {
+		err = bpf_prog_update_static_branches(elem->aux->prog, map, on);
+		if (err)
+			break;
+	}
+
+unlock_ret:
+	mutex_unlock(&map->static_key_mutex);
+	return err;
+}
+
+static bool init_static_jump_instruction(struct bpf_prog *prog,
+					 struct bpf_static_branch *branch,
+					 struct bpf_static_branch_info *branch_info)
+{
+	bool inverse = !!(branch_info->flags & BPF_F_INVERSE_BRANCH);
+	u32 insn_offset = branch_info->insn_offset;
+	u32 jump_target = branch_info->jump_target;
+	struct bpf_insn *jump_insn;
+	s32 jump_offset;
+
+	if (insn_offset % 8 || jump_target % 8)
+		return false;
+
+	if (insn_offset / 8 >= prog->len || jump_target / 8 >= prog->len)
+		return false;
+
+	jump_insn = &prog->insnsi[insn_offset / 8];
+	if (jump_insn->code != (BPF_JMP | BPF_JA) &&
+	    jump_insn->code != (BPF_JMP32 | BPF_JA))
+		return false;
+
+	if (jump_insn->dst_reg || jump_insn->src_reg)
+		return false;
+
+	if (jump_insn->off && jump_insn->imm)
+		return false;
+
+	jump_offset = ((long)jump_target - (long)insn_offset) / 8 - 1;
+
+	if (inverse) {
+		if (jump_insn->code == (BPF_JMP | BPF_JA)) {
+			if (jump_insn->off != jump_offset)
+				return false;
+		} else {
+			if (jump_insn->imm != jump_offset)
+				return false;
+		}
+	} else {
+		/* The instruction here should be JA +0. We will replace it by a
+		 * non-zero jump to make the program simpler to verify (the
+		 * verifier might optimize out such instructions and we don't
+		 * want to care about this). After verification the instruction
+		 * will be set back to its proper value.
+		 */
+		if (jump_insn->off || jump_insn->imm)
+			return false;
+
+		if (jump_insn->code == (BPF_JMP | BPF_JA))
+			jump_insn->off = jump_offset;
+		else
+			jump_insn->imm = jump_offset;
+	}
+
+	memcpy(branch->bpf_jmp, jump_insn, 8);
+	branch->bpf_offset = insn_offset;
+	return true;
+}
+
+static int
+__bpf_prog_init_static_branches(struct bpf_prog *prog,
+				struct bpf_static_branch_info *static_branches_info,
+				int n)
+{
+	size_t size = n * sizeof(*prog->aux->static_branches);
+	struct bpf_static_branch *static_branches;
+	struct bpf_map *map;
+	int i, err = 0;
+
+	static_branches = kzalloc(size, GFP_USER | __GFP_NOWARN);
+	if (!static_branches)
+		return -ENOMEM;
+
+	for (i = 0; i < n; i++) {
+		if (static_branches_info[i].flags & ~(BPF_F_INVERSE_BRANCH)) {
+			err = -EINVAL;
+			goto free_static_branches;
+		}
+		static_branches[i].flags = static_branches_info[i].flags;
+
+		if (!init_static_jump_instruction(prog, &static_branches[i],
+						  &static_branches_info[i])) {
+			err = -EINVAL;
+			goto free_static_branches;
+		}
+
+		map = bpf_map_get(static_branches_info[i].map_fd);
+		if (IS_ERR(map)) {
+			err = PTR_ERR(map);
+			goto free_static_branches;
+		}
+
+		if (!(map->map_flags & BPF_F_STATIC_KEY)) {
+			bpf_map_put(map);
+			err = -EINVAL;
+			goto free_static_branches;
+		}
+
+		err = __bpf_prog_bind_map(prog, map, true);
+		if (err) {
+			bpf_map_put(map);
+			if (err != -EEXIST)
+				goto free_static_branches;
+		}
+
+		static_branches[i].map = map;
+	}
+
+	prog->aux->static_branches = static_branches;
+	prog->aux->static_branches_len = n;
+
+	return 0;
+
+free_static_branches:
+	kfree(static_branches);
+	return err;
+}
+
+int bpf_prog_init_static_branches(struct bpf_prog *prog, union bpf_attr *attr)
+{
+	void __user *user_static_branches = u64_to_user_ptr(attr->static_branches_info);
+	size_t item_size = sizeof(struct bpf_static_branch_info);
+	struct bpf_static_branch_info *static_branches_info;
+	size_t size = attr->static_branches_info_size;
+	int err = 0;
+
+	if (!attr->static_branches_info)
+		return size ? -EINVAL : 0;
+	if (!size)
+		return -EINVAL;
+	if (size % item_size)
+		return -EINVAL;
+
+	if (!bpf_jit_supports_static_keys())
+		return -EOPNOTSUPP;
+
+	static_branches_info = kzalloc(size, GFP_USER | __GFP_NOWARN);
+	if (!static_branches_info)
+		return -ENOMEM;
+
+	if (copy_from_user(static_branches_info, user_static_branches, size)) {
+		err = -EFAULT;
+		goto free_branches;
+	}
+
+	err = __bpf_prog_init_static_branches(prog, static_branches_info,
+					      size / item_size);
+	if (err)
+		goto free_branches;
+
+	err = 0;
+
+free_branches:
+	kfree(static_branches_info);
+	return err;
+}
+
+int bpf_prog_register_static_branches(struct bpf_prog *prog)
+{
+	int n_branches = prog->aux->static_branches_len;
+	struct bpf_static_branch *branch;
+	int err = 0;
+	u32 i;
+
+	for (i = 0; i < n_branches; i++) {
+		branch = &prog->aux->static_branches[i];
+
+		/* JIT compiler did not detect this branch
+		 * and thus won't be able to poke it when asked to
+		 */
+		if (!branch->arch_len)
+			return -EINVAL;
+	}
+
+	for (i = 0; i < n_branches; i++) {
+		branch = &prog->aux->static_branches[i];
+		err = static_key_add_prog(branch->map, prog);
+		if (err)
+			break;
+	}
+
+	return err;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 81625ef98a7d..a85ade499e45 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -197,6 +197,10 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 		   map->map_type == BPF_MAP_TYPE_STACK ||
 		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_push_elem(map, value, flags);
+	} else if (map->map_flags & BPF_F_STATIC_KEY) {
+		rcu_read_lock();
+		err = bpf_static_key_update(map, key, value, flags);
+		rcu_read_unlock();
 	} else {
 		rcu_read_lock();
 		err = map->ops->map_update_elem(map, key, value, flags);
@@ -1096,6 +1100,16 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
+static bool is_static_key(u32 map_type, u32 key_size, u32 value_size,
+			    u32 max_entries, u32 map_flags)
+{
+	return map_type == BPF_MAP_TYPE_ARRAY &&
+	       key_size == 4 &&
+	       value_size == 4 &&
+	       max_entries == 1 &&
+	       map_flags & BPF_F_STATIC_KEY;
+}
+
 #define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
@@ -1104,6 +1118,7 @@ static int map_create(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct bpf_map *map;
+	bool static_key;
 	int f_flags;
 	int err;
 
@@ -1123,6 +1138,13 @@ static int map_create(union bpf_attr *attr)
 	    attr->map_extra != 0)
 		return -EINVAL;
 
+	static_key = is_static_key(attr->map_type, attr->key_size, attr->value_size,
+				   attr->max_entries, attr->map_flags);
+	if (static_key && !bpf_jit_supports_static_keys())
+		return -EOPNOTSUPP;
+	if (!static_key && (attr->map_flags & BPF_F_STATIC_KEY))
+		return -EINVAL;
+
 	f_flags = bpf_get_file_flag(attr->map_flags);
 	if (f_flags < 0)
 		return f_flags;
@@ -1221,7 +1243,9 @@ static int map_create(union bpf_attr *attr)
 	atomic64_set(&map->refcnt, 1);
 	atomic64_set(&map->usercnt, 1);
 	mutex_init(&map->freeze_mutex);
+	mutex_init(&map->static_key_mutex);
 	spin_lock_init(&map->owner.lock);
+	INIT_LIST_HEAD(&map->static_key_list_head);
 
 	if (attr->btf_key_type_id || attr->btf_value_type_id ||
 	    /* Even the map's value is a kernel's struct,
@@ -2366,7 +2390,7 @@ struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
 }
 EXPORT_SYMBOL_GPL(bpf_prog_get_type_dev);
 
-static int __bpf_prog_bind_map(struct bpf_prog *prog, struct bpf_map *map)
+int __bpf_prog_bind_map(struct bpf_prog *prog, struct bpf_map *map, bool check_boundaries)
 {
 	struct bpf_map **used_maps_new;
 	int i;
@@ -2375,6 +2399,13 @@ static int __bpf_prog_bind_map(struct bpf_prog *prog, struct bpf_map *map)
 		if (prog->aux->used_maps[i] == map)
 			return -EEXIST;
 
+	/*
+	 * This is ok to add more maps after the program is loaded, but not
+	 * before bpf_check, as verifier env only has MAX_USED_MAPS slots
+	 */
+	if (check_boundaries && prog->aux->used_map_cnt >= MAX_USED_MAPS)
+		return -E2BIG;
+
 	used_maps_new = krealloc_array(prog->aux->used_maps,
 				       prog->aux->used_map_cnt + 1,
 				       sizeof(used_maps_new[0]),
@@ -2388,6 +2419,7 @@ static int __bpf_prog_bind_map(struct bpf_prog *prog, struct bpf_map *map)
 	return 0;
 }
 
+
 /* Initially all BPF programs could be loaded w/o specifying
  * expected_attach_type. Later for some of them specifying expected_attach_type
  * at load time became required so that program could be validated properly.
@@ -2576,7 +2608,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD log_true_size
+#define	BPF_PROG_LOAD_LAST_FIELD static_branches_info_size
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 {
@@ -2734,6 +2766,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (err < 0)
 		goto free_prog_sec;
 
+	err = bpf_prog_init_static_branches(prog, attr);
+	if (err < 0)
+		goto free_prog_sec;
+
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr, uattr, uattr_size);
 	if (err < 0)
@@ -2743,6 +2779,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (err < 0)
 		goto free_used_maps;
 
+	err = bpf_prog_register_static_branches(prog);
+	if (err < 0)
+		goto free_used_maps;
+
 	err = bpf_prog_alloc_id(prog);
 	if (err)
 		goto free_used_maps;
@@ -5326,7 +5366,7 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	}
 
 	mutex_lock(&prog->aux->used_maps_mutex);
-	ret = __bpf_prog_bind_map(prog, map);
+	ret = __bpf_prog_bind_map(prog, map, false);
 	mutex_unlock(&prog->aux->used_maps_mutex);
 
 	if (ret)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5d38ee2e74a1..6b591f4a01c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15534,6 +15534,7 @@ static int visit_func_call_insn(int t, struct bpf_insn *insns,
 static int visit_insn(int t, struct bpf_verifier_env *env)
 {
 	struct bpf_insn *insns = env->prog->insnsi, *insn = &insns[t];
+	struct bpf_static_branch *branch;
 	int ret, off;
 
 	if (bpf_pseudo_func(insn))
@@ -15587,15 +15588,26 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
 		else
 			off = insn->imm;
 
-		/* unconditional jump with single edge */
-		ret = push_insn(t, t + off + 1, FALLTHROUGH, env,
-				true);
-		if (ret)
-			return ret;
+		branch = bpf_static_branch_by_offset(env->prog, t * 8);
+		if (unlikely(branch)) {
+			/* static branch with two edges */
+			mark_prune_point(env, t);
 
-		mark_prune_point(env, t + off + 1);
-		mark_jmp_point(env, t + off + 1);
+			ret = push_insn(t, t + 1, FALLTHROUGH, env, true);
+			if (ret)
+				return ret;
+
+			ret = push_insn(t, t + off + 1, BRANCH, env, true);
+		} else {
+			/* unconditional jump with single edge */
+			ret = push_insn(t, t + off + 1, FALLTHROUGH, env,
+					true);
+			if (ret)
+				return ret;
 
+			mark_prune_point(env, t + off + 1);
+			mark_jmp_point(env, t + off + 1);
+		}
 		return ret;
 
 	default:
@@ -17547,6 +17559,10 @@ static int do_check(struct bpf_verifier_env *env)
 
 				mark_reg_scratched(env, BPF_REG_0);
 			} else if (opcode == BPF_JA) {
+				struct bpf_verifier_state *other_branch;
+				struct bpf_static_branch *branch;
+				u32 jmp_offset;
+
 				if (BPF_SRC(insn->code) != BPF_K ||
 				    insn->src_reg != BPF_REG_0 ||
 				    insn->dst_reg != BPF_REG_0 ||
@@ -17557,9 +17573,20 @@ static int do_check(struct bpf_verifier_env *env)
 				}
 
 				if (class == BPF_JMP)
-					env->insn_idx += insn->off + 1;
+					jmp_offset = insn->off + 1;
 				else
-					env->insn_idx += insn->imm + 1;
+					jmp_offset = insn->imm + 1;
+
+				branch = bpf_static_branch_by_offset(env->prog, env->insn_idx * 8);
+				if (unlikely(branch)) {
+					other_branch = push_stack(env, env->insn_idx + jmp_offset,
+								  env->insn_idx, false);
+					if (!other_branch)
+						return -EFAULT;
+
+					jmp_offset = 1;
+				}
+				env->insn_idx += jmp_offset;
 				continue;
 
 			} else if (opcode == BPF_EXIT) {
@@ -17854,6 +17881,11 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
+	if (map->map_flags & BPF_F_STATIC_KEY) {
+		verbose(env, "progs cannot access static keys yet\n");
+		return -EINVAL;
+	}
+
 	if (btf_record_has_field(map->record, BPF_LIST_HEAD) ||
 	    btf_record_has_field(map->record, BPF_RB_ROOT)) {
 		if (is_tracing_prog_type(prog_type)) {
@@ -18223,6 +18255,25 @@ static void adjust_poke_descs(struct bpf_prog *prog, u32 off, u32 len)
 	}
 }
 
+static void adjust_static_branches(struct bpf_prog *prog, u32 off, u32 len)
+{
+	struct bpf_static_branch *branch;
+	const u32 delta = (len - 1) * 8; /* # of new prog bytes */
+	int i;
+
+	if (len <= 1)
+		return;
+
+	for (i = 0; i < prog->aux->static_branches_len; i++) {
+		branch = &prog->aux->static_branches[i];
+		if (branch->bpf_offset <= off * 8)
+			continue;
+
+		branch->bpf_offset += delta;
+		memcpy(branch->bpf_jmp, &prog->insnsi[branch->bpf_offset/8], 8);
+	}
+}
+
 static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 off,
 					    const struct bpf_insn *patch, u32 len)
 {
@@ -18249,6 +18300,7 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of
 	adjust_func_info(env, off, len);
 	adjust_subprog_starts(env, off, len);
 	adjust_poke_descs(new_prog, off, len);
+	adjust_static_branches(new_prog, off, len);
 	return new_prog;
 }
 
@@ -18914,6 +18966,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->nr_linfo = prog->aux->nr_linfo;
 		func[i]->aux->jited_linfo = prog->aux->jited_linfo;
 		func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
+		func[i]->aux->static_branches = prog->aux->static_branches;
+		func[i]->aux->static_branches_len = prog->aux->static_branches_len;
+
 		num_exentries = 0;
 		insn = func[i]->insnsi;
 		for (j = 0; j < func[i]->len; j++, insn++) {
@@ -20704,6 +20759,21 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
 	is_priv = bpf_capable();
 
+	/* the program could already have referenced some maps */
+	if (env->prog->aux->used_map_cnt) {
+		if (WARN_ON(env->prog->aux->used_map_cnt > MAX_USED_MAPS ||
+			    !env->prog->aux->used_maps))
+			return -EFAULT;
+
+		memcpy(env->used_maps, env->prog->aux->used_maps,
+		       sizeof(env->used_maps[0]) * env->prog->aux->used_map_cnt);
+		env->used_map_cnt = env->prog->aux->used_map_cnt;
+
+		kfree(env->prog->aux->used_maps);
+		env->prog->aux->used_map_cnt = 0;
+		env->prog->aux->used_maps = NULL;
+	}
+
 	bpf_get_btf_vmlinux();
 
 	/* grab the mutex to protect few globals used by verifier */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0f6cdf52b1da..2d3cf9175cf9 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1325,6 +1325,9 @@ enum {
 
 /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
 	BPF_F_PATH_FD		= (1U << 14),
+
+/* Treat this map as a BPF Static Key */
+	BPF_F_STATIC_KEY	= (1U << 15),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -1369,6 +1372,18 @@ struct bpf_stack_build_id {
 
 #define BPF_OBJ_NAME_LEN 16U
 
+/* flags for bpf_static_branch_info */
+enum {
+	BPF_F_INVERSE_BRANCH = 1,
+};
+
+struct bpf_static_branch_info {
+	__u32 map_fd;			/* map in control */
+	__u32 insn_offset;		/* absolute offset of the branch instruction */
+	__u32 jump_target;		/* absolute offset of the jump target */
+	__u32 flags;
+};
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -1467,6 +1482,9 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		/* An array of struct bpf_static_branch_info */
+		__aligned_u64	static_branches_info;
+		__u32		static_branches_info_size;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
-- 
2.34.1



* [PATCH bpf-next 5/7] bpf: x86: implement static keys support
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
                   ` (3 preceding siblings ...)
  2023-12-06 14:10 ` [PATCH bpf-next 4/7] bpf: implement BPF Static Keys support Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 6/7] libbpf: BPF Static Keys support Anton Protopopov
  2023-12-06 14:10 ` [PATCH bpf-next 7/7] selftests/bpf: Add tests for BPF Static Keys Anton Protopopov
  6 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

Implement x86 JIT support for BPF Static Keys: when the JIT compiler
encounters a JA instruction, it checks whether there is a corresponding
static branch. If there is, it saves the corresponding x86 address, together
with the NOP and JMP encodings of the branch site (which is either 2 or 5
bytes long), in the static branch structure.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 arch/x86/net/bpf_jit_comp.c | 72 +++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8c10d9abc239..4e8ed43bd03d 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -452,6 +452,32 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	return __bpf_arch_text_poke(ip, t, old_addr, new_addr);
 }
 
+int bpf_arch_poke_static_branch(struct bpf_prog *prog,
+				struct bpf_static_branch *branch, bool on)
+{
+	static const u64 bpf_nop = BPF_JMP | BPF_JA;
+	const void *arch_op;
+	const void *bpf_op;
+	bool inverse;
+
+	if (!prog || !branch)
+		return -EINVAL;
+
+	inverse = !!(branch->flags & BPF_F_INVERSE_BRANCH);
+	if (on ^ inverse) {
+		bpf_op = branch->bpf_jmp;
+		arch_op = branch->arch_jmp;
+	} else {
+		bpf_op = &bpf_nop;
+		arch_op = branch->arch_nop;
+	}
+
+	text_poke_bp(branch->arch_addr, arch_op, branch->arch_len, NULL);
+	memcpy(&prog->insnsi[branch->bpf_offset / 8], bpf_op, 8);
+
+	return 0;
+}
+
 #define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
 
 static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
@@ -1008,6 +1034,32 @@ static void emit_nops(u8 **pprog, int len)
 	*pprog = prog;
 }
 
+static __always_inline void copy_nops(u8 *dst, int len)
+{
+	BUILD_BUG_ON(len != 2 && len != 5);
+	memcpy(dst, x86_nops[len], len);
+}
+
+static __always_inline void
+arch_init_static_branch(struct bpf_static_branch *branch,
+			int len, u32 jmp_offset, void *addr)
+{
+	BUILD_BUG_ON(len != 2 && len != 5);
+
+	if (len == 2) {
+		branch->arch_jmp[0] = 0xEB;
+		branch->arch_jmp[1] = jmp_offset;
+	} else {
+		branch->arch_jmp[0] = 0xE9;
+		memcpy(&branch->arch_jmp[1], &jmp_offset, 4);
+	}
+
+	copy_nops(branch->arch_nop, len);
+
+	branch->arch_len = len;
+	branch->arch_addr = addr;
+}
+
 /* emit the 3-byte VEX prefix
  *
  * r: same as rex.r, extra bit for ModRM reg field
@@ -1078,6 +1130,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
 {
 	bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
 	struct bpf_insn *insn = bpf_prog->insnsi;
+	struct bpf_static_branch *branch = NULL;
 	bool callee_regs_used[4] = {};
 	int insn_cnt = bpf_prog->len;
 	bool tail_call_seen = false;
@@ -1928,6 +1981,16 @@ st:			if (is_imm8(insn->off))
 				break;
 			}
 emit_jmp:
+			if (bpf_prog->aux->static_branches_len > 0 && bpf_prog->aux->func_info) {
+				int off, idx;
+
+				idx = bpf_prog->aux->func_idx;
+				off = bpf_prog->aux->func_info[idx].insn_off + i - 1;
+				branch = bpf_static_branch_by_offset(bpf_prog, off * 8);
+			} else {
+				branch = bpf_static_branch_by_offset(bpf_prog, (i - 1) * 8);
+			}
+
 			if (is_imm8(jmp_offset)) {
 				if (jmp_padding) {
 					/* To avoid breaking jmp_offset, the extra bytes
@@ -1950,8 +2013,17 @@ st:			if (is_imm8(insn->off))
 					}
 					emit_nops(&prog, INSN_SZ_DIFF - 2);
 				}
+
+				if (branch)
+					arch_init_static_branch(branch, 2, jmp_offset,
+								image + addrs[i-1]);
+
 				EMIT2(0xEB, jmp_offset);
 			} else if (is_simm32(jmp_offset)) {
+				if (branch)
+					arch_init_static_branch(branch, 5, jmp_offset,
+								image + addrs[i-1]);
+
 				EMIT1_off32(0xE9, jmp_offset);
 			} else {
 				pr_err("jmp gen bug %llx\n", jmp_offset);
-- 
2.34.1



* [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
                   ` (4 preceding siblings ...)
  2023-12-06 14:10 ` [PATCH bpf-next 5/7] bpf: x86: implement static keys support Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  2023-12-08  3:45   ` Alexei Starovoitov
  2023-12-06 14:10 ` [PATCH bpf-next 7/7] selftests/bpf: Add tests for BPF Static Keys Anton Protopopov
  6 siblings, 1 reply; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

Introduce the DEFINE_STATIC_KEY() and bpf_static_branch_{unlikely,likely}()
macros to mimic the Linux kernel Static Keys API in BPF. Example usage:

    DEFINE_STATIC_KEY(key);

    void prog(void)
    {
            if (bpf_static_branch_unlikely(&key))
                    /* rarely used code */
            else
                    /* default hot path code */
    }

or, using the likely variant:

    void prog2(void)
    {
            if (bpf_static_branch_likely(&key))
                    /* default hot path code */
            else
                    /* rarely used code */
    }

The "unlikely" version of macro compiles in the code where the else-branch
(key is off) is fall-through, the "likely" macro prioritises the if-branch.

Both macros push an entry into a new ".jump_table" section; each entry
contains the following information:

               32 bits                   32 bits           64 bits
    offset of jump instruction | offset of jump target |    flags
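
In C terms each such 16-byte entry corresponds to the jump_table_entry
structure that libbpf parses below (the 64-bit word is the target of the
static key relocation, so its in-file value can be reused to carry the
flags):

    struct jump_table_entry {
            __u32 insn_offset;
            __u32 jump_target;
            union {
                    __u64 map_ptr;  /* always zero in the file, relocated */
                    __u64 flags;    /* reused to store the flags */
            };
    };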

The corresponding ".rel.jump_table" relocations table entry contains the
base section name and the static key (map) name. The bigger portion of this
patch works on parsing, relocating and sending this information to kernel
via the static_branches_info and static_branches_info_size attributes of
the BPF_PROG_LOAD syscall.

The same key may be used multiple times in one program and may be shared by
multiple BPF programs. BPF doesn't provide any guarantee on the order in
which static branches controlled by one key are patched.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 tools/lib/bpf/bpf.c             |   5 +-
 tools/lib/bpf/bpf.h             |   4 +-
 tools/lib/bpf/bpf_helpers.h     |  64 ++++++++
 tools/lib/bpf/libbpf.c          | 273 +++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf_internal.h |   3 +
 tools/lib/bpf/linker.c          |   8 +-
 6 files changed, 351 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9dc9625651dc..f67d6a4dac05 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -232,7 +232,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, size_t insn_cnt,
 		  struct bpf_prog_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, static_branches_info_size);
 	void *finfo = NULL, *linfo = NULL;
 	const char *func_info, *line_info;
 	__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@@ -262,6 +262,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
 	attr.kern_version = OPTS_GET(opts, kern_version, 0);
 
+	attr.static_branches_info = ptr_to_u64(OPTS_GET(opts, static_branches_info, NULL));
+	attr.static_branches_info_size = OPTS_GET(opts, static_branches_info_size, 0);
+
 	if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
 	attr.license = ptr_to_u64(license);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index d0f53772bdc0..ec6d4b955fb8 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -102,9 +102,11 @@ struct bpf_prog_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	struct bpf_static_branch_info *static_branches_info;
+	__u32 static_branches_info_size;
 	size_t :0;
 };
-#define bpf_prog_load_opts__last_field log_true_size
+#define bpf_prog_load_opts__last_field static_branches_info_size
 
 LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
 			     const char *prog_name, const char *license,
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 77ceea575dc7..e3bfa0697304 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -400,4 +400,68 @@ extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
 )
 #endif /* bpf_repeat */
 
+#define DEFINE_STATIC_KEY(NAME)									\
+	struct {										\
+		__uint(type, BPF_MAP_TYPE_ARRAY);						\
+		__type(key, __u32);								\
+		__type(value, __u32);								\
+		__uint(map_flags, BPF_F_STATIC_KEY);						\
+		__uint(max_entries, 1);								\
+	} NAME SEC(".maps")
+
+#ifndef likely
+#define likely(x)	(__builtin_expect(!!(x), 1))
+#endif
+
+#ifndef unlikely
+#define unlikely(x)	(__builtin_expect(!!(x), 0))
+#endif
+
+static __always_inline int __bpf_static_branch_nop(void *static_key)
+{
+	asm goto("1:\n\t"
+		"goto +0\n\t"
+		".pushsection .jump_table, \"aw\"\n\t"
+		".balign 8\n\t"
+		".long 1b - .\n\t"
+		".long %l[l_yes] - .\n\t"
+		".quad %c0 - .\n\t"
+		".popsection\n\t"
+		:: "i" (static_key)
+		:: l_yes);
+	return 0;
+l_yes:
+	return 1;
+}
+
+static __always_inline int __bpf_static_branch_jump(void *static_key)
+{
+	asm goto("1:\n\t"
+		"goto %l[l_yes]\n\t"
+		".pushsection .jump_table, \"aw\"\n\t"
+		".balign 8\n\t"
+		".long 1b - .\n\t"
+		".long %l[l_yes] - .\n\t"
+		".quad %c0 - . + 1\n\t"
+		".popsection\n\t"
+		:: "i" (static_key)
+		:: l_yes);
+	return 0;
+l_yes:
+	return 1;
+}
+
+/*
+ * The bpf_static_branch_{unlikely,likely} macros provide a way to utilize BPF
+ * Static Keys in BPF programs in exactly the same manner this is done in the
+ * Linux Kernel.  The "unlikely" macro compiles in the code where the else-branch
+ * (key is off) is prioritized, the "likely" macro prioritises the if-branch.
+ */
+
+#define bpf_static_branch_unlikely(static_key) \
+	unlikely(__bpf_static_branch_nop(static_key))
+
+#define bpf_static_branch_likely(static_key) \
+	likely(!__bpf_static_branch_jump(static_key))
+
 #endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e067be95da3c..92620717abda 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -391,6 +391,13 @@ struct bpf_sec_def {
 	libbpf_prog_attach_fn_t prog_attach_fn;
 };
 
+struct static_branch_info {
+	struct bpf_map *map;
+	__u32 insn_offset;
+	__u32 jump_target;
+	__u32 flags;
+};
+
 /*
  * bpf_prog should be a better name but it has been used in
  * linux/filter.h.
@@ -463,6 +470,9 @@ struct bpf_program {
 	__u32 line_info_rec_size;
 	__u32 line_info_cnt;
 	__u32 prog_flags;
+
+	struct static_branch_info *static_branches_info;
+	__u32 static_branches_info_size;
 };
 
 struct bpf_struct_ops {
@@ -493,6 +503,7 @@ struct bpf_struct_ops {
 #define KSYMS_SEC ".ksyms"
 #define STRUCT_OPS_SEC ".struct_ops"
 #define STRUCT_OPS_LINK_SEC ".struct_ops.link"
+#define STATIC_JUMPS_SEC ".jump_table"
 
 enum libbpf_map_type {
 	LIBBPF_MAP_UNSPEC,
@@ -624,6 +635,7 @@ struct elf_state {
 	Elf_Data *symbols;
 	Elf_Data *st_ops_data;
 	Elf_Data *st_ops_link_data;
+	Elf_Data *static_branches_data;
 	size_t shstrndx; /* section index for section name strings */
 	size_t strtabidx;
 	struct elf_sec_desc *secs;
@@ -634,6 +646,7 @@ struct elf_state {
 	int symbols_shndx;
 	int st_ops_shndx;
 	int st_ops_link_shndx;
+	int static_branches_shndx;
 };
 
 struct usdt_manager;
@@ -715,6 +728,7 @@ void bpf_program__unload(struct bpf_program *prog)
 
 	zfree(&prog->func_info);
 	zfree(&prog->line_info);
+	zfree(&prog->static_branches_info);
 }
 
 static void bpf_program__exit(struct bpf_program *prog)
@@ -3605,6 +3619,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			} else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) {
 				obj->efile.st_ops_link_data = data;
 				obj->efile.st_ops_link_shndx = idx;
+			} else if (strcmp(name, STATIC_JUMPS_SEC) == 0) {
+				obj->efile.static_branches_data = data;
+				obj->efile.static_branches_shndx = idx;
 			} else {
 				pr_info("elf: skipping unrecognized data section(%d) %s\n",
 					idx, name);
@@ -3620,7 +3637,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			if (!section_have_execinstr(obj, targ_sec_idx) &&
 			    strcmp(name, ".rel" STRUCT_OPS_SEC) &&
 			    strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) &&
-			    strcmp(name, ".rel" MAPS_ELF_SEC)) {
+			    strcmp(name, ".rel" MAPS_ELF_SEC) &&
+			    strcmp(name, ".rel" STATIC_JUMPS_SEC)) {
 				pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n",
 					idx, name, targ_sec_idx,
 					elf_sec_name(obj, elf_sec_by_idx(obj, targ_sec_idx)) ?: "<?>");
@@ -4422,6 +4440,189 @@ bpf_object__collect_prog_relos(struct bpf_object *obj, Elf64_Shdr *shdr, Elf_Dat
 	return 0;
 }
 
+struct jump_table_entry {
+	__u32 insn_offset;
+	__u32 jump_target;
+	union {
+		__u64 map_ptr;	/* map_ptr is always zero, as it is relocated */
+		__u64 flags;	/* so we can reuse it to store flags */
+	};
+};
+
+static struct bpf_program *shndx_to_prog(struct bpf_object *obj,
+					 size_t sec_idx,
+					 struct jump_table_entry *entry)
+{
+	__u32 insn_offset = entry->insn_offset / 8;
+	__u32 jump_target = entry->jump_target / 8;
+	struct bpf_program *prog;
+	size_t i;
+
+	for (i = 0; i < obj->nr_programs; i++) {
+		prog = &obj->programs[i];
+		if (prog->sec_idx != sec_idx)
+			continue;
+
+		if (insn_offset < prog->sec_insn_off ||
+		    insn_offset >= prog->sec_insn_off + prog->sec_insn_cnt)
+			continue;
+
+		if (jump_target < prog->sec_insn_off ||
+		    jump_target >= prog->sec_insn_off + prog->sec_insn_cnt) {
+			pr_warn("static branch: offset %u is in boundaries, target %u is not\n",
+				insn_offset, jump_target);
+			return NULL;
+		}
+
+		return prog;
+	}
+
+	return NULL;
+}
+
+static struct bpf_program *find_prog_for_jump_entry(struct bpf_object *obj,
+						    int nrels,
+						    Elf_Data *relo_data,
+						    __u32 entry_offset,
+						    struct jump_table_entry *entry)
+{
+	struct bpf_program *prog;
+	Elf64_Rel *rel;
+	Elf64_Sym *sym;
+	int i;
+
+	for (i = 0; i < nrels; i++) {
+		rel = elf_rel_by_idx(relo_data, i);
+		if (!rel) {
+			pr_warn("static branch: relo #%d: failed to get ELF relo\n", i);
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+
+		if ((__u32)rel->r_offset != entry_offset)
+			continue;
+
+		sym = elf_sym_by_idx(obj, ELF64_R_SYM(rel->r_info));
+		if (!sym) {
+			pr_warn("static branch: .maps relo #%d: symbol %zx not found\n",
+				i, (size_t)ELF64_R_SYM(rel->r_info));
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+
+		prog = shndx_to_prog(obj, sym->st_shndx, entry);
+		if (!prog) {
+			pr_warn("static branch: .maps relo #%d: program %zx not found\n",
+				i, (size_t)sym->st_shndx);
+			return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+		}
+		return prog;
+	}
+	return ERR_PTR(-LIBBPF_ERRNO__FORMAT);
+}
+
+static struct bpf_map *find_map_for_jump_entry(struct bpf_object *obj,
+					       int nrels,
+					       Elf_Data *relo_data,
+					       __u32 entry_offset)
+{
+	struct bpf_map *map;
+	const char *name;
+	Elf64_Rel *rel;
+	Elf64_Sym *sym;
+	int i;
+
+	for (i = 0; i < nrels; i++) {
+		rel = elf_rel_by_idx(relo_data, i);
+		if (!rel) {
+			pr_warn("static branch: relo #%d: failed to get ELF relo\n", i);
+			return NULL;
+		}
+
+		if ((__u32)rel->r_offset != entry_offset)
+			continue;
+
+		sym = elf_sym_by_idx(obj, ELF64_R_SYM(rel->r_info));
+		if (!sym) {
+			pr_warn(".maps relo #%d: symbol %zx not found\n",
+				i, (size_t)ELF64_R_SYM(rel->r_info));
+			return NULL;
+		}
+
+		name = elf_sym_str(obj, sym->st_name) ?: "<?>";
+		if (!name || !strcmp(name, "")) {
+			pr_warn(".maps relo #%d: symbol name is zero or empty\n", i);
+			return NULL;
+		}
+
+		map = bpf_object__find_map_by_name(obj, name);
+		if (!map)
+			return NULL;
+		return map;
+	}
+	return NULL;
+}
+
+static int add_static_branch(struct bpf_program *prog,
+			     struct jump_table_entry *entry,
+			     struct bpf_map *map)
+{
+	__u32 size_old = prog->static_branches_info_size;
+	__u32 size_new = size_old + sizeof(struct static_branch_info);
+	struct static_branch_info *info;
+	void *x;
+
+	x = realloc(prog->static_branches_info, size_new);
+	if (!x)
+		return -ENOMEM;
+
+	info = x + size_old;
+	info->insn_offset = entry->insn_offset - prog->sec_insn_off * 8;
+	info->jump_target = entry->jump_target - prog->sec_insn_off * 8;
+	info->flags = (__u32) entry->flags;
+	info->map = map;
+
+	prog->static_branches_info = x;
+	prog->static_branches_info_size = size_new;
+
+	return 0;
+}
+
+static int
+bpf_object__collect_static_branches_relos(struct bpf_object *obj,
+					  Elf64_Shdr *shdr,
+					  Elf_Data *relo_data)
+{
+	Elf_Data *branches_data = obj->efile.static_branches_data;
+	int nrels = shdr->sh_size / shdr->sh_entsize;
+	struct jump_table_entry *entries;
+	size_t i;
+	int err;
+
+	if (!branches_data)
+		return 0;
+
+	entries = (void *)branches_data->d_buf;
+	for (i = 0; i < branches_data->d_size / sizeof(struct jump_table_entry); i++) {
+		__u32 entry_offset = i * sizeof(struct jump_table_entry);
+		struct bpf_program *prog;
+		struct bpf_map *map;
+
+		prog = find_prog_for_jump_entry(obj, nrels, relo_data, entry_offset, &entries[i]);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		map = find_map_for_jump_entry(obj, nrels, relo_data,
+				entry_offset + offsetof(struct jump_table_entry, map_ptr));
+		if (!map)
+			return -EINVAL;
+
+		err = add_static_branch(prog, &entries[i], map);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map)
 {
 	int id;
@@ -6298,10 +6499,44 @@ static struct reloc_desc *find_prog_insn_relo(const struct bpf_program *prog, si
 		       sizeof(*prog->reloc_desc), cmp_relo_by_insn_idx);
 }
 
+static int append_subprog_static_branches(struct bpf_program *main_prog,
+					  struct bpf_program *subprog)
+{
+	size_t subprog_size = subprog->static_branches_info_size;
+	size_t main_size = main_prog->static_branches_info_size;
+	size_t entry_size = sizeof(struct static_branch_info);
+	void *old_info = main_prog->static_branches_info;
+	int n_entries = subprog_size / entry_size;
+	struct static_branch_info *branch;
+	void *new_info;
+	int i;
+
+	if (!subprog_size)
+		return 0;
+
+	new_info = realloc(old_info, subprog_size + main_size);
+	if (!new_info)
+		return -ENOMEM;
+
+	memcpy(new_info + main_size, subprog->static_branches_info, subprog_size);
+
+	for (i = 0; i < n_entries; i++) {
+		branch = new_info + main_size + i * entry_size;
+		branch->insn_offset += subprog->sub_insn_off * 8;
+		branch->jump_target += subprog->sub_insn_off * 8;
+	}
+
+	main_prog->static_branches_info = new_info;
+	main_prog->static_branches_info_size += subprog_size;
+
+	return 0;
+}
+
 static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_program *subprog)
 {
 	int new_cnt = main_prog->nr_reloc + subprog->nr_reloc;
 	struct reloc_desc *relos;
+	int err;
 	int i;
 
 	if (main_prog == subprog)
@@ -6324,6 +6559,11 @@ static int append_subprog_relos(struct bpf_program *main_prog, struct bpf_progra
 	 */
 	main_prog->reloc_desc = relos;
 	main_prog->nr_reloc = new_cnt;
+
+	err = append_subprog_static_branches(main_prog, subprog);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -6879,6 +7119,8 @@ static int bpf_object__collect_relos(struct bpf_object *obj)
 			err = bpf_object__collect_st_ops_relos(obj, shdr, data);
 		else if (idx == obj->efile.btf_maps_shndx)
 			err = bpf_object__collect_map_relos(obj, shdr, data);
+		else if (idx == obj->efile.static_branches_shndx)
+			err = bpf_object__collect_static_branches_relos(obj, shdr, data);
 		else
 			err = bpf_object__collect_prog_relos(obj, shdr, data);
 		if (err)
@@ -7002,6 +7244,30 @@ static int libbpf_prepare_prog_load(struct bpf_program *prog,
 
 static void fixup_verifier_log(struct bpf_program *prog, char *buf, size_t buf_sz);
 
+static struct bpf_static_branch_info *
+convert_branch_info(struct static_branch_info *info, size_t size)
+{
+	size_t n = size/sizeof(struct static_branch_info);
+	struct bpf_static_branch_info *bpf_info;
+	size_t i;
+
+	if (!info)
+		return NULL;
+
+	bpf_info = calloc(n, sizeof(struct bpf_static_branch_info));
+	if (!bpf_info)
+		return NULL;
+
+	for (i = 0; i < n; i++) {
+		bpf_info[i].insn_offset = info[i].insn_offset;
+		bpf_info[i].jump_target = info[i].jump_target;
+		bpf_info[i].flags = info[i].flags;
+		bpf_info[i].map_fd = info[i].map->fd;
+	}
+
+	return bpf_info;
+}
+
 static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog,
 				struct bpf_insn *insns, int insns_cnt,
 				const char *license, __u32 kern_version, int *prog_fd)
@@ -7106,6 +7372,11 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 	load_attr.log_size = log_buf_size;
 	load_attr.log_level = log_level;
 
+	load_attr.static_branches_info = convert_branch_info(prog->static_branches_info,
+							     prog->static_branches_info_size);
+	load_attr.static_branches_info_size = prog->static_branches_info_size /
+			sizeof(struct static_branch_info) * sizeof(struct bpf_static_branch_info);
+
 	ret = bpf_prog_load(prog->type, prog_name, license, insns, insns_cnt, &load_attr);
 	if (ret >= 0) {
 		if (log_level && own_log_buf) {
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f0f08635adb0..62020e7a58b0 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -40,6 +40,9 @@
 #ifndef R_BPF_64_ABS32
 #define R_BPF_64_ABS32 3
 #endif
+#ifndef R_BPF_64_NODYLD32
+#define R_BPF_64_NODYLD32 4
+#endif
 #ifndef R_BPF_64_32
 #define R_BPF_64_32 10
 #endif
diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c
index 5ced96d99f8c..47b343e2813e 100644
--- a/tools/lib/bpf/linker.c
+++ b/tools/lib/bpf/linker.c
@@ -22,6 +22,7 @@
 #include "strset.h"
 
 #define BTF_EXTERN_SEC ".extern"
+#define STATIC_JUMPS_REL_SEC ".rel.jump_table"
 
 struct src_sec {
 	const char *sec_name;
@@ -888,8 +889,9 @@ static int linker_sanity_check_elf_relos(struct src_obj *obj, struct src_sec *se
 		size_t sym_type = ELF64_R_TYPE(relo->r_info);
 
 		if (sym_type != R_BPF_64_64 && sym_type != R_BPF_64_32 &&
-		    sym_type != R_BPF_64_ABS64 && sym_type != R_BPF_64_ABS32) {
-			pr_warn("ELF relo #%d in section #%zu has unexpected type %zu in %s\n",
+		    sym_type != R_BPF_64_ABS64 && sym_type != R_BPF_64_ABS32 &&
+		    sym_type != R_BPF_64_NODYLD32 && strcmp(sec->sec_name, STATIC_JUMPS_REL_SEC)) {
+			pr_warn("ELF relo #%d in section #%zu unexpected type %zu in %s\n",
 				i, sec->sec_idx, sym_type, obj->filename);
 			return -EINVAL;
 		}
@@ -2087,7 +2089,7 @@ static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *ob
 						insn->imm += sec->dst_off / sizeof(struct bpf_insn);
 					else
 						insn->imm += sec->dst_off;
-				} else {
+				} else if (strcmp(src_sec->sec_name, STATIC_JUMPS_REL_SEC)) {
 					pr_warn("relocation against STT_SECTION in non-exec section is not supported!\n");
 					return -EINVAL;
 				}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next 7/7] selftests/bpf: Add tests for BPF Static Keys
  2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
                   ` (5 preceding siblings ...)
  2023-12-06 14:10 ` [PATCH bpf-next 6/7] libbpf: BPF Static Keys support Anton Protopopov
@ 2023-12-06 14:10 ` Anton Protopopov
  6 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-06 14:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf
  Cc: Anton Protopopov

Add several self-tests to test the new BPF Static Key feature. Selftests
include the following tests:

  * check that one key works for one program
  * check that one key works for multiple programs
  * check that static keys work with 2- and 5-byte jumps
  * check that multiple keys work for one program
  * check that static keys work for a base program and a BPF-to-BPF call
  * check that static keys can't be used as normal maps
  * check that passing incorrect parameters on map creation fails
  * check that passing incorrect parameters on program load fails

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
---
 MAINTAINERS                                   |   1 +
 .../bpf/prog_tests/bpf_static_keys.c          | 436 ++++++++++++++++++
 .../selftests/bpf/progs/bpf_static_keys.c     | 120 +++++
 3 files changed, 557 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_static_keys.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_static_keys.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e2f655980c6c..81a040d66af6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3892,6 +3892,7 @@ M:	Anton Protopopov <aspsk@isovalent.com>
 L:	bpf@vger.kernel.org
 S:	Maintained
 F:	kernel/bpf/skey.c
+F:	tools/testing/selftests/bpf/*/*bpf_static_key*
 
 BROADCOM ASP 2.0 ETHERNET DRIVER
 M:	Justin Chen <justin.chen@broadcom.com>
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_static_keys.c b/tools/testing/selftests/bpf/prog_tests/bpf_static_keys.c
new file mode 100644
index 000000000000..37b2da247869
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_static_keys.c
@@ -0,0 +1,436 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <test_progs.h>
+#include "bpf_static_keys.skel.h"
+
+#define set_static_key(map_fd, val)						\
+	do {									\
+		__u32 map_value = (val);					\
+		__u32 zero_key = 0;						\
+		int ret;							\
+										\
+		ret = bpf_map_update_elem(map_fd, &zero_key, &map_value, 0);	\
+		ASSERT_EQ(ret, 0, "bpf_map_update_elem");			\
+	} while (0)
+
+static void check_one_key(struct bpf_static_keys *skel)
+{
+	struct bpf_link *link;
+	int map_fd;
+
+	link = bpf_program__attach(skel->progs.check_one_key);
+	if (!ASSERT_OK_PTR(link, "link"))
+		return;
+
+	map_fd = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(map_fd, 0, "skel->maps.key1");
+
+	set_static_key(map_fd, 0);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 4, "skel->bss->ret_user");
+
+	set_static_key(map_fd, 1);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 3, "skel->bss->ret_user");
+
+	bpf_link__destroy(link);
+}
+
+static void check_multiple_progs(struct bpf_static_keys *skel)
+{
+	struct bpf_link *link1;
+	struct bpf_link *link2;
+	struct bpf_link *link3;
+	int map_fd;
+
+	link1 = bpf_program__attach(skel->progs.check_one_key);
+	if (!ASSERT_OK_PTR(link1, "link1"))
+		return;
+
+	link2 = bpf_program__attach(skel->progs.check_one_key_another_prog);
+	if (!ASSERT_OK_PTR(link2, "link2"))
+		return;
+
+	link3 = bpf_program__attach(skel->progs.check_one_key_yet_another_prog);
+	if (!ASSERT_OK_PTR(link3, "link3"))
+		return;
+
+	map_fd = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(map_fd, 0, "skel->maps.key1");
+
+	set_static_key(map_fd, 0);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 444, "skel->bss->ret_user");
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 888, "skel->bss->ret_user");
+
+	set_static_key(map_fd, 1);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 333, "skel->bss->ret_user");
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 666, "skel->bss->ret_user");
+
+	bpf_link__destroy(link3);
+	bpf_link__destroy(link2);
+	bpf_link__destroy(link1);
+}
+
+static void check_multiple_keys(struct bpf_static_keys *skel)
+{
+	struct bpf_link *link;
+	int map_fd1;
+	int map_fd2;
+	int map_fd3;
+	int i;
+
+	link = bpf_program__attach(skel->progs.check_multiple_keys_unlikely);
+	if (!ASSERT_OK_PTR(link, "link"))
+		return;
+
+	map_fd1 = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(map_fd1, 0, "skel->maps.key1");
+
+	map_fd2 = bpf_map__fd(skel->maps.key2);
+	ASSERT_GT(map_fd2, 0, "skel->maps.key2");
+
+	map_fd3 = bpf_map__fd(skel->maps.key3);
+	ASSERT_GT(map_fd3, 0, "skel->maps.key3");
+
+	for (i = 0; i < 8; i++) {
+		set_static_key(map_fd1, i & 1);
+		set_static_key(map_fd2, i & 2);
+		set_static_key(map_fd3, i & 4);
+
+		usleep(1);
+		ASSERT_EQ(skel->bss->ret_user, i, "skel->bss->ret_user");
+	}
+
+	bpf_link__destroy(link);
+}
+
+static void check_one_key_long_jump(struct bpf_static_keys *skel)
+{
+	struct bpf_link *link;
+	int map_fd;
+
+	link = bpf_program__attach(skel->progs.check_one_key_long_jump);
+	if (!ASSERT_OK_PTR(link, "link"))
+		return;
+
+	map_fd = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(map_fd, 0, "skel->maps.key1");
+
+	set_static_key(map_fd, 0);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 2256, "skel->bss->ret_user");
+
+	set_static_key(map_fd, 1);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 1256, "skel->bss->ret_user");
+
+	bpf_link__destroy(link);
+}
+
+static void check_bpf_to_bpf_call(struct bpf_static_keys *skel)
+{
+	struct bpf_link *link;
+	int map_fd1;
+	int map_fd2;
+
+	link = bpf_program__attach(skel->progs.check_bpf_to_bpf_call);
+	if (!ASSERT_OK_PTR(link, "link"))
+		return;
+
+	map_fd1 = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(map_fd1, 0, "skel->maps.key1");
+
+	map_fd2 = bpf_map__fd(skel->maps.key2);
+	ASSERT_GT(map_fd2, 0, "skel->maps.key2");
+
+	set_static_key(map_fd1, 0);
+	set_static_key(map_fd2, 0);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 0, "skel->bss->ret_user");
+
+	set_static_key(map_fd1, 1);
+	set_static_key(map_fd2, 0);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 101, "skel->bss->ret_user");
+
+	set_static_key(map_fd1, 0);
+	set_static_key(map_fd2, 1);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 1010, "skel->bss->ret_user");
+
+	set_static_key(map_fd1, 1);
+	set_static_key(map_fd2, 1);
+	skel->bss->ret_user = 0;
+	usleep(1);
+	ASSERT_EQ(skel->bss->ret_user, 1111, "skel->bss->ret_user");
+
+
+	bpf_link__destroy(link);
+}
+
+#define FIXED_MAP_FD 666
+
+static void check_use_key_as_map(struct bpf_static_keys *skel)
+{
+	struct bpf_insn insns[] = {
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+		BPF_LD_MAP_FD(BPF_REG_1, FIXED_MAP_FD),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	union bpf_attr attr = {
+		.prog_type = BPF_PROG_TYPE_XDP,
+		.insns     = ptr_to_u64(insns),
+		.insn_cnt  = ARRAY_SIZE(insns),
+		.license   = ptr_to_u64("GPL"),
+	};
+	int map_fd;
+	int ret;
+
+	/* first check that prog loads ok */
+
+	map_fd = bpf_map__fd(skel->maps.just_map);
+	ASSERT_GT(map_fd, 0, "skel->maps.just_map");
+
+	ret = dup2(map_fd, FIXED_MAP_FD);
+	ASSERT_EQ(ret, FIXED_MAP_FD, "dup2");
+
+	strncpy(attr.prog_name, "prog", sizeof(attr.prog_name));
+	ret = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_GT(ret, 0, "BPF_PROG_LOAD");
+	close(ret);
+	close(FIXED_MAP_FD);
+
+	/* now the incorrect map (static key as normal map) */
+
+	map_fd = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(map_fd, 0, "skel->maps.key1");
+
+	ret = dup2(map_fd, FIXED_MAP_FD);
+	ASSERT_EQ(ret, FIXED_MAP_FD, "dup2");
+
+	ret = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(ret, -1, "BPF_PROG_LOAD");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD");
+	close(ret);
+	close(FIXED_MAP_FD);
+}
+
+static void map_create_incorrect(void)
+{
+	union bpf_attr attr = {
+		.map_type = BPF_MAP_TYPE_ARRAY,
+		.key_size = 4,
+		.value_size = 4,
+		.max_entries = 1,
+		.map_flags = BPF_F_STATIC_KEY,
+	};
+	int map_fd;
+
+	/* The first call should be ok */
+
+	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+	ASSERT_GT(map_fd, 0, "BPF_MAP_CREATE");
+	close(map_fd);
+
+	/* All the rest calls should fail */
+
+	attr.map_type = BPF_MAP_TYPE_HASH;
+	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+	ASSERT_EQ(map_fd, -1, "BPF_MAP_CREATE");
+	attr.map_type = BPF_MAP_TYPE_ARRAY;
+
+	attr.key_size = 8;
+	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+	ASSERT_EQ(map_fd, -1, "BPF_MAP_CREATE");
+	attr.key_size = 4;
+
+	attr.value_size = 8;
+	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+	ASSERT_EQ(map_fd, -1, "BPF_MAP_CREATE");
+	attr.value_size = 4;
+
+	attr.max_entries = 2;
+	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+	ASSERT_EQ(map_fd, -1, "BPF_MAP_CREATE");
+	attr.max_entries = 1;
+}
+
+static void prog_load_incorrect_branches(struct bpf_static_keys *skel)
+{
+	int key_fd, map_fd, prog_fd;
+
+	/*
+	 *                 KEY=OFF               KEY=ON
+	 * <prog>:
+	 *        0:       r0 = 0x0              r0 = 0x0
+	 *        1:       goto +0x0 <1>         goto +0x1 <2>
+	 * <1>:
+	 *        2:       exit                  exit
+	 * <2>:
+	 *        3:       r0 = 0x1              r0 = 0x1
+	 *        4:       goto -0x3 <1>         goto -0x3 <1>
+	 */
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+		BPF_EXIT_INSN(),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_JMP_IMM(BPF_JA, 0, 0, -3),
+	};
+	struct bpf_static_branch_info static_branches_info[] = {
+		{
+			.map_fd = -1,
+			.insn_offset = 8,
+			.jump_target = 24,
+			.flags = 0,
+		},
+	};
+	union bpf_attr attr = {
+		.prog_type = BPF_PROG_TYPE_XDP,
+		.insns     = ptr_to_u64(insns),
+		.insn_cnt  = ARRAY_SIZE(insns),
+		.license   = ptr_to_u64("GPL"),
+		.static_branches_info = ptr_to_u64(static_branches_info),
+		.static_branches_info_size = sizeof(static_branches_info),
+	};
+
+	key_fd = bpf_map__fd(skel->maps.key1);
+	ASSERT_GT(key_fd, 0, "skel->maps.key1");
+
+	map_fd = bpf_map__fd(skel->maps.just_map);
+	ASSERT_GT(map_fd, 0, "skel->maps.just_map");
+
+	strncpy(attr.prog_name, "prog", sizeof(attr.prog_name));
+
+	/* The first two loads should be ok, correct parameters */
+
+	static_branches_info[0].map_fd = key_fd;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_GT(prog_fd, 0, "BPF_PROG_LOAD");
+	close(prog_fd);
+
+	static_branches_info[0].flags = BPF_F_INVERSE_BRANCH;
+	insns[1] = BPF_JMP_IMM(BPF_JA, 0, 0, 1); /* inverse branch expects non-zero offset */
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_GT(prog_fd, 0, "BPF_PROG_LOAD");
+	close(prog_fd);
+	static_branches_info[0].flags = 0;
+	insns[1] = BPF_JMP_IMM(BPF_JA, 0, 0, 0);
+
+	/* All other loads should fail with -EINVAL */
+
+	static_branches_info[0].map_fd = map_fd;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: incorrect map fd");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: incorrect map fd");
+	static_branches_info[0].map_fd = key_fd;
+
+	attr.static_branches_info = 0;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: info is NULL, but size is not zero");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: info is NULL, but size is not zero");
+	attr.static_branches_info = ptr_to_u64(static_branches_info);
+
+	attr.static_branches_info_size = 0;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: info is not NULL, but size is zero");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: info is not NULL, but size is zero");
+	attr.static_branches_info_size = sizeof(static_branches_info);
+
+	attr.static_branches_info_size = 1;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: size not divisible by item size");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: size not divisible by item size");
+	attr.static_branches_info_size = sizeof(static_branches_info);
+
+	static_branches_info[0].flags = 0xbeef;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: incorrect flags");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: incorrect flags");
+	static_branches_info[0].flags = 0;
+
+	static_branches_info[0].insn_offset = 1;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: incorrect insn_offset");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: incorrect insn_offset");
+	static_branches_info[0].insn_offset = 8;
+
+	static_branches_info[0].insn_offset = 64;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: insn_offset outside of prgoram");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: insn_offset outside of prgoram");
+	static_branches_info[0].insn_offset = 8;
+
+	static_branches_info[0].jump_target = 1;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: incorrect jump_target");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: incorrect jump_target");
+	static_branches_info[0].jump_target = 8;
+
+	static_branches_info[0].jump_target = 64;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: jump_target outside of prgoram");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: jump_target outside of prgoram");
+	static_branches_info[0].jump_target = 8;
+
+	static_branches_info[0].insn_offset = 0;
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	ASSERT_EQ(prog_fd, -1, "BPF_PROG_LOAD: patching not a JA");
+	ASSERT_EQ(errno, EINVAL, "BPF_PROG_LOAD: patching not a JA");
+	static_branches_info[0].insn_offset = 8;
+}
+
+void test_bpf_static_keys(void)
+{
+	struct bpf_static_keys *skel;
+
+	skel = bpf_static_keys__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bpf_static_keys__open_and_load"))
+		return;
+
+	if (test__start_subtest("check_one_key"))
+		check_one_key(skel);
+
+	if (test__start_subtest("check_multiple_keys"))
+		check_multiple_keys(skel);
+
+	if (test__start_subtest("check_multiple_progs"))
+		check_multiple_progs(skel);
+
+	if (test__start_subtest("check_one_key_long_jump"))
+		check_one_key_long_jump(skel);
+
+	if (test__start_subtest("check_bpf_to_bpf_call"))
+		check_bpf_to_bpf_call(skel);
+
+	/* Negative tests */
+
+	if (test__start_subtest("check_use_key_as_map"))
+		check_use_key_as_map(skel);
+
+	if (test__start_subtest("map_create_incorrect"))
+		map_create_incorrect();
+
+	if (test__start_subtest("prog_load_incorrect_branches"))
+		prog_load_incorrect_branches(skel);
+
+	bpf_static_keys__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_static_keys.c b/tools/testing/selftests/bpf/progs/bpf_static_keys.c
new file mode 100644
index 000000000000..e47a34df469b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_static_keys.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+DEFINE_STATIC_KEY(key1);
+DEFINE_STATIC_KEY(key2);
+DEFINE_STATIC_KEY(key3);
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, u32);
+	__type(value, u32);
+} just_map SEC(".maps");
+
+int ret_user;
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int check_one_key(void *ctx)
+{
+	if (bpf_static_branch_likely(&key1))
+		ret_user += 3;
+	else
+		ret_user += 4;
+
+	return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int check_one_key_another_prog(void *ctx)
+{
+	if (bpf_static_branch_unlikely(&key1))
+		ret_user += 30;
+	else
+		ret_user += 40;
+
+	return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int check_one_key_yet_another_prog(void *ctx)
+{
+	if (bpf_static_branch_unlikely(&key1))
+		ret_user += 300;
+	else
+		ret_user += 400;
+
+	return 0;
+}
+
+static __always_inline int big_chunk_of_code(volatile int *x)
+{
+	#pragma clang loop unroll_count(256)
+	for (int i = 0; i < 256; i++)
+		*x += 1;
+
+	return *x;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int check_one_key_long_jump(void *ctx)
+{
+	int x;
+
+	if (bpf_static_branch_likely(&key1)) {
+		x = 1000;
+		big_chunk_of_code(&x);
+		ret_user = x;
+	} else {
+		x = 2000;
+		big_chunk_of_code(&x);
+		ret_user = x;
+	}
+
+	return 0;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int check_multiple_keys_unlikely(void *ctx)
+{
+	ret_user = (bpf_static_branch_unlikely(&key1) << 0) |
+		   (bpf_static_branch_unlikely(&key2) << 1) |
+		   (bpf_static_branch_unlikely(&key3) << 2);
+
+	return 0;
+}
+
+int __noinline patch(int x)
+{
+	if (bpf_static_branch_likely(&key1))
+		x += 100;
+	if (bpf_static_branch_unlikely(&key2))
+		x += 1000;
+
+	return x;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int check_bpf_to_bpf_call(void *ctx)
+{
+	__u64 j = bpf_jiffies64();
+
+	bpf_printk("%lu\n", j);
+
+	ret_user = 0;
+
+	if (bpf_static_branch_likely(&key1))
+		ret_user += 1;
+	if (bpf_static_branch_unlikely(&key2))
+		ret_user += 10;
+
+	ret_user = patch(ret_user);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-06 14:10 ` [PATCH bpf-next 6/7] libbpf: BPF Static Keys support Anton Protopopov
@ 2023-12-08  3:45   ` Alexei Starovoitov
  2023-12-08 16:19     ` Anton Protopopov
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-08  3:45 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> +
> +static __always_inline int __bpf_static_branch_jump(void *static_key)
> +{
> +       asm goto("1:\n\t"
> +               "goto %l[l_yes]\n\t"
> +               ".pushsection .jump_table, \"aw\"\n\t"
> +               ".balign 8\n\t"
> +               ".long 1b - .\n\t"
> +               ".long %l[l_yes] - .\n\t"
> +               ".quad %c0 - . + 1\n\t"
> +               ".popsection\n\t"
> +               :: "i" (static_key)
> +               :: l_yes);
> +       return 0;
> +l_yes:
> +       return 1;
> +}

Could you add a test to patch 7 where the same subprog is
used by multiple main progs and another test where a prog
with static_keys is statically linked by libbpf into another prog?
I suspect these cases are not handled by libbpf in the series.
The adjustment of insn offsets is tricky and I don't see this logic
in patch 5.

The special handling of JA insn (if it's listed in
static_branches_info[]) is fragile. The check_cfg() and the verifier
main loop are two places so far, but JA is an unconditional jump.
Every tool that understands BPF ISA would have to treat JA as "maybe
it's not a jump in this case" and that is concerning.

I certainly see the appeal of copy-pasting kernel's static_branch logic,
but we can do better since we're not bound by x86 ISA.

How about we introduce a new JA_MAYBE insn, and check_cfg and
the verifier will process it as insn_idx += insn->off/imm _or_ insn_idx += 1.
The new instruction will indicate that it may jump or fall through.
Another bit could be used to indicate a "default" action (jump vs
fallthrough) which will be used by JITs to translate into nop or jmp.
Once it's a part of the instruction stream we can have bpf prog callable
kfunc that can flip JA_MAYBE target
(I think this feature is missing in the patch set. It's necessary
to add an ability for bpf prog to flip static_branch. Currently
you're proposing to do it from user space only),
and there will be no need to supply static_branches_info[] at prog load time.
The libbpf static linking will work as-is without extra code.

JA_MAYBE will also remove the need for extra bpf map type.
The bpf prog can _optionally_ do '.long 1b - .' asm trick and
store the address of JA_MAYBE insn into an arbitrary 8 byte value
(either in a global variable, a section or somewhere else).
I think it's necessary to decouple patching of JA_MAYBE vs naming
the location.
The ".pushsection .jump_table" should be optional.
The kernel's static_key approach hard codes them together, but
it's due to x86 limitations.
We can introduce JA_MAYBE and use it everywhere in the bpf prog and
do not add names or addresses next to them. Then 'bpftool prog dump' can
retrieve the insn stream and another command can patch a specific
instruction (because JA_MAYBE is easy to spot vs regular JA).
At the end it's just a text_poke_bp() to convert
a target of JA_MAYBE.
The bpf prog can be written with
 asm goto("go_maybe +0, %l[l_yes]")
without names and maps, and the jumps will follow the indicated
'default' branch (how about, the 1st listed is default, the 2nd is
maybe).
The kernel static keys cannot be flipped atomically anyway,
so multiple branches using the same static key is equivalent to an
array of addresses that are flipped one by one.
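
For illustration, the wrapper could then be as small as this (a rough
sketch; "go_maybe" is a made-up mnemonic and nothing below exists today):

    static __always_inline int bpf_static_branch_unlikely(void)
    {
            /* default action: fall through; runtime can patch it to jump */
            asm goto ("go_maybe +0, %l[l_yes]" : : : : l_yes);
            return 0;
    l_yes:
            return 1;
    }

No map argument and no mandatory .jump_table entry unless the author
wants to name the location.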

I suspect the main use case for isovalent is to compile a bpf prog
with debug code that is not running by default and then flip
various parts when debugging is necessary.
With JA_MAYBE it's still going to be bpf_static_branch_[un]likely(),
but no extra map and the prog will load fine. Then you can patch
all of such insns or subset of them on demand.
(The verifier will allow patching of JA_MAYBE only between two targets,
so no safety issues).
I think it's more flexible than the new map type and static_branches_info[].
wdyt?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-08  3:45   ` Alexei Starovoitov
@ 2023-12-08 16:19     ` Anton Protopopov
  2023-12-08 22:04       ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Anton Protopopov @ 2023-12-08 16:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Thu, Dec 07, 2023 at 07:45:32PM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> > +
> > +static __always_inline int __bpf_static_branch_jump(void *static_key)
> > +{
> > +       asm goto("1:\n\t"
> > +               "goto %l[l_yes]\n\t"
> > +               ".pushsection .jump_table, \"aw\"\n\t"
> > +               ".balign 8\n\t"
> > +               ".long 1b - .\n\t"
> > +               ".long %l[l_yes] - .\n\t"
> > +               ".quad %c0 - . + 1\n\t"
> > +               ".popsection\n\t"
> > +               :: "i" (static_key)
> > +               :: l_yes);
> > +       return 0;
> > +l_yes:
> > +       return 1;
> > +}
> 
> Could you add a test to patch 7 where the same subprog is
> used by multiple main progs and another test where a prog
> with static_keys is statically linked by libbpf into another prog?
> I suspect these cases are not handled by libbpf in the series.
> The adjustment of insn offsets is tricky and I don't see this logic
> in patch 5.
> 
> The special handling of JA insn (if it's listed in
> static_branches_info[]) is fragile. The check_cfg() and the verifier
> main loop are two places so far, but JA is an unconditional jump.
> Every tool that understands BPF ISA would have to treat JA as "maybe
> it's not a jump in this case" and that is concerning.

Will do, thanks.

> I certainly see the appeal of copy-pasting kernel's static_branch logic,
> but we can do better since we're not bound by x86 ISA.
> 
> How about we introduce a new JA_MAYBE insn, and check_cfg and
> the verifier will process it as insn_idx += insn->off/imm _or_ insn_idx += 1.
> The new instruction will indicate that it may jump or fall through.
> Another bit could be used to indicate a "default" action (jump vs
> fallthrough) which will be used by JITs to translate into nop or jmp.
> Once it's a part of the instruction stream we can have bpf prog callable
> kfunc that can flip JA_MAYBE target
> (I think this feature is missing in the patch set. It's necessary
> to add an ability for bpf prog to flip static_branch. Currently
> you're proposing to do it from user space only),
> and there will be no need to supply static_branches_info[] at prog load time.
> The libbpf static linking will work as-is without extra code.
> 
> JA_MAYBE will also remove the need for extra bpf map type.
> The bpf prog can _optionally_ do '.long 1b - .' asm trick and
> store the address of JA_MAYBE insn into an arbitrary 8 byte value
> (either in a global variable, a section or somewhere else).
> I think it's necessary to decouple patching of JA_MAYBE vs naming
> the location.
> The ".pushsection .jump_table" should be optional.
> The kernel's static_key approach hard codes them together, but
> it's due to x86 limitations.
> We can introduce JA_MAYBE and use it everywhere in the bpf prog and
> do not add names or addresses next to them. Then 'bpftool prog dump' can
> retrieve the insn stream and another command can patch a specific
> instruction (because JA_MAYBE is easy to spot vs regular JA).
> At the end it's just a text_poke_bp() to convert
> a target of JA_MAYBE.
> The bpf prog can be written with
>  asm goto("go_maybe +0, %l[l_yes]")
> without names and maps, and the jumps will follow the indicated
> 'default' branch (how about, the 1st listed is default, the 2nd is
> maybe).
> The kernel static keys cannot be flipped atomically anyway,
> so multiple branches using the same static key is equivalent to an
> array of addresses that are flipped one by one.

Thanks for the detailed review. You're right, without adding a new
instruction non-kernel observers can't distinguish between a JA and a
"static branch JA". This also makes sense to encode direct/inverse flag as
well (and more, see below). I would call the new instruction something like
JA_CFG, emphasizing the fact that this is a JA which can be configured by
an external force.

We also can easily add a kfunc API, however, I am for keeping the "map API"
in place (in addition to more fine-grained API you've proposed). See my
considerations and examples below.

> I suspect the main use case for isovalent is to compile a bpf prog
> with debug code that is not running by default and then flip
> various parts when debugging is necessary.
> With JA_MAYBE it's still going to be bpf_static_branch_[un]likely(),
> but no extra map and the prog will load fine. Then you can patch
> all of such insns or subset of them on demand.
> (The verifier will allow patching of JA_MAYBE only between two targets,
> so no safety issues).
> I think it's more flexible than the new map type and static_branches_info[].
> wdyt?

Here is how I think about API. Imagine that we have this new instruction,
JA_CFG, and that a program uses it in multiple places. If we don't mark
those instructions, then after compilation an external observer can't
distinguish between them. We don't know if this is supposed to be
instructions controlled by one key or another. We may care about this, say,
when a program uses key A for enabling/disabling debug and key B to
enable/disable another optional feature.

If we push offsets of jumps in a separate section via the "'.long 1b - .'
asm trick", then we will have all the same problems with relocations which
is fragile, as you've shown. What we can do instead is to encode "local key
id" inside the instruction. This local_id is local to the program where it
is used. (We can use 4 bits in, say, dst_reg, or 16 bits of unused
offset/imm, as one of them will be unused. As for me, 4 may be enough for
the initial implementation.) This way we can distinguish between different
keys in one program, and a new `bpf_static_key_set_prog(prog, key_id, value)`
kfunc can be used to toggle this key on/off for a particular program. This
way we don't care about relocation, and API is straightforward.
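
Purely to illustrate the encoding (the opcode value and the choice of
fields carrying the id and the flag are hypothetical; assumes
<linux/bpf.h> for struct bpf_insn):

    #define BPF_JA_CFG 0xe0  /* placeholder opcode value */

    static struct bpf_insn make_ja_cfg(__u8 local_key_id, int inverse, __s16 off)
    {
            return (struct bpf_insn) {
                    .code    = BPF_JMP | BPF_JA_CFG,
                    .dst_reg = local_key_id & 0xf, /* 4-bit local key id */
                    .src_reg = inverse ? 1 : 0,    /* normal/inverse branch */
                    .off     = off,                /* jump offset, as in JA */
            };
    }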

However, if this is the only API we provide, then this makes user's life
hard, as they will have to keep track of ids, and programs used, and
mapping from "global" id to local ids for each program (when multiple
programs use the same static key, which is desirable). If we keep the
higher-level "map API", then this simplifies user's life: on a program load
a user can send a list of (local_id -> map) mappings, and then toggle all
the branches controlled by "a [global] static key" by either

    bpf(MAP_UPDATE_ELEM, map, value)

or

    kfunc bpf_static_key_set(map, value)

whatever is more useful. (I think that keeping the bpf(2) userspace API is
worth doing, as otherwise this, again, makes life harder: users would
have to recompile/update iterator programs if new programs using a static
key are added, etc.)

Libbpf can simplify life even more by automatically allocating local ids
and passing mappings to kernel for a program from the
`bpf_static_branch_{unlikely,likely}(&map)`, so that users don't ever think
about this, if they don't want to. Again, no relocations are required here.

So, to summarize:

  * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
    program, FLAGS is 0/1 for normal/inverse branches

  * A new kfunc `bpf_static_key_set_prog(prog, key_id, value)` which
    toggles all the branches with ID=key_id in the given prog

  * Extend bpf_attr with a list of (local_id -> map) mappings. This is an
    optional list if the user doesn't want one or all branches to be controlled
    by a map

  * A new kfunc `bpf_static_key_set(map, value)` which toggles all the
    static branches in all programs which use `map` to control branches

  * bpf(2, MAP_UPDATE_ELEM, map, value) acts as the previous kfunc, but in
    a single syscall without the requirement to create iterators

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-08 16:19     ` Anton Protopopov
@ 2023-12-08 22:04       ` Andrii Nakryiko
  2023-12-08 23:07         ` Eduard Zingerman
  2023-12-09  4:05         ` Alexei Starovoitov
  0 siblings, 2 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2023-12-08 22:04 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa, Martin KaFai Lau, Stanislav Fomichev,
	bpf

On Fri, Dec 8, 2023 at 8:22 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
> On Thu, Dec 07, 2023 at 07:45:32PM -0800, Alexei Starovoitov wrote:
> > On Wed, Dec 6, 2023 at 6:13 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> > >
> > > +
> > > +static __always_inline int __bpf_static_branch_jump(void *static_key)
> > > +{
> > > +       asm goto("1:\n\t"
> > > +               "goto %l[l_yes]\n\t"
> > > +               ".pushsection .jump_table, \"aw\"\n\t"
> > > +               ".balign 8\n\t"
> > > +               ".long 1b - .\n\t"
> > > +               ".long %l[l_yes] - .\n\t"
> > > +               ".quad %c0 - . + 1\n\t"
> > > +               ".popsection\n\t"
> > > +               :: "i" (static_key)
> > > +               :: l_yes);
> > > +       return 0;
> > > +l_yes:
> > > +       return 1;
> > > +}
> >
> > Could you add a test to patch 7 where the same subprog is
> > used by multiple main progs and another test where a prog
> > with static_keys is statically linked by libbpf into another prog?
> > I suspect these cases are not handled by libbpf in the series.
> > The adjustment of insn offsets is tricky and I don't see this logic
> > in patch 5.
> >
> > The special handling of JA insn (if it's listed in
> > static_branches_info[]) is fragile. The check_cfg() and the verifier
> > main loop are two places so far, but JA is an unconditional jump.
> > Every tool that understands BPF ISA would have to treat JA as "maybe
> > it's not a jump in this case" and that is concerning.
>
> Will do, thanks.
>
> > I certainly see the appeal of copy-pasting kernel's static_branch logic,
> > but we can do better since we're not bound by x86 ISA.
> >
> > How about we introduce a new JA_MAYBE insn, and check_cfg and
> > the verifier will process it as insn_idx += insn->off/imm _or_ insn_idx += 1.
> > The new instruction will indicate that it may jump or fall through.
> > Another bit could be used to indicate a "default" action (jump vs
> > fallthrough) which will be used by JITs to translate into nop or jmp.
> > Once it's a part of the instruction stream we can have bpf prog callable
> > kfunc that can flip JA_MAYBE target
> > (I think this feature is missing in the patch set. It's necessary
> > to add an ability for bpf prog to flip static_branch. Currently
> > you're proposing to do it from user space only),
> > and there will be no need to supply static_branches_info[] at prog load time.
> > The libbpf static linking will work as-is without extra code.
> >
> > JA_MAYBE will also remove the need for extra bpf map type.
> > The bpf prog can _optionally_ do '.long 1b - .' asm trick and
> > store the address of JA_MAYBE insn into an arbitrary 8 byte value
> > (either in a global variable, a section or somewhere else).
> > I think it's necessary to decouple patching of JA_MAYBE vs naming
> > the location.
> > The ".pushsection .jump_table" should be optional.
> > The kernel's static_key approach hard codes them together, but
> > it's due to x86 limitations.
> > We can introduce JA_MAYBE and use it everywhere in the bpf prog and
> > do not add names or addresses next to them. Then 'bpftool prog dump' can
> > retrieve the insn stream and another command can patch a specific
> > instruction (because JA_MAYBE is easy to spot vs regular JA).
> > At the end it's just a text_poke_bp() to convert
> > a target of JA_MAYBE.
> > The bpf prog can be written with
> >  asm goto("go_maybe +0, %l[l_yes]")
> > without names and maps, and the jumps will follow the indicated
> > 'default' branch (how about, the 1st listed is default, the 2nd is
> > maybe).
> > The kernel static keys cannot be flipped atomically anyway,
> > so multiple branches using the same static key is equivalent to an
> > array of addresses that are flipped one by one.
>
> Thanks for the detailed review. You're right, without adding a new
> instruction non-kernel observers can't distinguish between a JA and a
> "static branch JA". This also makes sense to encode direct/inverse flag as
> well (and more, see below). I would call the new instruction something like
> JA_CFG, emphasizing the fact that this is a JA which can be configured by
> an external force.
>
> We also can easily add a kfunc API, however, I am for keeping the "map API"
> in place (in addition to more fine-grained API you've proposed). See my
> considerations and examples below.
>
> > I suspect the main use case for isovalent is to compile a bpf prog
> > with debug code that is not running by default and then flip
> > various parts when debugging is necessary.
> > With JA_MAYBE it's still going to be bpf_static_branch_[un]likely(),
> > but no extra map and the prog will load fine. Then you can patch
> > all of such insns or subset of them on demand.
> > (The verifier will allow patching of JA_MAYBE only between two targets,
> > so no safety issues).
> > I think it's more flexible than the new map type and static_branches_info[].
> > wdyt?
>
> Here is how I think about API. Imagine that we have this new instruction,
> JA_CFG, and that a program uses it in multiple places. If we don't mark
> those instructions, then after compilation an external observer can't
> distinguish between them. We don't know if this is supposed to be
> instructions controlled by one key or another. We may care about this, say,
> when a program uses key A for enabling/disabling debug and key B to
> enable/disable another optional feature.
>
> If we push offsets of jumps in a separate section via the "'.long 1b - .'
> asm trick", then we will have all the same problems with relocations which
> is fragile, as you've shown. What we can do instead is to encode "local key
> id" inside the instruction. This local_id is local to the program where it
> is used. (We can use 4 bits in, say, dst_reg, or 16 bits of unused
> offset/imm, as one of them will be unused. As for me, 4 may be enough for
> the initial implementation.) This way we can distinguish between different
> keys in one program, and a new `bpf_static_key_set_prog(prog, key_id, value)`
> kfunc can be used to toggle this key on/off for a particular program. This
> way we don't care about relocation, and API is straightforward.

I feel like embedding some sort of ID inside the instruction is very..
unusual, shall we say?

When I was reading commit messages and discussion, I couldn't shake
the impression that this got to be solved with the help of
relocations. How about something like below (not very well thought
out, sorry) for how this can be put together both from user-space and
kernel sides.

1. We can add a special SEC(".static_keys") section, in which we
define special global variables representing a static key. These
variables will be forced to be global to ensure unique names and stuff
like that.

2. bpf_static_branch_{likely,unlikely}() macro accepts a reference to
one such special global variable and instructs the compiler to emit a
relocation between the static key variable and the JMP_CFG instruction.

Libbpf will properly update these relocations during static linking
and subprog rearrangement, just like we do it for map references
today.

3. libbpf takes care of creating a special map (could be ARRAY with a
special flag, but a dedicated map with multiple entries might be
cleaner), in which each static key variable is represented as a
separate key.

3.5 When libbpf loads a BPF program, I guess it will have to upload a
multi-mapping from static_key_id to insn_off (probably we should allow
multiple insn_offs per one static_key_id), so that the kernel can identify
which instructions are to be updated.

4. Similar to SEC(".kconfig") libbpf will actually also have a
read-only global variables map, where each variable's value is static
key integer ID that corresponds to a static key in that special map
from #3. This allows user-space to easily get this ID without
hard-coding or guessing anything, it's just

int static_key_id = skel->static_keys->my_static_key;

Then you can do bpf_map_update_elem(&skel->maps.static_key_map,
&static_key_id, 0/1) to flip the switch from user-space.

5. From the BPF side we use global variable semantics similarly to
obtain integer key, then we can use a dedicated special kfunc that
will accept prog, static_key_id, and desired state to flip that jump
whichever way.


So on BPF side it will look roughly like this:

int static_key1 SEC(".static_keys");

...

if (bpf_static_key_likely(&static_key1)) {
   ...
}

...

/* to switch it on or off */
bpf_static_key_switch(&static_key1, 1);


From user-space side I basically already showed:


bpf_map_update_elem(bpf_map__fd(skel->static_keys->static_key1),
                    &skel->static_keys->static_key1, 1);


Something along these lines.


>
> However, if this is the only API we provide, then this makes user's life
> hard, as they will have to keep track of ids, and programs used, and
> mapping from "global" id to local ids for each program (when multiple
> programs use the same static key, which is desirable). If we keep the
> higher-level "map API", then this simplifies user's life: on a program load
> a user can send a list of (local_id -> map) mappings, and then toggle all
> the branches controlled by "a [global] static key" by either
>
>     bpf(MAP_UPDATE_ELEM, map, value)
>
> or
>
>     kfunc bpf_static_key_set(map, value)
>
> whatever is more useful. (I think that keeping the bpf(2) userspace API is
> worth doing, as otherwise this, again, makes life harder: users would
> have to recompile/update iterator programs if new programs using a static
> key are added, etc.)
>
> Libbpf can simplify life even more by automatically allocating local ids
> and passing mappings to kernel for a program from the
> `bpf_static_branch_{unlikely,likely}(&map)`, so that users don't ever think
> about this, if they don't want to. Again, no relocations are required here.
>
> So, to summarize:
>
>   * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
>     program, FLAGS is 0/1 for normal/inverse branches
>

+1 for a dedicated instruction


>   * A new kfunc `bpf_static_key_set_prog(prog, key_id, value)` which
>     toggles all the branches with ID=key_id in the given prog
>
>   * Extend bpf_attr with a list of (local_id -> map) mappings. This is an
>     optional list if the user doesn't want one or all branches to be controlled
>     by a map
>
>   * A new kfunc `bpf_static_key_set(map, value)` which toggles all the
>     static branches in all programs which use `map` to control branches
>
>   * bpf(2, MAP_UPDATE_ELEM, map, value) acts as the previous kfunc, but in
>     a single syscall without the requirement to create iterators

Let's see if my above proposal fits this as a way to simplify
allocating static key IDs and keeping them in sync between BPF and
user-space sides.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-08 22:04       ` Andrii Nakryiko
@ 2023-12-08 23:07         ` Eduard Zingerman
  2023-12-09  4:07           ` Alexei Starovoitov
  2023-12-09  4:05         ` Alexei Starovoitov
  1 sibling, 1 reply; 32+ messages in thread
From: Eduard Zingerman @ 2023-12-08 23:07 UTC (permalink / raw)
  To: Andrii Nakryiko, Anton Protopopov
  Cc: Alexei Starovoitov, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa, Martin KaFai Lau, Stanislav Fomichev,
	bpf

On Fri, 2023-12-08 at 14:04 -0800, Andrii Nakryiko wrote:
[...]
> > However, if this is the only API we provide, then this makes user's life
> > hard, as they will have to keep track of ids, and programs used, and
> > mapping from "global" id to local ids for each program (when multiple
> > programs use the same static key, which is desirable). If we keep the
> > higher-level "map API", then this simplifies user's life: on a program load
> > a user can send a list of (local_id -> map) mappings, and then toggle all
> > the branches controlled by "a [global] static key" by either
> > 
> >     bpf(MAP_UPDATE_ELEM, map, value)
> > 
> > or
> > 
> >     kfunc bpf_static_key_set(map, value)
> > 
> > whatever is more useful. (I think that keeping the bpf(2) userspace API is
> > worth doing, as otherwise this, again, makes life harder: users would
> > have to recompile/update iterator programs if new programs using a static
> > key are added, etc.)
> > 
> > Libbpf can simplify life even more by automatically allocating local ids
> > and passing mappings to kernel for a program from the
> > `bpf_static_branch_{unlikely,likely}(&map)`, so that users don't ever think
> > about this, if they don't want to. Again, no relocations are required here.
> > 
> > So, to summarize:
> > 
> >   * A new instruction BPF_JA_CFG[ID,FLAGS,OFFSET] where ID is local to the
> >     program, FLAGS is 0/1 for normal/inverse branches
> > 
> 
> +1 for a dedicated instruction

fwiw, if relocations are used instead of IDs the new instruction does
not have to be a control-flow instruction. It might be a mov that sets the
target register to a value that the verifier treats as unknown. At runtime this
mov could be patched to assign different values. Granted it would be
three instructions:

  mov rax, 0;
  cmp rax, 0;
  je ...
  
instead of one, but I don't believe there would be a noticeable performance
difference. On the plus side: even simpler verification, likely/unlikely
for free, no need to track if the branch is inverted.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-08 22:04       ` Andrii Nakryiko
  2023-12-08 23:07         ` Eduard Zingerman
@ 2023-12-09  4:05         ` Alexei Starovoitov
  2023-12-09  4:15           ` Yonghong Song
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-09  4:05 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Anton Protopopov, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa, Martin KaFai Lau, Stanislav Fomichev,
	bpf

On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
>
> I feel like embedding some sort of ID inside the instruction is very..
> unusual, shall we say?

yeah. no magic numbers inside insns pls.

I don't like JA_CFG name, since I read CFG as control flow graph,
while you probably meant CFG as configurable.
How about BPF_JA_OR_NOP ?
Then in combination with BPF_JMP or BPF_JMP32 modifier
the insn->off|imm will be used.
1st bit in src_reg can indicate the default action: nop or jmp.
In asm it may look like asm("goto_or_nop +5")

> 2. bpf_static_branch_{likely,unlikely}() macro accepts a reference to
> one such special global variable and instructs compiler to emit
> relocation between static key variable and JMP_CFG instruction.
>
> Libbpf will properly update these relocations during static linking
> and subprog rearrangement, just like we do it for map references
> today.

Right. libbpf has RELO_SUBPROG_ADDR.
This new relo will be pretty much that.
And we have proper C syntax for taking an address: &&label.
The bpf_static_branch macro can use it.
We wanted to add it for a long time to support proper
switch() and jmp tables.

I don't like IDs and new map type for this.
The macro can have 'branch_name' as one of the arguments and
it will populate addresses of insns into "name.static_branch" section.

From libbpf pov it will be yet another global section which
is represented as a traditional bpf array of one element.
No extra handling on the libbpf side.

The question is how to represent the "address" of the insn.
I think 4 byte prog_id + 4 byte insn_idx will do.

Then bpf prog can pass such "address" into bpf_static_branch_enable/disable
kfunc.

The user space can iterate over 8 byte "addresses"
in that 1 element array map and call BPF_STATIC_BRANCH_ENABLE/DISABLE
syscall cmds.
We can have a helper on libbpf side for that.
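
For example (the struct layout and the helper name are hypothetical, only
bpf_map_lookup_elem() below exists today):

    #include <bpf/bpf.h>

    /* the 8-byte insn "address" stored in the one-element array map */
    struct bpf_static_branch_addr {
            __u32 prog_id;
            __u32 insn_idx;
    };

    /* proposed libbpf helper wrapping the new syscall cmd */
    int bpf_static_branch_enable(__u32 prog_id, __u32 insn_idx);

    /* user space: flip every branch recorded in the section-backed map */
    static int static_branches_enable(int map_fd, unsigned int nr)
    {
            struct bpf_static_branch_addr addrs[64] = {};
            __u32 zero = 0;
            unsigned int i;
            int err;

            err = bpf_map_lookup_elem(map_fd, &zero, addrs);
            if (err)
                    return err;

            for (i = 0; i < nr && i < 64; i++) {
                    err = bpf_static_branch_enable(addrs[i].prog_id,
                                                   addrs[i].insn_idx);
                    if (err)
                            return err;
            }
            return 0;
    }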

I see no need to introduce a new map type just to reuse map_update_elem cmd.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-08 23:07         ` Eduard Zingerman
@ 2023-12-09  4:07           ` Alexei Starovoitov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-09  4:07 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Andrii Nakryiko, Anton Protopopov, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf

On Fri, Dec 8, 2023 at 3:07 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
>
> fwiw, if relocations are used instead of IDs the new instruction does
> not have to be a control flow. It might be a mov that sets target
> register to a value that verifier treats as unknown. At runtime this
> mov could be patched to assign different values. Granted it would be
> three instructions:
>
>   mov rax, 0;
>   cmp rax, 0;
>   je ...
>
> instead of one, but I don't believe there would noticeable performance
> difference. On a plus side: even simpler verification, likely/unlikely
> for free, no need to track if branch is inverted.

'je' is a conditional jmp. cpu will mispredict it sooner or later
and depending on the density of jmps around it the misprediction
can be severe.
It has to be an unconditional jmp to be fast.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-09  4:05         ` Alexei Starovoitov
@ 2023-12-09  4:15           ` Yonghong Song
  2023-12-09  4:25             ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-12-09  4:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko
  Cc: Anton Protopopov, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa, Martin KaFai Lau, Stanislav Fomichev,
	bpf


On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> I feel like embedding some sort of ID inside the instruction is very..
>> unusual, shall we say?
> yeah. no magic numbers inside insns pls.
>
> I don't like JA_CFG name, since I read CFG as control flow graph,
> while you probably meant CFG as configurable.
> How about BPF_JA_OR_NOP ?
> Then in combination with BPF_JMP or BPF_JMP32 modifier
> the insn->off|imm will be used.
> 1st bit in src_reg can indicate the default action: nop or jmp.
> In asm it may look like asm("goto_or_nop +5")

How does the C source code look in order to generate a
BPF_JA_OR_NOP insn? Any source examples?

>
>> 2. bpf_static_branch_{likely,unlikely}() macro accepts a reference to
>> one such special global variable and instructs compiler to emit
>> relocation between static key variable and JMP_CFG instruction.
>>
>> Libbpf will properly update these relocations during static linking
>> and subprog rearrangement, just like we do it for map references
>> today.
> Right. libbpf has RELO_SUBPROG_ADDR.
> This new relo will be pretty much that.
> And we have proper C syntax for taking an address: &&label.
> The bpf_static_branch macro can use it.
> We wanted to add it for a long time to support proper
> switch() and jmp tables.
>
> I don't like IDs and new map type for this.
> The macro can have 'branch_name' as one of the arguments and
> it will populate addresses of insns into "name.static_branch" section.
>
>  From libbpf pov it will be yet another global section which
> is represented as a traditional bpf array of one element.
> No extra handling on the libbpf side.
>
> The question is how to represent the "address" of the insn.
> I think 4 byte prog_id + 4 byte insn_idx will do.
>
> Then bpf prog can pass such "address" into bpf_static_branch_enable/disable
> kfunc.
>
> The user space can iterate over 8 byte "addresses"
> in that 1 element array map and call BPF_STATIC_BRANCH_ENABLE/DISABLE
> syscall cmds.
> We can have a helper on libbpf side for that.
>
> I see no need to introduce a new map type just to reuse map_update_elem cmd.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-09  4:15           ` Yonghong Song
@ 2023-12-09  4:25             ` Alexei Starovoitov
  2023-12-09  5:04               ` Yonghong Song
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-09  4:25 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Anton Protopopov, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf

On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> > On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> >>
> >> I feel like embedding some sort of ID inside the instruction is very..
> >> unusual, shall we say?
> > yeah. no magic numbers inside insns pls.
> >
> > I don't like JA_CFG name, since I read CFG as control flow graph,
> > while you probably meant CFG as configurable.
> > How about BPF_JA_OR_NOP ?
> > Then in combination with BPF_JMP or BPF_JMP32 modifier
> > the insn->off|imm will be used.
> > 1st bit in src_reg can indicate the default action: nop or jmp.
> > In asm it may look like asm("goto_or_nop +5")
>
> How does the C source code looks like in order to generate
> BPF_JA_OR_NOP insn? Any source examples?

It will be in inline asm only. The address of that insn will
be taken either via && or via asm (".long %l[label]").
From llvm pov both should go through the same relo creation logic. I hope :)
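
For concreteness, the usual shape of that trick (borrowed from how the
kernel's jump_label macros record their patch sites) might look like the
sketch below. The goto_or_nop insn, the section name and the entry
layout are all assumptions here, none of this exists yet:

  #include <stdbool.h>

  /* sketch only: emit the patchable jump and record "where it is" and
   * "where it goes" in a side section that tooling can walk later */
  static inline __attribute__((always_inline)) bool branch_unlikely(void)
  {
          asm goto ("1:\n\t"
                    "goto_or_nop %l[l_yes]\n\t"
                    ".pushsection .bpf_static_branches, \"aw\"\n\t"
                    ".long 1b - ., %l[l_yes] - .\n\t"
                    ".popsection"
                    : : : : l_yes);
          return false;
  l_yes:
          return true;
  }

It would be used as "if (branch_unlikely()) { ... }", with libbpf (or
whatever loader) later turning the section entries into relocation-like
records for the kernel.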

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-09  4:25             ` Alexei Starovoitov
@ 2023-12-09  5:04               ` Yonghong Song
  2023-12-09 17:18                 ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-12-09  5:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Anton Protopopov, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf


On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
>>> <andrii.nakryiko@gmail.com> wrote:
>>>> I feel like embedding some sort of ID inside the instruction is very..
>>>> unusual, shall we say?
>>> yeah. no magic numbers inside insns pls.
>>>
>>> I don't like JA_CFG name, since I read CFG as control flow graph,
>>> while you probably meant CFG as configurable.
>>> How about BPF_JA_OR_NOP ?
>>> Then in combination with BPF_JMP or BPF_JMP32 modifier
>>> the insn->off|imm will be used.
>>> 1st bit in src_reg can indicate the default action: nop or jmp.
>>> In asm it may look like asm("goto_or_nop +5")
>> How does the C source code looks like in order to generate
>> BPF_JA_OR_NOP insn? Any source examples?
> It will be in inline asm only. The address of that insn will
> be taken either via && or via asm (".long %l[label]").
>  From llvm pov both should go through the same relo creation logic. I hope :)

Below is a hack in llvm with an example; could you check whether the C
syntax and the object dump result are what you want to see?

diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp 
b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
index 90697c6645be..38b1cbc31f9a 100644
--- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
+++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
@@ -231,6 +231,7 @@ public:
          .Case("call", true)
          .Case("goto", true)
          .Case("gotol", true)
+        .Case("goto_or_nop", true)
          .Case("*", true)
          .Case("exit", true)
          .Case("lock", true)
@@ -259,6 +260,7 @@ public:
          .Case("bswap64", true)
          .Case("goto", true)
          .Case("gotol", true)
+        .Case("goto_or_nop", true)
          .Case("ll", true)
          .Case("skb", true)
          .Case("s", true)
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td 
b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 5972c9d49c51..a953d10429bf 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr, 
list<dag> Pattern>
    let BPFClass = BPF_JMP;
  }

+class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
+    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
+                   (outs),
+                   (ins brtarget:$BrDst),
+                   !strconcat(OpcodeStr, " $BrDst"),
+                   Pattern> {
+  bits<16> BrDst;
+
+  let Inst{47-32} = BrDst;
+  let Inst{31-0} = 1;
+  let BPFClass = BPF_JMP;
+}
+
  class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
      : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
                     (outs),
@@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
  let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
    def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
    def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
+  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;
  }

  // Jump and link


And an example,

[ ~/tmp1/gotol]$ cat t.c
int bar(void);
int foo()
{
         int a, b;

         asm volatile goto ("r0 = 0; \
                             goto_or_nop %l[label]; \
                             r2 = 2; \
                             r3 = 3; \
"::::label);
         a = bar();
label:
         b = 20 * a;
         return b;
}
[ ~/tmp1/gotol]$ clang --target=bpf -O2 -S t.c
[ ~/tmp1/gotol]$ cat t.s
         .text
         .file   "t.c"
         .globl  foo                             # -- Begin function foo
         .p2align        3
         .type   foo,@function
foo:                                    # @foo
# %bb.0:                                # %entry
         r0 = 0
         #APP
         r0 = 0
         goto_or_nop LBB0_2
         r2 = 2
         r3 = 3

         #NO_APP
# %bb.1:                                # %asm.fallthrough
         call bar
         r0 *= 20
LBB0_2:                                 # Block address taken
                                         # %label
                                         # Label of block must be emitted
         exit
.Lfunc_end0:
         .size   foo, .Lfunc_end0-foo
                                         # -- End function
         .addrsig
[ ~/tmp1/gotol]$ clang --target=bpf -O2 -c t.c
[ ~/tmp1/gotol]$ llvm-objdump -dr t.o

t.o:    file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo>:
        0:       b7 00 00 00 00 00 00 00 r0 = 0x0
        1:       b7 00 00 00 00 00 00 00 r0 = 0x0
        2:       05 00 04 00 01 00 00 00 goto_or_nop +0x4 <LBB0_2>
        3:       b7 02 00 00 02 00 00 00 r2 = 0x2
        4:       b7 03 00 00 03 00 00 00 r3 = 0x3
        5:       85 10 00 00 ff ff ff ff call -0x1
                 0000000000000028:  R_BPF_64_32  bar
        6:       27 00 00 00 14 00 00 00 r0 *= 0x14

0000000000000038 <LBB0_2>:
        7:       95 00 00 00 00 00 00 00 exit
[ ~/tmp1/gotol]$



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-09  5:04               ` Yonghong Song
@ 2023-12-09 17:18                 ` Alexei Starovoitov
  2023-12-10  6:32                   ` Yonghong Song
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-09 17:18 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, Anton Protopopov, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf

On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
> > On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> >>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
> >>> <andrii.nakryiko@gmail.com> wrote:
> >>>> I feel like embedding some sort of ID inside the instruction is very..
> >>>> unusual, shall we say?
> >>> yeah. no magic numbers inside insns pls.
> >>>
> >>> I don't like JA_CFG name, since I read CFG as control flow graph,
> >>> while you probably meant CFG as configurable.
> >>> How about BPF_JA_OR_NOP ?
> >>> Then in combination with BPF_JMP or BPF_JMP32 modifier
> >>> the insn->off|imm will be used.
> >>> 1st bit in src_reg can indicate the default action: nop or jmp.
> >>> In asm it may look like asm("goto_or_nop +5")
> >> How does the C source code looks like in order to generate
> >> BPF_JA_OR_NOP insn? Any source examples?
> > It will be in inline asm only. The address of that insn will
> > be taken either via && or via asm (".long %l[label]").
> >  From llvm pov both should go through the same relo creation logic. I hope :)
>
> A hack in llvm below with an example, could you check whether the C
> syntax and object dump result
> is what you want to see?

Thank you for the ultra quick llvm diff!

>
> diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> index 90697c6645be..38b1cbc31f9a 100644
> --- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> +++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> @@ -231,6 +231,7 @@ public:
>           .Case("call", true)
>           .Case("goto", true)
>           .Case("gotol", true)
> +        .Case("goto_or_nop", true)
>           .Case("*", true)
>           .Case("exit", true)
>           .Case("lock", true)
> @@ -259,6 +260,7 @@ public:
>           .Case("bswap64", true)
>           .Case("goto", true)
>           .Case("gotol", true)
> +        .Case("goto_or_nop", true)
>           .Case("ll", true)
>           .Case("skb", true)
>           .Case("s", true)
> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td
> b/llvm/lib/Target/BPF/BPFInstrInfo.td
> index 5972c9d49c51..a953d10429bf 100644
> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
> @@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr,
> list<dag> Pattern>
>     let BPFClass = BPF_JMP;
>   }
>
> +class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
> +    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
> +                   (outs),
> +                   (ins brtarget:$BrDst),
> +                   !strconcat(OpcodeStr, " $BrDst"),
> +                   Pattern> {
> +  bits<16> BrDst;
> +
> +  let Inst{47-32} = BrDst;
> +  let Inst{31-0} = 1;
> +  let BPFClass = BPF_JMP;
> +}
> +
>   class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
>       : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
>                      (outs),
> @@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
>   let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
>     def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
>     def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
> +  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;

I was thinking of burning the new 0xE opcode for it,
but you're right. It's a flavor of the existing JA insn and it's indeed
better to just use the src_reg=1 bit to indicate so.

We probably need to use the 2nd bit of src_reg to indicate its default state
(jmp or fallthrough).

>          asm volatile goto ("r0 = 0; \
>                              goto_or_nop %l[label]; \
>                              r2 = 2; \
>                              r3 = 3; \

Not sure how to represent the default state in assembly though.
"goto_or_nop" defaults to goto
"nop_or_goto" default to nop
?

Do we need "gotol" for imm32 or will it be automatic?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-09 17:18                 ` Alexei Starovoitov
@ 2023-12-10  6:32                   ` Yonghong Song
  2023-12-10 10:30                     ` Eduard Zingerman
  2023-12-11 17:31                     ` Anton Protopopov
  0 siblings, 2 replies; 32+ messages in thread
From: Yonghong Song @ 2023-12-10  6:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Anton Protopopov, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf


On 12/9/23 9:18 AM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
>>>>> <andrii.nakryiko@gmail.com> wrote:
>>>>>> I feel like embedding some sort of ID inside the instruction is very..
>>>>>> unusual, shall we say?
>>>>> yeah. no magic numbers inside insns pls.
>>>>>
>>>>> I don't like JA_CFG name, since I read CFG as control flow graph,
>>>>> while you probably meant CFG as configurable.
>>>>> How about BPF_JA_OR_NOP ?
>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier
>>>>> the insn->off|imm will be used.
>>>>> 1st bit in src_reg can indicate the default action: nop or jmp.
>>>>> In asm it may look like asm("goto_or_nop +5")
>>>> How does the C source code looks like in order to generate
>>>> BPF_JA_OR_NOP insn? Any source examples?
>>> It will be in inline asm only. The address of that insn will
>>> be taken either via && or via asm (".long %l[label]").
>>>   From llvm pov both should go through the same relo creation logic. I hope :)
>> A hack in llvm below with an example, could you check whether the C
>> syntax and object dump result
>> is what you want to see?
> Thank you for the ultra quick llvm diff!
>
>> diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>> b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>> index 90697c6645be..38b1cbc31f9a 100644
>> --- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>> +++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>> @@ -231,6 +231,7 @@ public:
>>            .Case("call", true)
>>            .Case("goto", true)
>>            .Case("gotol", true)
>> +        .Case("goto_or_nop", true)
>>            .Case("*", true)
>>            .Case("exit", true)
>>            .Case("lock", true)
>> @@ -259,6 +260,7 @@ public:
>>            .Case("bswap64", true)
>>            .Case("goto", true)
>>            .Case("gotol", true)
>> +        .Case("goto_or_nop", true)
>>            .Case("ll", true)
>>            .Case("skb", true)
>>            .Case("s", true)
>> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td
>> b/llvm/lib/Target/BPF/BPFInstrInfo.td
>> index 5972c9d49c51..a953d10429bf 100644
>> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
>> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
>> @@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr,
>> list<dag> Pattern>
>>      let BPFClass = BPF_JMP;
>>    }
>>
>> +class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
>> +    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
>> +                   (outs),
>> +                   (ins brtarget:$BrDst),
>> +                   !strconcat(OpcodeStr, " $BrDst"),
>> +                   Pattern> {
>> +  bits<16> BrDst;
>> +
>> +  let Inst{47-32} = BrDst;
>> +  let Inst{31-0} = 1;
>> +  let BPFClass = BPF_JMP;
>> +}
>> +
>>    class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
>>        : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
>>                       (outs),
>> @@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
>>    let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
>>      def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
>>      def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
>> +  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;
> I was thinking of burning the new 0xE opcode for it,
> but you're right. It's a flavor of existing JA insn and it's indeed
> better to just use src_reg=1 bit to indicate so.

Right, using src_reg to indicate a new flavor of the JA insn sounds
like a good idea. My previously-used 'imm' field is a pure hack.

>
> We probably need to use the 2nd bit of src_reg to indicate its default state
> (jmp or fallthrough).

Good point.

>
>>           asm volatile goto ("r0 = 0; \
>>                               goto_or_nop %l[label]; \
>>                               r2 = 2; \
>>                               r3 = 3; \
> Not sure how to represent the default state in assembly though.
> "goto_or_nop" defaults to goto
> "nop_or_goto" default to nop
> ?
>
> Do we need "gotol" for imm32 or will it be automatic?

It won't be automatic.

At the end of this email, I will show the new change
adding the gotol_or_nop and nop_or_gotol insns, and an example
showing them in asm. But there is an issue here.
In my example, the compiler (more specifically
the InstCombine pass) moved some code after
the 'label' to before the 'label'. Not exactly
sure how to prevent this. Maybe the current
'asm goto' support already has a way to handle
this. Will investigate this later.


=========================

$ cat t.c
int bar(void);
int foo1()
{
         int a, b;
                                                                                                                                                                             
         asm volatile goto ("r0 = 0; \
                             gotol_or_nop %l[label]; \
                             r2 = 2; \
                             r3 = 3; \
                            "::::label);
         a = bar();
label:
         b = 20 * a;
         return b;
}
int foo2()
{
         int a, b;
                                                                                                                                                                             
         asm volatile goto ("r0 = 0; \
                             nop_or_gotol %l[label]; \
                             r2 = 2; \
                             r3 = 3; \
                            "::::label);
         a = bar();
label:
         b = 20 * a;
         return b;
}
$ clang --target=bpf -O2 -g -c t.c
$ llvm-objdump -S t.o

t.o:    file format elf64-bpf

Disassembly of section .text:

0000000000000000 <foo1>:
; {
        0:       b7 00 00 00 00 00 00 00 r0 = 0x0
;       asm volatile goto ("r0 = 0; \
        1:       b7 00 00 00 00 00 00 00 r0 = 0x0
        2:       06 10 00 00 04 00 00 00 gotol_or_nop +0x4 <LBB0_2>
        3:       b7 02 00 00 02 00 00 00 r2 = 0x2
        4:       b7 03 00 00 03 00 00 00 r3 = 0x3
;       a = bar();
        5:       85 10 00 00 ff ff ff ff call -0x1
;       b = 20 * a;
        6:       27 00 00 00 14 00 00 00 r0 *= 0x14

0000000000000038 <LBB0_2>:
;       return b;
        7:       95 00 00 00 00 00 00 00 exit

0000000000000040 <foo2>:
; {
        8:       b7 00 00 00 00 00 00 00 r0 = 0x0
;       asm volatile goto ("r0 = 0; \
        9:       b7 00 00 00 00 00 00 00 r0 = 0x0
       10:       06 20 00 00 04 00 00 00 nop_or_gotol +0x4 <LBB1_2>
       11:       b7 02 00 00 02 00 00 00 r2 = 0x2
       12:       b7 03 00 00 03 00 00 00 r3 = 0x3
;       a = bar();
       13:       85 10 00 00 ff ff ff ff call -0x1
;       b = 20 * a;
       14:       27 00 00 00 14 00 00 00 r0 *= 0x14

0000000000000078 <LBB1_2>:
;       return b;
       15:       95 00 00 00 00 00 00 00 exit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-10  6:32                   ` Yonghong Song
@ 2023-12-10 10:30                     ` Eduard Zingerman
  2023-12-11  3:33                       ` Alexei Starovoitov
  2023-12-11 17:31                     ` Anton Protopopov
  1 sibling, 1 reply; 32+ messages in thread
From: Eduard Zingerman @ 2023-12-10 10:30 UTC (permalink / raw)
  To: Yonghong Song, Alexei Starovoitov, Anton Protopopov
  Cc: Andrii Nakryiko, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa, Martin KaFai Lau, Stanislav Fomichev,
	bpf

How about a slightly different modification of Anton's idea.
Suppose that, as before, there is a special map type:

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __type(key, __u32);
        __type(value, __u32);
        __uint(map_flags, BPF_F_STATIC_KEY);
        __uint(max_entries, 1);
    } skey1 SEC(".maps");

Which is used as below:

    __attribute__((naked))
    int foo(void) {
      asm volatile (
                    "r0 = %[skey1] ll;"
                    "if r0 != r0 goto 1f;"
                    "r1 = r10;"
                    "r1 += -8;"
                    "r2 = 1;"
                    "call %[bpf_trace_printk];"
            "1:"
                    "exit;"
                    :: __imm_addr(skey1),
                       __imm(bpf_trace_printk)
                    : __clobber_all
      );
    }

Disassembly of section .text:

0000000000000000 <foo>:
       0:   r0 = 0x0 ll
        0000000000000000:  R_BPF_64_64  skey1  ;; <---- Map relocation as usual
       2:   if r0 == r0 goto +0x4 <foo+0x38>   ;; <---- Note condition
       3:   r1 = r10
       4:   r1 += -0x8
       5:   r2 = 0x1
       6:   call 0x6
       7:   exit

And suppose that verifier is modified in the following ways:
- treat instructions "if rX == rX" / "if rX != rX" (when rX points to
  static key map) in a special way:
  - when program is verified, the jump is considered non deterministic;
  - when program is jitted, the jump is compiled as nop for "!=" and as
    unconditional jump for "==";
- build a table of static keys based on a specific map referenced in
  condition, e.g. for the example above it can be inferred that insn 2
  associates with map skey1 because "r0" points to "skey1";
- jit "rX = <static key> ll;" as nop;

On the plus side:
- jump tables of any kind are omitted from the system call;
- no new instruction is needed;
- almost no modifications to libbpf are necessary (only a helper macro
  to convince clang to keep "if rX == rX");

What do you think?

Thanks,
Eduard

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-10 10:30                     ` Eduard Zingerman
@ 2023-12-11  3:33                       ` Alexei Starovoitov
  2023-12-11 18:49                         ` Andrii Nakryiko
  2023-12-12 10:25                         ` Anton Protopopov
  0 siblings, 2 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-11  3:33 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Yonghong Song, Anton Protopopov, Andrii Nakryiko,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> How about a slightly different modification of the Anton's idea.
> Suppose that, as before, there is a special map type:
>
>     struct {
>         __uint(type, BPF_MAP_TYPE_ARRAY);
>         __type(key, __u32);
>         __type(value, __u32);
>         __uint(map_flags, BPF_F_STATIC_KEY);
>         __uint(max_entries, 1);
>     } skey1 SEC(".maps")

Instead of a special map that the kernel has to know about,
the same intent can be expressed with:
int skey1;
r0 = %[skey1] ll;
and then the kernel needs no extra map type, while user space
can collect all static_branches that use &skey1 by
iterating the insn stream and comparing addresses.
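
A sketch of that user-space scan could look like the function below (how
the 64-bit value that &skey1 relocates to is obtained, e.g. from ELF
relocations or libbpf internals, is deliberately left out; all names are
made up):

  #include <linux/bpf.h>

  /* return indices of the insns that immediately follow an
   * "rX = &skey1 ll" (ld_imm64) whose value equals skey_addr */
  static int collect_static_branches(const struct bpf_insn *insns, int insn_cnt,
                                     __u64 skey_addr, int *out, int out_max)
  {
          int i, n = 0;

          for (i = 0; i < insn_cnt - 1; i++) {
                  __u64 imm64;

                  if (insns[i].code != (BPF_LD | BPF_DW | BPF_IMM))
                          continue;
                  /* ld_imm64 spans two insns: low 32 bits in insns[i].imm,
                   * high 32 bits in insns[i + 1].imm */
                  imm64 = (__u32)insns[i].imm |
                          ((__u64)(__u32)insns[i + 1].imm << 32);
                  if (imm64 == skey_addr && n < out_max)
                          out[n++] = i + 2; /* the branch right after the pair */
                  i++;                      /* skip second half of ld_imm64 */
          }
          return n;
  }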

> Which is used as below:
>
>     __attribute__((naked))
>     int foo(void) {
>       asm volatile (
>                     "r0 = %[skey1] ll;"
>                     "if r0 != r0 goto 1f;"
>                     "r1 = r10;"
>                     "r1 += -8;"
>                     "r2 = 1;"
>                     "call %[bpf_trace_printk];"
>             "1:"
>                     "exit;"
>                     :: __imm_addr(skey1),
>                        __imm(bpf_trace_printk)
>                     : __clobber_all
>       );
>     }
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
>        0:   r0 = 0x0 ll
>         0000000000000000:  R_BPF_64_64  skey1  ;; <---- Map relocation as usual
>        2:   if r0 == r0 goto +0x4 <foo+0x38>   ;; <---- Note condition
>        3:   r1 = r10
>        4:   r1 += -0x8
>        5:   r2 = 0x1
>        6:   call 0x6
>        7:   exit
>
> And suppose that verifier is modified in the following ways:
> - treat instructions "if rX == rX" / "if rX != rX" (when rX points to
>   static key map) in a special way:
>   - when program is verified, the jump is considered non deterministic;
>   - when program is jitted, the jump is compiled as nop for "!=" and as
>     unconditional jump for "==";
> - build a table of static keys based on a specific map referenced in
>   condition, e.g. for the example above it can be inferred that insn 2
>   associates with map skey1 because "r0" points to "skey1";
> - jit "rX = <static key> ll;" as nop;
>
> On the plus side:
> - any kinds of jump tables are omitted from system call;
> - no new instruction is needed;
> - almost no modifications to libbpf are necessary (only a helper macro
>   to convince clang to keep "if rX == rX");

Reusing an existing insn means that we're giving it a new meaning
and that always comes with the danger of breaking existing progs.
In this case "if rX == rX" isn't very meaningful and the new semantics
shouldn't break anything, but it's a danger zone.

If we treat:
if r0 == r0
as JA
then we have to treat
if r1 == r1
as JA as well and it becomes ambiguous when prog_info needs
to return the insns back to user space.

If we go with rX == rX approach we should probably limit it
to one specific register. r0, r10, r11 can be considered
and they have their own pros and cons.

Additionally:
r0 = %[skey1] ll
in front of JE/JNE is a waste. If we JIT it to a useless native insn
we will be burning CPU for no reason. So we should probably
optimize it out. If we do so, then this inline insn becomes a nop and
it's effectively a relocation. The insn stream will carry this
rX = 64bit_const insn to indicate the scope of the next insn.
It's pretty much like Anton's idea of using extra bits in JA
to encode an integer key_id.
With ld_imm64 we will encode a 64-bit key_id.
Another insn with more bits to burn that has no effect on execution.

It doesn't look clean to encode so much extra metadata into instructions
that JITs and the interpreter have to ignore.
If we go this route:
  r11 = 64bit_const
  if r11 == r11 goto
is a lesser evil.
Still, it's not as clean as JA with extra bits in src_reg.
We already optimize JA +0 into a nop. See opt_remove_nops().
So a flavor of JA insn looks the most natural fit for a selectable
JA +xx or JA +0.

And the special map really doesn't fit.
Whatever we do, let's keep text_poke-able insn logic separate
from bookkeeping of addresses of those insns.
I think a special prefixed section that is understood by libbpf
(like what I proposed with "name.static_branch") will do fine.
If it's not good enough we can add a "set" map type
that will be a generic set of values.
It can be a set of 8-byte addresses to keep locations of static_branches,
but let's keep it generic.
I think it's fine to add:
__uint(type, BPF_MAP_TYPE_SET)
and let libbpf populate it with addresses of insns,
or addresses of variables, or other values
when it prepares a program for loading.
But map_update_elem should never be doing text_poke on insns.
We added the prog_array map type in the past, but that was done
during the early days. If we were designing bpf today we would have
gone a different route.
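
On the BPF C side such a set could be declared roughly as below
(BPF_MAP_TYPE_SET does not exist, this is only a sketch of the idea):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* hypothetical generic "set" map: libbpf would fill it with values
   * (here, insn "addresses") while preparing the program for loading */
  struct {
          __uint(type, BPF_MAP_TYPE_SET);   /* not an existing map type */
          __type(value, __u64);             /* e.g. prog_id << 32 | insn_idx */
          __uint(max_entries, 16);
  } static_branches SEC(".maps");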

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-10  6:32                   ` Yonghong Song
  2023-12-10 10:30                     ` Eduard Zingerman
@ 2023-12-11 17:31                     ` Anton Protopopov
  2023-12-11 19:08                       ` Alexei Starovoitov
  2023-12-11 21:51                       ` Yonghong Song
  1 sibling, 2 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-11 17:31 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Andrii Nakryiko, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf

On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote:
> 
> On 12/9/23 9:18 AM, Alexei Starovoitov wrote:
> > On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> > > 
> > > On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
> > > > On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> > > > > On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
> > > > > > On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
> > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > > I feel like embedding some sort of ID inside the instruction is very..
> > > > > > > unusual, shall we say?
> > > > > > yeah. no magic numbers inside insns pls.
> > > > > > 
> > > > > > I don't like JA_CFG name, since I read CFG as control flow graph,
> > > > > > while you probably meant CFG as configurable.
> > > > > > How about BPF_JA_OR_NOP ?
> > > > > > Then in combination with BPF_JMP or BPF_JMP32 modifier
> > > > > > the insn->off|imm will be used.
> > > > > > 1st bit in src_reg can indicate the default action: nop or jmp.
> > > > > > In asm it may look like asm("goto_or_nop +5")
> > > > > How does the C source code looks like in order to generate
> > > > > BPF_JA_OR_NOP insn? Any source examples?
> > > > It will be in inline asm only. The address of that insn will
> > > > be taken either via && or via asm (".long %l[label]").
> > > >   From llvm pov both should go through the same relo creation logic. I hope :)
> > > A hack in llvm below with an example, could you check whether the C
> > > syntax and object dump result
> > > is what you want to see?
> > Thank you for the ultra quick llvm diff!
> > 
> > > diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> > > b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> > > index 90697c6645be..38b1cbc31f9a 100644
> > > --- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> > > +++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
> > > @@ -231,6 +231,7 @@ public:
> > >            .Case("call", true)
> > >            .Case("goto", true)
> > >            .Case("gotol", true)
> > > +        .Case("goto_or_nop", true)
> > >            .Case("*", true)
> > >            .Case("exit", true)
> > >            .Case("lock", true)
> > > @@ -259,6 +260,7 @@ public:
> > >            .Case("bswap64", true)
> > >            .Case("goto", true)
> > >            .Case("gotol", true)
> > > +        .Case("goto_or_nop", true)
> > >            .Case("ll", true)
> > >            .Case("skb", true)
> > >            .Case("s", true)
> > > diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td
> > > b/llvm/lib/Target/BPF/BPFInstrInfo.td
> > > index 5972c9d49c51..a953d10429bf 100644
> > > --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
> > > +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
> > > @@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr,
> > > list<dag> Pattern>
> > >      let BPFClass = BPF_JMP;
> > >    }
> > > 
> > > +class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
> > > +    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
> > > +                   (outs),
> > > +                   (ins brtarget:$BrDst),
> > > +                   !strconcat(OpcodeStr, " $BrDst"),
> > > +                   Pattern> {
> > > +  bits<16> BrDst;
> > > +
> > > +  let Inst{47-32} = BrDst;
> > > +  let Inst{31-0} = 1;
> > > +  let BPFClass = BPF_JMP;
> > > +}
> > > +
> > >    class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
> > >        : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
> > >                       (outs),
> > > @@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
> > >    let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
> > >      def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
> > >      def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
> > > +  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;
> > I was thinking of burning the new 0xE opcode for it,
> > but you're right. It's a flavor of existing JA insn and it's indeed
> > better to just use src_reg=1 bit to indicate so.
> 
> Right, using src_reg to indicate a new flavor of JA insn sounds
> a good idea. My previously-used 'imm' field is a pure hack.
> 
> > 
> > We probably need to use the 2nd bit of src_reg to indicate its default state
> > (jmp or fallthrough).
> 
> Good point.
> 
> > 
> > >           asm volatile goto ("r0 = 0; \
> > >                               goto_or_nop %l[label]; \
> > >                               r2 = 2; \
> > >                               r3 = 3; \
> > Not sure how to represent the default state in assembly though.
> > "goto_or_nop" defaults to goto
> > "nop_or_goto" default to nop
> > ?
> > 
> > Do we need "gotol" for imm32 or will it be automatic?
> 
> It won't be automatic.
> 
> At the end of this email, I will show the new change
> to have gotol_or_nop and nop_or_gotol insn and an example

Thanks a lot Yonghong! May I ask you to send a full patch for LLVM
(with gotol) so that I can test it?

Overall, I think that JA + flags in SRC_REG is indeed better than a
new instruction, as no new opcode is used up.

It looks to me like two bits aren't enough and a third is
required, as the second bit seems to be overloaded:

  * bit 1 indicates that this is a "JA_MAYBE"
  * bit 2 indicates a jump or nop (i.e., the current state)

However, we also need another bit which indicates what to do with the
instruction when we issue [an abstract] command

  flip_branch_on_or_off(branch, 0/1)

Without this information (and in the absence of external meta-data on
how to patch the branch) we can't determine what a given (BPF, not
jitted) program currently does. For example, if we issue

  flip_branch_on_or_off(branch, 0)

then we can't reflect this in the xlated program by setting the second
bit to jmp/off. Again, the JITted program is fine, but it will be
desynchronized from xlated in terms of logic (some instructions will be
mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP).

In my original patch we kept this triplet as

  (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse)
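
Spelled out as flag bits, that triplet would be something like the
following (names invented here only to make the argument concrete):

  /* sketch of the three independent pieces of state argued for above */
  #define BPF_JA_MAYBE   (1 << 0) /* this JA is a patchable static branch  */
  #define BPF_JA_IS_JUMP (1 << 1) /* current state: 1 = jmp, 0 = nop       */
  #define BPF_JA_INVERSE (1 << 2) /* "key enabled" means nop here, not jmp */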

> to show it in asm. But there is an issue here.
> In my example, the compiler (more specifically
> the InstCombine pass) moved some code after
> the 'label' to before the 'label'. Not exactly
> sure how to prevent this. Maybe current
> 'asm goto' already have a way to handle
> this. Will investigate this later.
> 
> 
> =========================
> 
> $ cat t.c
> int bar(void);
> int foo1()
> {
>         int a, b;
>         asm volatile goto ("r0 = 0; \
>                             gotol_or_nop %l[label]; \
>                             r2 = 2; \
>                             r3 = 3; \
>                            "::::label);
>         a = bar();
> label:
>         b = 20 * a;
>         return b;
> }
> int foo2()
> {
>         int a, b;
>         asm volatile goto ("r0 = 0; \
>                             nop_or_gotol %l[label]; \
>                             r2 = 2; \
>                             r3 = 3; \
>                            "::::label);
>         a = bar();
> label:
>         b = 20 * a;
>         return b;
> }
> $ clang --target=bpf -O2 -g -c t.c
> $ llvm-objdump -S t.o
> 
> t.o:    file format elf64-bpf
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo1>:
> ; {
>        0:       b7 00 00 00 00 00 00 00 r0 = 0x0
> ;       asm volatile goto ("r0 = 0; \
>        1:       b7 00 00 00 00 00 00 00 r0 = 0x0
>        2:       06 10 00 00 04 00 00 00 gotol_or_nop +0x4 <LBB0_2>
>        3:       b7 02 00 00 02 00 00 00 r2 = 0x2
>        4:       b7 03 00 00 03 00 00 00 r3 = 0x3
> ;       a = bar();
>        5:       85 10 00 00 ff ff ff ff call -0x1
> ;       b = 20 * a;
>        6:       27 00 00 00 14 00 00 00 r0 *= 0x14
> 
> 0000000000000038 <LBB0_2>:
> ;       return b;
>        7:       95 00 00 00 00 00 00 00 exit
> 
> 0000000000000040 <foo2>:
> ; {
>        8:       b7 00 00 00 00 00 00 00 r0 = 0x0
> ;       asm volatile goto ("r0 = 0; \
>        9:       b7 00 00 00 00 00 00 00 r0 = 0x0
>       10:       06 20 00 00 04 00 00 00 nop_or_gotol +0x4 <LBB1_2>
>       11:       b7 02 00 00 02 00 00 00 r2 = 0x2
>       12:       b7 03 00 00 03 00 00 00 r3 = 0x3
> ;       a = bar();
>       13:       85 10 00 00 ff ff ff ff call -0x1
> ;       b = 20 * a;
>       14:       27 00 00 00 14 00 00 00 r0 *= 0x14
> 
> 0000000000000078 <LBB1_2>:
> ;       return b;
>       15:       95 00 00 00 00 00 00 00 exit
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-11  3:33                       ` Alexei Starovoitov
@ 2023-12-11 18:49                         ` Andrii Nakryiko
  2023-12-12 10:25                         ` Anton Protopopov
  1 sibling, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2023-12-11 18:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Yonghong Song, Anton Protopopov,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Sun, Dec 10, 2023 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > How about a slightly different modification of the Anton's idea.
> > Suppose that, as before, there is a special map type:
> >
> >     struct {
> >         __uint(type, BPF_MAP_TYPE_ARRAY);
> >         __type(key, __u32);
> >         __type(value, __u32);
> >         __uint(map_flags, BPF_F_STATIC_KEY);
> >         __uint(max_entries, 1);
> >     } skey1 SEC(".maps")
>
> Instead of special map that the kernel has to know about
> the same intent can be expressed with:
> int skey1;
> r0 = %[skey1] ll;
> and then the kernel needs no extra map type while the user space
> can collect all static_branches that use &skey1 by
> iterating insn stream and comparing addresses.
>
> > Which is used as below:
> >
> >     __attribute__((naked))
> >     int foo(void) {
> >       asm volatile (
> >                     "r0 = %[skey1] ll;"
> >                     "if r0 != r0 goto 1f;"
> >                     "r1 = r10;"
> >                     "r1 += -8;"
> >                     "r2 = 1;"
> >                     "call %[bpf_trace_printk];"
> >             "1:"
> >                     "exit;"
> >                     :: __imm_addr(skey1),
> >                        __imm(bpf_trace_printk)
> >                     : __clobber_all
> >       );
> >     }
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <foo>:
> >        0:   r0 = 0x0 ll
> >         0000000000000000:  R_BPF_64_64  skey1  ;; <---- Map relocation as usual
> >        2:   if r0 == r0 goto +0x4 <foo+0x38>   ;; <---- Note condition
> >        3:   r1 = r10
> >        4:   r1 += -0x8
> >        5:   r2 = 0x1
> >        6:   call 0x6
> >        7:   exit
> >
> > And suppose that verifier is modified in the following ways:
> > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to
> >   static key map) in a special way:
> >   - when program is verified, the jump is considered non deterministic;
> >   - when program is jitted, the jump is compiled as nop for "!=" and as
> >     unconditional jump for "==";
> > - build a table of static keys based on a specific map referenced in
> >   condition, e.g. for the example above it can be inferred that insn 2
> >   associates with map skey1 because "r0" points to "skey1";
> > - jit "rX = <static key> ll;" as nop;
> >
> > On the plus side:
> > - any kinds of jump tables are omitted from system call;
> > - no new instruction is needed;
> > - almost no modifications to libbpf are necessary (only a helper macro
> >   to convince clang to keep "if rX == rX");
>
> Reusing existing insn means that we're giving it new meaning
> and that always comes with danger of breaking existing progs.
> In this case if rX == rX isn't very meaningful and new semantics
> shouldn't break anything, but it's a danger zone.
>
> If we treat:
> if r0 == r0
> as JA
> then we have to treat
> if r1 == r1
> as JA as well and it becomes ambiguous when prog_info needs
> to return the insns back to user space.
>
> If we go with rX == rX approach we should probably limit it
> to one specific register. r0, r10, r11 can be considered
> and they have their own pros and cons.
>
> Additional:
> r0 = %[skey1] ll
> in front of JE/JNE is a waste. If we JIT it to useless native insn
> we will be burning cpu for no reason. So we should probably
> optimize it out. If we do so, then this inline insn becomes a nop and
> it's effectively a relocation. The insn stream will carry this
> rX = 64bit_const insn to indicate the scope of the next insn.
> It's pretty much like Anton's idea of using extra bits in JA
> to encode an integer key_id.
> With ld_imm64 we will encode 64-bit key_id.
> Another insn with more bits to burn that has no effect on execution.
>
> It doesn't look clean to encode so much extra metadata into instructions
> that JITs and the interpreter have to ignore.
> If we go this route:
>   r11 = 64bit_const
>   if r11 == r11 goto
> is a lesser evil.
> Still, it's not as clean as JA with extra bits in src_reg.
> We already optimize JA +0 into a nop. See opt_remove_nops().
> So a flavor of JA insn looks the most natural fit for a selectable
> JA +xx or JA +0.

Another point in favor of a special jump instruction is that libbpf can be
taught to patch it into a normal JA or a NOP on old kernels, depending
on the likely/unlikely bit. You won't be able to flip it, of course, but
at least you don't have to compile two separate versions of the BPF
object file. Instead of "jump maybe" it will transparently become
"jump/don't jump for sure". This should definitely help with backwards
compatibility.
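
A sketch of that libbpf-side fallback, reusing the illustrative flag
names from earlier in the thread (not real uapi; how the kernel support
is detected is left out):

  #include <stdbool.h>
  #include <linux/bpf.h>

  #define BPF_STATIC_BRANCH      (1 << 0) /* illustrative only */
  #define BPF_STATIC_BRANCH_JUMP (1 << 1)

  static void fixup_static_branch(struct bpf_insn *insn, bool kernel_support)
  {
          if (kernel_support || !(insn->src_reg & BPF_STATIC_BRANCH))
                  return;
          if (insn->src_reg & BPF_STATIC_BRANCH_JUMP)
                  insn->src_reg = 0;            /* plain JA to the same target */
          else
                  *insn = (struct bpf_insn) {   /* JA +0, i.e. a nop */
                          .code = BPF_JMP | BPF_JA,
                  };
  }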

>
> And the special map really doesn't fit.
> Whatever we do, let's keep text_poke-able insn logic separate
> from bookkeeping of addresses of those insns.
> I think a special prefixed section that is understood by libbpf
> (like what I proposed with "name.static_branch") will do fine.
> If it's not good enough we can add a "set" map type
> that will be a generic set of values.
> It can be a set of 8-byte addresses to keep locations of static_branches,
> but let's keep it generic.
> I think it's fine to add:
> __uint(type, BPF_MAP_TYPE_SET)
> and let libbpf populate it with addresses of insns,
> or address of variables, or other values
> when it prepares a program for loading.
> But map_update_elem should never be doing text_poke on insns.
> We added prog_array map type is the past, but that was done
> during the early days. If we were designing bpf today we would have
> gone a different route.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-11 17:31                     ` Anton Protopopov
@ 2023-12-11 19:08                       ` Alexei Starovoitov
  2023-12-12  9:06                         ` Anton Protopopov
  2023-12-11 21:51                       ` Yonghong Song
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-11 19:08 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf

On Mon, Dec 11, 2023 at 9:34 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
>
> This looks for me that two bits aren't enough, and the third is
> required, as the second bit seems to be overloaded:
>
>   * bit 1 indicates that this is a "JA_MAYBE"
>   * bit 2 indicates a jump or nop (i.e., the current state)
>
> However, we also need another bit which indicates what to do with the
> instruction when we issue [an abstract] command
>
>   flip_branch_on_or_off(branch, 0/1)
>
> Without this information (and in the absense of external meta-data on
> how to patch the branch) we can't determine what a given (BPF, not
> jitted) program currently does. For example, if we issue
>
>   flip_branch_on_or_off(branch, 0)
>
> then we can't reflect this in the xlated program by setting the second
> bit to jmp/off. Again, JITted program is fine, but it will be
> desynchronized from xlated in term of logic (some instructions will be
> mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP).

Not following the need for the 3rd bit.
The 2nd bit is not only the initial state, but the current state too.

When user space does static_branch_enable it will set the 2nd bit to 1
(if it wasn't set) and will text_poke_bp the code.
xlated will always be in sync with JITed.
No ambiguity.

An annoying part is that the bpf insn stream is read-only, so we cannot
really write that 2nd bit. We can virtually write it.
Seeing nop or jmp in JITed code would mean the bit is 0 or 1.
So xlated dump will report it.

Separately, the kernel doesn't have a static_branch_switch/flip command.
It's only enable or disable. I think it's better to keep it the same way.
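
For the record, the x86 patching step that static_branch_enable would
boil down to could look roughly like the sketch below. This is not the
code from patch 5 of this series; it only assumes the JIT left a 5-byte
nop (or a 5-byte jmp) at ip for the static branch:

  #include <linux/memory.h>
  #include <linux/mutex.h>
  #include <linux/string.h>
  #include <asm/text-patching.h>
  #include <asm/nops.h>

  static void bpf_poke_static_branch(void *ip, void *target, bool enable)
  {
          u8 insn[5];

          if (enable) {
                  s32 rel = (s32)((long)target - ((long)ip + 5));

                  insn[0] = 0xe9;               /* jmp rel32 */
                  memcpy(&insn[1], &rel, sizeof(rel));
          } else {
                  memcpy(insn, x86_nops[5], 5); /* 5-byte nop */
          }

          mutex_lock(&text_mutex);
          text_poke_bp(ip, insn, 5, NULL);
          mutex_unlock(&text_mutex);
  }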

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-11 17:31                     ` Anton Protopopov
  2023-12-11 19:08                       ` Alexei Starovoitov
@ 2023-12-11 21:51                       ` Yonghong Song
  2023-12-11 22:52                         ` Yonghong Song
  1 sibling, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-12-11 21:51 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, Andrii Nakryiko, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf


On 12/11/23 9:31 AM, Anton Protopopov wrote:
> On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote:
>> On 12/9/23 9:18 AM, Alexei Starovoitov wrote:
>>> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
>>>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
>>>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
>>>>>>> <andrii.nakryiko@gmail.com> wrote:
>>>>>>>> I feel like embedding some sort of ID inside the instruction is very..
>>>>>>>> unusual, shall we say?
>>>>>>> yeah. no magic numbers inside insns pls.
>>>>>>>
>>>>>>> I don't like JA_CFG name, since I read CFG as control flow graph,
>>>>>>> while you probably meant CFG as configurable.
>>>>>>> How about BPF_JA_OR_NOP ?
>>>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier
>>>>>>> the insn->off|imm will be used.
>>>>>>> 1st bit in src_reg can indicate the default action: nop or jmp.
>>>>>>> In asm it may look like asm("goto_or_nop +5")
>>>>>> How does the C source code looks like in order to generate
>>>>>> BPF_JA_OR_NOP insn? Any source examples?
>>>>> It will be in inline asm only. The address of that insn will
>>>>> be taken either via && or via asm (".long %l[label]").
>>>>>    From llvm pov both should go through the same relo creation logic. I hope :)
>>>> A hack in llvm below with an example, could you check whether the C
>>>> syntax and object dump result
>>>> is what you want to see?
>>> Thank you for the ultra quick llvm diff!
>>>
>>>> diff --git a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>>>> b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>>>> index 90697c6645be..38b1cbc31f9a 100644
>>>> --- a/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>>>> +++ b/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
>>>> @@ -231,6 +231,7 @@ public:
>>>>             .Case("call", true)
>>>>             .Case("goto", true)
>>>>             .Case("gotol", true)
>>>> +        .Case("goto_or_nop", true)
>>>>             .Case("*", true)
>>>>             .Case("exit", true)
>>>>             .Case("lock", true)
>>>> @@ -259,6 +260,7 @@ public:
>>>>             .Case("bswap64", true)
>>>>             .Case("goto", true)
>>>>             .Case("gotol", true)
>>>> +        .Case("goto_or_nop", true)
>>>>             .Case("ll", true)
>>>>             .Case("skb", true)
>>>>             .Case("s", true)
>>>> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td
>>>> b/llvm/lib/Target/BPF/BPFInstrInfo.td
>>>> index 5972c9d49c51..a953d10429bf 100644
>>>> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
>>>> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
>>>> @@ -592,6 +592,19 @@ class BRANCH<BPFJumpOp Opc, string OpcodeStr,
>>>> list<dag> Pattern>
>>>>       let BPFClass = BPF_JMP;
>>>>     }
>>>>
>>>> +class BRANCH_OR_NOP<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
>>>> +    : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
>>>> +                   (outs),
>>>> +                   (ins brtarget:$BrDst),
>>>> +                   !strconcat(OpcodeStr, " $BrDst"),
>>>> +                   Pattern> {
>>>> +  bits<16> BrDst;
>>>> +
>>>> +  let Inst{47-32} = BrDst;
>>>> +  let Inst{31-0} = 1;
>>>> +  let BPFClass = BPF_JMP;
>>>> +}
>>>> +
>>>>     class BRANCH_LONG<BPFJumpOp Opc, string OpcodeStr, list<dag> Pattern>
>>>>         : TYPE_ALU_JMP<Opc.Value, BPF_K.Value,
>>>>                        (outs),
>>>> @@ -632,6 +645,7 @@ class CALLX<string OpcodeStr>
>>>>     let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
>>>>       def JMP : BRANCH<BPF_JA, "goto", [(br bb:$BrDst)]>;
>>>>       def JMPL : BRANCH_LONG<BPF_JA, "gotol", []>;
>>>> +  def JMP_OR_NOP : BRANCH_OR_NOP<BPF_JA, "goto_or_nop", []>;
>>> I was thinking of burning the new 0xE opcode for it,
>>> but you're right. It's a flavor of existing JA insn and it's indeed
>>> better to just use src_reg=1 bit to indicate so.
>> Right, using src_reg to indicate a new flavor of JA insn sounds
>> a good idea. My previously-used 'imm' field is a pure hack.
>>
>>> We probably need to use the 2nd bit of src_reg to indicate its default state
>>> (jmp or fallthrough).
>> Good point.
>>
>>>>            asm volatile goto ("r0 = 0; \
>>>>                                goto_or_nop %l[label]; \
>>>>                                r2 = 2; \
>>>>                                r3 = 3; \
>>> Not sure how to represent the default state in assembly though.
>>> "goto_or_nop" defaults to goto
>>> "nop_or_goto" default to nop
>>> ?
>>>
>>> Do we need "gotol" for imm32 or will it be automatic?
>> It won't be automatic.
>>
>> At the end of this email, I will show the new change
>> to have gotol_or_nop and nop_or_gotol insn and an example
> Thanks a lot Yonghong! May I ask you to send a full patch for LLVM
> (with gotol) so that I can test it?

Okay, I will send an RFC patch to llvm-project so you can do 'git fetch'
to get the patch into your local llvm-project repo and build a compiler
to test.

>
> Overall, I think that JA + flags in SRC_REG is indeed better than a
> new instruction, as a new code is not used.
>
> This looks for me that two bits aren't enough, and the third is
> required, as the second bit seems to be overloaded:
>
>    * bit 1 indicates that this is a "JA_MAYBE"
>    * bit 2 indicates a jump or nop (i.e., the current state)
>
> However, we also need another bit which indicates what to do with the
> instruction when we issue [an abstract] command
>
>    flip_branch_on_or_off(branch, 0/1)
>
> Without this information (and in the absense of external meta-data on
> how to patch the branch) we can't determine what a given (BPF, not
> jitted) program currently does. For example, if we issue
>
>    flip_branch_on_or_off(branch, 0)
>
> then we can't reflect this in the xlated program by setting the second
> bit to jmp/off. Again, JITted program is fine, but it will be
> desynchronized from xlated in term of logic (some instructions will be
> mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP).
>
> In my original patch we kept this triplet as
>
>    (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse)
[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-11 21:51                       ` Yonghong Song
@ 2023-12-11 22:52                         ` Yonghong Song
  0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-12-11 22:52 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Alexei Starovoitov, Andrii Nakryiko, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf


On 12/11/23 1:51 PM, Yonghong Song wrote:
>
> On 12/11/23 9:31 AM, Anton Protopopov wrote:
>> On Sat, Dec 09, 2023 at 10:32:42PM -0800, Yonghong Song wrote:
>>> On 12/9/23 9:18 AM, Alexei Starovoitov wrote:
>>>> On Fri, Dec 8, 2023 at 9:05 PM Yonghong Song 
>>>> <yonghong.song@linux.dev> wrote:
>>>>> On 12/8/23 8:25 PM, Alexei Starovoitov wrote:
>>>>>> On Fri, Dec 8, 2023 at 8:15 PM Yonghong Song 
>>>>>> <yonghong.song@linux.dev> wrote:
>>>>>>> On 12/8/23 8:05 PM, Alexei Starovoitov wrote:
>>>>>>>> On Fri, Dec 8, 2023 at 2:04 PM Andrii Nakryiko
>>>>>>>> <andrii.nakryiko@gmail.com> wrote:
>>>>>>>>> I feel like embedding some sort of ID inside the instruction 
>>>>>>>>> is very..
>>>>>>>>> unusual, shall we say?
>>>>>>>> yeah. no magic numbers inside insns pls.
>>>>>>>>
>>>>>>>> I don't like JA_CFG name, since I read CFG as control flow graph,
>>>>>>>> while you probably meant CFG as configurable.
>>>>>>>> How about BPF_JA_OR_NOP ?
>>>>>>>> Then in combination with BPF_JMP or BPF_JMP32 modifier
>>>>>>>> the insn->off|imm will be used.
>>>>>>>> 1st bit in src_reg can indicate the default action: nop or jmp.
>>>>>>>> In asm it may look like asm("goto_or_nop +5")
>>>>>>> How does the C source code looks like in order to generate
>>>>>>> BPF_JA_OR_NOP insn? Any source examples?
>>>>>> It will be in inline asm only. The address of that insn will
>>>>>> be taken either via && or via asm (".long %l[label]").
>>>>>>    From llvm pov both should go through the same relo creation 
>>>>>> logic. I hope :)
>>>>> A hack in llvm below with an example, could you check whether the C
>>>>> syntax and object dump result
>>>>> is what you want to see?
[...]
>>>> Thank you for the ultra quick llvm diff!
>>>> I was thinking of burning the new 0xE opcode for it,
>>>> but you're right. It's a flavor of existing JA insn and it's indeed
>>>> better to just use src_reg=1 bit to indicate so.
>>> Right, using src_reg to indicate a new flavor of JA insn sounds
>>> a good idea. My previously-used 'imm' field is a pure hack.
>>>
>>>> We probably need to use the 2nd bit of src_reg to indicate its 
>>>> default state
>>>> (jmp or fallthrough).
>>> Good point.
>>>
>>>>>            asm volatile goto ("r0 = 0; \
>>>>>                                goto_or_nop %l[label]; \
>>>>>                                r2 = 2; \
>>>>>                                r3 = 3; \
>>>> Not sure how to represent the default state in assembly though.
>>>> "goto_or_nop" defaults to goto
>>>> "nop_or_goto" default to nop
>>>> ?
>>>>
>>>> Do we need "gotol" for imm32 or will it be automatic?
>>> It won't be automatic.
>>>
>>> At the end of this email, I will show the new change
>>> to have gotol_or_nop and nop_or_gotol insn and an example
>> Thanks a lot Yonghong! May I ask you to send a full patch for LLVM
>> (with gotol) so that I can test it?
>
> Okay, I will send a RFC patch to llvm-project so you can do 'git fetch'
> to get the patch into your local llvm-project repo and build a compiler
> to test.

Okay, the following is the llvm-project diff:

https://github.com/llvm/llvm-project/pull/75110

I added a label in inline asm like below:
  asm volatile goto ("r0 = 0; \
                      static_key_loc_1: \
                      gotol_or_nop %l[label]; \
                      r2 = 2; \
                      r3 = 3; \
                     ":: : "r0", "r2", "r3" :label);

to identify the location of a gotol_or_nop/nop_or_gotol insn. The label
'static_key_loc_1' is in the ELF symbol table and can be easily searched
and validated (the location is a gotol_or_nop/nop_or_gotol insn).

>
>>
>> Overall, I think that JA + flags in SRC_REG is indeed better than a
>> new instruction, as a new code is not used.
>>
>> This looks for me that two bits aren't enough, and the third is
>> required, as the second bit seems to be overloaded:
>>
>>    * bit 1 indicates that this is a "JA_MAYBE"
>>    * bit 2 indicates a jump or nop (i.e., the current state)
>>
>> However, we also need another bit which indicates what to do with the
>> instruction when we issue [an abstract] command
>>
>>    flip_branch_on_or_off(branch, 0/1)
>>
>> Without this information (and in the absence of external meta-data on
>> how to patch the branch) we can't determine what a given (BPF, not
>> jitted) program currently does. For example, if we issue
>>
>>    flip_branch_on_or_off(branch, 0)
>>
>> then we can't reflect this in the xlated program by setting the second
>> bit to jmp/off. Again, JITted program is fine, but it will be
>> desynchronized from xlated in term of logic (some instructions will be
>> mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP).
>>
>> In my original patch we kept this triplet as
>>
>>    (offset to indicate a "special jump", JA+0/JA+OFF, Normal/Inverse)
> [...]
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-11 19:08                       ` Alexei Starovoitov
@ 2023-12-12  9:06                         ` Anton Protopopov
  0 siblings, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-12  9:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, Andrii Nakryiko, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf

On Mon, Dec 11, 2023 at 11:08:59AM -0800, Alexei Starovoitov wrote:
> On Mon, Dec 11, 2023 at 9:34 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> >
> > This looks to me like two bits aren't enough, and the third is
> > required, as the second bit seems to be overloaded:
> >
> >   * bit 1 indicates that this is a "JA_MAYBE"
> >   * bit 2 indicates a jump or nop (i.e., the current state)
> >
> > However, we also need another bit which indicates what to do with the
> > instruction when we issue [an abstract] command
> >
> >   flip_branch_on_or_off(branch, 0/1)
> >
> > Without this information (and in the absence of external meta-data on
> > how to patch the branch) we can't determine what a given (BPF, not
> > jitted) program currently does. For example, if we issue
> >
> >   flip_branch_on_or_off(branch, 0)
> >
> > then we can't reflect this in the xlated program by setting the second
> > bit to jmp/off. Again, JITted program is fine, but it will be
> > desynchronized from xlated in term of logic (some instructions will be
> > mapped as NOP -> x86_JUMP, others as NOP -> x86_NOP).
> 
> Not following the need for the 3rd bit.
> The 2nd bit is not only the initial state, but the current state too.
> 
> when user space does static_branch_enable it will set the 2nd bit to 1
> (if it wasn't set) and will text_poke_bp the code.
> xlated will be always in sync with JITed.
> No ambiguity.

Ok, from BPF arch perspective this can work with two bits (not for
practical purposes though, IMO, see my next e-mail). On the lowest
level we have this magic jump instruction

  JA[SRC_REG=1] +OFF    # jits to a NOP
  JA[SRC_REG=3] +OFF    # jits to a JUMP

Then we have a primitive kfunc

  static_branch_set(prog, branch, bool on)

Which sets the second bit and pokes jitted code.  (Maybe it doesn't
set the actual bit in xlated due to read-only memory, as you've
mentioned below.) You're right that this is unambiguous.
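
To make this concrete, a minimal kernel-side sketch of such a
primitive could look like below (the names and the arch hook are
hypothetical, this is not code from the series, and the read-only
insnsi problem is ignored):

    /* hypothetical kernel-side primitive, for illustration only */
    static int bpf_static_branch_set(struct bpf_prog *prog, u32 insn_off,
                                     bool on)
    {
            struct bpf_insn *insn = &prog->insnsi[insn_off];

            if (insn->code != (BPF_JMP | BPF_JA) || !(insn->src_reg & 1))
                    return -EINVAL;

            /* the 2nd src_reg bit mirrors the current state */
            if (on)
                    insn->src_reg |= 2;
            else
                    insn->src_reg &= ~2;

            /* hypothetical arch hook: text_poke the jitted NOP <-> JUMP */
            return bpf_arch_poke_static_branch(prog, insn_off, on);
    }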

> An annoying part is that bpf insn stream is read only, so we cannot
> really write that 2nd bit. We can virtually write it.
> Seeing nop or jmp in JITed code would mean the bit is 0 or 1.
> So xlated dump will report it.

If we can poke jitted x86 code, then we might poke prog->insnsi in the
same way as well? Besides, on architectures where static keys may be
of interest we don't use the interpreter, so maybe it is ok to poke
insnsi (and make it rw) after all?

> Separately the kernel doesn't have static_branch_switch/flip command.
> It's only enable or disable. I think it's better to keep it the same way.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-11  3:33                       ` Alexei Starovoitov
  2023-12-11 18:49                         ` Andrii Nakryiko
@ 2023-12-12 10:25                         ` Anton Protopopov
  2023-12-14  2:15                           ` Alexei Starovoitov
  1 sibling, 1 reply; 32+ messages in thread
From: Anton Protopopov @ 2023-12-12 10:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Yonghong Song, Andrii Nakryiko,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Sun, Dec 10, 2023 at 07:33:31PM -0800, Alexei Starovoitov wrote:
> On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > How about a slightly different modification of the Anton's idea.
> > Suppose that, as before, there is a special map type:
> >
> >     struct {
> >         __uint(type, BPF_MAP_TYPE_ARRAY);
> >         __type(key, __u32);
> >         __type(value, __u32);
> >         __uint(map_flags, BPF_F_STATIC_KEY);
> >         __uint(max_entries, 1);
> >     } skey1 SEC(".maps")
> 
> Instead of special map that the kernel has to know about
> the same intent can be expressed with:
> int skey1;
> r0 = %[skey1] ll;
> and then the kernel needs no extra map type while the user space
> can collect all static_branches that use &skey1 by
> iterating insn stream and comparing addresses.
> 
> > Which is used as below:
> >
> >     __attribute__((naked))
> >     int foo(void) {
> >       asm volatile (
> >                     "r0 = %[skey1] ll;"
> >                     "if r0 != r0 goto 1f;"
> >                     "r1 = r10;"
> >                     "r1 += -8;"
> >                     "r2 = 1;"
> >                     "call %[bpf_trace_printk];"
> >             "1:"
> >                     "exit;"
> >                     :: __imm_addr(skey1),
> >                        __imm(bpf_trace_printk)
> >                     : __clobber_all
> >       );
> >     }
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <foo>:
> >        0:   r0 = 0x0 ll
> >         0000000000000000:  R_BPF_64_64  skey1  ;; <---- Map relocation as usual
> >        2:   if r0 == r0 goto +0x4 <foo+0x38>   ;; <---- Note condition
> >        3:   r1 = r10
> >        4:   r1 += -0x8
> >        5:   r2 = 0x1
> >        6:   call 0x6
> >        7:   exit
> >
> > And suppose that verifier is modified in the following ways:
> > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to
> >   static key map) in a special way:
> >   - when program is verified, the jump is considered non deterministic;
> >   - when program is jitted, the jump is compiled as nop for "!=" and as
> >     unconditional jump for "==";
> > - build a table of static keys based on a specific map referenced in
> >   condition, e.g. for the example above it can be inferred that insn 2
> >   associates with map skey1 because "r0" points to "skey1";
> > - jit "rX = <static key> ll;" as nop;
> >
> > On the plus side:
> > - any kinds of jump tables are omitted from system call;
> > - no new instruction is needed;
> > - almost no modifications to libbpf are necessary (only a helper macro
> >   to convince clang to keep "if rX == rX");
> 
> Reusing existing insn means that we're giving it new meaning
> and that always comes with danger of breaking existing progs.
> In this case if rX == rX isn't very meaningful and new semantics
> shouldn't break anything, but it's a danger zone.
> 
> If we treat:
> if r0 == r0
> as JA
> then we have to treat
> if r1 == r1
> as JA as well and it becomes ambiguous when prog_info needs
> to return the insns back to user space.
> 
> If we go with rX == rX approach we should probably limit it
> to one specific register. r0, r10, r11 can be considered
> and they have their own pros and cons.
> 
> Additional:
> r0 = %[skey1] ll
> in front of JE/JNE is a waste. If we JIT it to useless native insn
> we will be burning cpu for no reason. So we should probably
> optimize it out. If we do so, then this inline insn becomes a nop and
> it's effectively a relocation. The insn stream will carry this
> rX = 64bit_const insn to indicate the scope of the next insn.
> It's pretty much like Anton's idea of using extra bits in JA
> to encode an integer key_id.
> With ld_imm64 we will encode 64-bit key_id.
> Another insn with more bits to burn that has no effect on execution.
> 
> It doesn't look clean to encode so much extra metadata into instructions
> that JITs and the interpreter have to ignore.
> If we go this route:
>   r11 = 64bit_const
>   if r11 == r11 goto
> is a lesser evil.
> Still, it's not as clean as JA with extra bits in src_reg.
> We already optimize JA +0 into a nop. See opt_remove_nops().
> So a flavor of JA insn looks the most natural fit for a selectable
> JA +xx or JA +0.

This seems to have a benefit that there is no back compatibility issue
(if we use r1, because r0/r11 will be rejected by old verifiers). We
can have

    r1 = 64bit_const
    if r1 == r1 goto

and

    r1 = 64bit_const
    if r1 != r1 goto

and translate it on prog load to new instruction as JUMP_OR_NOP and
NOP_OR_JUMP, correspondingly. On older kernels it will have the
default (key is off) behaviour.
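
For illustration, the source side of this could be wrapped in a helper
roughly like below (purely a sketch with made-up names; it assumes
clang keeps the self-comparison when it is written in inline asm,
similar to Eduard's example):

    /* hypothetical: with the key off (or on an old kernel) this
     * evaluates to false and the branch falls through */
    #define bpf_static_key_unlikely(key)                            \
            ({                                                      \
                    __label__ l_on;                                 \
                    bool ___on = false;                             \
                    asm volatile goto ("r1 = %[k] ll;"              \
                                       "if r1 != r1 goto %l[l_on];" \
                                       : : [k]"i"(&(key))           \
                                       : "r1" : l_on);              \
                    if (0) { l_on: ___on = true; }                  \
                    ___on;                                          \
            })

Usage would then be just if (bpf_static_key_unlikely(skey1)) { ... }
with int skey1; defined as a global, as in Eduard's example.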

> And the special map really doesn't fit.
> Whatever we do, let's keep text_poke-able insn logic separate
> from bookkeeping of addresses of those insns.
> I think a special prefixed section that is understood by libbpf
> (like what I proposed with "name.static_branch") will do fine.
> If it's not good enough we can add a "set" map type
> that will be a generic set of values.
> It can be a set of 8-byte addresses to keep locations of static_branches,
> but let's keep it generic.
> I think it's fine to add:
> __uint(type, BPF_MAP_TYPE_SET)
> and let libbpf populate it with addresses of insns,
> or address of variables, or other values
> when it prepares a program for loading.

What is the higher-level API in this case? The static_branch_set(branch,
bool on) is not enough because we want to distinguish between "normal"
and "inverse" branches (for unlikely/likely cases). We can implement
this using something like this:

static_key_set(key, bool new_value)
{
    /* true if we change key value */
    bool key_changed = key->old_value ^ new_value;
 
    for_each_prog(prog, key)
        for_each_branch(branch, prog, key)
            static_branch_flip(prog, branch, key_changed)
}

where static_branch_flip flips the second bit of SRC_REG. We need to
keep track of prog->branches and key->progs. How is this different
from what my patch implements?

If this is implemented in userspace, then how we prevent synchronous
updates of the key (and a relocation variant doesn't seem to work from
userspace)? Or is this a new kfunc? If yes, then how is it
executed, 

> But map_update_elem should never be doing text_poke on insns.
> We added prog_array map type is the past, but that was done
> during the early days. If we were designing bpf today we would have
> gone a different route.

What is the interface to toggle a key? If this is a map or an index of
a global variable and we can't do this via a syscall, then we need to
write and compile an iterator which goes through all maps, selects the
proper map, and then generates code that goes through all programs of
interest and updates all static branches there (and we will have to
build this iterator every time, and keep track of all the mappings
from userspace), while the alternative is to just issue one syscall.

If this is just a `kfunc set_static_key(key, on/off)` then this is
simpler, but again, we will have to load an iterator and then iterate
through all maps to find the key vs. one syscall.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-12 10:25                         ` Anton Protopopov
@ 2023-12-14  2:15                           ` Alexei Starovoitov
  2023-12-14  3:04                             ` Yonghong Song
  2023-12-14 16:33                             ` Anton Protopopov
  0 siblings, 2 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2023-12-14  2:15 UTC (permalink / raw)
  To: Anton Protopopov
  Cc: Eduard Zingerman, Yonghong Song, Andrii Nakryiko,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>
>
> This seems to have a benefit that there is no back compatibility issue
> (if we use r1, because r0/r11 will be rejected by old verifiers). We
> can have
>
>     r1 = 64bit_const
>     if r1 == r1 goto
>
> and
>
>     r1 = 64bit_const
>     if r1 != r1 goto
>
> > and translate it on prog load to new instruction as JUMP_OR_NOP and
> NOP_OR_JUMP, correspondingly. On older kernels it will have the
> default (key is off) behaviour.

As Andrii pointed out any new insn either JA with extra bits
or special meaning if rX == rX can be sanitized by libbpf
into plain JA.
There will be no backward compat issues.

> Ok, from BPF arch perspective this can work with two bits (not for
> practical purposes though, IMO, see my next e-mail).

I read this email and I still don't understand why you need a 3rd bit.

>
> > And the special map really doesn't fit.
> > Whatever we do, let's keep text_poke-able insn logic separate
> > from bookkeeping of addresses of those insns.
> > I think a special prefixed section that is understood by libbpf
> > (like what I proposed with "name.static_branch") will do fine.
> > If it's not good enough we can add a "set" map type
> > that will be a generic set of values.
> > It can be a set of 8-byte addresses to keep locations of static_branches,
> > but let's keep it generic.
> > I think it's fine to add:
> > __uint(type, BPF_MAP_TYPE_SET)
> > and let libbpf populate it with addresses of insns,
> > or address of variables, or other values
> > when it prepares a program for loading.
>
> What is the higher-level API in this case? The static_branch_set(branch,
> bool on) is not enough because we want to distinguish between "normal"
> and "inverse" branches (for unlikely/likely cases).

What is "likely/unlikely cases" ?
likely() is a hint to the compiler to order basic blocks in
a certain way. There is no likely/unlikely bit in the binary code
after compilation on x86 or other architectures.

There used to be a special bit on sparc64 that would mean
a default jmp|fallthrough action for a conditional jmp.
But that was before sparc became out of order and gained proper
branch predictor in HW.

>  We can implement
> this using something like this:
>
> static_key_set(key, bool new_value)
> {
>     /* true if we change key value */
>     bool key_changed = key->old_value ^ new_value;
>
>     for_each_prog(prog, key)
>         for_each_branch(branch, prog, key)
>             static_branch_flip(prog, branch, key_changed)
> }
>
> where static_branch_flip flips the second bit of SRC_REG.

I don't understand why you keep bringing up 'flip' use case.
The kernel doesn't have such an operation on static branches.
Which makes me believe that it wasn't necessary.
Why do we need one for the bpf static branch?

> We need to
> keep track of prog->branches and key->progs. How is this different
> from what my patch implements?

What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET)
and by naming convention libbpf will populate it with addresses
of JA_OR_NOP from all progs.
In asm it could be:
asm volatile ("r0 = %[set_A] ll; goto_or_nop ...");
(and libbpf will remove ld_imm64 from the prog before loading.)

or via
asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long");
(and libbpf will copy from the special section into a set and remove
special section).

It will be a libbpf convention and the kernel doesn't need
to know about a special static branch map type or array of addresses
in prog_load cmd.
Only JA insn is relevant to the verifier and JITs.

Ideally we don't need to introduce SET map type and
libbpf wouldn't need to populate it.
If we can make it work with an array of values that .pushsection + .long
automatically populates and libbpf treats it as a normal global data array
that would be ideal.
Insn addresses from all progs will be in that array after loading.
Sort of like ".kconfig" section that libbpf populates,
but it's a normal array underneath.
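
As an illustration of the second variant, a program could record the
address of each such insn roughly as below (a sketch only: the
goto_or_nop spelling, the section name and the use of .quad for 8-byte
addresses are all assumptions):

    int foo(void)
    {
            /* "set_A" is just a section name; after loading, libbpf (or
             * the user) sees it as a plain array of insn addresses */
            asm volatile goto ("1: goto_or_nop %l[l_on];"
                               ".pushsection set_A, \"aw\";"
                               ".quad 1b;"
                               ".popsection;"
                               : : : : l_on);
            return 55;
    l_on:
            return 33;
    }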

> If this is implemented in userspace, then how we prevent synchronous
> updates of the key (and a relocation variant doesn't seem to work from
> userspace)? Or is this a new kfunc? If yes, then how is it
> executed,

then user space can have small helper in libbpf that iterates
over SET (or array) and
calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set)
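
Such a helper could be as small as the sketch below
(bpf_static_branch_enable() is a hypothetical wrapper around the
proposed command; the map lookup is the existing libbpf API):

    #include <bpf/bpf.h>

    /* sketch: enable every static branch recorded in the set/array map */
    static int static_branches_enable(int set_fd, __u32 nr_entries)
    {
            __u32 i;

            for (i = 0; i < nr_entries; i++) {
                    __u64 insn_addr;

                    if (bpf_map_lookup_elem(set_fd, &i, &insn_addr))
                            return -1;
                    /* hypothetical: sys_bpf(cmd=STATIC_BRANCH_ENABLE, ...) */
                    if (bpf_static_branch_enable(insn_addr))
                            return -1;
            }
            return 0;
    }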

Similar in the kernel. When bpf progs want to enable a key it does
bpf_for_each(set) { // open coded iterator
   bpf_static_branch_enable(addr); // kfunc call
}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-14  2:15                           ` Alexei Starovoitov
@ 2023-12-14  3:04                             ` Yonghong Song
  2023-12-14 16:56                               ` Eduard Zingerman
  2023-12-14 16:33                             ` Anton Protopopov
  1 sibling, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-12-14  3:04 UTC (permalink / raw)
  To: Alexei Starovoitov, Anton Protopopov
  Cc: Eduard Zingerman, Andrii Nakryiko, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Jiri Olsa, Martin KaFai Lau,
	Stanislav Fomichev, bpf


On 12/13/23 6:15 PM, Alexei Starovoitov wrote:
> On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote:
>>
>> This seems to have a benefit that there is no back compatibility issue
>> (if we use r1, because r0/r11 will be rejected by old verifiers). We
>> can have
>>
>>      r1 = 64bit_const
>>      if r1 == r1 goto
>>
>> and
>>
>>      r1 = 64bit_const
>>      if r1 != r1 goto
>>
>> and translate it on prog load to new instruction as JUMP_OR_NOP and
>> NOP_OR_JUMP, correspondingly. On older kernels it will have the
>> default (key is off) behaviour.
> As Andrii pointed out any new insn either JA with extra bits
> or special meaning if rX == rX can be sanitized by libbpf
> into plain JA.
> There will be no backward compat issues.
>
>> Ok, from BPF arch perspective this can work with two bits (not for
>> practical purposes though, IMO, see my next e-mail).
> I read this email and I still don't understand why you need a 3rd bit.
>
>>> And the special map really doesn't fit.
>>> Whatever we do, let's keep text_poke-able insn logic separate
>>> from bookkeeping of addresses of those insns.
>>> I think a special prefixed section that is understood by libbpf
>>> (like what I proposed with "name.static_branch") will do fine.
>>> If it's not good enough we can add a "set" map type
>>> that will be a generic set of values.
>>> It can be a set of 8-byte addresses to keep locations of static_branches,
>>> but let's keep it generic.
>>> I think it's fine to add:
>>> __uint(type, BPF_MAP_TYPE_SET)
>>> and let libbpf populate it with addresses of insns,
>>> or address of variables, or other values
>>> when it prepares a program for loading.
>> What is the higher-level API in this case? The static_branch_set(branch,
>> bool on) is not enough because we want to distinguish between "normal"
>> and "inverse" branches (for unlikely/likely cases).
> What is "likely/unlikely cases" ?
> likely() is a hint to the compiler to order basic blocks in
> a certain way. There is no likely/unlikely bit in the binary code
> after compilation on x86 or other architectures.
>
> There used to be a special bit on sparc64 that would mean
> a default jmp|fallthrough action for a conditional jmp.
> But that was before sparc became out of order and gained proper
> branch predictor in HW.
>
>>   We can implement
>> this using something like this:
>>
>> static_key_set(key, bool new_value)
>> {
>>      /* true if we change key value */
>>      bool key_changed = key->old_value ^ new_value;
>>
>>      for_each_prog(prog, key)
>>          for_each_branch(branch, prog, key)
>>              static_branch_flip(prog, branch, key_changed)
>> }
>>
>> where static_branch_flip flips the second bit of SRC_REG.
> I don't understand why you keep bringing up 'flip' use case.
> The kernel doesn't have such an operation on static branches.
> Which makes me believe that it wasn't necessary.
> Why do we need one for the bpf static branch?
>
>> We need to
>> keep track of prog->branches and key->progs. How is this different
>> from what my patch implements?
> What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET)
> and by naming convention libbpf will populate it with addresses
> of JA_OR_NOP from all progs.
> In asm it could be:
> asm volatile ("r0 = %[set_A] ll; goto_or_nop ...");
> (and libbpf will remove ld_imm64 from the prog before loading.)
>
> or via
> asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long");
> (and libbpf will copy from the special section into a set and remove
> special section).

This is one more alternative.

    asm volatile goto ("r0 = 0; \
                        static_key_loc_1: \
                        gotol_or_nop %l[label]; \
                        r2 = 2; \
                        r3 = 3; \
                        ":: : "r0", "r2", "r3" :label);

User code can use a libbpf API static_key_enable("static_key_loc_1")
to enable the above gotol_or_nop. static_key_disable("static_key_loc_1")
or static_key_enabled("static_key_loc_1") can be similarly defined.

Inside libbpf, it can look at the ELF file, find the label
"static_key_loc_1" and its corresponding insn index, and verify that
the insn at that label really is a gotol_or_nop or nop_or_gotol insn.
Eventually libbpf can call a bpf syscall, e.g.
sys_bpf(cmd=STATIC_KEY_ENABLE, prog_fd, insn_offset), to actually
change the "current" state to gotol or nop, depending on what ENABLE
means.

For enabling/disabling static keys in the bpf program itself, the
program can similarly have bpf_static_key_enable("static_key_loc_1");
libbpf needs to change it to bpf_static_key_enable(insn_offset) and the
kernel verifier should process it properly.
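
For illustration, user code could then look roughly like below (the
exact signature of static_key_enable() is a guess, it would presumably
need some handle to the loaded program; the other calls are existing
libbpf APIs, error handling omitted):

    struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", NULL);
    struct bpf_program *prog;

    bpf_object__load(obj);
    prog = bpf_object__find_program_by_name(obj, "foo");

    /* hypothetical API: resolve "static_key_loc_1" to an insn offset and
     * call sys_bpf(cmd=STATIC_KEY_ENABLE, prog_fd, insn_offset) */
    static_key_enable(prog, "static_key_loc_1");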

Slightly different from what Alexei proposed, but another approach
for consideration and discussion.

>
> It will be a libbpf convention and the kernel doesn't need
> to know about a special static branch map type or array of addresses
> in prog_load cmd.
> Only JA insn is relevant to the verifier and JITs.
>
> Ideally we don't need to introduce SET map type and
> libbpf wouldn't need to populate it.
> If we can make it work with an array of values that .pushsection + .long
> automatically populates and libbpf treats it as a normal global data array
> that would be ideal.
> Insn addresses from all progs will be in that array after loading.
> Sort of like ".kconfig" section that libbpf populates,
> but it's a normal array underneath.
>
>> If this is implemented in userspace, then how we prevent synchronous
>> updates of the key (and a relocation variant doesn't seem to work from
>> userspace)? Or is this a new kfunc? If yes, then how is it
>> executed,
> then user space can have small helper in libbpf that iterates
> over SET (or array) and
> calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set)
>
> Similar in the kernel. When bpf progs want to enable a key it does
> bpf_for_each(set) { // open coded iterator
>     bpf_static_branch_enable(addr); // kfunc call
> }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-14  2:15                           ` Alexei Starovoitov
  2023-12-14  3:04                             ` Yonghong Song
@ 2023-12-14 16:33                             ` Anton Protopopov
  1 sibling, 0 replies; 32+ messages in thread
From: Anton Protopopov @ 2023-12-14 16:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Yonghong Song, Andrii Nakryiko,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jiri Olsa,
	Martin KaFai Lau, Stanislav Fomichev, bpf

On Wed, Dec 13, 2023 at 06:15:20PM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> >
> > This seems to have a benefit that there is no back compatibility issue
> > (if we use r1, because r0/r11 will be rejected by old verifiers). We
> > can have
> >
> >     r1 = 64bit_const
> >     if r1 == r1 goto
> >
> > and
> >
> >     r1 = 64bit_const
> >     if r1 != r1 goto
> >
> > and translate it on prog load to new instruction as JUMP_OR_NOP and
> > NOP_OR_JUMP, correspondingly. On older kernels it will have the
> > default (key is off) behaviour.
> 
> As Andrii pointed out any new insn either JA with extra bits
> or special meaning if rX == rX can be sanitized by libbpf
> into plain JA.
> There will be no backward compat issues.
> 
> > Ok, from BPF arch perspective this can work with two bits (not for
> > practical purposes though, IMO, see my next e-mail).
> 
> I read this email and I still don't understand why you need a 3rd bit.
> 
> >
> > > And the special map really doesn't fit.
> > > Whatever we do, let's keep text_poke-able insn logic separate
> > > from bookkeeping of addresses of those insns.
> > > I think a special prefixed section that is understood by libbpf
> > > (like what I proposed with "name.static_branch") will do fine.
> > > If it's not good enough we can add a "set" map type
> > > that will be a generic set of values.
> > > It can be a set of 8-byte addresses to keep locations of static_branches,
> > > but let's keep it generic.
> > > I think it's fine to add:
> > > __uint(type, BPF_MAP_TYPE_SET)
> > > and let libbpf populate it with addresses of insns,
> > > or address of variables, or other values
> > > when it prepares a program for loading.
> >
> > What is the higher-level API in this case? The static_branch_set(branch,
> > bool on) is not enough because we want to distinguish between "normal"
> > and "inverse" branches (for unlikely/likely cases).
> 
> What is "likely/unlikely cases" ?
> likely() is a hint to the compiler to order basic blocks in
> a certain way. There is no likely/unlikely bit in the binary code
> after compilation on x86 or other architectures.
> 
> There used to be a special bit on sparc64 that would mean
> a default jmp|fallthrough action for a conditional jmp.
> But that was before sparc became out of order and gained proper
> branch predictor in HW.

Consider this code:

    int foo(void)
    {
    	if (static_branch_likely(&key))
    		return 33;
    	return 55;
    }

    int hoo(void)
    {
    	if (static_branch_unlikely(&key))
    		return 33;
    	return 55;
    }

When the key is disabled the corresponding code is:

    0000000000000010 <foo>:                                                            
      19:   eb 0b                   jmp    26 <foo+0x16> <-------- likely(static branch), key off
      1b:   b8 21 00 00 00          mov    $0x21,%eax                                  
      20:   5d                      pop    %rbp                                        
      21:   e9 00 00 00 00          jmp    26 <foo+0x16>                               
      26:   b8 37 00 00 00          mov    $0x37,%eax                                  
      2b:   5d                      pop    %rbp                                        
      2c:   e9 00 00 00 00          jmp    31 <foo+0x21>                               
      31:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)                    
      38:   00 00 00 00                                                                
      3c:   0f 1f 40 00             nopl   0x0(%rax)                                   
    
    0000000000000050 <hoo>:                                                            
    <hoo>:                                                                             
      59:   66 90                   xchg   %ax,%ax <-------------- unlikely(static branch), key off                
      5b:   b8 37 00 00 00          mov    $0x37,%eax                                  
      60:   5d                      pop    %rbp                                        
      61:   e9 00 00 00 00          jmp    66 <hoo+0x16>                               
      66:   b8 21 00 00 00          mov    $0x21,%eax                                  
      6b:   5d                      pop    %rbp                                        
      6c:   e9 00 00 00 00          jmp    71 <__UNIQUE_ID_vermagic248+0x5>            

When the key is enabled, the code is:

    0000000000000010 <foo>:
     19:   66 90                   xchg   %ax,%ax <--------------- likely(static branch), key on
     1b:   b8 21 00 00 00          mov    $0x21,%eax
     20:   5d                      pop    %rbp
     21:   e9 00 00 00 00          jmp    26 <foo+0x16>
     26:   b8 37 00 00 00          mov    $0x37,%eax
     2b:   5d                      pop    %rbp
     2c:   e9 00 00 00 00          jmp    31 <foo+0x21>
     31:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
     38:   00 00 00 00
     3c:   0f 1f 40 00             nopl   0x0(%rax)
                                                                           
    
    0000000000000050 <hoo>:
     59:   eb 0b                   jmp    66 <hoo+0x16> <--------- unlikely(static branch), key on
     5b:   b8 37 00 00 00          mov    $0x37,%eax
     60:   5d                      pop    %rbp
     61:   e9 00 00 00 00          jmp    66 <hoo+0x16>
     66:   b8 21 00 00 00          mov    $0x21,%eax
     6b:   5d                      pop    %rbp
     6c:   e9 00 00 00 00          jmp    71 <__UNIQUE_ID_vermagic248+0x5>

So, for the likely case we set branch to JUMP/NOP when key is OFF/ON.
And for the unlikely case we set branch to NOP/JUMP when key is
OFF/ON. The kernel keeps this information, and when
static_key_enable(key) is executed, it does [simplified for reading]

	for (entry = key->start; entry < stop; entry++)
	        arch_jump_label_transform(entry, jump_label_type(entry));

this jump_label_type() contains this bit of information: are we
writing NOP or JUMP for an enabled key. Same for
static_key_disable(key).

Now, for BPF we do static_branch_enable(branch). To generate proper
JITed code, we have enough information (NOP=0, JUMP=2):

    static_branch_enable(JA[SRC=1|NOP]) jits to ARCH_JUMP
    static_branch_enable(JA[SRC=1|JUMP]) jits to ARCH_NOP
    static_branch_disable(JA[SRC=1|NOP]) jits to ARCH_NOP
    static_branch_disable(JA[SRC=1|JUMP]) jits to ARCH_JUMP

But how do we represent this in the xlated code to the user? Do we patch the
xlated code? If we do, then

    static_branch_enable changes JA[SRC=1|NOP] to JA[SRC=1|JUMP], ARCH_JUMP generated
    static_branch_disable sees JA[SRC=1|JUMP], changes it to JA[SRC=1|NOP], but ARCH_JUMP is generated

or what about two static_branch_enable calls on the same branch? By flipping
I meant that we always do

    JA[SRC=1|NOP]  jits to ARCH_NOP
    JA[SRC=1|JUMP] jits to ARCH_JUMP

the primitive operation is static_branch_flip which changes
JA[SRC=1|NOP] to JA[SRC=1|JUMP] and vice versa. Then for
static_key_enable(key) we flip all the branches if the key was
disabled and do nothing otherwise. Same for static_key_disable.
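
In other words, the flip-based semantics could be summarised by the
following sketch (hypothetical structures and arch hook, and ignoring
that insnsi is read-only):

    /* hypothetical bookkeeping kept by the kernel per static key */
    struct bpf_static_key_branch {
            struct bpf_prog *prog;
            u32 insn_off;
    };

    struct bpf_static_key {
            bool enabled;
            u32 nr_branches;
            struct bpf_static_key_branch *branches;
    };

    static void bpf_static_branch_flip(struct bpf_prog *prog, u32 insn_off)
    {
            struct bpf_insn *insn = &prog->insnsi[insn_off];

            insn->src_reg ^= 2;   /* JA[SRC=1|NOP] <-> JA[SRC=1|JUMP] */
            bpf_arch_poke_branch(prog, insn_off);   /* hypothetical hook */
    }

    static void bpf_static_key_set(struct bpf_static_key *key, bool on)
    {
            u32 i;

            if (key->enabled == on)
                    return;               /* nothing to flip */
            key->enabled = on;
            for (i = 0; i < key->nr_branches; i++)
                    bpf_static_branch_flip(key->branches[i].prog,
                                           key->branches[i].insn_off);
    }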

What you've proposed before is to keep this "third bit" in
xlated+jitted form.  Basically, we check the state of the branch
"on/off" like this: say, we see that the xlated/jitted state of a
branch is JA[SRC=1|NOP] and ARCH_JUMP, then we can say that this branch
is on. How do we report it to the user in PROG_INFO? Do we set the
instruction to JA[SRC=1|JUMP] in the output, specifying that its
current state is to jump? This would work, I think.

> >  We can implement
> > this using something like this:
> >
> > static_key_set(key, bool new_value)
> > {
> >     /* true if we change key value */
> >     bool key_changed = key->old_value ^ new_value;
> >
> >     for_each_prog(prog, key)
> >         for_each_branch(branch, prog, key)
> >             static_branch_flip(prog, branch, key_changed)
> > }
> >
> > where static_branch_flip flips the second bit of SRC_REG.
> 
> I don't understand why you keep bringing up 'flip' use case.
> The kernel doesn't have such an operation on static branches.
> Which makes me believe that it wasn't necessary.
> Why do we need one for the bpf static branch?

See above.

> > We need to
> > keep track of prog->branches and key->progs. How is this different
> > from what my patch implements?
> 
> What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET)
> and by naming convention libbpf will populate it with addresses
> of JA_OR_NOP from all progs.
> In asm it could be:
> asm volatile ("r0 = %[set_A] ll; goto_or_nop ...");
> (and libbpf will remove ld_imm64 from the prog before loading.)
> 
> or via
> asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long");
> (and libbpf will copy from the special section into a set and remove
> special section).
> 
> It will be a libbpf convention and the kernel doesn't need
> to know about a special static branch map type or array of addresses
> in prog_load cmd.
> Only JA insn is relevant to the verifier and JITs.
> 
> Ideally we don't need to introduce SET map type and
> libbpf wouldn't need to populate it.
> If we can make it work with an array of values that .pushsection + .long
> automatically populates and libbpf treats it as a normal global data array
> that would be ideal.
> Insn addresses from all progs will be in that array after loading.
> Sort of like ".kconfig" section that libbpf populates,
> but it's a normal array underneath.

This doesn't work without the kernel side, as we change instruction
offsets inside the verifier when, e.g., converting helper calls to
individual instructions, etc.

Otherwise, this is more-or-less what my patch does: get a list of
offsets of branches inside the program, associate them with a map (or
ID of any kind), and then push it to the kernel.

And if you need to keep this list in the kernel, then it looks like we
can keep it and access it per key, not in a loop...

An alternative is to encode the key ID in the instruction in some form
so that it is visible in xlated code after the program is loaded. But
in this case users become responsible for writing iterators and
bookkeeping all the relationships vs. just loading a program with a
map fd, which is a well-established API in BPF.

> > If this is implemented in userspace, then how we prevent synchronous
> > updates of the key (and a relocation variant doesn't seem to work from
> > userspace)? Or is this a new kfunc? If yes, then how is it
> > executed,
> 
> then user space can have small helper in libbpf that iterates
> over SET (or array) and
> calls sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set)
>
> Similar in the kernel. When bpf progs want to enable a key it does
> bpf_for_each(set) { // open coded iterator
>    bpf_static_branch_enable(addr); // kfunc call
> }

Is it possible to use a map like it is used in my patch, but do
sys_bpf(cmd=STATIC_KEY_ENABLE, attr.map_fd) so that it is specifically
separated from map_update_value? (Or what if it was not a map, but
just some "bpf object"?)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support
  2023-12-14  3:04                             ` Yonghong Song
@ 2023-12-14 16:56                               ` Eduard Zingerman
  0 siblings, 0 replies; 32+ messages in thread
From: Eduard Zingerman @ 2023-12-14 16:56 UTC (permalink / raw)
  To: Yonghong Song, Alexei Starovoitov, Anton Protopopov
  Cc: Andrii Nakryiko, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jiri Olsa, Martin KaFai Lau, Stanislav Fomichev,
	bpf

On Wed, 2023-12-13 at 19:04 -0800, Yonghong Song wrote:
> Slightly different from what Alexei proposed, but another approach
> for consideration and discussion.

It looks like Alexei / Yonghong focus on individual addresses,
while, as far as I understand, Anton's original proposal is more about
program-wide switches. E.g. there is some "DEBUG" flag and, from the
application's point of view, it does not matter how many static
branches are enabled or disabled by this flag.
In a sense, there is one-to-one vs one-to-many control granularity.

Here is an additional variation for one-to-many approach, basically a
hybrid of Anton's original idea and Andrii's suggestion to use
relocations.

Suppose there is a special type of maps :) 
(it does not really matter if it is a map or not, what matters is that
 this object has an FD and libbpf knows how to relocate it when
 preparing program for load):

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __type(key, __u32);
        __type(value, __u32);
        __uint(map_flags, BPF_F_STATIC_KEY);
        __uint(max_entries, 1);
    } skey1 SEC(".maps")

And a new instruction, that accepts an fd/pointer corresponding to
this map as a parameter + a label to which to jump if static key is
enabled. E.g., as below:

    __attribute__((naked))
    int foo(void) {
      asm volatile (
                    "if static %[skey1] goto 1f;" // hypthetical syntax
                    "r1 = r10;"
                    "r1 += -8;"
                    "r2 = 1;"
                    "call %[bpf_trace_printk];"
            "1:"
                    "exit;"
                    :: __imm_addr(skey1),
                       __imm(bpf_trace_printk)
                    : __clobber_all
      );
    }

This "if static" is relocated by libbpf in order to get the correct
map fd when the program is loaded.

In effect, each static key is a single-entry map with a one-to-many
relationship to the instructions / programs it controls.
It can be enabled or disabled using either map update functions
or new system call commands.
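
For example, toggling such a key from user space could then be a plain
map update, roughly as below (assuming a loaded bpf_object *obj and
that a non-zero value means "enabled"):

    __u32 zero = 0, on = 1;
    int skey_fd = bpf_object__find_map_fd_by_name(obj, "skey1");

    /* existing libbpf calls; under this proposal the kernel would patch
     * all branches controlled by skey1 when the value changes */
    bpf_map_update_elem(skey_fd, &zero, &on, BPF_ANY);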

What I think is a plus with such approach:
- The application is free to decide how it wants to organize these FDs:
  - use one fd per condition;
  - have an fd per program;
  - have an fd per set of programs.
- Embedding fd with instruction removes need to communicate mapping
  information back and forth, or track special section kinds.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-12-14 16:56 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-06 14:10 [PATCH bpf-next 0/7] BPF Static Keys Anton Protopopov
2023-12-06 14:10 ` [PATCH bpf-next 1/7] bpf: extract bpf_prog_bind_map logic into an inline helper Anton Protopopov
2023-12-06 14:10 ` [PATCH bpf-next 2/7] bpf: rename and export a struct definition Anton Protopopov
2023-12-06 14:10 ` [PATCH bpf-next 3/7] bpf: adjust functions offsets when patching progs Anton Protopopov
2023-12-06 14:10 ` [PATCH bpf-next 4/7] bpf: implement BPF Static Keys support Anton Protopopov
2023-12-06 14:10 ` [PATCH bpf-next 5/7] bpf: x86: implement static keys support Anton Protopopov
2023-12-06 14:10 ` [PATCH bpf-next 6/7] libbpf: BPF Static Keys support Anton Protopopov
2023-12-08  3:45   ` Alexei Starovoitov
2023-12-08 16:19     ` Anton Protopopov
2023-12-08 22:04       ` Andrii Nakryiko
2023-12-08 23:07         ` Eduard Zingerman
2023-12-09  4:07           ` Alexei Starovoitov
2023-12-09  4:05         ` Alexei Starovoitov
2023-12-09  4:15           ` Yonghong Song
2023-12-09  4:25             ` Alexei Starovoitov
2023-12-09  5:04               ` Yonghong Song
2023-12-09 17:18                 ` Alexei Starovoitov
2023-12-10  6:32                   ` Yonghong Song
2023-12-10 10:30                     ` Eduard Zingerman
2023-12-11  3:33                       ` Alexei Starovoitov
2023-12-11 18:49                         ` Andrii Nakryiko
2023-12-12 10:25                         ` Anton Protopopov
2023-12-14  2:15                           ` Alexei Starovoitov
2023-12-14  3:04                             ` Yonghong Song
2023-12-14 16:56                               ` Eduard Zingerman
2023-12-14 16:33                             ` Anton Protopopov
2023-12-11 17:31                     ` Anton Protopopov
2023-12-11 19:08                       ` Alexei Starovoitov
2023-12-12  9:06                         ` Anton Protopopov
2023-12-11 21:51                       ` Yonghong Song
2023-12-11 22:52                         ` Yonghong Song
2023-12-06 14:10 ` [PATCH bpf-next 7/7] selftests/bpf: Add tests for BPF Static Keys Anton Protopopov
